Numerous parallels exist between the strategic board game ‘Stratego’ and the operation of a Cyber Security Center. In ‘Stratego’, you play a single opponent, but in the Cyber Security Center you are playing many games simultaneously. Your objective remains the same: safeguard the flag for as long as possible while adversaries relentlessly try to capture it. The difference lies in the dynamics of these engagements. You have to defend every front at once, whereas each adversary can focus on a single game. At first glance, this might appear unequal, almost like an unfair contest. However, it doesn’t have to be.

You know your assets, your adversary doesn’t

The oft-repeated adage, “You can only defend what you know”, underscores a fundamental truth in cybersecurity. It’s not just about recognizing the technology landscape you’re tasked with protecting; it’s about understanding the web of connections and interdependencies among the components within that landscape. This goes beyond merely registering assets in a central configuration management database (CMDB); that’s only the first step. The richness and depth of the data you integrate into the CMDB, and keep maintaining there, are what enable the Cyber Security Center to craft a robust defense strategy, one that aims to secure the flag for as long as possible.

I understand the apprehension about maintaining a CMDB all too well. For most organizations it is a genuine challenge, bordering on a nightmare, and without the right approach keeping a CMDB accurate and up to date is a time-consuming ordeal. But an accurate CMDB is a puzzle that can be solved, especially if you approach it with the mindset of a data scientist.

The data scientist possesses the tools and skills to tackle the puzzle efficiently, but here’s the catch: they can’t unravel the entire enigma on their own. It’s a collaborative endeavor that requires cooperation and data inputs from various stakeholders across the organization, each contributing their piece to complete the intricate cybersecurity puzzle.


To establish a robust foundation for asset registration and enhance your cybersecurity efforts, it’s crucial to connect your discovery and vulnerability data to a central data lake. This process not only facilitates a more comprehensive understanding of your organization’s digital landscape but also streamlines data management for improved security.

If you’ve effectively implemented network segmentation, the outcomes of your discovery scans serve as the bedrock for constructing an asset table. Network segmentation aids in categorizing assets within distinct boundaries, providing clarity and control over your network’s various segments.

However, it doesn’t stop there. By integrating data from diverse sources into this same asset table, you pave the way for a comprehensive asset registration system. These additional sources might include configuration management databases, inventory databases, or even data from endpoint protection systems. As you weave this web of data together, a more holistic and accurate picture of your assets emerges.
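As an illustration, the sketch below builds such an asset table in plain Python. It assumes discovery results keyed by IP address and other sources that can be matched on hostname. The field names, source names and sample records are hypothetical, and real data usually needs fuzzier matching on MAC addresses, serial numbers or FQDNs.

```python
# Minimal sketch: build an asset table from discovery results and enrich it
# with other sources. All field names and records below are hypothetical.

discovery = [
    {"ip": "10.0.1.10", "hostname": "web01", "segment": "dmz"},
    {"ip": "10.0.2.20", "hostname": "db01", "segment": "backend"},
    {"ip": "10.0.2.99", "hostname": None, "segment": "backend"},
]
cmdb = [
    {"hostname": "web01", "owner": "e-commerce", "criticality": "high"},
    {"hostname": "db01", "owner": "finance", "criticality": "high"},
    {"hostname": "hr-app", "owner": "hr", "criticality": "medium"},
]
edr = [{"hostname": "web01", "agent": "installed"}]


def build_asset_table(discovery, *sources):
    """Start from what the discovery scan actually saw on the network and
    merge in attributes from other (name, records) sources by hostname."""
    assets = {r["ip"]: dict(r, sources=["discovery"]) for r in discovery}
    by_host = {a["hostname"]: a for a in assets.values() if a["hostname"]}
    unmatched = []  # registered in a source but never seen by a scan
    for name, records in sources:
        for rec in records:
            asset = by_host.get(rec["hostname"])
            if asset is None:
                unmatched.append((name, rec["hostname"]))
                continue
            asset.update({k: v for k, v in rec.items() if k != "hostname"})
            asset["sources"].append(name)
    return assets, unmatched


assets, unmatched = build_asset_table(discovery, ("cmdb", cmdb), ("edr", edr))
# Seen on the network, but known to no other source.
unknown = [ip for ip, a in assets.items() if a["sources"] == ["discovery"]]
print(unknown)    # ['10.0.2.99']
print(unmatched)  # [('cmdb', 'hr-app')]
```

The two gap lists are often the most useful output: `unknown` holds devices that answer on the network but appear in no other source, and `unmatched` holds registrations that no scan has confirmed.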

This consolidated asset registration not only simplifies asset management but also forms the basis for proactive vulnerability assessment and remediation. It allows you to identify potential weak points, prioritize security measures, and respond swiftly to emerging threats. In essence, the central data lake becomes the nexus for knowledge, enabling you to better safeguard your digital infrastructure.

The data obtained from discovery scanning serves as a valuable foundation, yet it does have its limitations. Notably, it often leaves critical information gaps. For instance, it doesn’t automatically provide insights into the specific operating systems in use on the assets, nor does it shed light on the supplementary components or software installed on these assets. This is where vulnerability scanning data plays a pivotal role, stepping in as the cavalry arriving to bolster your arsenal of information.

Authenticated vulnerability data, in particular, is a potent tool for enriching your asset data. Because the scanner logs on to the asset with credentials, it reveals not only the operating system in use but also the additional components, configurations, and potential weak points present on each asset. In essence, authenticated vulnerability data fills the gaps left by discovery scanning and gives you the comprehensive insight that proactive security measures and decision-making depend on.
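A minimal sketch of that enrichment step, assuming the authenticated scan can be exported as one row per finding with the asset’s IP, its operating system and the affected component. The export layout, the field names and the `F-00x` finding IDs are placeholders, not the output format of any particular scanner.

```python
from collections import defaultdict

# Hypothetical export of an authenticated scan: one row per finding.
findings = [
    {"ip": "10.0.1.10", "os": "Ubuntu 22.04", "component": "openssl 3.0.2", "finding": "F-001"},
    {"ip": "10.0.1.10", "os": "Ubuntu 22.04", "component": "nginx 1.18.0", "finding": "F-002"},
    {"ip": "10.0.1.10", "os": "Ubuntu 22.04", "component": "openssl 3.0.2", "finding": "F-003"},
    {"ip": "10.0.2.20", "os": "Windows Server 2019", "component": "SQL Server 2019", "finding": "F-004"},
]


def enrich_with_vuln_data(assets, findings):
    """Add OS, affected components and open finding count per asset (keyed by IP)."""
    components = defaultdict(set)
    counts = defaultdict(int)
    for f in findings:
        # Assets missing from the table are added rather than silently dropped.
        asset = assets.setdefault(f["ip"], {"ip": f["ip"]})
        asset["os"] = f["os"]
        components[f["ip"]].add(f["component"])
        counts[f["ip"]] += 1
    for ip, comps in components.items():
        assets[ip]["components"] = sorted(comps)
        assets[ip]["open_findings"] = counts[ip]
    return assets


assets = {
    "10.0.1.10": {"ip": "10.0.1.10", "hostname": "web01"},
    "10.0.2.20": {"ip": "10.0.2.20", "hostname": "db01"},
}
enrich_with_vuln_data(assets, findings)
print(assets["10.0.1.10"]["os"], assets["10.0.1.10"]["components"])
# Ubuntu 22.04 ['nginx 1.18.0', 'openssl 3.0.2']
```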

However, depending solely on vulnerability scanning data has a significant limitation: most vulnerability scanners report a component only if it actually contains a vulnerability, so installed software without known flaws never shows up. For centrally managed assets this gap is easy to close, because you can feed the output of the central management tool suite into your data lake alongside the scan results.

Combining the two enriches your asset database with both the vulnerability scan results and the full inventory from the central management tool suite. The result is a more holistic and accurate view of your assets, their security status, and your organization’s overall security posture, which supports more robust and informed decision-making.
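The gap between the two sources can be made explicit with simple set operations. This sketch assumes that, per asset, you have the components the scanner reported and, if the asset is centrally managed, the software inventory from the management tool suite. Both inputs and the function name `merge_inventories` are hypothetical.

```python
def merge_inventories(scan_components, managed_inventory=None):
    """Combine scanner-reported components with a management-tool inventory.

    scan_components: components the vulnerability scanner reported (vulnerable only).
    managed_inventory: full software list from central management, or None
    if the asset is not centrally managed.
    """
    scanned = set(scan_components)
    managed = set(managed_inventory or ())
    return {
        "components": sorted(scanned | managed),
        # Without a management inventory the view is known to be incomplete.
        "partial": managed_inventory is None,
        # Installed but not (currently) vulnerable, so invisible to the scanner.
        "not_visible_to_scanner": sorted(managed - scanned),
        # Reported by the scanner but missing from the managed inventory:
        # a hint that management tooling does not fully cover this asset.
        "unknown_to_management": (
            sorted(scanned - managed) if managed_inventory is not None else []
        ),
    }


web01_view = merge_inventories(
    {"openssl 3.0.2", "nginx 1.18.0"},
    {"openssl 3.0.2", "nginx 1.18.0", "rsyslog 8.2112.0"},
)
appliance_view = merge_inventories({"openssl 3.0.2"})
print(web01_view["not_visible_to_scanner"])  # ['rsyslog 8.2112.0']
print(appliance_view["partial"])             # True
```

Flagging unmanaged assets as `partial` keeps the limitation visible in the data itself, instead of letting a short component list pass for a clean one.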

Integrating diverse data sources, such as firewall configurations, anti-malware solutions, and even hypervisor administration software, into a single data lake yields the potential to uncover intricate relationships between a multitude of assets within your network infrastructure. This data consolidation process not only facilitates comprehensive data analysis but also fosters a deeper understanding of how these assets interact and influence one another.

For instance, firewall configuration data shows which assets are allowed to communicate with each other and over which ports, helping you identify overly permissive paths and assess the effectiveness of your security measures. Data from anti-malware solutions can reveal patterns of malicious activity, helping you proactively detect and respond to threats. Hypervisor administration data, in turn, shows which virtual machines run on which hosts, letting you examine virtualization environments for resource allocation issues or potential security gaps.
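To make the idea of relationships concrete, here is a small sketch that turns two of these sources into an adjacency map. It assumes firewall rules have already been reduced to (source, destination, port) tuples and that the hypervisor software can export a VM-to-host mapping. All asset names are invented, and anti-malware data is left out for brevity.

```python
from collections import defaultdict

# Hypothetical, simplified inputs.
firewall_rules = [  # (source, destination, port) flows the firewall permits
    ("web01", "app01", 8443),
    ("app01", "db01", 1433),
]
vm_placement = {  # VM -> physical host, as exported by hypervisor tooling
    "web01": "esx01",
    "app01": "esx01",
    "db01": "esx02",
}


def build_relationships(firewall_rules, vm_placement):
    """Return an adjacency map: asset -> set of (related asset, relationship)."""
    graph = defaultdict(set)
    for src, dst, port in firewall_rules:
        graph[src].add((dst, f"may_connect_to:{port}"))
    for vm, host in vm_placement.items():
        graph[vm].add((host, "runs_on"))
    return graph


graph = build_relationships(firewall_rules, vm_placement)

# Invert the placement: which VMs share a physical host?
vms_per_host = defaultdict(list)
for vm, host in sorted(vm_placement.items()):
    vms_per_host[host].append(vm)

print(sorted(graph["web01"]))  # [('app01', 'may_connect_to:8443'), ('esx01', 'runs_on')]
print(vms_per_host["esx01"])   # ['app01', 'web01']
```

The value lies in relationships that span sources: here `web01` and `app01` run on the same physical host and are also allowed to talk to each other, so a compromise of `esx01` would affect both tiers at once.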

By coalescing these diverse data sources within a data lake, you create a holistic view of your network’s health and security. This comprehensive perspective empowers you to make informed decisions, optimize resource allocation, and enhance cybersecurity by recognizing previously unseen relationships and patterns among assets. It is a significant step towards proactive and efficient network management and security enhancement.

A first iteration of visualizing the digital estate.

This concept may initially give the impression that an effective operation requires a sizable organization equipped with a vast array of advanced tools. The reality is far more accessible: you can begin on a small scale and expand your capabilities progressively. A mature organization and advanced tools are undeniably advantageous, but think of them as the finishing touches on a masterpiece. The same principle applies to the data scientist, who plays a pivotal role in piecing together the puzzle.

Naturally, data scientists might request the most advanced tools to facilitate their work, and rightly so. However, when it comes to the essential tasks of data analysis and the creation of the asset and relationship database, all you truly need are languages like R or Python. These versatile and widely available programming languages offer the fundamental building blocks for data manipulation, analysis, and database management, making them the foundational tools for any organization embarking on the journey of cybersecurity data analysis and asset protection. Start small, harness the power of these foundational tools, and then consider adding more sophisticated tools as your operations grow and mature.

A famous quote by Sun Tzu when he wrote The Art of War.

In the world of cybersecurity, every move an adversary makes leaves behind a trail of digital breadcrumbs. These breadcrumbs consist of data points and traces scattered across various assets and systems. To uncover and analyze these breadcrumbs, a team of skilled data scientists is indispensable. Their primary task is to harness the extensive data collected from each asset and consolidate it into a central data lake, a repository that houses a wealth of information.

The role of a data scientist in the cybersecurity realm is akin to that of a detective. With the right set of queries and analytical tools, they delve deep into the central data lake to hunt down these elusive digital clues. This meticulous process is what’s known as Cyber Threat Hunting.

Cyber Threat Hunting is not a passive approach to security but an active and dynamic endeavor. It involves proactively seeking out signs of malicious activity or potential threats that might otherwise go unnoticed by conventional security measures. By scrutinizing the breadcrumbs left behind by adversaries, data scientists play a crucial role in identifying and mitigating cybersecurity threats, ultimately safeguarding digital landscapes from potential harm.
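One simple hunting technique that works well on a central data lake is frequency analysis, often called stacking: count how many distinct hosts show a given behavior and look closely at the rare ones. The sketch below applies it to DNS queries. The `(host, domain)` record layout, the sample data and the `max_hosts` threshold are assumptions, and a rare domain is a lead to investigate, not proof of compromise.

```python
from collections import defaultdict

# Hypothetical DNS records from the data lake: (host, queried domain).
dns_events = [
    ("ws001", "intranet.example.com"),
    ("ws002", "intranet.example.com"),
    ("ws042", "intranet.example.com"),
    ("ws001", "updates.example.com"),
    ("ws042", "updates.example.com"),
    ("ws042", "xj3k9.example.net"),
    ("ws042", "xj3k9.example.net"),
]


def rare_domains(events, max_hosts=1):
    """Stack domains by the number of distinct hosts that queried them and
    return the ones seen on at most `max_hosts` hosts."""
    hosts_per_domain = defaultdict(set)
    for host, domain in events:
        hosts_per_domain[domain].add(host)
    return {
        domain: sorted(hosts)
        for domain, hosts in hosts_per_domain.items()
        if len(hosts) <= max_hosts
    }


print(rare_domains(dns_events))  # {'xj3k9.example.net': ['ws042']}
```

Counting distinct hosts rather than raw queries matters: a single host repeatedly querying the same odd domain, as a beaconing implant might, still stands out as rare.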

Navigating the world of tools and products for data scientists in the quest to tackle complex problems is akin to embarking on a journey with a plethora of choices at your disposal. On one hand, there are indeed a multitude of high-quality, efficient solutions that can significantly aid data scientists in their endeavors. These tools range from robust machine learning libraries to cutting-edge data visualization platforms, data storage and processing frameworks, and everything in between. However, on the flip side, there is an abundance of subpar or ill-suited products that may lead data scientists astray.

The crucial point to emphasize is that in the realm of data science, there is no one-size-fits-all, golden solution that universally addresses all challenges. The absence of a panacea stems from the inherent variability in organizational needs, environments, and the nature of the data being processed. Every organization operates within its unique context, possessing distinct objectives, constraints, and data sources. Consequently, what might prove to be a game-changer for one data science team could be entirely incongruent for another.

The diverse landscape of data science tools and products necessitates careful consideration, evaluation, and customization. Data scientists must meticulously assess their specific requirements, aligning them with the capabilities and adaptability of available solutions. This process underscores the dynamic and ever-evolving nature of data science, where versatility and adaptability are key virtues. The search for the right tools and products becomes, in essence, a bespoke journey, tailored to the individual needs and intricacies of each data science project and organization.

As you delve deeper into the realm of data science, your ability to discern the subtle traces and tactics employed by adversaries should significantly improve. It’s a journey marked by constant learning and adaptation, one that requires an ongoing commitment. You’ll find yourself navigating a complex, ever-evolving landscape where the pursuit of identifying adversaries is a continuous and never-ending project.

Admittedly, the task can be time-consuming and demanding, but it’s a vital endeavor, because the consequences of failing to identify and counteract threats are severe. In today’s interconnected world, where cyber threats are constantly evolving, a single breach can be devastating. Your organization’s reputation and trust could be irreparably tarnished, with your company’s name etched into the hall of fame of breached companies, a list no one aspires to join. The relentless pursuit of better data science and cybersecurity capabilities is therefore paramount to safeguarding your organization’s future.

Image © IT Governance

To know your Enemy, you must become your Enemy

In the spirit of Sun Tzu’s seminal work, ‘The Art of War’, one could say: “Your enemy is everywhere, and it does not limit its attacks to a singular front”. This insight holds a mirror to the modern arena of cybersecurity. In the digital age, threats can come from countless sources and directions, so it is imperative to connect a robust system of data feeds to the Cyber Security Center to stay vigilant.

However, it’s crucial to acknowledge that the sheer quantity of data isn’t necessarily a panacea. Much like any resource, data feeds require effective governance and meticulous maintenance to prevent what can be aptly described as a ‘data explosion’. Without proper control and management, these feeds can easily inundate the Cyber Security Center, overwhelming its capacity to discern meaningful insights and act on them.

To transform this deluge of data into actionable intelligence, the implementation of data science techniques is indispensable. Applying data science to incoming Cyber Threat Intelligence allows the center to filter through the noise, identify patterns, and extract valuable information. It’s through the lens of data science that these seemingly disparate data points coalesce into a comprehensive and coherent picture, enabling the Cyber Security Center to respond proactively to threats and defend against the elusive enemy lurking in the digital shadows.
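As a sketch of what applying data science to incoming Cyber Threat Intelligence can look like, the code below deduplicates indicators across feeds, drops stale ones and ranks the rest, boosting indicators that have actually been observed in your own estate. The feed names, the 90-day cut-off and the scoring weights are illustrative assumptions, not recommended values. The IP addresses come from ranges reserved for documentation.

```python
from collections import defaultdict
from datetime import date, timedelta

# Hypothetical feed entries: (feed, indicator, date last reported).
feed_entries = [
    ("feed_a", "198.51.100.7", date(2024, 3, 1)),
    ("feed_b", "198.51.100.7", date(2024, 3, 3)),
    ("feed_a", "203.0.113.50", date(2023, 6, 1)),
    ("feed_c", "192.0.2.15", date(2024, 3, 2)),
]
# Indicators that also appear in our own telemetry (e.g. firewall logs).
observed_in_estate = {"192.0.2.15", "203.0.113.50"}


def prioritise(entries, observed, today, max_age_days=90):
    """Deduplicate indicators, drop stale ones and rank the rest."""
    feeds = defaultdict(set)
    last_seen = {}
    for feed, ioc, seen in entries:
        feeds[ioc].add(feed)
        last_seen[ioc] = max(last_seen.get(ioc, seen), seen)
    ranked = []
    for ioc, seen in last_seen.items():
        if today - seen > timedelta(days=max_age_days):
            continue  # stale: likely noise by now
        # Illustrative score: one point per corroborating feed,
        # two extra points if the indicator was seen in our estate.
        score = len(feeds[ioc]) + (2 if ioc in observed else 0)
        ranked.append((score, ioc))
    return [ioc for _, ioc in sorted(ranked, reverse=True)]


print(prioritise(feed_entries, observed_in_estate, today=date(2024, 3, 10)))
# ['192.0.2.15', '198.51.100.7']
```

Relevance to your own environment outweighs corroboration across feeds here, which is one way of keeping the feeds from inundating the analysts; where exactly to draw that line is a governance decision rather than a technical one.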

A fundamental principle in the realm of cybersecurity is that the greater your knowledge and comprehension of your adversary, the more effectively you can devise a robust defensive strategy. This concept extends to the understanding of both the adversary’s current tactics and their underlying motivations.

Just as technology continually advances and evolves, so too does your adversary’s approach. Cyber adversaries are not static entities; they are dynamic and adaptable. They continually hone their techniques, discover new vulnerabilities, and explore novel attack vectors. In a constant game of cat and mouse, they may also completely transform their online personas, shedding their old profiles and adopting new ones.

To stay ahead in this ever-evolving cybersecurity landscape, organizations must not only be well-versed in their current adversaries’ methods but also anticipate the potential shifts in tactics and motives. This anticipation allows for the proactive development of countermeasures and security protocols. Cybersecurity professionals must remain vigilant, learning from each encounter, and applying that knowledge to fortify their defenses against the evolving and dynamic nature of the digital threat landscape.

Know yourself and you will win all battles

In cybersecurity, much like in traditional warfare, the ever-elusive adversary can lurk in unexpected corners, even concealed within the very fabric of your own company or estate. This brings us to a fundamental question: “Do you genuinely comprehend the intricacies of the company and the environment you are entrusted to protect?” While it might appear to be a straightforward question, the reality paints a different picture. Both Cyber Threat Hunting and Cyber Threat Intelligence can only be truly effective when they are underpinned by a profound understanding of the company and its operating environment.

To begin, one must delve into the core essence of the business itself. What precisely is the company’s primary function? What inherent risks have been identified as part of its business operations? While the corporate risk register provides a valuable source of intelligence, fostering strong, informal relationships with the business units is equally crucial. This interconnectedness is paramount in times of cyber warfare, where mutual reliance becomes evident.

Consider a scenario in which a team member from the Cyber Security Center urgently recommends, “Shut down that server, as we suspect it has been compromised”. However, the implications of shutting down that server extend beyond the immediate act. Without a comprehensive understanding of the server’s role and its interdependencies, the consequences could potentially be far worse than the initial compromise itself. This underscores the importance of appreciating the function of each asset within the organization, a facet not easily gleaned solely from data analysis. It’s a realm where the business units hold the key, offering insights that analytics alone can’t provide. In the face of evolving cyber threats, such an understanding is indispensable for making informed decisions and crafting a resilient defense strategy.
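Once the business units have told you what depends on what, the question “what breaks if we shut this server down?” becomes a graph traversal. This sketch assumes dependencies are recorded as hypothetical `(dependent, dependency)` pairs and uses a breadth-first search to collect everything that relies on the server directly or indirectly.

```python
from collections import defaultdict, deque

# Hypothetical dependency records: (dependent, dependency).
depends_on = [
    ("webshop", "app01"),
    ("app01", "db01"),
    ("reporting", "db01"),
    ("payroll", "hr-db"),
]


def impacted_by_shutdown(server, depends_on):
    """Breadth-first search over reversed dependency edges: everything that
    relies on `server` directly or transitively."""
    dependents = defaultdict(set)
    for dependent, dependency in depends_on:
        dependents[dependency].add(dependent)
    impacted, queue = set(), deque([server])
    while queue:
        node = queue.popleft()
        for d in dependents[node]:
            if d not in impacted:
                impacted.add(d)
                queue.append(d)
    return sorted(impacted)


print(impacted_by_shutdown("db01", depends_on))  # ['app01', 'reporting', 'webshop']
```

The traversal is only as good as the dependency records feeding it, which is exactly why the input from the business units matters.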

In the dynamic landscape of business, constant change is the norm. This ever-evolving business environment necessitates adaptability, not only for the company itself but also for those who support it, including data scientists. As a data scientist, your effectiveness is intimately tied to your ability to comprehend the company’s rhythm, which encapsulates its operational cadence and trends.

One straightforward way to gauge this rhythm is by examining network traffic volume. In many settings, this can provide immediate insights into the company’s working hours. However, such an observation only scratches the surface of what’s possible. The intricacies of business processes that occur less frequently are far more challenging to discern within the data. Nevertheless, it is precisely these nuanced patterns that you, as a data scientist, must delve into and understand thoroughly if you aim to distinguish between beneficial and detrimental trends.
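A minimal sketch of such a rhythm baseline, assuming you have historical traffic volumes tagged with the hour of day. It computes a mean and sample standard deviation per hour and flags observations more than `k` standard deviations away from that hour’s mean. The sample volumes and the threshold `k=3` are assumptions.

```python
from collections import defaultdict
from statistics import mean, stdev

# Hypothetical history: (hour of day, traffic volume in MB) per observation.
history = [
    (3, 10), (3, 12), (3, 11), (3, 9),           # night
    (10, 500), (10, 520), (10, 480), (10, 510),  # office hours
]


def hourly_baseline(history):
    """Mean and sample standard deviation of traffic volume per hour of day."""
    by_hour = defaultdict(list)
    for hour, volume in history:
        by_hour[hour].append(volume)
    # stdev() needs at least two data points.
    return {h: (mean(v), stdev(v)) for h, v in by_hour.items() if len(v) >= 2}


def is_anomalous(hour, volume, baseline, k=3.0):
    """Flag volumes more than k standard deviations from that hour's mean."""
    if hour not in baseline:
        return True  # no history for this hour: treat as unusual
    mu, sigma = baseline[hour]
    return abs(volume - mu) > k * max(sigma, 1e-9)


baseline = hourly_baseline(history)
print(is_anomalous(3, 200, baseline))   # True: far above the night-time norm
print(is_anomalous(10, 505, baseline))  # False: normal for office hours
```

Keying on the hour of day only captures the daily cycle. The less frequent business processes mentioned above would need keys such as (weekday, hour) or day of month, and considerably more history, before the baseline stops flagging them as anomalies.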

For data scientists, it’s not just about recognizing the obvious; it’s about delving into the subtleties of data patterns, as it is within these nuances that the key to separating good and bad patterns often resides. This deeper understanding empowers you to make informed decisions, spot anomalies, and contribute significantly to the company’s success by addressing emerging challenges and opportunities in a dynamic business landscape.


Recommendation

Approaching cybersecurity through the lens of data science is a progressive and insightful strategy. By treating it as a data science problem, organizations can harness the power of data analytics to fortify their defense mechanisms. To execute this approach effectively, it’s imperative to assemble a proficient team of data scientists, experts who specialize in extracting valuable insights from data.

The reason for this shift is clear — many of today’s security challenges are far too intricate to be effectively resolved through conventional methods like simple data queries. This complexity renders traditional technologies like Security Information and Event Management (SIEM) somewhat obsolete in the face of the evolving threat landscape.

In the context of cybersecurity, data science offers several advantages. Data scientists can develop and deploy advanced machine learning algorithms to detect anomalous patterns, identify emerging threats, and mitigate vulnerabilities. They can analyze large datasets to gain a comprehensive understanding of the organization’s digital environment, effectively spotting potential risks and areas that need reinforcement.

Moreover, data science also empowers organizations to implement proactive security measures. Instead of reacting to incidents after they occur, they can predict and prevent security breaches through predictive modeling and continuous monitoring. This shift from a reactive to a proactive stance is essential in today’s dynamic cybersecurity landscape.


Taking a data science approach to cybersecurity can yield significant benefits. It involves leveraging data-driven techniques and analytics to tackle not only the primary objective of maintaining a secure environment but also addressing some critical challenges in the field.

  1. Solving Cybersecurity: When viewed through the lens of data science, cybersecurity becomes a quantifiable problem. It allows for the systematic analysis of vast volumes of data, which includes network traffic, system logs, and user behavior. By applying machine learning algorithms and statistical models, one can identify patterns, anomalies, and potential threats more efficiently. This analytical approach enables a proactive stance against adversaries, helping to thwart attacks before they can cause damage.
  2. Mitigating Alert Fatigue: Alert fatigue, a common issue in cybersecurity, can overwhelm security teams with a flood of notifications. Data science empowers organizations to fine-tune their alerting systems by distinguishing between genuine threats and false alarms. Through the analysis of historical data, machine learning can help in creating more accurate and context-aware alerts. This, in turn, reduces the fatigue of security analysts and ensures that they focus on genuine security incidents.
  3. Reducing Mean-Time-to-Contain: In the context of incident response, data science can significantly reduce the Mean-Time-to-Contain, which is the time taken to isolate and mitigate a security incident. By promptly detecting and classifying threats through automated systems, data-driven approaches enable security teams to respond more swiftly. This rapid response can limit the potential damage caused by an intrusion.
  4. Decreasing Mean-Time-to-Resolve: Data science aids in shortening the Mean-Time-to-Resolve, the time taken to completely remediate an incident. Automated analysis and response capabilities, based on historical data and real-time information, empower organizations to remediate vulnerabilities or breaches more efficiently. This efficiency translates into reduced downtime and decreased financial impact.
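Both metrics are simple averages once incident timestamps are recorded consistently. The sketch below assumes each incident record carries `detected`, `contained` and `resolved` timestamps (hypothetical field names) and measures both metrics from detection. Organizations differ on the starting point, so treat that as a choice to agree on. Incidents that have not yet been contained or resolved are skipped for the respective metric.

```python
from datetime import datetime, timedelta

# Hypothetical incident records; None means the stage has not been reached yet.
incidents = [
    {"detected": datetime(2024, 3, 1, 9, 0), "contained": datetime(2024, 3, 1, 13, 0),
     "resolved": datetime(2024, 3, 3, 9, 0)},
    {"detected": datetime(2024, 3, 5, 22, 0), "contained": datetime(2024, 3, 6, 0, 0),
     "resolved": datetime(2024, 3, 6, 22, 0)},
    {"detected": datetime(2024, 3, 8, 8, 0), "contained": datetime(2024, 3, 8, 9, 0),
     "resolved": None},
]


def mean_time(incidents, end_field, start_field="detected"):
    """Average duration from start_field to end_field over incidents that reached it."""
    durations = [i[end_field] - i[start_field] for i in incidents if i[end_field]]
    if not durations:
        return None
    return sum(durations, timedelta()) / len(durations)


print(mean_time(incidents, "contained"))  # 2:20:00  (MTTC over 4h, 2h and 1h)
print(mean_time(incidents, "resolved"))   # 1 day, 12:00:00  (MTTR over 48h and 24h)
```

Tracking these two numbers before and after introducing a data-driven approach is a straightforward way to check whether it delivers the benefits listed above.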

Treating cybersecurity as a data science problem empowers organizations to not only enhance their security posture but also streamline their security operations. It enables a proactive approach to threat detection, alleviates alert fatigue, and accelerates incident containment and resolution. Embracing data science in cybersecurity is a crucial step towards maintaining a secure and resilient digital environment.