Data scientists face a tricky task: taking raw data and making it meaningful for security operations teams. Here’s how to bridge the gap.
Today, CISOs and their teams are asked many questions about risk by different types of stakeholders. Many of these questions require security professionals to analyze raw data from multiple sources, then communicate insight about impact, exposure, or priorities in a way that’s meaningful to people who are not security pros. That is hard to do well: it means understanding the raw data and analyzing it to produce accurate information that fits a particular person’s decision-making context. This is a skill in itself, and one that data scientists are uniquely placed to provide.
Security’s Analysis and Communication Challenge
CISOs often face questions from business or governance, risk management, and compliance stakeholders that operational tools can’t answer. This is either because tools are designed to meet a single operational security need rather than correlate data to answer a business risk question, or because tools are designed to “find bad” and detect when something goes wrong rather than enumerate risk.
As a result, someone in the security team eventually must extract raw data from a technology “Frankenstack,” put it into an analysis tool (spreadsheets by default), and then torture the data for answers to questions that inevitably get more complex over time. This is all before working out how best to communicate the output of data analysis to clearly answer “So what?” and “What now?”
How Data Science Can Help
Asking questions of raw data from one source, let alone multiple sources, isn’t easy.
First, you have to understand the data that your security tools put out and any quirks that exist (such as inconsistent timestamp formats and field names). In data science, data preparation is one of the most important stages of producing insight. It involves understanding what questions a data set can answer, the limits of the data set (that is, what information is missing or invalid), and which other data sets can improve the completeness of the analysis where a single data set is not sufficient.
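The preparation step can be sketched as follows. This is a minimal illustration, not a description of any real product: the two tools, their field names (`Host`, `last_seen`, `hostname`, `lastCheckin`), and the sample values are all assumptions made up for the example. It maps both sources onto one schema, converts timestamps to UTC, and then checks which hosts appear in only one source.

```python
from datetime import datetime, timezone

# Hypothetical exports from two tools. Field names and timestamp formats
# differ on purpose; none of these values come from a real product.
scanner_rows = [
    {"Host": "WEB-01", "last_seen": "2021-03-04T10:15:00Z"},
    {"Host": "DB-01", "last_seen": "2021-03-04T09:00:00Z"},
]
agent_rows = [
    {"hostname": "web-01.corp.example", "lastCheckin": 1614853200},
]


def to_utc(value):
    """Parse a Unix epoch (seconds) or an ISO 8601 string ending in 'Z'."""
    if isinstance(value, (int, float)):
        return datetime.fromtimestamp(value, tz=timezone.utc)
    parsed = datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")
    return parsed.replace(tzinfo=timezone.utc)


def normalize(row, host_field, time_field, source):
    """Map one raw row onto a shared schema: host, last_seen, source."""
    # Assumption: the short lowercase hostname identifies a machine.
    host = row[host_field].lower().split(".")[0]
    return {"host": host, "last_seen": to_utc(row[time_field]), "source": source}


records = [normalize(r, "Host", "last_seen", "scanner") for r in scanner_rows]
records += [normalize(r, "hostname", "lastCheckin", "agent") for r in agent_rows]

# Limits of each data set: hosts one tool knows about and the other does not.
scanner_hosts = {r["host"] for r in records if r["source"] == "scanner"}
agent_hosts = {r["host"] for r in records if r["source"] == "agent"}
only_in_scanner = scanner_hosts - agent_hosts

print(sorted(only_in_scanner))
```

The gap check at the end is the kind of result that tells you whether a second data set is needed before any analysis starts.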
Then comes the job of selecting the most appropriate analysis method to answer the question at hand. Data scientists have a spectrum of methods to choose from, each suited to extracting different information from data. As a discipline, data science weighs multiple factors to deliver the most meaningful information in the time available, all with appropriate caveats. What is the current state of knowledge on this topic? What does the consumer of the analysis want to know? The answers set the bar for the complexity of analysis required to learn something new. For example, if a data set hasn’t been analyzed before, simple stats can provide valuable insight quickly. Then there’s the inevitable trade-off between speed to results on one hand and precision on the other. Based on all this, the best analysis method could be as simple as a count or as involved as a machine learning algorithm.
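To illustrate the simple end of that spectrum, here is a sketch of a first pass over a data set that has not been analyzed before, using nothing more than counts. The findings, host names, and severity labels are invented for the example.

```python
from collections import Counter

# Hypothetical vulnerability findings; every value here is made up.
findings = [
    {"host": "web-01", "severity": "critical"},
    {"host": "web-01", "severity": "high"},
    {"host": "db-01", "severity": "critical"},
    {"host": "app-02", "severity": "low"},
]

# Two simple counts: how findings split by severity, and which hosts
# carry the most findings.
by_severity = Counter(f["severity"] for f in findings)
by_host = Counter(f["host"] for f in findings)

print(by_severity.most_common())
print(by_host.most_common(1))
```

Counts like these are fast to produce and easy to explain, which is often enough to decide whether a more precise and more expensive method is worth the time.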
Finally comes communication. What view of the data does a decision maker need? For example, a CISO who needs insight for a strategic quarterly meeting needs a different view of vulnerability data than a vulnerability manager who needs to prioritize what to fix at a tactical level. While these views will be built from the same raw data, the summary for each requires different caveats, because as you summarize, you inevitably exclude details.
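As a rough sketch of two views built from the same raw data, the example below produces a summary for a strategic discussion and an ordered fix list for tactical work. The function names (`strategic_view`, `tactical_view`), the severity ranking, and the sample findings are all assumptions chosen for illustration.

```python
# Hypothetical findings; ids, hosts, and ages are invented.
findings = [
    {"id": "F-1", "host": "web-01", "severity": "critical", "days_open": 45},
    {"id": "F-2", "host": "web-01", "severity": "high", "days_open": 10},
    {"id": "F-3", "host": "db-01", "severity": "critical", "days_open": 90},
    {"id": "F-4", "host": "app-02", "severity": "low", "days_open": 200},
]

# Assumed ordering: lower rank means more urgent.
SEVERITY_RANK = {"critical": 0, "high": 1, "medium": 2, "low": 3}


def strategic_view(rows):
    """Summary view: how many hosts are affected, and how many critically."""
    hosts = {r["host"] for r in rows}
    critical = {r["host"] for r in rows if r["severity"] == "critical"}
    return {
        "hosts_with_findings": len(hosts),
        "hosts_with_critical": len(critical),
        "share_with_critical": len(critical) / len(hosts) if hosts else 0.0,
    }


def tactical_view(rows):
    """Fix list: most severe first, then longest open within a severity."""
    return sorted(rows, key=lambda r: (SEVERITY_RANK[r["severity"]], -r["days_open"]))


print(strategic_view(findings))
print([r["id"] for r in tactical_view(findings)])
```

The caveats differ by view. The summary only covers hosts that appear in the findings, so hosts that were never scanned are invisible in it, while the fix list keeps every detail but says nothing about overall exposure.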
Merging Data Science and Domain Expertise
Data scientists can’t, and shouldn’t, work in a silo away from the security team. Far more value is gained by combining their expertise in understanding, analyzing, and communicating data with the domain expertise of security professionals who understand the problem and the questions that need answering.
As more security departments start working with data scientists, here are three key factors to bear in mind:
- Time: Understanding multiple data sets, applying the most relevant analysis techniques to them, and delivering meaningful insights based on the question that needs answering won’t happen overnight. It takes time.
- Domain expertise: There will be gaps in knowledge between your data scientist and your security team. Working in close partnership is critical. Just as you’re getting used to constraints the data scientist has discovered in the data you have, so too is your data scientist coming to grips with new and usually complex log formats in an effort to see what’s possible.
- The needs of your consumers: Communicating and visualizing insight from data requires different analysis for different roles. The CISO, control manager, IT operations, and C-suite all have different needs, and your data scientist must learn about these roles to strike the right balance between conclusions and caveats for each one.
Nik Whitfield is a noted computer scientist and cyber security technology entrepreneur. He founded Panaseer in 2014, a cybersecurity software company that gives businesses unparalleled visibility and insight into their cybersecurity weaknesses. Panaseer announced in November …