Using Machine Learning to Detect Command Line Anomalies

NOTE: This post discusses patent-pending technologies.

Cybersecurity is often a game of cat and mouse: attackers are constantly trying to outsmart defenders. They try to bypass security mechanisms, evade detection, and find the latest vulnerabilities to exploit.

We face the same issues and concerns as most organizations. We constantly ask ourselves these questions: How do we help ensure that all assets are protected? How do we help ensure that our employees are as secure as possible from outside threats? How can we help mitigate future emerging threats?

Monitoring and reporting are important factors for success in any incident response program. However, we cannot simply rely on counts of visits/visitors, endpoints, services, logins/authentications, etc. Malicious activity often takes subtle forms designed to evade both technology and skilled professionals. Add to this the large number of monitored events, and you have a situation in which white noise could make you deaf to genuine threats. Moreover, today’s monitoring and reporting tools often rely on a static set of rules that trigger alerts when specific conditions are matched. Those rules are based on threat intelligence from multiple sources, but also on the experience, ingenuity, and maturity of the people behind the tools.

Returning to the cat-and-mouse game, attackers will always look for the next unconventional attack that could bypass both security systems and the defenders’ mindset. So how do we better protect ourselves from the unknown? This is where machine learning techniques can help. Machine learning can be applied to a common attack vector: changes and insertions at the command line. Command-line interfaces are frequently used by system administrators, users, and applications. Many software products launch console scripts to perform tasks such as checking system details or resources (‘net’, ‘wmic’), managing firewall rules, registering services, and so on. However, not every script pattern is common for every application. Malware writers and more advanced attackers also like to leverage these native system capabilities.

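Command lines are an interesting attack vector to analyze because they closely resemble human language: (a) they follow a defined syntax, and (b) they enforce “semantics” through the ordering of and dependencies between consecutive tokens, and through each token’s role in the final outcome. As a result, similar-looking command lines can behave very differently depending on their tokens and their order. For example, the first command below follows the Apache access log and keeps only lines containing the IP address, while the second follows the error log and keeps only lines that do not contain it: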

```shell
tail -f /usr/local/apache/logs/access_log | grep "11.12.13.14"
tail -f /usr/local/apache/logs/error_log | grep -v "11.12.13.14"
```

Since we are dealing with “language”, the first thing that comes to mind is applying natural language processing (NLP) techniques to this use case. Unfortunately, that does not work well. While the basic principles are sound (document clustering, anomaly detection, etc.), it is far-fetched to expect NLP techniques to work on such statements out of the box. Command lines resemble natural language, but they are definitely not natural language (at least not for humans). Their syntax is far more constrained than that of natural language, and there are few open classes: most tokens come from a limited vocabulary of commands and options.

Clustering based on TF-IDF (term frequency-inverse document frequency) would not be optimal either. The issue is that the “bag of words” representation ignores the semantics (role) of the command and its arguments. Consider the `rm` and `cp` commands: if you represent command lines with TF-IDF and measure the distance between each pair, similar-looking `rm` and `cp` invocations will end up with very similar scores. Yet there is a clear difference between `rm` and `cp`. The same problem arises whenever replacing a single parameter or token, or even just reordering the arguments, produces a totally different behavior.
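To make the bag-of-words problem concrete, here is a minimal Python sketch. It assumes naive whitespace tokenization and a smoothed IDF of log((1 + N) / (1 + df)) + 1, and the command lines are hypothetical. Swapping the source and destination of `cp` reverses what the command does, yet both lines receive identical TF-IDF vectors.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Bag-of-words TF-IDF: raw term counts weighted by a smoothed inverse document frequency."""
    tokenized = [doc.split() for doc in docs]  # naive whitespace tokenization
    n_docs = len(tokenized)
    # Document frequency: in how many command lines each token appears.
    df = Counter(tok for toks in tokenized for tok in set(toks))
    idf = {tok: math.log((1 + n_docs) / (1 + count)) + 1 for tok, count in df.items()}
    return [{tok: tf * idf[tok] for tok, tf in Counter(toks).items()} for toks in tokenized]


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(tok, 0.0) for tok, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v)


# Hypothetical command lines: the first two only swap source and destination.
commands = [
    "cp report.txt backup.txt",
    "cp backup.txt report.txt",
    "rm report.txt",
]
vecs = tfidf_vectors(commands)
print(cosine(vecs[0], vecs[1]))  # ≈ 1.0: the swap is invisible to bag of words
print(cosine(vecs[0], vecs[2]))
```

Any order-insensitive representation shares this blind spot, no matter how the individual terms are weighted.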

Unlike a naive combination of TF-IDF and clustering, BLEU (Bilingual Evaluation Understudy) can capture and model the dependencies between adjacent tokens in a command line. BLEU is a metric used in machine translation to measure how well an automated translation system performs against gold-standard reference translations. It counts matching occurrences not just of isolated tokens (“bag of words”) but also of bi-grams, tri-grams, four-grams, and so on. The similarity score between two sequences combines the fractions of correctly “translated” n-grams of each length through a weighted (geometric) average, with smoothing so that a single unmatched n-gram length does not zero out the score. Because higher-order n-grams only match when tokens appear in the same order, this score better reflects the syntax restrictions found in command lines.
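Below is a minimal Python sketch of a smoothed, sentence-level BLEU score used as a similarity between two tokenized command lines. The post does not specify its exact variant, so these details are assumptions: whitespace tokenization, uniform weights up to four-grams, add-one smoothing of each n-gram precision, and scoring only the n-gram lengths the candidate is long enough to contain, since many commands are shorter than four tokens.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Multiset of contiguous n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu_similarity(candidate, reference, max_n=4):
    """Smoothed sentence-level BLEU of `candidate` against `reference` (both token lists)."""
    if not candidate or not reference:
        return 0.0
    # Only score n-gram lengths the candidate is long enough to contain.
    orders = min(max_n, len(candidate))
    log_precisions = 0.0
    for n in range(1, orders + 1):
        cand, ref = ngrams(candidate, n), ngrams(reference, n)
        # Clipped matches: an n-gram counts at most as often as it occurs in the reference.
        matches = sum(min(count, ref[gram]) for gram, count in cand.items())
        total = sum(cand.values())
        # Add-one smoothing keeps a single unmatched length from zeroing the score.
        log_precisions += math.log((matches + 1) / (total + 1))
    # Uniform weights: geometric mean of the per-length precisions.
    precision = math.exp(log_precisions / orders)
    # Brevity penalty: penalize candidates shorter than the reference.
    c, r = len(candidate), len(reference)
    brevity = 1.0 if c > r else math.exp(1 - r / c)
    return brevity * precision


access = 'tail -f /usr/local/apache/logs/access_log | grep "11.12.13.14"'.split()
errors = 'tail -f /usr/local/apache/logs/error_log | grep -v "11.12.13.14"'.split()
print(bleu_similarity(access, access))  # 1.0
print(bleu_similarity(errors, access))
print(bleu_similarity("cp b a".split(), "cp a b".split()))  # ≈ 0.55: same tokens, different order
```

Note that BLEU is not symmetric: it scores a candidate against a reference. One simple option, again an assumption, is to average the score computed in both directions.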

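Next, instead of treating each command line separately, we group them into clusters. Rather than K-means, we use threshold-based clustering, which assigns each command line to an existing cluster when their similarity exceeds a threshold and otherwise opens a new cluster. Because this takes a single pass over the data, its cost grows linearly with the number of command lines (O(n), as long as the number of clusters stays bounded). This enables us to process large amounts of data with less computational power and resources.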
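The threshold-based approach can be sketched in Python as single-pass “leader” clustering. This is an assumption about the variant, not a description of the patent-pending process: each cluster is represented by its first member, the threshold of 0.5 is arbitrary, and the similarity function is pluggable. Any command-line similarity, such as a smoothed BLEU score, can be passed in; to keep the sketch self-contained it defaults to a simple stand-in, the Dice coefficient over token bigrams, which is also sensitive to token order. The commands are hypothetical.

```python
def bigram_dice(a, b):
    """Stand-in similarity: Dice coefficient over token bigrams, padded with start/end markers."""
    def grams(tokens):
        padded = ["<s>"] + tokens + ["</s>"]
        return set(zip(padded, padded[1:]))

    ga, gb = grams(a), grams(b)
    return 2 * len(ga & gb) / (len(ga) + len(gb))


def threshold_cluster(command_lines, similarity=bigram_dice, threshold=0.5):
    """Single-pass leader clustering.

    Each command line joins the most similar existing cluster if that similarity
    reaches `threshold`; otherwise it becomes the leader of a new cluster.
    Cost is O(n * k) for n lines and k clusters, i.e. linear in n when k stays bounded.
    """
    leaders = []      # token list of the first member of each cluster
    assignments = []  # cluster index for every input line
    for line in command_lines:
        tokens = line.split()
        best_id, best_score = None, threshold
        for cluster_id, leader in enumerate(leaders):
            score = similarity(tokens, leader)
            if score >= best_score:
                best_id, best_score = cluster_id, score
        if best_id is None:
            leaders.append(tokens)
            best_id = len(leaders) - 1
        assignments.append(best_id)
    return assignments, leaders


# Hypothetical command lines.
lines = [
    "net user",
    "net user /domain",
    "netsh advfirewall firewall add rule name=x dir=in action=allow",
    "netsh advfirewall firewall add rule name=y dir=in action=allow",
    "net user",
]
assignments, leaders = threshold_cluster(lines)
print(assignments)  # [0, 0, 1, 1, 0]
```

Because each line is compared only against cluster leaders, the work per line depends on the number of clusters rather than on the number of lines seen so far. The trade-off is that the result depends on input order and on the chosen threshold.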

What can we do with this? Two things: (a) we can keep track of past clusters and manually review newly created clusters for malicious activity, and (b) we can perform outlier detection on composite vectors designed to capture dependencies between the user, the processes, and the command-line clusters.
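Both ideas can be sketched in Python under some stated assumptions. Each event is a hypothetical record with `user`, `parent`, `process`, and `cluster` fields, and the composite vector is simply the concatenation of one one-hot block per field, with an extra slot per field for values never seen during fitting. The post does not describe its actual vector layout, so `CompositeEncoder` and `new_clusters` are illustrative names only.

```python
from typing import Dict, Iterable, List, Set

# Hypothetical event schema: one categorical value per field.
FIELDS = ("user", "parent", "process", "cluster")


class CompositeEncoder:
    """Concatenates one one-hot block per field; slot 0 of each block means "unseen value"."""

    def __init__(self) -> None:
        self.vocab: Dict[str, Dict[str, int]] = {}

    def fit(self, events: Iterable[Dict[str, str]]) -> "CompositeEncoder":
        events = list(events)
        for field in FIELDS:
            values = sorted({event[field] for event in events})
            # Known values start at index 1; index 0 is reserved for unseen values.
            self.vocab[field] = {value: i + 1 for i, value in enumerate(values)}
        return self

    def dim(self) -> int:
        return sum(len(values) + 1 for values in self.vocab.values())

    def encode(self, event: Dict[str, str]) -> List[float]:
        vector: List[float] = []
        for field in FIELDS:
            block = [0.0] * (len(self.vocab[field]) + 1)
            block[self.vocab[field].get(event[field], 0)] = 1.0
            vector.extend(block)
        return vector


def new_clusters(known: Set[str], events: Iterable[Dict[str, str]]) -> Set[str]:
    """Cluster IDs absent from the known set: candidates for manual review."""
    return {event["cluster"] for event in events} - known


history = [
    {"user": "alice", "parent": "explorer.exe", "process": "cmd.exe", "cluster": "c0"},
    {"user": "svc", "parent": "services.exe", "process": "net.exe", "cluster": "c1"},
]
encoder = CompositeEncoder().fit(history)
odd = {"user": "alice", "parent": "winword.exe", "process": "powershell.exe", "cluster": "c7"}
print(encoder.encode(odd))
print(new_clusters({"c0", "c1"}, history + [odd]))  # {'c7'}
```

The unseen slots let an event with an unfamiliar parent, process, or cluster still be encoded, so it can be scored rather than dropped.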

Outlier detection can be handled by standard outlier-detection algorithms such as LOF (Local Outlier Factor), or by using an autoencoder. An autoencoder is a neural network trained without supervision to reconstruct its own inputs. Because its hidden layer is typically narrower than the input, this forces the hidden layer to learn good, compact representations of the inputs. We use the autoencoder’s reconstruction error to pinpoint events that are statistically rare in our data set or stream: frequent patterns are reconstructed well, while rare ones are not. In our implementation, we train the autoencoder on the command-line cluster, the parent process name, and the name of the process that generated the event.
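A dependency-free Python sketch of the autoencoder idea follows. Everything concrete in it is an assumption: a single sigmoid hidden layer of two units, squared-error loss, plain per-sample gradient descent, and a hypothetical nine-slot input made of one-hot blocks for the command-line cluster, the parent process name, and the process name. A production system would use a deep learning framework; this only illustrates why reconstruction error separates common events from rare ones.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class TinyAutoencoder:
    """One-hidden-layer autoencoder trained with per-sample gradient descent on squared error."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n_in)]
        self.b2 = [0.0] * n_in

    def _forward(self, x):
        hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(self.w1, self.b1)]
        output = [sigmoid(sum(w * h for w, h in zip(row, hidden)) + b) for row, b in zip(self.w2, self.b2)]
        return hidden, output

    def fit(self, data, epochs=100, lr=0.5):
        for _ in range(epochs):
            for x in data:
                hidden, output = self._forward(x)
                # Gradient of 0.5 * (y - x)^2 w.r.t. each output unit's pre-activation.
                d_out = [(y - xi) * y * (1 - y) for y, xi in zip(output, x)]
                # Backpropagate to the hidden layer before any weights change.
                d_hid = [
                    sum(d_out[k] * self.w2[k][j] for k in range(len(d_out))) * h * (1 - h)
                    for j, h in enumerate(hidden)
                ]
                for k, dk in enumerate(d_out):
                    for j, h in enumerate(hidden):
                        self.w2[k][j] -= lr * dk * h
                    self.b2[k] -= lr * dk
                for j, dj in enumerate(d_hid):
                    for i, xi in enumerate(x):
                        self.w1[j][i] -= lr * dj * xi
                    self.b1[j] -= lr * dj
        return self

    def reconstruction_error(self, x):
        _, output = self._forward(x)
        return sum((y - xi) ** 2 for y, xi in zip(output, x))


# Hypothetical vocabularies for the three encoded fields.
CLUSTERS = ["c0", "c1", "c2"]
PARENTS = ["explorer.exe", "services.exe", "winword.exe"]
PROCESSES = ["cmd.exe", "net.exe", "powershell.exe"]


def encode(cluster, parent, process):
    """Concatenate one one-hot block per field (9 inputs in total)."""
    vector = []
    for value, vocab in ((cluster, CLUSTERS), (parent, PARENTS), (process, PROCESSES)):
        vector.extend(1.0 if value == item else 0.0 for item in vocab)
    return vector


common_a = encode("c0", "explorer.exe", "cmd.exe")
common_b = encode("c1", "services.exe", "net.exe")
rare = encode("c2", "winword.exe", "powershell.exe")

# Train only on the two frequent patterns, interleaved.
model = TinyAutoencoder(n_in=9, n_hidden=2).fit([common_a, common_b] * 50)
for name, event in (("common_a", common_a), ("common_b", common_b), ("rare", rare)):
    print(name, round(model.reconstruction_error(event), 3))
```

The rare event scores high because the network never had to reconstruct its combination of values. How high an error must be before it raises an alert is a tuning decision the sketch leaves open.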

There are additional details about this project that, for the sake of clarity and simplicity, we have omitted from this post. This process was submitted for patent review under P8075-US and is currently being evaluated. You can also learn more about this project in our recent webcast with the Cloud Security Alliance. We hope this process will help us detect possible sophisticated threats before they become more serious issues.
