This post is the seventh of a multi-part series called Advanced Methods to Detect Advanced Cyber Attacks. The series explores advanced investigative analytic searches that analyze network traffic and enable incident responders and security analysts to think and react as fast as the attackers targeting their organization’s network.
Today we’ll review a network traffic search technique that takes full advantage of the historic visibility and rapid querying capability of an advanced network traffic analysis system. This analytic is most relevant for large enterprises and law enforcement agencies that routinely search through large and potentially disparate volumes of network traffic during investigations, but can be useful in other smaller scenarios as well.
The focus for today is the Relay Finder analytic and, as you have likely already deduced, it finds relays. Relays are network nodes that attackers use to establish lines of communication between nodes in a breached network and the attackers' external, operator-controlled infrastructure. Any compromised network node can turn into a relay: a server, a laptop, a printer, an MRI machine, or anything else that connects to the network and has a path out to the internet.
The analytic finds compromised hosts acting as communication relays by using a simple form of graph analysis and the principle of “guilt by association”. Here’s how it works:
- The analytic requires as input a known compromised host that was used or is being used as a relay, and we’ll call this the “known relay”. This forms the root node in our graph, and our search expands out from there.
- The analytic first looks up the list of distinct IP addresses with which the known relay communicates. We’ll call these hosts the “potential victims”.
- The analytic next looks up all the distinct IP addresses that the potential victims talk to, which are the next set of nodes in the graph or tree. We’ll call these hosts the “potential next relays”.
- Finally, the analytic returns the potential next relays that interact with at least 80% (configurable) of the potential victims. These are the potential next relays of interest because they interact with a large share of the potential victims that communicated with the known relay from the first step. These potential next relays are therefore guilty by association and should be focus areas as the investigation continues.
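The steps above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation of any particular product: it assumes traffic records are available as simple `(src_ip, dst_ip)` pairs, treats communication as undirected, and uses a hypothetical function name `find_next_relays`. The sample addresses are made up and drawn from private and documentation ranges.

```python
from collections import defaultdict


def find_next_relays(flows, known_relay, threshold=0.8):
    """Return {ip: fraction of potential victims it talks to} at or above threshold.

    flows is an iterable of (src_ip, dst_ip) pairs (an assumed input format).
    """
    # Build an undirected map of host -> distinct peers it exchanged traffic with.
    peers = defaultdict(set)
    for src, dst in flows:
        if src != dst:
            peers[src].add(dst)
            peers[dst].add(src)

    # Potential victims: every distinct host the known relay communicated with.
    victims = set(peers.get(known_relay, ()))
    if not victims:
        return {}

    # Potential next relays: count how many victims each other host talks to.
    victim_counts = defaultdict(int)
    for victim in victims:
        for peer in peers[victim]:
            if peer != known_relay:
                victim_counts[peer] += 1

    # Keep only hosts that reach the configured share of the potential victims.
    return {
        host: count / len(victims)
        for host, count in victim_counts.items()
        if count / len(victims) >= threshold
    }


# Made-up traffic for illustration only.
victim_ips = ["10.0.0.11", "10.0.0.12", "10.0.0.13", "10.0.0.14", "10.0.0.15"]
flows = [("10.0.0.5", v) for v in victim_ips]            # known relay <-> victims
flows += [(v, "203.0.113.7") for v in victim_ips]        # contacted by 5 of 5
flows += [(v, "198.51.100.9") for v in victim_ips[:4]]   # contacted by 4 of 5
flows += [(v, "192.0.2.44") for v in victim_ips[:2]]     # contacted by 2 of 5

suspects = find_next_relays(flows, known_relay="10.0.0.5")
print(sorted(suspects))
```

With the default 80% threshold, the two hosts contacted by four or five of the five potential victims are returned, while the host contacted by only two is dropped. On real traffic volumes the same logic would more likely run as queries against the traffic store than as an in-memory loop.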
What is the benefit of going to the trouble of analyzing these machine-to-machine relationships and finding potential relays? It enables network analysts and incident responders to retrace the path that an attacker, or an attacker’s malware, took within a complex network. By connecting dots between nodes based on communication patterns, an advanced network traffic analytics system can help security staff members and investigators find communication pathways that might otherwise go undiscovered because of the sheer volume and complexity of traffic on large networks.
As with some of the previous analytics in this series, false positives are possible. Common external websites that are identified as “potential next relays” include Google, Facebook, and LinkedIn because so many end users browse those sites. They will be connected to many nodes in the graph, and at first the algorithm won’t know if they are innocent or malicious, so those types of high-traffic sites need to be excluded from this type of search.
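One way to apply that exclusion is to filter the candidate list against a set of networks known to belong to popular services. The sketch below is illustrative only: it assumes candidates arrive as a mapping of IP address to score, the helper name `drop_known_benign` is hypothetical, and the network shown is a documentation-range placeholder rather than any real provider's address space. In practice the list would come from a maintained source.

```python
import ipaddress


def drop_known_benign(candidates, benign_networks):
    """Remove candidates whose IP falls inside any known high-traffic network."""
    nets = [ipaddress.ip_network(n) for n in benign_networks]
    return {
        host: score
        for host, score in candidates.items()
        if not any(ipaddress.ip_address(host) in net for net in nets)
    }


# Placeholder data: scores are the share of potential victims that contacted each host.
candidates = {"203.0.113.7": 1.0, "198.51.100.9": 0.8}
benign_networks = ["198.51.100.0/24"]  # stand-in for a popular website's range

remaining = drop_known_benign(candidates, benign_networks)
print(remaining)
```

Filtering by network rather than by individual address keeps the list short, but it also means a relay hosted inside an excluded range would be missed, so the exclusion list should stay narrow.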
That’s how to use traffic analysis and simple graph concepts to identify compromised or malicious nodes by association. It’s a simple and effective technique, and it requires broad historical visibility into what’s actually happening on a network, not just what the audit logs say is happening. Stay tuned for more powerful searches that can reveal malicious behavior as we move through this entire advanced analytics series.
Read the next post in this series: Suspicious Admin Toolkits.