Mapping Botnets

A botnet is a collection of compromised computers controlled by an individual or a group of malicious actors. You may have heard the term “owned” thrown around online or in gaming communities. The word comes from taking control of another computer. If I get you to run a Trojan that places a backdoor on your computer, leverage this backdoor to escalate my privileges on the system, and then reduce your privileges, you no longer own your computer. Your computer is “owned” and administered by someone else, likely remotely. A botnet is just a collection of these compromised machines working in tandem: commanding and controlling each other, pushing updates, mining cryptocurrency, running port scans, launching DDoS attacks, proxying an attacker’s connection, and hosting payloads for attacks.

The larger the botnet the better, but other features like connection speed, hardware, and position in the network topology ultimately define a computer’s usefulness in a botnet. Don’t get me wrong, a hacker is not going to be picky about the computers that are added. There is a job for every piece of hardware.

The larger the botnet, the more attention it draws. The larger botnets can also be leased out on black markets for attacks and other malicious activity. There is a sweet spot where a botnet is powerful enough to be leveraged but low-key enough to fly under the radar. Flying under the radar is something Mirai, a botnet from late 2016, did relatively well. Mirai took control of IoT (Internet of Things) devices with weak passwords. These devices included TV boxes, closed-circuit TV equipment, home thermostats and other “things”. These devices are set up to run without administration, so once they were owned by an attacker, they were likely to remain owned and under the radar. vDos, a DDoS botnet for hire, holds the record for the largest DDoS botnet. The owners were arrested in Israel in August 2017.

I’ve been dealing with what I suspect to be a botnet on my home network. I got lucky the other day after installing a home firewall. After blocking a suspect connection, I was swarmed with thousands of attempted sessions from all over the world. My working theory is that this is a botnet using P2P networking for its command and control infrastructure, and that it was trying to find out where the computer it had lost contact with went. I was able to export this five-minute period of connections to a CSV file and plot it in ArcMap. The following map is what was produced.

 

botnet activity 1.png

I’m a firm believer that every problem should be approached, at least hypothetically, through a geographic perspective. By putting this data on a map, an additional perspective is provided that can be analyzed. Looking at this map for the first time was also surprisingly emotional for me. I have been chasing this ghost through the wire since December 2016 and, through the geographic perspective, was finally able to see and size up the possible culprit.

I had to filter out the United States from the dataset because I was running an upload to an Amazon web service which would have added inconsequential coordinates to the United States, skewing the data. This data would later be parsed and included.

Immediately I was drawn to the huge cluster in Europe. If this is truly the botnet I’ve been looking for, Europe would be a good place to start looking. There were 7000 sessions used in the dataset. I’m grateful that Untangle firewall includes longitude and latitude coordinates in the data it produces. This made the data migration easy and painless.
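For anyone who wants to try the same preparation step outside of ArcMap, here is a minimal sketch of reading a session export and dropping one country before plotting. The column names (`client_country`, `latitude`, `longitude`) and the sample rows are made up for illustration; a real firewall export will almost certainly use different headers.

```python
import csv
import io

# Made-up sample rows with hypothetical column names.
SAMPLE_CSV = """client_country,latitude,longitude
NL,52.37,4.90
US,39.04,-77.49
HU,47.50,19.04
GB,,
"""

def load_points(handle, exclude_countries=()):
    """Return (lat, lon) tuples, skipping excluded countries
    and rows whose coordinates are blank or malformed."""
    points = []
    for row in csv.DictReader(handle):
        if row["client_country"] in exclude_countries:
            continue
        try:
            points.append((float(row["latitude"]), float(row["longitude"])))
        except ValueError:
            continue  # no usable coordinates on this row
    return points

# Drop US rows, as was done for the first dataset.
points = load_points(io.StringIO(SAMPLE_CSV), exclude_countries={"US"})
print(points)
```

With a real export you would pass an open file handle instead of the `io.StringIO` wrapper.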

I got lucky again two weeks later when I got another swarm of sessions from what I assume to be the same botnet. This was, again, after I terminated a suspect connection, suggesting that this experiment is repeatable, which would provide an avenue for reliable data collection. I then took to the new ArcGIS Pro 2.0 to plot some more maps. With two sets of data, analysis could be taken to the next level through comparison.

 

2017-08-15.png
Full Resolution

 

First I have to say that this new ArcGIS interface is beautiful. It’s reminiscent of Microsoft Office due to the ribbon toolbar layout. I found the adjustment period quick and the capability expanded compared to earlier versions and standalone ArcMap. After using ArcMap I was surprised to see how responsive and non-frustrating this was to use. I ran ArcMap on a bleeding-edge system with 16GB of RAM and saw substantial slowdown. I was able to run this suite on an old OptiPlex system with 4GB of RAM with no noticeable slowdown. It is truly a pleasure to work with.

 

botnet_activity_8_13_17_small.png
Full Resolution

 

Using the second set of data I was able to produce the map above. I went ahead and created a huge resolution image so I could see the finer geographic details of the entities involved. This dataset includes the United States because I wasn’t running any background processes at the time the data was collected. I can safely assume this map represents only suspected botnet connections. I was glad to see a similar distribution, with Europe continuing to produce the majority of the connections. The real fun begins when we combine these two datasets but first let’s take a moment to look over the patterns in the above map.

At a glance we can see there is a disproportionate number of connections originating in Europe. There seem to be four discernible areas of concentration in Europe: the United Kingdom, the Benelux region, the Balkans, and Moscow. Looking at the United States, we see a majority of connections coming from the Northeast and from across the Saint Lawrence in Canada. Florida is represented, as are the Bay Area and Los Angeles. Vancouver, Canada seems to have a strong representation. Connections in South America are concentrated along the mouth of the Río de la Plata, where the major population centers are, and the coast of Brazil. A lot of South American tech operations happen in this region. If there were compromised computers on the continent, this would be an appropriate area to find them.

China seems to be underrepresented. The last network security maps I made were overwhelmingly populated by Chinese IPs. This map seems to feature only Beijing of the three major coastal cities. The Korean peninsula seems to have a strong representation. Central and Southern Asia are not strongly represented except for India, which, like China, seems underrepresented considering its population and the number of internet-connected devices in the country.

It turns out Singapore is a large player in the network. However, this is not immediately apparent given Singapore’s small footprint. These point maps don’t properly represent the volume of connections in places where many connections originate from a small area. By using heatmaps we can combine the spatial and volume elements in an interesting way.

Next we’ll look at the combination of these two point datasets.

 

botnet_activity_both_days_7_31_17_top_small.png

 

I included the lower resolution map above so the points could be easily seen. A level of detail is lost, but the smaller image is easier to embed in resolution-sensitive media like this webpage.

The idea here was that, since a majority of the points overlap, changes could be compared across this two-week period. I parsed the United States data from the first dataset so it could be included and compared. By focusing on which dataset is layered on top, we can infer which computers were removed from the botnet, either through being cleaned up or going offline, and which computers were added to the botnet in this two-week period. I’m operating under the assumption that this is a P2P botnet, so any membership queries are being performed by almost every entity in the system. I’m also assuming this data represents the entirety of the botnet.

When we put the original dataset created on 7-31-17 above the layer containing the activity on 8-13-17 we’re presented with an opportunity for temporal as well as spatial analysis.

 

botnet_activity_both_days_7_31_17_top.png
Full Resolution

 

By putting the 7-31-17 dataset on top, we’re presented with a temporal perspective in addition to the geographic perspective. Visible purple dots are not included in the first dataset, or else they would be covered by a green dot. These visible purple dots indicate machines that have presumably been added to the botnet. With more datasets it would be possible to track the growth of these networks.

botnet_activity_both_days_8_13_17_top_small.png

Above is the same data redrawn with the 8-13-17 dataset on the top layer. Changing the layer order shifts the temporal perspective. Visible green dots from the first set may indicate machines that were no longer part of the botnet when the second dataset was created. Machines leaving a botnet is plausible, but it’s also possible that the machines were just offline or unable to establish a session. It’s entirely possible that, even with a P2P networking scheme, the botnet does not have every machine on the network ping each system that appears to go offline. Revealing the whole network that way would seem like a serious security error by the botnet operator. It’s also entirely possible they’re not trying to cover their tracks and are employing a “spray and pray” tactic, running the botnet at full capability and not worrying about the consequences. A full resolution image is linked in the caption.

By looking at both sets under the assumption that the entire botnet revealed itself, we can see if the botnet is growing or shrinking. If there are more visible purple dots on the map where green dots are layered on top than there are visible green dots on the map where purple dots are layered on top, the botnet is growing. If the opposite is true, the botnet is shrinking.
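The layering comparison described above amounts to a set difference between the two snapshots. Here is a rough sketch of that idea, not the workflow I actually used in ArcGIS. The function name, the rounding precision, and the sample coordinates are all assumptions for illustration, and matching on rounded coordinates treats two machines geolocated to the same spot as one.

```python
def compare_snapshots(first, second, precision=2):
    """Compare two collections of (lat, lon) points from different dates.

    Points are matched on coordinates rounded to `precision` decimals,
    so this counts locations, not individual machines.
    """
    def key(point):
        return (round(point[0], precision), round(point[1], precision))

    a = {key(p) for p in first}
    b = {key(p) for p in second}
    return {
        "dropped": a - b,    # seen only in the first snapshot (visible green dots)
        "added": b - a,      # seen only in the second snapshot (visible purple dots)
        "persisted": a & b,  # seen in both snapshots
    }

# Made-up sample coordinates standing in for the two exports.
july = [(52.37, 4.90), (47.50, 19.04), (51.51, -0.13)]
august = [(52.37, 4.90), (51.51, -0.13), (55.75, 37.62), (1.35, 103.82)]

result = compare_snapshots(july, august)
net = len(result["added"]) - len(result["dropped"])
trend = "growing" if net > 0 else "shrinking" if net < 0 else "stable"
print(trend, net)
```

The same caveat applies here as on the maps: a location missing from the second snapshot may simply have been offline at the time.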

botnet_activity_both_days_8_13_17_top.png
Full Resolution

 

The most interesting feature of these comparison maps, I’ve found, is the predilection for certain countries and regions. Looking at the rotation of computers, we see the Northeast United States and Florida as hotspots for this activity. The reason is not clear, but this serves as a starting point for additional research. It’s important to remember that data reflects population. Major cities all show signs of activity. Major activity concentrations can be empirically defined by normalizing by population. The activity seems to proliferate from areas where activity is already established. Perhaps there is some kind of localized worm activity used for propagation. Let’s take a look at the real elephant in the room: Europe.

 

botnet_activity_both_days_8_13_17_top_europe extent_marked.png

 

The majority of machines seem to be in Europe. There are certain regions that seem to have concentrated activity. They are marked in red above. From left to right: the UK, the Netherlands, and Hungary. There are also concentrations in Switzerland, Northern Italy, Romania, and Bulgaria.

The main three concentrations pose interesting questions. Why is there so much activity in the UK? The Netherlands concentration can be explained by the number of commercial datacenters and VPS operations. A lot of for-rent services operate out of the Netherlands, making it a regular on IP address queries. Hungary is an interesting and befuddling find. There is no dominating information systems industry in Hungary like there is in the Netherlands. What do all these countries have in common? Why are the concentrations so specific to borders? Answering these questions will be critical in solving the mystery. Next we’ll try our hand at some spatial analysis.

 

botnet_activity_7_31_17_heatmap_std_deviation_small.png

 

A kernel density map, also known as a heatmap, shows the volume of data in geographic space. This is an appropriate spatial analysis to run alongside the point map because it reveals the volume of connections that may be buried under one point. If one point initiates 100 sessions, it’s still represented as one point. These heatmaps reveal spatial perspectives that the point maps cannot.
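To make the difference between a point map and a density surface concrete, here is a small sketch. It uses a Gaussian kernel and treats degrees as flat planar units, both of which are simplifying assumptions of mine and not necessarily what the ArcGIS Kernel Density tool does internally. The coordinates and session counts are made up.

```python
import math

def kernel_density(sessions, query, bandwidth_deg=1.0):
    """Sum of Gaussian kernels centred on each session, evaluated at `query`.

    Treats latitude/longitude degrees as planar units, which is a rough
    approximation that ignores map projection distortion.
    """
    qlat, qlon = query
    total = 0.0
    for lat, lon in sessions:
        d2 = (lat - qlat) ** 2 + (lon - qlon) ** 2
        total += math.exp(-d2 / (2 * bandwidth_deg ** 2))
    return total

# One location with 100 sessions, another with a single session.
sessions = [(55.75, 37.62)] * 100 + [(52.37, 4.90)]

# A point map draws both locations as one dot each.
print(len(set(sessions)))

# The density surface separates them by volume.
busy = kernel_density(sessions, (55.75, 37.62))
quiet = kernel_density(sessions, (52.37, 4.90))
print(round(busy, 3), round(quiet, 3))
```

Evaluating `kernel_density` over a grid of query points would give the raster that a heatmap colors in.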

 

botnet_activity_7_31_17_heatmap_std_deviation_large.png
Full Resolution

 

Immediately we see some interesting volumes that were hidden in the point map. Moscow lights up in this representation, indicating that many connections came from a small geographic area. By using standard deviation to divide the data, the biggest players show up in red. The circular pattern indicates that many connections come from a small area. There is a big representation in Toronto, Canada that wasn’t completely apparent on the other maps. Our focus areas of the UK and the Netherlands are represented. Peripheral areas like Northern France and Western Germany light up on this map, suggesting concentrated activity, perhaps in the large metro areas. Seoul, South Korea lights up, suggesting large volumes of connections. There is notable activity in Tokyo. As I mentioned earlier, Singapore lights up in this map. Singapore is a small city-state at the southern tip of the Malay Peninsula, on the Strait of Malacca. Connections here would be difficult to distinguish on a point map considering the small area of the city. This raises a peculiar question. Why is this botnet so particular about boundaries? Singapore is crawling with connections, but neighboring Malaysia, possibly sharing some of the same internet infrastructure, is quiet on the heatmap.

 

botnet_activity_8_13_17_heatmap_std_deviation_small.png

 

As with the other maps, I created a small and a large resolution version. For these kernel density maps, there are several options for classifying the data. I chose to use standard deviation and geometric interval classifications. Each provides a unique perspective, and every additional perspective might reveal something new. The geometric map “smooths” the distribution of data, showing areas that might not have been significant enough to appear in the standard deviation representation.
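To illustrate why the two classifications highlight different areas, here is a rough sketch of how class breaks could be computed for each. The geometric version is a plain geometric progression between the minimum and maximum, used as a stand-in; the geometrical interval method in ArcGIS is more elaborate than this. The density values are made up.

```python
import statistics

def std_dev_breaks(values, steps=(-1, 0, 1, 2)):
    """Class breaks placed at mean + k * standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)
    return [mean + k * sd for k in steps]

def geometric_breaks(values, classes=4):
    """Class breaks that grow by a constant ratio (requires min > 0)."""
    lo, hi = min(values), max(values)
    ratio = (hi / lo) ** (1 / classes)
    return [lo * ratio ** i for i in range(1, classes + 1)]

# Made-up, heavily skewed density values, like a few very busy cities.
densities = [1, 2, 4, 8, 16, 32, 64, 128, 256]

sd_breaks = std_dev_breaks(densities)
geo_breaks = geometric_breaks(densities)
print([round(b, 1) for b in sd_breaks])
print([round(b, 1) for b in geo_breaks])
```

On skewed data like this, the geometric breaks give the low values their own classes, while the standard deviation breaks lump most of them together below the mean. That matches the “smoothing” effect described above.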

 

botnet_activity_8_13_17_heatmap_std_deviation_large.png
Full Resolution

 

In the future it might be beneficial to select by country borders and make a choropleth map to show the number of sessions per country. This would reveal countries with multiple sessions from the same coordinates.
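The counting behind a per-country map like that is simple enough to sketch. The tuple layout `(country, lat, lon)` and the sample sessions below are made up for illustration.

```python
from collections import Counter

# Made-up sample sessions as (country, lat, lon).
sessions = [
    ("NL", 52.37, 4.90),
    ("NL", 52.37, 4.90),
    ("NL", 51.92, 4.48),
    ("SG", 1.35, 103.82),
    ("SG", 1.35, 103.82),
    ("SG", 1.35, 103.82),
    ("HU", 47.50, 19.04),
]

# Sessions per country: the value a per-country map would shade by.
per_country = Counter(country for country, _, _ in sessions)

# Coordinates that produced more than one session, which a point map hides.
per_point = Counter(sessions)
repeated = {point: n for point, n in per_point.items() if n > 1}

print(per_country.most_common())
print(repeated)
```

Joining `per_country` to a country boundary layer on a country code field would be the remaining step in a GIS.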

It might also be beneficial to parse the data further and add appropriate symbology and additional maps for data that was present in both sets as well as which points were unique to one set. This set of 3 maps would present the data in an additional spatial context, allowing another perspective for analysis.

As always, I will be on the hunt for additional data. The next step for this project is finding out the condition for reproducing this swarm of connections. If it does turn out to be easily reproducible, the real fun begins. Additional data would be collected at regular intervals and mapped accordingly. With more data comes more realization. Automating the data collection and mapping would be the final step. At some point the geographic perspective will be so apparent that the next steps will become clear.

Until then I’m still on the warpath. Never has research been so personal to me.

Imgur Album
