Data provides an illuminating light in the dark in the world of network security. When considering computer forensics assessments, the more data available, the better. The difference between being clueless and having a handle on a situation may depend on one critical datapoint that an administrator may or may not have. When data metrics that accompany malicious activity are missing, performing proper forensics of the situation becomes exponentially more difficult.
Operating a media server in the cloud has taught me a lot about the use and operation of internet facing devices. This is provided by a 3rd party who leases servers in a data center. This machine runs Lubuntu, a distribution of Linux. While I’m not in direct control of the network this server is operating on, I do have a lot of leeway in what data can be collected since it is “internet facing” meaning it connects directly to the WAN, allowing it to be be interacted with as if it was a standalone server.
If you’ve ever managed an internet facing service you’ll be immediately familiar with the amount of attacks targeted at your machine, seemingly out of the blue. These aren’t always manual attempts to gain access or disrupt services. These attempts are normally automated and persistent, meaning someone only has to designate a target and the botnets and other malicious actors, tasked with the heavy lifting, begin a persistent threat, an attack that is capable of operating on its own, persistently, without human interaction.
While learning to operate the server, I found myself face to face with a number of malicious attacks directed at my IP address seeking to brute force the root password in order to establish an SSH connection on the server. This would essentially be an attacker gaining complete control of the server and a strong password is the only thing sanding between the vicious world of the internet and the controlled environment of the server. This list provided a number of IP addresses which, like any good geographer, I was eager to put the data on a map to spatially analyze what part of the world these attacks were coming from to glean some information on who and why these actors were targeting my media server, an entity with little to no tangible value beyond the equipment itself.
This log of unauthorized access attempts can be found in many mainstream Linux distributions in the
folder and by using the following bash command in the terminal it is possible to count how many malicious attempts were made by which unique IP and rank them by count.
rep "Failed password for" /var/log/auth.log | grep -Po "[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+" \ | sort | uniq -c
this command will allow a system administrator to quickly see which
IP addresses failed to authenticate and how how many times they
failed to do so.
Parsing operations like this allow system administrators to quickly see which IP address failed to authenticate and how many times they failed to do so. This is part of the steps that turn raw data into actionable knowledge. By turning this raw data into interpretable data we actively transforming it’s interpretability and by result its usability.
This list is easily exported to an excel spreadsheet where the IPs can be georeferenced using other sources like abuseipdb.com. Using this service I was able to link each IP address and the number of the access attempts to the geographic location associated with it at the municipal, state, and national level.
After assigning each IP address a count and a geographic location I was ready to put the data on map. Looking over the excel spreadsheet showed some obvious trends out of the gate. China seems to be a majority of the access attempts. I decided to create 3 maps. The first would be based on the city the attack originated from and a surrounding, graduated symbology that expressed the number of attacks that originated from the data point. These would allow me to see at-a-glance where the majority of the attacks globally and spatially originated.
The first map was going to be tricky. Since the georeferecing built-in to ArcMap requires a subscription to the Arc Online service to use, I decided to parse my own data. I grouped all these entries and consolidated them by city. Then went through and manually entered the coordinates for each one. This is something I’d like to find an easier solution for in the future. When working with coordinates, it’s also important to use matching coordinate systems for all features in ArcMap to avoid geographic inaccuracies.
Full resolution – http://i.imgur.com/sY0c7IJ.jpg
Something I’d like to get better at is reconciling the graduated symbology between the editing frame and the data frame. Sometimes size inacuracies can throw off the visualization of the data. This is important to consider when working with graduated symbology, like in this case, where the larger symbols are limited to 100 pts.
The second map included just countries of origination, disregarding the cities metric. This choropleth map was quick to create, requiring just a few tweaks in the spreadsheet. This would provide a quick and concise visualization of the geographic national origins of these attacks in a visually interpretable format. This would be appropriate where just including cities in the metric would be too noisy for the reader.
The following is a graphical representation of the unauthorized access attempts on a media server hosting in the cloud with the IPs resolved to the country of origin. Of the roughly 53,000 access attempts between May 15 and May 17, over 50,000 originated from China.
To represent this chloropleth map I saved the data into a .csv file and imported it into ArcMap. Then came the georeferencing. This was easily done with a join operation with a basemap that lists all the countries. The blank map shapefile was added twice. One for the join and one for that background. During the join operation I removed all the countries I didn’t have a count for. Then I sent this layer to the top layer so all the colorless empty countries would appear behind the countries with data. This is one thing I continue to love and be fascinated with about ArcMap, the number of ways to accomplish a task. You could use a different methodology for every task and find a new approach each time.
Full resolution – http://i.imgur.com/XyqOexM.png
I decided the last map should be the states in China to better represent where attacks were coming from in this area of the world. The data was already assembled so I sorted the excel spreadsheet by the country column and created a new sheet with just the Chinese entries. I was able to refer to the GIS database at Harvard which I wrote about in an earlier article concerning the ChinaX MOOC they offered. This was reassuring considering my familiarity with the source. The excel spreadsheet was then consolidated and a quick join operation to the newly downloaded shapefile is all it took to display the data. A choropleth map would be appropriate for this presentation. I had to double check all the state names to make sure there were no new major provincial changes had been missed by the dataset considering the shapefile was from 1997.
Full resolution – http://i.imgur.com/ZhJpHLM.png
While the data might suggest that the source of the threats are originating from China, the entities with a low number of connections might be the most dangerous. If someone attempts to connect 1 time, they might have a password that they retrieved the means of a Trojan horse or a password leaks. These are the entities that may be worth investigating. All these entries were listed in the abuseipdb database so they all had malicious associations. While these threats aren’t persistent in that they are automated, they might suggest an advanced threat or threat actor.
Some of the data retrieval might be geographically inaccurate. While georeferencing IP addresses has come a long way, it’s still not an entirely empirical solution. Some extra effort might be required to make sure the data is as accurate as possible.
How does this data help? I can turn around and take the most incessant threats and blacklist them on the firewall so they’ll be unable to even attempt to log in. Using this methodology I can begin to create a blacklist of malicious IPs that I can continue building upon in the future. This allows me to geographically create a network of IPs that might be associated with a malicious entity.
The Internet can be a dangerous place, especially for internet facing devices that aren’t protected by a router or other firewall enabled devices. Nothing is impossible to mitigate and understand for a system administrator that is armed with the correct data. The epistemological beauty of geography is the interdisciplinary applications that can be made with almost anything. Even something is insignificant as failed access attempts can be used to paint a data-rich picture.