Mapping the Historic Capitals of Myanmar

It’s been some time since I’ve written about or created GIS content outside of a professional environment. I figured it was time to take on a quick project to get back in the swing of things.

I came across an article on Wikipedia about the rather lengthy list of capitals the current state of Myanmar (Burma) has had in roughly the last millennium. This struck me as a unique opportunity to quickly create a project in historical GIS, that is, the analysis of historical data using geographic information systems. It was also a chance to explore additional tools in the realm of GIS production; in this case I wanted exposure to QGIS, the open-source alternative to some of the mainstream, proprietary products on the market today.

Below is the finished product of the data sourced on the Wikipedia article, with graduated symbology related to the time a location spent as the capital proper. Included is an inset of the crowded central region for clarity.


Full resolution


The methodology was straightforward. Getting to know the suite of tools included in the QGIS environment was quick and painless after reading the documentation and looking up answers to questions as they arose.

The first step was importing a basemap for the project. As opposed to the creation of a new document in ArcMap, QGIS doesn’t present the user with a list of basemaps to choose from out of the gate, though there are plugins that allow this. I wanted to get a feel for the program and decided to download a basemap from the selection of maps in this blog. Once the basemap was in place it was time to parse the article and create the tabulated data.

CSV data was the preferred format for the project. It’s quick to write up and easily imported as a delimited text layer. There was a design choice to make when curating the data regarding capitals that existed at the same place during different, noncontinuous periods of time. Since the data was being represented in two dimensions, as opposed to three or three plus time, it would have been messy to stack different symbology in the same spot. Offsetting these symbols manually would have displayed the intricacies of the data correctly, but for the sake of simplicity I decided to use one symbol for each location, summing the total time it spent as the capital. Since the symbology was going to be divided five different ways, the accuracy of the time wasn’t of extreme importance. Some of the figures were quickly eyeballed, as you may notice by crosschecking the data. The integrity of the data suffers in the long run, but the end product, as displayed in this project, is the same.

Below is an example of the formatted data:




The data was separated into four columns, one each for the name, the latitude, the longitude, and the length of time it was the capital.
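For illustration, here is a quick sketch of how data in that four-column format can be parsed programmatically; the rows below are hypothetical stand-ins, not the exact figures from the project.

```python
import csv
import io

# Hypothetical sample rows in the four-column format described above;
# the names and figures are illustrative, not the project's exact data.
sample = """name,lat,lon,years
Bagan,21.17,94.86,250
Ava,21.85,95.98,360
Yangon,16.80,96.15,100
"""

# Parse the delimited text much the way QGIS ingests a CSV layer.
capitals = list(csv.DictReader(io.StringIO(sample)))

for row in capitals:
    print(row["name"], float(row["lat"]), float(row["lon"]), int(row["years"]))
```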

Exponentially graduated symbology was something I would have liked to use for this project, but it didn’t seem to be possible with the base functionality of QGIS without plugins, so linear graduated symbology was used. One of the capitals in the Myinsaing period, Myinsaing, wasn’t included in the data due to insufficient information regarding its location. The city has since been abandoned, and while the archaeological site might have been used to represent its location, it was not easy to find. The data for Pinya was included by manually cross-referencing this map with the basemap.

Modern country borders were added to put the data into a modern perspective. Major ocean features and country names were included. An inset was included to display the congested central region in a different scope, and a legend to explain the symbology. Below is a picture of the final table of contents for the project. Equal interval was used to delineate the graduated symbology.
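As a rough sketch of what equal-interval classification does under the hood (the durations below are made up for illustration, not the map’s actual data):

```python
def equal_interval_breaks(values, classes=5):
    """Compute equal-interval class boundaries, the scheme used for the
    map's graduated symbology (5 classes): the range from min to max is
    split into equally wide bins."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / classes
    return [lo + width * i for i in range(1, classes + 1)]

# Illustrative durations (years as capital).
durations = [5, 30, 76, 110, 250, 360]
print(equal_interval_breaks(durations))  # [76.0, 147.0, 218.0, 289.0, 360.0]
```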




One major difference between QGIS and ArcMap is the design mode, used to organize how the data will be displayed once it’s projected properly in the data view. ArcMap uses a feature called the layout view to organize data, while QGIS uses the Print Composer. Both include similar functionality but are presented through the framework of their respective applications.

How could this data be useful? Spatially representing this data allows a researcher to quickly see and interpret different characteristics. For example, the early capitals were inland, representing the inland empires they ruled. The capitals around the Andaman Sea represent a different type of state, as power became more maritime in nature. This is the at-a-glance functionality maps provide that I often like to cite as an important part of representing data spatially. Including dates would have been an interesting touch to add to the map but, as stated above, intermittent reigns of the same capitals would have been difficult to represent.

I enjoyed working on this quick project and feel like it was a good primer for the QGIS environment. Hopefully this knowledge can be applied in future projects as I continue to make content in QGIS.

Mapping Computer Networks

A network map represents the relationship between objects. This representation can be 2-dimensional or 3-dimensional depending on how the data is structured. Network maps are useful for mapping social relationships, supply chains, and, as I’ll demonstrate in this post, computer networks.

Creating maps of cyberspace is inherently unintuitive. The instantaneous and global nature of networks like the internet defies traditional spatial interpretation. By depicting these networks, for example, on a 2-dimensional plane, the relationships between devices in a network become easier to interpret at a glance.

Below is a network topology map I created to illustrate the relationship of devices I personally manage. For the creation of this map I used a free online diagramming tool. The free tier is limited to 60 elements, including line features.

network map
Full Resolution

The network consists of 8 servers, 2 desktops, 2 laptops, 2 firewalls, and 8 media devices over 2 sites. By using a combination of symbology and labels, each computer and its function can be quickly interpreted.

I’d like to take a moment to stress what I mean when I say "at a glance" or "on the fly" in reference to data visualizations. Data, in its rawest form, can be difficult to interpret quickly. Visualizations aid the analysis of data by making it easier to communicate and faster and more reliable to interpret. When I refer to elements of a visualization, like a map, that contribute to easier data conveyance at a glance, I mean things that make the data more accessible conceptually and spatially, quicker to interpret, and more reliable to distinguish and identify.

Stylistically, the above network map is radial in nature, with the internet occupying the space near the center. In networks that use intranets, or private networks, this central space might instead hold the main routers, switches, domains, or any other device that sees the most traffic or performs a key role in the network. The network is split into three parts, all communicating with the other devices through the internet. For this reason the internet becomes the central feature of the map, the backbone of the network. It’s accentuated by its position on the map, and since this central position tends to draw the eyes, it’s easier to, you guessed it, interpret at a glance.

There are three sections in the general network structure. We’ll call the line going to the top of the diagram from the internet symbol site A, and the one drawn towards the bottom, site B. The three separate lines drawn from the internet symbol towards the left represent assets that are in the “cloud”, or hardware I don’t have physical access to. These machines aren’t on the same network, represented by the separate, non-intersecting lines, but they’re grouped according to the remote nature of their access.

I tried to make the symbology as intuitive as possible, labeling the different devices by their role, technical specifications, and operational capacity. For example, the brick wall represents a firewall unit. At the top we see the all-in-one Untangle unit I wrote about in this article (Working with Untangle Firewall). Site A utilizes a two-network setup. All the server assets sit behind the firewall while all the personal devices operate off their own router. This is a network security concept called compartmentalization: if a personal device ever became compromised, it couldn’t be leveraged against the rest of the network. The server farm is made more operationally secure by the extra layer of security provided by the firewall. This also allows the personal devices to bypass firewall rules which might interrupt leisurely “workflows”, while at the same time simplifying firewall operation by not requiring additional rules and conditions.

Site B utilizes a different strategy: the Untangle box featured in this article (Building an Untangle Box) routes and shapes all traffic. However, the traffic is compartmentalized internally by two separate wifi networks and a hardwired network. The server built in this article (Building a 50TB Workstation/Server) operates off of this box via ethernet. Everything that handles sensitive operations like SSH work or banking operates on one wifi network, with rules tailored specifically for this heightened level of security. Home media and leisure devices use the other wifi network. The idea is that if a router ever becomes compromised, it won’t have leverage over all the devices on the network. This is in addition to the routers being in access point mode, sending all traffic to the Untangle box for rules and routing. It never hurts to have these failsafes. All traffic going to site B sits behind a firewall, as opposed to site A, which sits behind a modem and router combo unit. This is inherently safer considering all traffic must pass through the Untangle box as it moves to or from the internet or, theoretically, other devices.

In the cloud there are 3 VPS servers. These host a variety of functions, with the core functionality listed beside them on the map. As mentioned earlier, these servers aren’t on the same network, or even in the same country for that matter. This network relationship is conveyed by the individual lines that do not intersect on their way to the internet symbol.

Creating a network map comes down to a few design elements, with plenty left up to the author. It’s easy to begin with a radial design in mind, placing devices that serve central points in the network at the center. Grouping devices by role or location helps the reader spatially interpret assets on the fly. Using easily understandable symbology and verbose labeling helps clarify finer details. Like all maps, computer network maps change over time, and having a program that allows you to update and edit features is useful for making changes.
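The grouping described above can be sketched as a simple adjacency structure, with the internet as the hub and devices grouped by site. The device names here are hypothetical, not the actual assets on the map:

```python
# A minimal sketch of the radial structure: the internet as the hub,
# with devices hanging off each site's gateway. Names are hypothetical.
network = {
    "internet": ["site_a_firewall", "site_b_firewall", "vps_1", "vps_2", "vps_3"],
    "site_a_firewall": ["server_1", "server_2"],
    "site_b_firewall": ["nas", "media_wifi", "secure_wifi"],
}

def edges(adjacency):
    """Flatten the adjacency mapping into (parent, child) edge pairs,
    i.e. the line features a diagramming tool would draw."""
    return [(hub, leaf) for hub, leaves in adjacency.items() for leaf in leaves]

for parent, child in edges(network):
    print(f"{parent} -> {child}")
```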

The future of maps includes an abundance of cyberspace assets. Being able to map these networks will be a key component in the toolkits of future cartographers.

Working with Vantrue X2 Dashcam and Dashcam Viewer

Dashcams are becoming more and more affordable as they become easier to manufacture and their use becomes more ubiquitous. I had previously used a Rexing R2 dashcam but was looking to get into something with more robust data collection capability. The Rexing R2 served as a good initial exposure to dashcam operation and the associated workflow (storage, editing). I was able to incorporate dashcam operation into my working theory of data curation: any dashcam data that is collected, even if it is not inherently valuable, may prove valuable in the future, and thus should be stored indefinitely.


As a quick example of the usefulness of this kind of dashcam ubiquity, we can look to the meteor that came down near Chelyabinsk, Russia in February 2013. Almost all of the footage is from CCTVs and dashcams, which are near-ubiquitous in the country as protection against insurance fraud. I’m not saying I’m likely to catch a meteor coming down to Earth, or that it’s my responsibility as a dashcam owner to be prepared for that moment, but I’d rather be caught with the camera on than off. The footage can also become the medium for other creative expression.

I enjoy working with the footage, speeding it up and putting it alongside music. Driving is something I enjoy and editing driving footage provides a similar satisfaction. Unfortunately, the Rexing R2 and its fish-eyed convex lens was destined to end badly. The lens protruded beyond the safety of the bezel and all it took was one instance of accidentally setting it lens-side down on an abrasive surface for the lens to be slightly cracked, enough to blemish the picture.

Finding a camera which was immune to this kind of operator error was my first priority. Also important was the incorporation of a GPS unit with exportable data. I found a reasonable solution in the Vantrue X2. It was a steal on Amazon for $99, though it seems to be out of stock now. It comes out of the box with 2K filming capability, expertly tailored night vision, 64GB microSD support, and an optional GPS mount. This cam checked all the boxes. Two days later I had it installed and took it for a test drive.

A couple things to consider right off the bat: I do quite a bit of driving on average and I’m not one who wants to dismount the camera and export all the footage several times a week. I also thought it would be irresponsible, since I’m storing this footage for whatever future opportunities might arise, to film in less than the camera’s full 2K resolution. 64GB of SD card capacity becomes just “OK” at this point, storing between 6 and 7 hours of data before needing to be hooked up to the computer and offloaded. 128GB or greater might be something I look for in the future, although I’m definitely not in the market for another camera. The 6 hours hasn’t been a problem except for a handful of times I’ve been driving long distances and found myself needing to offload the footage temporarily on a device before delivering it to the storage server. However, the average user won’t have these problems if they’re not meticulously hoarding this data. The camera has functionality that allows it to overwrite previous footage when it becomes full. Relying on this rolling recording ensures you always have the last 6 hours of driving footage, no maintenance required.
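As a rough back-of-the-envelope check on those numbers (the ~20 Mbps average bitrate is my assumption for 2K dashcam footage, not a published spec):

```python
def recording_hours(card_gb, bitrate_mbps):
    """Rough hours of footage a card holds at a given average bitrate."""
    card_megabits = card_gb * 8 * 1000       # GB -> Gb -> Mb
    seconds = card_megabits / bitrate_mbps   # Mb / (Mb/s) = seconds
    return seconds / 3600

# At an assumed ~20 Mbps, a 64GB card holds roughly the observed 6-7 hours.
print(round(recording_hours(64, 20), 1))  # 7.1
```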

Armed with the camera and GPS mount, I was ready to collect the data, which came naturally over the following months. The next step in this geographic exploration was to incorporate this data into some sort of map. This led me to Dashcam Viewer by Earthshine Software. This program extracts the GPS data from the videos, plots it on a map, and allows you to cartographically examine your driving. Dashcam Viewer is available for Windows and Mac. Sadly, there is no official Linux version, although I haven’t tried running it on a Linux machine with Wine.




The first video I thought to make was a realtime video with two maps of different scale, showing where the vehicle is in relation to surroundings that might not be visible on film. Dashcam Viewer includes lines that show differences in relative speed, which is a nice touch and saves time compared to crunching this data manually in something like ArcGIS.

Capturing the map footage required a little ingenuity. I couldn’t save a video of the Dashcam Viewer coordinate route, so I thought capturing video of the desktop and then cropping it to the window in question would be the easiest route to a result. The finer details could be ironed out afterwards. I was able to create the two cropped videos of the maps and, using the Filmora editor, combine them with the actual footage. A little editing flare and some music was all it took to finish this rough draft, which served as a proof of concept for future projects.



Next I wanted to move on to timelapse videos so these new map perspectives could be incorporated. The length of the editing process is something I’m still trying to reduce in this workflow. Capturing the 2 maps in realtime using Xsplit to capture the desktop adds 2 times the length of the original footage to the process. For the next project, I wanted to use a 4 hour segment of film. This would require 8 hours of desktop capture, not acceptable for a productive workflow, but at this early proof-of-concept stage, getting the results is more important than the workflow.
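The capture-time overhead is simple to reason about; a tiny sketch of the arithmetic:

```python
def capture_hours(footage_hours, maps):
    """Desktop-capture time for the map overlays: each map view is
    captured in realtime, so total capture time scales linearly with
    the number of map views."""
    return footage_hours * maps

# A 4-hour trip with two map views means 8 hours of desktop capture.
print(capture_hours(4, 2))  # 8
```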

I started running into limitations in the Filmora video editor. Editing with multiple video sources was limited, and I couldn’t export the final production in glorious 2K resolution due to the 1080p limit. Filmora also isn’t native to Linux, the ecosystem I’m trying to move all my production towards, and Wine emulation is poor. For the future, I’m looking towards DaVinci Resolve by Blackmagic. This, I assume, is an intermediate video editing application where Filmora is focused more on entry-level editing.

The idea for the second project using the new dashcam was based around a 4 hour trip. I captured all the media and moved it over to my Windows machine to edit with Filmora. To make the editing process easier, I focused on one source at a time. First, I merged all the dashcam footage into one video. This machine is working with a Q9650 processor so all rendering had to be done overnight. Once the dashcam footage was one video, it was muted, sped up by a factor of ten, then rendered again. This gave me finalized footage I wouldn’t have to edit when piecing all the sources together.

I then booted up Dashcam Viewer and started the desktop capture of the maps in realtime, which took over 8 hours for both maps. Once the capture was complete, they were put through some quick editing so post-production would just be piecing the sources together. They were sped up 10x and rendered individually at custom resolutions so they could sit on top of the original footage seamlessly.

The first map was set to “follow” the GPS signal at a small scale. The second map would show a majority of the trip and often the starting point and destination in the same frame. These provided two different perspectives for the footage in case the viewer wants supplementary geographic information.

Syncing all the footage turned out to be more complicated than expected. I originally wanted the final editing procedure to be just piecing together the three sources: the dashcam footage and the two maps. However, the maps were often out of sync with the footage and had to be adjusted manually every few minutes. This led to chopping up the footage, creating errors in the maps along the way, thanks to Filmora quirks and operator error.

Post production included adding the music, adding the song information, and fading in and out where appropriate. The final product is not perfect, as there are map errors in the middle of the video and at the end, but I’m happy with how the workflow and the product ended up.


In the future, I hope to choose a different editor and see if I can find a better way to capture and render the maps, with a focus on speed of production. I’d love to find other ways to incorporate GPS information like bearing and speed into the video. Until then, it’s off to add to the ever-growing collection of dashcam footage.

Building an Untangle Box

A few weeks ago I did a quick write-up about the Untangle firewall system and my experience installing and using it on a Protectli Vault all-in-one mini PC. Today I’d like to describe a box I set up as an alternative to that model. For this box I used an old OptiPlex 780 purchased on Amazon for $87. I’ve been using the OptiPlex 780 as the starting point in a lot of projects recently because it’s modular by nature, easily upgradable, and has components powerful enough to tackle moderately resource-intensive modern tasks.



The OptiPlex made a great jumping-off point for this project. I wanted an Untangle box with a small form factor so it could be easily incorporated into the physical environment where it would be operating, but not quite as small as the Protectli Vault setup I had used before. I tried to keep the budget around $370, the price of the original Protectli Vault setup, and wanted the build to be at least as powerful as the Vault.

First I took a look at the RAM. The OptiPlex 780 has 4 dual-channel DDR3 slots onboard, more than enough to match the RAM loadout on the Vault. I was able to find an 8GB kit of two DDR3 1600MHz sticks for $56 on Amazon. These sticks were plenty for what I was building. The 780 came with 4GB of RAM preinstalled, allowing some cost to be recouped. That 4GB might even be enough if the number of services running in the Untangle installation were minimal.

Next was the storage solution. The Vault comes with 120GB of solid state storage so I figured a 2.5″ SSD would be a suitable match for the OptiPlex. I found a SanDisk 120GB SSD, again on Amazon, for $60. This would provide quick read/write speeds for typical Untangle operation and open up the possibility of using disk space for swap if the need arose. The 780 comes with a hard drive already installed, ranging between 160GB and 250GB. After the SSD installation, it could be salvaged for other projects or sold to recoup some cost.

Arguably the most important part of this particular build is the network interface. The 780 comes equipped with just 1 network interface onboard out of the box. This, by itself, isn’t enough for a functional box: there need to be at least 2 ethernet ports, one for the internal connection and one for the external connection, for the box to function as a firewall. I decided it would be appropriate to incorporate a 4-port 1000Mbps NIC. This one-upped the Vault by allowing an additional connection compared to its 3. I purchased the PRO/1000 PT quad-port from Amazon for $56 (now $50) and, in turn, freed up a 4-port switch I had been using to route local traffic, allowing additional cost reclamation by selling the redundant equipment. The NIC had to be low-profile to accommodate the reduced room in the small form-factor OptiPlex. I additionally included a single-port card in the spare PCI slot, bringing the number of external ports to an unprecedented 5.

Finally, I wanted to include a beefy quad-core CPU to again one-up the Protectli Vault. The Q9650 was a workhorse Core 2 Quad chip in its day and still packs a wallop. This monster can hang with newer processing solutions and would be more than enough for this build, theoretically capable of routing over a gigabit of traffic at any time, and possibly much more depending on how many local services Untangle is running. I was able to secure one from Amazon for $49. Installing the chip, however, was tricky.


During the install I periodically powered up the build to ease troubleshooting if problems arose. The assembly did turn out to be problematic when I installed the NIC and the new processor. Replacing the CPU was probably the most time-intensive step in the process. It included removing the existing E8500 chip from the OptiPlex, another redundant part that could be sold. The process was made easier by the easily removable heatsink, secured by two screws, with a hood that detaches easily from the HDD assembly. Thermal paste was then applied to the new Q9650 and the heatsink was reattached. The system did not boot, and the OptiPlex showed the error code “2 3 4”, displayed on the lights at the bottom of the front of the chassis. These lights were accompanied by a solid amber light emanating from the power button, indicative of CPU issues.

Troubleshooting was easy enough. I had a spare OptiPlex 780 laying around with identical specs, and installed the Q9650 in it after removing it from the Untangle build. Luckily, it booted up, eliminating the possibility that the chip was faulty. I then tried the spare OptiPlex’s chip, another Q9650, in the new build. This attempt also failed to boot, producing the same error indicators for a faulty chip. This confirmed the problem was local to the new build and narrowed it down to the board or some part of the CPU assembly. Luckily, the problem was in how the heatsink was mounted, so no faulty hardware was involved. I attached the heatsink by tightening the screws nearest the DVD drive first instead of the opposite ones. This pressure differential must have seated the CPU correctly, because the machine booted up properly on the next attempt.


The assembly of all the components was relatively painless apart from the CPU hiccup. With the machine up and running and the software configured, we were off to the races. The physical environment was prepared with a small shelf so the box itself could sit out of the way. It was anchored to the wall with some wire to prevent any nudges from sending it crashing to the floor. The build was officially ordained with an Untangle sticker on the case.



The final price was $308, and with current prices, this total is just below $300, putting us about $70 below budget.

OptiPlex 780 $87

2x4GB DDR3 1600MHz RAM $56

SanDisk 120GB SSD $60

4-Port NIC $56

Q9650 Processor $49

Total: $308
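As a quick sanity check on the totals above (the $370 figure is the budget stated earlier in the post):

```python
# The parts list above, checked against the ~$370 budget.
parts = {
    "OptiPlex 780": 87,
    "2x4GB DDR3 1600MHz RAM": 56,
    "SanDisk 120GB SSD": 60,
    "4-Port NIC": 56,
    "Q9650 Processor": 49,
}

total = sum(parts.values())
budget = 370
print(total, budget - total)  # 308, $62 under budget at the original prices
```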

If the micro form factor provided by the Protectli Vault isn’t a necessity, a box with a superior CPU and network solution can be built for around $70 less. This box can handle anything that will be thrown at it in the foreseeable future and is powerful enough to utilize all of the features in the Untangle software suite. In this scenario the OptiPlex once again proves to be an optimal solution.

Building a 50TB Workstation/Server

Collecting data is a passion of mine. I’ve always enjoyed collecting things, and I believe the act of collection is a critical component of the human psyche and experience. Maintaining, curating, and growing collections of data is personally and professionally therapeutic and fun. Collecting data and applying it to everyday situations is a critical part of approaching life in the 21st century, and the better your tools, the better your efficacy. Being able to build these tools yourself puts you in even greater control over your data management solutions and opens the door for unique opportunities to engage with interesting cutting-edge technology.


Building servers to store all the data I’ve collected over the years is a big priority for me. I don’t like to delete things. I don’t like to delete different versions of the same thing. Having hardware capable of scaling with and storing my ever-expanding repository of documents, movies, music, data, pictures, games, books, and programs is very important to me. This year I’ve seen the detrimental effects of not having this data easily available, with 30TB split between the cloud and a physical box at home, an arrangement that isn’t particularly integrated into or useful for my workflow.

Having the data available and easily accessible is only one part of the equation. Security is the second part. Computer operation is always a trade off between convenience and security. When it comes to this bulk storage, I’ve come to the conclusion that my personal needs would be better met by having this server offline. By having this server airgapped, I feel like I would have more control over what is ingested and egressed and would be better situated to deal with malicious threats like ransomware.

The planned server is only one part of the solution. I hope it can function as a backup, and that another server, to be built in the future, will handle all the internet-facing and production activity. This would fulfill the data-integrity requirement of an offsite backup, making the data that much more secure in the long run and providing more peace of mind for the administrator.

I decided to run a non-Windows operating system on this machine. I feel like it would require less maintenance, in the form of updates and daily upkeep, and eliminate some of the security woes I’ve had in the past with Windows machines. I also decided I wanted to utilize the ZFS filesystem for its data integrity controls and redundancy operations, which are superior to traditional RAID; there is no native ZFS support on Windows. First I looked at OpenIndiana, a Solaris-derived distribution that has ZFS baked in, but I was worried about hardware support and future expandability, so it unfortunately might not be an option. I then looked at FreeNAS, a BSD-based distribution for network-attached storage. I wasn’t sure it had the capability under the hood I was looking for in a workstation, and since the box wouldn’t be connected to a network, a lot of its functionality would go unused. FreeNAS was also limited by its user interface: while it has a robust web interface, the local desktop environment is lacking for use as a workstation.

Securing the hard drives was my first concern when setting up this build. A great deal was found in the form of Western Digital Easystore 8TB external hard drives from Best Buy. These external enclosures house WD80EFAX drives that can be easily “shucked” from the enclosure and used for other projects. These hit the shelves at $159.99 apiece, which is about $50 cheaper than the cheapest standalone internal drive on the consumer market. I decided to buy as many as I could afford, taking an extra 10% off by opening a Best Buy credit card. This is a storage deal you only see once every few years. The drives do come with some drawbacks.
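Looking ahead to the ZFS plan, here is a rough sketch of the usable capacity these drives could yield. The single raidz2 vdev layout is my assumption for illustration; the post doesn’t specify one:

```python
def raidz_usable_tb(drives, drive_tb, parity):
    """Approximate usable capacity of a ZFS raidz vdev: roughly the
    total capacity minus the parity drives' worth of space (metadata
    and padding overhead ignored)."""
    return (drives - parity) * drive_tb

# Assumed layout: 8 x 8TB Easystores in one raidz2 vdev (two-drive
# redundancy), which lands near the build's 50TB headline figure.
print(raidz_usable_tb(8, 8, 2))  # 48
```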


I started mounting the hard drives in the Nanoxia Deep Silence 1 case and realized that the mounting holes were not in the standard position. I was only able to secure two out of four mounting components in the drive trays. This was concerning because drives that can give and move in their enclosures will have shorter lifespans. The case would have to sit vertically, so hopefully gravity would provide the same service as the two missing tray mount points. The one-year warranty is also something to consider compared to the three-year warranty on most bare drives.

The PSU from a previous build was then put in the tower. Shipped, the Nanoxia DS1 comes with 11 internal 3.5″ slots in the form of two 3-drive cages and one 5-drive cage. One of the 3-drive cages had to be removed for the 750W modular PSU to be installed. This build screams overkill and this PSU is definitely part of that. My reasoning is future-proofing, but it’s also nice to find some use for extra parts laying around. The highest load this machine would experience would likely be several hundred watts less than 750, though all 8 hard drives spinning up at once does create a load that needs to be considered. In addition to installing the PSU, I went ahead and screwed in the motherboard standoffs and did some early wire management to make installation easier.


The motherboard was dry-fitted, assembled, and tested outside of the case to prevent any troublesome troubleshooting. The CPU and RAM were easily fitted and popped in, respectively. The heatsink was dry-fitted to make sure it fit the AM4 socket, despite only saying AM3 on the box. Thermal paste was then applied to the processor and spread into an even coat with a piece of cardboard before the heatsink was applied for a final time. The 2 sticks of 8GB RAM were double-checked to make sure the proper dual-channel slots were being utilized, as the slots were staggered on this board.


Installing the M.2 SSD was interesting to do for the first time; I had never had the pleasure of working with one before. The motherboard includes a special standoff and screw to secure the M.2 SSD in place.


After everything was installed it was time to power on the motherboard assembly. This would be done outside of the case on a static-resistant surface first. The PSU needs to power the main 24-pin connector and the 8-pin CPU connector, and at least the case’s power switch needs to be hooked up. At first it didn’t display. Luckily the B350 motherboard comes with 4 debug lights which indicate what component is preventing the system from POSTing.


The GPU debug light was on and I did a quick facepalm. I had forgotten that Ryzen series chips do not include integrated graphics and need a discrete card in order to display. Luckily I was able to cannibalize a GT 1030 from another computer I had laying around. There is a FirePro W4100 on the way for another project that might have to be adopted for this build, but the GT 1030 will do for now. This is definitely something to consider: I might not have bought this Ryzen originally had I foreseen the cost of a discrete video card. I’m still satisfied with my purchase so far; $300 for 8 cores is a great deal no matter how you slice it. If I keep the GT 1030, I will need to get a full-profile bracket so it sits flush with the slots on the back of the machine.


With the motherboard POSTing and fitted, the IO shield was installed in the back of the case and the wires were further arranged for management later. The optical drive was hooked up, and FreeNAS was booted to try out an OS. The system booted fine into the operating system after installation, which is always nice, and installing the OS to the M.2 SSD was humorously fast. I decided to switch to Ubuntu after seeing FreeNAS’s lack of a DE. OpenIndiana, my other choice, needed some BSD shell knowledge that I was not particularly in the mood to figure out. “Just Working”™ is something I look for in an OS, and Ubuntu should support everything out of the box, has a DE, and can run ZFS.

I then encrypted the disk and the home folder. These are two basic hardening steps for the OS, and Ubuntu offers to perform both during the installation process. With these two encryption options enabled, no one will be able to get at the data using a rescue CD, DVD, or USB without the password. The M.2 SSD makes this constant encryption work transparent and almost unnoticeable thanks to read/write speeds on the order of 3 GB/s, something that might bottleneck performance on other drive technologies. The speed of this little device is shocking: an install that can take as long as fifteen minutes finished in less than three, including the time-intensive encryption operations. This is a fantastic form factor that makes SATA SSDs seem like they crawl.


After the basics were up and functioning it was time to connect everything on the board: audio ports, USB ports, HDD lights, power lights, reset switch, fans. The SAS controller card went in next, followed by the HDD array. The SAS card booted properly the first time and occupied the second PCIe x16 slot on the motherboard. I decided it would be best to install the drives one at a time. This way I could erase the preinstalled partition left over from the WD Easystore software, label each drive, and test them individually. Another issue arose from the form factor of these drives: they would not clear the back of the cage, which meant only one side’s clips could secure each drive in place, further adding to the instability problems. It would be possible to alleviate this by modifying the cages themselves, but that is not something I wanted to jump straight into. After everything was checked and noted, it was time to set up the ZFS filesystem.


ZFS has to be downloaded from the Ubuntu repository. I wanted to create a whitelist that only allowed communication from the server to the Ubuntu repository, but iptables was not providing the URL-based filtering I was used to from solutions like Untangle, so I decided it was easier to handle on the hardware firewall later. sudo apt-get install zfs is all it took to get the filesystem utilities ready to operate. I still need to explore ZFS as a system; this server will give me a platform to experiment before I bring the 25TB of data down from the Amazon cloud.

The wiring for the drives was an extremely tight fit, leaving no room for the cable management I wanted to perform. The side panel was barely able to latch into place, and even then it bulged where the wires were most crowded. Most of the slack wiring sits on the open side of the case. A possible mod would be cutting a hole where the crowding occurs and installing some kind of distended chamber for the excess wiring. Something to consider in the future.

Below is the list of parts and the link to this list on PCPartPicker.

PC part picker link

There are definitely some things I want to handle with this project in the near future. The case either needs to be modified to allow more cable room, or the drives need to be refitted so they dump cables into the front side of the case. That might also alleviate the crowding against the drive cages.

I want to find a good use for the Ryzen 7. Video capture was one of the first things that came to mind. I’d like to include a capture card in this build; having a second system to capture video greatly increases the intensity of operations that can be done on the primary machine without the processing overhead of recording on the same machine.

I need to install the 2 hard drive hot-swap bays. This will fill all the remaining 5.25″ slots on the case. Having two hot-swap bays makes the ingest process easier, allowing two drives to be ingested or egressed at once as well as duplication operations.

I’d like to investigate additional uses for the build. It hasn’t been completely put into production, so the finer details of operation are still up in the air. This is one of the most powerful machines I’ve ever had the opportunity to put together. I can’t wait to begin sorting and curating the data on this machine and expanding its functionality in the future. Hopefully “Ratnest” has many years of hoarding data ahead of it.


After rereading this post I realized I forgot to mention the total storage. 8 x 8TB is 64TB of raw storage, which shrinks to roughly 47TB usable when using ZFS with 1-drive redundancy.
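The jump from 64 TB raw down to the usable figure comes from two effects: one drive’s worth of redundancy and the decimal-TB to binary-TiB conversion. A quick sketch of the arithmetic (drive count and redundancy level match this build; exact ZFS metadata overhead is not modeled, so the real usable number lands a bit lower):

```python
# Back-of-the-envelope usable capacity for an 8 x 8 TB raidz1-style array.
def usable_tib(drive_count, drive_tb, redundancy=1):
    """Marketing-TB drives converted to TiB, minus the redundant drives."""
    tib_per_drive = drive_tb * 1000.0**4 / 1024.0**4  # 8 TB is about 7.28 TiB
    return (drive_count - redundancy) * tib_per_drive

print(round(usable_tib(8, 8), 1))  # prints 50.9
```

Filesystem overhead accounts for the remaining gap between this ideal figure and what ZFS actually reports.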

By reversing the direction of the drives in the cage, I was able to route the cords in a manner that allowed the sides to fit on the case. This mounting technique allowed the drives to clear the back of the cage, alleviating the need for case modification, always a plus when it is not completely necessary.


Being able to situate these drives in the case and close it without a visible bulge in the side panel effectively completed this build. It is now operational and should provide enough storage for all the data I’m ingesting for at least the next couple of years.

All the dense drives made this the heaviest build I’ve ever constructed, weighing in at almost 50 pounds, a pound for every terabyte.

Here’s to hoping for a successful archival workflow in “Ratnest”’s future.


Writing a RAID Calculator in Python

RAIDr is a RAID calculator written in Python. It accepts input from the user and calculates how a certain configuration of hard drives will be allocated across different RAID levels. This is the first program I ever wrote and the project that got me interested in programming. It’s not the most efficient, and there are alternate ways to approach this problem, but I’m happy with how it turned out. It is still incomplete, but hopefully someone can find it useful. The code is commented with thoughts on how it should function and things that need to be done. I’m not a professional Python programmer, and my methodology might not be completely “pythonic”, but this was a great project for gaining exposure to programming and syntactical logic. Any constructive criticism is welcomed.

## Josh Dean
## RAIDr
## Created: 2/14/2017
## Last Edit: 3/21/2017
## Known bugs:

## Global Declarations

hddnumvar = 0
hddsizevar = 0
raidvar = 0
hddwritevar = 0 ## used to mitigate reference error in RAID 10 calculation

## Functions

def hdd_num():
	global hddnumvar
	print ("\nHow many drives are in the array?") ## eventual formatting errors will come from here
	hddnumvar = input() ## necessary if variable is global?
	if hddnumvar == 1:
		print ("Error: Can't create a RAID with 1 disk.")
	elif hddnumvar > 1:
		print hddnumvar, "drives in the array"
		print "----------------------- \n"
	else:
		print ("I don't know what you entered but it's incorrect.")

def hdd_size(): ## needs error parsing
	global hddsizevar
	print ("What is the capacity of the drives? (gigabytes)")
	hddsizevar = input() ## possible to use line break with input
	print hddsizevar, "raw GiB per disk"
	print "----------------------- \n"
	print ("%s drives in the array of %s GiB each." % (hddnumvar, hddsizevar)) ## single quotations critical for functional syntax; fixed by including the arguments in parentheses

def raid_prompt(): ## update this to reflect actual raid configurations, calls raid_calculation, all edits and calls should start here
	print ("\n1 - RAID 0")
	print ("2 - RAID 1")
	print ("3 - RAID 5")
	print ("4 - RAID 5E")
	print ("5 - RAID 5EE")
	print ("6 - RAID 6")
	print ("7 - RAID 10")
	print ("8 - RAID 50")
	print ("9 - RAID 60 \n")
	raidvar = input("What raid configuration? \n")
	raid_calculation(raidvar)

def raid_calculation(raidvar): ## just handles the menu
	if raidvar == 1:
		hddtotal = hddsizevar * hddnumvar ## variables need to go first
		print "\n-----------------------" ## \n doesn't need a space to separate; best to put this in front
		print ("RAID 0 - Striped Volume")
		print hddnumvar, "drives in the array"
		print hddsizevar, "raw GiB in the array per disk"
		print "%s raw GiB in the array total" % hddtotal
		print "Total of", hddnumvar * hddsizevar, "GiB in the RAID array." ## this need alternative wording throughout the program
		print "%s times write speed" % hddnumvar ## Can I put these two prints on one line? Multiple % variables?
		print "%s times read speed" % hddnumvar
		print "No redundancy"
		print "No hot spare"
		print "----------------------- \n"
	elif raidvar == 2:
		print "\n-----------------------"
		print ("RAID 1 - Mirrored Volume")
		print hddnumvar, "drives in the array"
		print hddsizevar, "raw GiB per disk"
		print "Total of", hddsizevar, "GiB in the array."
		print "%s times read speed" % hddnumvar
		print "No write speed increase"
		hddredunvar = hddnumvar - 1
		print "%s disk redundancy" % hddredunvar
		print "No hot spare"
		print "----------------------- \n"
	elif raidvar == 3:
		if hddnumvar < 3:
			print "\nYou need at least 3 disks to utilize RAID 5\n"
		else:
			print "\n-----------------------"
			print ("RAID 5 - Parity")
			print hddnumvar, "drives in the array"
			print hddsizevar, "raw GiB per disk"
			print "Total of", (hddnumvar - 1) * hddsizevar, "GiB in the array."
			hddreadvar = hddnumvar - 1
			print "%s times read speed" % hddreadvar
			print "No write speed increase"
			print "1 disk redundancy"
			print "No hot spare"
			print "----------------------- \n"
	elif raidvar == 4:
		if hddnumvar < 4:
			print "\nYou need at least 4 disks to utilize RAID 5E\n"
		else:
			print "\n-----------------------"
			print ("RAID 5E - Parity + Spare")
			print hddnumvar, "drives in the array"
			print hddsizevar, "raw GiB per disk"
			print "Total of", (hddnumvar - 2) * hddsizevar, "GiB in the array."
			hddreadvar = hddnumvar - 1
			print "%s times read speed" % hddreadvar
			print "No write speed increase"
			print "1 disk redundancy"
			print "1 hot spare"
			print "----------------------- \n"
	elif raidvar == 5:
		if hddnumvar < 4:
			print "\nYou need at least 4 disks to utilize RAID 5EE\n"
		else:
			print "\n-----------------------"
			print ("RAID 5EE - Parity + Spare")
			print hddnumvar, "drives in the array"
			print hddsizevar, "raw GiB per disk"
			print "Total of", (hddnumvar - 2) * hddsizevar, "GiB in the array."
			hddreadvar = hddnumvar - 2
			print "%s times read speed" % hddreadvar
			print "No write speed increase"
			print "1 disk redundancy"
			print "2 hot spare"
			print "----------------------- \n"
	elif raidvar == 6:
		if hddnumvar < 4:
			print "\nYou need at least 4 disks to utilize RAID 6\n"
		else:
			print "\n-----------------------"
			print ("RAID 6 - Double Parity")
			print hddnumvar, "drives in the array"
			print hddsizevar, "raw GiB per disk"
			print "Total of", (hddnumvar - 2) * hddsizevar, "GiB in the array."
			hddreadvar = hddnumvar - 2
			print "%s times read speed" % hddreadvar
			print "No write speed increase"
			print "2 disk redundancy"
			print "No hot spare"
			print "----------------------- \n"
	elif raidvar == 7:
		if hddnumvar < 4:
			print "\nYou need at least 4 disks to utilize RAID 10\n"
		elif (hddnumvar % 2 == 1):
			print "\nYou need an even number of disks to utilize RAID 10\n"
		else:
			print "\n-----------------------"
			print ("RAID 10 - Stripe + Mirror")
			print hddnumvar, "drives in the array"
			print hddsizevar, "raw GiB per disk"
			print "Total of", (hddnumvar / 2) * hddsizevar, "GiB in the array."
			hddwritevar = hddnumvar / 2 ## actual write variable calculation
			print "%s times read speed" % hddnumvar
			print "%s times write speed" % hddwritevar
			print "At least 1 disk redundancy"
			print "No hot spare"
			print "----------------------- \n"
	elif raidvar == 8: ## bookmark, need formulas
		if hddnumvar < 6:
			print "\nYou need at least 6 disks to utilize RAID 50\n"
		else:
			print "\n-----------------------"
			print ("RAID 50 - Parity + Stripe")
			print hddnumvar, "drives in the array"
			print hddsizevar, "raw GiB per disk"
			print "Total of", (hddnumvar - 2) * hddsizevar, "GiB in the array."
			hddreadvar = hddnumvar - 2
			##print "%s times read speed" % hddreadvar
			##print "No write speed increase" # Although overall read/write performance is highly dependent on a number of factors, RAID 50 should provide better write performance than RAID 5 alone.
			print "2 disk redundancy"
			print "No hot spare"
			print "----------------------- \n"
	elif raidvar == 9: ## bookmark, need formulas, mirrors RAID 50 above
		if hddnumvar < 8:
			print "\nYou need at least 8 disks to utilize RAID 60\n"
		else:
			print "\n-----------------------"
			print ("RAID 60 - Double Parity + Stripe")
			print hddnumvar, "drives in the array"
			print hddsizevar, "raw GiB per disk"
			print "2 disk redundancy per RAID 6 set"
			print "No hot spare"
			print "----------------------- \n"
	elif raidvar > 9 or raidvar == 0: ## additional error parsing required here
		print ("Error: Please select a number between 1 and 9")
	menu_prompt() ## ubiquitous for all loop items that aren't errors

def disk_num_prompt(): ## this will eventually need to accept arguments that are context sensitive for raid type and disk requirements, perhaps handle this in the raid_calculator function
	global hddnumvar
	print "Adjust number of disks?"
	print "1 - Yes"
	print "2 - No"
	disknummenuvar = input()
	if disknummenuvar == 1:
		hddnumvar = input("\nHow many drives are in the array? \n")
		if hddnumvar == 1:
			print "Error: Can't create a RAID with 1 disk."
		elif hddnumvar > 1:
			print "\nUpdated"
			print hddnumvar, "drives in the array" ## displays once for every loop, hdd_num_input for mitigation
		else:
			print ("I don't know what you entered but it's incorrect.")
	elif disknummenuvar == 2:
		pass
	else:
		print ("I don't know what you entered but it's incorrect.")

#below is the menu for the end of the selected operations
def menu_prompt(): ## need additional option to go to GiB to GB converter?
	print "1 - RAID menu"
	print "2 - Quit"
	print "3 - Start Over"
	menu = input()
	if menu == 1:
		raid_prompt() ## looping, quit() function is ending script, will need revision
	elif menu == 2:
		print "Cya"
	elif menu == 3:
		start()
	elif menu == 0:
		print "Error: Please select 1, 2, or 3 \n"
	elif menu > 3:
		print "Error: Please select 1, 2, or 3 \n"
		print "quit fucking around" ## formatting

def data_transfer_jumpoff(): ## BOOKMARK
	print "What is the transfer speed? Gigabytes, please"
	transfervar = input()
	print "What denomination of data size?"
	print "1 - byte"
	print "2 - kilobyte"
	print "3 - megabyte"
	print "4 - gigabyte"
	print "5 - terabyte"
	transferunit = input()
	print "How much data?"
	transferamount = input()

## Start Prompt, this needs to be expanded upon
def start(): ##easier way to reset all these variables?
	startmenu = 0
	raid_var = 0 ## should be an inline solution for this in its own function, it just works
	hddnumvar = 0
	hddsizevar = 0
	raidvar = 0
	print("\nChoose an operation") ## line break might cause formatting errors look here first
	print "1 - RAID calculator"
	print "2 - Data Transfer Calculator"
	startmenu = input()
	if startmenu == 1:
		hdd_num() ## these need to be called in a more functional manner
		hdd_size()
		raid_prompt()
	elif startmenu == 2:
		print "Not supported\n" ## will require edit

#main scripting

start()

The operation of the program begins by calling the start function. I put this call at the bottom of the script so it would be easy to find; start() is the last function defined before the initial call. From this menu the user chooses which of the two currently implemented operations to perform: the RAID calculator or the Data Transfer Calculator. The Data Transfer Calculator is still a work in progress.



When the RAID calculator is selected, the user is queried for the number of hard drives and their capacity through two functions: hdd_num and hdd_size. These functions are called several times during a session, so it made sense to give them their own definitions. They read input from the user and set the appropriate variables for use during the calculation.
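A possible refactor of these input functions, sketched in Python 3 with hypothetical names, would split parsing from prompting and return the value instead of setting a global, which makes the call sites easier to follow and test:

```python
def parse_drive_count(raw):
    """Return a valid drive count, or None if the text is unusable."""
    try:
        count = int(raw)
    except ValueError:
        return None
    return count if count >= 2 else None  # a RAID needs at least 2 disks

def prompt_drive_count():
    """Keep asking until parse_drive_count accepts the answer."""
    while True:
        count = parse_drive_count(input("\nHow many drives are in the array? "))
        if count is not None:
            return count
        print("Error: enter a whole number of at least 2 drives.")
```

The same pattern would apply to the capacity prompt; the prompt strings here are my own wording.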

Next, the available RAID formats are listed and the user chooses which calculation to perform on the previously set variables. In this version of the code, RAID 0 through 10 work fine, but the functions for RAID 50 and 60 are missing their capacity calculations since the formulas are not as straightforward. Once the selection is made, the results of the calculations are displayed.
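For the missing RAID 50/60 capacity math, the usual formulation splits the n drives into g equal RAID 5 (or RAID 6) groups and loses one (or two) drives per group. A hedged sketch of how those formulas could look; the function names and the default of two groups are my assumptions, not how RAIDr currently words it:

```python
def raid50_capacity(drives, size_gib, groups=2):
    """RAID 50: striped RAID 5 groups, one parity drive lost per group."""
    if drives < 6 or drives % groups != 0:
        raise ValueError("RAID 50 needs >= 6 drives split into equal groups")
    return (drives - groups) * size_gib

def raid60_capacity(drives, size_gib, groups=2):
    """RAID 60: striped RAID 6 groups, two parity drives lost per group."""
    if drives < 8 or drives % groups != 0:
        raise ValueError("RAID 60 needs >= 8 drives split into equal groups")
    return (drives - 2 * groups) * size_gib
```

With 6 one-terabyte drives in two groups, RAID 50 yields 4000 GiB; 8 drives under RAID 60 also yield 4000 GiB.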

At the end of the operation, users are presented with several options: they can change the variables and recalculate, change the RAID calculation itself, or return to the main menu, which in the future will offer a data transfer calculation. It might be beneficial to pass along the size of the array and calculate the transfer time by only asking the user for the connection speed; this data could be appended to the RAID data. It might also be useful to include a memory function that remembers specific RAID configurations, writing them to a text file that can be loaded on subsequent runs.

Another function that might be useful is the reconciliation of GiB and GB values, which would help users on an NTFS filesystem compare the calculator’s output with the sizes their system reports. It might also be useful to include other filesystem types in the calculations for the most accurate numbers possible and maximum compatibility.
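The GiB/GB reconciliation mentioned here is a pair of one-line conversions (1 GiB = 2^30 bytes, 1 GB = 10^9 bytes); a minimal sketch:

```python
def gib_to_gb(gib):
    """Binary gibibytes to decimal gigabytes."""
    return gib * 2**30 / 10**9

def gb_to_gib(gb):
    """Decimal gigabytes to binary gibibytes."""
    return gb * 10**9 / 2**30

# e.g. an "8 TB" (8000 GB) drive is roughly 7450.58 GiB
print(round(gb_to_gib(8000), 2))
```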

Again, this was fun to make and I find myself using it from time to time. There is still a lot of work to do before the program can stand on its own. Taking user input comes with an interesting set of problems that could allow certain inputs to change the behavior of the program. If the user isn’t intentionally trying to break the program this shouldn’t be an issue; the instructions and commands are very clear when user input is necessary. There is also some formulaic work to be done on the two newest RAID formats.
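The input problem mentioned here is concrete in Python 2: input() evaluates whatever the user types as an expression, so typing something like hddnumvar * 0 at a prompt would execute as code. Python 3’s input() returns a plain string; ast.literal_eval is a middle ground that accepts literals only. A sketch of a safer parser (the helper name is my own):

```python
import ast

def safe_number(text):
    """Accept a numeric literal; reject expressions and non-numbers."""
    try:
        value = ast.literal_eval(text)
    except (ValueError, SyntaxError):
        return None  # "2+2", "quit()", etc. are refused, not evaluated
    return value if isinstance(value, (int, float)) else None
```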

Python was a great language for grasping the beginning intricacies of programming, and I feel like even more intricate programs are within reach. Combining the operating structure of something like RAIDr with the GIS functions illustrated below would allow easy semi-automatic scripting of tasks. The sky is once again the limit.

Working with GIS and Python

Python is a powerful scripting language that lets users script repetitive tasks and automate system behaviors. Python is interpreted rather than compiled, differentiating its development and operation from languages like C++ and Java, and its scripting syntax might ease the learning curve for those new to programming concepts. GIS is a great introduction to Python programming. Please excuse any formatting errors; some indentation was lost when copying over.




ArcGIS has robust support for Python, allowing many GIS methods to be automated within the ArcMap software framework. ArcPy is a Python module that interfaces directly with ArcGIS software, giving the user powerful scripting capabilities inside the ESRI software ecosystem. ArcMap also has a built-in graphical script editor called Model Builder, which users can use to construct scripts without the default Python shell; it makes the relationship between Python and the ArcGIS toolkit easier to understand and implement for the visual thinker.

The examples below are my own work, written either in Model Builder or in a Python IDE. I tried to keep them strictly geographic for this post. These scripts aren’t guaranteed to work flawlessly or gracefully; this is a continued learning experience for me and any constructive criticism is welcome.

Here’s an example of what I’ve found possible using python and ArcPy.


# -*- coding: utf-8 -*-
# ---------------------------------------------------------------------------
# Created on: 2017-03-28 15:18:18.00000
# (generated by ArcGIS/ModelBuilder)
# Description:
# ---------------------------------------------------------------------------

# Import arcpy module
import arcpy

# Local variables:
Idaho_Moscow_students = "Idaho_Moscow_students"
Idaho_Moscow_busstops_shp = "H:\\Temp\\Idaho_Moscow_busstops.shp"
busstops_50m_shp = "H:\\Temp\\busstops_50m.shp"
within_50m_shp = "H:\\Temp\\within_50m.shp"

# Process: Buffer
arcpy.Buffer_analysis(Idaho_Moscow_busstops_shp, busstops_50m_shp, "50 Meters", "FULL", "ROUND", "ALL", "", "PLANAR")

# Process: Intersect
arcpy.Intersect_analysis("Idaho_Moscow_students #;H:\\Temp\\busstops_50m.shp #", within_50m_shp, "ALL", "", "INPUT")


The output is tidy and properly commented by default, saving the user the time it usually takes to make code legible, and it includes a proper header and the locations of the map assets. All of this is done on the fly, which is a great reason to use Model Builder over programming everything manually in the IDE.

The script above takes a dataset containing spatial information about students and bus stops in Moscow, Idaho, applies a 50 meter buffer to the bus stops, and creates a shapefile of all the students that fall within that buffer. Further operations can then be applied to the newly created 50m layer on the fly, and the model can be incremented to create shapefiles for different buffer distances.

The benefit over manually creating the shapefiles is the obscene amount of time saved. Depending on how thorough the GIS is, each of these points might need its own shapefile or aggregation of shapefiles; this script runs the hundred or so necessary operations in a fraction of the time it would take a human.

The script below takes the same concept but changes the variables so the output is 100m instead of 50m. Segments of the code can be changed to augment the operation without starting from scratch, making it possible to automate the creation of these scripts, the ultimate goal.


# -*- coding: utf-8 -*-
# ---------------------------------------------------------------------------
# Created on: 2017-03-28 15:19:04.00000
# (generated by ArcGIS/ModelBuilder)
# Description:
# ---------------------------------------------------------------------------

# Import arcpy module
import arcpy

# Local variables:
Idaho_Moscow_students = "Idaho_Moscow_students"
Idaho_Moscow_busstops_shp = "H:\\Temp\\Idaho_Moscow_busstops.shp"
busstops_100m_shp = "H:\\Temp\\busstops_100m.shp"
within_100m_shp = "H:\\Temp\\within_100m.shp"

# Process: Buffer
arcpy.Buffer_analysis(Idaho_Moscow_busstops_shp, busstops_100m_shp, "100 Meters", "FULL", "ROUND", "ALL", "", "PLANAR")

# Process: Intersect
arcpy.Intersect_analysis("Idaho_Moscow_students #;H:\\Temp\\busstops_100m.shp #", within_100m_shp, "ALL", "", "INPUT")


This 100m version can be produced either in ArcMap’s Model Builder or by using the replace function in your favorite text editor. By changing one variable we have another properly formatted script, saving time that would have been spent manually operating the tools in the ArcMap workspace. This can be developed further to take input from the user and run the tools directly through ArcPy, opening up the possibility of “headless” GIS operations without the need to design manually.
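Automating the script generation described above can be as simple as string templating. A minimal sketch that only builds the Buffer_analysis lines as text (running the generated script would still require ArcGIS and arcpy; the template is abbreviated to the one call that changes):

```python
# One Buffer_analysis call per distance, filled from a template string.
template = (
    'arcpy.Buffer_analysis(Idaho_Moscow_busstops_shp, '
    '"H:\\\\Temp\\\\busstops_{m}m.shp", "{m} Meters", '
    '"FULL", "ROUND", "ALL", "", "PLANAR")'
)

def buffer_lines(distances):
    """Return one formatted Buffer_analysis call per requested distance."""
    return [template.format(m=d) for d in distances]

for line in buffer_lines([50, 100, 150]):
    print(line)
```

Writing these lines into a .py file, one script per distance, reproduces by hand-free templating what Model Builder exports.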

This functionality extends to database operations. In the following script, features are selected by an attribute in a table and a new shapefile is created from the selection.


# ---------------------------------------------------------------------------
# Created on: 2017-03-30 09:06:41.00000
# (generated by ArcGIS/ModelBuilder)
# Description:
# ---------------------------------------------------------------------------

# Import arcpy module
import arcpy


# Local variables:
airports = "airports"
airports__4_ = airports
Airport_buffer_shp = "H:\\Ex7\\Exercise07\\Challenge\\Airport_buffer.shp"

# Process: Select Layer By Attribute
arcpy.SelectLayerByAttribute_management(airports, "NEW_SELECTION", "\"FEATURE\" = 'Airport'")

# Process: Buffer
arcpy.Buffer_analysis(airports__4_, Airport_buffer_shp, "15000 Meters", "FULL", "ROUND", "ALL", "", "PLANAR")


This script finds all features labeled “Airport” in a dataset and creates a 15km buffer around each one. By integrating SQL queries, data can easily be parsed and presented. All of this code can be generated using Model Builder in the ArcMap client. Efficient scripting comes from applying Python’s functional logic with a clear realization of the objective to be achieved.


import arcpy
arcpy.env.workspace = "H:/Exercise12"

def countstringfields():
	# list only the text ("String") fields in the shapefile's table
	fields = arcpy.ListFields("H:/Exercise12/streets.shp", "", "String")
	namelist = []
	for field in fields:
		namelist.append(field.name)
	print len(namelist)

countstringfields()


This script counts the number of “String” fields in a table. The function countstringfields starts by listing the fields of type String in the shapefile’s attribute table. Next, a list of names is defined, and a loop appends the name of each String field to that list. The length of the list is the count, which is then printed for the user, all outside of the ArcMap client. The script could be developed further by reading the shapefile and datatype from user input. The proper use of indentation and whitespace is an important part of Python syntax, so special consideration should be taken with things like nested loops. Scripts can also be used to update datasets in addition to parsing them.



import arcpy
from arcpy import env
env.workspace = "H:/Ex7/Exercise07/"
fc = "Results/airports.shp"
cursor = arcpy.da.UpdateCursor(fc, ["TOT_ENP"])
for row in cursor:
	if row[0] < 100000: ## threshold is a guess; the original value was lost
		cursor.deleteRow()


def triangle_type(a, b, c):
	## classify by how many side lengths match
	if a == b == c:
		print "Type: Equilateral"
	elif a == b or b == c or a == c:
		print "Type: Isosceles"
	else:
		print "Type: Scalene"

def triangle_validity(a, b, c):
	## a side longer than the sum of the other two is impossible
	if (c > a + b) or (a > b + c) or (b > a + c):
		print "Valid: No"
	else:
		print "Valid: Yes"

## example side lists; the original preprogrammed values were lost
listA = [3, 4, 5]
listB = [3, 3, 3]
listC = [1, 1, 10]
listD = [5, 5, 8]
count = 0


print("Feeding program lists of measurements...")

while count < 4:
	if count == 0:
		print listA
		a, b, c = listA
		count = count + 1
		triangle_type(a, b, c)
		triangle_validity(a, b, c)
	elif count == 1:
		print listB
		a, b, c = listB
		count = count + 1
		triangle_type(a, b, c)
		triangle_validity(a, b, c)
	elif count == 2:
		print listC
		a, b, c = listC
		count = count + 1
		triangle_type(a, b, c)
		triangle_validity(a, b, c)
	elif count == 3:
		print listD
		a, b, c = listD
		count = count + 1
		triangle_type(a, b, c)
		triangle_validity(a, b, c)


The preceding script was fun to make.

It accepts input in the form of multiple lists. The lists are preprogrammed in this case, but they could be read from user input or from a text file using Python’s file-reading methods. The while loop uses a counter to track how many times it has run, and the loop is nested with some conditional elements.
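Reading the side lengths from a text file, as mentioned, might look like the sketch below. The whitespace-separated, one-triangle-per-line format is an assumption on my part:

```python
def read_triangles(path):
    """Parse each line of 'a b c' side lengths into a 3-tuple of floats."""
    triangles = []
    with open(path) as handle:
        for line in handle:
            parts = line.split()
            if len(parts) == 3:  # skip blank or malformed lines
                triangles.append(tuple(float(p) for p in parts))
    return triangles
```

Each returned tuple could then be unpacked into a, b, c and fed to triangle_type and triangle_validity exactly as the hardcoded lists are.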


import csv

yield1999 = []
yield2000 = []
yield2001 = []

f = open('C:\Users\jdean32\Downloads\yield_over_the_years.csv')
csv_f = csv.reader(f)
for row in csv_f:
	## assumes the CSV columns are ordered 1999, 2000, 2001
	yield1999.append(row[0])
	yield2000.append(row[1])
	yield2001.append(row[2])
yield1999 = map(float, yield1999)
yield2000 = map(float, yield2000)
yield2001 = map(float, yield2001)


print("1999: %s") %(yield1999)
print("2000: %s") %(yield2000)
print("2001: %s") %(yield2001)

year1999 = 1999
max_value_1999 = max(yield1999)
min_value_1999 = min(yield1999)
avg_value_1999 = sum(yield1999)/len(yield1999)
print("\nIn %s, %s was the maximum yield, %s was the minimum yield and %s was the average yield.") %(year1999, max_value_1999, min_value_1999, avg_value_1999)

year2000 = 2000
max_value_2000 = max(yield2000)
min_value_2000 = min(yield2000)
avg_value_2000 = sum(yield2000)/len(yield2000)
print("In %s, %s was the maximum yield, %s was the minimum yield and %s was the average yield.") %(year2000, max_value_2000, min_value_2000, avg_value_2000)

year2001 = 2001
max_value_2001 = max(yield2001)
min_value_2001 = min(yield2001)
avg_value_2001 = sum(yield2001)/len(yield2001)
print("In %s, %s was the maximum yield, %s was the minimum yield and %s was the average yield.") %(year2001, max_value_2001, min_value_2001, avg_value_2001)


Like I said before, this was fun to make. Always eager to take the road less traveled, I thought of the most obtuse way to make this calculation. The objective of the above script was to read text from a file and compare 3 years of agriculture data, finding each year’s minimum, maximum, and average yield. This is accomplished with a quick for loop, relying on several sets of variables to make sure the final answers are correct. The program can ingest different kinds of input, so changing the text file, or the location where it is looked for, will produce different results from the same script. Different data can be run through this operation automatically.
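The per-year repetition in that script could also be collapsed into a loop over a dictionary keyed by year; a sketch using the same yields that appear in the next script:

```python
yields = {
    1999: [3.34, 21.8, 1.34, 3.75, 4.81],
    2000: [4.07, 4.51, 3.9, 3.63, 3.15],
    2001: [4.21, 4.29, 4.64, 4.27, 3.55],
}

def year_stats(values):
    """Return (max, min, average) for one year's yields."""
    return max(values), min(values), sum(values) / len(values)

for year in sorted(yields):
    hi, lo, avg = year_stats(yields[year])
    print("In %s, %s was the maximum yield, %s was the minimum yield "
          "and %.2f was the average yield." % (year, hi, lo, avg))
```

Adding a fourth year then means adding one dictionary entry rather than another block of near-identical variables.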


yield1999 = [3.34, 21.8, 1.34, 3.75, 4.81]
yield2000 = [4.07, 4.51, 3.9, 3.63, 3.15]
yield2001 = [4.21, 4.29, 4.64, 4.27, 3.55]

location1 = (yield1999[0] + yield2000[0] + yield2001[0])/3
location2 = (yield1999[1] + yield2000[1] + yield2001[1])/3
location3 = (yield1999[2] + yield2000[2] + yield2001[2])/3
location4 = (yield1999[3] + yield2000[3] + yield2001[3])/3
location5 = (yield1999[4] + yield2000[4] + yield2001[4])/3

locations = [location1, location2, location3, location4, location5]
count = 1

text_fileA = open("C:\Temp\OutputA.txt", "w")

for i in locations:
	text_fileA.write(("The average yield at location %s between 1999 and 2001 was %.2f\n") %(count, i))
	count = count + 1

text_fileA.close()


max1999 = yield1999.index(max(yield1999)) + 1
max2000 = yield2000.index(max(yield2000)) + 1
max2001 = yield2001.index(max(yield2001)) + 1
min1999 = yield1999.index(min(yield1999)) + 1
min2000 = yield2000.index(min(yield2000)) + 1
min2001 = yield2001.index(min(yield2001)) + 1

minmax1999 = [1999, max1999, min1999]
minmax2000 = [2000, max2000, min2000]
minmax2001 = [2001, max2001, min2001]

minmax = [minmax1999, minmax2000, minmax2001]

text_fileB = open(r"C:\Temp\OutputB.txt", "w")

for i in minmax:
    text_fileB.write("In %s we yielded the least at location %s and the most at location %s.\n" % (i[0], i[2], i[1]))

text_fileB.close()



Another attempt at the agriculture problem. Versioning is something I find useful not only for keeping a record of changes but also for tracking progress. This was the fourth version of the script, and it turned out rather unorthodox, which is what I find most interesting about coding: there are multiple approaches to any objective. The two scripts above are similar but approached in different ways. This one uses a for loop to run through a contextually sensitive number of inputs. The values were hardcoded into the program as variables at the start of the script, though they could be read from a file if necessary.
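The per-year blocks of variables can also be collapsed into a single loop over a dictionary of yields, which scales to any number of years. The sketch below is a generalization of the approach, not the script's original form.

```python
# A generalized sketch: one loop computes every year's statistics,
# instead of a separate block of variables per year.

def yearly_stats(yields):
    stats = {}
    for year, values in sorted(yields.items()):
        stats[year] = {
            "max": max(values),
            "min": min(values),
            "avg": sum(values) / len(values),
            # index() is 0-based, so add 1 to get a location number
            "best_location": values.index(max(values)) + 1,
            "worst_location": values.index(min(values)) + 1,
        }
    return stats

yields = {
    1999: [3.34, 21.8, 1.34, 3.75, 4.81],
    2000: [4.07, 4.51, 3.9, 3.63, 3.15],
    2001: [4.21, 4.29, 4.64, 4.27, 3.55],
}

for year, s in sorted(yearly_stats(yields).items()):
    print("In %s we yielded the least at location %s and the most at location %s."
          % (year, s["worst_location"], s["best_location"]))
```

Adding a fourth year is then a one-line change to the dictionary rather than another block of variables.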

The following script looks for the basin layer in an ArcMap file and clips the soils layer using the basin layer, producing the area where both layers are present. From this clipped soils layer, the script then selects the features whose attribute marks them as "Not prime farmland". This is useful for property development, where the amount of available farmland is a consideration.



import arcpy

print "Starting"

soils = "H:\\Final_task1\\soils.shp"
basin = "H:\\Final_task1\\basin.shp"
basin_Clip = "C:\\Users\\jdean32\\Documents\\ArcGIS\\Default.gdb\\basin_Clip"
task1_result_shp = "H:\\task1_result.shp"

arcpy.Clip_analysis(soils, basin, basin_Clip, "")

arcpy.Select_analysis(basin_Clip, task1_result_shp, "FARMLNDCL = 'Not prime farmland'")

print "Completed"


The next script clips all feature classes from a folder called "USA" according to the Iowa state boundary. It then places them in a new folder. This is useful if you have country-wide data but only want to present the data from a particular area, in this case Iowa.

The script will automatically read all shapefiles in the USA folder, no matter the amount.



import arcpy

sourceUSA = "H:\\Final_task2\\USA"
sourceIowa = "H:\\Final_task2\\Iowa"
iowaBoundary = "H:\\Final_task2\\Iowa\\IowaBoundary.shp"

arcpy.env.workspace = sourceUSA
fcList = arcpy.ListFeatureClasses()

print "Starting"

for features in fcList:
    outputfc = sourceIowa + "\\Iowa" + features
    arcpy.Clip_analysis(features, iowaBoundary, outputfc)

print "Completed"


The following script finds the average population for a set of counties in a dataset. By dividing the total population by the number of counties, the average population is found. This is useful for computing values over large datasets without doing it by hand.



import arcpy

featureClass = "H:\\Final_task3\\Counties.shp"

row1 = arcpy.SearchCursor(featureClass)
row2 = row1.next()

avg = 0
totalPop = 0
totalRecords = 0

while row2:
    totalPop += row2.POP1990
    totalRecords += 1
    row2 = row1.next()

avg = totalPop / float(totalRecords)
print "The average population of the " + str(totalRecords) + " counties is: " + str(avg)


The following is a modified script that calculates the driving distance between two locations. Originally the script calculated the distance between UNCC and uptown Charlotte; it has been edited to work on user input. The API is finicky, so the variables have to be exact to request the right data. User input is reconciled by replacing spaces with plus signs.
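As an aside, the standard library can handle this escaping more robustly than a bare `replace()`. The sketch below uses urllib's `quote_plus` (shown with the Python 3 import path, `urllib.parse`; in Python 2 it lives in `urllib`) to encode whole address strings, which also handles characters like `&` that a space-only replacement would miss.

```python
# Encode address strings for a URL query using the stdlib
# instead of a hand-rolled replace(" ", "+").
from urllib.parse import quote_plus

origin = quote_plus("9201 University City Blvd")
print(origin)  # 9201+University+City+Blvd
```

This keeps the request well-formed even when an address contains punctuation that has special meaning in a URL.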


## Script Title: Printing data from a URL (webpage)
## Author(s): CoDo
## Date: December 2, 2015

# Import the urllib2 and json libraries
import urllib2
import json
import re

originaddress = raw_input("What is the address?\n")
originstate = raw_input("What is the state?\n")
originzip = raw_input("What is the zipcode\n")
destinationaddress = raw_input("What is the destination address?\n")
destinationstate = raw_input("What is the state?\n")
destinationzip = raw_input("What is the destination zipcode\n")

print originaddress
print originstate
print originzip
print destinationaddress
print destinationstate
print destinationzip

originaddress = originaddress.replace(" ", "+")
destinationaddress = destinationaddress.replace(" ", "+")

# Google API key
google_APIkey = "" ## removed for security

# Read the response url of our request to get directions from UNCC to the Time Warner Cable Arena
url_address = 'https://maps.googleapis.com/maps/api/directions/json?origin=%s,%s+%s&destination=%s,%s+%s&key=' % (originaddress, originstate, originzip, destinationaddress, destinationstate, destinationzip) + google_APIkey
##url_address = ',NC+28027&destination=9201+University+City+Blvd,NC+28223
url_sourceCode = urllib2.urlopen(url_address).read()

# Convert the url's source code from a string to a json format (i.e. dictionary type)
directions_info = json.loads(url_sourceCode)

# Extract information from the dictionary holding the information about the directions
origin_name = directions_info['routes'][0]['legs'][0]['start_address']
origin_latlng = directions_info['routes'][0]['legs'][0]['start_location'].values()
destination_name = directions_info['routes'][0]['legs'][0]['end_address']
destination_latlng = directions_info['routes'][0]['legs'][0]['end_location'].values()
distance = directions_info['routes'][0]['legs'][0]['distance']['text']
traveltime = directions_info['routes'][0]['legs'][0]['duration']['value'] / 60

# Print a phrase that summarizes the trip
print "Origin: %s %s \nDestination: %s %s \nEstimated travel time: %s minutes" % (origin_name, origin_latlng, destination_name, destination_latlng, traveltime)


This next script looks for feature classes in a workspace and prints the name of each feature class and the geometry type. This would be useful for parsing datasets and looking for specific features, like polygons.


import arcpy
from arcpy import env
env.workspace = "H:/arcpy_ex6/Exercise06"
fclist = arcpy.ListFeatureClasses()
for fc in fclist:
    fcdescribe = arcpy.Describe(fc)
    print (fcdescribe.basename + " is a " + str.lower(str(fcdescribe.shapeType)) + " feature class")


The following script adds a text field called "Ferry" to the attribute table for a roads feature class. It is populated with either "Yes" or "No", depending on the value of each feature's FEATURE field.

This is useful for quickly altering data in an attribute field or dataset without directly interfacing with the ArcMap client.


import arcpy
from arcpy import env
env.workspace = "C:/Users/jdean32/Downloads/Ex7/Exercise07"

fclass = "roads.shp"
nfield = "Ferry"
ftype = "TEXT"
fname = arcpy.ValidateFieldName(nfield)
flist = [field.name for field in arcpy.ListFields(fclass)]

if fname not in flist:
    arcpy.AddField_management(fclass, fname, ftype, "", "", 12)
    print "Ferry attribute added."

cursor = arcpy.da.UpdateCursor(fclass, ["FEATURE", "FERRY"])

for row in cursor:
    if row[0] == "Ferry Crossing":
        row[1] = "Yes"
    else:
        row[1] = "No"
    cursor.updateRow(row)
del cursor


The following script uses functionality familiar from the airport script near the beginning of this article. It first creates a 15,000-meter buffer around the airport features in a shapefile. In addition, it creates a 7,500-meter buffer around the airports that operate seaplanes, which requires querying the attribute table for seaplane bases. The end result is two separate buffers. A picture says a thousand words; with two buffers we multiply the amount of data a single cartographic visualization can project.


# -*- coding: utf-8 -*-
# ---------------------------------------------------------
# Created on: 2017-03-30 09:06:41.00000
# (generated by ArcGIS/ModelBuilder)
# Description:
# ---------------------------------------------------------

# Import arcpy module
import arcpy

# Local variables:
airports = "airports"
airports__4_ = airports
Airport_buffer_shp = "H:\\Ex7\\Exercise07\\Challenge\\Seaplane_base_buffer.shp"

# Process: Select Layer By Attribute
arcpy.SelectLayerByAttribute_management(airports, "NEW_SELECTION", "\"FEATURE\" = 'Seaplane Base'")

# Process: Buffer
arcpy.Buffer_analysis(airports__4_, Airport_buffer_shp, "7500 Meters", "FULL", "ROUND", "ALL", "", "PLANAR")


Finally, we have a script that looks through a workspace, reads the feature classes, copies each one to a geodatabase, and copies the polygon feature classes to a second geodatabase as well. Once again, this makes it easy to parse and migrate data between sets.


import arcpy, os
arcpy.env.workspace = r'H:\arcpy_ex6\Exercise06'
fclass = arcpy.ListFeatureClasses()

outputA = r'H:\arcpy_ex6\Exercise06\testA.gdb'
outputB = r'H:\arcpy_ex6\Exercise06\testB.gdb'

for fc in fclass:
    fcdesc = arcpy.Describe(fc).shapeType
    outputC = os.path.join(outputA, fc)
    arcpy.CopyFeatures_management(fc, outputC)
    if fcdesc == 'Polygon':
        outputC = os.path.join(outputB, fc)
        arcpy.CopyFeatures_management(fc, outputC)


Python is a blessing for geographers who want to automate their work. Its logical but strict syntax makes for easily legible code. Its integration with the ArcGIS suite and its fairly simple syntax make it easy to pick up, for experts and beginners alike. ModelBuilder abstracts the programming process and makes things easier for people who are more familiar with GUI interfaces. There is little a geographer with a strong knowledge of Python and mathematics can't do.