Mapping YouTube Views

YouTube has been an entertainment phenomenon ever since it launched in 2005. Its reach is staggering, bringing videos to nearly every corner of the Earth, and in much of the world the word YouTube is synonymous with online entertainment. I’ve always been fascinated by the maps YouTube used to provide in the “statistics” section of each video. The most popular videos had views from almost every country in the world. It’s a shame YouTube has removed these statistics from public view; now they’re only visible if the uploader chooses to share them.

YouTube analytics

YouTube has a great analytics platform for content creators. The Creator Studio includes an interactive map that is great for geographic analysis. YouTube’s API tools can export this data, but I thought it would be fun to take it and create a couple of maps of my own. Instead of using the API, I acquired the data the old-fashioned way: copying and pasting.

I decided to map every country except the United States. Because 95% of my views come from the United States, most classification schemes would place nearly every other country in the same lowest class, making them almost indistinguishable on the map.

After copying and pasting the lifetime statistics from the interactive map on the YouTube Analytics page, I added them to an Excel spreadsheet and saved it as a .csv file to add to ArcMap. Little parsing was needed because the data was already organized. I removed the fields I wasn’t going to use, such as watch time, average view duration, and average percentage viewed. In the future it might be interesting to map these variables, but today I’m focusing on raw view counts.
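If you’d rather script this cleanup than do it by hand in Excel, a minimal sketch using Python’s standard csv module could look like the one below. The column names (“Geography”, “Views”) are assumptions about how the copied table is labeled, so adjust them to match your own header. The function also drops the United States row, as described for this map.

```python
import csv

# Assumed header names; rename these to match the columns you actually copied.
COUNTRY = "Geography"
KEEP = [COUNTRY, "Views"]


def clean_views(src, dst, keep=KEEP, exclude=("United States",)):
    """Copy only the `keep` columns from src to dst, skipping excluded countries.

    src and dst are open text files (or io.StringIO objects).
    Returns the number of country rows written.
    """
    reader = csv.DictReader(src)
    writer = csv.DictWriter(dst, fieldnames=keep)
    writer.writeheader()
    written = 0
    for row in reader:
        if row[COUNTRY] in exclude:
            continue  # leave the dominant country off this map
        writer.writerow({name: row[name] for name in keep})
        written += 1
    return written
```

Following the csv module’s recommendation, open both files with newline="". For example, with open("views.csv", newline="") as src, open("views_clean.csv", "w", newline="") as dst: clean_views(src, dst). The file names are hypothetical.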

I’m using the same template I used for my WordPress map. It combines a basemap with a country borders shapefile from thematicmapping.org, so joining the CSV to the shapefile’s attribute table is easy, and we’re quickly off to the cartographic races.
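ArcMap performs the join itself. The logic is simple enough to sketch, though, and sketching it exposes the usual pitfall: a join only matches names that are identical. Below, the shapefile’s attribute table is modeled as a list of dicts. The NAME field is an assumption about which column in the borders file holds the country name.

```python
def join_by_name(features, views, key="NAME"):
    """Attach a VIEWS value to every feature record by exact country-name match.

    features: list of attribute dicts (one per shapefile row).
    views: dict mapping country name -> view count.
    Returns (joined_records, names_in_views_with_no_matching_feature).
    """
    joined = []
    matched = set()
    for feature in features:
        name = feature[key]
        if name in views:
            matched.add(name)
        # Countries with no views get 0 so they can still be symbolized.
        joined.append({**feature, "VIEWS": views.get(name, 0)})
    unmatched = set(views) - matched
    return joined, unmatched
```

Any name left in unmatched, such as a country spelled differently in the analytics export than in the shapefile, would silently drop off the map. It’s worth confirming that set is empty before symbolizing.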

Compared to the WordPress site, my YouTube channel has a much more impressive geographic reach. Of the 196 countries on Earth, 134 have clicked on a video on my channel. This is great because it means I’m over halfway to collecting every country.

The map includes every country except the United States, which has over 11,000 views. I used 10 natural-breaks classes to add more variation to the map. Cartographers usually recommend no more than about 7 classes because readers struggle to tell more shades apart, so going to 10 here is purely a design choice.
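ArcMap computes natural breaks for you through its Natural Breaks (Jenks) classification. For the curious, here is a textbook Fisher-Jenks dynamic program in Python. It picks the class boundaries that minimize each class’s squared deviation from its own mean. This is an illustrative sketch, not ArcMap’s implementation, so its breaks may not match ArcMap’s exactly.

```python
def jenks_breaks(values, n_classes):
    """Return n_classes + 1 boundaries: [min, upper_1, ..., upper_k].

    Textbook Fisher-Jenks dynamic program: choose breaks that minimize the
    total squared deviation of each class from its own mean.
    """
    data = sorted(values)
    n = len(data)
    if not 1 <= n_classes <= n:
        raise ValueError("n_classes must be between 1 and len(values)")
    # lower[l][j]: 1-based index of the first value in the last class when the
    # first l values are split into j classes; cost[l][j]: the minimal total
    # within-class squared deviation for that split.
    lower = [[0] * (n_classes + 1) for _ in range(n + 1)]
    cost = [[0.0] * (n_classes + 1) for _ in range(n + 1)]
    for j in range(1, n_classes + 1):
        lower[1][j] = 1
        for i in range(2, n + 1):
            cost[i][j] = float("inf")
    for l in range(2, n + 1):
        s1 = s2 = w = 0.0
        for m in range(1, l + 1):
            first = l - m + 1  # candidate first index (1-based) of the last class
            val = data[first - 1]
            s1 += val
            s2 += val * val
            w += 1
            ssd = s2 - (s1 * s1) / w  # squared deviation of data[first-1:l]
            prev = first - 1
            if prev != 0:
                for j in range(2, n_classes + 1):
                    if cost[l][j] >= ssd + cost[prev][j - 1]:
                        lower[l][j] = first
                        cost[l][j] = ssd + cost[prev][j - 1]
        lower[l][1] = 1
        cost[l][1] = ssd
    # Walk back through `lower` to recover each class's upper bound.
    breaks = [0.0] * (n_classes + 1)
    breaks[0], breaks[n_classes] = data[0], data[-1]
    k = n
    for j in range(n_classes, 1, -1):
        breaks[j - 1] = data[lower[k][j] - 2]
        k = lower[k][j] - 1
    return breaks
```

For example, jenks_breaks([1, 2, 3, 10, 11, 12], 2) returns [1, 3, 12]: one class runs from 1 to 3 and the other from 10 to 12. Running time grows with the square of the number of values, which is no problem for a table of 134 countries.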

YoutubeViews_sansUSA

It looks like I’ll have to carry some business cards next time I go to Africa. It’s nice to see such a global reach, and it feels good to know that, even for a brief second, my videos were part of someone’s life in the greater global community.

Mapping WordPress Views

It’s been a year since I started writing this blog. Time, as always, seems to fly by. Blogging here has allowed me to develop my writing, communication, and research skills. To celebrate a year of success, and hopefully many more to come, I wanted to do something WordPress-related. So I came up with a quick and easy project: mapping where this blog’s visitors came from over the last year. It’s always interesting to see which countries people are visiting from, and I’m always surprised at the variety.

Data acquisition is simple for this project. WordPress makes visitor statistics available, and the data is clean enough that little parsing is needed. The one necessary step is combining the 2016 and 2017 data into a single set, since WordPress automatically groups visitor statistics by year. This blog has only been active in 2016 and 2017, so there are only two datasets to combine, which is easy to do in a spreadsheet.
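The spreadsheet step comes down to adding the two years’ counts country by country. Here is a minimal Python sketch, assuming each year has already been read into a dict that maps country names to views:

```python
from collections import Counter


def combine_years(*yearly_counts):
    """Sum per-country view counts across any number of yearly dicts."""
    total = Counter()
    for counts in yearly_counts:
        total.update(counts)  # Counter.update adds counts rather than replacing them
    return dict(total)
```

Calling combine_years(views_2016, views_2017) (hypothetical variable names) produces the table for the combined map. A country that appears in only one year keeps that year’s count.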

The data suggests growth: 2017 has already overtaken all of 2016 in views. 2017 is also more geographically diverse, with visitors from 49 unique countries compared to 31 in 2016. I decided to create three maps, one for 2016, one for 2017, and one combining the two, so readers can compare the years and also see the overall geographic picture.
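One detail matters for the combined map: its country count is not 31 + 49, because many countries appear in both years. Treating each year’s countries as a set makes the overlap explicit. Here is a short sketch, again assuming dicts keyed by country name:

```python
def country_coverage(views_2016, views_2017):
    """Count distinct countries per year, overall, and those new in 2017."""
    countries_2016, countries_2017 = set(views_2016), set(views_2017)
    return {
        "2016": len(countries_2016),
        "2017": len(countries_2017),
        "combined": len(countries_2016 | countries_2017),  # union counts overlaps once
        "new_in_2017": sorted(countries_2017 - countries_2016),
    }
```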

I began by exporting the data to a CSV file for ArcMap. For the basemap, I chose the blank world boundaries from thematicmapping.org. I then joined the CSV to the basemap on the “name” field, which matches the country names in both tables. Once the data was on the map, I switched to the Quantities symbology to adjust the color scheme and create a choropleth map. I chose to split the data into 7 classes and removed the country borders to give the map a more natural, pastel look.
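Behind the choropleth, each country is assigned to the first class whose upper bound is at least its value. Here is a sketch of that step using Python’s bisect module. The layout of the break list ([minimum, upper bound of class 1, ..., maximum]) is my own convention, not ArcMap’s internal format.

```python
from bisect import bisect_left


def classify(value, breaks):
    """Return the 0-based class index of `value`.

    breaks is [minimum, upper_1, ..., upper_k]; each class includes its upper
    bound, the way a legend entry like "4 - 10" is usually read.
    """
    if not breaks[0] <= value <= breaks[-1]:
        raise ValueError("value is outside the classified range")
    # Search the upper bounds only (index 1 onward) for the first one >= value.
    return bisect_left(breaks, value, 1) - 1
```

With 7 classes, breaks has 8 entries, and the returned index (0 to 6) selects one of 7 colors from the ramp.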

In layout view the design touches are added. I placed a title at the top and signed the map. Then I added the legend, using a trick I’ve found useful for formatting it. First, I add the legend with all the default settings and get its position right. Once it’s in place, I double-check that the data components are correct. Then I select “Convert To Graphics” to turn the legend into an editable collection of graphic elements. The only downside is that the legend no longer reflects changes in the data, so making sure the data is correct before converting is critical. After conversion, selecting “Ungroup” separates the graphic elements so each can be manipulated individually. Personally, I find this easier and more intuitive to work with. After editing, the elements can be regrouped, and extras like frames and drop shadows can be added.

Wordpress2016

Full Resolution

Making the 2017 map followed the same methodology.

Wordpress2017

Full Resolution

Combining the two datasets was the only methodological variation when making the final map.

WordpressAll

Full Resolution

At a glance, the trends seem typical. North America and Europe are both well represented. The representation in Asia is unexpected and might be due to the several articles on this blog about China. It’s also neat seeing visitors from South America. The rarest country is New Caledonia, a French territory in the Pacific about 1,000 miles off the east coast of Australia.

In the future it would be interesting to create a map that normalizes the number of visitors by each country’s population. Such a map would show which countries visit at a higher or lower rate per capita, and therefore which countries are more drawn to the site’s content.
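The normalization itself is one division per country. Here is a minimal sketch, assuming a population table from an outside source keyed by the same country names. No real population figures are included here.

```python
def views_per_million(views, population):
    """Views per million residents for countries with a known, nonzero population."""
    return {
        country: views[country] * 1_000_000 / population[country]
        for country in views
        if population.get(country)  # skip missing or zero populations
    }
```

Mapping these rates instead of raw counts would stop a large country from standing out simply because it is large.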

Here’s hoping for more geographic variety in the future. Maybe one day every country will have visited Thoughtworks.

Word Clouds as Data

One of the most fun ways to visualize data is with word clouds. A word cloud takes a source of text, whether it’s a Word document, webpage, transcript, book, or any other medium that uses words, and presents it in an easily digestible visual form. We can make a word cloud for Thoughtworks to see its most commonly used words.

download-1
Word cloud of all the posts on Thoughtworks

Using wordclouds.com, I created the word cloud of Thoughtworks shown above. Larger words appear more frequently, so we can see at a glance that “data” is the most used word on the blog.
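Under the hood, a word cloud is a word count plus a rule for turning counts into font sizes. I don’t know which rule wordclouds.com uses, so the sketch below assumes simple linear scaling between a minimum and a maximum size:

```python
import re
from collections import Counter


def word_sizes(text, min_size=10, max_size=80, top_n=100):
    """Map the top_n most frequent words to font sizes scaled linearly by count."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words).most_common(top_n)
    if not counts:
        return {}
    highest, lowest = counts[0][1], counts[-1][1]
    spread = (highest - lowest) or 1  # avoid dividing by zero when all counts tie
    return {
        word: min_size + (count - lowest) * (max_size - min_size) / spread
        for word, count in counts
    }
```

Calling word_sizes("data maps data") gives “data” the maximum size (80) and “maps” the minimum (10), matching the “bigger means more frequent” reading.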

You might be asking, “What does this do for me?” By looking at this picture, we can see that this blog talks about data, maps, and several other words common in the marble racing entry. The key is “at a glance.” Finding the interesting and relevant details in complicated data can take in-depth parsing. Data visualization makes those details easily accessible by emphasizing frequencies; in a word cloud, those are the frequencies of words. So by glancing at the Thoughtworks word cloud, we can quickly interpret what kind of data (words) it contains.

We can take this further and apply word clouds to chat rooms, message boards, and other social media to capture the gist of a conversation quickly and visually. We can use them to visualize articles and glean their main points or subject matter, and we could use them to create accurate categorical tags for content.

For example, while watching the YouTube playlist for Crash Course: Philosophy, I was curious what the main ideas of the course were. A word cloud might help me understand, in a general sense, what the playlist is all about.

Since Crash Course is a video series, we need a transcript before we can represent it as a word cloud. Fortunately, all the episodes are already transcribed on the Nerdfighteria Wiki. Transcribing the episodes manually would take hours or even days, but the internet and its curious, hard-working denizens have made this data available for us.

I combined all of the transcripts into a single Word document that wordclouds.com can easily read.
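If the transcripts are saved as plain-text files (an assumption; I used a Word document), gathering them into one file can be scripted with pathlib. The folder layout and file pattern here are hypothetical:

```python
from pathlib import Path


def combine_transcripts(folder, out_file, pattern="*.txt"):
    """Concatenate every matching transcript in `folder`, in file-name order.

    Keep out_file outside `folder` so a rerun doesn't read it back in.
    Returns the number of transcripts combined.
    """
    parts = [
        path.read_text(encoding="utf-8")
        for path in sorted(Path(folder).glob(pattern))
    ]
    Path(out_file).write_text("\n\n".join(parts), encoding="utf-8")
    return len(parts)
```

Naming the files with zero-padded episode numbers (ep01, ep02, ...) keeps the sort order matching the episode order.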

crash-course-philosophy-transcript

Running this document through the word cloud generator produces the following result:

wordcloud-1
Unedited word cloud of Crash Course: Philosophy

At first glance, this word cloud doesn’t provide a useful visualization. Everyday conversational words dominate it, and the philosophical terms I’m looking for are buried under their sheer volume. We can edit the word list to create a more relevant word cloud. Depending on what you’re trying to achieve, this kind of editing can either diminish or enhance the visualization. So let’s carefully and thoughtfully eliminate the words that don’t contribute, particularly conversational ones.

Editing the word cloud introduces a subjective interpretation of the data. Setting criteria for which words are relevant and which are not helps make the visualization more objective. I chose to remove conversational words and keep concepts that could be tied directly to philosophy. Word frequency also matters, since it determines each word’s size.
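One way to make those criteria explicit is to write the excluded words down as a list and filter on it, together with a minimum count. The stop list below is a hypothetical handful of words, not the list I actually used:

```python
# Hypothetical stop list of conversational words; extend it as you review the cloud.
CONVERSATIONAL = {"the", "and", "that", "you", "it", "is", "of", "to", "a", "so", "like", "just"}


def filter_words(counts, stopwords=CONVERSATIONAL, min_count=1):
    """Keep words that are not stopwords and occur at least min_count times."""
    return {
        word: count
        for word, count in counts.items()
        if word not in stopwords and count >= min_count
    }
```

Because the list is written down, someone else can apply the same criteria and get the same cloud, which addresses part of the subjectivity concern.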

wordcloud-2
Word cloud of Crash Course: Philosophy after editing the word list

The visualization shows that “God” is the most common word in the series. In fact, the first 10 or so episodes cover religion and the role theology plays in philosophy, which gives a general sense of the series’ content. Because the frequencies vary so widely, many words were left out so the cloud could show the differences in frequency. If we manually adjust the word list to reduce the outlying frequencies, we can include more words. As always, this kind of data manipulation adds another layer of subjectivity, and the visualization moves further into the realm of artistic representation. I decided to classify the counts to reduce the emphasis on the highest frequencies.
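The post doesn’t say exactly how the counts were classified. One possible approach, offered as an assumption, replaces each count with a class number on a logarithmic scale. That shrinks the gap between a few very frequent words and the long tail.

```python
import math


def classify_counts(counts, n_classes=5):
    """Replace raw counts with class numbers 1..n_classes on a log scale."""
    logs = {word: math.log(count) for word, count in counts.items() if count > 0}
    if not logs:
        return {}
    low, high = min(logs.values()), max(logs.values())
    span = (high - low) or 1.0  # all counts equal: everything lands in class 1
    return {
        # Equal-width bins in log space; the top value is clamped into the last class.
        word: 1 + min(n_classes - 1, int((value - low) / span * n_classes))
        for word, value in logs.items()
    }
```

With counts of 1000, 100, 10, and 1 and four classes, the result is 4, 3, 2, and 1. The most frequent word then carries four times the weight of the rarest instead of a thousand times.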

wordcloud-3
Word cloud of Crash Course: Philosophy after editing the word list and counts

Although the size discrepancy has been addressed, 150 words are still being excluded, so I switched to another website to see if I could get better results.

Tagul.com seemed to have satisfactory functionality, although it doesn’t allow high-resolution downloads without payment. I manually reformatted the data so Tagul could read it and ran the visualization. I’m pleased with the results.

Word Cloud (1).png
Word cloud produced with Tagul.com with the word list and counts adjusted

Higher resolution here.

I feel this cloud strikes a good balance between word size and number of words.

Word clouds are an interesting way to take a unique look at textual data. Their aesthetics bring artistic expression to the usually dry aspects of data science. Quality data loses much of its value without proper presentation, and the graphic arts bridge that gap.

Amazon Order History Reports

Data is a powerful professional asset, but it’s easy to overlook its applications outside work or school. Data at home can be just as useful for domestic decision making and personal auditing.

Amazon is quickly becoming the Walmart of the 2010s, and it’s no surprise. Online shopping makes buying and selling easier than ever, and the online interface enables kinds of data collection that aren’t possible in a traditional retail experience.

ksr-3-amazon-graph-1

Amazon makes some of its data available to users in the form of Order History Reports. This data gives customers greater insight into their purchases, and it is far easier than the manual bookkeeping a traditional shopping experience would require. I downloaded my entire Amazon order history from 2012 to 2016 and graphed it in several different ways.

I started by opening the provided CSV file in Excel and picking out the fields I thought were relevant. About 12 metrics seemed interesting, so I concentrated on those. I used Excel to curate and parse the data and plot.ly to make the graphs.

amazon-purchases
Purchase data split by month
amazon-purchases-yearly
The same purchase data distributed yearly.
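Excel handled the monthly and yearly totals, but the same grouping takes only a few lines of Python. The column names (“Order Date”, “Item Total”) and the MM/DD/YY date format are assumptions about the report’s layout, so check them against your own CSV header:

```python
from collections import defaultdict
from datetime import datetime
from decimal import Decimal

# Assumed column names and date format; verify them against your report.
DATE_COL, TOTAL_COL, DATE_FMT = "Order Date", "Item Total", "%m/%d/%y"


def spending_by_period(rows):
    """Return (monthly, yearly) spending totals keyed by 'YYYY-MM' and 'YYYY'."""
    monthly, yearly = defaultdict(Decimal), defaultdict(Decimal)
    for row in rows:
        day = datetime.strptime(row[DATE_COL], DATE_FMT)
        # Decimal avoids floating-point rounding error when summing money.
        amount = Decimal(row[TOTAL_COL].replace("$", "").replace(",", ""))
        monthly[day.strftime("%Y-%m")] += amount
        yearly[day.strftime("%Y")] += amount
    return dict(monthly), dict(yearly)
```

The rows argument can be a csv.DictReader over the downloaded report.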

I definitely switched to Amazon for a lot of things I might have bought locally or through other online stores. Free shipping with Prime had a big impact on my shopping preferences and on those of millions of others. Amazon says on its innovations page: “customers would quickly grasp that they were being offered the best deal in the history of shopping”. The truth of this statement became even more apparent when I realized the impact Amazon was having on the way I shop.

Next, I loaded the order history data in Excel and used the =COUNTIF(range, criteria) function to count how many of the ordered items were in each condition. I used Meta-Chart to create pie charts presenting the data.
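Instead of writing one COUNTIF per condition, such as =COUNTIF(D:D, "New") if the conditions were in column D (a hypothetical layout), you can tally the whole column in one pass. Here is a sketch that assumes a “Condition” column:

```python
from collections import Counter


def count_by(rows, column="Condition"):
    """Tally rows per distinct value of `column`, like one COUNTIF per value."""
    return Counter(row[column] for row in rows)
```

Calling count_by(rows).most_common() lists each condition with its count, ready for a pie chart. The same call with a different column name, such as the shipping destination, covers the second chart.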

meta-chart
The condition of the items purchased from Amazon
meta-chart-1
The destination for the purchases made on Amazon

I thought the tax data might be interesting. Online purchases are taxed under a variety of conditions: some products aren’t taxed at all, and others are taxed according to specific state and local legislation.

tax-rate-per-year
The annual percentage of taxes paid on purchases
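One reasonable way to compute the annual percentage is to divide each year’s total tax by that year’s total pre-tax spending. That weights large purchases more heavily than averaging per-item rates would. The column names are assumptions about the report:

```python
from collections import defaultdict
from datetime import datetime
from decimal import Decimal

# Assumed column names and date format for the order history report.
DATE_COL, SUBTOTAL_COL, TAX_COL, DATE_FMT = (
    "Order Date", "Item Subtotal", "Item Subtotal Tax", "%m/%d/%y")


def money(text):
    """Parse a currency string like '$1,234.56' into a Decimal."""
    return Decimal(text.replace("$", "").replace(",", ""))


def tax_rate_by_year(rows):
    """Percent of pre-tax spending paid as tax, per calendar year."""
    subtotal, tax = defaultdict(Decimal), defaultdict(Decimal)
    for row in rows:
        year = datetime.strptime(row[DATE_COL], DATE_FMT).year
        subtotal[year] += money(row[SUBTOTAL_COL])
        tax[year] += money(row[TAX_COL])
    return {year: 100 * tax[year] / subtotal[year] for year in subtotal if subtotal[year]}
```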

Amazon didn’t collect sales tax from North Carolina residents at all until 2014. North Carolina’s sales tax is 4.75% statewide and higher in certain municipalities; in Charlotte it is 7.25%. Shopping through Amazon can reduce this tax, and in some circumstances a smart shopper with a lot of time on their hands might avoid paying it entirely. This is another victory for e-commerce over the traditional shopping experience. The tax differential might close as e-commerce gains a larger share of the retail market and legislators try to recoup lost tax revenue.
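To put the Charlotte rate in concrete terms, here is the tax on a purchase at 7.25%. Rounding half-up to the cent is my assumption; real invoices may round per item or per order.

```python
from decimal import Decimal, ROUND_HALF_UP

CHARLOTTE_RATE = Decimal("0.0725")  # combined Charlotte rate quoted in the post


def sales_tax(subtotal, rate=CHARLOTTE_RATE):
    """Sales tax on a pre-tax subtotal, rounded half-up to the cent."""
    return (Decimal(subtotal) * rate).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
```

sales_tax("50.00") returns Decimal('3.63'), so an untaxed $50 order saved about $3.63 at Charlotte rates. sales_tax("1000") gives Decimal('72.50').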

Finally, just for fun, I created a graph of the different categories of items purchased.

meta-chart-2
Categorized purchases

Online retailers surely have better ways of analyzing and presenting this data, with whole marketing departments dedicated to managing it and its use. Imagine having not only your own data but also the data of millions of other customers. These databases are some of the most powerful tools in the modern world, and they are constantly changing how we live, both online and offline.