Data visualization has brought about a sweeping change in the way analysts work with data. And with data growing exponentially every second, data viz will continue to address the pressing need to explore data creatively, dig for deeper insights and address goals in an engaging manner. It’s a no-brainer that people prefer easy-to-digest visualizations of copious amounts of complex data to poring over spreadsheets or reports to draw informed conclusions.
Let’s work with this year’s Summer Olympics data as an example. Suppose one were curious about the country-wise rankings at Rio and were presented with a spreadsheet like the one below to make sense of things. Chances are that even after considerable time spent ploughing through it, the conclusions may be inaccurate, coupled with other side effects including, but not limited to, an immediate loss of interest, no subsequent data retention and even annoyance. That’s a serious nay in my books.
On the other hand, meaningful data visualizations promote easy understanding of data, better data retention and instant access to trends, anomalies, correlations and patterns. Take, for example, the visualization in Figure 1A – the data from the spreadsheet shown above has been cleaned, analyzed and visualized so that the results make sense almost immediately.
(Side note: It’s a real bummer that WordPress.com does not allow embedding from Shiny or Plotly servers. Please click the link to experience the interactivity. Also, plots may take slightly longer to load on slower internet connections, so please make allowance for that.)
In the quest to visualize and interact with diverse and, more often than not, complex data sets, one is supported by a plethora of data plotting techniques – right from simple data graphics to the more sophisticated and unusual ones. The type of visualization selected to represent the data, and whether or not interactivity and aesthetics are included, also have an enormous impact on whether the analysis is communicated accurately and meaningfully. To understand this, let’s look at a static choropleth vs an interactive choropleth.
The choropleth map uses a coloring scheme inside defined areas on a map in order to show value levels and indicate the average value of some property in those areas. Figure 2A uses a color encoding to communicate the country-wise medal tally from this year’s Rio Olympics. The biggest benefit of the map is the big-picture perspective. By using color density, it quickly illustrates which countries excelled at the games and which did not fare as well. However, the viewer cannot gain detailed information, such as, in this case, the number of medals per country.
This can be addressed somewhat by the addition of a certain level of interactivity as shown in Figure 2B below.
Now, if we represent the same information using a bar plot, which works well for discrete data, the key values are available at a glance, providing a deeper level of detail to the user. (And as a bonus, there is no need to know where Côte d’Ivoire (Ivory Coast) is on the world map 😀 )
It is also interesting to note that effective manipulation and re-ordering of graph categories can emphasize certain effects. For example, the two bar plots below represent the same data, but which one would you use to find the country that stood 10th overall?
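The point of the re-ordering is that once the bars are sorted by the value being compared, a rank lookup becomes a matter of counting from the top instead of scanning every bar. A minimal sketch of that idea in Python (the totals here are illustrative, not the actual Rio tally, and `nth_country` is a helper of my own):

```python
# Illustrative medal totals, deliberately in no particular order --
# like the bars in an alphabetically sorted plot
tally = {
    "Canada": 22, "Brazil": 19, "Spain": 17, "Kenya": 13,
    "Japan": 41, "France": 42, "Italy": 28, "Australia": 29,
    "Germany": 40, "Netherlands": 24, "Hungary": 15, "New Zealand": 18,
}

def nth_country(tally, n):
    """Return the country in nth place once categories are sorted by total."""
    ordered = sorted(tally.items(), key=lambda kv: kv[1], reverse=True)
    return ordered[n - 1][0]

print(nth_country(tally, 10))  # → Spain
```

Sorting the plot's categories the same way (e.g. `reorder()` in R's ggplot2) bakes that answer directly into the chart, so the reader never has to do the counting themselves.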
Personally, I find that besides the ‘clean-the-data’ component of analytics, it’s the data visualization that keeps me coming back to perform independent data analysis. With SO many supporting languages and tools out there, it’s a wicked learning curve, but the possibility of revealing underlying patterns and stories is just too exciting, and is what keeps me motivated and engaged. Or, if that doesn’t work, I just get really stoked by the prospect of how cool I am going to make the data look!
As per a Harvard Business Review article that I read recently, more data crosses the internet every second than were stored in the entire internet just 20 years ago. For instance, it is estimated that Walmart collects more than 2.5 petabytes of data every hour from its customer transactions. A petabyte is one quadrillion bytes, or the equivalent of about 20 million filing cabinets’ worth of text. An exabyte is 1,000 times that amount, or one billion gigabytes. Would you rather read a billion gigabytes, or see them visualized? Try wrapping your brain around that kind of data. Or let data viz help you with that. It’s no happy coincidence that data viz, with its ‘why read fast when you can visualize faster’ philosophy, is increasingly being embraced by companies across all sectors. It just works. Onwards, then!
I am happy to hear feedback and suggestions, so please feel free to leave a comment!
Data source: www.rio2016.com
All visualizations coded using R and hosted on Shiny & Plotly servers.