It Lives!

About a year and a half ago, I allowed myself to be talked into writing a book on data visualization with Python and JavaScript for O’Reilly. To be honest, it wasn’t a hard sell. The thought of actually having one of those animal books with my name on it stoked my vanity and then some.

Plus, I honestly believe that we’re on the cusp of a new age of data visualization, with an increasingly powerful and well-adapted JavaScript providing the horsepower for a new generation of engaging, interactive dataviz, delivered at pretty much zero cost to anyone with a computer, tablet, smartphone, etc.

But while JS, primarily in the shape of the powerhouse D3 visualization library, is delivering the visual goods, it lacks first-class (or often any-class) tools for gathering, manipulating, mining and analyzing your data before you craft a visualization to show off any gems you discover. That’s where Python comes in. The Python data-processing ecosystem has grown at a staggering speed these past few years, and whether it’s scraping data from the web or applying cutting-edge machine learning tools, Python pretty much leads the pack, particularly in terms of ease of use.

Well, thanks to O’Reilly and FedEx, I just got to hold the dead-tree version in my hand, which represents some kind of closure. Writing a book turned out to be every bit as exhausting as all those blog accounts on the web will tell you. And whatever the merits of the finished book, it can’t be accused of a lack of ambition.

So the aim of the book is to get you into the Python + JavaScript dataviz ballpark. These two languages represent the most powerful dataviz stack available, in my humble opinion. I wanted to set a data transformation task to show off the key Python + JS dataviz tools, which together form a toolchain taking you from raw, unrefined data to the kind of beautiful data visualizations you might have caught in the New York Times et al.

The task I chose, which lends a backbone to the book, is turning a fairly unengaging Wikipedia list of Nobel Prize winners into a modern, interactive dataviz that uses the same dataset but lets users drill into it, finding their own nuggets of interest. You can see the resulting visualization on this website, built using D3, the 300lb gorilla of JavaScript data visualization. But before we can use D3, we need to scrape the data from the web using Scrapy, Python’s industry-standard scraping library; clean and explore it using Pandas, the king of the hill of Python data analysis; and finally deliver the data using Flask, Python’s pocket-rocket web server, to build a modern, RESTful API in a few lines of Python. There’s a supporting cast of smaller libraries, like Python’s BeautifulSoup web-scraping library, SQLAlchemy (the best Python SQL object-relational mapper) and JavaScript’s crossfilter, for lightning-fast filtering of large (by browser standards) datasets.

In glorious technicolor!

I think for most of us in the programming community, O’Reilly represents something of a seal of approval and has historically set a high mark for publishing standards, something akin to our Oxford University Press. My experiences working on the book have been wholly positive, and I’ve been blown away by the quality of the people I’ve worked with there, particularly my editors (huge shout-out to Meg, Dawn and Kristen!!). I’ve learned so much from that collective team: stuff that, with the usual irony, would have made writing the book so much easier if I’d known it in the first place.

I’ll be writing more about the book in the coming weeks, now that there’s something tangible to account for. In the meantime, if you want to check it out, you can find it at the O’Reilly store or on Amazon.