Quintessential gusts

Explorations in Data-viz, among other things

Tag: python

A Book is Born

It Lives!


Around about a year and a half ago, I allowed myself to be talked into writing a book on data visualization with Python and JavaScript for O’Reilly press. To be honest, it wasn’t a hard sell. The thought of actually having one of those animal books with my name on it stoked my vanity and then some.

Plus, I honestly believe that we’re on the cusp of a new age of data visualization, with an increasingly powerful and well-adapted JavaScript providing the necessary horsepower for a new generation of engaging, interactive dataviz, delivered at pretty much zero cost to anyone with a computer, tablet, smartphone, etc.

But while JS, primarily in the shape of the powerhouse D3 visualization library, is delivering the visual goods, it lacks first-class (or often any class) tools for gathering, manipulating, mining and analyzing your data, prior to crafting a visualization to show off any gems you discover. That’s where Python comes in. The Python data processing ecosystem has grown at a staggering speed these past few years and, whether it’s scraping data from the web or applying cutting-edge machine learning tools, Python pretty much leads the pack, particularly in terms of ease of use.

Well, thanks to O’Reilly and FedEx, I just got to hold the dead-tree version in my hand, which represents some kind of closure. Writing a book turned out to be just as exhausting as all those blog accounts on the web will tell you, and then some. And whatever the merits of the finished book, it can’t be accused of a lack of ambition.

So the aim of the book is to get you in the Python + JavaScript dataviz ballpark. These two languages represent the most powerful dataviz stack available, in my humble opinion. I wanted to set a data transformation task to show off the key Python + JS dataviz tools, which together form a toolchain, taking you from raw, unrefined data to those beautiful data visualizations you might have caught in the New York Times et al.

The task I chose, which lends a backbone to the book, is turning a fairly unengaging Wikipedia list of Nobel prize winners into a modern, interactive dataviz, which uses the same dataset but allows users to drill into it interactively, finding their own nuggets of interest. You can see the resulting visualization on this website, built using D3, the 300lb gorilla of JavaScript data visualization. But before being able to use D3, we need to scrape the data from the web using Python’s industry-standard Scrapy library, clean and explore it using Pandas, the king of the hill of Python data analysis, and finally deliver the data using Flask, Python’s pocket-rocket web-server, which lets us build a modern, RESTful API in a few lines of Python. There’s a supporting cast of smaller libraries, like Python’s web-scraping BeautifulSoup, SQLAlchemy (the best Python SQL Object Relational Mapper) and JavaScript’s crossfilter, for lightning-fast filtering of large (by browser standards) datasets.
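To give a flavour of the middle of that toolchain, here is a minimal sketch of the clean, filter and deliver steps. It is not the book's code: it uses only the standard library as a stand-in for Pandas and Flask, and the row layout, the RAW_ROWS sample and the helper names clean and select are all assumptions made up for illustration.

```python
import json

# Hypothetical raw rows, standing in for what a scraper might yield.
# Note the stray whitespace, inconsistent casing and a duplicate entry.
RAW_ROWS = [
    {"name": " Marie Curie ", "category": "physics", "year": "1903", "gender": "Female"},
    {"name": "Maria Goeppert-Mayer", "category": "Physics", "year": "1963", "gender": "female"},
    {"name": "Albert Einstein", "category": "Physics ", "year": "1921", "gender": "male"},
    {"name": "Albert Einstein", "category": "Physics", "year": "1921", "gender": "male"},
]


def clean(rows):
    """Normalize whitespace and case, cast the year to int, drop duplicates."""
    seen = set()
    cleaned = []
    for row in rows:
        record = {
            "name": row["name"].strip(),
            "category": row["category"].strip().title(),
            "year": int(row["year"]),
            "gender": row["gender"].strip().lower(),
        }
        key = (record["name"], record["category"], record["year"])
        if key not in seen:
            seen.add(key)
            cleaned.append(record)
    return cleaned


def select(records, **criteria):
    """Keep the records matching every field=value pair (a simple drill-down)."""
    return [r for r in records if all(r[k] == v for k, v in criteria.items())]


winners = clean(RAW_ROWS)
# The kind of JSON payload a small web API could hand to the browser.
payload = json.dumps(select(winners, category="Physics", gender="female"))
print(payload)
```

In a real project the cleaning would more likely be a few Pandas operations and the JSON would be served by a web framework, but the shape of the work is the same: tidy the records once, then let the user slice them by dimension.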

In glorious technicolor!

I think for most of us in the programming community, O’Reilly represents something of a seal of approval and has historically set a high mark for publishing standards, something akin to our Oxford University Press. My experiences working on the book have been wholly positive and I’ve been blown away by the quality of the people I’ve worked with there, particularly my editors (huge shout out to Meg, Dawn and Kristen!!). I’ve learned so much from that collective team – stuff that, with the usual irony, would have made writing the book so much easier if I’d known it in the first place.

I’ll be writing more about the book in the coming weeks, now there’s something tangible to account for, but in the meantime, if you want to check it out, you can find it here at O’Reilly’s store or on Amazon.

Pydata London 2015

On June 20th I gave a talk at Pydata London, titled ‘Data-visualisation with Python and Javascript: crafting a dataviz toolchain for the web’. The talk used my upcoming book as a starting point, describing the proposed tool-chain, taking raw, scraped data, in the form of Wikipedia’s main Nobel-prize page, and transforming it into a rather more engaging and insightful web-based data-visualisation. Here I am proudly showing off my cover and my revolutionary and totally made up dataviz axis of exploration-presentation. That middle blob is where all the cool action is going to be taking place, on an internet near you.

I gave a talk at last year’s Pydata London, which was one of the key factors in starting a dialog with O’Reilly and the eventual book-building odyssey. So I owe the Pydata team more than a few shout-outs. No matter how stressful this authorship thing is, it’s an interesting ride. And having to explain things to other people is a great way to up your game. Anyway, just like last year’s conference, the atmosphere was really friendly and unintimidating, and the Bloomberg offices a very impressive and highly functional setting. As an ex-academic research scientist I’ve done a lot of conferences, and many were rather dry and sober affairs. The Pydata-London guys and gals work hard to keep things upbeat and inclusive – Pydata is a charity and that is very obvious in the decidedly un-corporate feel. So I was genuinely privileged to be there. The talks were of a very high quality too (how mine got through I’ll never know 😉) – with three parallel streams, I had to miss a couple that would normally be bankers. Everything was videoed so I hope to catch them soon on the Pydata channel.

All in all a great day. I had some very insightful feedback on the ground and some tweets that will feed my monstrous ego for a while. It’s very encouraging when people ‘get it’ and to feel that you are actually scratching a genuine itch. My emphasis was very much that playing nicely with JavaScript is no longer optional for any data-visualiser these days; in fact, I’d say any programmer involved in data-science etc. should have some basic JS-fu. ‘Javascript is the new English’ will be the title of an upcoming talk 😉 I also think, judging from the conversations I had, that I got one of the key messages across, namely that the barrier to entry for creating JavaScript-driven viz (D3 being the main library here) is very low. The perception of web-dev cruft between the programmer and his programming is my main target.


The talk built up to the Nobel visualisation I’ll be using to teach D3 – and a big shout-out for Maria Goeppert-Mayer, at the time of the talk the only woman other than Marie Curie to have won the Nobel prize for Physics.

The slides for my talk are available here

and the visualisation on slide 42 here

Select category:Physics and gender:female to see a woman who should be far more widely known.

I saw some great talks, but particularly enjoyed Russel Winder and Ian Ozsvald’s friendly banter during Russel’s talk ‘Making Computations Execute Very Quickly’ (http://london.pydata.org/schedule/presentation/48/). The talk was more of an interactive coding session but I give big props to anyone prepared to risk that. Mr Winder knows no fear. Back in the day I worked in Evolutionary Robotics, evolving Artificial Neural Networks as robot-controllers. That takes a lot of processing metal and so the talk brought a lot of fond memories back, of trying to push just a few more bits out of a recalcitrant CPU. Russel made a few good points about not relying on the Numpy cut-and-paste to save you. But, as Ian pointed out (and Donald Knuth would agree), and as is stressed in Ian’s High Performance Python book – first profile the life out of those bottlenecks. And as Donald Knuth would also point out, the choice of algorithm is a far greater factor than anything else. So grok your basic big-O notation.
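For anyone wanting to try the profile-first advice, here is a small sketch of the idea. It is my own toy example rather than anything shown in the talk: the two function names and the input sizes are made up, and it relies only on the standard library's cProfile and pstats modules. Both functions count the items two lists have in common, but one scans a list for every lookup (roughly quadratic) while the other builds a set first (roughly linear).

```python
import cProfile
import io
import pstats


def count_common_slow(a, b):
    """O(len(a) * len(b)): every `in` test scans the list b."""
    return sum(1 for x in a if x in b)


def count_common_fast(a, b):
    """O(len(a) + len(b)): build a set once, then near constant-time lookups."""
    b_set = set(b)
    return sum(1 for x in a if x in b_set)


def run_both(n=2000):
    """Run both versions on the same hypothetical inputs."""
    a = list(range(n))
    b = list(range(n // 2, n + n // 2))
    return count_common_slow(a, b), count_common_fast(a, b)


# Profile first, to see where the time really goes.
profiler = cProfile.Profile()
profiler.enable()
slow_result, fast_result = run_both()
profiler.disable()

# Collect the five most expensive entries, sorted by cumulative time.
report = io.StringIO()
pstats.Stats(profiler, stream=report).sort_stats("cumulative").print_stats(5)
print(report.getvalue())
```

The two versions return the same answer, and the profile report should show the list-scanning version accounting for the bulk of the time. Swapping the data structure, rather than micro-tuning the loop, is where the win comes from.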

Once again, thanks to the Pydata-London team for another great conference. And the growth of the regular Pydata London talks has been astonishing – this is a huge area right now, with some really exciting developments and it’s a privilege to be associated.

Oh, and if you fancy getting updates about the book just sign up to the mailing list:



Data Visualization with Python and Javascript


Recently, I was approached by O’Reilly books for a little advice on the lay of the land in Python, Javascript and Web-visualization. Well, one thing led to another and I ended up agreeing to write a book for O’Reilly on that very subject. There’s a lot of work and the money, as you’ll read all over the web, isn’t stellar (unless you make one of those tech-book hits) but I’ve a lot of O’Reilly books on my shelf and consider them, even in this digital day and age, one of the best crud-filters around. In short, in my head and in my hand, O’Reilly books have weight and I’m more than a little flattered to have been asked.

The festive season probably wasn’t the best time to begin such an adventure but that’s what the clock dictated so, after some frenetic coming-to-terms with new software (O’Reilly’s Atlas publishing platform), workflows, formats and the like, I managed to get the first two draft chapters in last week. I get a short <phew> and it’s on to the next eight I’ve got to write by April. O’Reilly are, understandably, hard taskmasters and I’m suddenly faced with all these writing deadlines and flashbacks to the last year of my PhD.

I aim to be blogging about the book chapters as they take shape and I’ll save a full rundown of the book’s aims for another post, but for now I just want to mention a mailing-list I’m starting to try and keep any interested parties up to speed and, more importantly, to get some feedback as to people’s needs in this area. I’ve a pretty good idea, I think, but would love some input as to anything I’ve missed, should emphasize, or maybe just leave out altogether.

If you want to receive updates on the book or get the opportunity to influence its course please sign-up below.

Subscribe for info on book ‘Dataviz with Python and Javascript’