Quintessential gusts

Explorations in Data-viz, among other things

Author: Kyran Dale

A Book is Born

It Lives!


Around about a year and a half ago, I allowed myself to be talked into writing a book on data visualization with Python and JavaScript for O’Reilly press. To be honest, it wasn’t a hard sell. The thought of actually having one of those animal books with my name on it stoked my vanity and then some.

Plus, I honestly believe that we’re on the cusp of a new age of data visualization, with an increasingly powerful and well-adapted JavaScript providing the necessary horsepower for a new generation of engaging, interactive dataviz, delivered at pretty much zero cost to anyone with a computer, tablet, smartphone etc.

But while JS, primarily in the shape of the powerhouse D3 visualization library, is delivering the visual goods, it lacks first-class (or often any class) tools for gathering, manipulating, mining and analyzing your data, prior to crafting a visualization to show off any gems you discover. That’s where Python comes in. The Python data processing ecosystem has grown at a staggering speed these past few years and whether it’s scraping data from the web or applying cutting-edge Machine Learning tools, Python pretty much leads the pack, particularly in terms of ease of use.

Well, thanks to O’Reilly and FedEx, I just got to hold the dead-tree version in my hand, which represents some kind of closure. Writing a book turned out to be just as exhausting as all those blog accounts on the web will tell you, and then some. And whatever the merits of the finished book, it can’t be accused of a lack of ambition.

So the aim of the book is to get you in the Python + JavaScript dataviz ballpark. These two languages represent the most powerful dataviz stack available, in my humble opinion. I wanted to set a data transformation task to show off the key Python + JS dataviz tools, which together form a toolchain, taking you from raw, unrefined data to those beautiful data visualizations you might have caught in the New York Times et al.

The task I chose, which lends a backbone to the book, is turning a fairly unengaging Wikipedia list of Nobel prize winners into a modern, interactive dataviz, which uses the same dataset but allows users to drill into it interactively, finding their own nuggets of interest. You can see the resulting visualization on this website, built using D3, the 300lb gorilla of JavaScript data visualization. But before being able to use D3, we need to scrape the data from the web, using Python’s industry-standard Scrapy library, clean and explore it using Pandas, the king of the hill of Python data analysis, and finally deliver the data, using Flask, Python’s pocket-rocket web-server, to build a modern, RESTful API in a few lines of Python. There’s a supporting cast of smaller libraries, like Python’s web-scraping BeautifulSoup, SQLAlchemy (the best Python SQL Object-Relational Mapper) and JavaScript’s crossfilter, for lightning-fast filtering of large (by browser standards) datasets.


In glorious technicolor!

I think for most of us in the programming community, O’Reilly represents something of a seal of approval and has historically set a high mark for publishing standards, something akin to our Oxford University Press. My experiences working on the book have been wholly positive and I’ve been blown away by the quality of the people I’ve worked with there, particularly my editors (huge shout out to Meg, Dawn and Kristen!!). I’ve learned so much from that collective team, stuff that, with the usual irony, would have made writing the book so much easier if I’d known it in the first place.

I’ll be writing more in the coming weeks about the book, now there’s something tangible to account for, but in the meantime, if you want to check it out on O’Reilly, you can find it here at their store or on Amazon.

Pydata London 2015

On June 20th I gave a talk at Pydata London, titled ‘Data-visualisation with Python and Javascript: crafting a dataviz toolchain for the web’. The talk used my upcoming book as a starting point, describing the proposed tool-chain, taking raw, scraped data, in the form of Wikipedia’s main Nobel-prize page, and transforming it into a rather more engaging and insightful web-based data-visualisation. Here I am proudly showing off my cover and my revolutionary and totally made up dataviz axis of exploration-presentation. That middle blob is where all the cool action is going to be taking place, on an internet near you.

I gave a talk at last year’s Pydata London, which was one of the key factors in starting a dialog with O’Reilly and the eventual book-building odyssey. So I owe the Pydata team more than a few shout-outs. No matter how stressful this authorship thing is, it’s an interesting ride. And having to explain things to other people is a great way to up your game. Anyway, just like last year’s conference, the atmosphere was really friendly and unintimidating, and the Bloomberg offices a very impressive and highly functional setting. As an ex-academic research-scientist I’ve done a lot of conferences, and many were rather dry and sober affairs. The Pydata-London guys and gals work hard to keep things upbeat and inclusive. Pydata is a charity and that is very obvious in the decidedly un-corporate feel. So I was genuinely privileged to be there. The talks were of a very high quality too (how mine got through I’ll never know 😉). With three parallel streams, I had to miss a couple that would normally be bankers. Everything was videoed so I hope to catch them soon on the Pydata channel.

All in all a great day. I had some very insightful feedback on the ground and some tweets that will feed my monstrous ego for a while. It’s very encouraging when people ‘get it’ and to feel that you are actually scratching a genuine itch. My emphasis was very much that playing nicely with Javascript is not optional for any data-visualiser these days; in fact I’d say any programmer involved in data-science etc. should have some basic JS-fu. ‘Javascript is the new English’ will be the title of an upcoming talk 😉. I also think, judging from the conversations I had, that I got one of the key messages across, namely that the barrier to entry for creating Javascripted viz (D3 being the main library here) is very low. The perception of web-dev cruft between the programmer and his programming is my main target:

bridging

The talk built up to the Nobel visualisation I’ll be using to teach D3 – and a big shout-out for Maria Goeppert-Mayer, the only woman other than Marie Curie to have won the Nobel prize for Physics.

The slides for my talk are available here

and the visualisation on slide 42 here

Select category:Physics and gender:female to see a woman who should be far more widely known.
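Under the hood, that kind of selection boils down to a filter on two fields. Here's a rough sketch in plain JavaScript. The record layout and the `filterWinners` helper are assumptions made for illustration (they are not the visualisation's actual code), and the handful of winners is typed in by hand rather than taken from the scraped dataset.

```javascript
// A few hand-typed records; the field names are assumed for illustration.
var winners = [
    {name: 'Marie Curie', category: 'Physics', gender: 'female', year: 1903},
    {name: 'Marie Curie', category: 'Chemistry', gender: 'female', year: 1911},
    {name: 'Albert Einstein', category: 'Physics', gender: 'male', year: 1921},
    {name: 'Maria Goeppert-Mayer', category: 'Physics', gender: 'female', year: 1963},
    {name: 'Dorothy Hodgkin', category: 'Chemistry', gender: 'female', year: 1964}
];

// Hypothetical helper: keep only the records that match
// every key/value pair in `criteria`.
var filterWinners = function(records, criteria) {
    return records.filter(function(rec) {
        return Object.keys(criteria).every(function(key) {
            return rec[key] === criteria[key];
        });
    });
};

var result = filterWinners(winners, {category: 'Physics', gender: 'female'});
console.log(result.map(function(w) { return w.name + ' (' + w.year + ')'; }));
```

With this tiny sample the filter leaves just Marie Curie and Maria Goeppert-Mayer. A library like crossfilter does the same job far more efficiently on large datasets.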

I saw some great talks, but particularly enjoyed Russel Winder and Ian Ozsvald’s friendly banter during Russel’s talk ‘Making Computations Execute Very Quickly’ (http://london.pydata.org/schedule/presentation/48/). The talk was more of an interactive coding session but I give big props to anyone prepared to risk that. Mr Winder knows no fear. Back in the day I worked in Evolutionary Robotics, evolving Artificial Neural Networks as robot-controllers. That takes a lot of processing metal and so the talk brought a lot of fond memories back, of trying to push just a few more bits out of a recalcitrant CPU. Russel made a few good points about not relying on the NumPy cut-and-paste to save you. But, as Ian pointed out (and Donald Knuth would agree), and as is stressed in Ian’s High Performance Python book: first profile the life out of those bottlenecks. And as Donald Knuth would also point out, the choice of algorithm is a far greater factor than anything else. So grok your basic big-O notation.
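To make the algorithm point concrete, here's a small sketch that counts the work done by two ways of spotting a duplicate in an array: a nested-loop O(n²) version and a set-based O(n) one. Both functions are made up for this example (they aren't taken from Russel's talk or Ian's book), and the code counts loop steps rather than timing anything, so the numbers are the same on any machine.

```javascript
// O(n^2): compare every pair until a duplicate turns up.
var hasDuplicateQuadratic = function(items) {
    var steps = 0;
    for (var i = 0; i < items.length; i++) {
        for (var j = i + 1; j < items.length; j++) {
            steps++;
            if (items[i] === items[j]) {
                return {found: true, steps: steps};
            }
        }
    }
    return {found: false, steps: steps};
};

// O(n): remember what we've seen in a Set, one lookup per item.
var hasDuplicateLinear = function(items) {
    var seen = new Set(), steps = 0;
    for (var i = 0; i < items.length; i++) {
        steps++;
        if (seen.has(items[i])) {
            return {found: true, steps: steps};
        }
        seen.add(items[i]);
    }
    return {found: false, steps: steps};
};

// Worst case: 1000 distinct values, so neither version can exit early.
var items = [];
for (var k = 0; k < 1000; k++) {
    items.push(k);
}

var slow = hasDuplicateQuadratic(items);
var fast = hasDuplicateLinear(items);
console.log(slow.steps, fast.steps);
```

For 1,000 distinct items the nested loop makes 499,500 comparisons against 1,000 set lookups, and the gap widens quadratically as the array grows. No amount of low-level tuning of the inner loop recovers that.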

Once again, thanks to the Pydata-London team for another great conference. And the growth of the regular Pydata London talks has been astonishing – this is a huge area right now, with some really exciting developments and it’s a privilege to be associated.

Oh, and if you fancy getting updates about the book just sign up to the mailing list:

http://kyrandale.com/blog/data-visualization-python-javascript/


Data Visualization with Python and Javascript


Recently, I was approached by O’Reilly books for a little advice on the lay of the land in Python, Javascript and Web-visualization. Well, one thing led to another and I ended up agreeing to write a book for O’Reilly on that very subject. There’s a lot of work and the money, as you’ll read all over the web, isn’t stellar (unless you make one of those tech-book hits) but I’ve a lot of O’Reilly books on my shelf and consider them, even in this digital day and age, one of the best crud-filters around. In short, in my head and in my hand, O’Reilly books have weight and I’m more than a little flattered to have been asked.

The festive season probably wasn’t the best time to begin such an adventure but that’s what the clock dictated so, after some frenetic coming-to-terms with new software (O’Reilly’s Atlas publishing platform), workflows, formats and the like, I managed to get the first two draft chapters in last week. I get a short <phew> and it’s on to the next eight I’ve got to write by April. O’Reillys are, understandably, hard taskmasters and I’m suddenly faced with all these writing deadlines and flashbacks to the last year of my PhD.

I aim to be blogging about the book chapters as they take shape and I’ll save a full run down of the book’s aims for another post but for now I just want to mention a mailing-list I’m starting to try and keep any interested parties up to speed and, more importantly, try and get some feedback as to people’s needs in this area. I’ve a pretty good idea I think but would love some input as to anything I’ve missed, should emphasize, or maybe just leave out altogether.

If you want to receive updates on the book or get the opportunity to influence its course please sign up to the mailing list.


Building a D3 Plugin

Introduction

Some awkward, common visualization constructs lend themselves to a plugin form, a reusable DOM-stamper that can be applied to an ‘svg’ or ‘div’ tag, with a few parameters and maybe some data. For example, take a colorbar or legend-box, often unglamorous visual indicators which are tedious to repeat from scratch. I’ve been trying to make writing such things easier, limiting the initial boilerplate and focussing on the D3 build stage. What follows is my current state of play, with inspiration from Mike Bostock’s article on reusable charts among others. Very much a work in progress.

The Reusable Core

There are things that pretty much every plugin is going to need: some getter and setter methods for its parameters, access to its container etc., and, more often than not, an ‘svg’ context:

    // returns chainable getters+setters
    var makeAPIMethod = function(chart, params, method) {
        return function(_){
            if(!arguments.length){
                return params[method];
            }
            params[method] = _;
            return chart;
        };
    };

    kcharts.BasicPlugin = function(params) {
        // the plugin has some default params
        this.params = {
            height:100, width:100, padding: 20, type: 'plugin', data:{}
        };
        // these are extended  
        for(var p in params){
            this.params[p] = params[p];
        }
        // we are going to return a plugin object ready to process a D3 selection 
        var plugin = function(selection) {
            selection.each(function(data) {
                // a few default measurements
                var el, g, height = plugin.height(), width = plugin.width(), padding = plugin.padding();
                // store the data for convenience
                plugin.data(data);
                plugin.container = d3.select(this);
                // we assume the selection is or contains an 'svg' tag
                el = plugin.container.select('svg');
                el = el.node()? el: plugin.container;
                // we'll use an svg group as our plugins container
                plugin.svg = el.selectAll('g.' + plugin.type()).data([data]);
                plugin.gEnter = plugin.svg.enter()
                    .append('g').attr('class', plugin.type());
                // now build the plugin
                plugin.build();
            });
        };
        // create plugin's getter+setter methods based on specified parameters
        for(var method in this.params){
            plugin[method] = makeAPIMethod(plugin, this.params, method);
        }
        // placeholder
        plugin.build = function() {};
         
        return plugin;
    };

The core-plugin just uses a dictionary of parameters to produce chainable getter and setter methods, and hooks some convenient variables onto the main object. What these should be will become apparent as one produces more reusable objects. The chief aim is to cut down on boilerplate and make rolling new plugins as intuitive as possible.
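To see what those chainable getters and setters buy you, here is a minimal, D3-free sketch of the same idea. The `makeWidget` function and its `width` and `title` parameters are made up for illustration. Only `makeAPIMethod` comes from the code above.

```javascript
// returns chainable getters+setters (same helper as in the core plugin)
var makeAPIMethod = function(chart, params, method) {
    return function(_) {
        if (!arguments.length) {
            return params[method];   // no argument: act as a getter
        }
        params[method] = _;          // argument given: store the value...
        return chart;                // ...and return the chart for chaining
    };
};

// hypothetical stand-in for a plugin: a function with accessors attached
var makeWidget = function(overrides) {
    var params = {width: 100, title: ''};   // defaults
    for (var p in overrides) {
        params[p] = overrides[p];           // extended by the caller
    }
    var widget = function() {};
    for (var method in params) {
        widget[method] = makeAPIMethod(widget, params, method);
    }
    return widget;
};

var w = makeWidget({width: 10});
w.width(25).title('Foobar values');   // setters return the widget, so they chain
console.log(w.width(), w.title());    // getters return the stored values
```

One thing to watch with this pattern: because the accessors are hung on a function object, a parameter called `name` or `length` would collide with the function's own read-only properties of the same name, so it is probably safest to avoid those two as parameter keys.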

Using the core-plugin

To create the colorbar plugin we will specialize a closed BasicPlugin:

  • create a new BasicPlugin (cbar)
  • pass in parameters for the required chainable getter + setter methods (e.g. colorbar height, x/y placement, number-of-bars etc.)
  • set sensible defaults based on these parameters for use in building (e.g. a standard colorscale for demo purposes)
  • define a ‘build’ method which will be called by the plugin after the D3 selection

Here we create the ‘cbar’ object with special parameters:

    var COLOR_BARS = 50;
    kcharts.ColorBar = function(){
        // we create a new 'cbar' plugin
        var cbar = new kcharts.BasicPlugin({
            // colorbar parameters for chainable getters+setters
            type: 'colorbar',
            title: '',
            horizontal: false,
            width: 10,
            height: false,
            padding:30,
            barHeight:2,
            x: 0,
            y: 0,
            numBars: COLOR_BARS,
            domain: [0, COLOR_BARS/2, COLOR_BARS],
            crange: [kcharts.COLS.blue, kcharts.COLS.yellow, kcharts.COLS.red],
            range: [0, 100],
            colorScale: d3.scale.linear().interpolate(d3.interpolateHcl),
            labelIndices:false,
            bounds: false
        });

With that done we just need to add a ‘build’ method to our cbar object. This will be called on each D3 selection and will have access to any new data (e.g. cbar.data()) etc.

        cbar.build = function() {
            var cEnter, bars, cbars;
            // change the scales, indices etc. to reflect any parametric
            // changes using the setter methods
            cbar.cbarscale = d3.scale.linear()
                .domain([0, cbar.numBars()-1]).range(cbar.bounds());
            cbar.colorScale()
                .domain(cbar.domain()).range(cbar.crange());
                
            [ ... ] 
            
            // bind new data and append elements if necessary
            cbars = cbar.svg.selectAll('.colorbar')
                .data([d3.range(cbar.numBars())]);
            cEnter = cbars.enter().append('g')
                .attr('class', 'colorbar');
            bars = cEnter.append('g').attr('class', 'bars');

            [ ... ]
            
            // add bar and text groups 
            bars = bars.selectAll('.bar')
                .data(function(d) {
                    return d;
                }).enter()
                .append('g')
                .attr('class', 'bar');
                
            [ ... ]  
            
            // now update the elements' attributes and styles
            cbar.svg.select('.colorbar')
                .attr("transform", "translate(" + cbar.x() + "," + cbar.y() + ")" + crot);

            [ ... ]
                
        };
        // return the cbar object to be called on selections 
        return cbar;
    };

So, create a new plugin object, set some special parameters, define a build method and you should be good to go. One could of course pass some parameters into the ColorBar function and create ever more specialized colorbar prototypes, but the returns would be diminishing I think and the code less transparent. There’s an inevitable trade-off between reducing boiler-plate/cruft and introducing irritating ‘magic’.
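If the moving parts are hard to picture, here's a stripped-down sketch of that lifecycle with no DOM or D3 involved. `fakeSelection` is a made-up stand-in that mimics just enough of a D3 selection (`call` and `each`), and `makePlugin` is a simplified, hypothetical version of BasicPlugin, so treat it as an illustration of the flow rather than the real thing.

```javascript
// Hypothetical stand-in for a D3 selection: one datum per "element".
var fakeSelection = function(data) {
    var sel = {
        each: function(fn) {
            data.forEach(function(d, i) { fn.call({index: i}, d, i); });
            return sel;
        },
        call: function(fn) {
            fn(sel);        // like D3, hand the selection to the function
            return sel;
        }
    };
    return sel;
};

// Simplified plugin factory: params -> chainable accessors + a build hook.
var makePlugin = function(params) {
    var plugin = function(selection) {
        selection.each(function(d) {
            plugin.data(d);     // store the bound data for convenience
            plugin.build();     // then hand over to the specialized build
        });
    };
    Object.keys(params).forEach(function(key) {
        plugin[key] = function(_) {
            if (!arguments.length) { return params[key]; }
            params[key] = _;
            return plugin;
        };
    });
    plugin.build = function() {};   // placeholder, as in BasicPlugin
    return plugin;
};

// Specialize it: some special parameters plus a build method.
var built = [];
var bar = makePlugin({type: 'colorbar', numBars: 50, data: null});
bar.build = function() {
    built.push(bar.type() + ':' + bar.numBars() + ':' + bar.data());
};

// Tweak a parameter via its setter, then call the plugin on a selection.
fakeSelection(['a', 'b']).call(bar.numBars(3));
console.log(built);
```

The build method runs once per element in the selection, and each run sees the current parameter values and that element's data. That is all the real plugin relies on too, with the SVG group handling layered on top.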

Let’s create a few random colorbars for demonstration:

    var createRandomColorbar = function() {
        // We'll use one of the excellent colorbrewer color-sets
        var cols = ['Set3', 'YlGnBu', 'YlOrRd', 'BrBG', 'PRGn', 'PiYG', 'RdYlBu', 'Spectral', 'Pastel1'],
            col = cols[Math.floor(Math.random() * cols.length)];

        var crange = colorbrewer[col][9],
            numBars = Math.floor((0.5 + Math.random()) * 50),
            // space the labels so we get roughly 5 to 10 of them
            labelStep = Math.floor(numBars / (5 + Math.random() * 5)),
            cbar = new kcharts.ColorBar()
                .title('Foobar values')
                .height((0.5 + Math.random()) * 300)
                .width((0.5 + Math.random()) * 20)
                .numBars(numBars)
                .x(400 + (Math.random() - 0.5) * 600)
                .y(400 + (Math.random() - 0.5) * 400)
                .horizontal(Math.random() < 0.5) // half horizontal, half vertical
                .colorScale(d3.scale.quantize()
                            .domain([0, Math.random() * 600]).range(crange))
                .labelIndices(d3.range(0, numBars - 1, labelStep));
        return cbar;
    };

    // create 30 svg contexts on the colorbar div
    d3.select('#colorbar').selectAll('.colorbars')
        .data(d3.range(30))
        .enter().append('svg')
        .attr('class', 'colorbars');
    // call a random colorbar on each
    d3.selectAll('.colorbars').each(function(d) {
        d3.select(this).call(createRandomColorbar());
    });

Which should create something like this:

d3-colorbar-plugin

 

 

There are a few bells and whistles one could add but it’s pretty usable and a good starting point. The main thing is that it was easier to write than previous efforts. Hopefully some best practice will start to filter through, allowing one to reduce the necessary scaffolding-code standing in the way of creativity. For a working demo go here, or go see the code at the GitHub repo here.