Quintessential gusts

Explorations in Data-viz, among other things

Category: html5

A Book is Born

It Lives!


Around about a year and a half ago, I allowed myself to be talked into writing a book on data visualization with Python and JavaScript for O’Reilly press. To be honest, it wasn’t a hard sell. The thought of actually having one of those animal books with my name on it stoked my vanity and then some.

Plus, I honestly believe that we’re on the cusp of a new age of data visualization, with an increasingly powerful and well-adapted JavaScript providing the necessary horsepower for a new generation of engaging, interactive dataviz, delivered at pretty much zero cost to anyone with a computer, tablet, smartphone, etc.

But while JS, primarily in the shape of the powerhouse D3 visualization library, is delivering the visual goods, it lacks first-class (or often any class) tools for gathering, manipulating, mining and analyzing your data, prior to crafting a visualization to show off any gems you discover. That’s where Python comes in. The Python data processing ecosystem has grown at a staggering speed these past few years and whether it’s scraping data from the web or applying cutting-edge machine learning tools, Python pretty much leads the pack, particularly in terms of ease of use.

Well, thanks to O’Reilly and FedEx, I just got to hold the dead-tree version in my hand, which represents some kind of closure. Writing a book turned out to be just as exhausting as all those blog accounts on the web will tell you, and then some. And whatever the merits of the finished book, it can’t be accused of a lack of ambition.

So the aim of the book is to get you in the Python + JavaScript dataviz ballpark. These two languages represent the most powerful dataviz stack available, in my humble opinion. I wanted to set a data transformation task to show off the key Python + JS dataviz tools, which together form a toolchain, taking you from raw, unrefined data to those beautiful data visualizations you might have caught in the New York Times et al.

The task I chose, which lends a backbone to the book, is turning a fairly unengaging Wikipedia list of Nobel prize winners into a modern, interactive dataviz, which uses the same dataset but allows users to drill into it interactively, finding their own nuggets of interest. You can see the resulting visualization on this website, built using D3, the 300lb gorilla of JavaScript data visualization. But before being able to use D3, we need to scrape the data from the web, using Python’s industry-standard Scrapy library, clean and explore it using Pandas, the king of the hill of Python data analysis, and finally deliver the data, using Flask, Python’s pocket-rocket web-server, to build a modern, RESTful API in a few lines of Python. There’s a supporting cast of smaller libraries, like Python’s web-scraping BeautifulSoup, SQLAlchemy (the best Python SQL Object Relational Mapper) and JavaScript’s crossfilter, for lightning-fast filtering of large (by browser standards) datasets.

In glorious technicolor!

I think for most of us in the programming community, O’Reilly represents something of a seal of approval and has historically set a high mark for publishing standards, something akin to our Oxford University Press. My experiences working on the book have been wholly positive and I’ve been blown away by the quality of the people I’ve worked with there, particularly my editors (huge shout out to Meg, Dawn and Kristen!!). I’ve learned so much from that collective team – stuff that, with the usual irony, would have made writing the book so much easier if I’d known it in the first place.

I’ll be writing more in the coming weeks about the book, now there’s something tangible to account for, but in the meantime, if you want to check it out on O’Reilly, you can find it here at their store or on Amazon.

Building a D3 Plugin


Some awkward, common visualization constructs lend themselves to a plugin form, a reusable DOM-stamper that can be applied to an ‘svg’ or ‘div’ tag, with a few parameters and maybe some data. For example, take a colorbar or legend-box, often unglamorous visual indicators which are tedious to repeat from scratch. I’ve been trying to make writing such things easier, limiting the initial boilerplate and focussing on the D3 build stage. What follows is my current state of play, with inspiration from Mike Bostock’s article on reusable charts among others. Very much a work in progress.

The Reusable Core

There are things that pretty much every plugin is going to need: some getter and setter methods for its parameters, access to its container etc. and, more often than not, an ‘svg’ context:

    // returns chainable getters+setters
    var makeAPIMethod = function(chart, params, method) {
        return function(_){
            // called with no arguments, act as a getter
            if(!arguments.length){
                return params[method];
            }
            params[method] = _;
            return chart;
        };
    };

    kcharts.BasicPlugin = function(params) {
        // the plugin has some default params
        this.params = {
            height:100, width:100, padding: 20, type: 'plugin', data:{}
        };
        // these are extended with any params passed in
        for(var p in params){
            this.params[p] = params[p];
        }
        // we are going to return a plugin object ready to process a D3 selection
        var plugin = function(selection) {
            selection.each(function(data) {
                // a few default measurements
                var el, g, height = plugin.height(), width = plugin.width(), padding = plugin.padding();
                // store the container selection for convenience
                plugin.container = d3.select(this);
                // we assume the selection is or contains an 'svg' tag
                el = plugin.container.select('svg');
                el = el.node()? el: plugin.container;
                // we'll use an svg group as our plugin's container
                plugin.svg = el.selectAll('g.' + plugin.type()).data([data]);
                plugin.gEnter = plugin.svg.enter()
                    .append('g').attr('class', plugin.type());
                // now build the plugin
                plugin.build();
            });
        };
        // create plugin's getter+setter methods based on specified parameters
        for(var method in this.params){
            plugin[method] = makeAPIMethod(plugin, this.params, method);
        }
        // placeholder
        plugin.build = function() {};
        return plugin;
    };

The core-plugin just uses a dictionary of parameters to produce chainable getter and setter methods, and hooks some convenient variables onto the main object. What these should be will become apparent as one produces more reusable objects. The chief aim is to cut down on boilerplate and make rolling new plugins as intuitive as possible.
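To see the getter+setter trick in isolation, here is a minimal, D3-free sketch that runs in Node or a browser console. The names makeAccessor and makeWidget are hypothetical, made up for this illustration, and are not part of kcharts.

```javascript
// Build one chainable accessor for `key` on `target`, backed by `params`.
function makeAccessor(target, params, key) {
    return function (value) {
        // No argument: act as a getter.
        if (!arguments.length) {
            return params[key];
        }
        // One argument: act as a setter and return the target for chaining.
        params[key] = value;
        return target;
    };
}

// Hypothetical stand-in for a plugin: a plain function with accessors attached.
function makeWidget(overrides) {
    var params = { width: 100, height: 100, padding: 20 };
    // Extend the defaults with any overrides passed in.
    Object.keys(overrides || {}).forEach(function (key) {
        params[key] = overrides[key];
    });
    var widget = function () {};
    // One accessor per parameter.
    Object.keys(params).forEach(function (key) {
        widget[key] = makeAccessor(widget, params, key);
    });
    return widget;
}

var w = makeWidget({ width: 10 });
w.height(250).padding(5);                         // setters chain
console.log(w.width(), w.height(), w.padding());  // getters read back
```

One thing to watch with this approach: the accessors are hung on a function object, so parameter names like 'name' or 'length' would collide with the function's own read-only properties and are best avoided.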

Using the core-plugin

To create the colorbar plugin we will specialize a closed BasicPlugin:

  • create a new BasicPlugin (cbar)
  • pass in the parameters that need chainable getter + setter methods (e.g. colorbar height, x/y placement, number-of-bars etc.)
  • set sensible defaults based on these parameters for use in building (e.g. a standard colorscale for demo purposes)
  • define a ‘build’ method which will be called by the plugin after the D3 selection

Here we create the ‘cbar’ object with special parameters:

    var COLOR_BARS = 50;
    kcharts.ColorBar = function(){
        // we create a new 'cbar' plugin
        var cbar = new kcharts.BasicPlugin({
            // colorbar parameters for chainable getters+setters
            type: 'colorbar',
            title: '',
            horizontal: false,
            width: 10,
            height: false,
            x: 0,
            y: 0,
            numBars: COLOR_BARS,
            domain: [0, COLOR_BARS/2, COLOR_BARS],
            crange: [kcharts.COLS.blue, kcharts.COLS.yellow, kcharts.COLS.red],
            range: [0, 100],
            colorScale: d3.scale.linear().interpolate(d3.interpolateHcl),
            bounds: false
        });

With that done we just need to add a ‘build’ method to our cbar object. This will be called on each D3 selection and will have access to any new data (e.g. cbar.data()) etc.

        cbar.build = function() {
            var cEnter, bars, cbars;
            // change the scales, indices etc. to reflect any parametric
            // changes using the setter methods
            cbar.cbarscale = d3.scale.linear()
                .domain([0, cbar.numBars()-1]).range(cbar.bounds());
            [ ... ] 
            // bind new data and append elements if necessary
            cbars = cbar.svg.selectAll('.colorbar')
            cEnter = cbars.enter().append('g')
                .attr('class', 'colorbar');
            bars = cEnter.append('g').attr('class', 'bars');

            [ ... ]
            // add bar and text groups 
            bars = bars.selectAll('.bar')
                .data(function(d) {
                    return d;
                })
                .attr('class', 'bar');
            [ ... ]  
            // now update the elements' attributes and styles
                .attr("transform", "translate(" + cbar.x() + "," + cbar.y() + ")" + crot);

            [ ... ]
        };
        // return the cbar object to be called on selections
        return cbar;
    };

So, create a new plugin object, set some special parameters, define a build method and you should be good to go. One could of course pass some parameters into the ColorBar function and create ever more specialized colorbar prototypes, but the returns would be diminishing I think and the code less transparent. There’s an inevitable trade-off between reducing boiler-plate/cruft and introducing irritating ‘magic’.
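The create, parameterize, define-build recipe can be sketched without D3 at all. In the toy version below an array of values stands in for a D3 selection and 'building' just returns a string. The names basePlugin and textBar are hypothetical and exist only for this illustration.

```javascript
// Hypothetical D3-free base: returns a callable plugin with chainable accessors.
function basePlugin(defaults) {
    var params = {};
    Object.keys(defaults).forEach(function (k) { params[k] = defaults[k]; });

    // Calling the plugin on an array of data items stands in for selection.call().
    var plugin = function (items) {
        return items.map(function (item) {
            plugin.data(item);      // make the current datum available to build
            return plugin.build();
        });
    };
    // One chainable getter+setter per parameter.
    Object.keys(params).forEach(function (k) {
        plugin[k] = function (v) {
            if (!arguments.length) { return params[k]; }
            params[k] = v;
            return plugin;
        };
    });
    plugin.build = function () {};  // placeholder, replaced by specializations
    return plugin;
}

// Specialization: a text 'bar' whose length is scaled from the current datum.
function textBar() {
    var bar = basePlugin({ data: 0, max: 100, width: 20, symbol: '#' });
    bar.build = function () {
        var n = Math.round(bar.width() * bar.data() / bar.max());
        return bar.symbol().repeat(n);
    };
    return bar;
}

var rows = textBar().width(10).symbol('=')([20, 50, 100]);
console.log(rows.join('\n'));
```

The specialization only supplies defaults and a build method; everything else comes from the base, which is the division of labour the colorbar is aiming for.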

Let’s create a few random colorbars for demonstration:

var createRandomColorbar = function() {
    // We'll use one of the excellent colorbrewer color-sets
    var cols = ['Set3', 'YlGnBu', 'YlOrRd', 'BrBG', 'PRGn', 'PiYG', 'RdYlBu', 'Spectral', 'Pastel1'],
    col = cols[Math.floor(Math.random() * cols.length)];

    var crange = colorbrewer[col][9],
        numBars = parseInt((0.5 + Math.random()) * 50),
        cbar = new kcharts.ColorBar()
            .title('Foobar values')
            .height((0.5 + Math.random()) * 300)
            .width((0.5 + Math.random()) * 20)
            .x(400 + (Math.random()-0.5) * 600)
            .y(400 + (Math.random()-0.5) * 400)
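            .horizontal(Math.random() < 0.5) // half vertical
            // NOTE: the start of this call is missing from the listing; the colorScale
            // setter and the quantize scale type are assumptions
            .colorScale(d3.scale.quantize()
                        .domain([0,Math.random() * 600]).range(crange))
            .labelIndices(d3.range(0, numBars-1, parseInt(numBars/(5 + Math.random()*5))));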
    return cbar;
};

// create 30 svg contexts on the colorbar div
d3.select('#colorbar').selectAll('.colorbars').data(d3.range(30)).enter().append('svg').attr('class', 'colorbars');
// call a random colorbar on each
d3.selectAll('.colorbars').each(function(d) {
    d3.select(this).call(createRandomColorbar());
});

Which should create something like this:




There are a few bells and whistles one could add but it’s pretty usable and a good starting point. The main thing is that it was easier to write than previous efforts. Hopefully some best-practice will start to filter through allowing one to reduce the necessary scaffolding-code standing in the way of creativity. For a working demo go here, or go see the code at the github repo here.