Thursday, October 09, 2014

Custom multiseries trend using Keen.io API

The initial goal was to create a trend of event data related to the time of day or day of week when the event occured. Later on, it seemed like a good idea to display different timeframes on the same trend.

The end result shows a trend, calculating an average value of a metric (buildtime duration, in this example) for all events that occured in the same time interval (day of week, in this example), for different timeframes (last week, month and year, in this example), which are displayed as different series in the same chart, to be able to compare them and visually notice an evolution or an anomally.

This trend is part of the Buildtime Trend project, you can see the code in action here.

Read on to see how it is done.
The Keen.io service and API is used to store, analyse and visualise the event data. I'd like to refer to the Keen.io tutorials on how to create a query and generate a chart.

Generate and group by time intervals

 

First of all, the event data has a timestamp, so in a simplified example, an event would look like this :

  { id: "1234abcd", duration: "97", timestamp: "2014-10-09T18:32:14Z"}

But to group events on time intervals, like day of week, or hour (time of day), the timestamp  has to be split into its components (thanks to Ryan Spraetz of Keen.io for the suggested workaround), for example :

  {id: "1234abcd",
    duration: "97",
    timestamp: {
      isotimestamp: "2014-10-09T18:32:14Z",
      day_of_week: 4,
      hour_24: 18,
      hour_12 : 6,
      hour_AMPM : PM,
      ... 
    }
  }


Look here for the code to split the timestamp (in Python) and a full example of a split timestamp.

A query to group events by day of week, calculating an average value of duration, for all events of the last week, would look like this :

var queryLastWeek = new Keen.Query("average", {
  eventCollection: "builds",
  timeframe: "last_week",
  targetProperty: "duration",
  groupBy: "timestamp.day_of_week"
}); 
 
Using an example from the Keen.io tutorial, you could easily create a chart with one series of data.
If timeframe is changed to 'last_month' or 'last_year', you get the same query for a longer timeframe.

Combine several queries in one chart


So now we have 3 queries : queryLastWeek, queryLastMonth and queryLastYear

Which are passed as parameters to the Keen.io client.run method, where the result of the 3 queries are merged to one array by method mergeSeries (see below). This merged array (chart_data) is passed to keen.Visualisation to draw the chart you can see at the top of this post :
var request = client.run([queryLastWeek, queryLastMonth, queryLastYear], function() {
  series_captions = ["Last week", "Last month", "Last year"];
  index_captions = ["Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"];
  chart_data = mergeSeries(
    this.data,
    index_captions,
    "timestamp.day_of_week",
    series_captions
  );
  // draw chart
  window.chart = new Keen.Visualization(
    {result: chart_data},
    document.getElementById("chart_avg_buildtime_weekday"),
    {
       chartType: "columnchart",
       title: "Average buildtime per day of week",
       chartOptions: {
       vAxis: { title: "duration [s]" },
       hAxis: { title: "Day of week" }
    }
  });
});

You can find the full code here.

Merge data series

First this methods creates a multilevel array with i rows (one for each series, in this example i = 3 (week, month, year)) and j columns (one for each index value in the query, in this example j = 7 : 'Sun' to 'Sat').
Then the methods takes the Keen.io data array, with the results of all queries as a parameter, loops over the result from each query and assigns the values to the corresponding index in a multilevel array. As a result all values corresponding to 'Monday' will be in the same place in the array.
function mergeSeries(data, index_captions, value_caption, series_captions) {
  chart_data = [];
  // create and populate data array
  for (i = 0; i < index_captions.length; i++) {
    chart_data[i]={caption: index_captions[i]};
    // populate all series
    for (j = 0; j < series_captions.length; j++) {
      chart_data[i][series_captions[j]] = 0;
    }
  }
  // loop over all query result sets
  for (j = 0; j < data.length; j++) {
    timeframe_result = data[j].result;
    timeframe_caption = series_captions[j];
    // copy query data into the populated array
    for (i = 0; i < timeframe_result.length; i++) {
      index = parseInt(timeframe_result[i][value_caption])
      chart_data[index][timeframe_caption] = timeframe_result[i]["result"];
    }
  }
  return chart_data;
}

Some improvements

Some ideas to make it more efficiently:
  • A special 'groupby' parameter for timestamps as part of the Keen.io API, would avoid splitting a timestamp and storing all the components in the database
  • Currently, 3 almost identical queries are created to generate the results for the different timeframes. It would be more efficient to repeat the same query several times with only the timeframe changing. Still something to investigate.