Channel: inSCIght

Interactive visualization of WordPress blog view statistics.


As a follow-up to our episode on ParaViewWeb, we present a visualization of the WordPress statistics for episode page views of our blog. While most listeners subscribe to our RSS feed, the blog page views provide a glimpse into the time course of episode attention.

Hat tip goes to Anthony Scopatz for tracking down a method to grab the WordPress stats. For a summary of the instructions, wget the URL http://stats.wordpress.com/csv.php:

wget -O instructions.txt http://stats.wordpress.com/csv.php

To use the API, you’ll have to sign up for a free Akismet key. The following request was used to get the raw site post views (API key has been replaced):

wget -O stats.csv 'http://stats.wordpress.com/csv.php?api_key=0123456789abc&blog_uri=http://inscight.org&blog_id=0&table=postviews&days=-1'
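
The same request can be assembled in Python, which avoids hand-escaping the query string. This is only a minimal Python 3 sketch: the helper name build_stats_url is made up for illustration, the API key is the same placeholder as above, and the parameter names are simply those that appear in the wget command.

```python
from urllib.parse import urlencode

STATS_ENDPOINT = 'http://stats.wordpress.com/csv.php'


def build_stats_url(api_key, blog_uri, table='postviews', days=-1, blog_id=0):
    """Return the CSV stats URL (hypothetical helper, not part of any API)."""
    params = [
        ('api_key', api_key),
        ('blog_uri', blog_uri),
        ('blog_id', blog_id),
        ('table', table),
        ('days', days),
    ]
    # urlencode percent-escapes the blog URI so it is safe inside a query string.
    return STATS_ENDPOINT + '?' + urlencode(params)


url = build_stats_url('0123456789abc', 'http://inscight.org')
print(url)
# The file could then be fetched with, for example,
# urllib.request.urlretrieve(url, 'stats.csv').
```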

Our raw data set looks something like this:

"2011-02-19",0,"Home page","http://inscightpodcast.wordpress.com/",37
"2011-02-19",1,"Episode 0: Strata Con & Big Data","http://inscight.org/2011/02/16/episode_0/",23
"2011-02-19",13,"Bio","http://inscight.org/bio/",1
"2011-02-18",0,"Home page","http://inscightpodcast.wordpress.com/",49

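To make the layout of those rows concrete, here is a small standard-library sketch (Python 3) that parses the sample lines above and keeps only the episode posts. The column names are an assumption based on the field names the script in this post uses (the post_permalink name in particular is a guess); the actual header row of the export is not shown here.

```python
import csv
import io
import re
from datetime import date

SAMPLE = '''"2011-02-19",0,"Home page","http://inscightpodcast.wordpress.com/",37
"2011-02-19",1,"Episode 0: Strata Con & Big Data","http://inscight.org/2011/02/16/episode_0/",23
"2011-02-19",13,"Bio","http://inscight.org/bio/",1
"2011-02-18",0,"Home page","http://inscightpodcast.wordpress.com/",49
'''

# Assumed column names; the real export may label them differently.
FIELDS = ['date', 'post_id', 'post_title', 'post_permalink', 'views']
EPISODE_RE = re.compile(r'^[Ee]pisode [0-9]+:')


def parse_rows(text):
    """Yield one dict per CSV row with typed date, post_id and views."""
    for row in csv.DictReader(io.StringIO(text), fieldnames=FIELDS):
        year, month, day = (int(part) for part in row['date'].split('-'))
        row['date'] = date(year, month, day)
        row['post_id'] = int(row['post_id'])
        row['views'] = int(row['views'])
        yield row


rows = list(parse_rows(SAMPLE))
# Keep only posts whose title looks like "Episode N: ...".
episodes = [r for r in rows if EPISODE_RE.search(r['post_title'])]
for r in episodes:
    print(r['date'], r['post_id'], r['post_title'], r['views'])
```
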
Next, the POP (power of Python) is applied to pick out the episode posts and index their views by day, counting from the inaugural day of the blog. To get a more continuous approximation of post attention, we use a kernel density estimation technique (in this case it is equivalent to convolving the daily counts with a Gaussian, except that the sigma is chosen from the data by a rule; gaussian_kde uses Scott's rule by default). Finally, we export the dataset as a VTK image.

#!/usr/bin/env python

import re

# Note: csv2rec is only available in older Matplotlib releases.
from matplotlib.mlab import csv2rec

import numpy as np

from scipy.stats import gaussian_kde

import vtk


# Get our data
stats_filename = 'stats.csv'
stats = csv2rec(stats_filename)


# Only look at episode posts
episode_post_ids = dict()
episode_title_re = re.compile(r'^[Ee]pisode [0-9]+:')
for rec in stats:
    if episode_title_re.search(rec['post_title']):
        episode_post_ids[rec['post_id']] = rec['post_title']

# Sort by chronological order
post_ids = sorted(episode_post_ids)

# Put the views per day in a numpy array indexed by episode
start_day = stats[-1]['date'].toordinal()
end_day   = stats[0]['date'].toordinal()
days = end_day - start_day + 1
posts = len(post_ids)
post_views = np.zeros((len(post_ids), days))

# To put the raw day-binned data into the array instead, do the following
#for rec in stats:
    #if rec['post_id'] in post_ids:
        #day_idx = rec['date'].toordinal() - start_day
        #post_id_idx = post_ids.index(rec['post_id'])
        #views = rec['views']
        #post_views[post_id_idx, day_idx] = views

# Gaussian Kernel Density Estimation to smooth the result
post_view_samples = [[] for i in range(posts)]
for rec in stats:
    if rec['post_id'] in post_ids:
        day_idx = rec['date'].toordinal() - start_day
        post_id_idx = post_ids.index(rec['post_id'])
        views = rec['views']
        for ii in range(views):
            post_view_samples[post_id_idx].append(day_idx)

positions = np.arange(days)
for episode in range(len(post_views)):
    episode_views = np.array(post_view_samples[episode], dtype=np.float64)
    kernel = gaussian_kde(episode_views)
    smoothed_views = kernel.evaluate(positions) * len(episode_views)
    post_views[episode,:] = smoothed_views

# Export to a VTK image for analysis with ParaView.
image_importer = vtk.vtkImageImport()
# tobytes() replaces the deprecated tostring(); both return the raw buffer.
post_views_str = post_views.tobytes()
image_importer.CopyImportVoidPointer(post_views_str, len(post_views_str))
image_importer.SetDataScalarTypeToDouble()
image_importer.SetNumberOfScalarComponents(1)
image_importer.SetDataExtent(0, post_views.shape[1] - 1,
        0, post_views.shape[0] - 1,
        0, 0)
image_importer.SetWholeExtent(0, post_views.shape[1] - 1,
        0, post_views.shape[0] - 1,
        0, 0)
image_importer.SetDataSpacing(1.0/7.0, 1.0, 1.0)
image_importer.Update()

writer = vtk.vtkStructuredPointsWriter()
writer.SetInputConnection(image_importer.GetOutputPort())
writer.SetFileName('post_views.vtk')
writer.Update()
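
The remark that the density estimate amounts to a Gaussian convolution can be checked without SciPy. The sketch below (Python 3) assumes the default bandwidth of gaussian_kde, Scott's rule, which for 1D data gives sigma = sample standard deviation times n^(-1/5). The function names and the toy view counts are made up for illustration and are not real blog statistics.

```python
import math
import statistics


def scott_sigma(samples):
    """Scott's-rule kernel width for 1D samples (assumed gaussian_kde default)."""
    n = len(samples)
    return statistics.stdev(samples) * n ** (-1.0 / 5.0)


def smooth_views(views_per_day):
    """Convolve day-binned view counts with a Gaussian of Scott's-rule width."""
    # Expand counts into one sample per view, as the script above does.
    samples = [day for day, views in enumerate(views_per_day)
               for _ in range(views)]
    sigma = scott_sigma(samples)
    norm = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    smoothed = []
    for x in range(len(views_per_day)):
        total = 0.0
        # Each day contributes a Gaussian bump weighted by its view count.
        for day, views in enumerate(views_per_day):
            total += views * norm * math.exp(-0.5 * ((x - day) / sigma) ** 2)
        smoothed.append(total)
    return smoothed


# Toy data: views on ten consecutive days.
views = [0, 0, 23, 11, 6, 3, 2, 1, 0, 0]
smoothed = smooth_views(views)
print([round(v, 2) for v in smoothed])
```

Because each view contributes one unit of area, the smoothed curve keeps roughly the same total as the raw counts, which is why the script multiplies the density by the number of samples.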

Our eyes and visual cortex have an easier time detecting detailed variations in a scalar field when it is represented as a plot rather than as a colormap.

inSCIght views. Time progresses from left to right and episode number is in the vertical direction.

Instead of a line plot as for 1D datasets, we can warp the height of the grid by its scalar value for a 2D dataset to see finer details in the intensity patterns of an image. In ParaView, this takes three steps: “File -> Open”, “Filters -> Alphabetical -> Clean To Grid”, “Filters -> Alphabetical -> Warp By Scalar”. Saving a ParaView state file makes it easy to import the result into ParaViewWeb!
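
Conceptually, warping by scalar displaces each grid point along a normal direction by its scalar value times a scale factor. The sketch below illustrates that idea in plain Python; it is not ParaView's implementation, and the function name, the scale factor, and the toy values are assumptions. The 1/7 spacing along the time axis mirrors the SetDataSpacing call in the script above.

```python
def warp_by_scalar(values, spacing=(1.0 / 7.0, 1.0), scale=0.1):
    """Return (x, y, z) points for a 2D grid lifted by its scalar values.

    values[j][i] is the scalar at column i (day) and row j (episode).
    """
    dx, dy = spacing
    points = []
    for j, row in enumerate(values):
        for i, value in enumerate(row):
            # Displace along the +z normal of the flat image plane.
            points.append((i * dx, j * dy, scale * value))
    return points


# Toy grid: two episodes, three days (not real data).
grid = [[0.0, 10.0, 4.0],
        [0.0, 0.0, 8.0]]
for point in warp_by_scalar(grid):
    print(point)
```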

Click here for an interactive 3D representation of the dataset!


Filed under: General Interest
