As a follow-up to our episode on ParaViewWeb, we present a visualization of the WordPress page-view statistics for the episode posts on our blog. While most listeners subscribe to our RSS feed, the blog page views provide a glimpse into the time course of episode attention.
Hat tip goes to Anthony Scopatz for tracking down a method to grab the WordPress stats. For a summary of the instructions, wget the URL http://stats.wordpress.com/csv.php:
wget -O instructions.txt http://stats.wordpress.com/csv.php
To use the API, you’ll have to sign up for a free Akismet key. The following request was used to get the raw site post views (API key has been replaced):
wget -O stats.csv 'http://stats.wordpress.com/csv.php?api_key=0123456789abc&blog_uri=http://inscight.org&blog_id=0&table=postviews&days=-1'
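If you would rather stay in Python, a rough equivalent of that wget call using Python 2's urllib is sketched below. The api_key value is the same placeholder as above, not a working key:

import urllib

# Placeholder key and our blog's parameters, copied from the wget line above.
params = urllib.urlencode({
    'api_key': '0123456789abc',
    'blog_uri': 'http://inscight.org',
    'blog_id': '0',
    'table': 'postviews',
    'days': '-1',
})
urllib.urlretrieve('http://stats.wordpress.com/csv.php?' + params, 'stats.csv')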
Our raw data set looks something like this:
"2011-02-19",0,"Home page","http://inscightpodcast.wordpress.com/",37 "2011-02-19",1,"Episode 0: Strata Con & Big Data","http://inscight.org/2011/02/16/episode_0/",23 "2011-02-19",13,"Bio","http://inscight.org/bio/",1 "2011-02-18",0,"Home page","http://inscightpodcast.wordpress.com/",49
Next, the POP (power of Python) is applied to pull out the episode posts and bin their views by day, counting from the inaugural day of the blog. To get a more continuous approximation of post attention, we use a kernel density estimation technique (in this case it amounts to convolution with a Gaussian, but the bandwidth rule chooses the sigma for us). Finally, we export the dataset as a VTK image.
#!/usr/bin/env python
import re

from matplotlib.mlab import csv2rec, rec_drop_fields
import numpy as np
from scipy.stats import gaussian_kde
import vtk

# Get our data
stats_filename = 'stats.csv'
stats = csv2rec(stats_filename)

# Only look at episode posts
episode_post_ids = dict()
episode_title_re = re.compile(r'^[Ee]pisode [0-9]+:')
for rec in stats:
    if episode_title_re.search(rec['post_title']):
        episode_post_ids[rec['post_id']] = rec['post_title']

# Sort the episode post ids into chronological order
post_ids = sorted(episode_post_ids)

# Put the views per day in a numpy array indexed by episode
start_day = stats[-1]['date'].toordinal()
end_day = stats[0]['date'].toordinal()
days = end_day - start_day + 1
posts = len(post_ids)
post_views = np.zeros((posts, days))

# To put the raw day-binned data into the array, this would suffice:
#for rec in stats:
#    if rec['post_id'] in post_ids:
#        day_idx = rec['date'].toordinal() - start_day
#        post_id_idx = post_ids.index(rec['post_id'])
#        post_views[post_id_idx, day_idx] = rec['views']

# Instead, use Gaussian kernel density estimation to smooth the result.
# Each view becomes one sample at its day index.
post_view_samples = [[] for i in range(posts)]
for rec in stats:
    if rec['post_id'] in post_ids:
        day_idx = rec['date'].toordinal() - start_day
        post_id_idx = post_ids.index(rec['post_id'])
        for ii in range(rec['views']):
            post_view_samples[post_id_idx].append(day_idx)

positions = np.arange(days)
for episode in range(posts):
    episode_views = np.array(post_view_samples[episode], dtype=np.float64)
    kernel = gaussian_kde(episode_views)
    # Scale the density back up to view counts per day.
    smoothed_views = kernel.evaluate(positions) * len(episode_views)
    post_views[episode, :] = smoothed_views

# Export to a VTK image for analysis with ParaView.
image_importer = vtk.vtkImageImport()
post_views_str = post_views.tostring()
image_importer.CopyImportVoidPointer(post_views_str, len(post_views_str))
image_importer.SetDataScalarTypeToDouble()
image_importer.SetNumberOfScalarComponents(1)
# x runs over days, y over episodes.
image_importer.SetDataExtent(0, post_views.shape[1] - 1,
                             0, post_views.shape[0] - 1,
                             0, 0)
image_importer.SetWholeExtent(0, post_views.shape[1] - 1,
                              0, post_views.shape[0] - 1,
                              0, 0)
# One week per unit along the day axis keeps the aspect ratio reasonable.
image_importer.SetDataSpacing(1.0/7.0, 1.0, 1.0)
image_importer.Update()

writer = vtk.vtkStructuredPointsWriter()
writer.SetInputConnection(image_importer.GetOutputPort())
writer.SetFileName('post_views.vtk')
writer.Update()
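An aside on the kernel density step: because every sample sits on an integer day index, evaluating the Scott's-rule KDE and scaling by the number of samples closely matches smoothing the day-binned counts with a Gaussian whose sigma is the KDE bandwidth. The sketch below illustrates this with made-up counts; gaussian_filter1d and the numbers are our own illustration, not part of the script above:

import numpy as np
from scipy.stats import gaussian_kde
from scipy.ndimage import gaussian_filter1d

# Made-up daily view counts for a single post over ten days.
daily_counts = np.array([0, 3, 8, 15, 9, 4, 2, 1, 1, 0])

# One sample per view at its day index, as in the script above.
samples = np.repeat(np.arange(len(daily_counts)), daily_counts)

kernel = gaussian_kde(samples)            # Scott's rule picks the bandwidth
sigma = np.sqrt(kernel.covariance[0, 0])  # the implied Gaussian sigma, in days

positions = np.arange(len(daily_counts))
kde_views = kernel.evaluate(positions) * len(samples)
conv_views = gaussian_filter1d(daily_counts.astype(float), sigma, mode='constant')

print('sigma = %.3f days' % sigma)
print(np.round(kde_views, 2))
print(np.round(conv_views, 2))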
Our eyes and visual cortex have an easier time detecting detailed variations in a scalar field when it is represented as a plot rather than as a colormap.
Instead of a line plot for 1D datasets, we can warp the height of the grid by its scalar value for a 2D dataset to see finer details in the intensity patterns of an image. In ParaView, this is a simple “File -> Open”, “Filters -> Alphabetical -> Clean To Grid”, “Filters -> Alphabetical -> Warp By Scalar”. Saving a ParaView state file makes it easy to import the result into ParaViewWeb!
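For the batch-minded, roughly the same pipeline can be set up from pvpython. This is only a sketch of our GUI steps; the generated function names (taken from the menu labels) and properties may differ between ParaView releases:

from paraview.simple import *

# Load the VTK image written by the script above.
reader = LegacyVTKReader(FileNames=['post_views.vtk'])

# Same filters as the GUI steps: "Clean to Grid", then "Warp By Scalar".
clean = CleantoGrid(Input=reader)
warp = WarpByScalar(Input=clean)
warp.ScaleFactor = 1.0  # exaggerate or flatten the view counts as needed

Show(warp)
Render()

# A state file like this one is what gets loaded into ParaViewWeb.
SaveState('post_views.pvsm')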
Click here for an interactive 3D representation of the dataset!
