Unofficial Python Module of the Week #2: configobj

Welcome to our third instalment of interesting Python modules. Unfortunately I’m a bit late with this section this week – in fact it’s next week already! The fourth instalment should be along towards the end of the week, thus catching me up.

Today we’re going to cover something which isn’t in the standard library, but is nonetheless very useful. The module is configobj, which is used for reading from and writing to INI-style configuration files. A simple INI file is shown below:

item1 = value
item2 = value2

[ section1 ]
item1 = value

[[ subsection ]]
item1 = value

In the above we can see the simple use of items, values, sections and subsections. Subsections can be nested as deep as you want, but I don’t think most applications will need more than two or three levels.
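Once parsed, a file like this maps naturally onto nested dictionaries. As a rough sketch of that shape (plain Python dictionaries written out by hand, not configobj’s own objects, and assuming every value is read as a string):

```python
# Hand-written mirror of the INI file above as nested dictionaries.
# Top-level items sit beside sections; subsections nest inside sections.
config_shape = {
    'item1': 'value',
    'item2': 'value2',
    'section1': {
        'item1': 'value',
        'subsection': {
            'item1': 'value',
        },
    },
}

print(config_shape['section1']['subsection']['item1'])
```

Each extra pair of square brackets in the file corresponds to one more level of nesting in the dictionary.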

Installation

As this module isn’t in the standard library, we need to install it. On most Linux distros it should be in the package repositories, for example on Fedora 14:

$ sudo yum install python-configobj

Windows and Mac users can install from PyPI by following the instructions on the homepage.

Basic Usage

Reading from a configuration file with configobj couldn’t really be any simpler:

import configobj
config = configobj.ConfigObj(filename)  # filename is the path to your INI file
myoption = config['item1']
mysectionoption = config['section1']['item1']
mysubsectionoption = config['section1']['subsection']['item1']

Basically, all you need to do is create a ConfigObj object by passing it a filename, then you just read from it as if it’s a dictionary object. Sections and subsections appear as nested dictionaries. Writing to the file is just as simple:

import configobj
config = configobj.ConfigObj(filename)  # filename is the path to your INI file
config['newoption'] = 'new stuff'
config.write()  # save the changes back to the file

No surprises here, you just write to it as if it were a dictionary. All you have to do is call the write() method when you’ve finished, in order to sync everything to disk.

That’s pretty much it for basic usage. There is much more you can do with configobj, including advanced stuff like validation of configuration files. Check out the documentation for more info.

Unofficial Python Module of the Week #1: shelve

Here we are, the second Unofficial Python Module of the Week. Yes, the second – we started from zero (obviously!). This week we are covering the shelve module. Shelve provides you with a very simple Python object store. You can use it where you need quick persistent storage of objects between program runs, and it’s much less overhead than using a database – even SQLite. Anyway, let’s dive straight into it:

>>> import shelve
>>> shelf = shelve.open("myshelf.db", writeback=True)

Here we import the shelve module (it’s in the standard library, so there’s no installation required). Then we open our persistent object store, supplying the filename that we want to store the objects in and the writeback parameter, which allows mutable objects to be stored more conveniently (otherwise changes are only written when an assignment is performed). The writeback parameter also causes data to be cached in memory, which can be quite memory intensive, so you should call shelf.sync() every so often to flush everything to disk.
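To see what writeback actually changes, here is a small sketch (Python 3.4 or later) that stores a list, appends to it in place, and then reopens the shelf. The helper name mutate_in_place and the temporary file names are made up for illustration:

```python
import os
import shelve
import tempfile


def mutate_in_place(path, writeback):
    """Store a list, append to it in place, then reopen and read it back."""
    with shelve.open(path, writeback=writeback) as shelf:
        shelf['numbers'] = [1, 2]
        shelf['numbers'].append(3)  # in-place change, no assignment
    with shelve.open(path) as shelf:
        return shelf['numbers']


workdir = tempfile.mkdtemp()
without_writeback = mutate_in_place(os.path.join(workdir, 'plain'), False)
with_writeback = mutate_in_place(os.path.join(workdir, 'cached'), True)
print(without_writeback)  # the append was lost
print(with_writeback)     # the append was saved when the shelf closed
```

Without writeback, each read hands back a fresh copy of the stored object, so the append only changes that copy. With writeback, the shelf keeps the object in its cache and writes it out again on sync() or close().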

You can store anything that can be handled by the Python pickle module in a shelf:

>>> shelf['thedict'] = {'one': 1, 'two': 2, 'three': 3}
>>> shelf.sync()

As you can see, using a shelf is just like using a dictionary. The only real limit is that the keys must be strings. You can also read back values from the shelf as with a dictionary:

>>> print(shelf['thedict'])
{'one': 1, 'two': 2, 'three': 3}

That’s just about it. Just remember to close the shelf when you’re finished with it:

>>> shelf.close()

If you want to find out more have a look at the official Python docs for shelve and Doug Hellmann’s PyMOTW posting on the subject.
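On Python 3.4 and later a shelf can also be used as a context manager, which closes it for you. A minimal sketch, assuming a throwaway file in a temporary directory and made-up score data, which also shows one way around the string-keys limit:

```python
import os
import shelve
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'scores')  # hypothetical location

# The with block closes the shelf on exit, even if an error occurs.
with shelve.open(path) as shelf:
    for player_id, score in [(1, 250), (2, 310)]:
        shelf[str(player_id)] = score  # keys must be strings

with shelve.open(path) as shelf:
    stored = {key: shelf[key] for key in sorted(shelf)}

print(stored)
```

Converting keys with str() is only a convention; you have to convert them back yourself when reading.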

Review: Piwik Analytics Software

If you read my previous post regarding the site overhaul that I’m currently doing you will have seen me mention that I’m now using the Piwik Open Source Analytics Package in place of Google Analytics. Well I’ve had it running for a few days and have played around with it a bit, so I thought I’d review it. I’m going to start with my reasons for moving from GA and then move along and score it on several different criteria:

  • Installation and Setup
  • Site integration
  • User interface
  • Extensibility (API availability)
  • Overall impressions (documentation, community, etc.)

The philosophical argument

As well as the obvious benefit (from a Freedom perspective) of using one less proprietary web service, there is also another reason that I switched away from Google Analytics. Basically, this was privacy. For a while I’ve been using technologies to limit the amount of data which leaks from my browser as I navigate the web, in order to reduce the amount of profiling of my web activities. This isn’t because I have anything to hide. I just don’t like the idea of large companies building up a huge database on me, without my permission. The upshot of this is that I found myself in the slightly hypocritical situation of blocking GA in my own browser, but using it to track others on my site.

The solution was obvious, remove GA from my site. However, I didn’t want to lose the valuable information that it provides me with. Also, I don’t have a problem with site owners collecting data that can help them, just with them sharing it with 3rd parties such as Google, who then build it into their larger profiling efforts. A quick search turned up Piwik which aims to provide a full featured GA replacement that you can run on your own server. Because site owners run their own instances, they remain in charge of their tracking of users, retain ownership of the data and best of all don’t give any data to Google.

With the aim of responsible and unobtrusive tracking in mind I’ve added a page to my site to allow users to Opt-out of the Piwik tracking by means of a cookie. The link is also accessible from the sidebar under the copyright notice. I’m afraid some of the text on that page is pretty difficult to see with my current theme, but I’m working on this. For now just uncheck the check box to opt-out.

Right, on to the main event, the actual review…

1. Installation and Setup

There’s actually not much to say here, which is because installation was ridiculously easy! I just downloaded the zip to my server (with wget) and unzipped it into my server root directory. This produced a directory called ‘piwik’ and a ‘How to Install Piwik.html’ file, which if you point your browser at it will redirect you to the installation instructions. The rest of the installation was fairly simple: following the instructions, I pointed my browser at the ‘/piwik/’ directory of my site and was greeted by the installer. Following this was really easy. You’ll need to create a MySQL database when prompted for the database info, but that’s about as hard as it gets. Towards the end you’ll be prompted to set up your site with Piwik, which involves entering a few details about the site, then you’ll be provided with a snippet of JavaScript to add to your site template. Which leads me neatly into the next section…

2. Site Integration

I didn’t copy and paste the JavaScript into my template, instead opting to install the WP-Piwik addon for WordPress. This made the set up easy and also gave me a widget on my WordPress admin dashboard which gives me a nice overview of my site visits. As I already said I was also able to add a widget to the site to enable visitors to opt-out of tracking. This was also simple, just involving a copy and paste of a couple of lines of HTML from one of the settings pages into a WordPress page. Easy!

You can also integrate Piwik widgets with your site, by following the instructions in the documentation, this is a neat feature, especially if you have a custom start page set in your web browser (something which I have yet to get around to making).

I also investigated the campaigns functionality in order to track entries to my site from the RSS feed. This is really simple to use: all you have to do is append the query string ‘?piwik_campaign=NAME’ to the end of a URL, where NAME is the name of your campaign, and visits through that URL will show up under that campaign. I found that I could integrate this with WordPress pretty well by adding the following snippet of code to the functions.php file of my theme:

function piwik_track_feed($url)
{
     // add_query_arg() picks '?' or '&' depending on whether $url already has a query string
     return add_query_arg("piwik_campaign", "RSS", $url);
}

add_filter("the_permalink_rss", "piwik_track_feed");

If you now check the URLs in your RSS feeds, they will all have the query string added and clicks will be attributed to the ‘RSS’ campaign in Piwik.
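One thing to watch when tagging URLs yourself: tacking ‘?piwik_campaign=…’ onto the end only works if the URL has no query string yet. As an illustration outside WordPress, here is a Python sketch that copes with both cases. The helper add_campaign and the example.com URLs are hypothetical, not part of Piwik:

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def add_campaign(url, name):
    """Return url with piwik_campaign=name added to its query string."""
    parts = urlsplit(url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.append(('piwik_campaign', name))
    return urlunsplit(parts._replace(query=urlencode(query)))


print(add_campaign('http://example.com/post/', 'RSS'))
# http://example.com/post/?piwik_campaign=RSS
print(add_campaign('http://example.com/?p=42', 'RSS'))
# http://example.com/?p=42&piwik_campaign=RSS
```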

3. User Interface

The Piwik user interface is really nice. I’ve included some screenshots below, so that you can make up your own mind. It’s pretty similar to the GA user interface, only cleaner and all the AJAX stuff makes it feel really responsive. I also love the real time tracking widget, which is something GA totally lacks. The only bad thing about the UI is the requirement of Flash for the graphs. I hate Flash and it doesn’t have a reliable 64-bit Linux version, which means I only have it installed on my netbook. Oh, and before you ask, I tried it with Gnash and it didn’t work!

4. Extensibility

By extensibility, I was primarily interested in API access. There’s certainly no shortage of this with two APIs listed on the documentation page. One API is for performing tracking, which I didn’t need given my usage of the WordPress plugin. I looked instead at the analytics API, which allows you to access all the data through simple HTTP requests. I was able to write a simple Python script to email me my main statistics once a day, in about an hour (including working out how the Python email and smtplib modules work!). Performing a Piwik API call in Python is as simple as:

def getVisits(idSite, period, date):
    url = "%s/index.php?module=API&method=VisitsSummary.get&idSite=%d&perio$
    return json.load(urllib2.urlopen(url))

Of course, as it’s Python it’s ridiculously simple!
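For anyone on Python 3, a rough sketch of the same kind of call follows. This is not the original script: it assumes the usual analytics API parameters (period, date, format and token_auth), and PIWIK_URL and TOKEN are placeholders you would replace with your own values.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

PIWIK_URL = 'http://example.com/piwik'  # hypothetical install location
TOKEN = 'anonymous'  # placeholder; use your own token_auth


def build_visits_url(id_site, period, date):
    """Build a VisitsSummary.get request URL that asks for JSON output."""
    params = [
        ('module', 'API'),
        ('method', 'VisitsSummary.get'),
        ('idSite', id_site),
        ('period', period),
        ('date', date),
        ('format', 'json'),
        ('token_auth', TOKEN),
    ]
    return '%s/index.php?%s' % (PIWIK_URL, urlencode(params))


def get_visits(id_site, period, date):
    """Fetch and decode the summary (needs a reachable Piwik server)."""
    with urlopen(build_visits_url(id_site, period, date)) as response:
        return json.loads(response.read().decode('utf-8'))


print(build_visits_url(1, 'day', 'today'))
```

Keeping the URL building separate from the network request means you can check the URL by eye (or paste it into a browser) before wiring it into a script.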

Of course, if you find something that you can’t do with the API (which is unlikely, because it seems to cover everything), then you can access the data in the database – because it’s in YOUR database. You can also back up and secure your data exactly how you want to. This is something that GA just can’t compete with!

5. Overall Impressions

My impressions of Piwik as a project have been really good. The documentation is excellent and there seems to be a good community behind it. As a product it’s a pleasure to use, really easy to install and just works. The reliance on Flash for the graphs is a bit disappointing, but perhaps this will change in the future as HTML5 matures. Here are the obligatory scores:

  • Installation and Setup – 5/5
  • Site Integration – 4/5
  • User Interface – 3/5
  • Extensibility – 5/5

Overall Score: 4/5

Verdict: If you’re currently using Google Analytics, stop it! (and use this instead)

Unofficial Python Module of the Week #0: argparse

[EDIT: This series has been renamed so as not to conflict with Doug Hellmann’s excellent Python Module of the Week series. See UPMotW #1 for details]

I’m going to try something different today. This is going to be the first in a series (hopefully once weekly) of Python Modules of the Week. This has probably been done to death elsewhere, but I’m using it as an opportunity to learn some more Python and to cement some of these modules in my mind.

Argparse is a Python module for processing command line arguments. It’s new in Python 2.7 and is designed to replace the optparse module. I’m not sure what the rationale for the change was as the two modules look very similar to me. However, suffice it to say that you should use argparse in all new code.

The main class of argparse is called ArgumentParser, it allows you to add arguments to your program and parse the provided arguments at runtime:

import argparse

parser = argparse.ArgumentParser()
parser.add_argument("-v", "--verbose", action="store_true") # a flag type argument
parser.add_argument("value", type=int) # an integer positional argument
...
args = parser.parse_args() # parse the args from sys.argv
print(args.value) # arguments accessed as data members

You can also add meta-data such as help and descriptions to your arguments via keyword arguments to many of the methods. Take a look at the module documentation for examples. One nice thing is that when you provide this data, argparse automatically generates the -h and --help flags and populates them with nicely formatted help output using the info you provide.
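As a small sketch of how that meta-data is supplied (the program name ‘resize’ and its arguments are invented for this example):

```python
import argparse

parser = argparse.ArgumentParser(
    prog='resize',  # hypothetical program name
    description='Resize an image to a given width.')
parser.add_argument('width', type=int, help='target width in pixels')
parser.add_argument('-v', '--verbose', action='store_true',
                    help='print progress messages')

# Passing a list parses it instead of sys.argv, which is handy for testing.
args = parser.parse_args(['-v', '640'])
print(args.width, args.verbose)

help_text = parser.format_help()  # the same text that -h would print
print(help_text)
```

The description and each help string end up in the generated help output, alongside the automatically added -h/--help entry.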

The argument parsing provided by argparse allows you to create some very complex command line interfaces to your scripts. I especially like the sub-commands framework, which allows each sub-command to exist as its own separate entity with its own options and help information. To create an argument parser with a sub-command is just a few lines:

import argparse

parser = argparse.ArgumentParser()
subcommands = parser.add_subparsers()
mysubcommand = subcommands.add_parser('mysubcommand')

Of course, each sub-command is an ArgumentParser in its own right, so you can add all the options that you normally would, including command line flags, positional arguments and help information. Also each sub-command will be listed in the help and will also have its own help page which can be accessed via:

./pyprog.py mysubcommand --help
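To show how sub-commands can be tied to actual behaviour, here is a minimal sketch using set_defaults() to attach a function to each sub-command. The ‘greet’ and ‘count’ commands and their functions are made up for illustration:

```python
import argparse


def greet(args):
    return 'Hello, %s!' % args.name


def count(args):
    return len(args.words)


parser = argparse.ArgumentParser(prog='pyprog.py')
subcommands = parser.add_subparsers(dest='command')

greet_parser = subcommands.add_parser('greet', help='greet someone by name')
greet_parser.add_argument('name')
greet_parser.set_defaults(func=greet)  # remember which function to run

count_parser = subcommands.add_parser('count', help='count the words given')
count_parser.add_argument('words', nargs='*')
count_parser.set_defaults(func=count)

args = parser.parse_args(['greet', 'World'])
result = args.func(args)
print(result)
```

Because each sub-command stores its own func, the main program only has to call args.func(args) and never needs a chain of if statements to work out which command was chosen.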

All in all, this is a very useful Python module to know about. Its ease of use and the number of things it ‘just takes care of’ make it a pleasure to use. Hopefully it’ll have you writing nice command line interfaces to all those hacked together scripts that are currently controlled by commenting bits of them in and out!

Automating routine tasks…

I’ve been thinking more recently about automating routine computing tasks. Using Free Software this is ridiculously easy, usually a combination of Python and Cron does the trick. More difficult I have found is actually working out what to automate. It’s very easy to get into the habit of doing something and then not realise that you do it every day, or every hour when in front of the computer.

One of my major successes in this area has been in terms of feed reading. A while ago I set up rss2email on my server to check my feeds and email the stories to me. Previous to this I actually found it really hard to ever get around to reading news feeds. I think this was mainly because checking news feeds suffers from the ‘extra inbox problem’: it’s just more stuff to check. Having news stories emailed to me fixes this and I can also read them easily on my phone. Because I use IMAP email, my read/unread status is also synchronised on all my computers.

Now I’m kind of at a brick wall. I don’t know what else I can set up to reduce the number of routine tasks I do. The purpose of this post is to crowd source that problem. I want everyone reading this to get back to me with the things they’ve set up to help them. You can get in touch either through the comments on this post, or via identi.ca. If I get enough responses I’ll write a follow up post detailing some of the best ideas.