My Road to Docker: Traefik

My Road to Docker – Part 1: My Web Stack

This post may contain affiliate links. Please see the disclaimer for more information.

This post is part of a series on this project.


I’ve been following and using Docker since its early days; I think 0.6 was one of the first releases I tried. That said, I’ve never had much luck deploying it on meaningful parts of my infrastructure. Usually, I would start a new deployment with the intention of using Docker, then hit some issue that made it difficult (or at least more difficult than it should be). Either that, or I’d run some Docker containers as a test deployment for a while and then pull them down. It’s safe to say my road to Docker has been long, winding and full of potholes!

Recently, I’ve been giving my infrastructure choices more thought. My main goals are to reduce maintenance while keeping the high level of separation and security I have between my different services. As detailed in my last post on my self-hosting setup, I’ve been using LXD fairly successfully. The problem is that LXD containers behave more like virtual machines: each container is a little Linux system which needs to be kept up to date, monitored and generally looked after. With one container per service this becomes a chore. In practice I don’t have one LXD container per service; instead I group dependent services, such as databases, with their parent. Even so, I’ve ended up with quite a few containers.

Decisions, decisions…

I really like the separation I get from keeping my LXD containers on separate VLANs. Docker does support this via macvlan, but the last time I played with it I had to manually assign IPs to each container. It looks like Docker will now do this automatically; however, it allocates from its own IP pool and won’t use your DHCP server, which means I can’t assign IPs from my pfSense firewall. Having one public (LAN) IP per Docker service also kinda sucks, and I don’t want to rely on just one project for the security of my system.

Additionally, I need to run at least one VM anyway for my virtualised pfSense firewall. This brings me to the idea of running Docker inside VMs. Each VM can be attached to the relevant VLAN interface and get its IP from DHCP. I also get the benefit of two levels of isolation using different technologies. I haven’t yet decided whether to use Proxmox or Ubuntu Server+Libvirt+Cockpit for the host systems; hopefully that will be the subject of a future post.

On To Today

All the above is pretty much background, because I actually don’t want to talk about my locally hosted stuff today. Instead I’m going to talk about my stuff in the cloud – specifically the hosting for this site.

I’ve used Linode since around 2011, when I set up my self-hosted mail server, and I’ve hosted this site on a fairly standard LAMP setup on that same server ever since. The mail server install is now getting on a bit, but being CentOS-based it’s all still supported for updates until next year. However, in preparation for rebuilding the mail server, I decided to move the web stuff out onto another server and run it in Docker. This gives me the same Docker-on-VM setup as on my local infrastructure; only the hypervisor UI differs, since it’s all KVM underneath.

Since I was moving the site to a new server, I also decided to move closer to my audience, which according to my Matomo statistics is predominantly US-based. To this end I spun up a new $5 Linode (Nanode) running Ubuntu 18.04 in their New Jersey datacenter. This should also be pretty fast for European visitors (to be honest, it’s lightning fast even from NZ).

Getting Going

After the usual ‘first 10 minutes on a server’ setup, I set about installing Docker. Instead of my usual method of adding the APT repository, I decided to see how installing from a Snap would work in production:

A few seconds later Docker was installed. However, I ran into a wrinkle when trying to add myself to the docker group: even after doing so, Docker commands still required root access. The solution is to create the docker group before installing the Snap, so I removed the Snap, added the group and reinstalled. So the full instructions for installing Docker via Snap should more accurately be:
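If you want to confirm you’ve got the order right before running the install, a quick pre-flight check helps. Here’s a minimal Python sketch (standard library only, Linux assumed) that looks the docker group up in the system group database and checks whether your user is in it. The function names are my own illustration, not part of any Docker or Snap tooling. Note that it reads the group database rather than your current session, so a freshly added membership only applies to processes started after you log in again (or run newgrp).

```python
import grp
import os
import pwd


def group_exists(name: str) -> bool:
    """True if `name` exists in the system group database."""
    try:
        grp.getgrnam(name)
    except KeyError:
        return False
    return True


def user_in_group(user: str, group: str) -> bool:
    """True if `user` is a listed member of `group` or has it as primary group."""
    try:
        entry = grp.getgrnam(group)
    except KeyError:
        return False
    if user in entry.gr_mem:
        return True
    try:
        return pwd.getpwnam(user).pw_gid == entry.gr_gid
    except KeyError:
        return False


def preflight(group: str = "docker") -> str:
    """Describe whether it looks safe to install the Docker snap yet."""
    user = pwd.getpwuid(os.getuid()).pw_name
    if not group_exists(group):
        return f"'{group}' group missing: create it and add {user} before installing the snap"
    if not user_in_group(user, group):
        return f"'{group}' group exists but {user} is not a member yet"
    return f"'{group}' group exists and {user} is a member: OK to install the snap"


if __name__ == "__main__":
    print(preflight())
```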

Stacking Containers

I’d resolved to install the blog inside the official WordPress container and use MariaDB for the database. Since I was moving an existing install over, it was slightly more complex than a clean install would be, but this article put me on the right track.

The main issue I encountered didn’t happen until I had the site running and was trying to get HTTPS working. Because of Let’s Encrypt’s HTTP verification, this had to be done after the DNS settings were updated, i.e. once the site was live. The issue manifested as WordPress serving HTTP URLs for all links and embedded content, which leads to mixed content warnings/blocks in Firefox and Chrome. It seems to be common to any setup where HTTPS is handled by a reverse proxy: WordPress only ever sees plain HTTP from the proxy, so it isn’t aware that it should be using HTTPS and needs to be told. Adding the following to wp-config.php fixes the problem:
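The wp-config.php fix itself is PHP, but the idea behind it is language-agnostic: the proxy terminates TLS and forwards plain HTTP, so the application has to trust the X-Forwarded-Proto header the proxy adds to learn the original scheme. As an illustration only (this is not what WordPress runs), here’s a minimal Python WSGI middleware sketch doing the same thing. It assumes the proxy sets that header, and that the app can only be reached through the proxy; otherwise clients could spoof it.

```python
from typing import Callable, Iterable

# A WSGI application: called with (environ, start_response), returns an iterable of bytes.
WSGIApp = Callable[[dict, Callable], Iterable[bytes]]


def trust_forwarded_proto(app: WSGIApp) -> WSGIApp:
    """Make `app` see the scheme the client used, as reported by the proxy.

    Only safe if the app is reachable solely through the trusted proxy;
    otherwise anyone could send X-Forwarded-Proto themselves.
    """
    def wrapped(environ, start_response):
        # WSGI exposes request headers as HTTP_* keys. With several proxies the
        # header can be a comma-separated list; the leftmost entry is the client-facing hop.
        forwarded = environ.get("HTTP_X_FORWARDED_PROTO", "")
        proto = forwarded.split(",")[0].strip().lower()
        if proto in ("http", "https"):
            environ["wsgi.url_scheme"] = proto
        return app(environ, start_response)

    return wrapped


def show_scheme(environ, start_response):
    """Tiny demo app that echoes the scheme it believes the request used."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [environ["wsgi.url_scheme"].encode()]


app = trust_forwarded_proto(show_scheme)
```

A request made over HTTPS reaches the app as plain HTTP carrying X-Forwarded-Proto: https, and the wrapped app then builds https:// links; without the header, it falls back to whatever scheme the server reported.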

I decided to use Traefik as my reverse proxy, and I’m quite pleased with it. The automated service discovery is pretty awesome and will really come into its own on my other self-hosted infrastructure. I also whacked a Varnish cache (from this Docker image) in between Traefik and WordPress. So far I haven’t done much with Varnish, but it’s there for when I get time to tweak it.

Moving Matomo

I also moved my Matomo Analytics instance to the new server using the official Docker image. This gave me much less trouble than WordPress, since I simply let it use the Matomo version embedded in the image, along with my configuration file and database from the old install.

I connected Matomo up directly to my Traefik instance, without running it through Varnish, to avoid any potential issues with the cache interfering with my statistics. Traefik just worked with this setup, which was pretty refreshing.

My Traefik Dashboard

PSA: Docker will Ignore Your Firewall!

Along the way, I ran into another issue. Before I changed the DNS settings to make the site live, I was binding Traefik to port 8080 and accessing it via an SSH tunnel for making changes in WordPress. Since I had configured UFW to block everything (except SSH) when I set up the server, I thought this was a nice secure setup for debugging.

I was wrong.

I only noticed something was off because I happened to have the log output from Traefik open and saw random IPs in the logs. Luckily, the only virtual host I had configured at that point was localhost:8080, so all the unwanted visitors got 404 responses. Needless to say, I pulled down all the containers until I could work out what was going on.

This appears to be a known interaction between Docker and host firewall front-ends (including both UFW and firewalld). The issue is inherent in the way Docker uses iptables to route traffic to your containers. Basically, Docker’s port-forwarding (DNAT) rules live in the nat table, which is processed before filtering. Incoming traffic for a published port is redirected to the container’s address and then passes through the FORWARD chain, so it never hits the INPUT chain containing the rules from UFW/firewalld.
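To make the ordering concrete, here’s a deliberately simplified Python model of where a new inbound packet ends up. It only captures the two steps that matter here, Docker’s DNAT rule in the nat table’s PREROUTING chain and the routing decision that follows it. The IP addresses and the port mapping are made-up examples, and real netfilter has many more tables and hooks.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    dst_ip: str
    dst_port: int


def filter_chain(packet: Packet, host_ips: set, dnat: dict) -> str:
    """Which filter-table chain a new inbound packet is checked against.

    Simplified model of netfilter: nat/PREROUTING runs first, then the
    routing decision picks INPUT (for this host) or FORWARD (elsewhere).
    """
    # Step 1, nat/PREROUTING: a published port is DNAT'd to the container.
    if packet.dst_ip in host_ips and packet.dst_port in dnat:
        container_ip, container_port = dnat[packet.dst_port]
        packet = Packet(container_ip, container_port)
    # Step 2, routing decision: only packets still addressed to the host go to INPUT.
    return "INPUT" if packet.dst_ip in host_ips else "FORWARD"


HOST_IPS = {"203.0.113.10"}               # example public IP (documentation range)
DOCKER_DNAT = {8080: ("172.17.0.2", 80)}  # roughly what publishing 8080:80 sets up

print(filter_chain(Packet("203.0.113.10", 22), HOST_IPS, DOCKER_DNAT))    # INPUT: UFW rules apply
print(filter_chain(Packet("203.0.113.10", 8080), HOST_IPS, DOCKER_DNAT))  # FORWARD: UFW's INPUT rules never see it
```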

I tried the suggested fix, disabling iptables support in Docker, but this completely broke inter-container connectivity (unsurprising, when you think about it). My solution for now is just to be really careful when binding ports: make sure that any ports you don’t want exposed to the outside world are bound only to 127.0.0.1. There is also the DOCKER-USER iptables chain if you need more flexibility, but using it means writing raw iptables rules.
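Since the fix comes down to discipline, it’s worth auditing port mappings mechanically. Here’s a minimal Python sketch, assuming Compose’s short port syntax ([IP:]HOST:CONTAINER with an optional /tcp or /udp suffix) and IPv4 addresses only, that flags any mapping which would listen on more than loopback. The function name is just for illustration.

```python
import ipaddress


def publishes_beyond_localhost(spec: str) -> bool:
    """True if a Compose short-syntax port mapping listens on more than loopback.

    Accepts "CONTAINER", "HOST:CONTAINER", "IP:HOST:CONTAINER" and "IP::CONTAINER",
    each with an optional "/tcp" or "/udp" suffix. IPv4 only.
    """
    parts = spec.split("/")[0].split(":")
    if len(parts) < 3:
        # No host IP given: Docker binds the port on all interfaces (0.0.0.0).
        return True
    return not ipaddress.ip_address(parts[0]).is_loopback


for spec in ["8080:80", "127.0.0.1:8080:80", "0.0.0.0:443:443/tcp", "3306"]:
    verdict = "EXPOSED" if publishes_beyond_localhost(spec) else "loopback only"
    print(f"{spec:>22}  {verdict}")
```

Anything reported as EXPOSED will be reachable from outside regardless of what UFW says, so it should either be intended (80/443 on Traefik) or be rebound to 127.0.0.1.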

This issue is a major security flaw and needs to be given more attention. Breaking administrator expectations of security like this is going to lead to loads of services being exposed to the big bad Internet that really shouldn’t be!

My Full Web Stack

Below is the docker-compose.yml file for my full web stack. I store any secret variables in an env.sh file which I source before running my docker-compose commands.

This is all pretty much as described; however, there are two notable points:

  • I went for separate mariadb containers for the databases for WordPress and Matomo. This is because the official mariadb container only supports creation of one database/user via environment variables. I’m not super happy with this arrangement, but it is working well and doesn’t seem to use too much memory.
  • The aliases portion under the network configuration for the Traefik container allows the other containers to route internal requests back to themselves via Traefik. This helps with things such as the loopback check in WordPress, which would otherwise fail.
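Related to the env.sh approach: if a variable isn’t exported when docker-compose runs, Compose substitutes an empty string (with a warning), which is easy to miss for something like a database password. Here’s a minimal Python sketch of a pre-flight check that scans compose file text for ${VAR} and $VAR references and reports the ones with no default that are unset or empty. It only covers the common substitution forms, and the service and variable names in the example are hypothetical, not taken from my actual file.

```python
import os
import re

# ${VAR}, ${VAR:-default}, ${VAR-default}, ${VAR:?err}, ${VAR?err} and bare $VAR.
# "$$" is Compose's escape for a literal dollar sign, so it is skipped.
_BRACED = re.compile(r"(?<!\$)\$\{([A-Za-z_][A-Za-z0-9_]*)(:?[-?])?[^}]*\}")
_BARE = re.compile(r"(?<!\$)\$([A-Za-z_][A-Za-z0-9_]*)")


def required_variables(compose_text: str) -> set:
    """Variable names Compose would substitute without a fallback default."""
    names = set()
    for match in _BRACED.finditer(compose_text):
        name, operator = match.group(1), match.group(2)
        # "-" / ":-" supply a default; no operator or "?" / ":?" means required.
        if operator in (None, "?", ":?"):
            names.add(name)
    names.update(_BARE.findall(compose_text))
    return names


def missing_variables(compose_text: str, env=None) -> list:
    """Required variables that are unset or empty in `env` (defaults to os.environ)."""
    env = os.environ if env is None else env
    return sorted(name for name in required_variables(compose_text) if not env.get(name))


EXAMPLE = """
services:
  wordpress-db:
    image: mariadb
    environment:
      MYSQL_PASSWORD: ${WP_DB_PASSWORD}
      MYSQL_DATABASE: ${WP_DB_NAME:-wordpress}
"""
print(missing_variables(EXAMPLE, {}))  # ['WP_DB_PASSWORD']
```

Bear in mind that env.sh needs to export its variables (export WP_DB_PASSWORD=..., to use the hypothetical name above) for them to be visible to child processes such as docker-compose or a check like this one.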

That’s pretty much all there is to it!

Conclusion

Overall, I’m really happy with the way this migration has worked out. I’ve even been able to downgrade the Linode plan for the original server to the $5/month plan, since it now has less load. This means that for two servers I’m now paying the same as I was for the single server, although with half the RAM each. I think I paid an extra $2-3 during the migration period, since it took me a while to complete. Even that isn’t too bad.

I’ve already started on further Docker migrations on some of the infrastructure I have on my home servers. These should be the subject of further posts.


Site Update…

Seeing as it’s the start of a new month, I thought I’d post a follow-up to my post on the site overhaul. It hasn’t actually been a month yet, but the tracking statistics are based on calendar months. The short story is that visitor numbers are up, with 446 visitors recorded between the Piwik instance going live (13th Jan) and the end of the month. This beats all my previous records, and it’s only for part of a month. My old GA figures were putting me at around 300 visits per month, though this had slipped recently. So either there is a large difference in the way the two systems measure visitors, or my push to actually put up some content is working! The fact that roughly half my visits come from search engines and land on newer posts supports my theory that the push is working.
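For a rough like-for-like comparison, the partial month can be scaled up. A quick Python sketch of the arithmetic, assuming the 13th counts as a tracked day and that traffic is spread evenly across the month (neither is guaranteed, and Piwik ‘visitors’ and GA ‘visits’ aren’t quite the same measure):

```python
# Figures from this post: 446 visitors from Piwik going live (13th Jan) to 31st Jan.
visitors = 446
days_tracked = 31 - 13 + 1      # 13th to 31st inclusive = 19 days
per_day = visitors / days_tracked
full_month = per_day * 31       # January has 31 days
print(f"{days_tracked} days, {per_day:.1f}/day, ~{full_month:.0f} for a full month")
# prints: 19 days, 23.5/day, ~728 for a full month
```

On those assumptions, a full month would come out at well over double the old GA figure of around 300.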

Having said that, I’ve slacked off a bit recently (this is only my second post this week). This is mainly because I’ve been putting a lot of work into SwallowCatcher and managed to get a release out earlier this week. Hopefully, in the coming weeks I’ll be able to balance both projects.

In terms of the other action points I had for the site:

  • Google Ads are still here. This is primarily because AdBard have not approved my account yet. I’ve even queried this via their support address and had no response, so I don’t know what’s going on there.
  • Flattr is here! And it’s actually making me some money, with a total of three flattrs last month. If you like my stuff, please consider flattring me; it’ll help pay for the site and encourage me to produce more content and software.
  • I have a new theme! It’s just an off-the-shelf one for now; I haven’t modified it.
  • More pages: well, I’ve posted the project section and a project page for SwallowCatcher. There’s still more I want to do and I haven’t got around to doing a proper profile page or online CV yet.
  • More content: well, that speaks for itself. The last month has been a time of unprecedented blogging activity for me. I’m really quite enjoying it and it’s made its way into my mental todo list for each week, which is a good sign.

That’s just about all there is to it. For those who are interested in SwallowCatcher, since I released it on Tuesday I’ve been working on fixing some of the issues identified in the release notes and those that people have reported to me. I’ve also been using it day to day to download my podcasts and fixing any issues I encounter. In fact, tonight was the first time I’ve opened my laptop in the last two days; the laptop’s primary use for the last few months has been downloading podcasts when I’m at uni, and now SwallowCatcher is just about filling that niche! I’ll hopefully post an officially updated version sometime over the weekend, with the announcement again going out here.

Review: Piwik Analytics Software

If you read my previous post about the site overhaul I’m currently doing, you will have seen me mention that I’m now using the Piwik open source analytics package in place of Google Analytics. Well, I’ve had it running for a few days and have played around with it a bit, so I thought I’d review it. I’m going to start with my reasons for moving from GA, then score it on several different criteria:

  • Installation and Setup
  • Site integration
  • User interface
  • Extensibility (API availability)
  • Overall impressions (documentation, community, etc.)

The philosophical argument

As well as the obvious benefit (from a software freedom perspective) of using one less proprietary web service, there is another reason I switched away from Google Analytics: privacy. For a while I’ve been using technologies to limit the amount of data which leaks from my browser as I navigate the web, in order to reduce the profiling of my web activities. This isn’t because I have anything to hide; I just don’t like the idea of large companies building up a huge database on me without my permission. The upshot is that I found myself in the slightly hypocritical situation of blocking GA in my own browser while using it to track others on my site.

The solution was obvious, remove GA from my site. However, I didn’t want to lose the valuable information that it provides me with. Also, I don’t have a problem with site owners collecting data that can help them, just with them sharing it with 3rd parties such as Google, who then build it into their larger profiling efforts. A quick search turned up Piwik which aims to provide a full featured GA replacement that you can run on your own server. Because site owners run their own instances, they remain in charge of their tracking of users, retain ownership of the data and best of all don’t give any data to Google.

With the aim of responsible and unobtrusive tracking in mind, I’ve added a page to my site that lets users opt out of the Piwik tracking by means of a cookie. The link is also accessible from the sidebar under the copyright notice. I’m afraid some of the text on that page is pretty difficult to see with my current theme, but I’m working on this. For now, just uncheck the checkbox to opt out.

Right, on to the main event, the actual review…

1. Installation and Setup

There’s actually not much to say here, because installation was ridiculously easy! I just downloaded the zip to my server (with wget) and unzipped it into my web server’s root directory. This produced a directory called ‘piwik’ and a ‘How to Install Piwik.html’ file, which redirects you to the installation instructions if you point your browser at it. The rest of the installation was just as simple: following the instructions, I pointed my browser at the ‘/piwik/’ directory of my site and was greeted by the installer. Working through it was really easy; you’ll need to create a MySQL database when prompted for the database info, but that’s about as hard as it gets. Towards the end you’ll be prompted to set up your site with Piwik, which involves entering a few details about the site, and then you’ll be given a snippet of JavaScript to add to your site template. Which leads me neatly into the next section…

2. Site Integration

I didn’t copy and paste the JavaScript into my template; instead I opted to install the WP-Piwik plugin for WordPress. This made setup easy and also gave me a widget on my WordPress admin dashboard with a nice overview of my site visits. As I mentioned, I was also able to add a widget to the site to let visitors opt out of tracking. This was simple too, just involving copying and pasting a couple of lines of HTML from one of the settings pages into a WordPress page. Easy!

You can also embed Piwik widgets in your site by following the instructions in the documentation. This is a neat feature, especially if you have a custom start page set in your web browser (something I have yet to get around to making).

I also investigated the campaigns functionality, in order to track entries to my site from the RSS feed. This is really simple to use: append the query string ‘?piwik_campaign=NAME’ (where NAME is the name of your campaign) to the end of a URL, and visits through that URL show up under that campaign. I found that I could integrate this with WordPress pretty well by adding the following snippet of code to the functions.php file of my theme:

If you now check the URLs in your RSS feeds, they will all have the query string added and clicks will be attributed to the ‘RSS’ campaign in Piwik.
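The functions.php snippet is PHP, but the URL manipulation it needs is simple enough to show in Python. Here’s a minimal sketch (standard library only; the function name is hypothetical) that adds piwik_campaign to a URL, using & instead of ? when the URL already has a query string, as WordPress’s ?p=123 style permalinks do:

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def add_campaign(url: str, campaign: str = "RSS") -> str:
    """Return `url` with piwik_campaign=<campaign> in its query string.

    Keeps any existing query parameters (so ?p=123 becomes ?p=123&piwik_campaign=RSS)
    and replaces an existing piwik_campaign value instead of duplicating it.
    """
    parts = urlsplit(url)
    params = [(key, value)
              for key, value in parse_qsl(parts.query, keep_blank_values=True)
              if key != "piwik_campaign"]
    params.append(("piwik_campaign", campaign))
    return urlunsplit(parts._replace(query=urlencode(params)))


print(add_campaign("https://example.com/some-post/"))  # https://example.com/some-post/?piwik_campaign=RSS
print(add_campaign("https://example.com/?p=123"))      # https://example.com/?p=123&piwik_campaign=RSS
```

Rebuilding the query with urlencode normalises the encoding of existing parameters, which is harmless for tracking links but worth knowing.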

3. User Interface

The Piwik user interface is really nice. I’ve included some screenshots below, so that you can make up your own mind. It’s pretty similar to the GA user interface, only cleaner, and all the AJAX stuff makes it feel really responsive. I also love the real-time tracking widget, which is something GA totally lacks. The only bad thing about the UI is its reliance on Flash for the graphs. I hate Flash and it doesn’t have a reliable 64-bit Linux version, which means I only have it installed on my netbook. Oh, and before you ask, I tried it with Gnash and it didn’t work!

4. Extensibility

By extensibility, I was primarily interested in API access. There’s certainly no shortage of this, with two APIs listed on the documentation page. One API is for performing tracking, which I didn’t need given my usage of the WordPress plugin. I looked instead at the analytics API, which allows you to access all the data through simple HTTP requests. I was able to write a simple Python script to email me my main statistics once a day, in about an hour (including working out how the Python email and smtplib modules work!). Performing a Piwik API call in Python is as simple as:
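Something along these lines, as a minimal sketch assuming Python 3 with only the standard library. PIWIK_URL and TOKEN_AUTH are placeholders for your own install and API token, and VisitsSummary.get is just one example method from the Reporting API.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Placeholders: point these at your own Piwik install and API token.
PIWIK_URL = "https://example.com/piwik/"
TOKEN_AUTH = "your-token-here"


def api_url(method: str, **params) -> str:
    """Build a Reporting API request URL (module=API, JSON output)."""
    query = {"module": "API", "method": method, "format": "json",
             "token_auth": TOKEN_AUTH, **params}
    return PIWIK_URL + "index.php?" + urlencode(query)


def api_call(method: str, **params):
    """Perform the request and decode the JSON response."""
    with urlopen(api_url(method, **params), timeout=30) as response:
        return json.load(response)
```

For example, api_call("VisitsSummary.get", idSite=1, period="day", date="yesterday") should return yesterday’s visit summary for site 1 as a decoded JSON object. The token travels in the URL, so keep requests on HTTPS to a host you trust.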

As it’s Python, it’s ridiculously simple!
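The email half of the script is just as short. Here’s a minimal sketch using the standard library’s email and smtplib modules, assuming a local SMTP server that accepts mail without authentication. The stats dict could be whatever the API returned (for example the decoded JSON from VisitsSummary.get), and the addresses are placeholders.

```python
import smtplib
from email.message import EmailMessage


def build_report(stats: dict, sender: str, recipient: str) -> EmailMessage:
    """Turn a dict of headline numbers into a plain-text email, one 'key: value' per line."""
    msg = EmailMessage()
    msg["Subject"] = "Daily site statistics"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content("\n".join(f"{key}: {value}" for key, value in sorted(stats.items())))
    return msg


def send_report(msg: EmailMessage, host: str = "localhost") -> None:
    """Hand the message to an SMTP server (assumed to accept unauthenticated local mail)."""
    with smtplib.SMTP(host) as smtp:
        smtp.send_message(msg)
```

Run something like this from cron once a day and the summary lands in your inbox every morning.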

Of course, if you find something you can’t do with the API (which is unlikely, because it seems to cover everything), then you can access the data in the database directly, because it’s in YOUR database. You can also back up and secure your data exactly how you want to. This is something that GA just can’t compete with!

5. Overall Impressions

My impressions of Piwik as a project have been really good. The documentation is excellent and there seems to be a good community behind it. As a product it’s a pleasure to use, really easy to install, and it just works. The reliance on Flash for the graphs is a bit disappointing, but perhaps this will change in the future as HTML5 matures. Here are the obligatory scores:

  • Installation and Setup – 5/5
  • Site Integration – 4/5
  • User Interface – 3/5
  • Extensibility – 5/5

Overall Score: 4/5

Verdict: If you’re currently using Google Analytics, stop it! (and use this instead)

Giving this site an overhaul…

Things are changing…

I’m currently in the process of giving this site an overhaul, with the intention of increasing the number of visitors I get. I’m also going to experiment with various ways to try and support the site (with the eventual aim of making the site self sufficient). So far I’ve changed a couple of things:

  • I’ve added a proper image gallery thanks to the NextGEN Gallery plugin. This isn’t really intended to increase visitor numbers but gives me an easier way to share my photos, without surrendering them to Facebook! You can view my public gallery now, but I still need to upload more photos.
  • I’ve switched away from Google Analytics to a self-hosted Piwik instance. This lets me take control of my tracking data and, for visitors to my site, it means that your usage data stays between you and me, without Google butting in! I’m also going to add the opt-out button to the sidebar for anyone who really doesn’t want to share their data. So far I’m really pleased with Piwik; hopefully I’ll review it when it’s been running for a bit longer.

Here’s what’s in the pipeline:

  • Removing Google Ads: basically they’re not making me any money and I’m increasingly objecting to the tracking/profiling they do. I’m going to replace them with AdBard, but I’m still waiting for my account to be approved.
  • Flattr: I also want to experiment with Flattr, which, as well as hopefully making me some money, will allow me to give something back to other people producing content I like.
  • A new site theme: probably a modified version of an already available theme. I want something clean and I want to be able to personalise it easily. Suggestions will be gratefully received! (via the comments on this post).
  • More pages: I really need a better about page and I also want to do a digital CV. I’m also thinking about project pages for the coding projects I’m working on.
  • More content: obviously, the secret to getting more visitors lies in having more quality content. I happen to think my content is of good quality and some of it scores pretty highly in related Google searches. I just need more of it! This comes down to having the time to blog and stuff to blog about. I’m trying to set aside more time for blogging and think of more topics to blog about. Be prepared to see a wider range of content appearing soon, hopefully.

I’m approaching this, as I initially approached blogging, as an experiment. I don’t know how far I’ll get, but I’ll report on my progress as I go. If anyone has anything they want to share, or experiences from promoting their own site, please feel free to comment.