checkmk docker container

Monitoring All The Things with checkmk

This post may contain affiliate links. Please see the disclaimer for more information.

Once your number of connected devices grows to a certain size, it becomes difficult to keep track of them all manually. At this point you’re going to want to turn to software to do this job for you. Network monitoring software fills this need nicely. The old stable Nagios used to be my go to for this. However, I switched to checkmk in the last couple of years and have found it quite superior. I’ve recently been doing some maintenance and upgrades to my monitoring setup, so now was a good time to write it up.

Checkmk is an Open Source infrastructure and application monitoring tool. It will keep track various attributes of your networked devices and alert you if they fall outside of pre-programmed thresholds. At its core it wraps Nagios, but provides a nicer UI and GUI configuration tool. checkmk also supports autodiscovery of services and checks to be performed on each system. This means you can spin up a monitoring system with great coverage in less than half the time you would spend fiddling around with a Nagios configuration.

Monitoring vs. Metrics

I’m going to take a little time to discuss the modern trend towards “metrics” based monitoring and how it relates to more traditional monitoring approaches. Modern metrics based systems such as Prometheus and Influxdb collect a series of aggregated data points about a running system. Typically these are written to a time series database (Influxdb actually is purely a time series database and requires external data collection tools). This data would then be used to feed some graphing/dashboarding tool, such as Grafana and also to generate alerts.

This is a different approach to traditional monitoring. To my mind metrics gathering is more passive. The system is only asking what the other systems (or typically only the applications on those systems) know about themselves. It is not actually probing the network to check that things are working. Monitoring systems not only collect data from systems, but actually probe the network to create data, for example whether a given service is responding or not.

The metrics gathering approach works well if all you care about are the applications. Obviously the application knows everything about itself, so why not ask it? It’s particularly popular in cloud environments where someone else is caring about the underlying infrastructure. Anywhere you care about physical infrastructure you’d probably be better off with a monitoring system first. Of course, this doesn’t preclude adding a metrics system later!

I think its important not to get these tools mixed up, they serve different purposes, even if there is some overlap.

Let’s Start Monitoring

I’ve had checkmk installed inside an LXD container for some time. In that time it’s been running pretty much flawlessly and alerting me to issues with my network as they come up. I recently decided it was time for an update, since I was still running 1.4.x and 1.5 had been out for a while. In the process I thought I’d move it into Docker, as I’ve been moving everything else. However, I ran into some issues with that.

I had several issues moving my configuration to the new server and getting it to run in the Docker container. Most of these could have been solved if I’d migrated the configuration to the new version (described below) before moving it. I did hit one show stopper issue though. This came about because my Docker data volumes are stored on NFS. Basically it looks like NFS doesn’t support file locking very well. This causes checkmk to throw “Bad file descriptor” errors all over the place.

checkmk nfs error
Oops, looks like checkmk doesn’t like it’s sites directory on NFS

In the end I decided to stick with my LXD container for this system. Eventually this will be migrated to an LXC container when I switch the host server over to Proxmox. The Docker setup is not actually recommended for a full monitoring install anyway, so I don’t feel too bad about this.

This is all a long winded way of saying, go and follow the Linux installation instructions if you are installing checkmk from scratch!

Upgrading My Configuration

Since I already had an existing system I just updated it by installing the new version. In the process checkmk 1.6 was released (good timing!), so I actually did this twice. To do the update I basically just downloaded the latest .deb file and installed it with dpkg -i.

It should be noted that this won’t overwrite your current install. There is a good reason for this, since checkmk has a configuration migration step which must be performed. This can be done using the omd tool. The steps to do so look like this:

$ sudo omd sites # list the monitoring instances we have available
SITE             VERSION          COMMENTS
mysitename       1.5.0p30.cre     default version
$ sudo omd stop mysitename # stop the instance in question
$ sudo omd update mysitename
# at this point omd will update the config, it will ask you
# questions along the way in a curses window, just follow the 
# prompts
$ sudo omd start mysitename # start the instance back up

Assuming that all went OK, you’re good to start actually monitoring other systems. We’ll cover the things I monitor in the next few sections.

Monitoring Standard Linux Systems

This is the easy one and pretty much standard fare for any monitoring system. Just install the checkmk agent for your distribution by following the official instructions and off you go. When you add a new host via WATO (the configuration GUI) checkmk will auto-discover as many services as it can for you. There are also a whole load of plugins that can be deployed to monitor other services.

checkmk linux server
Some of the service checks running against “eomer” my home automation Docker host

In my system, Linux machines are the majority of the hosts. This includes physical host machines, VMs, LXC/LXD containers and Raspberry Pis (since these are just Debian machines really). The only thing I haven’t had much luck with are my Libreelec machines. This seems to be because Libreelec doesn’t include a shell capable of running the agent script. I’ve been wondering if running the agent in a sufficiently privileged Docker container would work. However for now I’m just monitoring them externally (see “Monitoring Other Devices” below).

Monitoring pfSense

Since pfSense is really just FreeBSD underneath the checkmk BSD agent works fine with it. pfSense also supports SNMP out of the box. Therefore you can get more monitoring coverage by using an Agent+SNMP approach. There is a great tutorial over at Open School Solutions, which is what I followed to set all that up.

checkmk pfsense
pfSense will report the status of all its network interfaces among its checks

Monitoring OpenWRT Devices

OpenWRT devices can be monitored just like Linux machines. However you will need to install the OpenWRT specific agent from the agents page of your checkmk install. The agent can be run via either xinetd or SSH just as on a standard Linux machine.

OpenWRT devices will also return some extra information over SNMP if mini_snsmpd is installed on them. This makes the same Agent+SNMP approach adopted for pfSense desirable.

Monitoring Docker Containers

This one is new to me since I’ve just set it up after updating to the new version. The basic idea here is that you set up monitoring on the Docker host as you would a standard Linux machine. However, you also install the mk_docker.py plugin. This will do two things. Firstly it will expose a load of Docker checks on the host the next time you run service discovery. These by themselves are pretty useful. Secondly it will allow you to add each Docker container on the host as a host in it’s own right. The check data for the Docker container host is piggybacked from the agent connection to its Docker host.

At first I thought this was going to be pretty unwieldy, since you have to manually add a host entry for each Docker container. However, checkmk provides us with some tools to make this easier, which I’ll cover below.

Before you finish the setup on the Docker host, I’d recommend that you set the container_id variable in the /etc/checkmk/docker.cfg config file to name. This will allow you to add your container hosts to checkmk by name rather than by hex ID, which is much more human readable. It also avoids the situation where the hex ID changes if the container is destroyed and re-created. This obviously happens during an update of the container.

Adding Docker Container Hosts Quickly

checkmk wato folder
Setting up a host folder in WATO allows you to apply the same initial options to a bunch of hosts quickly

The next thing to do is to create a directory in WATO. Adding a host to a directory automatically populates the host with the configuration template used when the directory was created. This makes it super quick to just run through adding hosts with the same configuration such as our Docker containers.

The third thing to do is to create a new Host Group for your Docker containers. In this way you can easily see separate listings of your containers and other hosts on the monitoring dashboards.

checkmk hostgroup
Creating a host group for my Docker contains allow me to view just the statuses of the containers

With these things in place I’m finding the Docker container monitoring pretty useful in my fairly static environment. Due to the manual steps involved it’s really not going to work in a highly dynamic environment, but it works for my needs.

checkmk docker container
My Gotify container is unhappy – because I artificially stopped it for the screenshot!

Monitoring Windows Machines

Obviously checkmk also has an agent for Windows, so you can monitor those machines too. If you have any. I don’t, so I can’t comment on how well it works!

Monitoring Other Devices

For monitoring other devices that don’t support any of the available agents, you’re pretty much limited to external probing. With this you can get basic information on whether the device is up or down (via ping) and can check the availability of services on any open ports via manual checks.

Manual checks are in fact useful against any of the systems listed above as an external verification that a service is responding. As a minimum I typically implement an SSH check as well as HTTP checks on any open web services.

I have this approach deployed against several devices on my network including the two Libreelec devices mentioned above, my HDHomeRun and various IoT devices. It works pretty well – especially as the usual problem with these devices is that someone has unplugged them!

Conclusion

I hope this has given you a glimpse into my monitoring setup. It’s impossible to describe every part of it in full detail so I’ve opted for presenting only an overview here. Even after nearly two years I’m still adding and changing stuff on this system, so I may write up some of the more interesting details in future.

I’m really happy with my choice of checkmk to drive my monitoring system. It’s a nice blend of the reliability and stability of Nagios, with a layer of UI which makes it much easier to get a fully featured system. The recent upgrades have provided some useful features and it’s great to see it under such active development!

Next Steps

I still have some corners of my network where my monitoring setup doesn’t cover. Mainly just through lack of time to get it all set up. I’ll be looking to Ansible to help with this soon.

I’d also like to add some kind of metrics gathering system, probably Influxdb. This will be for Home Assistant data and metrics from my Traefik proxies (as well as anything else I can feed to it). I’d also like to round out the trifecta of observation systems with a log storage/aggregation system. I just wish there was something more lightweight than an ELK stack!

If you have any comments or improvements to my setup, as always, let me know in the feedback channels. I’m also interested in how others are handling this, so get in touch and let me know.

If you liked this post and want to see more, please consider subscribing to the mailing list (below) or the RSS feed. You can also follow me on Twitter. If you want to show your appreciation, feel free to buy me a coffee.

Loading

ansible gitlab ci

Automating My Infrastructure with Ansible and Gitlab CI: Part 1 – Getting Started

This post may contain affiliate links. Please see the disclaimer for more information.

This is the first part in a multi-part series following my adventures in automating my self-hosting infrastructure with Ansible, running from Gitlab CI. In this post I’ll cover setting up my Ansible project, setting up the remote machines for Ansible/CI deployment, some initial checks in CI and automating of routine updates via our new system.

I’ve used Ansible quite extensively in the past, but with my recent focus on Docker and Gitlab CI I thought it was worth having a clean break. Also my previous Ansible configurations were a complete mess, so it’s a good opportunity to do things better. I’ll still be pulling in parts of my old config where needed to prevent re-inventing the wheel.

Since my ongoing plan is to deploy as many of my applications as possible with Docker and docker-compose, I’ll be focusing mainly in tasks relating to the host machines. Some of these will be set up tasks which will deploy a required state to each machine. Others will be tasks to automate routine maintenance.

Inventory Setup

Before we get started, I’ll link you to the Gitlab repository for this post. You’ll find that some of the files are encrypted with ansible-vault, since they contain sensitive data. Don’t worry though, I’ll go through them as examples, starting with hosts.yml.

The hosts.yml file is my Ansible inventory and contains details of all the machines in my infrastructure. Previously, I’d only made use of inventories in INI format, so the YAML support is a welcome addition. The basic form of my inventory file is as follows:

---
all:
  vars:
      ansible_python_interpreter: /usr/bin/python3
      ansible_user: ci
      ansible_ssh_private_key_file: ./id_rsa
  children:
    vms:
      hosts:
        vm1.mydomain.com:
          ansible_sudo_pass: supersecret
          password_hash: "[hashed version of ansible_sudo_pass]"
          # use this if the hostname doesn't resolve
          #ansible_host: [ip address]
        vm2.mydomain.com:
          ...
    rpis:
      # Uncomment this for running before the CI user is created
      #vars:
      #  ansible_user: pi
      hosts:
        rpi1.mydomain.com:
          ...
    physical:
      hosts:
        phys1.mydomain.com:
          ...

To create this file we need to use the command ansible-vault create hosts.yml. This will ask for a password which you will need when running your playbooks. To edit the file later, replace the create subcommand with edit.

As you can see we start out with the top level group called all. This group has several variables set to configure all the hosts. The first of these ensures that we are using Python 3 on each remote system. The other two set the remote user and the SSH key used for authentication. I’m connecting to all of my systems with a specific user, called ci, which I will handle setting up in the next section.

The remainder of the file specifies the remote hosts in the infrastructure. For now I’ve divided these up into three groups, corresponding to VMs, Raspberry Pis and physical host machines. It’s worth noting that a host can be in multiple groups, so you can pretty much get as complicated as you like.

Each host has several variables associated with it. The first of these is the password for sudo operations. I like each of my hosts to have individual passwords for security purposes. The second variable (password_hash) is a hashed version of the same password which we will use later when setting up the user. These hashes are generated with the mkpasswd command as per the documentation. The final variable (ansible_host) I’m using here is optional and you only need to include it if the hostname of the server in question can’t be resolved via DNS. In this case you can specify an IP address for the server here.

In order to use this inventory file we need to pass the -i flag to Ansible (along with the filename) on every run. Alternatively, we can configure Ansible to use this file by creating an ansible.cfg file in our current directory. To do this we download the template config file and edit the inventory line so it looks like this:

inventory      = hosts.yml

Setting Up the Remote Machines

At this point you should comment out the ansible_user and ansible_ssh_private_key_file lines in your inventory, since we don’t currently have a ci user on any of our machines and we haven’t added the key yet. This we will take care of now – via Ansible itself. I’ve created a playbook which will create that user and set it up for use with our Ansible setup:

---
- hosts: all
  tasks:
    - name: Create CI user
      become: true
      user:
        name: ci
        comment: CI User
        groups: sudo
        append: true
        password: "{{ password_hash }}"

    - name: Create .ssh directory
      become: true
      file:
        path: /home/ci/.ssh
        state: directory
        owner: ci
        group: ci
        mode: 0700

    - name: Copy authorised keys
      become: true
      copy:
        src: ci_authorized_keys
        dest: /home/ci/.ssh/authorized_keys
        owner: ci
        group: ci
        mode: 0600

Basically all we do here is create the user (with the password from the inventory file) and set it up for access via SSH with a key. I’ve put this in the playbooks subdirectory in the name of keeping things organised. You’ll also need a playbooks/files directory in which you should put the ci_authorized_keys file. This file will be copied to .ssh/authorized_keys on the server, so obviously has that format. In order to create your key generate it in the normal way with ssh-keygen and save it locally. Copy the public part into ci_authorized_keys and keep hold of the private part for later (don’t commit it to git though!).

Now we should run that against our servers with the command:

ansible-playbook --ask-vault-pass playbooks/setup.yml

This will prompt for your vault password from earlier and then configure each of your servers with the playbook.

Spinning Up the CI

At this point we have our ci user created and configured, so we should uncomment those lines in our inventory file. We can now perform a connectivity check to check that this worked:

ansible all -m ping

If that works you should see success from each host.

Next comes our base CI setup. I’ve imported some of my standard preflight jobs from my previous CI pipelines, specifically the shellcheck, yamllint and markdownlint jobs. The next stage in the pipeline is a check stage. Here I’m putting jobs which check that Ansible itself is working and that the configuration is valid, but don’t make any changes to the remote systems.

I started off with a generic template for Ansible jobs:

.ansible: &ansible
  image:
    name: python:3.7
    entrypoint: [""]
  before_script:
    - pip install ansible
    - ansible --version
    - echo $ANSIBLE_VAULT_PASSWORD > vault.key
    - echo "$DEPLOYMENT_SSH_KEY" > id_rsa
    - chmod 600 id_rsa
  after_script:
    - rm vault.key id_rsa
  variables:
    ANSIBLE_CONFIG: $CI_PROJECT_DIR/ansible.cfg
    ANSIBLE_VAULT_PASSWORD_FILE: ./vault.key
  tags:
    - ansible

This sets up the CI environment for Ansible to run. We start from a base Python 3.7 image and install Ansible via Pip. This takes a little bit of time doing it on each run so it would probably be better to build a custom image which includes Ansible (all the ones I found were out of date).

The next stage is to set up the authentication tokens. First we write the ANSIBLE_VAULT_PASSWORDvariable to a file, followed by the DEPLOYMENT_SSH_KEY variable. These variables are defined in the Gitlab CI configuration as secrets. The id_rsa file requires its permissions set to 0600 for the SSH connection to succeed. We also make sure to remove both these files after the job completes.

The last thing to set are a couple of environment variables to allow Ansible to pick up our config file (by default it won’t in the CI environment due to a permissions issue). We also need to tell it the vault key file to use.

Check Jobs

I’ve implemented two check jobs in the pipeline. The first of these performs the ping action which we tested before. This is to ensure that we have connectivity to each of our remote machines from the CI runner. The second iterates over each of the YAML files in the playbooks directory and runs them in check mode. This is basically a dry-run. I’d prefer if this was just a syntax/verification check without having to basically run through the whole thing, but there doesn’t seem to be a way to do that in Ansible.

The jobs for both of these are shown below:

ping-hosts:
  <<: *ansible
  stage: check
  script:
    - ansible all -m ping

check-playbooks:
  <<: *ansible
  stage: check
  script:
    - |
      for file in $(find ./playbooks -maxdepth 1 -iname "*.yml"); do
        ansible-playbook $file --check
      done

Before these will run on the CI machine, we need to make a quick modification to our ansible.cfg file. This will allow Ansible to accept the SSH host keys without prompting. Basically you just uncomment the host_key_checking line and ensure it is set to false:

host_key_checking = False

Doing Something Useful

At this stage we have our Ansible environment set up in CI and our remote machines are ready to accept instructions. We’ve also performed some verification steps to give us some confidence that we are doing something sensible. Sounds like we’re ready to make this do something useful!

Managing package updates has always been a pain for me. Once you get beyond a couple of machines, manually logging in and applying updates becomes pretty unmanageable. I’ve traditionally taken care of this on a Saturday morning, sitting down and applying updates to each machine whilst having my morning coffee. The main issue with this is just keeping track of which machines I already updated. Luckily there is a better way!

[[Side note: Yes, before anyone asks, I am aware of the unattended-upgrades package and I have it installed. However, I only use the default configuration of applying security updates automatically. This is with good reason. I want at least some manual intervention in performing other updates, so that if something critical goes wrong, I’m on hand to fix it.]]

With our shiny new Ansible setup encased in a layer of CI, we can quite easily take care of applying package upgrades to a whole fleet of machines. Here’s the playbook to do this (playbooks/upgrades.yml):

---
- hosts: all
  serial: 2
  tasks:
    - name: Perform apt upgrades
      become: true
      when: ansible_distribution == 'Debian' or ansible_distribution == 'Ubuntu'
      apt:
        update_cache: true
        upgrade: dist
      notify:
        - Reboot

    - name: Perform yum upgrades
      become: true
      when: ansible_distribution == 'CentOS'
      yum:
        name: '*'
        state: latest
      notify:
        - Reboot

  handlers:
    - name: Reboot
      when: "'physical' not in group_names"
      become: true
      reboot:

This is pretty simple, first we apply it to all hosts. Secondly we specify the line serial: 2 to run only on two hosts at a time (to even out the load on my VM hosts). Then we get into the tasks. These basically run an upgrade selectively based upon the OS in question (I still have a couple of CentOS machines knocking around). Each of the update tasks will perform the notify block if anything changes (i.e. any packages got updated). In this case all this does is execute the Reboot handler, which will reboot the machine. The when clause of that handler causes it not to execute if the machine is in the physical group (so that my host machines are not updated). I still need to handle not rebooting the CI runner host, but so far I haven’t added it to this system.

We could take this further, for example snapshotting any VMs before doing the update, if our VM host supports that. For now this gets the immediate job done and is pretty simple.

I’ve added a CI job for this. The key thing to note about this is that it is a manual job, meaning it must be directly triggered from the Gitlab UI. This gives me the manual step I mentioned earlier:

package-upgrades:
  <<: *ansible
  stage: deploy
  script:
    - ansible-playbook playbooks/upgrades.yml
  only:
    refs:
      - master
  when: manual

Now all I have to do on a Saturday morning is click a button and wait for everything to be updated!

ansible gitlab ci
Our finished pipeline. Note the state of the final “package-upgrades” job which indicates a manual job. Gitlab helpfully provides a play button to run it.

Conclusion

I hope you’ve managed to follow along this far. We’ve certainly come a long way, from nothing to an end to end automated system to update all our servers (with a little manual step thrown in for safety).

Of course there is a lot more we can do now that we’ve got the groundwork set up. In the next instalment I’m going to cover installing and configuring software that you want to be deployed across a fleet of machines. In the process we’ll take a look at Ansible roles. I’d also like to have a look at testing my Ansible roles with Molecule, but that will probably have to wait for yet another post.

I’m interested to hear about anyone else’s Ansible setup, feel free to share via the feedback channels!

If you liked this post and want to see more, please consider subscribing to the mailing list (below) or the RSS feed. You can also follow me on Twitter. If you want to show your appreciation, feel free to buy me a coffee.

Loading

internal HTTPS Linode Traefik

Internal HTTPS with Let’s Encrypt, Linode DNS and Traefik

This post may contain affiliate links. Please see the disclaimer for more information.

In my previous article on the Traccar GPS tracking software, I lamented the state of my broken internal HTTPS/TLS setup. I’ve known that using DNS validation for Let’s Encrypt was the way to fix this for some time. However, that required me to migrate my DNS to another provider, because my provider (Namecheap) only allow API access if you have a large enough account. Since I wrote that article, I’ve been investigating this further and have found my solution in the form of Linode‘s Domains Service. As I’m already using Linode for hosting my cloud servers (including this website) this is the perfect option for me.

The application which prompted me getting this sorted was bitwarden_rs, which I really didn’t want to run over plain HTTP and also didn’t want to expose to the outside world. I’d recommend the excellent Self Hosted Home article on this as a getting started point if you’d like to deploy it.

Migrating my DNS to Linode

For a job that I wasn’t looking forward to, this migration couldn’t have gone better. The main issue I encountered was that I wasn’t able to import my records over directly from Namecheap, but I think that’s a problem at their end rather than Linode’s. I ended up copying over the records one by one, which was easier than I expected since I didn’t have as many records as I thought.

One other minor issue that I ran into was the SOA email address that I had to add. Weirdly I never had to specify this with Namecheap. The problem stemmed from the fact that Linode will not allow you to use an address from the same domain for this. I contacted Linode support about this because it seemed contrary to their docs. They responded incredibly quickly (time measured in minutes, over a weekend too) and said that is so you can be notified about problems with your domain. This makes sense, but it is a little annoying. In the end I created another third party (gmail, boo!) address for this and forwarded it back to my main self hosted account. I also added it as a secondary account on my phone, just in case.

Getting an API Token

In order to perform the DNS-01 certificate validation with Linode, your client software needs to create a temporary DNS record. This requires an API token to authenticate to the Linode Domains API. This token can be created from the Linode Manager. The only permission required is read/write access to the Domains service, as per the screenshot below:

internal HTTPS Linode Traefik
Required Linode API token permissions

Setting Up Traefik

I wanted to use Traefik as my reverse proxy for this, given my previous success with it. This was massively complicated by the fact that Traefik 2.0 was released just a few days ago. This release introduces a lot of changes both in concepts and configuration, which make Traefik significantly more complex. I was able to work through these changes, but it took me a while to work it all out!

internal HTTPS Linode Traefik
The Traefik Dashboard has changed quite a bit

The Traefik setup I ended up with was as follows (in docker-compose.yml):

---
version: '3'
 
services:
  traefik:
    image: traefik:v2.0
    command:
      - "--accesslog=true"
      - "--api.insecure=true"
      - "--providers.docker=true"
      - "--providers.docker.exposedbydefault=false"
      - "--entrypoints.web.address=:80"
      - "--entrypoints.websecure.address=:443"
      - "--certificatesresolvers.mydnschallenge.acme.dnschallenge=true"
      - "--certificatesresolvers.mydnschallenge.acme.dnschallenge.provider=linodev4"
      - "--certificatesresolvers.mydnschallenge.acme.email=insertemailaddress"
      - "--certificatesresolvers.mydnschallenge.acme.storage=/letsencrypt/acme.json"
    volumes:
      - /mnt/docker-data/traefik:/letsencrypt
      - /var/run/docker.sock:/var/run/docker.sock
    ports:
      - "80:80"
      - "443:443"
      - "8080:8080"
    environment:
      - LINODE_TOKEN=insertlinodetoken
    networks:
      - external
    restart: always

Obviously, you need to insert your email address and Linode API token in the relevant places. Make sure not to include quotes around the API token, since these will be passed into the container and make the token invalid. This appears to be some weird and surprising behaviour in docker-compose.

This configuration sets up Traefik with a DNS challenge certificate resolver called mydnschallenge. This needs to be specified in the configuration for each service that you want to use it with. This is done by adding following labels to your service in docker-compose.yml:

labels:
      - 'traefik.enable=true'
      - "traefik.http.routers.myservice.rule=Host(`myservice.internal.example.com`)"
      - "traefik.http.routers.myservice.entrypoints=websecure"
      - "traefik.http.routers.myservice.tls.certresolver=mydnschallenge"
      - "traefik.http.services.myservice.loadbalancer.server.port=8080"

These labels first enable Traefik for the container in question. Then we add a new router called myservice and specify that it should be attached to the websecure entrypoint at the hostname given. We then add our mydnschallenge as the TLS certificate resolver. Finally, I specify the backend port on which this service listens – this isn’t required if it just listens on port 80.

With this in place you should be able to successfully get a certificate and then access your service over HTTPS. For me this took a few minutes the first time for the DNS to refresh, but this won’t be an issue for future renewals.

HTTP to HTTPS Redirects with Traefik 2.0

So far our service is available only via HTTPS. If you’ve followed the above configuration for Traefik then it is also listening on port 80 for plain HTTP. However, you will receive a 404 error if you try to access your service via HTTP. What we want to do is the traditional redirect from HTTP to HTTPS.

This turns out to be a little more complex in Traefik 2.0 than in previous versions and requires a little understanding of some new concepts. Traefik 2.0 introduces the concept of middlewares, which can operate on a request as it passes between the router and the backend service. One such middleware is the redirectscheme, which will do exactly what we need. Adding the following labels to your service container will do the trick:

      - "traefik.http.middlewares.myservice_redirect.redirectscheme.scheme=https"
      - "traefik.http.routers.myservice_insecure.rule=Host(`myservice.internal.example.com`)"
      - "traefik.http.routers.myservice_insecure.entrypoints=web"
      - "traefik.http.routers.myservice_insecure.middlewares=myservice_redirect@docker"

Here we create the middleware myservice_redirect before adding another router (myservice_insecure). This router is attached to the web entrypoint at our hostname, making it available on port 80. Finally we add the myservice_redirect middleware to our new router and we are done! It’s pretty simple when you understand it the concepts.

internal HTTPS Linode Traefik
The Traefik dashboard shows a nice pipeline of how your request gets routed

Conclusion

This solves the problem of internal HTTPS perfectly for me! I’m looking forward to migrating other internal services over to this arrangement. Of course, the configuration presented here only works with Traefik and not other software such as Nginx or Mosquitto. There are other options here which support the Linode API for DNS validation. However, since Traefik 2.0 supports arbitrary TCP services I think I’m going to give that a try for my MQTT server. It’s one less piece of software to maintain!

I’m really happy with how my DNS migration went and how well this all works with the Linode Domains service. It’s nice to discover another aspect to a service that I’ve been using for so long.

I hope you find this setup useful, or perhaps you’ve already solved this problem in a different way. Please feel free to let me know in the feedback channels.

If you liked this post and want to see more, please consider subscribing to the mailing list (below) or the RSS feed. You can also follow me on Twitter. If you want to show your appreciation, feel free to buy me a coffee.

Loading

getting started with appdaemon

Getting Started with AppDaemon for Home Assistant

This post may contain affiliate links. Please see the disclaimer for more information.

Continuing on from last weeks post, I was also recently persuaded to try out AppDaemon for Home Assistant (again). I have previously tried out AppDaemon, so that I could use the excellent OccuSim app. I never got as far as writing any apps of my own and I hadn’t reinstalled it in my latest HASS migration. This post is going to detail my first steps in getting started with AppDaemon more seriously.

I should probably start with a run down of what AppDaemon is for anyone that doesn’t know. The AppDaemon website provides a high level description:

AppDaemon is a loosely coupled, multithreaded, sandboxed python execution environment for writing automation apps for home automation projects, and any environment that requires a robust event driven architecture.

In plain English, that basically means it provides an environment for writing home automation rules in Python. The supported home automation platforms are Home Assistant and plain MQTT. For HASS, this forms an alternative to both the built in YAML automation functionality and 3rd party systems such as Node-RED. Since AppDaemon is Python based, it also opens up the entirety of the Python ecosystem for use in your automations.

AppDaemon also provides dashboarding functionality (known as HADashboard). I’ve decided not to use this for now because I currently have no use for it. I also think the dashboards look a little dated next to the shiny new HASS Lovelace UI.

Installation

I installed AppDaemon via Docker by following the tutorial. The install went pretty much as expected. However, I had to clean up the config files from my old install before proceeding. The documentation doesn’t provide an example docker-compose configuration, so here’s mine:

appdaemon:
    image: acockburn/appdaemon:latest
    volumes:
      - /mnt/docker-data/home-assistant:/conf
      - /etc/localtime:/etc/localtime:ro
    depends_on:
      - homeassistant
    restart: always
    networks:
      - internal

I’ve linked the AppDaemon container to an internal network, on which I’ve also placed my HomeAssistant instance. That way AppDaemon can talk to HASS pretty easily.

You’ll note that I’m not passing any environment variables as per the documentation. This is because my configuration is passed only via the appdaemon.yaml file, since it allows me to use secrets:

---
log:
  logfile: STDOUT
  errorfile: STDERR

appdaemon:
  threads: 10
  timezone: 'Pacific/Auckland'
  plugins:
    HASS:
      type: hass
      ha_url: "http://172.17.0.1:8123"
      token: !secret appdaemon_token

You’ll see here that I use the docker0 interface IP to connect to HASS. I tried using the internal hostname (which should be homeassistant on my setup), but it didn’t seem to work. I think this is due to the HASS container being configured with host networking.

Writing My First App

I wanted to test of the capabilities and ease of use of AppDaemon. So, I decided to convert one of my existing automations into app form. I chose my bathroom motion light automation, because it’s reasonably complex but simple enough to complete quickly.

I started out by copying the motion light example from the tutorial. Then I updated it to take configuration parameters for the motion sensor, light and off timeout:

class MotionLight(hass.Hass):                                                                                                                                                                   
                                                                                                                                                                                                
    def initialize(self):                                                                                                                                                                       
        self.motion_sensor = self.args['motion_sensor']                                                                                                                                         
        self.light = self.args['light']                                                                                                                                                         
        self.timeout = self.args['timeout']                                                                                                                                                     
                                                                                                                                                                                                
        self.timer = None                                                                                                                                                                       
        self.listen_state(self.motion_callback, self.motion_sensor, new = "on")                                                                                                                 
                                                                                                                                                                                                
    def set_timer(self):                                                                                                                                                                        
        if self.timer is not None:                                                                                                                                                              
            self.cancel_timer(self.timer)                                                                                                                                                       
        self.timer = self.run_in(self.timeout_callback, self.timeout)                                                                                                                           
                                                                                                                                                                                                
    def is_light_times(self):                                                                                                                                                                   
        return self.now_is_between("sunset - 00:10:00", "sunrise + 00:10:00")

    def motion_callback(self, entity, attribute, old, new, kwargs):
        if self.is_light_times():
            self.turn_on(self.light)
            self.set_timer()

    def timeout_callback(self, kwargs):
        self.timer = None
        self.turn_off(self.light)

I’ve also added a couple of utility methods to manage the timer better and also to specify more complex logic to restrict when the light will come on. Encapsulating both of these in their own methods will allow re-use of them later on.

The timer logic of the example app is particularly problematic in the case of multiple motion events. In the original logic one timer will be set for each motion event. This leads to the light being turned off even if there is still motion in the room. It also caused some general flickering of the light between motion events and triggered callbacks. I mitigate this in the set_timer method here by first cancelling the timer if it is active before starting a new timer with the full timeout.

At this point, we have a fully functional and re-usable motion activated light. We can instantiate as many of these as we would like in our apps/apps.yaml file, like so:

motion_light_1:
  module: motion_lights
  class: MotionLight
  motion_sensor: binary_sensor.motion_sensor_1
  light: light.light_1
  timeout: 120

motion_light_2:
  module: motion_lights
  class: MotionLight
  motion_sensor: binary_sensor.motion_sensor_2
  light: light.light_2
  timeout: 60

...

Note that we haven’t yet recreated the functionality of my original automation. In that automation, the brightness was controlled by the door state. We’ll tackle this next.

Extending the App

Since our previous MotionLight app is just a Python object, we can take advantage of the object orientated capabilities of the Python language to extend it with further functionality. Doing so allows us to maintain the original behaviour for some instances, whilst also customising for more complex functionality.

Our subclassed light looks like this:

class BrightnessControlledMotionLight(MotionLight):                                                                                                                                             
                                                                                                                                                                                                
    def initialize(self):                                                                                                                                                                       
        self.last_door = "Other"                                                                                                                                                                
        for door in self.args['bedroom_doors']:                                                                                                                                                 
            self.listen_state(self.bedroom_door_callback, door, old = "off", new = "on")                                                                                                        
        for door in self.args['other_doors']:                                                                                                                                                   
            self.listen_state(self.other_door_callback, door, old = "off", new = "on")                                                                                                          

        super().initialize()

    def bedroom_door_callback(self, entity, attribute, old, new, kwargs):
        self.last_door = "Bedroom"
        self.log("Last door is: {}".format(self.last_door))

    def other_door_callback(self, entity, attribute, old, new, kwargs):
        self.last_door = "Other"
        self.log("Last door is: {}".format(self.last_door))

    def motion_callback(self, entity, attribute, old, new, kwargs):
        if self.is_light_times():
            if self.get_state(entity=self.light) == "off":
                if self.now_is_between("07:00:00", "20:00:00") or self.last_door != "Bedroom":
                    self.turn_on(self.light, brightness_pct = 100)
                else:
                    self.turn_on(self.light, brightness_pct = 1)
            self.set_timer()

Here we can see that the initialize method loads only the new configuration parameters. The existing parameters from the parent class are loaded from the parent’s initialize method via the super call. The new configuration options are passed as lists, allowing us to specify several bedroom or other doors. In order to set the relevant callbacks I loop over each list and set the callback. The callback is the same for each entry in the list since it only matters what type they are. The specifics of each door are irrelevant.

Next we have the actual callback methods for when the doors open. These just set the internal variable last_door to the relevant value and log it for debugging purposes.

Most of the new logic comes in the motion_callback method. Here I have reused the is_light_times and set_timer methods from the parent class. The remainder of the logic first checks that the light is off and then recreates the operation of the template I used in my original automation. This sets the light to dim if the last door opened was to one of the bedrooms and bright otherwise. There are also some time based restrictions on this for times when I always want the light bright.

The configuration is pretty similar to the previous example, with the addition of the lists for the doors:

brightness_controlled_motion_light:
  module: motion_lights
  class: BrightnessControlledMotionLight
  motion_sensor: binary_sensor.motion_sensor
  light: light.motion_light
  timeout: 120
  bedroom_doors:
    - binary_sensor.bedroom1_door_contact
    - binary_sensor.bedroom2_door_contact
  other_doors:
    - binary_sensor.kitchen_hall_door_contact

Conclusion and Next Steps

The previous automation (or more rightly set of automations) totaled to 78 lines. The Python code for the app is only 56 lines long. However, there is another 11 lines of configuration required. By this measurement, it seems like the two are similar in complexity. However, we now have an easily re-usable implementation for two types of motion controlled lights with the AppDaemon implementation. Further instances can be called into being with only a few lines of simple configuration. Whereas the YAML automation would need to be duplicated wholesale and tweaked to fit.

This power makes me keen to continue with AppDaemon. I’m also keen to integrate it with my CI Pipeline. Although I’m actually thinking of separating it out from my HASS configuration. With this I’d like to try out some more modern Python development tooling, since it’s been quite some time since I’ve had the opportunity to do any serious Python development.

I hope you’ve enjoyed reading this post. For anyone already using AppDaemon, this isn’t anything groundbreaking. However, for those who haven’t tried it or are on the fence, I’d highly recommend you give it a go. Please feel free to show off anything you’ve made in the feedback channels for this post!

If you liked this post and want to see more, please consider subscribing to the mailing list (below) or the RSS feed. You can also follow me on Twitter. If you want to show your appreciation, feel free to buy me a coffee.

Loading

my road to docker smtp

My Road to Docker: Sorting Out SMTP

This post may contain affiliate links. Please see the disclaimer for more information.

This post is part of a series on this project. Here is the series so far:


Having said in my last ‘Road to Docker’ post that I didn’t have any current plans for another post, something came up which warrants a write up here. I managed to solve the problem in question in a Dockerised fashion and I’m quite pleased with the solution. Let’s get into it….

The Backstory

After converting my web stack over to Docker recently, I had installed the WP Mail SMTP plugin in WordPress to handle mail from the site. This was required since WordPress could no longer send via a local mail setup. I configured the plugin to send via my existing mail server. This worked well for a while – then I encountered a speed bump on my road to Docker!

For some reason (I think due to an update of the WordPress container), I just stopped receiving email from the site. Upon investigation it seemed that the TLS connection was unable to be started correctly. I got the following debug log when testing mail via WordPress:

SMTP Debug:

2019-07-30 08:38:11	Connection: opening to mail.webworxshop.com:25, timeout=300, options=array (
                   	                  )
2019-07-30 08:38:11	Connection: opened
2019-07-30 08:38:11	SERVER -> CLIENT: 220 mail.webworxshop.com ESMTP Postfix
2019-07-30 08:38:11	CLIENT -> SERVER: EHLO webworxshop.com
2019-07-30 08:38:11	SERVER -> CLIENT: 250-mail.webworxshop.com
                   	                  250-PIPELINING
                   	                  250-SIZE 30720000
                   	                  250-VRFY
                   	                  250-ETRN
                   	                  250-STARTTLS
                   	                  250-ENHANCEDSTATUSCODES
                   	                  250-8BITMIME
                   	                  250 DSN
2019-07-30 08:38:11	CLIENT -> SERVER: STARTTLS
2019-07-30 08:38:11	SERVER -> CLIENT: 220 2.0.0 Ready to start TLS
2019-07-30 08:38:12	SMTP Error: Could not connect to SMTP host.
2019-07-30 08:38:12	CLIENT -> SERVER: QUIT
2019-07-30 08:38:12	SERVER -> CLIENT: 
2019-07-30 08:38:12	SMTP ERROR: QUIT command failed: 
2019-07-30 08:38:12	Connection: closed
2019-07-30 08:38:12	SMTP Error: Could not connect to SMTP host.

Looking into the logs on the mail server, I found the following corresponding error:

warning: TLS library problem: 27261:error:14094410:SSL routines:SSL3_READ_BYTES:sslv3 alert handshake failure:s3_pkt.c:1275:SSL alert number 40:

I tried googling the error, but after trying a few of the suggested fixes, I gave up and decided to solve the problem a different way.

Fixing it, Take 1

My first attempt involved installing Postfix on the host in a smarthost configuration to my main server. This is the same setup as I use on most of my servers for system mail from cron, etc (via a custom Ansible role).

After getting the mail system running and able to send mail from the host, I tried to configure it in WordPress. However, I was unable to connect to the host machine from the container on the Docker host IP address. I investigated this and found that because I was using a private network the IP address was different than the standard Docker interface address.

Trying this address didn’t work either. Perhaps this is a security feature in Docker, or perhaps I was doing it wrong. Either way it pushed me on to a better solution.

Fixing it, Take 2

My next plan involved putting Postfix into a container. This would be put on the same private network as the WordPress container to allow access. I needed to keep the smarthost configuration to talk to the main mailserver. A quick search turned up a suitable image in the form of boky/postfix. This image is intended exactly for this purpose and I was able to set it up without too much trouble.

To spin this up I added the following to my previous docker-compose.yml file:

postfix:
    image: boky/postfix
    ports:
      - "127.0.0.1:587:587"
    environment:
      HOSTNAME: ${POSTFIX_HOSTNAME}
      RELAYHOST: ${POSTFIX_RELAYHOST}
      RELAYHOST_USERNAME: ${POSTFIX_RELAYHOST_USERNAME}
      RELAYHOST_PASSWORD: ${POSTFIX_RELAYHOST_PASSWORD}
      RELAYHOST_TLS_LEVEL: "verify"
      ALLOWED_SENDER_DOMAINS: webworxshop.com
    volumes:
      - /home/rob/docker-data/postfix/spool:/var/spool/postfix
    networks:
      - internal
    restart: always

Pretty simple! As per the previous post I put all the secrets into an env.sh file to keep them separate from the stack.

The mail forwarder is available both on the internal Docker network and on the host system to replace the native mail forwarding setup. Setting this up with WordPress ended up being trivial (see screenshot below). However, I required some further configuration to make the mail on the host system work.

my road to docker smtp
We just use the container name as the hostname and 587 as the port! Authentication and SMTP aren’t required for the local connection.

MSMTP Setup

In order to redirect the system mail from the host via the Dockerised mail forwarder, I had to set up MSMTP. This is pretty effectively documented elsewhere, so I won’t go into details. The only differences in this setup are that we don’t require authentication or TLS to the mail forwarder because it’s only available locally. The mail forwarder itself is already handling authentication and TLS to the main mail server.

For reference, here is the msmtprc file I ended up with:

# Set default values for all following accounts.
defaults
auth           off
tls            off
logfile        /var/log/msmtp/msmtp.log

# Local mailserver
account        local
host           localhost
port           587
from           someone@example.com

# Set a default account
account default : local

aliases         /etc/aliases

Conclusion

Here we’ve seen how to quickly deploy a mail forwarding server for your Dockerised applications. We’ve also configured our host system to also use it, so we don’t need to run two forwarders.

I feel that this is the best solution to this problem (although I still don’t quite know what the original problem was!). It feels like a nicer solution than my original solution of running the mail forwarder locally. It has also resulted in one more Dockerised application!

I’m intending to convert my other Docker servers over to this approach in the near future. For the system mail part, I still need to create an Ansible role to push out the MSMTP configuration. Actually, the whole question of Ansible/host configuration and how it fits with Dockerised services is still something I need to work out. If anyone has any ideas feel free to share in the comments.

As I said in the last post, I don’t have any more ‘Road to Docker’ posts planned in the immediate future. However, the migration is ongoing so there will be more at some point!

If you liked this post and want to see more, please consider subscribing to the mailing list (below) or the RSS feed. You can also follow me on Twitter. If you want to show your appreciation, feel free to buy me a coffee.

Loading