
Automating My Infrastructure with Ansible and Gitlab CI: Part 2 – Deploying Stuff with Roles

This post may contain affiliate links. Please see the disclaimer for more information.

In the first post of this series, I got started with Ansible running in Gitlab CI. This involved setting up the basic pipeline, configuring the remote machines for use with our system and making a basic playbook to perform package upgrades. In this post we’re going to build on top of this to create a re-usable Ansible role to deploy some software and configuration to our fleet of servers. We will do this using the power of Ansible roles.

In last week’s post I described my monitoring system, based on checkmk. At the end of the post I briefly mentioned that it would be great to use Ansible to deploy the checkmk agent to all my systems. That’s what I’m going to describe in this post. The role I’ve created for this deploys the checkmk agent from the package download on my checkmk instance and configures it to be accessed via SSH. It also installs a couple of plugins to enable some extra checks on my systems.

A Brief Aside: ansible-lint

In my previous post I set up a job which ran all the playbooks in my repository with the --check flag. This performs a dry run of the playbooks and will alert me to any issues. In that post I mentioned that all I was really looking for was some kind of syntax/sanity checking on the playbooks and didn’t really need the full dry run. Several members of the community stepped forward to suggest ansible-lint – thanks to all those that suggested it!

I’ve now updated my CI configuration to run ansible-lint instead of the check job. The updated job is shown below:
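A minimal version of such a job looks something like the following sketch (the job name, stage and image are illustrative rather than copied verbatim from my pipeline):

```yaml
ansible-lint:
  stage: check
  image: python:3.7
  before_script:
    - pip install ansible ansible-lint
  script:
    # Lint every playbook, skipping rule 403 ("Package installs should not use latest")
    - ansible-lint -x 403 playbooks/*.yml
```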

This is a pretty basic use of ansible-lint. All I’m doing is running it on all the playbooks in my playbooks directory. I do skip a single rule (403) with the -x argument. The rule in question is about specifying latest in package installs, which conflicts with my upgrade playbook. Since I’m only tweaking this small thing I just pass this via the CLI rather than creating a config file.

I’ve carried the preflight jobs and the ansible-lint job over to the CI configuration for my new role (described below). Since this is pretty much an exact copy of that of my main repo, I’m not going to explain it any further.

Creating a Base Role

I decided that I wanted my roles self-contained in their own git repositories. This keeps everything a bit tidier at the price of a little extra complexity. In my previous Ansible configuration I had all my roles in the same repo and it just became a big mess after a while.

To create a role, first initialise it with ansible-galaxy. Then create a new git repo in the resulting directory:
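Something along these lines, using the role name I ended up with:

```bash
ansible-galaxy init checkmk_agent
cd checkmk_agent
git init
```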

I actually didn’t perform these steps and instead started from a copy of the old role I had for this in my previous configuration. This role has been tidied up and expanded upon for the new setup.

The ansible-galaxy command above will create a set of files and directories which provide a skeleton role. The first thing to do is to edit the README.md and meta/main.yml files for your role. Just update everything to suit what you are doing here; it's pretty self-explanatory. Once you've done this, all the files can be added to git and committed to create the first version of your role.

Installing the Role

Before I move on to exactly what my role does, I’m going to cover how we will use this role in our main infrastructure project. This is done by creating a requirements.yml file which will list the required roles. These will then be installed by passing the file to ansible-galaxy. Since the installation tool can install from git we will specify the git URL as the installation location. Here are the contents of my requirements.yml file:
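The shape of the file is roughly as follows; the URL is a placeholder for wherever your role repository lives:

```yaml
---
- name: checkmk_agent
  src: https://gitlab.com/<your-user>/ansible-role-checkmk_agent.git
  scm: git
  version: master
```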

Pretty simple. In order to do the installation all we have to do is run the following command:
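That command looks like this (the roles path and the --force flag are explained below):

```bash
ansible-galaxy install -r requirements.yml -p playbooks/roles --force
```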

This will install the required Ansible roles to the playbooks/roles directory in our main project, where our playbooks can find them. The --force flag will ensure that the role always gets updated when we run the command. I’ve added this command in the before_script commands in the CI configuration to enable me to use the role in my CI jobs.

Now the role will be installed where we actually need it. However, we are not yet using it. I’ll come back to this later. Let’s make the role actually do something first!

What the Role Does

The main behaviour of the role is defined in the tasks/main.yml file. This file is rather long, so I won't reproduce it here. Instead I'll ask you to open the link and follow along with my description below:

  • The first task creates a checkmk user on the target system. This will be used by checkmk to log in and run the agent.
  • The next task creates a .ssh directory for the checkmk user and sets its permissions correctly.
  • Next we create an authorized_keys file for the user. This uses a template file which will restrict what the key can do. The actual key comes from the checkmk_pub_key variable which will be passed in from the main project. A sketch of what such a template can look like is shown after this list.
  • Next are a couple of tasks to install some dependent packages for the rest of the role. There is one task for Apt based systems and another for Yum based systems. I’m not sure if the monitoring-plugins package is actually required. I had it in my previous role and have just copied it over.
  • The next two tasks remove the xinetd package on both types of system. Since we are accessing the agent via SSH we don’t need this. I was previously using this package for my agent access so I want to make sure it is removed. This behaviour can be disabled by setting the checkmk_purge_xinetd variable to false.
  • The next task downloads the checkmk agent deb file to the local machine. This is done to account for some of the remote servers not having direct access to the checkmk server. I then upload the file in the following task. The variables checkmk_server, checkmk_site_name and checkmk_agent_deb are used to specify the server address, monitoring instance (site) and deb file name. The address and site name are designed to be externally overridden by the main project.
  • The next two tasks repeat the download and upload process for the RPM version of the agent.
  • We then install the correct agent in the next two tasks.
  • The following task disables the systemd socket file for the agent to stop it being accessible over an unencrypted TCP port. Right now I don’t do this on my CentOS machines because they are too old to have systemd!
  • The final few tasks get into installing the Apt and Docker plugins on systems that require it. I follow the same process of downloading then uploading the files and making them executable. The Docker plugin requires that the docker Python module be installed, which we achieve via pip. It also requires a config file, which as discussed in my previous post needs to be modified. I keep my modified copy in the repository and just upload it to the correct location.
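For reference, a restricted authorized_keys template of the kind described above might look something like this. Only the checkmk_pub_key variable comes from the role; the forced command and the exact restriction options are my assumptions:

```
command="/usr/bin/check_mk_agent",no-port-forwarding,no-agent-forwarding,no-X11-forwarding {{ checkmk_pub_key }}
```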

The variables that are used in this are specified in the vars/main.yml and defaults/main.yml files. The default file contains the variables that should be overridden externally. I don’t specify a default for the SSH public key because I couldn’t think of a sensible value, so this at least must be specified for the role to run.

With all this in place our role is ready to go. Next, let's try it out from our main project.

Applying the Role

The first thing to do is to configure the role via the variables described above. I did this from my hosts.yml file which is encrypted, but the basic form is as follows:
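In outline it's just a handful of group variables; the values here are obviously placeholders:

```yaml
all:
  vars:
    checkmk_server: checkmk.example.com            # address of the checkmk instance
    checkmk_site_name: mysite                      # monitoring site on that server
    checkmk_pub_key: "ssh-ed25519 AAAA... checkmk" # public key the server will log in with
```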

The public key has to be that which will be used by the checkmk server. As such the private key must be installed on the server. I’ll cover how to set this up in checkmk below.

Next we have the playbook which will apply our role. I’ve opted to create a playbook for applying common roles to all my systems (of which this is the first). This goes in the file playbooks/common.yml:
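It looks something like this (I've assumed become is enabled, since the role installs packages):

```yaml
---
- hosts: all
  become: true
  roles:
    - checkmk_agent
```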

This is extremely basic, all it does is apply the checkmk_agent role to all servers.

The corresponding CI job is only marginally more complex:
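Assuming the shared Ansible job template from the previous post (referred to as .ansible below) takes care of authentication and the ansible-galaxy install step, the job is roughly:

```yaml
common-roles:
  extends: .ansible
  stage: deploy
  script:
    - ansible-playbook playbooks/common.yml
```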

With those two in place a push to the server will start the pipeline and eventually deploy our role to our servers.

[Screenshot: Our updated CI pipeline showing the ansible-lint job and the new common-roles job]

Configuring Checkmk Agent Access via SSH

Of course the deployment on the remote servers is only one side of the coin. We also need to have our checkmk instance set up to access the agents via SSH. This is documented pretty well in the checkmk documentation. Basically it comes down to putting the private key corresponding to the public key used earlier in a known location on the server and then setting up an “Individual program call instead of agent access” rule in the “Hosts and Service Parameters” page of WATO.

I modified the suggested SSH call to specify the private key and user to use. Here is the command I ended up using in my configuration.
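It was roughly of this form; the key path is an assumption, while the checkmk user comes from the role above and $HOSTADDRESS$ is the macro checkmk substitutes for each host:

```
ssh -i ~/.ssh/id_rsa -T checkmk@$HOSTADDRESS$
```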

When you create the rule you can apply it to as many hosts as you like. In my setup this is all of them, but you should adjust as you see fit.

[Screenshot: The checkmk WATO rule screen for SSH agent access]

Conclusion

If you’ve been following along you should now be able to add new hosts to your setup (via hosts.yml) and have the checkmk agent deployed on them automatically. You should also have an understanding of how to create reasonably complex Ansible roles in external repositories and how to use them in your main Ansible project.

There are loads of things about roles that I haven’t covered here (e.g. handlers). The best place to start learning more would be the Ansible roles documentation page. You can then fan out from there on other concepts as they arise.

Next Steps

So far on this adventure I’ve tested my playbooks and roles by just making sure they work against my servers (initially on a non-critical one). It would be nice to have a better way to handle this and to be able to run these tests and verify that the playbook is working from a CI job. I’ll be investigating this for the next part of this series.

The next instalment will probably be delayed by a few weeks. I have something else coming which will take up quite a bit of time. For my regular readers, there will still be blog posts – they just won’t be about Ansible or CI. This is probably a good thing, I’ve been covering a lot of CI stuff recently!

As always please get in contact if you have any feedback or improvements to suggest, or even if you just want to chat about your own Ansible roles.

If you liked this post and want to see more, please consider subscribing to the mailing list (below) or the RSS feed. You can also follow me on Twitter. If you want to show your appreciation, feel free to buy me a coffee.


Automating My Infrastructure with Ansible and Gitlab CI: Part 1 – Getting Started

This post may contain affiliate links. Please see the disclaimer for more information.

This is the first part in a multi-part series following my adventures in automating my self-hosting infrastructure with Ansible, running from Gitlab CI. In this post I’ll cover setting up my Ansible project, setting up the remote machines for Ansible/CI deployment, some initial checks in CI and automating of routine updates via our new system.

I’ve used Ansible quite extensively in the past, but with my recent focus on Docker and Gitlab CI I thought it was worth having a clean break. Also my previous Ansible configurations were a complete mess, so it’s a good opportunity to do things better. I’ll still be pulling in parts of my old config where needed to prevent re-inventing the wheel.

Since my ongoing plan is to deploy as many of my applications as possible with Docker and docker-compose, I’ll be focusing mainly on tasks relating to the host machines. Some of these will be setup tasks which deploy a required state to each machine. Others will be tasks to automate routine maintenance.

Inventory Setup

Before we get started, I’ll link you to the Gitlab repository for this post. You’ll find that some of the files are encrypted with ansible-vault, since they contain sensitive data. Don’t worry though, I’ll go through them as examples, starting with hosts.yml.

The hosts.yml file is my Ansible inventory and contains details of all the machines in my infrastructure. Previously, I’d only made use of inventories in INI format, so the YAML support is a welcome addition. The basic form of my inventory file is as follows:
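The overall shape is something like the sketch below. Host names and passwords are placeholders, and I've written the per-host sudo password using the standard ansible_become_pass variable:

```yaml
all:
  vars:
    ansible_python_interpreter: /usr/bin/python3
    ansible_user: ci
    ansible_ssh_private_key_file: id_rsa
  children:
    vms:
      hosts:
        vm1:
          ansible_become_pass: supersecret   # per-host sudo password
          password_hash: "$6$..."            # mkpasswd hash of the same password
    pis:
      hosts:
        pi1:
          ansible_become_pass: alsosecret
          password_hash: "$6$..."
          ansible_host: 192.168.1.20         # only needed if the name doesn't resolve via DNS
    physical:
      hosts:
        host1:
          ansible_become_pass: verysecret
          password_hash: "$6$..."
```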

To create this file we need to use the command ansible-vault create hosts.yml. This will ask for a password which you will need when running your playbooks. To edit the file later, replace the create subcommand with edit.

As you can see we start out with the top level group called all. This group has several variables set to configure all the hosts. The first of these ensures that we are using Python 3 on each remote system. The other two set the remote user and the SSH key used for authentication. I’m connecting to all of my systems with a specific user, called ci, which I will handle setting up in the next section.

The remainder of the file specifies the remote hosts in the infrastructure. For now I’ve divided these up into three groups, corresponding to VMs, Raspberry Pis and physical host machines. It’s worth noting that a host can be in multiple groups, so you can pretty much get as complicated as you like.

Each host has several variables associated with it. The first of these is the password for sudo operations. I like each of my hosts to have individual passwords for security purposes. The second variable (password_hash) is a hashed version of the same password which we will use later when setting up the user. These hashes are generated with the mkpasswd command as per the documentation. The final variable (ansible_host) I’m using here is optional and you only need to include it if the hostname of the server in question can’t be resolved via DNS. In this case you can specify an IP address for the server here.

In order to use this inventory file we need to pass the -i flag to Ansible (along with the filename) on every run. Alternatively, we can configure Ansible to use this file by creating an ansible.cfg file in our current directory. To do this we download the template config file and edit the inventory line so it looks like this:
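After editing, the relevant line simply points at our inventory file:

```ini
[defaults]
inventory = hosts.yml
```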

Setting Up the Remote Machines

At this point you should comment out the ansible_user and ansible_ssh_private_key_file lines in your inventory, since we don’t currently have a ci user on any of our machines and we haven’t added the key yet. This we will take care of now – via Ansible itself. I’ve created a playbook which will create that user and set it up for use with our Ansible setup:
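A sketch of that playbook (the file name is illustrative):

```yaml
# playbooks/create_ci_user.yml
---
- hosts: all
  become: true
  tasks:
    - name: Create the ci user
      user:
        name: ci
        password: "{{ password_hash }}"
        shell: /bin/bash

    - name: Create the .ssh directory
      file:
        path: /home/ci/.ssh
        state: directory
        owner: ci
        group: ci
        mode: "0700"

    - name: Install the authorized_keys file
      copy:
        src: files/ci_authorized_keys
        dest: /home/ci/.ssh/authorized_keys
        owner: ci
        group: ci
        mode: "0600"
```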

Basically all we do here is create the user (with the password from the inventory file) and set it up for access via SSH with a key. I’ve put this in the playbooks subdirectory in the name of keeping things organised. You’ll also need a playbooks/files directory in which you should put the ci_authorized_keys file. This file will be copied to .ssh/authorized_keys on the server, so obviously has that format. In order to create your key, generate it in the normal way with ssh-keygen and save it locally. Copy the public part into ci_authorized_keys and keep hold of the private part for later (don’t commit it to git though!).

Now we should run that against our servers with the command:
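Something along these lines (the playbook file name matches the sketch above; the exact connection flags will depend on how you currently log in to your machines):

```bash
ansible-playbook --ask-vault-pass playbooks/create_ci_user.yml
```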

This will prompt for your vault password from earlier and then configure each of your servers with the playbook.

Spinning Up the CI

At this point we have our ci user created and configured, so we should uncomment those lines in our inventory file. We can now perform a connectivity check to confirm that this worked:
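Using Ansible's ping module against every host:

```bash
ansible all -m ping --ask-vault-pass
```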

If that works you should see success from each host.

Next comes our base CI setup. I’ve imported some of my standard preflight jobs from my previous CI pipelines, specifically the shellcheck, yamllint and markdownlint jobs. The next stage in the pipeline is a check stage. Here I’m putting jobs which check that Ansible itself is working and that the configuration is valid, but don’t make any changes to the remote systems.

I started off with a generic template for Ansible jobs:
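In outline it does the following; the CI variable names are the ones discussed below, while the key file names and the template name are my choice:

```yaml
.ansible:
  image: python:3.7
  before_script:
    - pip install ansible
    # Write the secrets from the Gitlab CI variables out to files
    - echo "$ANSIBLE_VAULT_PASSWORD" > vault.key
    - echo "$DEPLOYMENT_SSH_KEY" > id_rsa
    - chmod 600 id_rsa
    # Point Ansible at our config and vault key file (it won't pick up
    # ansible.cfg automatically in the CI environment due to a permissions issue)
    - export ANSIBLE_CONFIG="$CI_PROJECT_DIR/ansible.cfg"
    - export ANSIBLE_VAULT_PASSWORD_FILE="$CI_PROJECT_DIR/vault.key"
  after_script:
    - rm -f vault.key id_rsa
```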

This sets up the CI environment for Ansible to run. We start from a base Python 3.7 image and install Ansible via Pip. This takes a little bit of time doing it on each run so it would probably be better to build a custom image which includes Ansible (all the ones I found were out of date).

The next stage is to set up the authentication tokens. First we write the ANSIBLE_VAULT_PASSWORD variable to a file, followed by the DEPLOYMENT_SSH_KEY variable. These variables are defined in the Gitlab CI configuration as secrets. The id_rsa file requires its permissions set to 0600 for the SSH connection to succeed. We also make sure to remove both these files after the job completes.

The last things to set are a couple of environment variables to allow Ansible to pick up our config file (by default it won’t in the CI environment due to a permissions issue). We also need to tell it which vault key file to use.

Check Jobs

I’ve implemented two check jobs in the pipeline. The first of these performs the ping action which we tested before. This is to ensure that we have connectivity to each of our remote machines from the CI runner. The second iterates over each of the YAML files in the playbooks directory and runs them in check mode. This is basically a dry-run. I’d prefer if this was just a syntax/verification check without having to basically run through the whole thing, but there doesn’t seem to be a way to do that in Ansible.

The jobs for both of these are shown below:
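Sketched out, the two jobs are along these lines:

```yaml
ping-hosts:
  extends: .ansible
  stage: check
  script:
    - ansible all -m ping

check-playbooks:
  extends: .ansible
  stage: check
  script:
    # Dry-run every playbook in the playbooks directory
    - |
      for playbook in playbooks/*.yml; do
        ansible-playbook --check "$playbook"
      done
```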

Before these will run on the CI machine, we need to make a quick modification to our ansible.cfg file. This will allow Ansible to accept the SSH host keys without prompting. Basically you just uncomment the host_key_checking line and ensure it is set to false:
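So the relevant part of ansible.cfg ends up as:

```ini
[defaults]
inventory = hosts.yml
host_key_checking = False
```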

Doing Something Useful

At this stage we have our Ansible environment set up in CI and our remote machines are ready to accept instructions. We’ve also performed some verification steps to give us some confidence that we are doing something sensible. Sounds like we’re ready to make this do something useful!

Managing package updates has always been a pain for me. Once you get beyond a couple of machines, manually logging in and applying updates becomes pretty unmanageable. I’ve traditionally taken care of this on a Saturday morning, sitting down and applying updates to each machine whilst having my morning coffee. The main issue with this is just keeping track of which machines I already updated. Luckily there is a better way!

[[Side note: Yes, before anyone asks, I am aware of the unattended-upgrades package and I have it installed. However, I only use the default configuration of applying security updates automatically. This is with good reason. I want at least some manual intervention in performing other updates, so that if something critical goes wrong, I’m on hand to fix it.]]

With our shiny new Ansible setup encased in a layer of CI, we can quite easily take care of applying package upgrades to a whole fleet of machines. Here’s the playbook to do this (playbooks/upgrades.yml):
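A sketch of that playbook, matching the description below:

```yaml
---
- hosts: all
  become: true
  serial: 2
  tasks:
    - name: Upgrade packages (Apt based systems)
      apt:
        update_cache: true
        upgrade: dist
      when: ansible_pkg_mgr == "apt"
      notify: Reboot

    - name: Upgrade packages (Yum based systems)
      yum:
        name: "*"
        state: latest
      when: ansible_pkg_mgr == "yum"
      notify: Reboot

  handlers:
    - name: Reboot
      reboot:
      when: "'physical' not in group_names"
```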

This is pretty simple, first we apply it to all hosts. Secondly we specify the line serial: 2 to run only on two hosts at a time (to even out the load on my VM hosts). Then we get into the tasks. These basically run an upgrade selectively based upon the OS in question (I still have a couple of CentOS machines knocking around). Each of the update tasks will perform the notify block if anything changes (i.e. any packages got updated). In this case all this does is execute the Reboot handler, which will reboot the machine. The when clause of that handler causes it not to execute if the machine is in the physical group (so that my host machines are not updated). I still need to handle not rebooting the CI runner host, but so far I haven’t added it to this system.

We could take this further, for example snapshotting any VMs before doing the update, if our VM host supports that. For now this gets the immediate job done and is pretty simple.

I’ve added a CI job for this. The key thing to note about this is that it is a manual job, meaning it must be directly triggered from the Gitlab UI. This gives me the manual step I mentioned earlier:
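The job itself is just the upgrade playbook behind a manual gate:

```yaml
package-upgrades:
  extends: .ansible
  stage: deploy
  script:
    - ansible-playbook playbooks/upgrades.yml
  when: manual
```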

Now all I have to do on a Saturday morning is click a button and wait for everything to be updated!

[Screenshot: Our finished pipeline. Note the state of the final “package-upgrades” job which indicates a manual job. Gitlab helpfully provides a play button to run it.]

Conclusion

I hope you’ve managed to follow along this far. We’ve certainly come a long way, from nothing to an end to end automated system to update all our servers (with a little manual step thrown in for safety).

Of course there is a lot more we can do now that we’ve got the groundwork set up. In the next instalment I’m going to cover installing and configuring software that you want to be deployed across a fleet of machines. In the process we’ll take a look at Ansible roles. I’d also like to have a look at testing my Ansible roles with Molecule, but that will probably have to wait for yet another post.

I’m interested to hear about anyone else’s Ansible setup, feel free to share via the feedback channels!

If you liked this post and want to see more, please consider subscribing to the mailing list (below) or the RSS feed. You can also follow me on Twitter. If you want to show your appreciation, feel free to buy me a coffee.


Continuous Integration for Home Assistant, ESPHome and AppDaemon

This post may contain affiliate links. Please see the disclaimer for more information.

Recently I set up continuous integration and deployment from my Home Assistant configuration. This setup has been nothing short of awesome! It’s liberated me from worrying about editing my configuration – all I do is git push and relax. Either HASS will notify me when it restarts or I’ll get an email from Gitlab telling me the pipeline failed.

I wanted to take this configuration further and expand it to other parts of my Home Automation infrastructure. In this post I’ll cover expanding it to perform deployments of my HA stack with Docker, building and deploying to ESPHome devices and unit testing and deploying my AppDaemon apps.

Let’s get on with it!

Automating Docker Deployment

I’d originally held off doing this because I wasn’t looking forward to building custom Docker images in Gitlab CI. However, I managed to complete the original pipeline without having to add any extra dependencies to the HASS containers (such as git which I thought may be required). This makes the job of deploying my HA stack much easier, especially as I already had it mostly scripted. The first step was to add my update.sh script to my repo and tweak it to suit:
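The tweaked script boils down to something like this (the project name ha is explained below):

```bash
#!/bin/bash
set -e

# Run from the directory containing docker-compose.yml
cd "$(dirname "$0")" || exit 1

# Pull new images and recreate any changed containers, cleaning up removed ones
docker-compose -p ha pull
docker-compose -p ha up -d --remove-orphans
```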

This is a pretty simple modification to my previous script. The main additions are that I use the -p argument to set the project name used by docker-compose. By default this is taken from the directory name, but I wanted it to match the name of my previous project even though the directory has changed from ha to home-assistant. The other main modification is that I’ve added the --remove-orphans argument to clean up any lingering containers. This is useful if I remove a container from the docker-compose.yml file. In addition I’ve removed the apt commands and cleaned up the script a bit so that it passes my shellcheck job.

The next step was simply to add the docker-compose.yml file to the repo. Then I continued by editing the CI configuration.

Updated Home Assistant CI Jobs

I first split up my previous deployment job into two jobs. The first of these is the main deployment job which pulls the new configuration. The second restarts HASS. The restart job goes in a new pipeline stage and will only be run when the docker-compose.yml or update.sh files haven’t changed:
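A sketch of the restart job; the deployment job keeps the SSH-based pull logic from the original pipeline, and the HASS URL/token variable names here are illustrative:

```yaml
restart-hass:
  stage: restart
  image: alpine:latest
  except:
    changes:
      - docker-compose.yml
      - update.sh
  script:
    - apk add --no-cache curl
    - 'curl -X POST -H "Authorization: Bearer $HASS_TOKEN" "$HASS_URL/api/services/homeassistant/restart"'
```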

I then added another job (again in another pipeline stage) which performs our Docker deployment. This will be run only when either the docker-compose.yml or update.sh files changes:
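Roughly as follows (the SSH key handling is omitted for brevity; it follows the same pattern as the deploy job):

```yaml
docker-deploy:
  stage: docker-deploy
  image: alpine:latest
  only:
    changes:
      - docker-compose.yml
      - update.sh
  script:
    - apk add --no-cache openssh-client
    - ssh -i id_rsa -o StrictHostKeyChecking=no "$DEPLOYMENT_SSH_LOGIN" "cd /mnt/docker-data/home-assistant && ./update.sh"
```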

[Screenshot: A full pipeline run with a deployment of the Docker containers running in the final stage]

With that in place I can now redeploy my HA stack by modifying either of those files, committing to git and pushing. In order to facilitate HASS updates with this workflow, I changed the tag of the HASS Docker image to the explicit version number. That way I can simply update the version number and redeploy for each new release.
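For example, in docker-compose.yml (the version number is just an example):

```yaml
services:
  homeassistant:
    image: homeassistant/home-assistant:0.96.0
```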

Continuous Integration for ESPHome

Inspired by the previous configs I have seen for checking ESPHome files, I wanted to implement the same checks. However, I wanted to go further and have a full continuous deployment setup which would build the relevant firmware when its configuration was changed and send an OTA update to the corresponding device. As it turned out this was relatively easy.

I started out by importing my ESPHome configs into Git, which I hadn’t previously done. You can find the resulting repository on Gitlab. For the CI configuration I first copied over the markdownlint and yamllint jobs from my Home Assistant CI configuration.

I then borrowed the ESPHome config check jobs from Frenck’s configuration. These check against both the current release of ESPHome and the next beta release. The beta release job is allowed to fail and is designed only to provide a heads up for potential future issues.

Then I came to implement the build and deployment job. Traditionally these would be performed in separate steps, but since ESPHome can do this in a single step with its run subcommand I decided to do it the easy way. This also removes the requirement to manage build artifacts between steps. I created the following template job to manage this:
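A sketch of that template. The openssl options and file names are assumptions and must match however you encrypted the key, I've parametrised the device file name with a DEVICE variable, and the esphome CLI argument order varies between versions:

```yaml
.esphome-deploy:
  image: esphome/esphome:latest
  before_script:
    # git-crypt isn't in the ESPHome image, so install it first
    - apt-get update && apt-get install -y git-crypt
    # Decrypt the git-crypt key stored (openssl-encrypted) in the repo, then unlock
    - openssl enc -d -aes-256-cbc -pass "env:OPENSSL_PASSPHRASE" -in git-crypt.key.enc -out git-crypt.key
    - git-crypt unlock git-crypt.key
  script:
    # Build the firmware and send it to the device over the air in one step
    - esphome "${DEVICE}.yaml" run --no-logs
  after_script:
    - rm -f git-crypt.key
```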

Most of the complexity here is in unlocking the git-crypt repository so that we can read the encrypted secrets file. I opted to store the git-crypt key in the repository, encrypted with openssl. The passphrase used for openssl is in turn stored in a Gitlab variable, in this case $OPENSSL_PASSPHRASE. Once the decryption of the key is complete, we can unlock the repo and get on with things. We remove the key after we are done in the after_script step.

Per-Device Jobs

Using the template configuration, I then created a job for each device I want to deploy to. These jobs are executed only when the corresponding YAML file (or secrets.yaml) is changed. This ensures that I only update devices that I need to on each run. The general form of these jobs is:
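With the DEVICE parametrisation from the template sketch above, each one looks something like:

```yaml
my_device:
  extends: .esphome-deploy
  variables:
    DEVICE: my_device
  only:
    changes:
      - my_device.yaml
      - secrets.yaml
```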

Of course you need to replace my_device with the name of your device file.

[Screenshot: A run of the ESPHome pipeline with deployments to two devices]

With these jobs in place I have a full end-to-end pipeline for ESPHome, which lints and checks my configuration before deploying it only to devices which need updating. Nice! You can check out the full pipeline configuration on Gitlab. I now no longer have need to run the ESPHome dashboard, so I’ve removed it from my server.

Continuous Integration for AppDaemon

I mentioned previously that I wanted to split out my AppDaemon apps and configuration into a separate repo from my HASS config. I did this as a prerequisite step of this setup and you can again find the new repo on Gitlab.

The inspiration for this configuration came mostly from @bachya on the HASS forum, whose post in reply to my earlier setup provided most of the details. Thanks for sharing!

I started out by copying across the now ubiquitous markdownlint and yamllint jobs. I then added jobs for pylint, mypy, flake8 and black:
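Sketched out, with each tool as its own job (I've assumed the apps live in an apps/ directory):

```yaml
.python-check:
  image: python:3.7
  stage: check

pylint:
  extends: .python-check
  script:
    - pip install pylint
    - pylint apps/*.py

mypy:
  extends: .python-check
  script:
    - pip install mypy
    - mypy apps/

flake8:
  extends: .python-check
  script:
    - pip install flake8
    - flake8 apps/

black:
  extends: .python-check
  script:
    - pip install black
    - black --check apps/
```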

Although this ends up being very verbose, I decided to implement these all as separate jobs so that I get individual pass/fail states for each. I’m also pretty sure the mypy job doesn’t do anything right now, because I’m not using any type hints in my Python code. However, the job is there for when I start adding those.

Unit Testing AppDaemon

Another thing that @bachya introduced me to was Appdaemontestframework. This provides a pytest-based framework for unit testing your AppDaemon apps. Although I’m still working on the unit tests for my (so far pretty minimal) AD setup, I did manage to get the framework up and running, which was a little tricky. I had some issues with setting up the initial configuration for the app, but I managed to work it out eventually.

The unit testing CI job is pretty simple:
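Something like the following (the requirements file name is an assumption):

```yaml
unit-tests:
  image: python:3.7
  stage: test
  script:
    - pip install -r requirements-test.txt
    - py.test
```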

All we do here is install the requirements that I need for the tests and then call py.test. Easy!

The deployment job for AppDaemon was also trivial, since it is pretty much a copy of the HASS one. Since AD detects changes to your apps automatically, there’s no need to restart. For more details you can check out the full CI pipeline on Gitlab.

[Screenshot: A run of the AppDaemon pipeline – lots of preflight checks here!]

Conclusion

Phew, that was a lot of work, but it was all the logical follow on from work I’d done before or that others had done. I now have a full set of CI pipelines for the three main components of my home automation setup. I’m really happy with each of them, but especially the ESPHome pipeline. As an embedded engineer in my day job I find it really cool that I can update a YAML file locally, commit/push it and then my CI takes over and ends up flashing a physical device! That this is even possible is a testament to all the pieces of software used.

Next Steps

I’m keen to keep going with CI as a means of automating my operations. I think my next target will be sprucing up my Ansible configurations and running them automatically from CI. Stay tuned for that in the hopefully near future!

If you liked this post and want to see more, please consider subscribing to the mailing list (below) or the RSS feed. You can also follow me on Twitter. If you want to show your appreciation, feel free to buy me a coffee.


Continuous Integration/Deployment for Home Assistant with Gitlab CI

This post may contain affiliate links. Please see the disclaimer for more information.

One of the best things about writing this blog is the interactions I have with other people. Of course it’s always nice to get feedback, whether it’s positive or (constructively) negative. It’s even better to see what similar projects other people are undertaking. Sometimes comments even start me off in a different direction than I had been taking.

This project was inspired by one such conversation. This started out rather gruff but actually ended up being really positive and motivated me to go further with an approach that I’d mostly dropped. The conversation in question was in relation to my recent “Seven Home Assistant Tips and Best Practices” post. Specifically it was around testing and deploying your Home Assistant config with Continuous Integration (via Gitlab CI).

I’m already familiar with Continuous Integration and Continuous Deployment through work. I had developed a minimal setup for validating my HASS config. However, I’d previously given up on it due to the slowness of Gitlab’s shared CI runners. What follows is my new attempt. Thanks to /u/automate_the_things and all the other commenters on that thread for the inspiration/persuasion to do this!

Setting Up a Local Runner

Ideally, I’d like to fully self host my own Gitlab instance. However, the recommended RAM requirements are between 4 and 8GB, which is a little ridiculous. Perhaps when I upgrade my server I’ll be able to spare enough RAM for this. For now I’m just running a local runner and connecting it to the cloud version of Gitlab.

I decided to go with deploying the runner as a Docker container and also executing my jobs within their own containers. This fits with my recent Docker shenanigans and I’m familiar with this setup, having deployed it for work. CI systems are one area where I’ve always felt that using containers makes sense, even before my recent Docker adventures. Each build running in its own container removes a lot of the complexity in managing build environments for different projects. It also means that your build machines only require Docker.

I set up my runner in a new VM on my main server. Then I pretty much just followed the official instructions to install the runner. I did convert the docker run command to a minimal docker-compose.yml file for ease of reproduction however:
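That file is essentially the official docker run command translated into compose form (the config volume path is a choice, not a requirement):

```yaml
version: "3"

services:
  gitlab-runner:
    image: gitlab/gitlab-runner:latest
    container_name: gitlab-runner
    restart: always
    volumes:
      - ./config:/etc/gitlab-runner
      - /var/run/docker.sock:/var/run/docker.sock
```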

Once that was done, I finished by getting the runner registered and connected to my project.

[Screenshot: Once your runner is active it should show up in your Gitlab runners page (accessible via Settings->CI/CD->Runners)]

Build Pipeline Configuration

In my research for this project, I came across Frenck’s Gitlab CI configuration, which is really awesome (thanks @Frenck). I’ve based mine heavily on his with some tweaks for my configuration and environment. The finished pipeline runs the following jobs:

  • shellcheck – this job performs various checks on any shell scripts in the repository
  • yamllint – this job performs a full lint check of all my YAML files. Since I’ve never run this before it threw up loads of errors. I started fixing a few of these, but eventually marked the job with allow_failure: true to allow the pipeline to continue even with these errors. I’ll work on fixing these issues bit by bit over the next few weeks. I also exclude some files which are encrypted with git-crypt from this check.
  • jsonlint – pretty much the same as yamllint, but for JSON files. Any files that are encrypted are excluded from this check.
  • markdownlint – similar to the previous jobs, but for checking markdown files (such as the README.md file)
  • ha-latest – checks the HASS configuration against the current release of Home Assistant
  • ha-rc – runs a HASS configuration check against the next release candidate for Home Assistant
  • ha-dev – checks the HASS configuration against the development version of Home Assistant. Both of these jobs are configured to allow failure. They are just intended to give me advance warning of any breaking configuration that may prevent HASS from starting up in a future release.
  • deploy – deploys the configuration to my HASS server. I’ll discuss this in more detail below.
[Screenshot: My full pipeline (note the yellow status of the failing yamllint job)]

You can find the finished configuration in my hass-config repository.

Deployment Approaches

There are several ways I could have done the deployment, which may suit different scenarios:

  • We could use the Home Assistant Gitlab CI sensor to poll the pipeline status. We would then trigger an automation to pull down the configuration when the pipeline passes. I haven’t tried this approach. However, it is potentially useful if your HASS server and Gitlab runner are on different networks and your HASS server is not publicly available.
  • We could use a pipeline webhook from Gitlab to a webhook handler in HASS. This would trigger an automation similar to that above. Again, I haven’t tried this. It would be useful if your Home Assistant instance and Gitlab CI runner are on different networks. However, it would require your HASS instance to be publicly available.
  • Similar to the approach above, you could trigger the HASS webhook handler from a job in your Gitlab runner directly with CURL (rather than using the built-in webhooks). This has the advantage over the previous two approaches that it gives you an explicit deploy stage in your pipeline. This in turn gives you the ability to track deployments via environments. It’s also potentially a lot simpler, since it would only be triggered if the previous stages in the pipeline passed. You also wouldn’t have to parse the JSON payload.
  • The approach I have taken is to deploy directly from the runner container via SSH. This is because my runner and HASS machines are running in the same network and so I can easily SSH between them without having to open any ports through my firewall. It also centralises all the deployment logic in the CI configuration, without any HASS automations needed.

My Deployment Job

As per my CI configuration, the deployment job is as follows:
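In outline it looks like this; the DEPLOYMENT_* variable names are the ones discussed below, while the HASS URL/token variable names are illustrative:

```yaml
deploy:
  stage: deploy
  image: alpine:latest
  environment:
    name: home-assistant
  before_script:
    - apk add --no-cache openssh-client curl
    - echo "$DEPLOYMENT_SSH_KEY" > id_rsa
    - chmod 600 id_rsa
  script:
    # Check out the exact commit this pipeline is running for on the HASS server
    - ssh -i id_rsa -o StrictHostKeyChecking=no "$DEPLOYMENT_SSH_LOGIN" "cd /mnt/docker-data/home-assistant && git fetch && git checkout $CI_COMMIT_SHA"
    # Restart Home Assistant via its REST API
    - 'curl -X POST -H "Authorization: Bearer $HASS_TOKEN" "$HASS_URL/api/services/homeassistant/restart"'
  after_script:
    - rm -f id_rsa
  only:
    - master
  tags:
    - hass
```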

As you can see, this job runs in a plain Alpine Linux container and is deploying to the home-assistant environment. This allows me to track what versions were deployed and when from the Gitlab UI.

The before_script portion installs a couple of dependencies which we need for later and pulls the (password-less) SSH key we need for logging into the HASS server from the project variables. This is stored in the $DEPLOYMENT_SSH_KEY variable in the Gitlab configuration. The resulting file must have its permissions set to 600 to allow the SSH client to use it.

Moving on to the script portion. The first step performs the actual deployment of the repository to the server via SSH. Here we use the SSH key that we wrote out above. The public portion of this is installed on the HASS server for the ci user. We also disable strict host key checking to prevent the SSH client prompting us to accept the fingerprint.

[Screenshot: The Gitlab CI variables page (accessible via Settings->CI/CD->Variables)]

The SSH command connects to the server specified in $DEPLOYMENT_SSH_LOGIN, which is again set in the Gitlab variables configuration. This has the form ci@<hass host IP>. It should be noted here that the Alpine container defaults to using Google’s DNS. This means that resolving internal hostnames for your network will fail. I’m using the IP addresses for now to get around this.

Remote Control Commands

The SSH command sends a sequence of commands to be run on the HASS server. These commands are as follows:

  • cd /mnt/docker-data/home-assistant – change directory to the configuration directory on the server
  • git fetch – fetch all the new stuff via git
  • git checkout $CI_COMMIT_SHA – checkout the exact commit that we are running a pipeline for via one of Gitlab’s built in variables

This arrangement of commands allows me to control exactly what gets deployed to the server for each pipeline run. In this way we won’t accidentally deploy the wrong version if new code is pushed to the repository whilst our pipeline is running.

In order for the git fetch command to work another password-less SSH key is required. This time this is for the ci user on the HASS system. The public portion of this is installed as a deploy key for the project in Gitlab. I suppose it’s equally valid to pull changes via HTTPS (for public repos), but since the remote on my repository was already set up to use SSH I decided to continue using it.

Restarting Home Assistant from Gitlab CI

The second command in our script section is to restart Home Assistant after the configuration has been updated. Here we use CURL to call the homeassistant.restart service as per the API docs. The Home Assistant authentication token and URL are stored in Gitlab CI variables again.

Finally, we enter the after_script section, which will be executed even in the case that one of the above commands fails. Here we simply delete the id_rsa SSH key file.

I’ve restricted my deploy job to run only on pushes to the master branch. This allows me to use other branches in the repo as I please without having them deployed to the server accidentally. I’ve also used a tag of hass to prevent running on runners not intended for this job. Since I only have one runner right now this isn’t a concern, but having it in place makes things easier if/when I add more runners.

Conclusion

I’m really pleased with how this CI pipeline has turned out. However, I’m still a little concerned at how long all these steps take. The latest pipeline took 6 minutes and 42 seconds to run (plus the time it takes HASS to restart). This isn’t very much if it’s just a fire-and-forget operation. It is, however, a long time if I am trying to iterate on my configuration. I’m going to play around with the runner configuration to see if I can get this down further. I also want to investigate options for testing on my local machine. In my previous attempt at this my builds could sit for longer than this waiting for one of Gitlab’s shared runners, so I’ve at least made progress in the speed department.

Next Steps

In terms of further improvements, I’d like better notifications of the pipeline progress and notifications when HASS completes its restart. I will implement these with Gotify. Right now I only get an email from Gitlab if the build fails. I’m also going to integrate the pipeline status into Home Assistant with the previously mentioned Gitlab CI sensor. I’m even tempted to turn one of my smart lights into a build light to alert me to problems!

I also want to take my use of CI in my infrastructure further. My next target will be building some modified Docker images for a couple of my services in CI. Deployment of the relevant Docker stacks is another thing I’d like to try out. I’ve not had chance to play with building containers via Gitlab CI before so that will be interesting.

I hope that you’ve enjoyed this post and that it’s inspired you to give something similar a go. Thanks to this pipeline I’ll never have to worry about HASS being unable to start due to a broken configuration again. I’m open to further improvements to this process. Please feel free to share any you may have via the feedback channels. Thanks for reading!

If you liked this post and want to see more, please consider subscribing to the mailing list (below) or the RSS feed. You can also follow me on Twitter. If you want to show your appreciation, feel free to buy me a coffee.

Monthly Update: February 2017

Hello again. It’s been a busy (and short) month so I don’t have much to update on. Most of my work this month has gone into refactoring my Home Assistant configuration into something which is publicly sharable. This has mostly involved splitting the configuration into more logical chunks than the few monolithic files I had previously and extracting secrets out into a file protected by git-crypt. I’ve also been updating and improving aspects of my config as I go, particularly the automations. I’m not quite ready to share this since I still have a couple of things to clean up and also need to actually deploy and test the new configuration. Hopefully this will be posted on gitlab during March, with an accompanying blog post here.

I’ve also been working on another Home Assistant related task, which was to get AppDaemon working. This was specifically so I could run Occusim, which provides occupancy simulation (turns things on and off when you’re not there) for Home Assistant. It didn’t take me long to get this up and running, but my first live test of Occusim didn’t work, due to me not removing the test command properly. Now that I’ve fixed that issue, it works great.

I think that’s pretty much it for now. Hopefully there will be more to share next month.