
Automating My Infrastructure with Ansible and Gitlab CI: Part 1 – Getting Started

This post may contain affiliate links. Please see the disclaimer for more information.

This is the first part in a multi-part series following my adventures in automating my self-hosting infrastructure with Ansible, running from Gitlab CI. In this post I’ll cover setting up my Ansible project, preparing the remote machines for Ansible/CI deployment, adding some initial checks in CI and automating routine updates via our new system.

I’ve used Ansible quite extensively in the past, but with my recent focus on Docker and Gitlab CI I thought it was worth having a clean break. Also my previous Ansible configurations were a complete mess, so it’s a good opportunity to do things better. I’ll still be pulling in parts of my old config where needed to prevent re-inventing the wheel.

Since my ongoing plan is to deploy as many of my applications as possible with Docker and docker-compose, I’ll be focusing mainly on tasks relating to the host machines. Some of these will be setup tasks which will deploy a required state to each machine. Others will be tasks to automate routine maintenance.

Inventory Setup

Before we get started, I’ll link you to the Gitlab repository for this post. You’ll find that some of the files are encrypted with ansible-vault, since they contain sensitive data. Don’t worry though, I’ll go through them as examples, starting with hosts.yml.

The hosts.yml file is my Ansible inventory and contains details of all the machines in my infrastructure. Previously, I’d only made use of inventories in INI format, so the YAML support is a welcome addition. The basic form of my inventory file is as follows:
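Something like this (the hostnames, passwords and hashes are placeholders, and the vms/pis group names are illustrative; the sudo password uses Ansible’s standard ansible_become_pass variable):

```yaml
all:
  vars:
    ansible_python_interpreter: /usr/bin/python3
    ansible_user: ci
    ansible_ssh_private_key_file: id_rsa
  children:
    vms:
      hosts:
        vm1:
          ansible_become_pass: "unique-password-1"  # placeholder
          password_hash: "$6$..."                   # generated with mkpasswd
    pis:
      hosts:
        pi1:
          ansible_become_pass: "unique-password-2"
          password_hash: "$6$..."
    physical:
      hosts:
        host1:
          ansible_become_pass: "unique-password-3"
          password_hash: "$6$..."
          ansible_host: 192.168.1.10
```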

To create this file we need to use the command ansible-vault create hosts.yml. This will ask for a password which you will need when running your playbooks. To edit the file later, replace the create subcommand with edit.

As you can see we start out with the top level group called all. This group has several variables set to configure all the hosts. The first of these ensures that we are using Python 3 on each remote system. The other two set the remote user and the SSH key used for authentication. I’m connecting to all of my systems with a specific user, called ci, which I will handle setting up in the next section.

The remainder of the file specifies the remote hosts in the infrastructure. For now I’ve divided these up into three groups, corresponding to VMs, Raspberry Pis and physical host machines. It’s worth noting that a host can be in multiple groups, so you can pretty much get as complicated as you like.

Each host has several variables associated with it. The first of these is the password for sudo operations. I like each of my hosts to have individual passwords for security purposes. The second variable (password_hash) is a hashed version of the same password, which we will use later when setting up the user. These hashes are generated with the mkpasswd command as per the documentation. The final variable I’m using here (ansible_host) is optional; you only need to include it if the hostname of the server in question can’t be resolved via DNS, in which case you can specify an IP address for the server here.

In order to use this inventory file we need to pass the -i flag to Ansible (along with the filename) on every run. Alternatively, we can configure Ansible to use this file by creating an ansible.cfg file in our current directory. To do this we download the template config file and edit the inventory line so it looks like this:
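The relevant part of ansible.cfg then looks like this:

```ini
[defaults]
inventory = hosts.yml
```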

Setting Up the Remote Machines

At this point you should comment out the ansible_user and ansible_ssh_private_key_file lines in your inventory, since we don’t currently have a ci user on any of our machines and we haven’t added the key yet. This we will take care of now – via Ansible itself. I’ve created a playbook which will create that user and set it up for use with our Ansible setup:
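A sketch of that playbook (the filename playbooks/create_ci_user.yml is my choice of name; I’m assuming the sudo group here, which would be wheel on CentOS machines):

```yaml
---
- hosts: all
  become: true
  tasks:
    - name: Create the ci user with the password hash from the inventory
      user:
        name: ci
        password: "{{ password_hash }}"
        shell: /bin/bash
        groups: sudo          # assumption: use wheel on CentOS hosts
        append: true

    - name: Ensure the .ssh directory exists for the ci user
      file:
        path: /home/ci/.ssh
        state: directory
        owner: ci
        group: ci
        mode: "0700"

    - name: Install the authorized keys file
      copy:
        src: ci_authorized_keys   # resolved from playbooks/files/
        dest: /home/ci/.ssh/authorized_keys
        owner: ci
        group: ci
        mode: "0600"
```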

Basically all we do here is create the user (with the password from the inventory file) and set it up for access via SSH with a key. I’ve put this in the playbooks subdirectory in the name of keeping things organised. You’ll also need a playbooks/files directory, in which you should put the ci_authorized_keys file. This file will be copied to .ssh/authorized_keys on the server, so it obviously has that format. To create your key, generate it in the normal way with ssh-keygen and save it locally. Copy the public part into ci_authorized_keys and keep hold of the private part for later (don’t commit it to git though!).

Now we should run that against our servers with the command:
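That is (the playbook filename here is whatever you chose above):

```bash
ansible-playbook --ask-vault-pass playbooks/create_ci_user.yml
```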

This will prompt for your vault password from earlier and then configure each of your servers with the playbook.

Spinning Up the CI

At this point we have our ci user created and configured, so we should uncomment those lines in our inventory file. We can now perform a connectivity check to verify that this worked:
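Using Ansible’s ping module (we still need the vault password, since the inventory is encrypted):

```bash
ansible all -m ping --ask-vault-pass
```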

If that works you should see success from each host.

Next comes our base CI setup. I’ve imported some of my standard preflight jobs from my previous CI pipelines, specifically the shellcheck, yamllint and markdownlint jobs. The next stage in the pipeline is a check stage. Here I’m putting jobs which check that Ansible itself is working and that the configuration is valid, but don’t make any changes to the remote systems.

I started off with a generic template for Ansible jobs:
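Reconstructed here as a sketch (the file names such as vault.key are mine; the secret variables are described below):

```yaml
.ansible: &ansible
  image: python:3.7
  before_script:
    - pip install ansible
    # Write the authentication secrets out to files
    - echo "$ANSIBLE_VAULT_PASSWORD" > vault.key
    - echo "$DEPLOYMENT_SSH_KEY" > id_rsa
    - chmod 0600 id_rsa
  after_script:
    # Don't leave credentials lying around on the runner
    - rm -f vault.key id_rsa
  variables:
    ANSIBLE_CONFIG: ./ansible.cfg
    ANSIBLE_VAULT_PASSWORD_FILE: ./vault.key
```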

This sets up the CI environment for Ansible to run. We start from a base Python 3.7 image and install Ansible via Pip. Doing this on each run takes a little time, so it would probably be better to build a custom image which includes Ansible (all the ones I found were out of date).

The next stage is to set up the authentication tokens. First we write the ANSIBLE_VAULT_PASSWORD variable to a file, followed by the DEPLOYMENT_SSH_KEY variable. These variables are defined in the Gitlab CI configuration as secrets. The id_rsa file requires its permissions set to 0600 for the SSH connection to succeed. We also make sure to remove both these files after the job completes.

The last things to set are a couple of environment variables: one to allow Ansible to pick up our config file (by default it won’t in the CI environment due to a permissions issue) and one to tell it which vault key file to use.

Check Jobs

I’ve implemented two check jobs in the pipeline. The first of these performs the ping action which we tested before. This is to ensure that we have connectivity to each of our remote machines from the CI runner. The second iterates over each of the YAML files in the playbooks directory and runs them in check mode. This is basically a dry-run. I’d prefer if this was just a syntax/verification check without having to basically run through the whole thing, but there doesn’t seem to be a way to do that in Ansible.

The jobs for both of these are shown below:
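Roughly as follows (the job names are mine):

```yaml
ansible-ping:
  <<: *ansible
  stage: check
  script:
    - ansible all -m ping

ansible-check-playbooks:
  <<: *ansible
  stage: check
  script:
    # Dry-run every playbook against the real hosts
    - |
      for playbook in playbooks/*.yml; do
        ansible-playbook --check "$playbook"
      done
```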

Before these will run on the CI machine, we need to make a quick modification to our ansible.cfg file. This will allow Ansible to accept the SSH host keys without prompting. Basically you just uncomment the host_key_checking line and ensure it is set to false:
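The relevant line:

```ini
host_key_checking = False
```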

Doing Something Useful

At this stage we have our Ansible environment set up in CI and our remote machines are ready to accept instructions. We’ve also performed some verification steps to give us some confidence that we are doing something sensible. Sounds like we’re ready to make this do something useful!

Managing package updates has always been a pain for me. Once you get beyond a couple of machines, manually logging in and applying updates becomes pretty unmanageable. I’ve traditionally taken care of this on a Saturday morning, sitting down and applying updates to each machine whilst having my morning coffee. The main issue with this is just keeping track of which machines I’d already updated. Luckily there is a better way!

[Side note: Yes, before anyone asks, I am aware of the unattended-upgrades package and I have it installed. However, I only use the default configuration of applying security updates automatically. This is with good reason. I want at least some manual intervention in performing other updates, so that if something critical goes wrong, I’m on hand to fix it.]

With our shiny new Ansible setup encased in a layer of CI, we can quite easily take care of applying package upgrades to a whole fleet of machines. Here’s the playbook to do this (playbooks/upgrades.yml):
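A sketch of it, matching the description below (module parameters trimmed to the essentials):

```yaml
---
- hosts: all
  serial: 2
  become: true

  tasks:
    - name: Upgrade packages (Debian/Ubuntu)
      apt:
        update_cache: true
        upgrade: dist
      when: ansible_os_family == "Debian"
      notify: Reboot

    - name: Upgrade packages (CentOS)
      yum:
        name: "*"
        state: latest
      when: ansible_os_family == "RedHat"
      notify: Reboot

  handlers:
    - name: Reboot
      reboot:
        reboot_timeout: 600
      when: "'physical' not in group_names"
```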

This is pretty simple. First we apply it to all hosts. Second, we specify the line serial: 2 to run on only two hosts at a time (to even out the load on my VM hosts). Then we get into the tasks, which run an upgrade selectively based upon the OS in question (I still have a couple of CentOS machines knocking around). Each of the update tasks will notify the Reboot handler if anything changed (i.e. any packages got updated), which will reboot the machine. The when clause of that handler causes it not to execute if the machine is in the physical group (so that my host machines are not rebooted). I still need to handle not rebooting the CI runner host, but so far I haven’t added it to this system.

We could take this further, for example snapshotting any VMs before doing the update, if our VM host supports that. For now this gets the immediate job done and is pretty simple.

I’ve added a CI job for this. The key thing to note about this is that it is a manual job, meaning it must be directly triggered from the Gitlab UI. This gives me the manual step I mentioned earlier:
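The job itself (the stage name is mine):

```yaml
package-upgrades:
  <<: *ansible
  stage: deploy
  script:
    - ansible-playbook playbooks/upgrades.yml
  when: manual
```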

Now all I have to do on a Saturday morning is click a button and wait for everything to be updated!

Our finished pipeline. Note the state of the final “package-upgrades” job which indicates a manual job. Gitlab helpfully provides a play button to run it.

Conclusion

I hope you’ve managed to follow along this far. We’ve certainly come a long way, from nothing to an end-to-end automated system to update all our servers (with a little manual step thrown in for safety).

Of course there is a lot more we can do now that we’ve got the groundwork set up. In the next instalment I’m going to cover installing and configuring software that you want to be deployed across a fleet of machines. In the process we’ll take a look at Ansible roles. I’d also like to have a look at testing my Ansible roles with Molecule, but that will probably have to wait for yet another post.

I’m interested to hear about anyone else’s Ansible setup, so feel free to share via the feedback channels!

If you liked this post and want to see more, please consider subscribing to the mailing list (below) or the RSS feed. You can also follow me on Twitter. If you want to show your appreciation, feel free to buy me a coffee.


Internal HTTPS with Let’s Encrypt, Linode DNS and Traefik

This post may contain affiliate links. Please see the disclaimer for more information.

In my previous article on the Traccar GPS tracking software, I lamented the state of my broken internal HTTPS/TLS setup. I’ve known for some time that using DNS validation for Let’s Encrypt was the way to fix this. However, that required me to migrate my DNS to another provider, because my provider (Namecheap) only allows API access if you have a large enough account. Since I wrote that article, I’ve been investigating this further and have found my solution in the form of Linode’s Domains Service. As I’m already using Linode for hosting my cloud servers (including this website) this is the perfect option for me.

The application which prompted me getting this sorted was bitwarden_rs, which I really didn’t want to run over plain HTTP and also didn’t want to expose to the outside world. I’d recommend the excellent Self Hosted Home article on this as a getting started point if you’d like to deploy it.

Migrating my DNS to Linode

For a job that I wasn’t looking forward to, this migration couldn’t have gone better. The main issue I encountered was that I wasn’t able to import my records directly from Namecheap, but I think that’s a problem at their end rather than Linode’s. I ended up copying over the records one by one, which was easier than I expected since I didn’t have as many records as I thought.

One other minor issue that I ran into was the SOA email address that I had to add. Weirdly, I never had to specify this with Namecheap. The problem stemmed from the fact that Linode will not allow you to use an address from the same domain for this. I contacted Linode support about this because it seemed contrary to their docs. They responded incredibly quickly (time measured in minutes, over a weekend too) and said that this is so you can be notified about problems with your domain. This makes sense, but it is a little annoying. In the end I created another third party (gmail, boo!) address for this and forwarded it back to my main self hosted account. I also added it as a secondary account on my phone, just in case.

Getting an API Token

In order to perform the DNS-01 certificate validation with Linode, your client software needs to create a temporary DNS record. This requires an API token to authenticate to the Linode Domains API. This token can be created from the Linode Manager. The only permission required is read/write access to the Domains service, as per the screenshot below:

Required Linode API token permissions

Setting Up Traefik

I wanted to use Traefik as my reverse proxy for this, given my previous success with it. This was massively complicated by the fact that Traefik 2.0 was released just a few days ago. This release introduces a lot of changes both in concepts and configuration, which make Traefik significantly more complex. I was able to work through these changes, but it took me a while to work it all out!

The Traefik Dashboard has changed quite a bit

The Traefik setup I ended up with was as follows (in docker-compose.yml):
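In sketch form it looks like the following. I believe the Linode provider in Traefik 2.0 is named linodev4 and reads a LINODE_TOKEN variable, but do check the documentation for your version; the email address, token and paths are placeholders:

```yaml
version: "3"

services:
  traefik:
    image: traefik:2.0
    restart: always
    ports:
      - "80:80"
      - "443:443"
    environment:
      # No quotes around the token value (see note below)
      - LINODE_TOKEN=your-linode-api-token
    command:
      - "--providers.docker=true"
      - "--providers.docker.exposedbydefault=false"
      - "--entrypoints.web.address=:80"
      - "--entrypoints.websecure.address=:443"
      - "--certificatesresolvers.mydnschallenge.acme.email=you@example.com"
      - "--certificatesresolvers.mydnschallenge.acme.storage=/letsencrypt/acme.json"
      - "--certificatesresolvers.mydnschallenge.acme.dnschallenge.provider=linodev4"
    volumes:
      - "./letsencrypt:/letsencrypt"
      - "/var/run/docker.sock:/var/run/docker.sock:ro"
```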

Obviously, you need to insert your email address and Linode API token in the relevant places. Make sure not to include quotes around the API token, since these will be passed into the container and make the token invalid. This appears to be some weird and surprising behaviour in docker-compose.

This configuration sets up Traefik with a DNS challenge certificate resolver called mydnschallenge. This needs to be specified in the configuration for each service that you want to use it with. This is done by adding the following labels to your service in docker-compose.yml:
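For a hypothetical service called myservice (the hostname and port are placeholders):

```yaml
    labels:
      - "traefik.enable=true"
      - "traefik.http.routers.myservice.rule=Host(`myservice.example.com`)"
      - "traefik.http.routers.myservice.entrypoints=websecure"
      - "traefik.http.routers.myservice.tls.certresolver=mydnschallenge"
      - "traefik.http.services.myservice.loadbalancer.server.port=8080"
```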

These labels first enable Traefik for the container in question. Then we add a new router called myservice and specify that it should be attached to the websecure entrypoint at the hostname given. We then add our mydnschallenge as the TLS certificate resolver. Finally, I specify the backend port on which this service listens – this isn’t required if it just listens on port 80.

With this in place you should be able to successfully get a certificate and then access your service over HTTPS. For me this took a few minutes the first time for the DNS to refresh, but this won’t be an issue for future renewals.

HTTP to HTTPS Redirects with Traefik 2.0

So far our service is available only via HTTPS. If you’ve followed the above configuration for Traefik then it is also listening on port 80 for plain HTTP. However, you will receive a 404 error if you try to access your service via HTTP. What we want to do is the traditional redirect from HTTP to HTTPS.

This turns out to be a little more complex in Traefik 2.0 than in previous versions and requires a little understanding of some new concepts. Traefik 2.0 introduces the concept of middlewares, which can operate on a request as it passes between the router and the backend service. One such middleware is the redirectscheme, which will do exactly what we need. Adding the following labels to your service container will do the trick:
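Continuing the hypothetical myservice example from above:

```yaml
    labels:
      - "traefik.http.middlewares.myservice_redirect.redirectscheme.scheme=https"
      - "traefik.http.routers.myservice_insecure.rule=Host(`myservice.example.com`)"
      - "traefik.http.routers.myservice_insecure.entrypoints=web"
      - "traefik.http.routers.myservice_insecure.middlewares=myservice_redirect"
```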

Here we create the middleware myservice_redirect before adding another router (myservice_insecure). This router is attached to the web entrypoint at our hostname, making it available on port 80. Finally we add the myservice_redirect middleware to our new router and we are done! It’s pretty simple once you understand the concepts.

The Traefik dashboard shows a nice pipeline of how your request gets routed

Conclusion

This solves the problem of internal HTTPS perfectly for me! I’m looking forward to migrating other internal services over to this arrangement. Of course, the configuration presented here only works with Traefik and not other software such as Nginx or Mosquitto. There are other options here which support the Linode API for DNS validation. However, since Traefik 2.0 supports arbitrary TCP services I think I’m going to give that a try for my MQTT server. It’s one less piece of software to maintain!

I’m really happy with how my DNS migration went and how well this all works with the Linode Domains service. It’s nice to discover another aspect to a service that I’ve been using for so long.

I hope you find this setup useful, or perhaps you’ve already solved this problem in a different way. Please feel free to let me know in the feedback channels.

If you liked this post and want to see more, please consider subscribing to the mailing list (below) or the RSS feed. You can also follow me on Twitter. If you want to show your appreciation, feel free to buy me a coffee.


Quick Project: Follow up to my Home Assistant Rubbish Collection Panel

This post may contain affiliate links. Please see the disclaimer for more information.

Last month, I wrote a quick post about the Home Assistant Rubbish Collection panel I made for the Lovelace UI. Well, it looks like amaximus was inspired to create his own custom card to do a similar thing. [NOTE: the author of this card hasn’t contacted me directly (I came across it via HACS), I’m only claiming to be the inspiration based on the relative dates].

This card has a couple of cool capabilities that my previous panel didn’t have. Specifically, you can set colour coded icons for your different bins and the icon will change style and colour (to red) when the bin is due to go out in the next day. You can also hide the card completely if a bin is not due to go out within X days.

Cards for all four of my sensors, with colour coded icons
If the collection is the next day the icons change to red

Installation and Setup

I installed the card by adding a git submodule to my configuration repository in the www/plugins directory, but you can also install directly from HACS. I’m switching over to adding all my custom components and cards as submodules in order to make my config more easily deployable.

After installation, you need to add the path to the garbage-collection-card.js file in the resources section of your Lovelace UI config:
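Something like this, assuming the submodule path from above (check the card’s README for whether it should be loaded as js or module):

```yaml
resources:
  - url: /local/plugins/garbage-collection-card/garbage-collection-card.js
    type: module
```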

Once that’s done you can add cards to the UI. I just put mine in a vertical stack to group them together:
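A sketch with hypothetical sensor names:

```yaml
type: vertical-stack
cards:
  - type: "custom:garbage-collection-card"
    entity: sensor.rubbish_collection
  - type: "custom:garbage-collection-card"
    entity: sensor.recycling_collection
  - type: "custom:garbage-collection-card"
    entity: sensor.glass_collection
  - type: "custom:garbage-collection-card"
    entity: sensor.garden_waste_collection
```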

…and that’s it! If you want to hide a card for bins that don’t need to go out soon use hide_before: x (where x is the number of days). I’ll probably use this to hide bins that don’t need to go out in the current week, but I wanted to show all the cards in the screenshots 😉

Conclusion

I think this is a great improvement on my previous panel, so I’m going to stick with it. Thanks to the author and other contributors for taking the time to make it!

It’s kinda cool to think that this blog may have inspired someone else to go out and write some code! If you are inspired to make and share something as a result of one of my posts, please get in contact! Your work will most likely get featured in a future blog post.

If you liked this post and want to see more, please consider subscribing to the mailing list (below) or the RSS feed. You can also follow me on Twitter. If you want to show your appreciation, feel free to buy me a coffee.


Virtualised pfSense on Proxmox with Open vSwitch

This post may contain affiliate links. Please see the disclaimer for more information.

In my recent post about my networking setup I mentioned that my firewall is a virtualised pfSense system running on a Proxmox host. In the comments to that post I was also asked if I was making use of Open vSwitch. Since the answer is that I use Open vSwitch in my pfSense/Proxmox setup, I thought I’d write up my setup for those that are interested.

I’ve actually been meaning to write this up for a long time. I’ve had this setup running since shortly after we moved into this house. On the one hand this means that the setup is pretty battle tested. All of the inter-VLAN and Internet bound traffic on my network runs through this and it’s been running pretty flawlessly for nearly two years.

On the other hand, given the length of time that has elapsed between setting this up and writing this post, this will be more like archaeological exploration than documentation! I’m unlikely to remember every detail or the issues I encountered along the way. As such, this post will pretty much document the state of the setup as I can extract it from the running system! Basically, you should only use this post as a rough guide and go away and do your own research. I’ll apologise for this incompleteness in advance. If you try this, please let me know of anything I’ve missed and I’ll update the post with extra details.

How’s this going to work?

The basic premise of this whole thing is a Proxmox host with two physical NICs. One of these is the LAN port, on which the host will have its internal IP. The second is the WAN port, which is assigned directly to the pfSense VM. In my case this is complicated by the networking setup required by our Fibre connections here in NZ. These require a connection to the Fibre ONT on VLAN10, over which a PPPoE session to the ISP is established.

Since the WAN interface is directly assigned to the VM, this is all handled internally to pfSense. This means that the host machine is not exposed to the external network. [OK, for the purists among you, this isn’t strictly true. The host will be exposed at lower levels of the network stack to allow it to forward packets through to the VM. However, since it doesn’t have an IP address on that interface it won’t be accessible from the Internet. I’m sure someone out there will tell me why this is all kinds of horrible.]

On the LAN side we create an Open vSwitch switch and add the LAN interface as a VLAN trunk on it. Another (virtual) trunk interface goes into the pfSense VM and becomes its LAN interface. This is analogous to just having another physical switch between the host and the VM. The purpose of this extra complexity is that it allows us to connect other VMs on the host into the vSwitch. These can be on multiple different VLANs if required.

Hopefully the diagram below makes this somewhat clearer:

Sometimes a picture really is worth a thousand words

The Proxmox Host

The Proxmox host itself is a Dual Ethernet Haswell based mini-computer from AliExpress. I’ve been really happy with this as a platform, aside from the fact that I would have spec’d it with more than 4GB of RAM if I’d been intending to run Proxmox initially. I also added an extra 120GB SSD drive on the internal SATA port for VM storage.

I started out with this host running pfSense natively, which also worked fine. One thing I did find is that when I switched over to Proxmox (Linux based) from pfSense (FreeBSD based) it ran much cooler. I guess that’s just down to the Linux kernel’s better hardware support.

This host is still running Proxmox 5.4 since I haven’t had time to upgrade it to 6.0 yet. This system is pretty much as close to “production” as it gets for me, since the Internet is used all the time!

Proxmox Network Setup

Proxmox enumerates the two NICs as ens1 (LAN) and enp1s0 (WAN). With the WAN port, I created a simple Linux Bridge vmbr1 to allow it to be added to the pfSense VM.

On the LAN side, I created an “OVS Bridge” port and added an “OVS IntPort” named admin, which will be the primary interface to the host machine. As such, this interface is given a static IP and assigned to the VLAN that we want the host to be on.
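For reference, the resulting /etc/network/interfaces should look roughly like this (the addresses and VLAN tag are placeholders, and Proxmox generates all of this from the web UI, so treat it as illustrative only):

```
# WAN: plain Linux bridge passed through to the pfSense VM
auto vmbr1
iface vmbr1 inet manual
    bridge-ports enp1s0
    bridge-stp off
    bridge-fd 0

# LAN: Open vSwitch bridge carrying the VLAN trunk
allow-ovs vmbr0
iface vmbr0 inet manual
    ovs_type OVSBridge
    ovs_ports ens1 admin

allow-vmbr0 ens1
iface ens1 inet manual
    ovs_type OVSPort
    ovs_bridge vmbr0

# Internal port giving the host an IP on our chosen VLAN
allow-vmbr0 admin
iface admin inet static
    address 192.168.2.2
    netmask 255.255.255.0
    gateway 192.168.2.1
    ovs_type OVSIntPort
    ovs_bridge vmbr0
    ovs_options tag=2
```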

The network setup in Proxmox

I have to give kudos here to the Proxmox developers. They’ve made the Open vSwitch setup here pretty much trivial! For what I would consider advanced functionality it’s just as easy as configuring any other network.

It’s also worth noting what will happen when you apply this configuration. By design, Proxmox doesn’t apply any networking changes until you reboot. This is pretty useful to prevent yourself getting locked out. If you are connected directly on the LAN interface (with a static IP) you should make sure that everything is correct before rebooting. After the reboot, reconfigure your local interface to the VLAN you chose in the setup and a static IP. You should then be able to access the Proxmox web interface again.

Setting Up pfSense

The pfSense installation was fairly standard. The only change I ended up making was to change the default CPU type to enable AES-NI instructions. This took a little bit of experimentation and looking up the capabilities of various processors, but I finally settled on the “Westmere” processor.

I selected a “Westmere” processor in order to make AES-NI instructions available to the VM

After setting this architecture in the VM settings, rebooting pfSense shows both the correct CPU architecture and that AES-NI is available. It seems that this is probably less important than it was when I set up the system, since Netgate have now decided that AES-NI will not be required for pfSense 2.5.0.

AES-NI successfully enabled

One other thing is that you should disable hardware checksum offloading to work with the virtio drivers, as per the official documentation. Until you do this, the network will be very sluggish.

Once the pfSense installation was complete I restored from a backup of my previous setup. This made the task of setting up my interfaces significantly easier. However, I’ll go through the networking aspects anyway for those who may be setting up a new system.

pfSense Networking

Luckily for us the pfSense tool to assign interfaces allows us to also set up the VLANs. This is useful to set up a minimal configuration to get you access to the web interface. Basically you want to set up the VLAN for your main LAN segment. Then you can set up the pfSense LAN interface on this VLAN with a static IP. If you’re using a fibre connection similar to mine you can skip the WAN setup for now. Once the “Assign Interfaces” wizard is complete you should have access to the Web Configurator.

The next step was to setup my WAN connection. I first added a VLAN with tag 10 on the vtnet0 device which is the device that corresponds to the physical WAN bridge as enumerated by pfSense. I added a corresponding interface for this and then added a PPPoE interface using the details provided by my ISP. This is then assigned to the WAN interface via the “Interface Assignments” page.

In terms of setting up the local networks, you can pretty much set up whatever VLANs you would like at this point. Take a look at my previous post for inspiration.

Conclusion

As stated earlier, I’ve found this setup to be very stable in production and it’s even made my hardware run cooler. Having my firewall virtualised has also had several other benefits for me. Firstly, I can backup and snapshot the firewall VM at will. I no longer need to worry about an update or bad configuration hosing my firewall. I just snapshot before doing anything major and roll back if anything goes wrong.

The second major benefit is that I can run extra VMs and containers on the host machine, which I couldn’t when it was a dedicated firewall. I’ve used this to implement my small DMZ for Internet facing services. This has the added benefit that DMZ traffic only transits the vSwitch internal to the host and doesn’t have to be shuttled back and forth over the physical network infrastructure. This is much faster, since the virtualised interfaces should (in theory) be 10Gbps. However, this is somewhat irrelevant when the upstream Fibre connection is only 100Mbps.

As always, I’m keen to receive feedback and constructive criticism of this setup. Please feel free to get in contact via the feedback channels.

If you liked this post and want to see more, please consider subscribing to the mailing list (below) or the RSS feed. You can also follow me on Twitter. If you want to show your appreciation, feel free to buy me a coffee.


Continuous Integration for Home Assistant, ESPHome and AppDaemon

This post may contain affiliate links. Please see the disclaimer for more information.

Recently I set up continuous integration and deployment from my Home Assistant configuration. This setup has been nothing short of awesome! It’s liberated me from worrying about editing my configuration – all I do is git push and relax. Either HASS will notify me when it restarts or I’ll get an email from Gitlab telling me the pipeline failed.

I wanted to take this configuration further and expand it to other parts of my Home Automation infrastructure. In this post I’ll cover expanding it to perform deployments of my HA stack with Docker, building and deploying to ESPHome devices and unit testing and deploying my AppDaemon apps.

Let’s get on with it!

Automating Docker Deployment

I’d originally held off doing this because I wasn’t looking forward to building custom Docker images in Gitlab CI. However, I managed to complete the original pipeline without having to add any extra dependencies to the HASS containers (such as git which I thought may be required). This makes the job of deploying my HA stack much easier, especially as I already had it mostly scripted. The first step was to add my update.sh script to my repo and tweak it to suit:
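The tweaked script ends up as something like this:

```bash
#!/bin/bash
set -e

cd "$(dirname "$0")"

# -p keeps the compose project name as "ha" even though the directory
# is now called home-assistant; --remove-orphans cleans up containers
# which have been removed from docker-compose.yml
docker-compose -p ha pull
docker-compose -p ha up -d --remove-orphans
```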

This is a pretty simple modification to my previous script. The main additions are that I use the -p argument to set the project name used by docker-compose. By default this is taken from the directory name, but I wanted it to match the name of my previous project even though the directory has changed from ha to home-assistant. The other main modification is that I’ve added the --remove-orphans argument to clean up any lingering containers. This is useful if I remove a container from the docker-compose.yml file. In addition I’ve removed the apt commands and cleaned up the script a bit so that it passes my shellcheck job.

The next step was simply to add the docker-compose.yml file to the repo. Then I continued by editing the CI configuration.

Updated Home Assistant CI Jobs

I first split up my previous deployment job into two jobs. The first of these is the main deployment job which pulls the new configuration. The second restarts HASS. The restart job goes in a new pipeline stage and will only be run when the docker-compose.yml or update.sh files haven’t changed:
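In sketch form (the stage names, container name and the exact deployment command are illustrative; my runner executes on the Docker host itself):

```yaml
deploy:
  stage: deploy
  script:
    # hypothetical: fetch the new configuration onto the host; the
    # details depend on how your runner and volumes are set up
    - ./deploy-config.sh

restart-hass:
  stage: restart
  script:
    - docker restart home-assistant   # container name is an assumption
  except:
    changes:
      - docker-compose.yml
      - update.sh
```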

I then added another job (again in another pipeline stage) which performs our Docker deployment. This will be run only when either the docker-compose.yml or update.sh file changes:
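Again as a sketch:

```yaml
deploy-docker:
  stage: deploy-docker
  script:
    - ./update.sh
  only:
    changes:
      - docker-compose.yml
      - update.sh
```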

A full pipeline run with a deployment of the Docker containers running in the final stage.

With that in place I can now redeploy my HA stack by modifying either of those files, committing to git and pushing. In order to facilitate HASS updates with this workflow, I changed the tag of the HASS Docker image to the explicit version number. That way I can simply update the version number and redeploy for each new release.

Continuous Integration for ESPHome

Inspired by the previous configs I have seen for checking ESPHome files, I wanted to implement the same checks. However, I wanted to go further and have a full continuous deployment setup which would build the relevant firmware when its configuration was changed and send an OTA update to the corresponding device. As it turned out this was relatively easy.

I started out by importing my ESPHome configs into Git, which I hadn’t previously done. You can find the resulting repository on Gitlab. For the CI configuration I first copied over the markdownlint and yamllint jobs from my Home Assistant CI configuration.

I then borrowed the ESPHome config check jobs from Frenck’s configuration. These check against both the current release of ESPHome and the next beta release. The beta release job is allowed to fail and is designed only to provide a heads up for potential future issues.

Then I came to implement the build and deployment job. Traditionally these would be performed in separate steps, but since ESPHome can do this in a single step with its run subcommand I decided to do it the easy way. This also removes the requirement to manage build artifacts between steps. I created the following template job to manage this:
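A sketch of the template (the key file names and cipher choice are assumptions):

```yaml
.esphome: &esphome
  image: esphome/esphome
  before_script:
    # git-crypt isn't in the image, so install it first (assumption:
    # the image is Debian based)
    - apt-get update && apt-get install -y git-crypt
    # Decrypt the git-crypt key with openssl, then unlock the repo so
    # that the encrypted secrets file is readable
    - openssl enc -d -aes-256-cbc -in git-crypt.key.enc -out git-crypt.key -k "$OPENSSL_PASSPHRASE"
    - git-crypt unlock git-crypt.key
  after_script:
    # Remove the decrypted key once we're done
    - rm -f git-crypt.key
```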

Most of the complexity here is in unlocking the git-crypt repository so that we can read the encrypted secrets file. I opted to store the git-crypt key in the repository, encrypted with openssl. The passphrase used for openssl is in turn stored in a Gitlab variable, in this case $OPENSSL_PASSPHRASE. Once the decryption of the key is complete, we can unlock the repo and get on with things. We remove the key after we are done in the after_script step.

Per-Device Jobs

Using the template configuration, I then created a job for each device I want to deploy to. These jobs are executed only when the corresponding YAML file (or secrets.yaml) is changed. This ensures that I only update devices that I need to on each run. The general form of these jobs is:
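Something like this (the stage name is illustrative):

```yaml
my_device:
  <<: *esphome
  stage: deploy
  script:
    - esphome my_device.yaml run --no-logs
  only:
    changes:
      - my_device.yaml
      - secrets.yaml
```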

Of course you need to replace my_device with the name of your device file.

A run of the ESPHome pipeline with deployments to two devices

With these jobs in place I have a full end-to-end pipeline for ESPHome, which lints and checks my configuration before deploying it only to devices which need updating. Nice! You can check out the full pipeline configuration on Gitlab. I now no longer have need to run the ESPHome dashboard, so I’ve removed it from my server.

Continuous Integration for AppDaemon

I mentioned previously that I wanted to split out my AppDaemon apps and configuration into a separate repo from my HASS config. I did this as a prerequisite step of this setup and you can again find the new repo on Gitlab.

The inspiration for this configuration came mostly from @bachya on the HASS forum, whose post in reply to my earlier setup provided most of the details. Thanks for sharing!

I started out by copying across the now ubiquitous markdownlint and yamllint jobs. I then added jobs for pylint, mypy, flake8 and black:
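They all follow the same basic pattern; here’s a sketch, assuming the apps live in an apps/ directory and a preflight stage:

```yaml
pylint:
  image: python:3.7
  stage: preflight
  script:
    - pip install pylint
    - pylint apps/

mypy:
  image: python:3.7
  stage: preflight
  script:
    - pip install mypy
    - mypy --ignore-missing-imports apps/

flake8:
  image: python:3.7
  stage: preflight
  script:
    - pip install flake8
    - flake8 apps/

black:
  image: python:3.7
  stage: preflight
  script:
    - pip install black
    - black --check apps/
```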

Although this ends up being very verbose, I decided to implement these all as separate jobs so that I get individual pass/fail states for each. I’m also pretty sure the mypy job doesn’t do anything right now, because I’m not using any type hints in my Python code. However, the job is there for when I start adding those.

Unit Testing AppDaemon

Another thing that @bachya introduced me to was Appdaemontestframework. This provides a pytest based framework for unit testing your AppDaemon apps. Although I’m still working on the unit tests for my so far pretty minimal AD setup, I did manage to get the framework up and running, which was a little tricky. I had some issues with setting up the initial configuration for the app, but I managed to work it out eventually.

The unit testing CI job is pretty simple:
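Something like this (the requirements file name and stage name are mine):

```yaml
pytest:
  image: python:3.7
  stage: test
  script:
    - pip install -r requirements-test.txt
    - py.test
```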

All we do here is install the requirements that I need for the tests and then call py.test. Easy!

The deployment job for AppDaemon was also trivial, since it is pretty much a copy of the HASS one. Since AD detects changes to your apps automatically, there’s no need to restart. For more details you can check out the full CI pipeline on Gitlab.

A run of the AppDaemon pipeline – lots of preflight checks here!

Conclusion

Phew, that was a lot of work, but it was all the logical follow on from work I’d done before or that others had done. I now have a full set of CI pipelines for the three main components of my home automation setup. I’m really happy with each of them, but especially the ESPHome pipeline. As an embedded engineer in my day job I find it really cool that I can update a YAML file locally, commit/push it and then my CI takes over and ends up flashing a physical device! That this is even possible is a testament to all the pieces of software used.

Next Steps

I’m keen to keep going with CI as a means of automating my operations. I think my next target will be sprucing up my Ansible configurations and running them automatically from CI. Stay tuned for that in the hopefully near future!

If you liked this post and want to see more, please consider subscribing to the mailing list (below) or the RSS feed. You can also follow me on Twitter. If you want to show your appreciation, feel free to buy me a coffee.