ender 3 successful benchy

3D Printing: Getting to Benchy with the Ender 3

This post may contain affiliate links. Please see the disclaimer for more information.

I recently bought a 3D printer in the form of a Creality Ender 3. I thought I would take the time to document how I got up and running, since this area is completely new to me. Hopefully this will be useful to those who are just starting out in 3D printing, specifically with the Ender 3. It’s not going to be a complete guide, but more of a high level overview of the steps I took. I’ll link to more detailed instructions where relevant.

About the Ender 3

The Ender 3 is a budget 3D printer from Creality 3D. It has a pretty standard feature set, including a heated bed. Levelling of the bed must be performed manually, but there are several addons which can automate this. The whole design of the printer has been made Open Source, including the firmware. There are also a whole load of 3D printed addons that can be added to the printer to improve its design and performance.

I opted for the base model with a glass bed as an additional addon. This was because I’ve heard of people having issues with warping of the metal build plate over time (and heat). The glass bed will always remain flat and avoid this issue. When I actually received the unit I saw that with the bundle I purchased I got the original flexible build surface as well. So I have the choice between the two. I also purchased a Raspberry Pi to host an Octoprint setup (which I haven’t got around to setting up yet) and three rolls of PLA filament. My filament took ages to arrive. I’d probably advise you to grab try and grab a bundle that includes a roll of filament if you want to get going quickly. The available bundles are always changing, so have a search around for the best deal.

Putting it Together

The printer arrived very quickly (just a few days). It comes in kit form and requires some assembly. This shouldn’t be difficult for anyone who’s put together anything moderately complex from a kit before. However, the instructions in the box leave something to be desired so I’d advise you to follow some of the 3rd party instructions available online. I was able to complete the build in 2-3 hours spread over a couple of weekday evenings.

Why You Should Upgrade the Firmware

The Ender 3 ships with an older version of the Marlin firmware. This lacks some features available in newer versions. Most notable of these is the lack of thermal runaway protection in the stock firmware. This is an important safety feature, which will cut power to the heating elements of the printer if the firmware detects that something is wrong. This can happen if some part of the heating or feedback systems becomes loose. I think it’s really important to upgrade the firmware on the device to the latest Marlin, since the stock firmware may literally burn your house down! Of course even with the latest firmware accidents can still happen, so this isn’t a substitute for taking other safety precautions!

The firmware upgrade process, whilst not trivial is easy enough for anyone whose programmed an Arduino or ESP8266. That’s because the Marlin firmware is itself an Arduino sketch! Basically, doing the update is a two step process. First we must flash the Arduino bootloader to the mainboard. This will then allow us to upload our new firmware (and any subsequent updates) via the printer’s USB port.

Performing the Upgrade

ender 3 flashing bootloader
The bootloader programming setup, note the resistors between 5V and reset (together these make 120ohm)

Flashing the bootloader is the most tricky part of this. It requires making some connections to the main board from another Arduino. This will act as the programming device (I used my old Arduino Duemilanove). I also found that I had to insert a 120 Ohm resistor between the 5V and ground pins of the Arduino to prevent it being reset when the IDE connected to it. This is because we want the reset to be passed through to the printer mainboard, not to reset the programming sketch on our Arduino. I couldn’t flash the bootloader from platformio, which is my preferred tool for doing Arduino stuff, so I had to download the latest version of the Arduino IDE.

Once the bootloader was flashed I cloned a copy of Marlin from Github, checked out the latest 1.x release and copied over the Ender 3 example configuration. I did a quick double check to make sure thermal runaway protection was enabled in the config. I was then able to build and flash the new firmware via platformio and the USB port on the printer.

This whole process is covered in much more detail in the following videos, which I used as a basis for getting this done:

Levelling the Bed

I found levelling the bed to be somewhat tricky. I think the main problem was that I kept getting the extruder too close to the bed which would inhibit the flow of plastic from the nozzle. However, after a few goes I was able to get it dialed in reasonably well. From the performance of the printer so far it looks like this doesn’t have to be perfect, just good enough. The following video shows the process that I followed:

First Prints

For my first print I just started one of the test gcode files that come on the SD card. This was supposed to be a pig. However, I knew that it wouldn’t complete because the only filament I had was the small amount which is supplied with the printer. I was mainly just wanting to see the printer running!

ender 3 first print
Well that’s half a pig, right?

I had to wait another two weeks for the first of my rolls of filament to arrive, which was pretty frustrating. Once that arrived I sliced the 3DBenchy test model using Cura and printed it. To my dismay it didn’t come out as I expected on the first try:

ender 3 failed benchy
Oops!

After a little bit of head scratching I realised that I had a Cura setting enabled from a previous model which I had been slicing. The specific setting was “Spiralize Outer Contour” A.K.A “Vase Mode”, which explains the issue. After disabling this setting and re-slicing, the resulting gcode produced a pretty nice Benchy in around 2 hours.

ender 3 successful benchy
Finally, success!

Conclusions

Since proving that the printer is functioning correctly with a decent benchy print, I’ve done a few other prints. I’ll probably cover some of these in future articles. For my regular readers – don’t worry! This isn’t going to turn into a 3D printing blog. However, there will be the occasional post on the subject (probably more than occasional in the next couple of months as I’m learning about it).

I’m actually really impressed with the whole 3D printing ecosystem and the Ender 3 specifically. I really like how my whole 3D printing pipeline is Open Source. This goes right from creating the models (with OpenSCAD, FreeCAD or Blender), through slicing (Cura) and down to the machine itself and the Marlin firmware (and Octoprint, eventually).

Coming back specifically to the Ender 3, I’m pleased with the print quality. So far I’ve only printed with a 0.2mm layer height, but it supports down to 0.1mm. The objects I’ve produced, whilst obviously 3D printed, are of good detail and finish and are very strong. Overall, I’d recommend this printer to anyone who has the technical know how to get it up and running and perform the required firmware update.

Next Steps

Print more things! I really love the feeling of holding an object which just didn’t exist a few hours before. I have a whole list of objects to print, including some upgrades to the printer. For some of these I’ve been waiting for my black filament to arrive, so I should be able to get on with them soon.

I also need to get my Octoprint setup completed. It’s actually mostly there, I just have a couple more configuration steps to go. I also want to print a case for the Pi, which will fit underneath the bed of the printer. I hope to report back on my progress with that pretty soon.

If you liked this post and want to see more, please consider subscribing to the mailing list (below) or the RSS feed. You can also follow me on Twitter. If you want to show your appreciation, feel free to buy me a coffee.

home assistant android shortcut

Quick Home Assistant Shortcuts for Android

This week’s post is going to be a little shorter than normal, since I’ve been working on projects which aren’t ready to share yet. However, I did want to share a quick, but useful approach to controlling smart devices directly from the home screen of your Android device via some Android shortcuts and Home Assistant.

home assistant android shortcut
The home screen housing my HASS shortcuts

Let’s Get Started

The app I’m using to do this is called HTTP Shortcuts. It’s both free (as in beer) and Open Source. You could accomplish the same thing with Tasker and even though I also use Tasker, I still use this app. This is because it’s just a nice simple app that does one thing well. You can also execute these shortcuts from within Tasker if you would like.

home assistant android shortcut
Our HTTP Shortcuts variables

Before we create a shortcut, we’re going to add a couple of variables to make life a bit easier when creating multiple shortcuts. To do this press the {} icon in the top left corner. We are going to add two variables, one for the the HASS server name and another for the authentication token.

The first of these is easy, simply add a new variable with the plus icon and enter the key as ha_server. Leave the type as constant and add the hostname of your HASS server as the value. Make sure to not include the URL scheme here (http:// or https://) since this appears to throw off the regex validation when creating the shortcut.

For the second variable follow the same process, however you’ll want to insert the value of the long lived access token you get from your Home Assistant user’s profile page in the value field.

Creating Shortcuts

With that bit of prep done we can navigate back to the main page. Here we will add a new shortcut by again clicking the plus icon. First give your shortcut a name and description. These are only for your own use. You might also want to choose an icon at this point.

Next, hit the ‘Basic Request Settings’ item to and set the method to POST. In the URL field we enter the value:

Here you’ll replace [domain] with the entity type e.g. switch and [service] with the service name, e.g. turn_on. I find that using the toggle service is the most useful for these shortcuts, since you are usually using them in visual range of the device and it basically gives you two shortcuts in one.

The {ha_server} notation should be left as it is. This is using the variable we created earlier. Variables can actually be easily inserted with the {} button so there is no need to type it. You’ll also want to adjust my example above if you are using plain HTTP rather than HTTPS.

Adding Headers

The next stage of this is to add a custom header to our request. This is the Authorization header which has the value Bearer {ha_auth_token}. You’ll see here that we use the other of our variables to include the authentication token.

home assistant android shortcut
Our HTTP Header

Request Body

Finally we come to the request body. Here we set the request body type to ‘Custom Text’. The content type field here should again be set to application/json and the value should be the JSON to pass to the service. In the simple case this would just contain the entity ID you want to operate on, e.g {"entity_id": "switch.myswitch"}. Of course this might contain more data depending on the service you are calling. The data is exactly the same as that passed to the services developer tool in HASS, so you can test your calls there and then copy the data over.

home assistant android shortcut
Our finished shortcut

Final Steps

Once you’ve finished you can add the shortcut to the home screen via the long press menu on the main listing page. You can also export a copy of all your shortcuts via the settings page for use on another device. I don’t think this exports the values of your variables unless you checked the ‘Allow Share…’ advanced setting for the variable itself. This is most likely a good thing, since you’re going to want to use a different authentication token on each device.

I hope you’ve found this useful. I use these shortcuts every day and find them invaluable. The app has quite a bit more functionality which I haven’t covered here, including cURL import/export and processing of responses using Javascript.

For my regular readers, I apologise for the shorter post. It’s going to be a busy month coming up, but hopefully I’ll be back to my usual schedule soon. Thanks for sticking with me!

ansible roles

Automating My Infrastructure with Ansible and Gitlab CI: Part 2 – Deploying Stuff with Roles

This post may contain affiliate links. Please see the disclaimer for more information.

In the first post of this series, I got started with Ansible running in Gitlab CI. This involved setting up the basic pipeline, configuring the remote machines for use with our system and making a basic playbook to perform package upgrades. In this post we’re going to build on top of this to create a re-usable Ansible role to deploy some software and configuration to our fleet of servers. We will do this using the power of Ansible roles.

In last week’s post I described my monitoring system, based on checkmk. At the end of the post I briefly mentioned that it would be great to use Ansible to deploy the checkmk agent to all my systems. That’s what I’m going to describe in this post. The role I’ve created for this deploys the checkmk agent from the package download on my checkmk instance and configures it to be accessed via SSH. It also installs a couple of plugins to enable some extra checks on my systems.

A Brief Aside: ansible-lint

In my previous post I set up a job which ran all the playbooks in my repository with the --check flag. This performs a dry run of the playbooks and will alert me to any issues. In that post I mentioned that all I was really looking for was some kind of syntax/sanity checking on the playbooks and didn’t really need the full dry run. Several members of the community stepped forward to suggest ansible-lint – thanks to all those that suggested it!

I’ve now updated my CI configuration to run ansible-lint instead of the check job. The updated job is shown below:

This is a pretty basic use of ansible-lint. All I’m doing is running it on all the playbooks in my playbooks directory. I do skip a single rule (403) with the -x argument. The rule in question is about specifying latest in package installs, which conflicts with my upgrade playbook. Since I’m only tweaking this small thing I just pass this via the CLI rather than creating a config file.

I’ve carried the preflight jobs and the ansible-lint job over to the CI configuration for my new role (described below). Since this is pretty much an exact copy of that of my main repo, I’m not going to explain it any further.

Creating a Base Role

I decided that I wanted my roles self contained in their own git repositories. This keeps everything a bit tidier at the price of a little extra complexity. In my previous Ansible configuration I had all my roles in the same repo and it just got to be a big mess after a while.

To create a role, first initialise it with ansible-galaxy. Then create a new git repo in the resulting directory:

I actually didn’t perform these steps and instead started from a copy of the old role I had for this in my previous configuration. This role has been tidied up and expanded upon for the new setup.

The ansible-galaxy command above will create a set of files and directories which provide a skeleton role. The first thing to do is to edit the README.md and meta/main.yml files for your role. Just update everything to suit what you are doing here, it’s pretty self explanatory. Once you’ve done this all the files can be added to git and committed to create the first version of your role.

Installing the Role

Before I move on to exactly what my role does, I’m going to cover how we will use this role in our main infrastructure project. This is done by creating a requirements.yml file which will list the required roles. These will then be installed by passing the file to ansible-galaxy. Since the installation tool can install from git we will specify the git URL as the installation location. Here are the contents of my requirements.yml file:

Pretty simple. In order to do the installation all we have to do is run the following command:

This will install the required Ansible roles to the playbooks/roles directory in our main project, where our playbooks can find them. The --force flag will ensure that the role always gets updated when we run the command. I’ve added this command in the before_script commands in the CI configuration to enable me to use the role in my CI jobs.

Now the role will be installed where we actually need it. However, we are not yet using it. I’ll come back to this later. Let’s make the role actually do something first!

What the Role Does

The main behaviour of the role is defined in the tasks/main.yml file. This file is rather long, so I won’t reproduce this here. Instead I’ll ask you to open the link and follow along with my description below:

  • The first task creates a checkmk user on the target system. This will be used by checkmk to log in and run the agent.
  • The next task creates a .ssh directory for the checkmk user and sets it’s permissions correctly.
  • Next we create an authorized_keys file for the user. This uses a template file which will restrict what the key can do. The actual key comes from the checkmk_pub_key variable which will be passed in from the main project. The template is as follows:
  • Next are a couple of tasks to install some dependent packages for the rest of the role. There is one task for Apt based systems and another for Yum based systems. I’m not sure if the monitoring-plugins package is actually required. I had it in my previous role and have just copied it over.
  • The two tasks remove the xinetd package on both types of system. Since we are accessing the agent via SSH we don’t need this. I was previously using this package for my agent access so I want to make sure it is removed. This behaviour can be disabled by setting the checkmk_purge_xinetd variable to false.
  • The next task downloads the checkmk agent deb file to the local machine. This is done to account for some of the remote servers not having direct access to the checkmk server. I then upload the file in the following task. The variables checkmk_server, checkmk_site_name and checkmk_agent_deb are used to specify the server address, monitoring instance (site) and deb file name. The address and site name are designed to be externally overridden by the main project.
  • The next two tasks repeat the download and upload process for the RPM version of the agent.
  • We then install the correct agent in the next two tasks.
  • The following task disables the systemd socket file for the agent to stop it being accessible over an unencrypted TCP port. Right now I don’t do this on my CentOS machines because they are too old to have systemd!
  • The final few tasks get in to installing the Apt and Docker plugins on systems that require it. I follow the same process of downloading then uploading the files and make them executable. The Docker plugin requires that the docker Python module be installed, which we achieve via pip. It also requires a config file, which as discussed in my previous post needs to be modified. I keep my modified copy in the repository and just upload it to the correct location.

The variables that are used in this are specified in the vars/main.yml and defaults/main.yml files. The default file contains the variables that should be overridden externally. I don’t specify a default for the SSH public key because I couldn’t think of a sensible value, so this at least must be specified for the role to run.

With all this in place our role is ready to go. Next we should try that from our main project.

Applying the Role

The first thing to do is to configure the role via the variables described above. I did this from my hosts.yml file which is encrypted, but the basic form is as follows:

The public key has to be that which will be used by the checkmk server. As such the private key must be installed on the server. I’ll cover how to set this up in checkmk below.

Next we have the playbook which will apply our role. I’ve opted to create a playbook for applying common roles to all my systems (of which this is the first). This goes in the file playbooks/common.yml:

This is extremely basic, all it does is apply the checkmk_agent role to all servers.

The corresponding CI job is only marginally more complex:

With those two in place a push to the server will start the pipeline and eventually deploy our role to our servers.

ansible roles
Our updated CI pipeline showing the ansible-lint job and the new common-roles job

Configuring Checkmk Agent Access via SSH

Of course the deployment on the remote servers is only one side of the coin. We also need to have our checkmk instance set up to access the agents via SSH. This is documented pretty well in the checkmk documentation. Basically it comes down to putting the private key corresponding to the public key used earlier in a known location on the server and then setting up an “Individual program call instead of agent access” rule in the “Hosts and Service Parameters” page of WATO.

I modified the suggested SSH call to specify the private key and user to use. Here is the command I ended up using in my configuration.

When you create the rule you can apply it to as many hosts as you like. In my setup this is all of them, but you should adjust as you see fit.

ansible roles
The checkmk WATO rule screen for SSH agent access

Conclusion

If you’ve been following along you should now be able to add new hosts to your setup (via hosts.yml) and have the checkmk agent deployed on them automatically. You should also have an understanding of how to create reasonably complex Ansible roles in external repositories and how to use them in your main Ansible project.

There are loads of things about roles that I haven’t covered here (e.g. handlers). The best place to start learning more would be the Ansible roles documentation page. You can then fan out from there on other concepts as they arise.

Next Steps

So far on this adventure I’ve tested my playbooks and roles by just making sure they work against my servers (initially on a non-critical one). It would be nice to have a better way to handle this and to be able to run these tests and verify that the playbook is working from a CI job. I’ll be investigating this for the next part of this series.

The next instalment will probably be delayed by a few weeks. I have something else coming which will take up quite a bit of time. For my regular readers, there will still be blog posts – they just won’t be about Ansible or CI. This is probably a good thing, I’ve been covering a lot of CI stuff recently!

As always please get in contact if you have any feedback or improvements to suggest, or even if you just want to chat about your own Ansible roles.

If you liked this post and want to see more, please consider subscribing to the mailing list (below) or the RSS feed. You can also follow me on Twitter. If you want to show your appreciation, feel free to buy me a coffee.

checkmk docker container

Monitoring All The Things with checkmk

This post may contain affiliate links. Please see the disclaimer for more information.

Once your number of connected devices grows to a certain size, it becomes difficult to keep track of them all manually. At this point you’re going to want to turn to software to do this job for you. Network monitoring software fills this need nicely. The old stable Nagios used to be my go to for this. However, I switched to checkmk in the last couple of years and have found it quite superior. I’ve recently been doing some maintenance and upgrades to my monitoring setup, so now was a good time to write it up.

Checkmk is an Open Source infrastructure and application monitoring tool. It will keep track various attributes of your networked devices and alert you if they fall outside of pre-programmed thresholds. At its core it wraps Nagios, but provides a nicer UI and GUI configuration tool. checkmk also supports autodiscovery of services and checks to be performed on each system. This means you can spin up a monitoring system with great coverage in less than half the time you would spend fiddling around with a Nagios configuration.

Monitoring vs. Metrics

I’m going to take a little time to discuss the modern trend towards “metrics” based monitoring and how it relates to more traditional monitoring approaches. Modern metrics based systems such as Prometheus and Influxdb collect a series of aggregated data points about a running system. Typically these are written to a time series database (Influxdb actually is purely a time series database and requires external data collection tools). This data would then be used to feed some graphing/dashboarding tool, such as Grafana and also to generate alerts.

This is a different approach to traditional monitoring. To my mind metrics gathering is more passive. The system is only asking what the other systems (or typically only the applications on those systems) know about themselves. It is not actually probing the network to check that things are working. Monitoring systems not only collect data from systems, but actually probe the network to create data, for example whether a given service is responding or not.

The metrics gathering approach works well if all you care about are the applications. Obviously the application knows everything about itself, so why not ask it? It’s particularly popular in cloud environments where someone else is caring about the underlying infrastructure. Anywhere you care about physical infrastructure you’d probably be better off with a monitoring system first. Of course, this doesn’t preclude adding a metrics system later!

I think its important not to get these tools mixed up, they serve different purposes, even if there is some overlap.

Let’s Start Monitoring

I’ve had checkmk installed inside an LXD container for some time. In that time it’s been running pretty much flawlessly and alerting me to issues with my network as they come up. I recently decided it was time for an update, since I was still running 1.4.x and 1.5 had been out for a while. In the process I thought I’d move it into Docker, as I’ve been moving everything else. However, I ran into some issues with that.

I had several issues moving my configuration to the new server and getting it to run in the Docker container. Most of these could have been solved if I’d migrated the configuration to the new version (described below) before moving it. I did hit one show stopper issue though. This came about because my Docker data volumes are stored on NFS. Basically it looks like NFS doesn’t support file locking very well. This causes checkmk to throw “Bad file descriptor” errors all over the place.

checkmk nfs error
Oops, looks like checkmk doesn’t like it’s sites directory on NFS

In the end I decided to stick with my LXD container for this system. Eventually this will be migrated to an LXC container when I switch the host server over to Proxmox. The Docker setup is not actually recommended for a full monitoring install anyway, so I don’t feel too bad about this.

This is all a long winded way of saying, go and follow the Linux installation instructions if you are installing checkmk from scratch!

Upgrading My Configuration

Since I already had an existing system I just updated it by installing the new version. In the process checkmk 1.6 was released (good timing!), so I actually did this twice. To do the update I basically just downloaded the latest .deb file and installed it with dpkg -i.

It should be noted that this won’t overwrite your current install. There is a good reason for this, since checkmk has a configuration migration step which must be performed. This can be done using the omd tool. The steps to do so look like this:

Assuming that all went OK, you’re good to start actually monitoring other systems. We’ll cover the things I monitor in the next few sections.

Monitoring Standard Linux Systems

This is the easy one and pretty much standard fare for any monitoring system. Just install the checkmk agent for your distribution by following the official instructions and off you go. When you add a new host via WATO (the configuration GUI) checkmk will auto-discover as many services as it can for you. There are also a whole load of plugins that can be deployed to monitor other services.

checkmk linux server
Some of the service checks running against “eomer” my home automation Docker host

In my system, Linux machines are the majority of the hosts. This includes physical host machines, VMs, LXC/LXD containers and Raspberry Pis (since these are just Debian machines really). The only thing I haven’t had much luck with are my Libreelec machines. This seems to be because Libreelec doesn’t include a shell capable of running the agent script. I’ve been wondering if running the agent in a sufficiently privileged Docker container would work. However for now I’m just monitoring them externally (see “Monitoring Other Devices” below).

Monitoring pfSense

Since pfSense is really just FreeBSD underneath the checkmk BSD agent works fine with it. pfSense also supports SNMP out of the box. Therefore you can get more monitoring coverage by using an Agent+SNMP approach. There is a great tutorial over at Open School Solutions, which is what I followed to set all that up.

checkmk pfsense
pfSense will report the status of all its network interfaces among its checks

Monitoring OpenWRT Devices

OpenWRT devices can be monitored just like Linux machines. However you will need to install the OpenWRT specific agent from the agents page of your checkmk install. The agent can be run via either xinetd or SSH just as on a standard Linux machine.

OpenWRT devices will also return some extra information over SNMP if mini_snsmpd is installed on them. This makes the same Agent+SNMP approach adopted for pfSense desirable.

Monitoring Docker Containers

This one is new to me since I’ve just set it up after updating to the new version. The basic idea here is that you set up monitoring on the Docker host as you would a standard Linux machine. However, you also install the mk_docker.py plugin. This will do two things. Firstly it will expose a load of Docker checks on the host the next time you run service discovery. These by themselves are pretty useful. Secondly it will allow you to add each Docker container on the host as a host in it’s own right. The check data for the Docker container host is piggybacked from the agent connection to its Docker host.

At first I thought this was going to be pretty unwieldy, since you have to manually add a host entry for each Docker container. However, checkmk provides us with some tools to make this easier, which I’ll cover below.

Before you finish the setup on the Docker host, I’d recommend that you set the container_id variable in the /etc/checkmk/docker.cfg config file to name. This will allow you to add your container hosts to checkmk by name rather than by hex ID, which is much more human readable. It also avoids the situation where the hex ID changes if the container is destroyed and re-created. This obviously happens during an update of the container.

Adding Docker Container Hosts Quickly

checkmk wato folder
Setting up a host folder in WATO allows you to apply the same initial options to a bunch of hosts quickly

The next thing to do is to create a directory in WATO. Adding a host to a directory automatically populates the host with the configuration template used when the directory was created. This makes it super quick to just run through adding hosts with the same configuration such as our Docker containers.

The third thing to do is to create a new Host Group for your Docker containers. In this way you can easily see separate listings of your containers and other hosts on the monitoring dashboards.

checkmk hostgroup
Creating a host group for my Docker contains allow me to view just the statuses of the containers

With these things in place I’m finding the Docker container monitoring pretty useful in my fairly static environment. Due to the manual steps involved it’s really not going to work in a highly dynamic environment, but it works for my needs.

checkmk docker container
My Gotify container is unhappy – because I artificially stopped it for the screenshot!

Monitoring Windows Machines

Obviously checkmk also has an agent for Windows, so you can monitor those machines too. If you have any. I don’t, so I can’t comment on how well it works!

Monitoring Other Devices

For monitoring other devices that don’t support any of the available agents, you’re pretty much limited to external probing. With this you can get basic information on whether the device is up or down (via ping) and can check the availability of services on any open ports via manual checks.

Manual checks are in fact useful against any of the systems listed above as an external verification that a service is responding. As a minimum I typically implement an SSH check as well as HTTP checks on any open web services.

I have this approach deployed against several devices on my network including the two Libreelec devices mentioned above, my HDHomeRun and various IoT devices. It works pretty well – especially as the usual problem with these devices is that someone has unplugged them!

Conclusion

I hope this has given you a glimpse into my monitoring setup. It’s impossible to describe every part of it in full detail so I’ve opted for presenting only an overview here. Even after nearly two years I’m still adding and changing stuff on this system, so I may write up some of the more interesting details in future.

I’m really happy with my choice of checkmk to drive my monitoring system. It’s a nice blend of the reliability and stability of Nagios, with a layer of UI which makes it much easier to get a fully featured system. The recent upgrades have provided some useful features and it’s great to see it under such active development!

Next Steps

I still have some corners of my network where my monitoring setup doesn’t cover. Mainly just through lack of time to get it all set up. I’ll be looking to Ansible to help with this soon.

I’d also like to add some kind of metrics gathering system, probably Influxdb. This will be for Home Assistant data and metrics from my Traefik proxies (as well as anything else I can feed to it). I’d also like to round out the trifecta of observation systems with a log storage/aggregation system. I just wish there was something more lightweight than an ELK stack!

If you have any comments or improvements to my setup, as always, let me know in the feedback channels. I’m also interested in how others are handling this, so get in touch and let me know.

If you liked this post and want to see more, please consider subscribing to the mailing list (below) or the RSS feed. You can also follow me on Twitter. If you want to show your appreciation, feel free to buy me a coffee.

ansible gitlab ci

Automating My Infrastructure with Ansible and Gitlab CI: Part 1 – Getting Started

This post may contain affiliate links. Please see the disclaimer for more information.

This is the first part in a multi-part series following my adventures in automating my self-hosting infrastructure with Ansible, running from Gitlab CI. In this post I’ll cover setting up my Ansible project, setting up the remote machines for Ansible/CI deployment, some initial checks in CI and automating of routine updates via our new system.

I’ve used Ansible quite extensively in the past, but with my recent focus on Docker and Gitlab CI I thought it was worth having a clean break. Also my previous Ansible configurations were a complete mess, so it’s a good opportunity to do things better. I’ll still be pulling in parts of my old config where needed to prevent re-inventing the wheel.

Since my ongoing plan is to deploy as many of my applications as possible with Docker and docker-compose, I’ll be focusing mainly in tasks relating to the host machines. Some of these will be set up tasks which will deploy a required state to each machine. Others will be tasks to automate routine maintenance.

Inventory Setup

Before we get started, I’ll link you to the Gitlab repository for this post. You’ll find that some of the files are encrypted with ansible-vault, since they contain sensitive data. Don’t worry though, I’ll go through them as examples, starting with hosts.yml.

The hosts.yml file is my Ansible inventory and contains details of all the machines in my infrastructure. Previously, I’d only made use of inventories in INI format, so the YAML support is a welcome addition. The basic form of my inventory file is as follows:

To create this file we need to use the command ansible-vault create hosts.yml. This will ask for a password which you will need when running your playbooks. To edit the file later, replace the create subcommand with edit.

As you can see we start out with the top level group called all. This group has several variables set to configure all the hosts. The first of these ensures that we are using Python 3 on each remote system. The other two set the remote user and the SSH key used for authentication. I’m connecting to all of my systems with a specific user, called ci, which I will handle setting up in the next section.

The remainder of the file specifies the remote hosts in the infrastructure. For now I’ve divided these up into three groups, corresponding to VMs, Raspberry Pis and physical host machines. It’s worth noting that a host can be in multiple groups, so you can pretty much get as complicated as you like.

Each host has several variables associated with it. The first of these is the password for sudo operations. I like each of my hosts to have individual passwords for security purposes. The second variable (password_hash) is a hashed version of the same password which we will use later when setting up the user. These hashes are generated with the mkpasswd command as per the documentation. The final variable (ansible_host) I’m using here is optional and you only need to include it if the hostname of the server in question can’t be resolved via DNS. In this case you can specify an IP address for the server here.

In order to use this inventory file we need to pass the -i flag to Ansible (along with the filename) on every run. Alternatively, we can configure Ansible to use this file by creating an ansible.cfg file in our current directory. To do this we download the template config file and edit the inventory line so it looks like this:

Setting Up the Remote Machines

At this point you should comment out the ansible_user and ansible_ssh_private_key_file lines in your inventory, since we don’t currently have a ci user on any of our machines and we haven’t added the key yet. This we will take care of now – via Ansible itself. I’ve created a playbook which will create that user and set it up for use with our Ansible setup:

Basically all we do here is create the user (with the password from the inventory file) and set it up for access via SSH with a key. I’ve put this in the playbooks subdirectory in the name of keeping things organised. You’ll also need a playbooks/files directory in which you should put the ci_authorized_keys file. This file will be copied to .ssh/authorized_keys on the server, so obviously has that format. In order to create your key generate it in the normal way with ssh-keygen and save it locally. Copy the public part into ci_authorized_keys and keep hold of the private part for later (don’t commit it to git though!).

Now we should run that against our servers with the command:

This will prompt for your vault password from earlier and then configure each of your servers with the playbook.

Spinning Up the CI

At this point we have our ci user created and configured, so we should uncomment those lines in our inventory file. We can now perform a connectivity check to check that this worked:

If that works you should see success from each host.

Next comes our base CI setup. I’ve imported some of my standard preflight jobs from my previous CI pipelines, specifically the shellcheck, yamllint and markdownlint jobs. The next stage in the pipeline is a check stage. Here I’m putting jobs which check that Ansible itself is working and that the configuration is valid, but don’t make any changes to the remote systems.

I started off with a generic template for Ansible jobs:

This sets up the CI environment for Ansible to run. We start from a base Python 3.7 image and install Ansible via Pip. This takes a little bit of time doing it on each run so it would probably be better to build a custom image which includes Ansible (all the ones I found were out of date).

The next stage is to set up the authentication tokens. First we write the ANSIBLE_VAULT_PASSWORDvariable to a file, followed by the DEPLOYMENT_SSH_KEY variable. These variables are defined in the Gitlab CI configuration as secrets. The id_rsa file requires its permissions set to 0600 for the SSH connection to succeed. We also make sure to remove both these files after the job completes.

The last thing to set are a couple of environment variables to allow Ansible to pick up our config file (by default it won’t in the CI environment due to a permissions issue). We also need to tell it the vault key file to use.

Check Jobs

I’ve implemented two check jobs in the pipeline. The first of these performs the ping action which we tested before. This is to ensure that we have connectivity to each of our remote machines from the CI runner. The second iterates over each of the YAML files in the playbooks directory and runs them in check mode. This is basically a dry-run. I’d prefer if this was just a syntax/verification check without having to basically run through the whole thing, but there doesn’t seem to be a way to do that in Ansible.

The jobs for both of these are shown below:

Before these will run on the CI machine, we need to make a quick modification to our ansible.cfg file. This will allow Ansible to accept the SSH host keys without prompting. Basically you just uncomment the host_key_checking line and ensure it is set to false:

Doing Something Useful

At this stage we have our Ansible environment set up in CI and our remote machines are ready to accept instructions. We’ve also performed some verification steps to give us some confidence that we are doing something sensible. Sounds like we’re ready to make this do something useful!

Managing package updates has always been a pain for me. Once you get beyond a couple of machines, manually logging in and applying updates becomes pretty unmanageable. I’ve traditionally taken care of this on a Saturday morning, sitting down and applying updates to each machine whilst having my morning coffee. The main issue with this is just keeping track of which machines I already updated. Luckily there is a better way!

[[Side note: Yes, before anyone asks, I am aware of the unattended-upgrades package and I have it installed. However, I only use the default configuration of applying security updates automatically. This is with good reason. I want at least some manual intervention in performing other updates, so that if something critical goes wrong, I’m on hand to fix it.]]

With our shiny new Ansible setup encased in a layer of CI, we can quite easily take care of applying package upgrades to a whole fleet of machines. Here’s the playbook to do this (playbooks/upgrades.yml):

This is pretty simple, first we apply it to all hosts. Secondly we specify the line serial: 2 to run only on two hosts at a time (to even out the load on my VM hosts). Then we get into the tasks. These basically run an upgrade selectively based upon the OS in question (I still have a couple of CentOS machines knocking around). Each of the update tasks will perform the notify block if anything changes (i.e. any packages got updated). In this case all this does is execute the Reboot handler, which will reboot the machine. The when clause of that handler causes it not to execute if the machine is in the physical group (so that my host machines are not updated). I still need to handle not rebooting the CI runner host, but so far I haven’t added it to this system.

We could take this further, for example snapshotting any VMs before doing the update, if our VM host supports that. For now this gets the immediate job done and is pretty simple.

I’ve added a CI job for this. The key thing to note about this is that it is a manual job, meaning it must be directly triggered from the Gitlab UI. This gives me the manual step I mentioned earlier:

Now all I have to do on a Saturday morning is click a button and wait for everything to be updated!

ansible gitlab ci
Our finished pipeline. Note the state of the final “package-upgrades” job which indicates a manual job. Gitlab helpfully provides a play button to run it.

Conclusion

I hope you’ve managed to follow along this far. We’ve certainly come a long way, from nothing to an end to end automated system to update all our servers (with a little manual step thrown in for safety).

Of course there is a lot more we can do now that we’ve got the groundwork set up. In the next instalment I’m going to cover installing and configuring software that you want to be deployed across a fleet of machines. In the process we’ll take a look at Ansible roles. I’d also like to have a look at testing my Ansible roles with Molecule, but that will probably have to wait for yet another post.

I’m interested to hear about anyone else’s Ansible setup, feel free to share via the feedback channels!

If you liked this post and want to see more, please consider subscribing to the mailing list (below) or the RSS feed. You can also follow me on Twitter. If you want to show your appreciation, feel free to buy me a coffee.