
Automating My Infrastructure with Ansible and Gitlab CI: Part 3 – SSH Keys and Dotfiles

This post may contain affiliate links. Please see the disclaimer for more information.

This is the third post in a series; you can find the first two installments here and here.

Having recently reinstalled on both of my client machines, I took the opportunity to rotate my SSH keys. Luckily I backed up the old keys before doing this, so I didn’t lock myself out of anything. However, it did leave me having to update the authorized_keys files on all my servers (about 15 at last count). Of course there is a better way than doing this all manually, so cue some Ansible automation!

While I was at it I decided it would be nice to deploy my dotfiles across all my machines. I’ve had them stored in a git repo for some time and manage them with GNU Stow. However, I would never get around to deploying the repo onto new machines and installing all the relevant tools. Writing the Ansible automation to do this was pretty tricky, but I got there in the end. I also added my client machines to my Ansible inventory so that they get the same setup deployed to them.

Getting Started

For those who haven’t read the previous installments in this series, all the code for this article is going in my main ansible-infrastructure repo on GitLab.

I started out by installing some base packages that I would need for the rest of the steps. This is complicated slightly by having different distributions on different machines. The servers are all running Ubuntu or Debian (usually in the form of Raspbian or Proxmox), whilst the clients are running Manjaro (i.e. Archlinux). This is easily dealt with in Ansible by way of a set of checks against ansible_distribution:

- hosts: all
  tasks:
    - name: Install common apt packages
      become: true
      when: ansible_distribution == 'Debian' or ansible_distribution == 'Ubuntu'
      apt:
        pkg:
          - vim
          - git
          - tmux
          - htop
          - dnsutils
          - ack-grep
          - stow
          - zsh
          - build-essential
          - python-dev
          - python3-dev
          - cmake
          - curl

    - name: Install common pacman packages
      become: true
      when: ansible_distribution == 'Archlinux'
      pacman:
        name:
          - vim
          - git
          - tmux
          - htop
          - bind-tools
          - ack
          - stow
          - zsh
          - cmake
          - clang
          - curl

You’ll see here that some of the packages are named differently in the different distros. We also need to use different package management modules for each.

It should be noted that I didn’t start out with this full list, but instead just added a few basics (e.g. git, vim, etc) and added more as I encountered the need for them. You’ll also notice that I take the opportunity to install any nice utilities that I like to have everywhere, such as htop and dig (provided by dnsutils/bind-tools).

Deploying SSH Keys

Deploying the SSH keys turned out to be fairly trivial, despite being the main task that I wanted to accomplish here. This is thanks to the excellent authorized_key module in Ansible:

- name: Set up authorized keys
  become: true
  authorized_key:
    user: '{{ admin_user }}'
    state: present
    key: '{{ item }}'
  with_file:
    - public_keys/aragorn.pub
    - public_keys/arathorn.pub
    - public_keys/work.pub
    - public_keys/phone.pub

Here I add a set of four keys from the repository, by way of the with_file clause in Ansible. I copied all the public keys into the playbooks/files/public_keys directory for ease of access. This also makes it easy to rotate keys as we’ll see below.

I set the user to add the keys to a custom variable called admin_user. This variable is set to a default value at the top of my inventory file and then overridden for certain hosts or groups. For example, I use the standard pi user on my Raspberry Pis, so the variable is set to pi for the rpis group. This ensures that the keys always get installed for the right user.
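
For illustration, the relevant parts of the inventory look roughly like this (a sketch; the default value is a placeholder, since my real inventory file is encrypted):

all:
  vars:
    admin_user: rob   # placeholder default
  children:
    rpis:
      vars:
        admin_user: pi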

I also wanted to remove the old keys from my machines, which is pretty straightforward:

- name: Remove old authorized keys
  become: true
  authorized_key:
    user: '{{ admin_user }}'
    state: absent
    key: '{{ item }}'
  with_file:
    - public_keys/riker.pub.deprecated

Now if I want to rotate keys in future, I’ll just add the new key to the repository, rename the old key to remind myself that it’s no longer in use and update the file lists in these tasks. Done!

A Minor Detour

I haven’t mentioned so far in this article that all of this is running through the GitLab CI pipeline I built in my original post. In fact, this is probably the installment of this GitLab CI series with the least actual GitLab CI content. That’s because the previous pipeline has been working brilliantly.

However, one issue has been the speed. Making changes, committing, pushing and waiting for the pipeline to complete takes quite a long time. It was pretty frustrating given the number of iterations I needed to get this right!

I noticed that it took quite a while each time to install Ansible and Ansible Lint in the containers and that this was done twice for each pipeline. Given my recent success with custom Docker images, I built a quick image containing the tools I needed (with a 3 line Dockerfile!). I was able to quickly copy over the previous docker build pipeline and get this building via CI. All I then had to do was update the images used in my main pipeline and remove the old installation commands. Boom, much faster!
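
For reference, the change to the job template boils down to swapping the image and dropping the install commands, something like this (a sketch; the :latest tag is an assumption):

.ansible: &ansible
  image:
    name: registry.gitlab.com/robconnolly/docker-ansible:latest
    entrypoint: [""]
  before_script:
    - ansible --version
    # the pip install lines are gone (the image ships Ansible and ansible-lint);
    # the vault key and SSH key setup from the original template stays as it was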

You can check out my Ansible image on GitLab and pull it with the command:

$ docker pull registry.gitlab.com/robconnolly/docker-ansible

I haven’t set up a periodic build for this yet, but I’m intending that this image will be automatically updated on a weekly basis.

Deploying My Dotfiles

My dotfiles are deployed from a private repository on my internal Gitea instance. So far I haven’t published them as they contain quite a few unredacted details of my network. In order to deploy them I generated a new SSH key and added it as a deploy key to the project in Gitea.

Deploy keys in Gitea are added in the project Settings->Deploy Keys

I then encrypted the private key with Ansible vault (I added the public key to the repo too, in case I need it again in future):

$ ansible-vault encrypt playbooks/files/dotfiles_deploy_key

I then copy the private key to each of the machines which need it:

- hosts: all,!cloud,!rpis
  serial: 2
  tasks:
    - name: Copy deploy key
      become: true
      become_user: "{{ admin_user }}"
      copy:
        src: dotfiles_deploy_key
        dest: "/home/{{ admin_user }}/.dotfiles_deploy_key"
        owner: "{{ admin_user }}"
        group: "{{ admin_user }}"
        mode: 0600

You’ll note that the above is in a new play, separate from the previous steps. That’s because I wanted to restrict which machines get my dotfiles. The cloud machines currently can’t access my Gitea instance, since I still need to deploy my OpenVPN setup to some of them. The Raspberry Pis have trouble with some of the later steps in the setup, so I’ve skipped them too for now. I’m also running this two hosts at a time, because of the compilation step (see below).

The next step is to simply clone the repository with the Ansible git module:

- name: Clone dotfiles repo
  become: true
  become_user: "{{ admin_user }}"
  git:
    repo: "{{ dotfiles_repo }}"
    dest: "/home/{{ admin_user }}/dotfiles"
    version: master
    accept_hostkey: true
    ssh_opts: "-i /home/{{ admin_user }}/.dotfiles_deploy_key"

The dotfiles_repo variable holds the URL to clone the repository from and is again defined in my encrypted inventory file. I use the ssh_opts clause to set the key for git to use.

You’ll note that the tasks above all use become_user to switch to the admin_user. In order to get this to work on some of my hosts I had to set allow_world_readable_tmpfiles to true. This has some security implications, so you might want to tread carefully if you have potentially untrustworthy users on your systems. It seemed to work without this set on Ubuntu based systems, but those with a pure Debian base had issues.
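
For reference, this is a single line in the [defaults] section of ansible.cfg:

allow_world_readable_tmpfiles = True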

Running Stow

The next step was to unstow a few of the newly deployed files. For this we can use the command module and with_items:

- name: Unstow dotfiles
  become: true
  become_user: "{{ admin_user }}"
  command:
    chdir: "/home/{{ admin_user }}/dotfiles"
    cmd: "stow {{ item.name }}"
    creates: "{{ item.file }}"
  with_items:
    - { name: ssh, file: "/home/{{ admin_user }}/.ssh/config"}
    - { name: vim, file: "/home/{{ admin_user }}/.vimrc"}
    - { name: tmux, file: "/home/{{ admin_user }}/.tmux.conf"}

In the items list here I pass a dictionary for each entry, containing a name and a filename. The filename is one of the files the command should create, so Ansible can use it to decide whether the command actually needs to run. These values are accessed with the item.variable notation (e.g. item.name) in the task. I really like the templating in Ansible!

Dealing With Vim Plugins

My Vim config contains a load of plugins, which I manage with Vundle. Installing these should be pretty simple, but it confused me for ages because the command always seemed to fail (with no output!), even when I could see it working on the command line. As it turns out, the command will exit with a return code of one even when it is successful! You can see why I was confused! In the end I came up with:

- name: Install vundle
  become: true
  become_user: "{{ admin_user }}"
  git:
    repo: https://github.com/VundleVim/Vundle.vim.git
    dest: "/home/{{ admin_user }}/.vim/bundle/Vundle.vim"
    version: master

- name: Install vim plugins
  become: true
  become_user: "{{ admin_user }}"
  shell:
    cmd: 'vim -E -s -c "source /home/{{ admin_user }}/.vimrc" -c PluginInstall -c qa || touch /home/{{ admin_user }}/.vim/plugins_installed'
    creates: "/home/{{ admin_user }}/.vim/plugins_installed"

I ended up using the shell module from Ansible to create a file when the installation completes. This file is used as the check in Ansible for whether it should run the command again. The || operator has to be used here (rather than &&) due to the weird return code. This does however have the effect of changing the overall return code to zero, which makes Ansible happy.

The final step here is compiling the YouCompleteMe plugin, which is just running another command:

- name: Build ycm_core
  become: true
  become_user: "{{ admin_user }}"
  command:
    cmd: "./install.py --clang-completer {% if ansible_distribution == 'Archlinux' %}--system-libclang{% endif %}"
    chdir: "/home/{{ admin_user }}/.vim/bundle/YouCompleteMe"
    creates: "/home/{{ admin_user }}/.vim/bundle/YouCompleteMe/third_party/ycmd/ycm_core.so"
  environment:
    YCM_CORES: 1

You’ll see above that the command is different on Arch based systems, since I use the system libclang there to work around a compile issue. I also define the YCM_CORES environment variable. This limits the number of cores to one, which seems to stop the build running out of memory on small virtual machines!

Deploying Oh-My-Zsh

The final piece to this increasingly complex puzzle is installing Oh-My-Zsh to give me a nice Zsh environment. This is again accomplished with the shell module:

- name: Install oh-my-zsh
  become: true
  become_user: "{{ admin_user }}"
  shell:
    cmd: sh -c "$(curl -fsSL https://raw.githubusercontent.com/ohmyzsh/ohmyzsh/master/tools/install.sh)" "" --unattended
    creates: "/home/{{ admin_user }}/.oh-my-zsh"
  register: ohmyzsh

You’ll see here that I register a variable containing the status of this task. This is used in the next step to delete the default zshrc that the installer will create for us:

- name: Remove default zshrc
  become: true
  become_user: "{{ admin_user }}"
  file:
    name: "/home/{{ admin_user }}/.zshrc"
    state: absent
  when: ohmyzsh.changed

I then unstow my Zsh config as before:

- name: Unstow zsh config
  become: true
  become_user: "{{ admin_user }}"
  command:
    chdir: "/home/{{ admin_user }}/dotfiles"
    cmd: "stow zsh"
    creates: "/home/{{ admin_user }}/.zshrc"

This process is probably ripe for simplification, since I assume the installer wouldn’t overwrite an existing zshrc. If I unstowed the Zsh config earlier I could probably remove the file deletion, but I haven’t tried this to see if it works.

The absolute last step, is to switch the default shell for admin_user over to Zsh:

- name: Change shell to zsh
  become: true
  user:
    name: "{{ admin_user }}"
    shell: /usr/bin/zsh

Done!

Conclusion

Phew! That comes out as a pretty epic playbook. I’ve opted to keep this all in my common playbook for now since it’s getting run against every machine, along with my previous roles. I may split it up later however, if it becomes useful to do so.

The playbook works really well now and it’s nice to have the same environment on every machine. I also really like the centralised SSH key management, which solves a real issue for me.

One improvement I would like to make would be around syncing changes to the dotfiles repository out to all the machines. This could be as simple as deploying a cron job to git pull that repository periodically, but I’d rather have it react to changes to the repo. I could move the repository to GitLab and run a pipeline which would deploy it, but this would mean duplicating my Ansible inventory (and keeping two copies up to date). I’m wondering if a webhook could be used to trigger the main CI pipeline?
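
One possible approach (which I haven’t tried yet) would be GitLab’s pipeline trigger API, called from a webhook receiver or a cron job on the Gitea side. The project ID and token below are placeholders:

$ curl -X POST -F "token=<trigger_token>" -F "ref=master" "https://gitlab.com/api/v4/projects/<project_id>/trigger/pipeline"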

I’m interested to know if anyone out there has solved similar problems in a different way. Please let me know in the comments!

If you liked this post and want to see more, please consider subscribing to the mailing list (below) or the RSS feed. You can also follow me on Twitter. If you want to show your appreciation, feel free to buy me a coffee.


Automating My Infrastructure with Ansible and Gitlab CI: Part 2 – Deploying Stuff with Roles

This post may contain affiliate links. Please see the disclaimer for more information.

In the first post of this series, I got started with Ansible running in Gitlab CI. This involved setting up the basic pipeline, configuring the remote machines for use with our system and making a basic playbook to perform package upgrades. In this post we’re going to build on top of this to create a re-usable Ansible role to deploy some software and configuration to our fleet of servers. We will do this using the power of Ansible roles.

In last week’s post I described my monitoring system, based on checkmk. At the end of the post I briefly mentioned that it would be great to use Ansible to deploy the checkmk agent to all my systems. That’s what I’m going to describe in this post. The role I’ve created for this deploys the checkmk agent from the package download on my checkmk instance and configures it to be accessed via SSH. It also installs a couple of plugins to enable some extra checks on my systems.

A Brief Aside: ansible-lint

In my previous post I set up a job which ran all the playbooks in my repository with the --check flag. This performs a dry run of the playbooks and will alert me to any issues. In that post I mentioned that all I was really looking for was some kind of syntax/sanity checking on the playbooks and didn’t really need the full dry run. Several members of the community stepped forward to suggest ansible-lint – thanks to all those that suggested it!

I’ve now updated my CI configuration to run ansible-lint instead of the check job. The updated job is shown below:

ansible-lint:
  <<: *ansible
  stage: check
  script:
    - ansible-lint -x 403 playbooks/*.yml

This is a pretty basic use of ansible-lint. All I’m doing is running it on all the playbooks in my playbooks directory. I do skip a single rule (403) with the -x argument. The rule in question is about specifying latest in package installs, which conflicts with my upgrade playbook. Since I’m only tweaking this small thing I just pass this via the CLI rather than creating a config file.
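
For completeness, the equivalent using a config file would be something like this in a .ansible-lint file at the root of the repository (a sketch; I haven’t adopted it):

skip_list:
  - '403'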

I’ve carried the preflight jobs and the ansible-lint job over to the CI configuration for my new role (described below). Since this is pretty much an exact copy of that of my main repo, I’m not going to explain it any further.

Creating a Base Role

I decided that I wanted my roles self contained in their own git repositories. This keeps everything a bit tidier at the price of a little extra complexity. In my previous Ansible configuration I had all my roles in the same repo and it just got to be a big mess after a while.

To create a role, first initialise it with ansible-galaxy. Then create a new git repo in the resulting directory:

$ ansible-galaxy init ansible_role_checkmk_agent
$ cd ansible_role_checkmk_agent
$ git init .

I actually didn’t perform these steps and instead started from a copy of the old role I had for this in my previous configuration. This role has been tidied up and expanded upon for the new setup.

The ansible-galaxy command above will create a set of files and directories which provide a skeleton role. The first thing to do is to edit the README.md and meta/main.yml files for your role. Just update everything to suit what you are doing; it’s pretty self-explanatory. Once you’ve done this, all the files can be added to git and committed to create the first version of your role.

Installing the Role

Before I move on to exactly what my role does, I’m going to cover how we will use this role in our main infrastructure project. This is done by creating a requirements.yml file which will list the required roles. These will then be installed by passing the file to ansible-galaxy. Since the installation tool can install from git we will specify the git URL as the installation location. Here are the contents of my requirements.yml file:

---
 - name: checkmk_agent
   scm: git
   src: git+https://gitlab.com/robconnolly/ansible_role_checkmk_agent.git
   version: master

Pretty simple. In order to do the installation all we have to do is run the following command:

$ ansible-galaxy install --force -r requirements.yml -p playbooks/roles/

This will install the required Ansible roles to the playbooks/roles directory in our main project, where our playbooks can find them. The --force flag will ensure that the role always gets updated when we run the command. I’ve added this command to the before_script commands in the CI configuration to enable me to use the role in my CI jobs.
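
In the job template this just becomes an extra line in the existing before_script block, roughly:

before_script:
  # existing vault/SSH key setup lines omitted here
  - ansible-galaxy install --force -r requirements.yml -p playbooks/roles/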

Now the role will be installed where we actually need it. However, we are not yet using it. I’ll come back to this later. Let’s make the role actually do something first!

What the Role Does

The main behaviour of the role is defined in the tasks/main.yml file. This file is rather long, so I won’t reproduce this here. Instead I’ll ask you to open the link and follow along with my description below:

  • The first task creates a checkmk user on the target system. This will be used by checkmk to log in and run the agent.
  • The next task creates a .ssh directory for the checkmk user and sets its permissions correctly.
  • Next we create an authorized_keys file for the user. This uses a template file which will restrict what the key can do. The actual key comes from the checkmk_pub_key variable which will be passed in from the main project. The template is as follows:
command="/usr/bin/sudo /usr/bin/check_mk_agent",no-port-forwarding,no-x11-forwarding,no-agent-forwarding {{ checkmk_pub_key }}
  • Next are a couple of tasks to install some dependent packages for the rest of the role. There is one task for Apt based systems and another for Yum based systems. I’m not sure if the monitoring-plugins package is actually required. I had it in my previous role and have just copied it over.
  • The next two tasks remove the xinetd package on both types of system. Since we are accessing the agent via SSH we don’t need this. I was previously using this package for my agent access so I want to make sure it is removed. This behaviour can be disabled by setting the checkmk_purge_xinetd variable to false.
  • The next task downloads the checkmk agent deb file to the local machine. This is done to account for some of the remote servers not having direct access to the checkmk server. I then upload the file in the following task. The variables checkmk_server, checkmk_site_name and checkmk_agent_deb are used to specify the server address, monitoring instance (site) and deb file name. The address and site name are designed to be externally overridden by the main project.
  • The next two tasks repeat the download and upload process for the RPM version of the agent.
  • We then install the correct agent in the next two tasks.
  • The following task disables the systemd socket file for the agent to stop it being accessible over an unencrypted TCP port. Right now I don’t do this on my CentOS machines because they are too old to have systemd!
  • The final few tasks get into installing the Apt and Docker plugins on systems that require them. I follow the same process of downloading then uploading the files and make them executable. The Docker plugin requires that the docker Python module be installed, which we achieve via pip. It also requires a config file, which as discussed in my previous post needs to be modified. I keep my modified copy in the repository and just upload it to the correct location. (A simplified sketch of the first few of these tasks follows this list.)
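
To give a flavour of the above, here is a simplified sketch of the first few tasks (not the exact file; the template filename is an assumption):

- name: Create checkmk user
  become: true
  user:
    name: checkmk
    comment: checkmk agent user

- name: Create .ssh directory for the checkmk user
  become: true
  file:
    path: /home/checkmk/.ssh
    state: directory
    owner: checkmk
    group: checkmk
    mode: 0700

- name: Install restricted authorized_keys for the checkmk user
  become: true
  template:
    src: authorized_keys.j2   # filename is an assumption
    dest: /home/checkmk/.ssh/authorized_keys
    owner: checkmk
    group: checkmk
    mode: 0600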

The variables that are used in this are specified in the vars/main.yml and defaults/main.yml files. The default file contains the variables that should be overridden externally. I don’t specify a default for the SSH public key because I couldn’t think of a sensible value, so this at least must be specified for the role to run.
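
As an illustration, the defaults end up looking something like this (the values shown are placeholders):

checkmk_server: checkmk.example.com
checkmk_site_name: mysite
checkmk_purge_xinetd: true
# checkmk_pub_key deliberately has no default and must be supplied by the calling project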

With all this in place our role is ready to go. Next we should try that from our main project.

Applying the Role

The first thing to do is to configure the role via the variables described above. I did this from my hosts.yml file which is encrypted, but the basic form is as follows:

all:
  vars:
    checkmk_server: <server_ip>
    checkmk_site_name: <mysitename>
    checkmk_pub_key: <mypubkey>

The public key has to be that which will be used by the checkmk server. As such the private key must be installed on the server. I’ll cover how to set this up in checkmk below.

Next we have the playbook which will apply our role. I’ve opted to create a playbook for applying common roles to all my systems (of which this is the first). This goes in the file playbooks/common.yml:

---
- hosts: all
  roles:
    - { role: checkmk_agent }

This is extremely basic, all it does is apply the checkmk_agent role to all servers.

The corresponding CI job is only marginally more complex:

common-roles:
  <<: *ansible
  stage: deploy
  script:
    - ansible-playbook playbooks/common.yml
  only:
    refs:
      - master

With those two in place a push to the server will start the pipeline and eventually deploy our role to our servers.

Our updated CI pipeline showing the ansible-lint job and the new common-roles job

Configuring Checkmk Agent Access via SSH

Of course the deployment on the remote servers is only one side of the coin. We also need to have our checkmk instance set up to access the agents via SSH. This is documented pretty well in the checkmk documentation. Basically it comes down to putting the private key corresponding to the public key used earlier in a known location on the server and then setting up an “Individual program call instead of agent access” rule in the “Hosts and Service Parameters” page of WATO.

I modified the suggested SSH call to specify the private key and user to use. Here is the command I ended up using in my configuration.

/usr/bin/ssh -i /omd/sites/site_name/.ssh/id_rsa -o StrictHostKeyChecking=no checkmk@$HOSTADDRESS$

When you create the rule you can apply it to as many hosts as you like. In my setup this is all of them, but you should adjust as you see fit.

The checkmk WATO rule screen for SSH agent access

Conclusion

If you’ve been following along you should now be able to add new hosts to your setup (via hosts.yml) and have the checkmk agent deployed on them automatically. You should also have an understanding of how to create reasonably complex Ansible roles in external repositories and how to use them in your main Ansible project.

There are loads of things about roles that I haven’t covered here (e.g. handlers). The best place to start learning more would be the Ansible roles documentation page. You can then fan out from there on other concepts as they arise.

Next Steps

So far on this adventure I’ve tested my playbooks and roles by just making sure they work against my servers (initially on a non-critical one). It would be nice to have a better way to handle this and to be able to run these tests and verify that the playbook is working from a CI job. I’ll be investigating this for the next part of this series.

The next instalment will probably be delayed by a few weeks. I have something else coming which will take up quite a bit of time. For my regular readers, there will still be blog posts – they just won’t be about Ansible or CI. This is probably a good thing, I’ve been covering a lot of CI stuff recently!

As always please get in contact if you have any feedback or improvements to suggest, or even if you just want to chat about your own Ansible roles.

If you liked this post and want to see more, please consider subscribing to the mailing list (below) or the RSS feed. You can also follow me on Twitter. If you want to show your appreciation, feel free to buy me a coffee.


Automating My Infrastructure with Ansible and Gitlab CI: Part 1 – Getting Started

This post may contain affiliate links. Please see the disclaimer for more information.

This is the first part in a multi-part series following my adventures in automating my self-hosting infrastructure with Ansible, running from Gitlab CI. In this post I’ll cover setting up my Ansible project, setting up the remote machines for Ansible/CI deployment, some initial checks in CI and automating of routine updates via our new system.

I’ve used Ansible quite extensively in the past, but with my recent focus on Docker and Gitlab CI I thought it was worth having a clean break. Also my previous Ansible configurations were a complete mess, so it’s a good opportunity to do things better. I’ll still be pulling in parts of my old config where needed to prevent re-inventing the wheel.

Since my ongoing plan is to deploy as many of my applications as possible with Docker and docker-compose, I’ll be focusing mainly on tasks relating to the host machines. Some of these will be setup tasks which deploy a required state to each machine. Others will be tasks to automate routine maintenance.

Inventory Setup

Before we get started, I’ll link you to the Gitlab repository for this post. You’ll find that some of the files are encrypted with ansible-vault, since they contain sensitive data. Don’t worry though, I’ll go through them as examples, starting with hosts.yml.

The hosts.yml file is my Ansible inventory and contains details of all the machines in my infrastructure. Previously, I’d only made use of inventories in INI format, so the YAML support is a welcome addition. The basic form of my inventory file is as follows:

---
all:
  vars:
    ansible_python_interpreter: /usr/bin/python3
    ansible_user: ci
    ansible_ssh_private_key_file: ./id_rsa
  children:
    vms:
      hosts:
        vm1.mydomain.com:
          ansible_sudo_pass: supersecret
          password_hash: "[hashed version of ansible_sudo_pass]"
          # use this if the hostname doesn't resolve
          #ansible_host: [ip address]
        vm2.mydomain.com:
          ...
    rpis:
      # Uncomment this for running before the CI user is created
      #vars:
      #  ansible_user: pi
      hosts:
        rpi1.mydomain.com:
          ...
    physical:
      hosts:
        phys1.mydomain.com:
          ...

To create this file we need to use the command ansible-vault create hosts.yml. This will ask for a password which you will need when running your playbooks. To edit the file later, replace the create subcommand with edit.

As you can see we start out with the top level group called all. This group has several variables set to configure all the hosts. The first of these ensures that we are using Python 3 on each remote system. The other two set the remote user and the SSH key used for authentication. I’m connecting to all of my systems with a specific user, called ci, which I will handle setting up in the next section.

The remainder of the file specifies the remote hosts in the infrastructure. For now I’ve divided these up into three groups, corresponding to VMs, Raspberry Pis and physical host machines. It’s worth noting that a host can be in multiple groups, so you can pretty much get as complicated as you like.

Each host has several variables associated with it. The first of these is the password for sudo operations. I like each of my hosts to have individual passwords for security purposes. The second variable (password_hash) is a hashed version of the same password which we will use later when setting up the user. These hashes are generated with the mkpasswd command as per the documentation. The final variable (ansible_host) I’m using here is optional and you only need to include it if the hostname of the server in question can’t be resolved via DNS. In this case you can specify an IP address for the server here.
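
On Debian-based systems mkpasswd comes from the whois package; generating a hash looks something like this (it prompts for the password):

$ mkpasswd --method=sha-512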

In order to use this inventory file we need to pass the -i flag to Ansible (along with the filename) on every run. Alternatively, we can configure Ansible to use this file by creating an ansible.cfg file in our current directory. To do this we download the template config file and edit the inventory line so it looks like this:

inventory      = hosts.yml

Setting Up the Remote Machines

At this point you should comment out the ansible_user and ansible_ssh_private_key_file lines in your inventory, since we don’t currently have a ci user on any of our machines and we haven’t added the key yet. This we will take care of now – via Ansible itself. I’ve created a playbook which will create that user and set it up for use with our Ansible setup:

---
- hosts: all
  tasks:
    - name: Create CI user
      become: true
      user:
        name: ci
        comment: CI User
        groups: sudo
        append: true
        password: "{{ password_hash }}"

    - name: Create .ssh directory
      become: true
      file:
        path: /home/ci/.ssh
        state: directory
        owner: ci
        group: ci
        mode: 0700

    - name: Copy authorised keys
      become: true
      copy:
        src: ci_authorized_keys
        dest: /home/ci/.ssh/authorized_keys
        owner: ci
        group: ci
        mode: 0600

Basically all we do here is create the user (with the password from the inventory file) and set it up for access via SSH with a key. I’ve put this in the playbooks subdirectory in the name of keeping things organised. You’ll also need a playbooks/files directory in which you should put the ci_authorized_keys file. This file will be copied to .ssh/authorized_keys on the server, so obviously has that format. In order to create your key, generate it in the normal way with ssh-keygen and save it locally. Copy the public part into ci_authorized_keys and keep hold of the private part for later (don’t commit it to git though!).
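
For reference, the generation step is something like this (the key type and filename are just examples):

$ ssh-keygen -t ed25519 -f ./ci_key -C "ansible ci deployment"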

Now we should run that against our servers with the command:

ansible-playbook --ask-vault-pass playbooks/setup.yml

This will prompt for your vault password from earlier and then configure each of your servers with the playbook.

Spinning Up the CI

At this point we have our ci user created and configured, so we should uncomment those lines in our inventory file. We can now perform a connectivity check to verify that this worked:

ansible all -m ping

If that works you should see success from each host.

Next comes our base CI setup. I’ve imported some of my standard preflight jobs from my previous CI pipelines, specifically the shellcheck, yamllint and markdownlint jobs. The next stage in the pipeline is a check stage. Here I’m putting jobs which check that Ansible itself is working and that the configuration is valid, but don’t make any changes to the remote systems.

I started off with a generic template for Ansible jobs:

.ansible: &ansible
  image:
    name: python:3.7
    entrypoint: [""]
  before_script:
    - pip install ansible
    - ansible --version
    - echo $ANSIBLE_VAULT_PASSWORD > vault.key
    - echo "$DEPLOYMENT_SSH_KEY" > id_rsa
    - chmod 600 id_rsa
  after_script:
    - rm vault.key id_rsa
  variables:
    ANSIBLE_CONFIG: $CI_PROJECT_DIR/ansible.cfg
    ANSIBLE_VAULT_PASSWORD_FILE: ./vault.key
  tags:
    - ansible

This sets up the CI environment for Ansible to run. We start from a base Python 3.7 image and install Ansible via Pip. This takes a little bit of time doing it on each run so it would probably be better to build a custom image which includes Ansible (all the ones I found were out of date).

The next stage is to set up the authentication tokens. First we write the ANSIBLE_VAULT_PASSWORD variable to a file, followed by the DEPLOYMENT_SSH_KEY variable. These variables are defined in the Gitlab CI configuration as secrets. The id_rsa file requires its permissions set to 0600 for the SSH connection to succeed. We also make sure to remove both these files after the job completes.

The last things to set are a couple of environment variables to allow Ansible to pick up our config file (by default it won’t in the CI environment due to a permissions issue). We also need to tell it the vault key file to use.

Check Jobs

I’ve implemented two check jobs in the pipeline. The first of these performs the ping action which we tested before. This is to ensure that we have connectivity to each of our remote machines from the CI runner. The second iterates over each of the YAML files in the playbooks directory and runs them in check mode. This is basically a dry run. I’d prefer it if this were just a syntax/verification check without having to run through the whole thing, but there doesn’t seem to be a way to do that in Ansible.

The jobs for both of these are shown below:

ping-hosts:
  <<: *ansible
  stage: check
  script:
    - ansible all -m ping

check-playbooks:
  <<: *ansible
  stage: check
  script:
    - |
      for file in $(find ./playbooks -maxdepth 1 -iname "*.yml"); do
        ansible-playbook $file --check
      done

Before these will run on the CI machine, we need to make a quick modification to our ansible.cfg file. This will allow Ansible to accept the SSH host keys without prompting. Basically you just uncomment the host_key_checking line and ensure it is set to false:

host_key_checking = False

Doing Something Useful

At this stage we have our Ansible environment set up in CI and our remote machines are ready to accept instructions. We’ve also performed some verification steps to give us some confidence that we are doing something sensible. Sounds like we’re ready to make this do something useful!

Managing package updates has always been a pain for me. Once you get beyond a couple of machines, manually logging in and applying updates becomes pretty unmanageable. I’ve traditionally taken care of this on a Saturday morning, sitting down and applying updates to each machine whilst having my morning coffee. The main issue with this is just keeping track of which machines I already updated. Luckily there is a better way!

[[Side note: Yes, before anyone asks, I am aware of the unattended-upgrades package and I have it installed. However, I only use the default configuration of applying security updates automatically. This is with good reason. I want at least some manual intervention in performing other updates, so that if something critical goes wrong, I’m on hand to fix it.]]

With our shiny new Ansible setup encased in a layer of CI, we can quite easily take care of applying package upgrades to a whole fleet of machines. Here’s the playbook to do this (playbooks/upgrades.yml):

---
- hosts: all
  serial: 2
  tasks:
    - name: Perform apt upgrades
      become: true
      when: ansible_distribution == 'Debian' or ansible_distribution == 'Ubuntu'
      apt:
        update_cache: true
        upgrade: dist
      notify:
        - Reboot

    - name: Perform yum upgrades
      become: true
      when: ansible_distribution == 'CentOS'
      yum:
        name: '*'
        state: latest
      notify:
        - Reboot

  handlers:
    - name: Reboot
      when: "'physical' not in group_names"
      become: true
      reboot:

This is pretty simple: first we apply it to all hosts. Secondly we specify the line serial: 2 to run only on two hosts at a time (to even out the load on my VM hosts). Then we get into the tasks. These basically run an upgrade selectively based upon the OS in question (I still have a couple of CentOS machines knocking around). Each of the update tasks will perform the notify block if anything changes (i.e. any packages got updated). In this case all this does is execute the Reboot handler, which will reboot the machine. The when clause of that handler causes it not to execute if the machine is in the physical group (so that my physical host machines are not automatically rebooted). I still need to handle not rebooting the CI runner host, but so far I haven’t added it to this system.

We could take this further, for example snapshotting any VMs before doing the update, if our VM host supports that. For now this gets the immediate job done and is pretty simple.

I’ve added a CI job for this. The key thing to note about this is that it is a manual job, meaning it must be directly triggered from the Gitlab UI. This gives me the manual step I mentioned earlier:

package-upgrades:
  <<: *ansible
  stage: deploy
  script:
    - ansible-playbook playbooks/upgrades.yml
  only:
    refs:
      - master
  when: manual

Now all I have to do on a Saturday morning is click a button and wait for everything to be updated!

Our finished pipeline. Note the state of the final “package-upgrades” job which indicates a manual job. Gitlab helpfully provides a play button to run it.

Conclusion

I hope you’ve managed to follow along this far. We’ve certainly come a long way, from nothing to an end-to-end automated system for updating all our servers (with a little manual step thrown in for safety).

Of course there is a lot more we can do now that we’ve got the groundwork set up. In the next instalment I’m going to cover installing and configuring software that you want to be deployed across a fleet of machines. In the process we’ll take a look at Ansible roles. I’d also like to have a look at testing my Ansible roles with Molecule, but that will probably have to wait for yet another post.

I’m interested to hear about anyone else’s Ansible setup, feel free to share via the feedback channels!

If you liked this post and want to see more, please consider subscribing to the mailing list (below) or the RSS feed. You can also follow me on Twitter. If you want to show your appreciation, feel free to buy me a coffee.