
Centralised Backups With Restic and Rsync


In my recent post on synchronising ZFS snapshots from remote servers, I mentioned that I had been using rsync for the same purpose. This is part of my larger overall backup strategy with restic. It was brought to my attention recently that I hadn’t actually written up my backup approach. This post will rectify that!

The key requirement of my system was to have something that would work across multiple systems, without being too difficult to maintain. I also wanted it to scale to new systems easily as my self-hosting infrastructure inevitably continues to grow. Of course, I had the usual requirements of local and off-site backups, with the off-site copy suitably secured. Restic fits the bill quite nicely for secure local and remote backups, but has no way to synchronise multiple systems unless you set it up on each system individually.

Backup Architecture

I’ve architected my backups as a centralised system, where the relevant data from each satellite system is propagated to a central server and then backed up to various end points from there. This architecture was chosen because it was reasonably easy for me to set up and maintain, and it actually results in more copies of the data since everything has to be copied to the backup server first.

[Diagram: backup architecture – remote systems sync to the backup server via rsync, which then backs up with restic to a local external disk and Backblaze B2.]
They say a picture is worth a thousand words…

As you can see from the diagram, the synchronisation from the remote systems to the backup server is done via rsync. This is done in a pull fashion: the backup server connects to each machine in turn and pulls down the files to be backed up to its local cache.

The second stage is a backup using restic to both a locally connected external hard drive and to the cloud (in this case Backblaze B2). I’ll cover each of these steps in the following sections.

Synchronising Remote Machines with Rsync

The first step is to synchronise the relevant files on the remote machines via rsync. When I say remote machines here, I specifically mean machines which are not the central backup server. These could be remote cloud machines, hosts on the local network or VMs hosted on the same machine. In my case it’s all three, since I run the backups on my main home server.

For each machine I want to synchronise, I have a script looking like this:

#!/bin/bash

HOST=<REMOTE HOSTNAME>
PORT=22
USER=backup
SSH_KEY="/storage/data/backup/keys/backup_key"

BASE_DEST=/storage/data/backup/$HOST

LOG_DIR=/storage/data/backup/logs
LOG_FILE=$LOG_DIR/rsync-$HOST.log

function do_rsync() {
    echo "Starting rsync job for $HOST:$1 at $(date '+%Y-%m-%d %H:%M:%S')..." >> $LOG_FILE 2>&1
    echo >> $LOG_FILE 2>&1

    mkdir -p $BASE_DEST$1 >> $LOG_FILE 2>&1
    /usr/bin/rsync -avP --delete -e "ssh -p $PORT -i $SSH_KEY" $USER@$HOST:$1 $BASE_DEST$1 >> $LOG_FILE 2>&1

    echo >> $LOG_FILE 2>&1
    echo "Job finished at $(date '+%Y-%m-%d %H:%M:%S')." >> $LOG_FILE 2>&1
}

mkdir -p $LOG_DIR

do_rsync <DIRECTORY 1>
do_rsync <DIRECTORY 2>
...

Here we start with some basic configuration, including the hostname, port, user and SSH key to use to connect to the remote host. I then configure the local destination directory, which is located on my main ZFS mirror. I also configure where the logs will be stored.

We then get into the main function of the script, called do_rsync. This sets up the logging environment and does the actual rsync transfer with the options we’ve specified. It takes as an argument the remote directory to back up (which obviously must be readable by the user in question).

We then close out the script by ensuring the log directory exists and then calling the do_rsync function for the directories we are interested in. Looking at the backup scripts now, it would actually be good to factor out the common functionality into a helper script, which could then be sourced by all of the host-specific scripts. I also need to move this into git, which will happen with my continued migration to Ansible.
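
As a rough sketch of that refactor (the helper script path and the placeholders are hypothetical), each host-specific script could then shrink to something like this:

#!/bin/bash

# Per-host configuration
HOST=<REMOTE HOSTNAME>
PORT=22
USER=backup
SSH_KEY="/storage/data/backup/keys/backup_key"

# Pull in the shared do_rsync function, destination and logging setup
source /storage/data/backup/rsync-common.sh

do_rsync <DIRECTORY 1>
do_rsync <DIRECTORY 2>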

A Note About Security

Obviously, with the rsync client logging in to the remote system automatically via SSH, it’s good to restrict what this key can do. To this end, the SSH key is locked down so that the only command that can be run is the rsync server command invoked by the connecting rsync client. This is done via the ~/.ssh/authorized_keys file:

command="/usr/bin/rsync --server --sender -vlogDtpre.is . ${SSH_ORIGINAL_COMMAND//* \//\/}",no-pty,no-agent-forwarding,no-port-forwarding,no-X11-forwarding ssh-rsa .....

These backup scripts are run from cron and spread throughout the day so as not to overlap with each other, in an effort to even out the traffic on the network. I’m not really happy with this part of the solution; it might just be better to run the whole lot in sequence.
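
As an illustration (the script names and times here are made up), the cron entries end up looking something like this, with each job offset by a few hours:

# /etc/cron.d/rsync-backups (illustrative only)
0 1  * * *   root    /storage/data/backup/rsync-host1.sh
0 4  * * *   root    /storage/data/backup/rsync-host2.sh
0 7  * * *   root    /storage/data/backup/rsync-host3.sh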

Local Restic Backups

The next step in the process is to run the backup to the locally connected external drive with restic. This backup is run over all the previously synced data as well as data from the local machine, such as the contents of my Nextcloud server and media collection.

This backup is achieved with the following script:

#!/bin/bash
set -e

######## START CONFIG ########
DISK_UUID="<DISK UUID>"
#GLOBAL_FLAGS="-q"
MOUNT_POINT=/mnt/backup
export RESTIC_REPOSITORY=$MOUNT_POINT/restic/storage
export RESTIC_PASSWORD_FILE=/root/restic-local.pass
######## END CONFIG ########

echo "Starting backup process at $(date '+%Y-%m-%d %H:%M:%S')."

# check for the backup disk and mount it
if [ ! -e /dev/disk/by-uuid/$DISK_UUID ]; then
    echo "Backup disk not found!" >&2
    exit 1
fi
echo "Mounting backup disk..."
mount -t ext4 /dev/disk/by-uuid/$DISK_UUID $MOUNT_POINT

# pre-backup check
echo "Performing pre-backup check..."
restic $GLOBAL_FLAGS check

# perform backups
echo "Performing backups..."
restic $GLOBAL_FLAGS backup /storage/data/nextcloud
restic $GLOBAL_FLAGS backup /storage/data/backup
restic $GLOBAL_FLAGS backup /storage/music
restic $GLOBAL_FLAGS backup /storage/media
# add any other directories here...

# post-backup check
echo "Performing post backup check..."
restic $GLOBAL_FLAGS check

# clean up old snapshots
echo "Cleaning up old snapshots..."
restic $GLOBAL_FLAGS forget -d 7 -w 4 -m 6 -y 2 --prune

# final check
echo "Performing final backup check..."
restic $GLOBAL_FLAGS check

# unmount backup disk
echo "Unmounting backup disk..."
umount $MOUNT_POINT

echo "Backups completed at $(date '+%Y-%m-%d %H:%M:%S')."
exit 0

This script is pretty simple, despite the wall of commands. First we have some configuration, in which I specify the UUID of the external disk and its mount point. This is done because the disk is kept unmounted when not in use. The path to the restic repository (relative to the mount point) and the path to the password file are also specified.

We then move into checking for and mounting the external disk. The first restic command performs a check on the repository to make sure all is well, before getting into the backups for the directories we are interested in. This is followed by another check to make sure that went OK.

I then run a restic forget command to prune old snapshots. Currently I’m keeping the last 7 days of backups, 4 weekly backups, 6 monthly backups and 2 yearly backups! I run a final restic check before unmounting the external disk.

This script is run once a day from cron. I use the following command to reduce the priority of the backup script and avoid interfering with the normal operation of the server:

/usr/bin/nice -n 19 /usr/bin/ionice -c2 -n7 /storage/data/backup/restic-local.sh >> /storage/data/backup/logs/restic-local.log
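
Wired into cron, the entry looks something like the following (the run time here is just an example):

# /etc/cron.d/restic-local (illustrative only)
30 2  * * *   root    /usr/bin/nice -n 19 /usr/bin/ionice -c2 -n7 /storage/data/backup/restic-local.sh >> /storage/data/backup/logs/restic-local.log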

Remote Restic Backups

The final stage in this process is a separate backup to the cloud. As mentioned before I use Backblaze’s B2 service for this since it seems to be about the cheapest around. I’ve been reasonably happy with it so far at least.

#!/bin/bash
set -e

######## START CONFIG ########
B2_CONFIG=/root/b2_creds.sh
#GLOBAL_FLAGS="-q"
export RESTIC_REPOSITORY="<MY B2 REPO>"
export RESTIC_PASSWORD_FILE=/root/restic-remote-b2.pass
######## END CONFIG ########

echo "Starting backup process at $(date '+%Y-%m-%d %H:%M:%S')."

# load b2 config
source $B2_CONFIG

# pre-backup check
echo "Performing pre-backup check..."
restic $GLOBAL_FLAGS check

# perform backups
echo "Performing backups..."
restic $GLOBAL_FLAGS backup /storage/data/nextcloud
restic $GLOBAL_FLAGS backup /storage/data/backup
restic $GLOBAL_FLAGS backup /storage/music
# This costs too much to back up, but it's not the end of the world
# if I lose a load of DVD rips!
#restic $GLOBAL_FLAGS backup /storage/media

# post-backup check
echo "Performing post backup check..."
restic $GLOBAL_FLAGS check

# clean up old snapshots
echo "Cleaning up old snapshots..."
restic $GLOBAL_FLAGS forget -w 8 -m 12 -y 2 --prune

# final check
echo "Performing final backup check..."
restic $GLOBAL_FLAGS check

echo "Backups completed at $(date '+%Y-%m-%d %H:%M:%S')."
exit 0

This looks very similar to the previous script, but differs in the configuration. First I specify the file where I keep my B2 credentials, to be sourced later. This file is of the form:

export B2_ACCOUNT_ID="<MY ACCOUNT ID>"
export B2_ACCOUNT_KEY="<MY ACCOUNT KEY>"

I then set the RESTIC_REPOSITORY and RESTIC_PASSWORD_FILE variables as before. In this case the repository is of the form b2:bucketname:path/to/repo.
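
For example (the bucket and path names here are made up), that ends up looking something like:

export RESTIC_REPOSITORY="b2:my-backup-bucket:restic/storage"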

The snapshot retention policy here is different, with 8 weekly backups, 12 monthly backups and 2 yearly backups retained. This is mostly because I only run this backup once per week – the backup script will actually take more than 24 hours to run with all the checking and forgetting thrown in! The script is run from cron with the same nice/ionice combination as the local backup.

Conclusion

With all that in place, I have a pretty comprehensive backup system. The system stores at least 3 copies of any data (live, local backup, remote backup) and, in the case of remote systems, 4 copies (live, backup server cache, local backup, remote backup). The main issue I have with this setup currently is the use of the local external disk, which I’d prefer not to have connected to the same server. Hopefully I’ll be moving this to another machine in my next round of server upgrades.

I also don’t really like the reliance on the cloud, even though I’ve got no complaints about the B2 service. My ideal setup would probably be an SBC-based system located at the home of someone with a fast, non-data-capped internet connection. Ideally this person would also live on a different continent! I could then run a Minio server in place of the B2 service. This would probably end up cheaper in the long run, since I’m paying nearly $10/month for the current service.

One last piece of sage advice: TEST YOUR BACKUPS! They are worth nothing if you don’t know they are working. I’ve done a couple of test restores with this system, but I’m probably due for another one.

What’s your backup routine like? Got improvements for my system? Feel free to share them in the comments.


Automating My Infrastructure with Ansible and Gitlab CI: Part 3 – SSH Keys and Dotfiles


This is the third post in a series; you can find the first two installments here and here.

Having recently reinstalled on both of my client machines, I took the opportunity to rotate my SSH keys. Luckily I backed up the old keys before doing this, so I didn’t lock myself out of anything. However, it did leave me having to update the authorized_keys files on all my servers (about 15 at last count). Of course there is a better way than doing this all manually, so cue some Ansible automation!

While I was at it I decided it would be nice to deploy my dotfiles across all my machines. I’ve had them stored in a git repo for some time and manage them with GNU Stow. However, I would never get around to deploying the repo onto new machines and installing all the relevant tools. Writing the Ansible automation to do this was pretty tricky, but I got there in the end. I also added my client machines to my Ansible inventory so that they get the same setup deployed to them.

Getting Started

For those who haven’t read the previous installments in this series, all the code for this article is going in my main ansible-infrastructure repo on GitLab.

I started out by installing some base packages that I would need for the rest of the steps. This is complicated slightly by having different distributions on different machines. The servers are all running Ubuntu or Debian (usually in the form of Raspbian or Proxmox), whilst the clients are running Manjaro (i.e. Archlinux). This is easily dealt with in Ansible by way of a set of checks against ansible_distribution:

- hosts: all
  tasks:
    - name: Install common apt packages
      become: true
      when: ansible_distribution == 'Debian' or ansible_distribution == 'Ubuntu'
      apt:
        pkg:
          - vim
          - git
          - tmux
          - htop
          - dnsutils
          - ack-grep
          - stow
          - zsh
          - build-essential
          - python-dev
          - python3-dev
          - cmake
          - curl

    - name: Install common pacman packages
      become: true
      when: ansible_distribution == 'Archlinux'
      pacman:
        name:
          - vim
          - git
          - tmux
          - htop
          - bind-tools
          - ack
          - stow
          - zsh
          - cmake
          - clang
          - curl

You’ll see here that some of the packages are named differently in the different distros. We also need to use different package management modules for each.

It should be noted that I didn’t start out with this full list, but instead just added a few basics (e.g. git, vim, etc) and added more as I encountered the need for them. You’ll also notice that I take the opportunity to install any nice utilities that I like to have everywhere, such as htop and dig (provided by dnsutils/bind-tools).

Deploying SSH Keys

Deploying the SSH keys turned out to be fairly trivial, despite being the main task that I wanted to accomplish here. This is thanks to the excellent authorized_key module in Ansible:

- name: Set up authorized keys
  become: true
  authorized_key:
    user: '{{ admin_user }}'
    state: present
    key: '{{ item }}'
  with_file:
    - public_keys/aragorn.pub
    - public_keys/arathorn.pub
    - public_keys/work.pub
    - public_keys/phone.pub

Here I add a set of four keys from the repository, by way of the with_file clause in Ansible. I copied all the public keys into the playbooks/files/public_keys directory for ease of access. This also makes it easy to rotate keys as we’ll see below.

I set the user to add the keys to via a custom variable called admin_user. This variable is set to a default value at the top of my inventory file and then overridden for certain hosts or groups. For example, I use the standard pi user on my Raspberry Pis, so the variable is set to pi for the rpis group. This ensures that the keys always get installed for the right user.
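
As a sketch (the default user name here is hypothetical; pi and rpis are as described above), the relevant parts of an INI-style inventory would look something like this:

[all:vars]
admin_user=myadmin

[rpis:vars]
admin_user=pi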

I also wanted to remove the old keys from my machines, which is pretty straightforward:

- name: Remove old authorized keys
  become: true
  authorized_key:
    user: '{{ admin_user }}'
    state: absent
    key: '{{ item }}'
  with_file:
    - public_keys/riker.pub.deprecated

Now if I want to rotate keys in future, I’ll just add the new key to the repository, rename the old key to remind myself that it’s no longer in use and update the file lists in these tasks. Done!

A Minor Detour

I haven’t mentioned so far in this article that all of this is running through the GitLab CI pipeline I built in my original post. In fact, this is most likely going to be a GitLab CI article without much actual GitLab CI content. That’s because the previous pipeline has been working brilliantly.

However, one issue has been the speed. Making changes, committing, pushing and waiting for the pipeline to complete takes quite a long time. It was pretty frustrating given the number of iterations I needed to get this right!

I noticed that it took quite a while each time to install Ansible and Ansible Lint in the containers and that this was done twice for each pipeline. Given my recent success with custom Docker images, I built a quick image containing the tools I needed (with a 3 line Dockerfile!). I was able to quickly copy over the previous docker build pipeline and get this building via CI. All I then had to do was update the images used in my main pipeline and remove the old installation commands. Boom, much faster!
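
I haven’t reproduced the actual Dockerfile here, but a minimal sketch of such an image (the base image choice is a guess on my part) would be something like:

FROM python:3-slim
RUN pip install --no-cache-dir ansible ansible-lint
CMD ["/bin/sh"]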

You can check out my Ansible image on GitLab and pull it with the command:

$ docker pull registry.gitlab.com/robconnolly/docker-ansible

I haven’t set up a periodic build for this yet, but I’m intending that this image will be automatically updated on a weekly basis.

Deploying My Dotfiles

My dotfiles are deployed from a private repository on my internal Gitea instance. So far I haven’t published them as they contain quite a few unredacted details of my network. In order to deploy them I generated a new SSH key and added it as a deploy key to the project in Gitea.

[Screenshot: deploy keys in Gitea are added in the project Settings->Deploy Keys]

I then encrypted the private key with Ansible vault (I added the public key to the repo too, in case I need it again in future):

$ ansible-vault encrypt playbooks/files/dotfiles_deploy_key

I then copy the private key to each of the machines which need it:

- hosts: all,!cloud,!rpis
  serial: 2
  tasks:
    - name: Copy deploy key
      become: true
      become_user: "{{ admin_user }}"
      copy:
        src: dotfiles_deploy_key
        dest: "/home/{{ admin_user }}/.dotfiles_deploy_key"
        owner: "{{ admin_user }}"
        group: "{{ admin_user }}"
        mode: 0600

You’ll note that the above is in a new play, separate from the previous steps. That’s because I wanted to restrict which machines get my dotfiles. The cloud machines currently can’t access my Gitea instance, since I still need to deploy my OpenVPN setup to some of them. The Raspberry Pis have trouble with some of the later steps in the setup, so I’ve skipped them too for now. I’m also running this two hosts at a time, because of the compilation step (see below).

The next step is to simply clone the repository with the Ansible git module:

- name: Clone dotfiles repo
  become: true
  become_user: "{{ admin_user }}"
  git:
    repo: "{{ dotfiles_repo }}"
    dest: "/home/{{ admin_user }}/dotfiles"
    version: master
    accept_hostkey: true
    ssh_opts: "-i /home/{{ admin_user }}/.dotfiles_deploy_key"

The dotfiles_repo variable holds the URL to clone the repository from and is again defined in my encrypted inventory file. I use the ssh_opts clause to set the key for git to use.

You’ll note that the tasks above all use become_user to switch to the admin_user. In order to get this to work on some of my hosts I had to set allow_world_readable_tmpfiles to true. This has some security implications, so you might want to tread carefully, if you have potentially untrustworthy users on your systems. It seemed to work without this set on Ubuntu based systems, but those with a pure Debian base had issues.
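
For reference, that option goes in ansible.cfg (at least on the Ansible versions I’m running, it lives under the defaults section):

[defaults]
allow_world_readable_tmpfiles = true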

Running Stow

The next step was to unstow a few of the newly deployed files. For this we can use the command module and with_items:

- name: Unstow dotfiles
  become: true
  become_user: "{{ admin_user }}"
  command:
    chdir: "/home/{{ admin_user }}/dotfiles"
    cmd: "stow {{ item.name }}"
    creates: "{{ item.file }}"
  with_items:
    - { name: ssh, file: "/home/{{ admin_user }}/.ssh/config"}
    - { name: vim, file: "/home/{{ admin_user }}/.vimrc"}
    - { name: tmux, file: "/home/{{ admin_user }}/.tmux.conf"}

In the items list here I pass a dictionary of names and filenames. This is so that Ansible can use one of the files which should be created to know if it needs to run the command. These are accessed with the item.variable notation in the templates. I really like the templating in Ansible!

Dealing With Vim Plugins

My Vim config contains a load of plugins, which I manage with Vundle. Installing these should be pretty simple, but it confused me for ages because the command always seemed to fail (with no output!), even when I could see it working on the command line. As it turns out, the command will exit with a return code of one even when it is successful! You can see why I was confused! In the end I came up with:

- name: Install vundle
  become: true
  become_user: "{{ admin_user }}"
  git:
    repo: https://github.com/VundleVim/Vundle.vim.git
    dest: "/home/{{ admin_user }}/.vim/bundle/Vundle.vim"
    version: master

- name: Install vim plugins
  become: true
  become_user: "{{ admin_user }}"
  shell:
    cmd: 'vim -E -s -c "source /home/{{ admin_user }}/.vimrc" -c PluginInstall -c qa || touch /home/{{ admin_user }}/.vim/plugins_installed'
    creates: "/home/{{ admin_user }}/.vim/plugins_installed"

I ended up using the shell module from Ansible to create a file when the installation completes. This file is used as the check in Ansible for whether it should run the command again. The || operator has to be used here (rather than &&) due to the weird return code. This does however have the effect of changing the overall return code to zero, which makes Ansible happy.

The final step here is compiling the YouCompleteMe plugin, which is just running another command:

- name: Build ycm_core
  become: true
  become_user: "{{ admin_user }}"
  command:
    cmd: "./install.py --clang-completer {% if ansible_distribution == 'Archlinux' %}--system-libclang{% endif %}"
    chdir: "/home/{{ admin_user }}/.vim/bundle/YouCompleteMe"
    creates: "/home/{{ admin_user }}/.vim/bundle/YouCompleteMe/third_party/ycmd/ycm_core.so"
  environment:
    YCM_CORES: 1

You’ll see above that the command is different on Arch based systems, since I use the system libclang there to work around a compile issue. I also define the YCM_CORES environment variable. This limits the number of cores to one, which seems to stop the build running out of memory on small virtual machines!

Deploying Oh-My-Zsh

The final piece to this increasingly complex puzzle is installing Oh-My-Zsh to give me a nice Zsh environment. This is again accomplished with the shell module:

- name: Install oh-my-zsh
  become: true
  become_user: "{{ admin_user }}"
  shell:
    cmd: sh -c "$(curl -fsSL https://raw.githubusercontent.com/ohmyzsh/ohmyzsh/master/tools/install.sh)" "" --unattended
    creates: "/home/{{ admin_user }}/.oh-my-zsh"
  register: ohmyzsh

You’ll see here that I register a variable containing the status of this task. This is used in the next step to delete the default zshrc that the installer will create for us:

- name: Remove default zshrc
  become: true
  become_user: "{{ admin_user }}"
  file:
    name: "/home/{{ admin_user }}/.zshrc"
    state: absent
  when: ohmyzsh.changed

I then unstow my Zsh config as before:

- name: Unstow zsh config
  become: true
  become_user: "{{ admin_user }}"
  command:
    chdir: "/home/{{ admin_user }}/dotfiles"
    cmd: "stow zsh"
    creates: "/home/{{ admin_user }}/.zshrc"

This process is probably ripe for simplification, since I assume the installer wouldn’t overwrite an existing zshrc. If I unstowed the Zsh config earlier I could probably remove the file deletion, but I haven’t tried this to see if it works.

The absolute last step, is to switch the default shell for admin_user over to Zsh:

- name: Change shell to zsh
  become: true
  user:
    name: "{{ admin_user }}"
    shell: /usr/bin/zsh

Done!

Conclusion

Phew! That comes out as a pretty epic playbook. I’ve opted to keep this all in my common playbook for now since it’s getting run against every machine, along with my previous roles. I may split it up later however, if it becomes useful to do so.

The playbook works really well now and it’s nice to have the same environment on every machine. I also really like the centralised SSH key management, which solves a real issue for me.

One improvement I would like to make would be around the syncing of changes to the dotfiles repository to all the machines. This could be as simple as deploying a cron job to git pull that repository periodically, but I’d rather have it react to changes in the repo. I could move the repository to GitLab and run a pipeline which would deploy it, but this would mean duplicating my Ansible inventory (and keeping two copies up to date). I’m wondering if a webhook could be used to trigger the main CI pipeline?

I’m interested to know if anyone out there has solved similar problems in a different way. Please let me know in the comments!


Remote Workstation with x2go and Manjaro Linux


I’ve been running a very under-powered and increasingly ancient laptop, a ThinkPad X131e, for several years. It’s been upgraded over time with an SSD and a replacement battery. For my requirements this is mostly OK, since my workload mostly consists of a web browser and terminal windows and it runs my preferred desktop (KDE Plasma Desktop) just fine. Since my server performs the majority of my computing I haven’t really been limited by this. However, every now and then I need to run a couple of apps which just bring it to a halt as it thrashes around trying to swap. Clearly 4GB of RAM is not enough to run modern applications.

I actually do have another desktop computer, but my use of it recently has been limited to the odd DVD rip. This was mainly due to having to switch the mouse, keyboard and monitor over from my work machine in order to use it in the limited space which is my home office. I could have probably remedied this with a KVM switch, but somehow I never got around to it. I was also not enthused about spending my evenings sitting in the same office I’ve just been working in all day. So that computer sat gathering dust for the most part.

That was until I came across a post about using x2go to create a remote workstation after musing whether this would be possible. I decided to do the same, since I had the hardware already available to make it work.

A (Semi) Failed Experiment

Initially I did a few tests with local VMs to see if x2go was going to work with KDE, since I’d seen mixed reports about this. The good news is that it works pretty well (at least for basic remote desktop, I’ll come to some of the problems below). The bad news is that my preferred desktop distro – KDE Neon – didn’t work well. First of all I couldn’t install the client on my laptop due to a dependency issue in APT. Secondly, although the desktop worked fine I was unable to suspend the session due to some Systemd/D-BUS issue. So I tried another distro. I’d heard good things about Manjaro and the KDE edition works great with x2go (both client and server). It also has a really nice default theme!

I also wanted the remote desktop to run as a VM on the machine, under Proxmox, so that I could potentially switch distros easily or create extra VMs for other purposes. I spent quite a bit of time configuring this only to find a few issues. The first was that I couldn’t get the host machine to pass through the internal DVD drive to the VM, which was a deal breaker. The second was that suspending the VM and shutting down the host was pretty clunky and prone to just hang for no reason. As I had added this machine to a cluster with my other host, the cluster would also lose quorum when the host went down. This caused lots of things to fail, including VM backups on the remaining host.

Back to Bare Metal

I decided to abandon the VM approach for now and go with a bare metal install to see if I can work with the remote desktop system. It wasn’t a complete loss, since I got a chance to try out the newly updated clustering in Proxmox, which will be relevant when I convert my existing Ubuntu server over.

The bare metal install of Manjaro was pretty boring (which is a good thing: installing a Linux distro should be boring and stable!). One thing I noticed was that I wasn’t able to manually set up LVM from the GUI installer. I could create volume groups, but it wouldn’t let me add partitions to them! As far as I understand, Manjaro Architect lets you do this. I played with this a bit later when installing Manjaro on my laptop, but opted to go with the default encrypted system option from the GUI installer. If I ever come to reinstall on the desktop machine, I’ll look at Manjaro Architect further.

Saving Power

Since this machine won’t be in use all the time, I wanted to shut it down and power it up remotely. I also wanted my x2go session to be persistent so I could pick up where I left off. For this reason I opted to use hibernate coupled with wake-on-LAN.

Configuring the wake-on-LAN took a little while. Even after enabling it in the BIOS and configuring NetworkManager, it still didn’t work. It turned out that it was being disabled by TLP. After fixing that it worked fine.
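
For anyone hitting the same issue, the relevant setting is TLP’s WOL_DISABLE option; something like the following in the TLP configuration (/etc/tlp.conf on recent versions, /etc/default/tlp on older ones) keeps wake-on-LAN enabled:

# Don't let TLP disable wake-on-LAN
WOL_DISABLE=N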

The next problem was that I wanted to power the system up and down via a switch in Home Assistant. This was difficult as the Home Automation system is on a different subnet to the machine in question. I opted in the end to run the WOL command from my pfSense firewall over SSH. The following HASS configuration gave me the switch I was looking for:

shell_command:
  desktop_power_on: "ssh -i /config/id_rsa -o 'StrictHostKeyChecking=no' home-assistant@<my_firewall> -- '-i <subnet broadcast address> <mac address>'"
  desktop_power_off: "ssh -i /config/id_rsa -o 'StrictHostKeyChecking=no' <user>@<my_desktop>"

binary_sensor:
  - platform: ping
    name: "Desktop Computer State"
    host: <desktop ip>
    count: 1
    scan_interval: 5

switch:
  - platform: template
    switches:
      desktop_computer_power:
        value_template: "{{ is_state('binary_sensor.desktop_computer_state', 'on') }}"
        turn_on:
          service: shell_command.desktop_power_on
        turn_off:
          service: shell_command.desktop_power_off

This requires a bit of setup on both the firewall and the desktop machines. First I created a user in pfSense with the relevant permissions to log in via SSH. I then added the following to the authorised SSH keys field:

command="wol $SSH_ORIGINAL_COMMAND",no-port-forwarding,no-x11-forwarding,no-agent-forwarding ssh-rsa .....

This basically allows the SSH key to only run the wol command and to pass through the original command as arguments to it. You’ll note that in the power on command above, only the arguments are specified. This means that I can send WOL packets to any machine on the network, but the key can’t do anything else.

There is a similar bit of configuration on the desktop machine:

command="systemctl hibernate",no-port-forwarding,no-x11-forwarding,no-agent-forwarding ssh-rsa ....

This restricts the key to only running the systemctl hibernate command.

With that in place I have a nice switch in my HASS GUI to power up and down the machine. I can also automate it to power up and down under certain conditions if I wish.
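
For example, a hypothetical automation to power the machine up on weekday mornings might look something like this (the trigger time is purely illustrative):

automation:
  - alias: Power up desktop on weekday mornings
    trigger:
      - platform: time
        at: '08:00:00'
    condition:
      - condition: time
        weekday:
          - mon
          - tue
          - wed
          - thu
          - fri
    action:
      - service: switch.turn_on
        entity_id: switch.desktop_computer_power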

[Screenshot: the resulting power switch in HASS]

Setting Up x2go

Since Manjaro is based on Arch Linux, I just installed the x2goserver package on my desktop and the x2goclient package on my laptop.

x2go is fairly trivial to set up, requiring only that X11 forwarding be enabled in the SSH daemon on the server side; just follow the instructions to do this.
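
For completeness, that amounts to setting the following in /etc/ssh/sshd_config on the server and then restarting sshd (sudo systemctl restart sshd on Manjaro):

X11Forwarding yes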

[Screenshot: my x2go session preferences]

In order to connect to a KDE desktop running on the server, we need to set up a profile in the x2go client. The main thing here is to set the session type to “Custom desktop” with the command startplasma-x11 (the KDE session type doesn’t work for some reason). Obviously you also need to set the address of the machine to log in to. I also found that I needed to set the path to my SSH key in order to have it be used by the client.

The Finished Product

The final product is pretty nice. I can remotely boot my desktop machine from my laptop or phone. I then connect with x2go from my laptop and get to work! My previous session will be restored if there is one, meaning I can just pick up where I left off. The connection to the remote system is excellent, with no noticeable delay. So far I’ve only tested this over my local wifi, but that will be 97% of its usage anyway. There is some tearing when moving windows and some graphical issues when the x2go session window gets resized, but these resolve themselves after a few seconds.

[Screenshot: my full remote desktop in all its glory!]

One wrinkle is shared folders, which are supposed to be supported by x2go, but currently seem to be broken. I get an error message similar to that described here when I try to mount one. Apparently that bug is fixed, but I guess the version containing the fix hasn’t been released yet. If the new version doesn’t land soon I’ll probably try to work around it with SSHFS and some scripting. For now it’s not too much of an issue; all my files are now on the desktop anyway!

I also don’t see any sound devices on the remote system, but I haven’t tried playing sound to see if it’s working. I tend not to listen to music or watch movies on my laptop anyway and I can still do this locally if need be.

Conclusion

Overall, I’m pretty happy with this setup. It’s not perfect, but it is nice to have easier access to my other machine. I still have a lot of setup to do at both ends to make this work better and to make the remote machine feel like home, but it’s getting there.

I’m not sure if I’ll run this setup full time yet or just for certain applications. I’ve decided to upgrade the RAM in both systems as well as my server, so we’ll see how the balance of local vs remote falls out after doing that! I’m also eyeing a Pinebook Pro as a potential replacement laptop.

I really like Manjaro as a distro. It’s not quite as polished a KDE experience as KDE Neon is, but it’s getting there. I like the Arch base – I’ve been a fan of Arch since my very first experience with it (CAUTION: very old post!), but I just couldn’t get on with it as a main driver. Manjaro gives you a nice experience out of the box with the power to tweak as much as you like.

I’ll be sure to update on any further progress with fixing some of the issues I’ve encountered with this project. In the meantime, if anyone has solutions/workarounds to the shared folder issue, please leave a comment. Bye for now!


Integrating Remote Servers Into My Local Network


I’ve been using Linode for many years to host what I consider to be my most “production grade” self-hosted services, namely this blog and my mail server. My initial Linode server was built in 2011 on CentOS 6. This is approaching end of life, so I’ve been starting to build its replacement. Since originally building this server my home network has grown up and now provides a myriad of services. When starting out to build the new server, I thought it would be nice to be able to make use of these more easily from my remote servers. So I’ve begun some work to integrate the two networks more closely.

Integration Points

There are a few integration points I’m targeting here, some of which I’ve done already and others are still to be done:

  • Get everything onto the same network, or at least on different subnets of my main network so I can control traffic between networks via my pfSense firewall. This gives me the major benefit of being able to access selected services on my local network from the cloud without having to make that service externally accessible. In order to do this securely you want to make sure the connection is encrypted – i.e. you want a VPN. I’m using OpenVPN.
  • Use ZFS snapshots for backing up the data on the remote systems. I’d previously been using plain old rsync for copying the data down locally where it gets rolled into my main backups using restic. There is nothing wrong with this approach, but using ZFS snapshots gives more flexibility for restoring back to a certain point without having to extract the whole backup.
  • Integrate into my check_mk setup for monitoring. I’m already doing this and deploying the agent via Ansible/CI. However, now the agent connection will go via the VPN tunnel (it’s SSH anyway, so this doesn’t make a huge difference).
  • Deploy the configuration to everything with Ansible and Gitlab CI – I’m still working on this!
  • Build a centralised logging server and send all my logs to it. This will be a big win, but sits squarely in the to-do column. However, it will benefit from the presence of the VPN connection, since the syslog protocol isn’t really suitable for running over the big-bad Internet.

Setting Up OpenVPN

I’m setting this up with the server being my local pfSense firewall and the clients being the remote cloud machines. This is somewhat the reverse to what you’d expect, since the remote machines have static IPs. My local IP is dynamic, but DuckDNS does a great job of not making this a problem.

The server setup is simplified somewhat due to using pfSense with the OpenVPN Client Export package. I’m not going to run through the full server setup here – the official documentation does a much better job. One thing worth noting is that I set this up as the second OpenVPN server running on my pfSense box. This is important as I want these clients to be on a different IP range, so that I can firewall them well. The first VPN server runs my remote access VPN which has unrestricted access, just as if I were present on my LAN.

In order to create the second server, I just had to select a different UDP port and set the IP range I wanted in the wizard. It should also be noted that the VPN configuration is set up not to route any traffic through it by default. This prevents all the traffic from the remote server trying to go via my local network.

On the client side, I’m using the standard OpenVPN package from the Ubuntu repositories:

$ sudo apt install openvpn

After that you can extract the configuration zip file from the server and test with OpenVPN in your terminal:

$ unzip <your_config>.zip
$ cd <your_config>
$ sudo openvpn --config <your_config>.ovpn

After a few seconds you should see the client connect and should be able to ping the VPN address of the remote server from your local network.

Always On VPN Connection

To make this configuration persistent we first move the files into /etc/openvpn/client, renaming the config file to give it the .conf extension:

$ sudo mv <your_config>.key /etc/openvpn/client/client.key
$ sudo mv <your_config>.p12 /etc/openvpn/client/client.p12
$ sudo mv <your_config>.ovpn /etc/openvpn/client/client.conf

You’ll want to update the pkcs12 and tls-auth lines to point to the new .p12 and .key files. I used full paths here just to make sure it would work later. I also added a route to my local network in the client config:

route 10.0.0.0 255.255.0.0
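
Putting that together, the relevant lines of my client.conf ended up looking roughly like this (the tls-auth key direction should match whatever your exported config already uses; 1 is typical for clients):

pkcs12 /etc/openvpn/client/client.p12
tls-auth /etc/openvpn/client/client.key 1
route 10.0.0.0 255.255.0.0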

You should then be able to activate the OpenVPN client service via systemctl:

$ sudo systemctl start openvpn-client@client.service
$ sudo systemctl enable openvpn-client@client.service

If you check your system logs, you should see the connection come up again. It’ll now persist across reboots and should also reconnect if the connection goes down for any reason. So far it’s been 100% stable for me.

At this point I added a DNS entry on my pfSense box to allow me to access the remote machine via its hostname from my local network. This isn’t required, but it’s quite nice to have. The entry points to the VPN address of the machine, so all traffic will go via the tunnel.

Firewall Configuration

Since these servers have publicly available services running on them, I don’t want them to have unrestricted access to my local network. Therefore, I’m blocking all incoming traffic from the new VPN’s IP range in pfSense. I’ll then add specific exceptions for the services I want them to access. This is pretty much how you would set up a standard DMZ.

[Screenshot: the firewall rules for the OpenVPN interface; note the SSH rule to allow traffic for our ZFS snapshot sync later]

To do this I added an alias for the IP range in question and then added a block rule on the OpenVPN firewall tab in pfSense. This is slightly different to the way my DMZ is set up, since I don’t want to block all traffic on the OpenVPN interface, just traffic from that specific IP range (to allow my remote access VPN to continue working!).

You’ll probably also want to configure the remote server to accept traffic from the VPN so that you can access any services on the server from your local network. Do this with whatever Linux firewall tool you usually use (I use ufw).
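
In my case (the VPN subnet here is made up) that’s a one-liner with ufw:

$ sudo ufw allow from 10.8.2.0/24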

Storing Data on ZFS

And now for something completely different….

As discussed before, I was previously backing up the data on these servers with rsync. However, I was missing the snapshotting I get on my local systems. These local systems mount their data directories via NFS to my main home server, which then takes care of the snapshot duties. I didn’t want to use NFS over the VPN connection for performance reasons, so I opted for local snapshots and ZFS replication.

In order to mount a ZFS pool on our cloud VM we need a device to store our data on. I could add some block storage to my Linodes (and I may in future), but I can also use a loopback file in ZFS (and not have to pay for extra space). To do this I just created a 15G blank file and created the zpool on top of that:

$ sudo mkdir /zpool
$ sudo dd if=/dev/zero of=/zpool/storage bs=1G count=15
$ sudo apt install zfsutils-linux
$ sudo zpool create -m /storage storage /zpool/storage

I can then go about creating my datasets (one for the mail storage and one for docker volumes):

$ sudo zfs create storage/mail
$ sudo zfs create storage/docker-data

Automating ZFS Snapshots

To automate my snapshots I’m using Sanoid. To install it (and its companion tool Syncoid) I did the following:

$ sudo apt install pv lzop mbuffer libconfig-inifiles-perl git
$ git clone https://github.com/jimsalterjrs/sanoid
$ sudo mv sanoid /opt/
$ sudo chown -R root:root /opt/sanoid
$ sudo ln /opt/sanoid/sanoid /usr/sbin/
$ sudo ln /opt/sanoid/syncoid /usr/sbin/

Basically all we do here is install a few dependencies and then download Sanoid and install it in /opt. I then hard link the sanoid and syncoid executables into /usr/sbin so that they are on the path. We then need to copy over the default configuration:

$ sudo mkdir /etc/sanoid
$ sudo cp /opt/sanoid/sanoid.conf /etc/sanoid/sanoid.conf
$ sudo cp /opt/sanoid/sanoid.defaults.conf /etc/sanoid/sanoid.defaults.conf

I then edited the sanoid.conf file for my setup. My full configuration is shown below:

[storage/mail]
        use_template=production

[storage/docker-data]
        use_template=production
        recursive=yes

#############################
# templates below this line #
#############################

[template_production]
        frequently = 0
        hourly = 36
        daily = 30
        monthly = 12
        yearly = 2
        autosnap = yes
        autoprune = yes

This is pretty self-explanatory. Right now I’m keeping loads of snapshots; I’ll pare this down later if I start to run out of disk space. The storage/docker-data dataset has recursive snapshots enabled because I will most likely make each Docker volume its own dataset.

This is all capped off with a cron job in /etc/cron.d/zfs-snapshots:

*  *    * * *   root    TZ=UTC /usr/local/bin/log-output '/usr/sbin/sanoid --cron'

Since my rant a couple of weeks ago, I’ve been trying to assemble some better practices around cron jobs. The log-output script is one of these, from this excellent article.

Syncing the Snapshots Locally

The final part of the puzzle is using Sanoid’s companion tool Syncoid to sync these down to my local machine. This seems difficult to do in a secure way, due to the permissions that zfs receive needs. I tried to use delegated permissions, but it looks like the mount permission doesn’t work on Linux.

The best I could come up with was to add a new unprivileged user and allow it to only run the zfs command with sudo by adding the following via visudo:

syncoid ALL=(ALL) NOPASSWD:/sbin/zfs

I also set up an SSH key on the remote machine and added it to the syncoid user on my home server. Usually I would restrict the commands that could be run via this key for added security, but it looks like Syncoid does quite a bit, so I wasn’t sure how to go about this (if anyone has any ideas let me know).

With that in place we can test our synchronisation:

$ sudo syncoid -r storage/mail syncoid@<MY HOME SERVER>:storage/backup/mail
$ sudo syncoid -r storage/docker-data syncoid@<MY HOME SERVER>:storage/docker/`hostname`

For this to work you should make sure that the parent datasets are created on the receiving server, but not the destination datasets themselves; Syncoid likes to create them for you.
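
On my home server that meant creating just the parents beforehand, something like this (dataset names as used above):

$ sudo zfs create storage/backup
$ sudo zfs create storage/docker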

I then wrote a quick script to automate this which I dropped in /root/replicator.sh:

#!/bin/bash

USER=syncoid
HOST=<MY HOME SERVER>

HOSTNAME=$(hostname)

/usr/sbin/syncoid -r storage/mail $USER@$HOST:storage/backup/mail 2>&1
/usr/sbin/syncoid -r storage/docker-data $USER@$HOST:storage/docker/$HOSTNAME 2>&1

Then another cron job in /etc/cron.d/zfs-snapshots finishes the job:

56 *    * * *   root    /usr/local/bin/log-output '/root/replicator.sh'

Conclusion

Phew! There was quite a bit there. Thanks for sticking with me if you made it this far!

With this setup I’ve come a pretty long way towards my goal of better integrating my remote servers. So far I’ve only deployed this to a single server (which will become the new mailserver). There are a couple of others to go, so the next step will be to automate as much as possible of this via Ansible roles.

I hope you’ve enjoyed this journey with me. I’m interested to hear how others are integrating remote and local networks together. Let me know if you have anything to add via the feedback channels.


Keeping Baddies Away with Occupancy Simulation and Home Assistant


Home security is one of the more interesting and useful applications of home automation technologies. Of course it’s a whole field and industry in itself, with all manner of products and services available. The first part of any security system should be good physical security (good doors, windows and locks). Once you get past this it seems like you can go two ways: prevention or detection.

Most traditional systems are in the detection camp. They aim to detect an intruder and alert someone, either via the traditional loud noise or other means. The presence of an alarm box or sign may provide a minimal amount of prevention capability, but not much. Camera systems on the other hand provide both pretty robust prevention as well as (increasingly) advanced detection capabilities.

The third option is to ramp up the prevention angle by making it look like someone is home. It’s this that I’m going to tackle here, with an occupancy simulation application for Home Assistant called Occusim.

Note: I’m not a home security specialist and you should definitely consult a professional. This post is provided for informational purposes and should not be relied upon to provide a full security system. The author accepts no responsibility for any consequence of you relying on this information and/or not consulting someone who actually does this stuff for a living. YOU HAVE BEEN WARNED!

What is Occupancy Simulation?

Basically it’s pretending you are home, by switching on and off things as you would when you’re there. The reasoning behind it is that potential intruders are less likely to invade a house with people inside. I’m sure there are parts of the world where that doesn’t hold. However, I’d guess it applies in enough places to be useful.

Methods of occupancy simulation range from the very low tech “light on a timer” to the high tech approach we’re going to take here, using Home Assistant. The key is to make the sequence as close to the normal usage pattern as possible, so that anyone potentially watching your house can’t tell the difference.

Introducing OccuSim

OccuSim is a tool for defining complex occupancy simulation rules, for use with Home Assistant. These rules are defined as sequences which proceed from beginning to end (with potentially some randomness applied). In this way you can program in a full day’s operation. There is also a fully random mode, which works better for non-scheduled events. Both modes can be combined to give a good representation of your normal home operation.

OccuSim is actually an AppDaemon app, so you will need AppDaemon installed as a dependency. I’m not going to cover that now. You can either see the link above or read my previous post on the subject.

Once you have a functional AppDaemon install you can install OccuSim by cloning the repository into the apps subdirectory of your AppDaemon configuration. Rather than just a pure clone, I prefer to add it as a submodule in my AppDaemon configuration repository. This makes it easy to reinstall if you move your configuration around:

cd apps
git submodule add https://github.com/acockburn/occusim.git occusim
git commit -m "Add OccuSim"

With this setup, updating OccuSim is a little more involved. This is because you also need to commit the submodule change:

cd apps/occusim
git pull origin
cd ../..
git add apps/occusim
git commit -m "Update OccuSim"

Initial Configuration

OccuSim is configured as an app in your apps.yaml file. There is some initial config before you get down to creating your simulation sequences:

occupancy_simulator:
  class: OccuSim
  module: occusim
  log: '1'
  notify: '1'
  enable: input_boolean.vacation_mode,on
  test: '0'
  dump_times: '1'
  reset_time: '02:00:00'

Here I add a new app to my setup, called occupancy_simulator. We set the Python class to load as OccuSim from the module occusim, since we installed it in its own subdirectory. I set OccuSim to log its activity and to notify when a sequence step is activated. This will use the default notify.notify service in HASS, so you’d better have that set up to go to the right place. I haven’t found a way to change the notifier that it uses.

The enable setting takes an input_boolean entity and a state in which OccuSim should be active. Here I use my vacation mode toggle. I actuate mine manually, but it’s perfectly possible to set this from a HASS automation if you so desire.

The last few settings are some basic housekeeping. Test mode is disabled to ensure that OccuSim does its thing (though this can be useful when debugging your sequences). I also set the dump_times option to true so that I can see the times of the steps in the logs. I then set the time when I want to re-calculate the steps for the upcoming day. In my case this is set to 2am.

Setting up Sequences

A simple sequence might be the following:

step_evening_name: Evening
step_evening_start: 'sunset'
step_evening_on_1: scene.main_lights

This sequence simply turns on my main lights scene at sunset. The configuration variables take the form step_<id>_<variable> where <id> is a custom identifier that needs to be common across all variables for the step. Multiple entities can be turned on or off by appending numbers to the variables in question, e.g. step_evening_on_1, step_evening_on_2, etc.
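
For example, a single step can act on several entities at once (the step and entity names here are made up):

step_lightsout_name: Lights Out
step_lightsout_start: '23:00:00'
step_lightsout_on_1: light.bedroom
step_lightsout_off_1: light.lounge
step_lightsout_off_2: light.kitchen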

You can add an offset to the times for sunset or sunrise, such as sunset - 00:20:00 to trigger 20 minutes before sunset. You can of course also specify an absolute time in the form HH:MM:SS.

More Advanced Sequences

So far that’s great, but we could have just turned on a light at sunset with a basic HASS automation rule. The real power of OccuSim comes in randomising the sequence times (within bounds) and creating multi-step sequences.

Expanding on our previous example:

step_evening_name: Evening
step_evening_start: 'sunset - 00:40:00'
step_evening_end: 'sunset - 00:10:00'
step_evening_on_1: scene.main_lights

This will turn on the lights at a random time between 40 minutes before sunset and ten minutes before sunset.

Let’s now create a multi-step sequence:

step_movie1_name: Movie Scene
step_movie1_start: '20:00:00'
step_movie1_end: '20:30:00'
step_movie1_on_1: scene.movie

step_movie2_name: Movie Scene Pause
step_movie2_relative: Movie Scene
step_movie2_start_offset: '00:35:00'
step_movie2_end_offset: '00:45:00'
step_movie2_on_1: script.downlights_bright

step_movie3_name: Movie Scene Play
step_movie3_relative: Movie Scene Pause
step_movie3_start_offset: '00:03:00'
step_movie3_end_offset: '00:06:00'
step_movie3_on_1: scene.movie

Here I start a sequence that will execute my movie scene sometime between 8pm and 8.30pm. I then specify a second step which will emulate us pausing the movie and putting the kitchen lights on. This happens sometime between 35 and 45 minutes later and is relative to the previous step. This means that whatever time the previous step is executed the second step will always come 35-45 minutes after that. The third step in the sequence is another relative one. This will execute 3-6 minutes after the previous one and emulates us starting up the movie again.

Random Mode

As mentioned earlier, you can also create totally random events. For example you might use the following to simulate overnight bathroom trips:

random_bathroom_name: Overnight Bathroom
random_bathroom_start: Bedtime
random_bathroom_end: Morning
random_bathroom_minduration: 00:02:00
random_bathroom_maxduration: 00:05:00
random_bathroom_number: 2
random_bathroom_on_1: light.bathroom
random_bathroom_off_1: light.bathroom

step_bedtime_name: Bedtime
step_bedtime_start: '21:30:00'
step_bedtime_end: '22:30:00'

step_morning_name: Morning
step_morning_start: 'sunrise + 00:20:00'

This configuration creates 2 randomised events lasting 2-5 minutes, which turn on and off the bathroom light. These are defined as starting and ending relative to other steps in the sequence. I’ve defined the Bedtime and Morning steps in my example to illustrate this. These steps can of course contain actions of their own, just like the steps above. The full configuration basically says “sometime starting between 9.30pm and 10.30pm and lasting until 20 minutes after sunrise there will be two instances of the bathroom light coming on for 2-5 minutes”. Pretty cool!

Conclusion

OccuSim is a pretty powerful tool, which allows you to create complex occupancy simulation rules with Home Assistant. I’ve only really just begun to map out my own simulations. You can find these on GitLab. Some of the examples here are based upon my rules, but the real times have been changed. I’m going to continue expanding upon this in the coming weeks. Hopefully I’ll end up with a pretty accurate simulation.

I’m also exploring further home security options and will hopefully be expanding my existing ZoneMinder setup this year. Stay tuned to the blog for more info when that happens.
