
Your Personal Sysadmin


Kamal: What Else Would You Need?

Published: 2024-03-19

Since its initial release, Kamal (formerly mrsk) has emerged as the long-awaited facelift to Capistrano, while also making the use of Docker seem like a walk in the park. At its heart, Kamal taps into Docker, steering us away from the oftentimes bewildering saga of Ruby on Rails deployment. It transitions us from wrangling a mishmash of infrastructures, spanning development, testing, staging, and production across varied machines and operating systems with their own quirks in versions and package management, to the blissfully straightforward narrative of deploying with a single, neatly packaged container image.

At first glance, tools like Kamal are akin to a silver bullet for the solo developer, allowing for a breezy journey from code to production. This is crucial because it means you can focus on what truly matters: crafting solutions to business challenges with your code.

However, what often goes unmentioned, much like in discussions about cloud computing, is that it is all still running on someone else’s hardware. Beneath the surface lurks the ever-present need for handling ‘state, state, state’. And state is what comes after the deploy: starting from the state of the operating system and the network, all the way up to the state of the application data.

So, after you deploy and then redeploy, it’s time to ponder over your next moves:

  1. How do I maintain and update the operating system and essential services?
  2. How can I monitor the current state of the operating system, services, and my application?
  3. What strategies should I use for managing data storage, whether in short-term storage like Redis or long-term solutions like PostgreSQL and file systems?
  4. How do I manage state stored in third-party applications, and apply the above considerations to them?

Server Options

When choosing a server, you would probably go for a virtual unmanaged instance from a popular provider. They’re budget-friendly, and even the smaller offerings can comfortably support an app, a worker, a Redis instance, and a PostgreSQL instance. At the time of writing, Hetzner’s CX11, at 4.55 Euros, fits the bill nicely. For a tad more, at 5.22 Euros, you get a boost with an extra thread and double the storage. Over at DigitalOcean, a comparable setup will cost you about 18 USD per month. AWS, with its notoriously cryptic pricing, offers something similar at a ballpark cost. Linode’s in the mix too, with options that might set you back somewhere between 12 and 24 USD.

What do you mean, unmanaged?

Mind you, all of these units are unmanaged, which means all responsibility lies with you as the contract partner of the hosting service. A server left to its own devices can quickly turn into a digital wasteland. Public-facing services are like open invitations for anyone with mischief in mind. Once compromised, it’s essentially hot garbage: best to scrap the server and start fresh.

Even if it’s “just a side project” or “not a top priority,” vigilance is non-negotiable. So, here are some high-level tips on how I keep things under control.

Tackling State

To keep the operating system and essential services in tip-top shape, I use Ansible playbooks, categorized by initial setup, service roles, and specific configurations for my environments, often CapRover on Docker Swarm or vanilla Ubuntu Server.

I organize tasks around the server’s life cycle—think “system upgrades,” “Docker cleanup,” “backup preparations,” “user and SSH key updates,” and so on.
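As a rough sketch of what that looks like in practice, assuming a conventional layout with an inventory file, a site.yml, and tags named after those life-cycle tasks (all names here are placeholders, not my actual repository):

# run only the tasks tagged for a given life-cycle activity
ansible-playbook -i inventories/production site.yml --tags system-upgrade
ansible-playbook -i inventories/production site.yml --tags docker-cleanup,backup-prep
# limit a run to a single host while testing a change
ansible-playbook -i inventories/production site.yml --tags ssh-keys --limit web01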

For business processes, programmatic runbooks provide a blend of manual steps, scripts, and API requests (e.g. DNS providers, Git management), allowing for scalable and efficient project management. Their greatest strength lies in blending various layers of complexity: beginning with a straightforward, step-by-step to-do list, you can incrementally incorporate external command calls or integrate comprehensive scripts to streamline and automate processes.
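A minimal sketch of such a runbook, with made-up step names, hosts, and an illustrative API call: manual steps are prompts, automated steps are plain commands, and the script can grow from checklist to automation over time.

#!/usr/bin/env bash
# runbook sketch: pause for manual steps, automate the rest incrementally
set -euo pipefail
step() { read -rp ">> $1 (press enter when done) "; }
step "Announce the maintenance window in the team chat"
echo "Updating DNS via the provider's API (endpoint and payload are illustrative)"
# curl -X POST https://dns.example.test/api/records -d '{"name": "app", "type": "A", "value": "203.0.113.10"}'
step "Confirm the new record resolves from at least two networks"
echo "Runbook complete"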

Gaining Insights

Gaining insight into the performance of your operating system, services, and applications is a crucial maintenance activity. There’s no one-size-fits-all solution; it’s all about the scope of your project and your personal preferences.

For a comprehensive overview at various levels, starting with an application monitoring tool like Honeybadger is wise for tracking errors, uptime, and job performance. It’s an excellent first step. As you become more acquainted with your application, developing custom metrics endpoints for specific features can be a beneficial next move.
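Such a custom endpoint can be as simple as a route that reports one domain-specific number, which an external check then polls. The URL, threshold, and mail notification below are only stand-ins for whatever your app and alerting actually use:

# poll a hypothetical /metrics/queue_depth endpoint and complain when it grows too large
depth=$(curl -fsS https://app.example.test/metrics/queue_depth)
if [ "${depth:-0}" -gt 100 ]; then
  echo "queue depth is ${depth}, investigate" | mail -s "app alert" you@example.test
fi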

Diving deeper, the management of services and the operating system necessitates centralized, indexed logging capable of generating actionable metrics. For a solo host, beginning with journald to capture all system logs is a practical choice. Given the diversity of the Linux logging landscape, setting this up can be complex. Selecting software or services that offer more than basic stdout or logfile outputs is crucial, though integrating such logs with journald can add complexity. The ability to live-tail journald, viewing all log outputs in real-time, greatly aids in understanding how different services interact on a server. For environments beyond a single host, integrating tools like Promtail, Loki, and Grafana can address journald’s limitations, especially for developing alerting rules based on incident experiences to improve oversight.
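In practice, the live-tailing and filtering that make journald so useful on a single host look roughly like this (the unit names are just examples):

journalctl -f                                          # live-tail everything on the host
journalctl -f -u docker.service -u my-app.service      # follow only selected units
journalctl -p err --since "1 hour ago"                 # errors from the last hour across all units
journalctl -u postgresql.service --since today -o json # structured output for further processing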

Monitoring the actual health of your host is also vital. Hosting providers may not alert you to issues like maxed-out threads, choked I/O, or full disks. Tools like Netdata as a starter, and the trifecta of Node Exporter, Prometheus, and Alertmanager later on are invaluable for these deeper diagnostics. When selecting tools or third-party services, consider the type of data access they provide and the flexibility for configuring custom metrics and alerts.
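As a starting point, Netdata’s kickstart installer and a quick look at Node Exporter’s metrics give a feel for what these tools expose. The commands below follow the projects’ documented defaults at the time of writing, so treat them as a sketch rather than a recipe:

# Netdata one-line install (review downloaded scripts before running them)
wget -O /tmp/netdata-kickstart.sh https://get.netdata.cloud/kickstart.sh && sh /tmp/netdata-kickstart.sh
# once Node Exporter runs (default port 9100), spot-check disk and memory metrics by hand
curl -s http://localhost:9100/metrics | grep -E 'node_filesystem_avail_bytes|node_memory_MemAvailable_bytes'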

Backup Strategies

Many hosting providers offer daily snapshots of your system as part of their service, often costing you 10-20% of your monthly server fees. Generally, this seems adequate until a snapshot coincides with a crucial database transaction or file write, potentially leading to data inconsistencies or corruption. The process for creating snapshots isn’t always transparent, and the common advice to shut down your server for a manual snapshot highlights its limitations.

My approach prioritizes data over the operating system’s state, focusing on database content and important application files — like media or client products — which are irreplaceable and costly to recreate. For everything else, tools like Ansible can rebuild the system from scratch.

Restic in conjunction with Autorestic is my preferred backup solution. It supports a wide range of storage backends and secures backups with encryption, offering differential backups and the option for full backups at specific intervals. The downside is its recovery process, which requires navigating through Restic, but the trade-off for secure, manageable backups is worth it. Adhering to the 3-2-1 backup strategy (three copies of your data on two different media types, with one stored offsite) provides a robust safety net. Thus, combining provider snapshots with Restic backups across various platforms, like Backblaze B2 and Hetzner storage boxes, ensures comprehensive coverage. While daily backups are standard, timing them to capture the day’s significant work can further safeguard your data.
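At the command level, and leaving Autorestic’s config file aside, the Restic workflow looks roughly like this; repository names, paths, and credentials are placeholders:

export B2_ACCOUNT_ID=xxxx B2_ACCOUNT_KEY=xxxx RESTIC_PASSWORD=choose-a-strong-one
restic -r b2:my-bucket:server1 init                                   # one-time repository setup
# back up a database dump and application files (dump the DB first; copying live data files is not crash-safe)
restic -r b2:my-bucket:server1 backup /var/backups/postgres /srv/app/storage
restic -r b2:my-bucket:server1 snapshots                              # list what you have
restic -r b2:my-bucket:server1 restore latest --target /tmp/restore   # recovery goes through restic
restic -r b2:my-bucket:server1 forget --keep-daily 7 --keep-weekly 4 --prune   # retention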

Conclusion

Managing an application’s lifecycle is a complex task that can’t be fully covered in just one article. I plan to dive deeper into the topics discussed here, sharing more of my experiences and insights. This will allow me to explore these concepts in a more practical, detailed manner, moving beyond theoretical discussions to real-world applications.

Therefore, feel free to leverage Kamal for streamlining your deployment process, but remember, deployment is just one phase in the broader journey of an application’s life.


Moving lvm-thin volumes on proxmox between vm-s or ct-s

Published: 2021-11-05

Following this official howto

lvs shows you all volumes in their volume group (in my case ‘ssd’)

LV               VG  Attr       LSize    Pool        Data%  Meta%
data             pve twi-a-tz-- 32.12g               0.00   1.58
root             pve -wi-ao---- 16.75g
swap             pve -wi-ao---- 8.00g
guests           ssd twi-aotz-- <2.33t               74.93  45.51
vm-100-disk-0    ssd Vwi-a-tz-- 12.00g guests        72.69
vm-101-disk-0    ssd Vwi-a-tz-- 12.00g guests        85.22
vm-101-disk-1    ssd Vwi-a-tz-- 50.00g guests        99.95
vm-102-disk-0    ssd Vwi-a-tz-- 12.00g guests        97.57
vm-102-disk-1    ssd Vwi-a-tz-- 50.00g guests        64.54
vm-103-disk-0    ssd Vwi-a-tz-- 12.00g guests        74.37
vm-103-disk-1    ssd Vwi-a-tz-- 150.00g guests        52.42
vm-104-disk-0    ssd Vwi-a-tz-- 12.00g guests        90.74
vm-104-disk-1    ssd Vwi-a-tz-- 10.00g guests        95.27
vm-105-disk-0    ssd Vwi-a-tz-- 12.00g guests        55.79
vm-105-disk-1    ssd Vwi-a-tz-- 10.00g guests        32.89
vm-106-disk-0    ssd Vwi-a-tz-- 12.00g guests        77.78
vm-106-disk-1    ssd Vwi-a-tz-- 10.00g guests        99.82
vm-107-disk-0    ssd Vwi-a-tz-- 32.00g guests        0.00
vm-107-disk-1    ssd Vwi-a-tz-- 500.00g guests        95.41
vm-108-disk-0    ssd Vwi-aotz-- 8.00g guests        43.73
vm-109-disk-0    ssd Vwi-a-tz-- 12.00g guests        52.41
vm-109-disk-1    ssd Vwi-a-tz-- 50.00g guests        2.22
vm-110-disk-0    ssd Vwi-a-tz-- 12.00g guests        51.14
vm-110-disk-1    ssd Vwi-a-tz-- 50.00g guests        2.22
vm-111-disk-0    ssd Vwi-a-tz-- 12.00g guests        84.85
vm-111-disk-1    ssd Vwi-a-tz-- 100.00g guests        16.97
vm-112-disk-0    ssd Vwi-a-tz-- 8.00g guests        13.53
vm-113-disk-0    ssd Vwi-a-tz-- 8.00g guests        11.55
vm-114-disk-0    ssd Vwi-a-tz-- 16.00g guests        84.31
vm-115-disk-0    ssd Vwi-a-tz-- 16.00g guests        97.12
vm-116-disk-0    ssd Vwi-a-tz-- 8.00g guests        31.49
vm-117-cloudinit ssd Vwi-aotz-- 4.00m guests        50.00
vm-117-disk-0    ssd Vwi-aotz-- 10.00g guests        39.71
vm-117-disk-1    ssd Vwi-aotz-- 1000.00g guests        97.47

If the ID of the new CT or VM is not equal to the ID the volume was previously attached to, rename the volume, e.g.

lvrename ssd/vm-101-disk-1 ssd/vm-117-disk-2

This will make vm-101-disk-1 available as vm-117-disk-2; note that you have to increment the counter at the end of the name so it does not collide with an existing disk.

Then edit the config of the actual VM: take the line that describes the volume from the old /etc/pve/qemu-server/<vm id>.conf and move it to the new <vm id>.conf.

The tricky part was to run qm rescan afterwards, which fixed the syntax and made the volume appear in the web GUI, where I could finally attach it to the new VM.
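For illustration, the moved config line looks something like the following; the bus, index, and size are whatever the old config said (scsi1 and 50G are just examples matching the volume above):

# line copied from /etc/pve/qemu-server/101.conf into /etc/pve/qemu-server/117.conf, adjusted to the renamed volume
#   scsi1: ssd:vm-117-disk-2,size=50G
qm rescan   # re-syncs the configs with the actual volumes and makes the disk show up in the web GUI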

run openvpn in client mode automatically after linux boot

Published: 2021-01-15

Scenario: send out a Raspberry Pi Model B Rev 1, all set up with Raspberry Pi OS / Raspbian.

The hardware specs are nothing much, but the machine is reliable, even though apparently half the RAM chips are dead…

Install OpenVPN, then take the config file from the server you want to connect to (in my case an .ovpn file generated by PiVPN) and put it into the config folder `/etc/openvpn/`. If your VPN profile is password protected, just add a simple text file with the cleartext password and reference it in your VPN profile like so: askpass /etc/openvpn/passwordfilename
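Roughly, assuming a Debian-based system and a PiVPN-generated profile (file names here are examples; the packaged openvpn.service generally only picks up *.conf files in /etc/openvpn, hence the rename):

apt install openvpn
cp myclient.ovpn /etc/openvpn/myclient.conf     # give the profile a .conf extension so the service picks it up
echo 'my-cleartext-pass' > /etc/openvpn/passwordfile
chmod 600 /etc/openvpn/passwordfile
# and inside /etc/openvpn/myclient.conf:
#   askpass /etc/openvpn/passwordfile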

Make sure openvpn.service is enabled and started: systemctl enable openvpn && systemctl restart openvpn

That should be it; ip a should already show you the tunnel interface (usually tun0).

PS: For the routing, make sure that your router has a static route sending all traffic for the VPN subnet to the VPN server, but that really depends on your own network topology.

subtle changes in key format of key pairs generated with `ssh-keygen` on linux

Published: 2019-07-11

I just came across an unexpected SSH key subtlety you might want to consider while creating a Drone CI deployment pipeline using Drone’s Ansible plugin.

Part of the pipeline involves deploying code to a remote host via SSH. I generated a new key pair with ssh-keygen. This created a key in OpenSSH’s new format, starting with:

-----BEGIN OPENSSH PRIVATE KEY-----

Apparently Ansible does not like this format and erred out on the “Gathering facts” step with the message “Invalid key”. Googling that was not very successful, and I could not find that particular message in the Ansible source, until I eventually found an unrelated closed issue on GitHub which pointed me towards possible problems with key formats.

Eventually I generated a new key pair like so: ssh-keygen -m PEM, the -m option setting the key format. The key then had the starting line

-----BEGIN RSA PRIVATE KEY-----
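For reference, generating a PEM-formatted RSA key explicitly looks like this (file name, size, and comment are only examples):

ssh-keygen -m PEM -t rsa -b 4096 -f ~/.ssh/drone_deploy -C "drone deployment key"
head -1 ~/.ssh/drone_deploy    # should print: -----BEGIN RSA PRIVATE KEY-----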

As far as I understand, both keys are actually RSA keys, the latter’s PEM format being implied, whereas the former uses a newer OpenSSH format I was not previously aware of.

Earlier runs of ssh-keygen did produce keys in the PEM format. As I am running Arch Linux with OpenSSH_8.0p1 and OpenSSL 1.1.1c (28 May 2019), one of the rolling updates to my system probably brought along this unexpected change.

Hope that helps somebody.

Installing Ubuntu from a minimal image on a PC Engines APU2

Published: 2017-05-17

This is the company: PCengines

This is the device: APU2

nullmodem setup

using putty

Check which COM port is used; mine was set to ‘COM4’.

Get a USB-to-serial converter and install drivers. Some of those converters seem to have timing problems, but I did not encounter that.

I once tried the lowest baud rate, 9600, and that produced some nice screen carnival, but nothing really legible (the APU2’s serial console runs at 115200 baud by default).

prepping usb stick

Download the USB prep tool ‘TinyCore USB Installer’ and run it against your USB stick. I used an 8 GB stick; make sure it’s not the slowest one.

To try it out, you can now boot into TinyCore. Put the stick into the APU2’s USB port and boot up with the serial nullmodem cable connected and the PuTTY session open. A finished boot is indicated by an audible beep. This is a good way to check the serial connection, which you should have established in parallel.

If you want to keep the option of booting into TinyCore open, back up the syslinux.cfg from the USB’s root directory, as it will be overwritten by the package content we are now downloading.

Download the special Ubuntu package from PC Engines, unpack it, and move the three files into the USB root folder (/ or the drive’s root, depending on your system).

Now plug the USB stick into the APU2 and boot with the serial nullmodem cable connected and the PuTTY session open. You will see the setup menu, similar to this screenshot:

[Screenshot: installation setup wizard]

The terminal setup process seems daunting at first, but it is essentially analogous to the graphical Ubuntu installer. I found my way around by basically following the Easy Path(tm), accepting most of the installer’s suggestions and going step by step through the menu. On some of the sub-menus I was able to make educated changes, since I knew a few more details and had a good idea of where I wanted to go with this system, but that might not apply to you.

The one exception was the network configuration. Running the automatic network detection seemed to get the DHCP info, but when I dropped into the BusyBox ash shell environment (via the menu option “Execute a shell” in the main hierarchy at the beginning of the installation process), I had to run dhclient directly on the interface again. Checking via ip addr, I could verify that the values had indeed been applied, and could ping any public server. With exit I dropped back into the installation menu. On a later second setup run this problem did not occur again.
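From that shell, the whole detour was basically the following (the interface name and ping target are just examples):

dhclient eth0                   # request a lease directly on the interface
ip addr show eth0               # verify the address was actually applied
ping -c 3 archive.ubuntu.com    # confirm the installer can reach the outside world
exit                            # back to the installation menu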

I chose no automatic updates, as I can see the cron job using quite some resources. I’d rather schedule that manually for this particular system at the moment; part of my minimum-running-services policy for this instance.

I followed a tip regarding the bootloader installation, and it apparently solved my earlier problem of an unfinished installation. I lost the link, but it boiled down to manually entering the first partition of the setup target (a PCIe flash device in my case), so /dev/sdb1 as opposed to /dev/sdb. Again, this might be different for you.

Once that was done, and with a bit more patience, I rebooted and eventually a login via SSH could be established. I then halted the machine, physically unplugged the USB key and the console, and replugged the power.

After about 45 seconds ping answered, and after that SSH came back online.