Run a Command Transiently When a Systemd Service Exits

The Problem

I have a Systemd oneshot unit that runs a backup on a timer. Recently, I performed an update on that machine while the backup was running, and I wanted to reboot after the backup finished. In the past, I might have started a shell loop in a tmux session like this:

mainpid="$(systemctl show --property=MainPID backup-job.service |
    cut -d= -f2)"
while kill -0 "$mainpid"
do
    sleep 10
done; systemctl reboot

This has never failed for me, but it has a race: if the backup job completes during the sleep and its PID is reused before the next execution of kill, the loop keeps waiting on an unrelated process. I decided to investigate using a transient unit started with systemd-run instead.
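For comparison, the PID-reuse race can also be avoided inside the polling approach by asking Systemd for the unit's state rather than watching a PID. The following is only a minimal sketch: the function name wait_for_unit_exit is made up for illustration, and it assumes a systemctl recent enough to support show --value. A running oneshot reports the state "activating" rather than "active", so the loop treats everything except "inactive" and "failed" as still running.

```shell
#!/bin/sh
# Hypothetical helper: block until a unit has stopped, polling its
# ActiveState instead of a PID (so PID reuse cannot confuse the loop).
# Arguments: <unit name> [poll interval in seconds, default 10]
wait_for_unit_exit() {
    unit="$1"
    interval="${2:-10}"
    while :; do
        state="$(systemctl show --property=ActiveState --value "$unit")"
        case "$state" in
            # The unit has finished, successfully or not.
            inactive|failed) return 0 ;;
            # activating, active, deactivating, ...: keep waiting.
            *) sleep "$interval" ;;
        esac
    done
}

# Usage (commented out so that sourcing this file has no side effects):
# wait_for_unit_exit backup-job.service && systemctl reboot
```

This still polls, so it shares the loop's other drawback of needing a shell session that stays alive until the job ends.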

Running a Command After a Oneshot Exits

Since I didn’t want to accidentally reboot the system while testing, I created a new oneshot unit to use while developing this:

# /etc/systemd/system/test10s.service
[Unit]
Description=Test Service that runs for 10s and then succeeds

[Service]
Type=oneshot
ExecStart=/usr/bin/sleep 10

And then I tested using the After property of the transient unit to run a command when the test10s unit exited:

# systemctl start --no-block test10s
# systemd-run --no-block --property After=test10s.service sh -c 'date > /tmp/datelog'
Running as unit: run-r154afc9c334e467c9b38c2027e1821b4.service
# journalctl --since=-1min -u test10s.service -u run-r154afc9c334e467c9b38c2027e1821b4.service
Jun 11 18:56:27 localhost systemd[1]: Starting Test Service that runs for 10s and then succeeds...
Jun 11 18:56:37 localhost systemd[1]: test10s.service: Succeeded.
Jun 11 18:56:37 localhost systemd[1]: Finished Test Service that runs for 10s and then succeeds.
Jun 11 18:56:37 localhost systemd[1]: Started /usr/bin/sh -c date > /tmp/datelog.
Jun 11 18:56:37 localhost systemd[1]: run-r154afc9c334e467c9b38c2027e1821b4.service: Succeeded.
# cat /tmp/datelog
Tue 11 Jun 2024 06:56:37 PM UTC

This worked perfectly! I also tested this with a unit that exited with a failure and, as I expected from the Systemd documentation, the transient command was also executed in this case.

If you decide you don’t want the transient follow-up command to execute, you can stop the transient unit while it is still waiting, for example with systemctl stop run-r154afc9c334e467c9b38c2027e1821b4, and the command will never run.
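Canceling is easier if the transient unit has a name you chose yourself instead of a generated run-r… one. A minimal sketch, assuming systemd-run's --unit= option is used to set the name; the function name run_after_named and the unit name date-after-test.service are made up for illustration:

```shell
#!/bin/sh
# Hypothetical helper: start a transient follow-up unit under a
# caller-chosen name so it can be stopped later without copying the
# generated name from systemd-run's output.
# Arguments: <transient unit name> <unit to wait for> <command...>
run_after_named() {
    name="$1"
    after="$2"
    shift 2
    systemd-run --no-block --unit="$name" --property "After=$after" "$@"
}

# Usage (commented out so that sourcing this file has no side effects):
# run_after_named date-after-test.service test10s.service sh -c 'date > /tmp/datelog'
# systemctl stop date-after-test.service   # cancel before test10s exits
```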

Running a Command After a Non-Oneshot Exits

What if I had written my backup job as another service type, like simple or exec, or if for some reason the service had RemainAfterExit=true but would be stopped in some other way?

# /etc/systemd/system/simple10s.service
[Unit]
Description=Simple Service that runs for 10s and then succeeds

[Service]
Type=simple
ExecStart=/usr/bin/sleep 10

This is a trickier proposition. Generally, Systemd defines a unit as started once the main process has started (with various differences between the different Type= specifications); only oneshots are “started” once the main process exits. Thus, using After= would have the transient command executed basically immediately.

I tried a few ways of overcoming this, and none of them were perfect. The first that worked was:

# systemctl start simple10s
# systemd-run --property ExecStopPost="sh -c 'date > /tmp/datelog'" --property BindsTo=simple10s.service --property RemainAfterExit=true true
Running as unit: run-r2867d350197b4a0fb0c2a7791039e991.service
# journalctl --since=-1min -u simple10s.service -u run-r2867d350197b4a0fb0c2a7791039e991.service
-- Journal begins at Tue 2024-04-23 18:57:44 UTC, ends at Tue 2024-06-11 19:34:49 UTC. --
Jun 11 19:34:29 localhost systemd[1]: Started Simple Service that runs for 10s and then succeeds.
Jun 11 19:34:29 localhost systemd[1]: Started /usr/bin/true.
Jun 11 19:34:39 localhost systemd[1]: simple10s.service: Succeeded.
Jun 11 19:34:39 localhost systemd[1]: Stopping /usr/bin/true...
Jun 11 19:34:39 localhost systemd[1]: run-r2867d350197b4a0fb0c2a7791039e991.service: Succeeded.
Jun 11 19:34:39 localhost systemd[1]: Stopped /usr/bin/true.
# cat /tmp/datelog
Tue 11 Jun 2024 07:34:39 PM UTC

The BindsTo= property links the transient service to the main service so that when the main service exits, Systemd stops the transient service as well, which runs its ExecStopPost command.

The problem with this of course is that there is no way to cancel the transient service without running its command. Because the “real” command is in ExecStopPost, whatever way you try to stop it will still cause this to execute.

It is possible to allow the cancellation of the transient command, but it is a bit more verbose. If you let it run to completion:

# systemctl start simple10s
# systemd-run --property ExecStopPost="sh -c 'if test "\$SERVICE_RESULT" = success; then date >/tmp/datelog; fi'" --property BindsTo=simple10s.service sleep infinity
Running as unit: run-r3da04227980d47eeaf8d5299b8f14ca8.service
# journalctl --since=-1min -u simple10s -u run-r3da04227980d47eeaf8d5299b8f14ca8.service
-- Journal begins at Tue 2024-04-23 18:57:44 UTC, ends at Tue 2024-06-11 20:25:07 UTC. --
Jun 11 20:24:52 localhost systemd[1]: Started Simple Service that runs for 10s and then succeeds.
Jun 11 20:24:52 localhost systemd[1]: Started /usr/bin/sleep infinity.
Jun 11 20:25:02 localhost systemd[1]: simple10s.service: Succeeded.
Jun 11 20:25:02 localhost systemd[1]: Stopping /usr/bin/sleep infinity...
Jun 11 20:25:02 localhost systemd[1]: run-r3da04227980d47eeaf8d5299b8f14ca8.service: Succeeded.
Jun 11 20:25:02 localhost systemd[1]: Stopped /usr/bin/sleep infinity.
# cat /tmp/datelog
Tue 11 Jun 2024 08:25:02 PM UTC

And if you decide to kill it before the main service finishes:

# systemctl start simple10s
# systemd-run --property ExecStopPost="sh -c 'if test "\$SERVICE_RESULT" = success; then date >/tmp/datelog; fi'" --property BindsTo=simple10s.service sleep infinity
Running as unit: run-r231cd647fb0b4839b8ac10e93b9ead68.service
# systemctl kill --signal SIGKILL run-r231cd647fb0b4839b8ac10e93b9ead68.service
# journalctl --since=-1min -u simple10s -u run-r231cd647fb0b4839b8ac10e93b9ead68.service --no-pager
-- Journal begins at Tue 2024-04-23 18:57:44 UTC, ends at Tue 2024-06-11 20:28:59 UTC. --
Jun 11 20:27:12 localhost systemd[1]: Started Simple Service that runs for 10s and then succeeds.
Jun 11 20:27:12 localhost systemd[1]: Started /usr/bin/sleep infinity.
Jun 11 20:27:16 localhost systemd[1]: run-r231cd647fb0b4839b8ac10e93b9ead68.service: Sent signal SIGKILL to main process 959185 (sleep) on client request.
Jun 11 20:27:16 localhost systemd[1]: run-r231cd647fb0b4839b8ac10e93b9ead68.service: Main process exited, code=killed, status=9/KILL
Jun 11 20:27:16 localhost systemd[1]: run-r231cd647fb0b4839b8ac10e93b9ead68.service: Failed with result 'signal'.
Jun 11 20:27:22 localhost systemd[1]: simple10s.service: Succeeded.
# cat /tmp/datelog
cat: /tmp/datelog: No such file or directory

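It is necessary to use systemctl kill --signal SIGKILL instead of systemctl stop because the transient service has to end with a result other than success. A plain systemctl stop counts as a clean stop, so $SERVICE_RESULT would be success and the command would still run; being killed by SIGKILL makes the result signal, as the log above shows.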

TL;DR

To run a command after a oneshot service exits, use:

systemd-run --no-block --property After=${the_service} ${the_command}

This can be canceled by simply running systemctl stop ${the_transient_service_name}.

To run a command after a non-oneshot service exits, use:

systemd-run \
    --property ExecStopPost="sh -c 'if test "\$SERVICE_RESULT" = success; then ${the_command}; fi'" \
    --property BindsTo=${the_service} \
    sleep infinity

This can be canceled by running systemctl kill --signal SIGKILL ${the_transient_service_name}.
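Since the nested quoting is the easiest part to get wrong, the non-oneshot invocation can be wrapped in a small shell function. This is only a sketch: the name run_after_exit is made up for illustration, and it assumes the command contains no single quotes, because the command is pasted inside a single-quoted sh -c string. Unlike the one-liner above, it escapes the inner double quotes so that they reach sh.

```shell
#!/bin/sh
# Hypothetical helper: run a command once a non-oneshot service exits,
# using the BindsTo= / ExecStopPost= approach.
# Arguments: <service to wait for> <command as a single string>
# Limitation: the command must not contain single quotes.
run_after_exit() {
    service="$1"
    command="$2"
    systemd-run \
        --property "ExecStopPost=sh -c 'if test \"\$SERVICE_RESULT\" = success; then $command; fi'" \
        --property "BindsTo=$service" \
        sleep infinity
}

# Usage (commented out so that sourcing this file has no side effects):
# run_after_exit simple10s.service 'date >/tmp/datelog'
```

Canceling works the same way as before: kill the transient unit with SIGKILL so that its result is not success.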

I think the first case with the oneshot is pretty easy to use, but the second case for all other service types is a bit too verbose and error-prone for me to use regularly, especially since the kill-loop is already firmly in my toolbox.