Re: systemd service causing bash to miss signals?

Pipelines rely on the kernel delivering SIGPIPE to the writer once the read end is closed. So if you have `foo | head -1`, then once head has read enough and exits, foo's next write gets it killed via SIGPIPE. But since most systemd-managed services aren't shell interpreters, systemd marks SIGPIPE as "ignored" when starting the service process, so that if the service is somehow tricked into opening a pipe that a user has mkfifo'd, at least the kernel can't be tricked into killing the service. An ignored signal stays ignored across fork() and exec(), so every process the service spawns (here Perl, then the shell run by the backticks, then cat, tr, fold and head) inherits that disposition as well. You can opt out of this with IgnoreSIGPIPE=no.
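
To see the effect described above without systemd, a shell can stand in for the service: `trap '' PIPE` sets SIGPIPE to ignored, much as systemd does by default, and the commands it starts inherit that. This is a minimal sketch assuming bash and GNU coreutils; `probe` and the other variable names are illustrative only.

```shell
#!/usr/bin/env bash
# Sketch: how the writer in `yes | head -n 1` ends with SIGPIPE at its default
# disposition vs. ignored (SIG_IGN, which systemd sets for a service unless
# IgnoreSIGPIPE=no). `trap '' PIPE` sets SIG_IGN, and the processes the shell
# forks and execs inherit it.

# Print the exit status of `yes` after `head` has read one line and exited.
probe='yes | head -n 1 >/dev/null; echo "${PIPESTATUS[0]}"'

default_status=$(bash -c "$probe")
ignored_status=$(bash -c "trap '' PIPE; $probe" 2>/dev/null)

# 141 = 128 + 13 (SIGPIPE): `yes` was killed by the signal. If this script was
# itself started with SIGPIPE ignored (e.g. from a service with the default
# IgnoreSIGPIPE=yes), the child inherits that and this shows the ignored case.
echo "default: $default_status"
# Anything but 141: `yes` saw write() fail with EPIPE and exited on its own
# (GNU yes, for instance, reports the error on stderr, silenced here).
echo "ignored: $ignored_status"
```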

(Even without the signal, the writer should still see every write() attempt fail with EPIPE, but not all tools pay attention to that; some ignore the write() result completely, as `fold` apparently does in your case...)
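
The EPIPE point can be sketched the same way, assuming bash: `careful_writer` is a hypothetical writer that stops on the first failed write, which is what a tool like `fold` would need to do to exit on its own when SIGPIPE is ignored.

```shell
#!/usr/bin/env bash
# Sketch: with SIGPIPE ignored, a writer learns that its reader is gone only
# because write() fails with EPIPE. A writer that checks the result stops; one
# that ignores it (as `fold` apparently does here) keeps looping.

# Hypothetical well-behaved writer: bash's printf builtin returns non-zero
# when the underlying write fails, so the loop ends on the first EPIPE.
careful_writer() {
  local i=0
  while printf 'line %d\n' "$i"; do
    i=$((i + 1))
  done
  echo "careful_writer: write failed after $i lines, stopping" >&2
}

# Command substitution runs in a subshell, so the trap stays local to it.
# stderr shows bash's write error followed by the message above.
first=$(trap '' PIPE; careful_writer | head -n 1)
echo "$first"
```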

On Mon, Sep 19, 2022, 20:18 Brian Reichert <reichert@xxxxxxxxxxx> wrote:
I apologize for the vague subject.

The background: I've inherited some legacy software to manage.

This is on SLES12 SP5, running:

        systemd-228-157.40.1.x86_64

One element is a systemd-managed service, written in Perl, that in
turn uses bash to generate a random string (don't ask me why
this tactic was adopted).

Here's an isolation of that logic:

  pheonix:~ # cat /root/random_str.pl
  #!/usr/bin/perl
  print "$0 start ".time."\n";
  my $randStr = `cat /dev/urandom|tr -dc "a-zA-Z0-9"|fold -w 64|head -1`;
  print "$0 end ".time."\n";

You can run this from the command line to see how quickly it
normally completes.

What I can reproduce in my environment, very reliably, is that when
this is invoked as a service:

- the 'head' command exits very quickly (to be expected)
- the shell does not exit (maybe it missed a SIGCHLD?)
- 'fold' chews a CPU core
- A kernel trace shows that 'fold' is spinning on SIGPIPEs, as its
  STDOUT is no longer connected to another process.
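
One defensive option, not something proposed in this thread, is to bound the input so that no stage depends on SIGPIPE to terminate. A sketch assuming bash and GNU coreutils, with a hypothetical `random_str` function and an arbitrary 4096-byte input budget:

```shell
#!/usr/bin/env bash
# Sketch of a defensive variant of the service's pipeline: read a bounded
# amount from /dev/urandom so every stage reaches EOF and exits, even when
# SIGPIPE is ignored and a stage (like fold) ignores write errors.
# The 4096-byte budget is an assumption: about 62/256 of random bytes are
# alphanumeric, so this yields roughly 990 characters, far more than 64.
# With SIGPIPE ignored, fold may still print a write error on stderr as it
# exits; the line printed by head is unaffected.
random_str() {
  head -c 4096 /dev/urandom |
    LC_ALL=C tr -dc 'a-zA-Z0-9' |
    fold -w 64 |
    head -n 1
}

random_str
```

Setting IgnoreSIGPIPE=no on the service addresses the cause; bounding the input only removes the pipeline's dependence on it.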

My service unit:

  pheonix:~ # cat /etc/systemd/system/random_str.service
  [Unit]
  Description=generate random number
  After=network.target local-fs.target

  [Service]
  Type=oneshot
  RemainAfterExit=yes
  ExecStart=/root/random_str.pl
  ExecStop=/usr/bin/true
  #TimeoutSec=infinity
  TimeoutSec=900

  [Install]
  WantedBy=multi-user.target

Easy to repro; this hangs forever, instead of exiting quickly.

  pheonix:~ # systemctl daemon-reload
  pheonix:~ # systemctl start random_str
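
The IgnoreSIGPIPE= opt-out mentioned in the reply could be applied to this unit with a drop-in rather than by editing random_str.service. A sketch assuming bash and root access; the function name, the `sigpipe.conf` file name and the `UNIT_DIR` override (handy for a dry run) are all hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: write a drop-in that turns off systemd's SIGPIPE ignoring for one
# unit. write_sigpipe_dropin, sigpipe.conf and UNIT_DIR are hypothetical names.
write_sigpipe_dropin() {
  local unit="$1"
  local dropin_dir="${UNIT_DIR:-/etc/systemd/system}/${unit}.d"
  mkdir -p "$dropin_dir" || return 1
  # With IgnoreSIGPIPE=no the service starts with SIGPIPE at its default
  # disposition, so its shell pipeline behaves as it does from a login shell.
  cat >"$dropin_dir/sigpipe.conf" <<'EOF'
[Service]
IgnoreSIGPIPE=no
EOF
}
```

As root, running `write_sigpipe_dropin random_str.service`, then `systemctl daemon-reload` and `systemctl restart random_str`, should let the pipeline finish; `systemctl cat random_str` shows whether the drop-in was picked up.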

Let me know if there are any other details of my environment that
would be helpful here.

--
Brian Reichert                          <reichert@xxxxxxxxxxx>
BSD admin/developer at large   