On Tue Feb 6, 2024 at 12:58 AM AEST, Marc Hartmayer wrote:
> On Fri, Feb 02, 2024 at 04:57 PM +1000, Nicholas Piggin <npiggin@xxxxxxxxx> wrote:
> > Starting a pipeline of jobs in the background does not seem to have
> > a simple way to reliably find the pid of a particular process in the
> > pipeline (because not all processes are started when the shell
> > continues to execute).
> >
> > The way PID of QEMU is derived can result in a failure waiting on a
> > PID that is not running. This is easier to hit with subsequent
> > multiple-migration support. Changing this to use $! by swapping the
> > pipeline for a fifo is more robust.
> >
> > Signed-off-by: Nicholas Piggin <npiggin@xxxxxxxxx>
> > ---
>
> […snip…]
>
> >
> > +	# Wait until the destination has created the incoming and qmp sockets
> > +	while ! [ -S ${migsock} ] ; do sleep 0.1 ; done
> > +	while ! [ -S ${qmp2} ] ; do sleep 0.1 ; done
>
> There should be timeout implemented, otherwise we might end in an
> endless loop in case of a bug. Or is the global timeout good enough to
> handle this situation?

I was going to say it's not worthwhile since we can't recover, but
actually printing where the timeout happens if nothing else would be
pretty helpful to gather and diagnose problems especially ones we can't
reproduce locally. So, yeah good idea.

We have a bunch of potential hangs where we don't do anything already
though. Sadly it doesn't look like $BASH_LINENO can give anything useful
of the interrupted context from a SIGHUP trap. We might be able to do
something like -

timeout_handler() {
	echo "Timeout $timeout_msg"
	exit
}

trap timeout_handler HUP

timeout_msg="waiting for destination migration socket to be created"
while ! [ -S ${migsock} ] ; do sleep 0.1 ; done
timeout_msg="waiting for destination QMP socket to be created"
while ! [ -S ${qmp2} ] ; do sleep 0.1 ; done
timeout_msg=

Unless you have any better ideas. Not sure if there's some useful bash
debugging options that can be used.

Other option is adding timeout checks in loops and blocking commands...
not sure if that's simpler and less error prone though.

Anyway we have a bunch of potential hangs and timeouts that aren't
handled already though, so I might leave this out for a later pass at
it unless we come up with a really nice easy way to go.

Thanks,
Nick

> > +
> >  qmp ${qmp1} '"migrate", "arguments": { "uri": "unix:'${migsock}'" }' > ${qmpout1}
> >
> >  # Wait for the migration to complete
> > --
> > 2.42.0
> >
> >
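
For illustration only, the "timeout checks in loops" option mentioned
above might look roughly like the sketch below. The helper name
wait_for_path and the retry count are assumptions made for this
example and are not part of the patch; only the 0.1s poll interval and
the "Timeout ..." message style are taken from the thread.

```shell
#!/usr/bin/env bash

# Hypothetical helper (not in the patch): poll until a path satisfies a
# test(1) predicate such as -S (is a socket), sleeping 0.1s between
# polls, and give up after <retries> sleeps.
#
# usage: wait_for_path <test-op> <path> <retries> <message>
wait_for_path() {
	local op=$1 path=$2 retries=$3 msg=$4

	while ! test "$op" "$path" ; do
		if [ "$retries" -le 0 ] ; then
			# Say which wait timed out, as in the trap idea above
			echo "Timeout $msg" >&2
			return 1
		fi
		retries=$((retries - 1))
		sleep 0.1
	done
}

# Possible call sites (retry count of 100, i.e. roughly 10s, is arbitrary):
#   wait_for_path -S "${migsock}" 100 \
#       "waiting for destination migration socket to be created" || exit 1
#   wait_for_path -S "${qmp2}" 100 \
#       "waiting for destination QMP socket to be created" || exit 1
```

Compared with the trap approach, this reports the failing wait without
relying on a signal, but every loop and blocking command would need its
own check, which is the extra surface for errors noted above.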