Re: mdadm/Create wait_for_zero_forks is stuck


 



On Thu, May 23, 2024 at 12:33 AM Logan Gunthorpe <logang@xxxxxxxxxxxx> wrote:
>
>
>
> On 2024-05-22 02:46, Xiao Ni wrote:
> > Hi Logan
> >
> > Thanks for your suggestion. I tried to create signalfd before fork()
> > but it still can't work. And I call sleep(2) before child exits, the
> > problem still can happen sometimes. This is what I tried. If you have
> > time to have a look, it's great. It's not a hurry thing.
>
> Can you send sample prints from this patch? I'm very surprised that with
> the delay on the child processes it can still happen.

There are four changes:
1. Initialize signalfd before fork
2. Block SIGCHLD before fork
3. Sleep 2 seconds before the child exits
4. Break when receiving a SIGINT signal

>
> What do you do to make it happen? How frequently does it occur?

mdadm -Ss
i=0

while [ 1 ]; do
  /usr/sbin/mdadm -CfR /dev/md0 -l 5 -n3 /dev/loop0 /dev/loop1 \
    /dev/loop2 --write-zeroes --auto=yes -v
  mdadm --wait /dev/md0
  mdadm -Ss
  sleep 1
  i=$((i+1))
  echo $i
done

It's easy to reproduce in my environment. Each loop device is 20MB,
which is the device size used in the regression tests.

>
>
> > @@ -185,17 +186,6 @@ static int wait_for_zero_forks(int *zero_pids, int count)
> >         if (!wait_count)
> >                 return 0;
> >
> > -       sigemptyset(&sigset);
> > -       sigaddset(&sigset, SIGINT);
> > -       sigaddset(&sigset, SIGCHLD);
> > -       sigprocmask(SIG_BLOCK, &sigset, NULL);
> > -
> > -       sfd = signalfd(-1, &sigset, 0);
> > -       if (sfd < 0) {
> > -               pr_err("Unable to create signalfd: %s\n", strerror(errno));
> > -               return 1;
> > -       }
>
> Strictly speaking, I don't think it's necessary to move the signalfd
> initialization. Blocking the signals should be enough, then any signals
> can be retrieved at a later time with signalfd. Though, I don't think it
> should hurt to do it this way.

I tested this in a simple C program with four parts:

part0: block SIGCHLD and SIGINT
part1: fork 3 child processes. Each child does nothing and exits.
part2: read from sfd to check for SIGCHLD
part-initsfd: sleep 2 seconds, then sfd = signalfd()

If I put part-initsfd between part1 and part2, the same problem
happens: part2 sees only one SIGCHLD and gets stuck. If part-initsfd
comes before part1, it works well and the parent process sees all 3
SIGCHLD signals. That is why my patch initializes sfd before fork().

Best Regards
Xiao

>
> Logan
>





