On Thu, May 23, 2024 at 12:33 AM Logan Gunthorpe <logang@xxxxxxxxxxxx> wrote:
>
>
>
> On 2024-05-22 02:46, Xiao Ni wrote:
> > Hi Logan
> >
> > Thanks for your suggestion. I tried to create signalfd before fork()
> > but it still can't work. And I call sleep(2) before child exits, the
> > problem still can happen sometimes. This is what I tried. If you have
> > time to have a look, it's great. It's not a hurry thing.
>
> Can you send sample prints from this patch? I'm very surprised that with
> the delay on the child processes it can still happen.

There are four changes:
1. Initialize signalfd before fork
2. Block SIGCHLD before fork
3. Sleep 2 seconds before the child exits
4. Break when receiving the SIGINT signal

>
> What do you do to make it happen? How frequently does it occur?

mdadm -Ss
i=0
while [ 1 ]; do
	/usr/sbin/mdadm -CfR /dev/md0 -l 5 -n3 /dev/loop0 /dev/loop1 /dev/loop2 --write-zeroes --auto=yes -v
	mdadm --wait /dev/md0
	mdadm -Ss
	sleep 1
	i=$((i+1))
	echo $i
done

It's easy to reproduce in my environment. Each loop device is 20MB, which
is the device size used during regression tests.

> >
> > @@ -185,17 +186,6 @@ static int wait_for_zero_forks(int *zero_pids, int count)
> >  	if (!wait_count)
> >  		return 0;
> >
> > -	sigemptyset(&sigset);
> > -	sigaddset(&sigset, SIGINT);
> > -	sigaddset(&sigset, SIGCHLD);
> > -	sigprocmask(SIG_BLOCK, &sigset, NULL);
> > -
> > -	sfd = signalfd(-1, &sigset, 0);
> > -	if (sfd < 0) {
> > -		pr_err("Unable to create signalfd: %s\n", strerror(errno));
> > -		return 1;
> > -	}
>
> Strictly speaking, I don't think it's necessary to move the signalfd
> initialization. Blocking the signals should be enough, then any signals
> can be retrieved at a later time with signalfd. Though, I don't think it
> should hurt to do it this way.

I did a test in a simple C program:

part0: block SIGCHLD and SIGINT
part1: fork 3 child processes. Each child does nothing and exits.
part2: read from sfd to check for SIGCHLD
part-initsfd: sleep 2 seconds, then sfd = signalfd()

If I put part-initsfd between part1 and part2, the same thing happens:
part2 only sees one SIGCHLD and then it's stuck. If part-initsfd is before
part1, it works well and the parent process sees all 3 SIGCHLD signals.
That is the reason my patch initializes sfd before fork.

Best Regards
Xiao

>
> Logan
>