Re: mdadm/Create wait_for_zero_forks is stuck

Logan Gunthorpe <logang@xxxxxxxxxxxx> · Tue, 21 May 2024 10:56:20 -0600

Hi Xiao,

I don't have time to dig into this myself, but my guess would be that
the signal for one of the children arrives too early, before the
sigprocmask() call in wait_for_zero_forks().

Seems like SIGCHLD should be blocked before the first call to
write_zeroes_fork(). I'm really not sure why I blocked SIGINT first
and then blocked SIGCHLD only after the processes had started. I suspect
adding SIGCHLD to the sigprocmask in add_disks() and just removing the
sigprocmask in write_zeroes_fork() might fix the issue.

Thanks,

Logan

On 2024-05-21 01:05, Xiao Ni wrote:
> Hi Logan
> 
> I'm trying to fix the mdadm regression test failures. 00raid5-zero
> fails intermittently, so I added some logging:
> 
> In function write_zeroes_fork:
>                 if (fallocate(fd, FALLOC_FL_ZERO_RANGE | FALLOC_FL_KEEP_SIZE,
>                               offset_bytes, sz)) {
>                         pr_err("zeroing %s failed: %s\n", dv->devname,
>                                strerror(errno));
>                         ret = 1;
>                         break;
>                 } else
>                         printf("zeroing good\n");
> 
> In function wait_for_zero_forks:
>                 if (fdsi.ssi_signo == SIGINT) {
>                         printf("\n");
>                         pr_info("Interrupting zeroing processes, please wait...\n");
>                         interrupted = true;
>                         break;
>                 } else if (fdsi.ssi_signo == SIGCHLD) {
>                         printf("one child finishes, wait count %d\n",
>                                wait_count);
>                         if (!--wait_count)
>                                 break;
>                 }
> 
> while [ 1 ]; do
>   /usr/sbin/mdadm -CfR /dev/md0 -l 5 -n3 /dev/loop0 /dev/loop1 \
>     /dev/loop2 --write-zeroes --auto=yes -v
>   mdadm --wait /dev/md0
>   mdadm -Ss
>   sleep 1
> done
> 
> zeroing good
> zeroing good
> zeroing good
> one child finishes, wait count 3
> one child finishes, wait count 2
> 
> It looks like the parent process misses one child signal.
> 
> root      174247  0.0  0.0   3628  2552 pts/0    S+   02:52   0:00  |   \_ /usr/sbin/mdadm -CfR /dev/md0 -l 5 -n3 /dev/loop0 /dev/loop1 /dev/loop2 --write-zeroes --auto=yes -v
> root      174248  0.0  0.0      0     0 pts/0    Z+   02:52   0:00  |       \_ [mdadm] <defunct>
> root      174249  0.0  0.0      0     0 pts/0    Z+   02:52   0:00  |       \_ [mdadm] <defunct>
> root      174250  0.0  0.0      0     0 pts/0    Z+   02:52   0:00  |       \_ [mdadm] <defunct>
> 
> # cat /proc/174247/stack
> [<0>] signalfd_dequeue+0x14d/0x170
> [<0>] signalfd_read_iter+0x7b/0xd0
> [<0>] vfs_read+0x201/0x330
> [<0>] ksys_read+0x5f/0xe0
> [<0>] do_syscall_64+0x7b/0x160
> [<0>] entry_SYSCALL_64_after_hwframe+0x76/0x7e
> 
> Any ideas for this?
> 
> Best Regards
> Xiao
>