Re: REGRESSION: [PATCH 4/4] block: freeze the queue earlier in del_gendisk

On Thu, Jul 07, 2022 at 11:41:40PM -0600, Logan Gunthorpe wrote:
> I'm not really sure why this is yet, but this patch in rc4 causes some
> random failures with mdadm tests.
> 
> It seems the 11spare-migration test starts failing roughly every other
> run because the block device is not quite cleaned up after mdadm --stop
> by the time the next mdadm --create command starts, or rather there
> appears to be a race now between the newly created device and the one
> being cleaned up. This results in an infrequent sysfs panic with a
> duplicate filename error (see the end of this email).
> 
> I managed to bisect this and found a09b314005f3a09 to be the problematic
> commit.

Looking at the mddev code, this commit just seems to widen the race
window for hitting horrible lifetime problems in md, but I'll also
try to reproduce and verify it myself.

Take a look at how md searches for a duplicate name in md_alloc,
mddev_alloc_unit and mddev_find_locked based on the all_mddevs list,
and how mddev_put drops the mddev from all_mddevs very early, long
before the gendisk is gone.  I think what needs to be done is to
implement a free_disk method and drop the mddev (and free it) from
that method.  But given how much intricate mess is built on all_mddevs,
we'll have to be very careful about that.
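
To make the ordering problem concrete, here is a small userspace toy
model, not md code.  Everything in it (name_set, toy_create,
toy_put_early, toy_disk_released) is a hypothetical stand-in: "list"
plays the role of all_mddevs, and "sysfs" plays the role of the names
still held by a gendisk that has not been fully torn down.  It only
assumes that the duplicate check consults the list and nothing else.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_DEVS 8

/* Toy set of device names; not a real kernel structure. */
struct name_set {
	const char *names[MAX_DEVS];
	int count;
};

static bool set_contains(const struct name_set *s, const char *name)
{
	for (int i = 0; i < s->count; i++)
		if (strcmp(s->names[i], name) == 0)
			return true;
	return false;
}

static void set_add(struct name_set *s, const char *name)
{
	assert(s->count < MAX_DEVS);
	s->names[s->count++] = name;
}

static void set_remove(struct name_set *s, const char *name)
{
	for (int i = 0; i < s->count; i++) {
		if (strcmp(s->names[i], name) == 0) {
			/* Order does not matter, so move the last entry down. */
			s->names[i] = s->names[--s->count];
			return;
		}
	}
}

enum create_result { CREATE_OK, CREATE_EXISTS, CREATE_SYSFS_DUPLICATE };

/* Create a device: the duplicate check only looks at the list. */
static enum create_result toy_create(struct name_set *list,
				     struct name_set *sysfs, const char *name)
{
	if (set_contains(list, name))
		return CREATE_EXISTS;
	if (set_contains(sysfs, name))
		return CREATE_SYSFS_DUPLICATE;	/* models the duplicate filename error */
	set_add(list, name);
	set_add(sysfs, name);
	return CREATE_OK;
}

/* Early drop: the name leaves the list while the disk still exists. */
static void toy_put_early(struct name_set *list, const char *name)
{
	set_remove(list, name);
}

/*
 * Disk teardown has finished and the sysfs name goes away.  With
 * drop_from_list set, the list entry is dropped only here, which models
 * the free_disk-style ordering suggested above.
 */
static void toy_disk_released(struct name_set *list, struct name_set *sysfs,
			      const char *name, bool drop_from_list)
{
	set_remove(sysfs, name);
	if (drop_from_list)
		set_remove(list, name);
}
```

With the early drop, a toy_create() for the same name that runs between
toy_put_early() and toy_disk_released() passes the list check and then
trips over the leftover sysfs name.  If the list entry is only dropped
in toy_disk_released(), the same create is cleanly rejected as an
existing device until teardown has finished.  The real code has locking
and refcounting that this sketch ignores entirely.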
