Re: REGRESSION: [PATCH 4/4] block: freeze the queue earlier in del_gendisk

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 





On 2022-07-09 02:17, Christoph Hellwig wrote:
> On Fri, Jul 08, 2022 at 09:55:55AM -0600, Logan Gunthorpe wrote:
>> I agree it's a mess, probably buggy and could use a cleanup with a
>> free_disk method. But I'm not sure the all_mdevs lifetime issues are the
>> problem here. If the entry in all_mdevs outlasts the disk, then
>> md_alloc() will just fail earlier. Many test scripts rely on the fact
>> that you can stop an mddev and recreate it immediately after. We need
>> some way of ensuring any deleted disks are fully deleted before trying
>> to make a new mddev, in case the new one has the same name as one being
>> deleted.
> 
> I think those tests are broken.  But fortunately that is just an
> assumption in the tests, while device name reuse is a real problem.
> 
> I could not reproduce your problem, but on the for-5.20/block
> tree I see a hang in 10ddf-geometry when running the tests.

Ah, strange. I never saw an issue with that test, though I didn't run
that one repeatedly with the latest branch. So maybe it was an
intermittent like I saw with 11spare-migration and I just missed it.

> The branch here:
> 
>     git://git.infradead.org/users/hch/block.git md-lifetime-fixes
>     http://git.infradead.org/users/hch/block.git/shortlog/refs/heads/md-lifetime-fixes
> 
> fixes that for me and does not introduce new regressions.  Can you
> check if that helps your problem?

Yes, I can confirm those patches fix the bug I was seeing too.


I did a fairly quick review of the patches:

 - In the patch that introduces md_free_disk() it looks like md_free()
can still be called from the error path of md_alloc() before
mddev->gendisk is set... which seems to make things rather complicated
seeing we then can't use free_disk to finish the cleanup if the disk
hasn't been created yet. I probably need to take closer look at this
but, it might make more sense for the cleanup to remain in md_free() but
call kobject_put() in md_free_disk() and del_gendisk() in
mdev_delayed_delete(). Then md_alloc() can still use kobject_put() in
the error path and it makes a little more sense seeing we'd still be
freeing the kobject stuff in it's own release method instead of the
disks free method.

 - In the patch with md_rdevs_overlap, it looks like we remove the
rcu_read_lock(), which definitely seems out of place and probably isn't
correct. But the comment that was recreated still references the rcu so
probably should be changed.

 - The last patch has a typo in the title (disk is *a* freed).

Thanks!

Logan



[Index of Archives]     [Linux RAID Wiki]     [ATA RAID]     [Linux SCSI Target Infrastructure]     [Linux Block]     [Linux IDE]     [Linux SCSI]     [Linux Hams]     [Device Mapper]     [Device Mapper Cryptographics]     [Kernel]     [Linux Admin]     [Linux Net]     [GFS]     [RPM]     [git]     [Yosemite Forum]


  Powered by Linux