On Mon, Dec 20, 2021 at 05:23:08PM +0900, Tetsuo Handa wrote: > On 2021/12/20 5:00, Luis Chamberlain wrote: > > On Fri, Dec 17, 2021 at 07:37:43PM +0900, Tetsuo Handa wrote: > >> I think we can apply this patch as-is... > > > > Unfortunately I don't think so, don't we end up still with a race > > in between the first part of device_add() and the kobject_add() > > which adds the kobject to generic layer and in return enables the > > disk_release() call for the disk? I count 5 error paths in between > > including kobject_add() which can fail as well. > > I can't catch which path you are talking about. > Will you explain more details using call trace (or line numbers in > https://elixir.bootlin.com/linux/v5.16-rc6/source/block/genhd.c#L397 ) ? I mean right after disk_alloc_events(): https://elixir.bootlin.com/linux/v5.16-rc6/source/block/genhd.c#L440 And right inside device_add() https://elixir.bootlin.com/linux/v5.16-rc6/source/block/genhd.c#L452 Within device_add(), there are about 5 things which can from the beginning of device_add() on line 3275 up to where kobject_add() completes successfully in line 3329: https://elixir.bootlin.com/linux/v5.16-rc6/source/drivers/base/core.c#L3275 https://elixir.bootlin.com/linux/v5.16-rc6/source/drivers/base/core.c#L3329 > > If correct then something like the following may be needed: > > > > diff --git a/block/genhd.c b/block/genhd.c > > index 3c139a1b6f04..08ab7ce63e57 100644 > > --- a/block/genhd.c > > +++ b/block/genhd.c > > @@ -539,9 +539,10 @@ int __must_check device_add_disk(struct device *parent, struct gendisk *disk, > > out_device_del: > > device_del(ddev); > > out_disk_release_events: > > - disk_release_events(disk); > > + if (!kobject_alive(&ddev->kobj)) > > + disk_release_events(disk); > > out_free_ext_minor: > > - if (disk->major == BLOCK_EXT_MAJOR) > > + if (!kobject_alive(&ddev->kobj) && disk->major == BLOCK_EXT_MAJOR) > > How can kobject_alive() matter? There are two hunks here. The first one I hope the above explains it. As for the second one, the assumption is that if device_add() succeeded the free_inode super op would do the respective blk_free_ext_minor() however now I am not sure if this is true. The kobject_alive() tells us if at least the device_add() had the kobject_add() complete. Since we have two hunks to consider I think we should be clear about differentiating between both. Luis