On (24/10/07 22:56), Christoph Hellwig wrote:
> On Tue, Oct 08, 2024 at 02:26:17PM +0900, Sergey Senozhatsky wrote:
> > Didn't copy one more backtrace here, there are two mutexes involved.
> >
> > schedule+0x554/0x1218
> > schedule_preempt_disabled+0x30/0x50
> > mutex_lock+0x3c/0x70
> > sr_block_release+0x2c/0x60 [sr_mod (HASH:d5f2 4)]
> > blkdev_put+0x184/0x290
> > blkdev_release+0x34/0x50
> > __fput_sync+0xa8/0x2d8
> > __arm64_sys_close+0x6c/0xd8
> > invoke_syscall+0x78/0xf0
> >
> > So process A holds cd->lock and sleeps in blk_queue_enter()
> > process B holds ->open_mutex and sleeps on cd->lock, which is owned by A
> > process C sleeps on ->open_mutex, which is owned by B.
>
> Oh, cd->mutex is a bit of a problem. And looking into the generic
> CD layer code this can be relatively easily avoided while cleaning
> a lot of the code up. Give me a little time to cook something up.

Sure, thanks. I can't test the patch, tho. At least not yet.

CD layer is in several reports, I also have reports with SD, and
a bunch of reports that I still have to look at. E.g.

	schedule
	blk_queue_enter
	blk_mq_alloc_request
	scsi_execute_cmd
	ioctl_internal_command
	scsi_set_medium_removal
	sd_release
	blkdev_put

cd->lock still falls a victim of "blk_queue_enter() and
blk_queue_start_drain() are both called under ->open_mutex" thingy,
which seems like a primary problem here. No matter why
blk_queue_enter() sleeps, draining under ->open_mutex, given that
what we want to drain can hold ->open_mutex, sometimes isn't going
to drain.

> I also wonder if simulating a cdrom removal might be possible using
> qemu to help reproducing some of this.

Hmm, that's an interesting idea. I've only tried to "unsafely"
remove a USB stick out of my laptop so far, with no success.
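
For illustration only, the three-process chain quoted above (A, B, C) can be written down as a toy wait-for model. This is a sketch, not kernel code: the task table, `owner_of()` and `wait_chain()` are made up for this example, and "queue unfrozen" is just a placeholder label for whatever blk_queue_enter() is sleeping on.

```python
# Toy model of the wait chain from the thread. All names are labels for
# illustration; none of this is a kernel API.
# "queue unfrozen" is a hypothetical stand-in for the condition that
# blk_queue_enter() sleeps on.
TASKS = {
    "A": {"holds": {"cd->lock"}, "waits_on": "queue unfrozen"},
    "B": {"holds": {"->open_mutex"}, "waits_on": "cd->lock"},
    "C": {"holds": set(), "waits_on": "->open_mutex"},
}


def owner_of(resource, tasks):
    """Return the task holding `resource`, or None if no modelled task holds it."""
    for name, state in tasks.items():
        if resource in state["holds"]:
            return name
    return None


def wait_chain(start, tasks):
    """Follow task -> awaited resource -> owner until the chain ends or loops."""
    chain = [start]
    seen = {start}
    task = start
    while True:
        resource = tasks[task]["waits_on"]
        if resource is None:
            return chain  # task is not blocked
        chain.append(resource)
        task = owner_of(resource, tasks)
        if task is None:
            return chain  # nobody in the model holds it: root of the stall
        chain.append(task)
        if task in seen:
            return chain  # circular wait, stop here
        seen.add(task)


print(wait_chain("C", TASKS))
# ['C', '->open_mutex', 'B', 'cd->lock', 'A', 'queue unfrozen']
```

Read left to right: C cannot make progress until B does, B not until A does, and A only once the queue lets blk_queue_enter() through. The model says nothing about why that last wait does not finish.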