On Fri, 04 Nov 2022, Mikulas Patocka wrote: > There's a crash in mempool_free when running the lvm test > shell/lvchange-rebuild-raid.sh. > > The reason for the crash is this: > * super_written calls atomic_dec_and_test(&mddev->pending_writes) and > wake_up(&mddev->sb_wait). Then it calls rdev_dec_pending(rdev, mddev) > and bio_put(bio). > * so, the process that waited on sb_wait and that is woken up is racing > with bio_put(bio). > * if the process wins the race, it calls bioset_exit before bio_put(bio) > is executed. > * bio_put(bio) attempts to free a bio into a destroyed bio set - causing > a crash in mempool_free. > > We fix this bug by moving bio_put before atomic_dec_and_test. > > The function md_end_flush has a similar bug - we must call bio_put before > we decrement the number of in-progress bios. > > BUG: kernel NULL pointer dereference, address: 0000000000000000 > #PF: supervisor write access in kernel mode > #PF: error_code(0x0002) - not-present page > PGD 11557f0067 P4D 11557f0067 PUD 0 > Oops: 0002 [#1] PREEMPT SMP > CPU: 0 PID: 73 Comm: kworker/0:1 Not tainted 6.1.0-rc3 #5 > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014 > Workqueue: kdelayd flush_expired_bios [dm_delay] > RIP: 0010:mempool_free+0x47/0x80 > Code: 48 89 ef 5b 5d ff e0 f3 c3 48 89 f7 e8 32 45 3f 00 48 63 53 08 48 89 c6 3b 53 04 7d 2d 48 8b 43 10 8d 4a 01 48 89 df 89 4b 08 <48> 89 2c d0 e8 b0 45 3f 00 48 8d 7b 30 5b 5d 31 c9 ba 01 00 00 00 > RSP: 0018:ffff88910036bda8 EFLAGS: 00010093 > RAX: 0000000000000000 RBX: ffff8891037b65d8 RCX: 0000000000000001 > RDX: 0000000000000000 RSI: 0000000000000202 RDI: ffff8891037b65d8 > RBP: ffff8891447ba240 R08: 0000000000012908 R09: 00000000003d0900 > R10: 0000000000000000 R11: 0000000000173544 R12: ffff889101a14000 > R13: ffff8891562ac300 R14: ffff889102b41440 R15: ffffe8ffffa00d05 > FS: 0000000000000000(0000) GS:ffff88942fa00000(0000) knlGS:0000000000000000 > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > CR2: 0000000000000000 CR3: 0000001102e99000 CR4: 00000000000006b0 > Call Trace: > <TASK> > clone_endio+0xf4/0x1c0 [dm_mod] > clone_endio+0xf4/0x1c0 [dm_mod] > __submit_bio+0x76/0x120 > submit_bio_noacct_nocheck+0xb6/0x2a0 > flush_expired_bios+0x28/0x2f [dm_delay] > process_one_work+0x1b4/0x300 > worker_thread+0x45/0x3e0 > ? rescuer_thread+0x380/0x380 > kthread+0xc2/0x100 > ? kthread_complete_and_exit+0x20/0x20 > ret_from_fork+0x1f/0x30 > </TASK> > Modules linked in: brd dm_delay dm_raid dm_mod af_packet uvesafb cfbfillrect cfbimgblt cn cfbcopyarea fb font fbdev tun autofs4 binfmt_misc configfs ipv6 virtio_rng virtio_balloon rng_core virtio_net pcspkr net_failover failover qemu_fw_cfg button mousedev raid10 raid456 libcrc32c async_raid6_recov async_memcpy async_pq raid6_pq async_xor xor async_tx raid1 raid0 md_mod sd_mod t10_pi crc64_rocksoft crc64 virtio_scsi scsi_mod evdev psmouse bsg scsi_common [last unloaded: brd] > CR2: 0000000000000000 > ---[ end trace 0000000000000000 ]--- > > Signed-off-by: Mikulas Patocka <mpatocka@xxxxxxxxxx> > Cc: stable@xxxxxxxxxxxxxxx > > --- > drivers/md/md.c | 6 ++++-- > 1 file changed, 4 insertions(+), 2 deletions(-) > > Index: linux-2.6/drivers/md/md.c > =================================================================== > --- linux-2.6.orig/drivers/md/md.c 2022-11-03 15:29:02.000000000 +0100 > +++ linux-2.6/drivers/md/md.c 2022-11-03 15:33:17.000000000 +0100 > @@ -509,13 +509,14 @@ static void md_end_flush(struct bio *bio > struct md_rdev *rdev = bio->bi_private; > struct mddev *mddev = rdev->mddev; > > + bio_put(bio); > + > rdev_dec_pending(rdev, mddev); > > if (atomic_dec_and_test(&mddev->flush_pending)) { > /* The pre-request flush has finished */ > queue_work(md_wq, &mddev->flush_work); > } > - bio_put(bio); > } > > static void md_submit_flush_data(struct work_struct *ws); > @@ -913,10 +914,11 @@ static void super_written(struct bio *bi > } else > clear_bit(LastDev, &rdev->flags); > > + bio_put(bio); > + > if (atomic_dec_and_test(&mddev->pending_writes)) > wake_up(&mddev->sb_wait); > rdev_dec_pending(rdev, mddev); > - bio_put(bio); > } Thanks. I think this is a clear improvement. I think it would be a little better if the rdev_dec_pending were also move up. Then both code fragments would be: bio_put ; rdev_dec_pending ; atomic_dec_and_test Thanks, NeilBrown