Re: [Update PATCH V3] md: don't unregister sync_thread with reconfig_mutex held

On 2022-05-25 03:04, Guoqing Jiang wrote:
> I would prefer to focus on block tree or md tree. With latest block tree
> (commit 44d8538d7e7dbee7246acda3b706c8134d15b9cb), I get below
> similar issue as Donald reported, it happened with the cmd (which did
> work with 5.12 kernel).
> 
> vm79:~/mdadm> sudo ./test --dev=loop --tests=05r1-add-internalbitmap

Ok, so this test passes for me, but my VM was not running with bfq. It
also seems we have layers upon layers of different bugs to untangle.
Perhaps you can try the tests with bfq disabled to make progress on the
other regression I reported.

If I enable bfq and set the loop devices to the bfq scheduler, then I
hit the same bug as you and Donald. It's clearly a NULL pointer
dereference in the bfq code, which seems to be triggered by the
partition read after mdadm opens a block device. I'm not sure whether
that is the md device or the loop device, but I suspect the latter
since the trace does not go through any md code.

Simplifying things down a bit, the NULL pointer dereference can be
triggered by creating an md device from loop devices that have the bfq
scheduler set:

  mdadm --create --run /dev/md0 --level=1 -n2 /dev/loop0 /dev/loop1
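
For anyone reproducing this, here is a minimal sketch of how the loop
devices might be switched to bfq before running the command above. The
helper name set_scheduler and the SYSFS_ROOT override are hypothetical
and exist only for illustration. The queue/scheduler attribute under
/sys/block/<dev> is the usual sysfs interface for selecting a scheduler.
The sketch assumes bfq is built in or its module is already loaded.

```shell
#!/bin/sh
# Hypothetical helper: select an I/O scheduler for a block device via sysfs.
# SYSFS_ROOT is an assumption added for illustration so the function can be
# pointed at a scratch directory; on a real system it defaults to /sys.
set_scheduler() {
    dev=$1
    sched=$2
    path="${SYSFS_ROOT:-/sys}/block/$dev/queue/scheduler"

    # Fail early if the attribute is missing or not writable (e.g. not root).
    if [ ! -w "$path" ]; then
        echo "cannot write $path" >&2
        return 1
    fi

    printf '%s\n' "$sched" > "$path"
}

# Intended use, as root, before the mdadm --create command:
#   set_scheduler loop0 bfq
#   set_scheduler loop1 bfq
```

Reading the same attribute back (cat /sys/block/loop0/queue/scheduler)
should show the active scheduler in square brackets.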

The crash occurs in bfq_bio_bfqg() with blkg_to_bfqg() returning NULL.
It's hard to trace where the NULL comes from in there -- the code is a
bit complex.

I've found that the bfq bug exists in current md-next (42b805af102) but
does not trigger on its base tag, v5.18-rc3. Bisecting revealed the bug
was introduced by:

  4e54a2493e58 ("bfq: Get rid of __bio_blkcg() usage")

Reverting that commit and the next commit (075a53b7) on top of md-next
was confirmed to fix the bug.

I've copied Jan, Jens and Paolo, who can hopefully help with this. A
cleaned-up stack trace follows this email for their benefit.

Logan

--

 BUG: KASAN: null-ptr-deref in bfq_bio_bfqg+0x65/0xf0
 Read of size 1 at addr 0000000000000094 by task mdadm/850

 CPU: 1 PID: 850 Comm: mdadm Not tainted 5.18.0-rc3-eid-vmlocalyes-dbg-00005-g42b805af1024 #2113
 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.14.0-2 04/01/2014
 Call Trace:
  <TASK>
  dump_stack_lvl+0x5a/0x74
  kasan_report.cold+0x5f/0x1a9
  __asan_load1+0x4d/0x50
  bfq_bio_bfqg+0x65/0xf0
  bfq_bic_update_cgroup+0x2f/0x340
  bfq_insert_requests+0x568/0x5800
  blk_mq_sched_insert_request+0x180/0x230
  blk_mq_submit_bio+0x9f0/0xe50
  __submit_bio+0xeb/0x100
  submit_bio_noacct_nocheck+0x1fd/0x470
  submit_bio_noacct+0x350/0xa80
  submit_bio+0x84/0xf0
  submit_bh_wbc+0x27a/0x2b0
  block_read_full_page+0x578/0xb60
  blkdev_readpage+0x18/0x20
  do_read_cache_folio+0x290/0x430
  read_cache_page+0x41/0x130
  read_part_sector+0x7a/0x3d0
  read_lba+0x161/0x340
  efi_partition+0x1ce/0xdd0
  bdev_disk_changed+0x2e9/0x6a0
  blkdev_get_whole+0xd5/0x140
  blkdev_get_by_dev.part.0+0x37f/0x570
  blkdev_get_by_dev+0x51/0x60
  blkdev_open+0xa4/0x140
  do_dentry_open+0x2a7/0x6d0
  vfs_open+0x58/0x60
  path_openat+0x77e/0x13f0
  do_filp_open+0x154/0x280
  do_sys_openat2+0x119/0x2c0
  __x64_sys_openat+0xe7/0x160
  do_syscall_64+0x3b/0x90
  entry_SYSCALL_64_after_hwframe+0x44/0xae

--

bfq_bio_bfqg+0x65/0xf0:

bfq_bio_bfqg at block/bfq-cgroup.c:619
 614 		struct blkcg_gq *blkg = bio->bi_blkg;
 615 		struct bfq_group *bfqg;
 616 	
 617 		while (blkg) {
 618 			bfqg = blkg_to_bfqg(blkg);
>619<			if (bfqg->online) {
 620 				bio_associate_blkg_from_css(bio,
 621 				return bfqg;
 622 			}
 623 			blkg = blkg->parent;
 624 		


