> On 03 Jan 2018, at 10:06, Guoqing Jiang <gqjiang@xxxxxxxx> wrote:
>
>
> On 01/03/2018 03:44 PM, Paolo Valente wrote:
>>
>>> On 03 Jan 2018, at 04:58, Guoqing Jiang <gqjiang@xxxxxxxx> wrote:
>>>
>>> Hi,
>>>
>> Hi
>>
>>> In my test, I found some issues when trying bfq with xfs.
>>> The test basically just sets the disk's scheduler to bfq,
>>> creates xfs on top of it, mounts the fs and writes something,
>>> then umounts the fs, roughly as in the sketch below.
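>>> (A sketch of the loop; /dev/vdd comes from the logs below, while
>>> the mount point, write size, and iteration count are placeholders.)
>>>
>>> modprobe bfq                                # bfq is built as a module here
>>> echo bfq > /sys/block/vdd/queue/scheduler   # switch the disk to bfq
>>> for i in $(seq 1 50); do
>>>     mkfs.xfs -f /dev/vdd                    # fresh xfs each round
>>>     mount /dev/vdd /mnt
>>>     dd if=/dev/zero of=/mnt/testfile bs=1M count=64
>>>     umount /mnt
>>> done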
>>> After several rounds of iteration, I can see different calltraces
>>> appear.
>>>
>>> For example, the one which happened frequently:
>>>
>>> Jan 03 11:35:19 linux-mainline kernel: XFS (vdd): Mounting V5 Filesystem
>>> Jan 03 11:35:19 linux-mainline kernel: XFS (vdd): Ending clean mount
>>> Jan 03 11:35:19 linux-mainline kernel: XFS (vdd): Unmounting Filesystem
>>> Jan 03 11:35:19 linux-mainline kernel: XFS (vdd): Mounting V5 Filesystem
>>> Jan 03 11:35:19 linux-mainline kernel: XFS (vdd): Ending clean mount
>>> Jan 03 11:35:19 linux-mainline kernel: BUG: unable to handle kernel paging request at 0000000000029ec0
>>> Jan 03 11:35:19 linux-mainline kernel: IP: __mod_node_page_state+0x5/0x50
>>> Jan 03 11:35:19 linux-mainline kernel: PGD 0 P4D 0
>>> Jan 03 11:35:19 linux-mainline kernel: Oops: 0000 [#1] SMP KASAN
>>> Jan 03 11:35:19 linux-mainline kernel: Modules linked in: bfq(E) joydev(E) uinput(E) fuse(E) af_packet(E) iscsi_ibft(E) iscsi_boot_sysfs(E) snd_hda_codec_generic(E) crct10dif_pclmul(E) crc32_pclmul(E) xfs(E) ghash_clmulni_intel(E) libcrc32c(E) crc32c_intel(E) pcbc(E) snd_hda_intel(E) snd_hda_codec(E) snd_hda_core(E) snd_hwdep(E) snd_pcm(E) ppdev(E) aesni_intel(E) snd_timer(E) aes_x86_64(E) crypto_simd(E) snd(E) glue_helper(E) cryptd(E) pcspkr(E) virtio_balloon(E) virtio_net(E) parport_pc(E) parport(E) soundcore(E) i2c_piix4(E) ext4(E) crc16(E) mbcache(E) jbd2(E) virtio_console(E) virtio_rng(E) virtio_blk(E) ata_generic(E) ata_piix(E) ahci(E) libahci(E) floppy(E) ehci_pci(E) qxl(E) serio_raw(E) drm_kms_helper(E) syscopyarea(E) sysfillrect(E) sysimgblt(E) fb_sys_fops(E) sym53c8xx(E) scsi_transport_spi(E) button(E) libata(E)
>>> Jan 03 11:35:19 linux-mainline kernel:  ttm(E) drm(E) uhci_hcd(E) ehci_hcd(E) usbcore(E) virtio_pci(E) virtio_ring(E) virtio(E) sg(E) dm_multipath(E) dm_mod(E) scsi_dh_rdac(E) scsi_dh_emc(E) scsi_dh_alua(E) scsi_mod(E) autofs4(E)
>>> Jan 03 11:35:19 linux-mainline kernel: CPU: 0 PID: 3349 Comm: ps Tainted: G E 4.15.0-rc1-69-default #1
>>> Jan 03 11:35:19 linux-mainline kernel: Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.9.1-0-gb3ef39f-prebuilt.qemu-project.org 04/01/2014
>>> Jan 03 11:35:19 linux-mainline kernel: task: ffff880061efce80 task.stack: ffff880058bd0000
>>> Jan 03 11:35:19 linux-mainline kernel: RIP: 0010:__mod_node_page_state+0x5/0x50
>>> Jan 03 11:35:19 linux-mainline kernel: RSP: 0018:ffff880058bd7ce8 EFLAGS: 00010a07
>>> Jan 03 11:35:19 linux-mainline kernel: RAX: 00000000000003ff RBX: ffffea00011a3d80 RCX: 00000000011a3d80
>>> Jan 03 11:35:19 linux-mainline kernel: RDX: ffffffffffffffff RSI: 000000000000000d RDI: 0000000000000000
>>> Jan 03 11:35:19 linux-mainline kernel: RBP: ffffffffffffffff R08: ffff88006378a630 R09: ffff880058bd7d98
>>> Jan 03 11:35:19 linux-mainline kernel: R10: 00007f7f4d806280 R11: 0000000000000000 R12: ffffea00011a3d80
>>> Jan 03 11:35:19 linux-mainline kernel: R13: 00007f7f4f318000 R14: 00007f7f4f31c000 R15: ffff880058bd7e18
>>> Jan 03 11:35:19 linux-mainline kernel: FS: 0000000000000000(0000) GS:ffff880066c00000(0000) knlGS:0000000000000000
>>> Jan 03 11:35:19 linux-mainline kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>> Jan 03 11:35:19 linux-mainline kernel: CR2: 0000000000029ec0 CR3: 0000000001c0d006 CR4: 00000000001606f0
>>> Jan 03 11:35:19 linux-mainline kernel: Call Trace:
>>> Jan 03 11:35:19 linux-mainline kernel:  page_remove_rmap+0x11a/0x2b0
>>> Jan 03 11:35:19 linux-mainline kernel:  unmap_page_range+0x547/0xa30
>>> Jan 03 11:35:19 linux-mainline kernel:  unmap_vmas+0x42/0x90
>>> Jan 03 11:35:19 linux-mainline kernel:  exit_mmap+0x86/0x180
>>> Jan 03 11:35:19 linux-mainline kernel:  mmput+0x4a/0x110
>>> Jan 03 11:35:19 linux-mainline kernel:  do_exit+0x25d/0xae0
>>> Jan 03 11:35:19 linux-mainline kernel:  do_group_exit+0x39/0xa0
>>> Jan 03 11:35:19 linux-mainline kernel:  SyS_exit_group+0x10/0x10
>>> Jan 03 11:35:19 linux-mainline kernel:  entry_SYSCALL_64_fastpath+0x1a/0x7d
>>> Jan 03 11:35:19 linux-mainline kernel: RIP: 0033:0x7f7f4eb8c338
>>> Jan 03 11:35:19 linux-mainline kernel: RSP: 002b:00007ffca4400d48 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7
>>> Jan 03 11:35:19 linux-mainline kernel: RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f7f4eb8c338
>>> Jan 03 11:35:19 linux-mainline kernel: RDX: 0000000000000000 RSI: 000000000000003c RDI: 0000000000000000
>>> Jan 03 11:35:19 linux-mainline kernel: RBP: 00007ffca4400d40 R08: 00000000000000e7 R09: fffffffffffffef8
>>> Jan 03 11:35:19 linux-mainline kernel: R10: 00007f7f4d806280 R11: 0000000000000246 R12: 00007f7f4f756000
>>> Jan 03 11:35:19 linux-mainline kernel: R13: 00007ffca4400cc8 R14: 00007f7f4f732b20 R15: 00007f7f4d5ebc70
>>> Jan 03 11:35:19 linux-mainline kernel: Code: f7 d9 48 39 ca 7c 05 65 88 50 0f c3 f0 48 01 94 f7 00 05 00 00 f0 48 01 14 f5 c0 c4 c0 81 31 d2 eb e5 0f 1f 40 00 0f 1f 44 00 00 <48> 8b 8f c0 9e 02 00 89 f6 48 8d 04 31 65 44 8a 40 01 4d 0f be
>>> Jan 03 11:35:19 linux-mainline kernel: RIP: __mod_node_page_state+0x5/0x50 RSP: ffff880058bd7ce8
>>> Jan 03 11:35:19 linux-mainline kernel: CR2: 0000000000029ec0
>>> Jan 03 11:35:19 linux-mainline kernel: ---[ end trace b5314eeef943a473 ]---
>>> Jan 03 11:35:19 linux-mainline kernel: Fixing recursive fault but reboot is needed!
>>> Jan 03 11:35:19 linux-mainline kernel: XFS (vdd): Unmounting Filesystem
>>>
>> Yes, this call trace may be related to bfq, because it concerns
>> (non-bfq) code that may get executed only when bfq is set as the I/O
>> scheduler. In fact, the failure happens on an exit_group, and bfq
>> supports cgroups, while neither mq-deadline nor kyber does.
>>
>> Did you also check what __mod_node_page_state+0x5 corresponds to in
>> your sources? Maybe this piece of information could ring some bell, at
>> least for people more expert than me on the involved code.
>
> It seems to be a NULL dereference: RDI (the pgdat argument) is
> 0000000000000000, and the faulting instruction is the load of
> pgdat->per_cpu_nodestats at line 338 below, so the reported fault
> address 0000000000029ec0 is simply that field's offset within
> struct pglist_data.
>
> (gdb) l *__mod_node_page_state+0x5
> 0x385 is in __mod_node_page_state (/usr/src/kernels/4.15.0-rc1-69-default/mm/vmstat.c:338).
> 333     EXPORT_SYMBOL(__mod_zone_page_state);
> 334
> 335     void __mod_node_page_state(struct pglist_data *pgdat, enum node_stat_item item,
> 336                                long delta)
> 337     {
> 338             struct per_cpu_nodestat __percpu *pcp = pgdat->per_cpu_nodestats;
> 339             s8 __percpu *p = pcp->vm_node_stat_diff + item;
> 340             long x;
> 341             long t;
> 342
>
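> For reference, the kernel's own helper scripts give the same
> information without gdb (a sketch; it assumes you run it from the
> build tree, with the matching vmlinux at hand and the oops text
> saved to oops.txt):
>
>     # resolve the symbol+offset from the oops to file:line
>     ./scripts/faddr2line vmlinux __mod_node_page_state+0x5
>
>     # disassemble the "Code:" line of the oops, marking the faulting instruction
>     ./scripts/decodecode < oops.txt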
> Let me know if you need anything else.

This failure seems related to the mm data structures of the task.
Unfortunately, I have no idea how a mistake in bfq could corrupt such
unrelated structures, unless bfq contains some serious error that
corrupts memory areas it does not own.

I hope someone can provide better insights,
Paolo

>>> Occasionally the mount process hangs forever.
>>> linux-mainline:~ # cat /proc/19627/stack
>>> [<ffffffff810a65f2>] io_schedule+0x12/0x40
>>> [<ffffffff8119fbb7>] wait_on_page_bit+0xd7/0x100
>>> [<ffffffff811b3713>] truncate_inode_pages_range+0x423/0x7c0
>>> [<ffffffff81273768>] set_blocksize+0x98/0xb0
>>> [<ffffffff81273798>] sb_set_blocksize+0x18/0x40
>>> [<ffffffffa06a2e58>] xfs_fs_fill_super+0x1b8/0x590 [xfs]
>>> [<ffffffff8123bd4d>] mount_bdev+0x17d/0x1b0
>>> [<ffffffff8123c6d4>] mount_fs+0x34/0x150
>>> [<ffffffff81259702>] vfs_kern_mount+0x62/0x110
>>> [<ffffffff8125bd1a>] do_mount+0x1ca/0xc30
>>> [<ffffffff8125ca6e>] SyS_mount+0x7e/0xd0
>>> [<ffffffff8172fff3>] entry_SYSCALL_64_fastpath+0x1a/0x7d
>>> [<ffffffffffffffff>] 0xffffffffffffffff
>>
>> Maybe this hang has to do with the one recently reported for USB
>> drives. We have already found the cause of that one, and we are
>> finalizing our fix.
>
> I am not sure it is the same one, since the test is run against a
> virtio disk, not USB, but I can try the fix anyway.
>
> Thanks,
> Guoqing