> Il giorno 20 apr 2021, alle ore 04:00, Yi Zhang <yi.zhang@xxxxxxxxxx> ha scritto: > > > On Wed, Apr 7, 2021 at 11:15 PM Yi Zhang <yi.zhang@xxxxxxxxxx> wrote: > > > On 4/2/21 9:39 PM, Paolo Valente wrote: > > > >> Il giorno 1 apr 2021, alle ore 03:27, Yi Zhang <yi.zhang@xxxxxxxxxx> ha scritto: > >> > >> Hi > > Hi > > > >> We reproduced this bfq regression[3] on ppc64le with blktests[2] on the latest linux-block/for-next branch, seems it was introduced with [1] from my bisecting, pls help check it. Let me know if you need any testing for it, thanks. > >> > > Thanks for reporting this bug and finding the candidate offending commit. Could you try this test with my dev kernel, which might provide more information? The kernel is here: > > https://github.com/Algodev-github/bfq-mq > > > > Alternatively, I could try to provide you with patches to instrument your kernel. > HI Paolo > I tried your dev kernel, but with no luck to reproduce it, could you > provide the debug patch based on latest linux-block/for-next? > > Hi Paolo > This issue has been consistently reproduced with LTP/fstests/blktests on recent linux-block/for-next, do you have a chance to check it? Hi Yi, all, I've been working hard to port my code-instrumentation layer to the kernel in for-next. I seem I finished the porting yesterday. I tested it but the system crashed. I'm going to analyze the oops. Maybe this freeze is caused by mistakes in this layer, maybe the instrumentation is already detecting a bug. In the first case, I'll fix the mistakes and try the tests suggested in this thread. Thanks, Paolo > > LTP > [ 7223.279313] LTP: starting dio30 (diotest6 -b 65536 -n 100 -i 100 -o 1024000) > [ 7320.660837] ------------[ cut here ]------------ > [ 7320.660862] kernel BUG at mm/slub.c:314! > [ 7320.660872] Oops: Exception in kernel mode, sig: 5 [#1] > [ 7320.660882] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries > [ 7320.660894] Modules linked in: dummy veth minix binfmt_misc can_raw can nfsv3 nfs_acl nfs lockd grace nfs_ssc fscache rds tun brd overlay fuse exfat vfat fat loop n_gsm pps_ldisc ppp_synctty mkiss ax25 ppp_async ppp_generic slcan slip slhc snd_hrtimer snd_seq snd_seq_device sctp ip6_udp_tunnel udp_tunnel snd_timer snd soundcore authenc pcrypt crypto_user sha3_generic n_hdlc rfkill sunrpc ibmveth pseries_rng crct10dif_vpmsum drm drm_panel_orientation_quirks i2c_core zram ip_tables xfs ibmvscsi scsi_transport_srp vmx_crypto crc32c_vpmsum [last unloaded: ltp_insmod01] > [ 7320.661052] CPU: 7 PID: 145059 Comm: diotest6 Tainted: G OE 5.12.0-rc7 #1 > [ 7320.661067] NIP: c00000000049b9b8 LR: c0000000004a161c CTR: c000000000935870 > [ 7320.661079] REGS: c0000000a9ae2f60 TRAP: 0700 Tainted: G OE (5.12.0-rc7) > [ 7320.661091] MSR: 8000000000029033 <SF,EE,ME,IR,DR,RI,LE> CR: 44008242 XER: 20040000 > [ 7320.661118] CFAR: c0000000004a1618 IRQMASK: 1 > [ 7320.661118] GPR00: c0000000004a161c c0000000a9ae3200 c000000001f10400 c000000007fdd500 > [ 7320.661118] GPR04: c00c00000005a700 c0000000169c59b0 c0000000169c5ac8 0000000000000001 > [ 7320.661118] GPR08: 0000000000000118 0000000000000001 c00c0000000586c0 0000000000000000 > [ 7320.661118] GPR12: 0000000000008000 c00000001eca6280 0000000000000000 0000000000000000 > [ 7320.661118] GPR16: 0000000002cba000 0000000000140000 0000000000000000 fffffffffffff77f > [ 7320.661118] GPR20: c0000007e4042780 0000000000000001 c0000000169c59b0 0000000000210d00 > [ 7320.661118] GPR24: c0000000169c59b0 0000000000000001 0000000000000001 c000000007fdd500 > [ 7320.661118] GPR28: c0000000169c59b0 000000008075002e c0000000169c59b0 c00c00000005a700 > [ 7320.661254] NIP [c00000000049b9b8] __slab_free+0x98/0x600 > [ 7320.661268] LR [c0000000004a161c] kmem_cache_free+0x25c/0x5e0 > [ 7320.661281] Call Trace: > [ 7320.661286] [c0000000a9ae3200] [0000000000000005] 0x5 (unreliable) > [ 7320.661301] [c0000000a9ae32c0] [c0000000004a161c] kmem_cache_free+0x25c/0x5e0 > [ 7320.661316] [c0000000a9ae3380] [c0000000009329f8] bfq_put_queue+0x1d8/0x2d0 > [ 7320.661330] [c0000000a9ae3420] [c0000000009350ec] bfq_init_rq+0x72c/0xc00 > [ 7320.661344] [c0000000a9ae3500] [c000000000935aa4] bfq_insert_requests+0x234/0x1600 > [ 7320.661359] [c0000000a9ae3650] [c0000000008fb264] blk_mq_sched_insert_requests+0xb4/0x1f0 > [ 7320.661374] [c0000000a9ae36a0] [c0000000008f32c8] blk_mq_flush_plug_list+0x138/0x1f0 > [ 7320.661390] [c0000000a9ae3710] [c0000000008e1538] blk_flush_plug_list+0x48/0x64 > [ 7320.661406] [c0000000a9ae3740] [c0000000008f3b1c] blk_mq_submit_bio+0x38c/0x710 > [ 7320.661421] [c0000000a9ae37e0] [c0000000008e0a94] submit_bio_noacct+0x3d4/0x680 > [ 7320.661436] [c0000000a9ae3880] [c0000000005e1048] iomap_dio_submit_bio+0xd8/0x100 > [ 7320.661452] [c0000000a9ae38b0] [c0000000005e1b48] iomap_dio_bio_actor+0x2f8/0x570 > [ 7320.661467] [c0000000a9ae3950] [c0000000005dbfc4] iomap_apply+0x1f4/0x3e0 > [ 7320.661481] [c0000000a9ae3a40] [c0000000005e1424] __iomap_dio_rw+0x204/0x5b0 > [ 7320.661495] [c0000000a9ae3b10] [c0000000005e17f0] iomap_dio_rw+0x20/0x80 > [ 7320.661509] [c0000000a9ae3b30] [c008000001cb2c34] xfs_file_dio_read+0x11c/0x1a0 [xfs] > [ 7320.661694] [c0000000a9ae3b70] [c008000001cb2dd4] xfs_file_read_iter+0x11c/0x170 [xfs] > [ 7320.661876] [c0000000a9ae3bb0] [c0000000004f8900] do_iter_readv_writev+0x130/0x240 > [ 7320.661893] [c0000000a9ae3c10] [c0000000004f935c] do_iter_read+0x14c/0x2b0 > [ 7320.661906] [c0000000a9ae3c60] [c0000000004f9568] vfs_readv+0x78/0xb0 > [ 7320.661918] [c0000000a9ae3d50] [c0000000004f9630] do_readv+0x90/0x1a0 > [ 7320.661931] [c0000000a9ae3db0] [c00000000002da24] system_call_exception+0x104/0x270 > [ 7320.661947] [c0000000a9ae3e10] [c00000000000d25c] system_call_common+0xec/0x278 > [ 7320.661961] --- interrupt: c00 at 0x7fff8699e724 > [ 7320.661970] NIP: 00007fff8699e724 LR: 00000000100051f0 CTR: 0000000000000000 > [ 7320.661982] REGS: c0000000a9ae3e80 TRAP: 0c00 Tainted: G OE (5.12.0-rc7) > [ 7320.661994] MSR: 800000000000d033 <SF,EE,PR,ME,IR,DR,RI,LE> CR: 24002444 XER: 00000000 > [ 7320.662020] IRQMASK: 0 > [ 7320.662020] GPR00: 0000000000000091 00007fffc2bb1e90 00007fff86a87100 0000000000000004 > [ 7320.662020] GPR04: 000001001f590470 0000000000000014 0000000000000000 0000000000000000 > [ 7320.662020] GPR08: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 > [ 7320.662020] GPR12: 0000000000000000 00007fff86b7a990 0000000000000000 0000000000000000 > [ 7320.662020] GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 > [ 7320.662020] GPR20: 0000000000000000 0000000000000022 0000000000000000 0000000000000022 > [ 7320.662020] GPR24: 0000000000000004 0000000000000003 000001001f590470 000001001f5905c0 > [ 7320.662020] GPR28: 00000000100505c8 0000000000010000 0000000002b7a000 0000000000000014 > [ 7320.662149] NIP [00007fff8699e724] 0x7fff8699e724 > [ 7320.662158] LR [00000000100051f0] 0x100051f0 > [ 7320.662167] --- interrupt: c00 > [ 7320.662174] Instruction dump: > [ 7320.662182] fbc100b0 3ee00021 7f390734 62f70d00 3b400001 ebdf0020 811b0028 ebbf0028 > [ 7320.662205] 7fc9e278 7cdc4214 7d290074 7929d182 <0b090000> 78c50022 54c7c03e e93b00b8 > [ 7320.662229] ---[ end trace 817605fc936991f3 ]--- > [ 7320.666013] > > fstests > [ 1271.938817] run fstests generic/421 at 2021-04-16 16:03:05 > [ 1272.963661] EXT4-fs (sda3): mounted filesystem with ordered data mode. Opts: acl,user_xattr. Quota mode: none. > [ 1272.969886] xfs_io (pid 192274) is setting deprecated v1 encryption policy; recommend upgrading to v2. > [ 1273.764719] EXT4-fs (sda3): mounted filesystem with ordered data mode. Opts: acl,user_xattr. Quota mode: none. > [ 1274.735175] Kernel attempted to read user page (30) - exploit attempt? (uid: 0) > [ 1274.735210] BUG: Kernel NULL pointer dereference on read at 0x00000030 > [ 1274.735223] Faulting instruction address: 0xc00000000092cc78 > [ 1274.735246] Oops: Kernel access of bad area, sig: 11 [#1] > [ 1274.735257] LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA PowerNV > [ 1274.735273] Modules linked in: dm_flakey rfkill i40iw ib_uverbs ib_core sunrpc joydev ses at24 enclosure i40e scsi_transport_sas regmap_i2c ofpart ipmi_powernv ipmi_devintf powernv_flash rtc_opal ipmi_msghandler crct10dif_vpmsum mtd opal_prd i2c_opal fuse zram ip_tables xfs ast i2c_algo_bit drm_vram_helper drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops cec drm_ttm_helper ttm drm vmx_crypto crc32c_vpmsum i2c_core aacraid drm_panel_orientation_quirks > [ 1274.735391] CPU: 48 PID: 192676 Comm: xfs_io Not tainted 5.12.0-rc7 #1 > [ 1274.735406] NIP: c00000000092cc78 LR: c000000000933c90 CTR: c000000000026570 > [ 1274.735430] REGS: c00000002e13f340 TRAP: 0300 Not tainted (5.12.0-rc7) > [ 1274.735443] MSR: 9000000000009033 <SF,HV,EE,ME,IR,DR,RI,LE> CR: 84024800 XER: 20040000 > [ 1274.735478] CFAR: c00000000092cc58 DAR: 0000000000000030 DSISR: 40000000 IRQMASK: 1 > [ 1274.735478] GPR00: c000000000933c90 c00000002e13f5e0 c000000001f10400 c000000050803000 > [ 1274.735478] GPR04: c000000050806a60 0000000002045800 c00000002e13f688 0000000000000000 > [ 1274.735478] GPR08: 0000000000000002 c000000069474ef8 0000000000000000 0000000100017cbc > [ 1274.735478] GPR12: 0000000000002000 c000001ffffbb800 0000000000000000 c0000008438c9ff8 > [ 1274.735478] GPR16: c0000008438c9e80 0000000000010000 0000000000000000 0000000000080001 > [ 1274.735478] GPR20: c00000002e13fa40 c000000042b04f60 0000000000060800 c000000069474ec0 > [ 1274.735478] GPR24: c0000000425ed800 0000000000000000 c000000841f10d20 0000000002045800 > [ 1274.735478] GPR28: c0000000508031f0 c000000069474ec0 0000000000000000 c000000050803000 > [ 1274.735694] NIP [c00000000092cc78] bfq_rq_pos_tree_lookup+0x48/0x110 > [ 1274.735729] LR [c000000000933c90] bfq_setup_cooperator+0x2f0/0x4f0 > [ 1274.735763] Call Trace: > [ 1274.735779] [c00000002e13f5e0] [c000000000934ed4] bfq_init_rq+0x534/0xc00 (unreliable) > [ 1274.735819] [c00000002e13f650] [c000000000933c90] bfq_setup_cooperator+0x2f0/0x4f0 > [ 1274.735845] [c00000002e13f6c0] [c000000000935bfc] bfq_insert_requests+0x3ac/0x1600 > [ 1274.735872] [c00000002e13f810] [c0000000008fb244] blk_mq_sched_insert_requests+0xb4/0x1f0 > [ 1274.735888] [c00000002e13f860] [c0000000008f32a8] blk_mq_flush_plug_list+0x138/0x1f0 > [ 1274.735906] [c00000002e13f8d0] [c0000000008de4a8] blk_finish_plug+0x68/0x90 > [ 1274.735922] [c00000002e13f900] [c0000000003d64e0] read_pages+0x1d0/0x4b0 > [ 1274.735938] [c00000002e13f980] [c0000000003d6e8c] page_cache_ra_unbounded+0x21c/0x300 > [ 1274.735954] [c00000002e13fa20] [c0000000003c778c] filemap_get_pages+0x11c/0x7d0 > [ 1274.735981] [c00000002e13fae0] [c0000000003c7f30] filemap_read+0xf0/0x4a0 > [ 1274.735997] [c00000002e13fc10] [c00000000063e3ac] ext4_file_read_iter+0x7c/0x270 > [ 1274.736013] [c00000002e13fc60] [c0000000004f85c0] new_sync_read+0x120/0x190 > [ 1274.736038] [c00000002e13fd00] [c0000000004fb5a0] vfs_read+0x1d0/0x240 > [ 1274.736052] [c00000002e13fd50] [c0000000004fb6c4] sys_pread64+0xb4/0xe0 > [ 1274.736067] [c00000002e13fdb0] [c00000000002da24] system_call_exception+0x104/0x270 > [ 1274.736084] [c00000002e13fe10] [c00000000000d25c] system_call_common+0xec/0x278 > [ 1274.736100] --- interrupt: c00 at 0x7fff8d1899c8 > [ 1274.736111] NIP: 00007fff8d1899c8 LR: 00007fff8d1899a8 CTR: 0000000000000000 > [ 1274.736125] REGS: c00000002e13fe80 TRAP: 0c00 Not tainted (5.12.0-rc7) > [ 1274.736138] MSR: 900000000280f033 <SF,HV,VEC,VSX,EE,PR,FP,ME,IR,DR,RI,LE> CR: 24022400 XER: 00000000 > [ 1274.736178] IRQMASK: 0 > [ 1274.736178] GPR00: 00000000000000b3 00007fffeb4b8360 00007fff8d1b7f00 0000000000000003 > [ 1274.736178] GPR04: 000000013a220000 0000000000001000 0000000000200000 00007fff8d256d08 > [ 1274.736178] GPR08: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 > [ 1274.736178] GPR12: 0000000000000000 00007fff8d25e400 0000000000000000 0000000000000000 > [ 1274.736178] GPR16: 0000000000000000 00007fff8d0601a8 0000000000000000 0000000000000000 > [ 1274.736178] GPR20: 0000000000000000 0000000000000000 0000000000000003 0000000000000000 > [ 1274.736178] GPR24: 0000000000200000 000000010aac3508 0000000000000000 0000000000200000 > [ 1274.736178] GPR28: 0000000000000000 0000000000000003 0000000000001000 0000000000200000 > [ 1274.736407] NIP [00007fff8d1899c8] 0x7fff8d1899c8 > [ 1274.736509] LR [00007fff8d1899a8] 0x7fff8d1899a8 > [ 1274.736560] --- interrupt: c00 > [ 1274.736695] Instruction dump: > [ 1274.736767] f8010010 f821ff91 e9240000 2c290000 4082001c 480000d4 ebe90008 38890008 > [ 1274.736885] 2c3f0000 4182002c 7fe9fb78 e9490028 <e94a0030> 7c255040 4181ffe0 4080005c > [ 1274.737028] ---[ end trace 25f5fc09dd3c38dd ]--- > > > blktests nvme-tcp/011 > > [10384.975371] run blktests nvme/011 at 2021-04-16 16:43:37 > [10385.017040] nvmet: adding nsid 1 to subsystem blktests-subsystem-1 > [10385.028953] nvmet_tcp: enabling port 0 ( > 127.0.0.1:4420 > ) > [10385.186103] nvmet: creating controller 1 for subsystem blktests-subsystem-1 for NQN nqn.2014-08.org.nvmexpress:uuid:688053f17e1241f385cf0632cd1d173a. > [10385.200147] nvmet: unhandled identify cns 6 on qid 0 > [10385.205293] nvme nvme0: creating 128 I/O queues. > [10385.224831] nvme nvme0: mapped 128/0/0 default/read/poll queues. > [10385.259232] nvme nvme0: new ctrl: NQN "blktests-subsystem-1", addr > 127.0.0.1:4420 > > [10387.700391] ------------[ cut here ]------------ > [10387.705007] kernel BUG at mm/slub.c:314! > [10387.708920] Internal error: Oops - BUG: 0 [#1] SMP > [10387.713701] Modules linked in: nvme_tcp nvme_fabrics nvmet_tcp nvmet nvme nvme_core loop dm_log_writes dm_flakey rpcrdma rdma_ucm ib_srpt ib_isert iscsi_target_mod target_core_mod ib_iser libiscsi ib_umad scsi_transport_iscsi rdma_cm iw_cm ib_ipoib ib_cm rfkill mlx5_ib ib_uverbs ib_core sunrpc coresight_etm4x i2c_smbus coresight_tmc coresight_replicator coresight_tpiu joydev mlx5_core mlxfw acpi_ipmi ipmi_ssif ipmi_devintf ipmi_msghandler coresight_funnel thunderx2_pmu coresight vfat fat fuse zram ip_tables xfs ast i2c_algo_bit drm_vram_helper crct10dif_ce ghash_ce drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops cec drm_ttm_helper ttm drm gpio_xlp i2c_xlp9xx uas usb_storage aes_neon_bs [last unloaded: nvmet] > [10387.777601] CPU: 13 PID: 0 Comm: swapper/13 Not tainted 5.12.0-rc7 #1 > [10387.784030] Hardware name: HPE Apollo 70 /C01_APACHE_MB , BIOS L50_5.13_1.16 07/29/2020 > [10387.793756] pstate: 60400089 (nZCv daIf +PAN -UAO -TCO BTYPE=--) > [10387.799751] pc : set_freepointer+0x30/0x34 > [10387.8038 39] lr : kmem_cf fff000886aea300 x24: ffff8000126 aea300 x20: fff00 000000000001 x113: 0000000000000068 x12: ffff8000110e84e8 > [10387.859102] x11: ffff800012973d90 x10: 000000000000002f > [10387.864402] x9 : ffff8000106fd7ac x8 : 0000000000000017 > [10387.869702] x7 : 0000000000000001 x6 : 000001eca7a44c15 > [10387.875002] x5 : 00ffffffffffffff x4 : 0000000000000118 > [10387.880302] x3 : ffff000886aea418 x2 : ffff000886aea300 > [10387.885602] x1 : ffff000886aea300 x0 : ffff00080814f400 > [10387.890903] Call trace: > [10387.893337] set_freepointer+0x30/0x34 > [10387.897075] bfq_put_queue+0x15c/0x1e4 > [10387.900815] bfq_finish_requeue_request+0xdc/0x114 > [10387.905594] blk_mq_free_request+0x4c/0x1c0 > [10387.909767] __blk_mq_end_request+0x108/0x13c > [10387.914111] scsi_end_request+0xcc/0x1f0 > [10387.918023] scsi_io_completion+0x70/0x180 > [10387.922107] scsi_finish_command+0x108/0x140 > [10387.926367] scsi_softirq_done+0x7c/0x190 > [10387.930364] blk_complete_reqs+0x5c/0x74 > [10387.934277] blk_done_softirq+0x2c/0x3c > [10387.938100] __do_softirq+0x128/0x388 > [10387.941752] __irq_exit_rcu+0x16c/0x174 > [10387.945578] irq_exit+0x1c/0x30 > [10387.948708] __handle_domain_irq+0x8c/0xec > [10387.952794] gic_handle_irq+0x5c/0xdc > [10387.956443] el1_irq+0xb8/0x140 > [10387.959573] arch_cpu_idle+0x18/0x30 > [10387.963138] default_idle_call+0x4c/0x160 > [10387.967137] cpuidle_idle_call+0x14c/0x1a0 > [10387.971224] do_idle+0xb4/0x104 > [10387.974354] cpu_startup_entry+0x30/0x9c > [10387.978265] secondary_start_kernel+0xf4/0x120 > [10387.982699] 0x0 > [10387.984530] Code: ca000042 ca030042 f8246822 d65f03c0 (d4210000) > [10387.990612] ---[ end trace fcc292b65af7c325 ]--- > [10387.995218] Kernel panic - not syncing: Oops - BUG: Fatal exception in interrupt > [10388.002601] SMP: stopping secondary CPUs > [10389.075031] SMP: failed to stop secondary CPUs 0-12,14-223 > [10389.080513] Kernel Offset: disabled > [10389.083988] CPU features: 0x00046002,63000838 > [10389.088333] Memory Limit: none > [10389.091454] ---[ end Kernel panic - not syncing: Oops - BUG: Fatal exception in interrupt ]--- >