Re: [bisected] bfq regression on latest linux-block/for-next

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




> Il giorno 20 apr 2021, alle ore 04:00, Yi Zhang <yi.zhang@xxxxxxxxxx> ha scritto:
> 
> 
> On Wed, Apr 7, 2021 at 11:15 PM Yi Zhang <yi.zhang@xxxxxxxxxx> wrote:
> 
> 
> On 4/2/21 9:39 PM, Paolo Valente wrote:
> >
> >> Il giorno 1 apr 2021, alle ore 03:27, Yi Zhang <yi.zhang@xxxxxxxxxx> ha scritto:
> >>
> >> Hi
> > Hi
> >
> >> We reproduced this bfq regression[3] on ppc64le with blktests[2] on the latest linux-block/for-next branch, seems it was introduced with [1] from my bisecting, pls help check it. Let me know if you need any testing for it, thanks.
> >>
> > Thanks for reporting this bug and finding the candidate offending commit. Could you try this test with my dev kernel, which might provide more information? The kernel is here:
> > https://github.com/Algodev-github/bfq-mq
> >
> > Alternatively, I could try to provide you with patches to instrument your kernel.
> HI Paolo
> I tried your dev kernel, but with no luck to reproduce it, could you 
> provide the debug patch based on latest linux-block/for-next?
> 
> Hi Paolo
> This issue has been consistently reproduced with LTP/fstests/blktests on recent linux-block/for-next, do you have a chance to check it?

Hi Yi, all,
I've been working hard to port my code-instrumentation layer to the kernel in for-next. I seem I finished the porting yesterday. I tested it but the system crashed. I'm going to analyze the oops. Maybe this freeze is caused by mistakes in this layer, maybe the instrumentation is already detecting a bug. In the first case, I'll fix the mistakes and try the tests suggested in this thread.

Thanks,
Paolo

> 
> LTP
> [ 7223.279313] LTP: starting dio30 (diotest6 -b 65536 -n 100 -i 100 -o 1024000) 
> [ 7320.660837] ------------[ cut here ]------------ 
> [ 7320.660862] kernel BUG at mm/slub.c:314! 
> [ 7320.660872] Oops: Exception in kernel mode, sig: 5 [#1] 
> [ 7320.660882] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries 
> [ 7320.660894] Modules linked in: dummy veth minix binfmt_misc can_raw can nfsv3 nfs_acl nfs lockd grace nfs_ssc fscache rds tun brd overlay fuse exfat vfat fat loop n_gsm pps_ldisc ppp_synctty mkiss ax25 ppp_async ppp_generic slcan slip slhc snd_hrtimer snd_seq snd_seq_device sctp ip6_udp_tunnel udp_tunnel snd_timer snd soundcore authenc pcrypt crypto_user sha3_generic n_hdlc rfkill sunrpc ibmveth pseries_rng crct10dif_vpmsum drm drm_panel_orientation_quirks i2c_core zram ip_tables xfs ibmvscsi scsi_transport_srp vmx_crypto crc32c_vpmsum [last unloaded: ltp_insmod01] 
> [ 7320.661052] CPU: 7 PID: 145059 Comm: diotest6 Tainted: G           OE     5.12.0-rc7 #1 
> [ 7320.661067] NIP:  c00000000049b9b8 LR: c0000000004a161c CTR: c000000000935870 
> [ 7320.661079] REGS: c0000000a9ae2f60 TRAP: 0700   Tainted: G           OE      (5.12.0-rc7) 
> [ 7320.661091] MSR:  8000000000029033 <SF,EE,ME,IR,DR,RI,LE>  CR: 44008242  XER: 20040000 
> [ 7320.661118] CFAR: c0000000004a1618 IRQMASK: 1  
> [ 7320.661118] GPR00: c0000000004a161c c0000000a9ae3200 c000000001f10400 c000000007fdd500  
> [ 7320.661118] GPR04: c00c00000005a700 c0000000169c59b0 c0000000169c5ac8 0000000000000001  
> [ 7320.661118] GPR08: 0000000000000118 0000000000000001 c00c0000000586c0 0000000000000000  
> [ 7320.661118] GPR12: 0000000000008000 c00000001eca6280 0000000000000000 0000000000000000  
> [ 7320.661118] GPR16: 0000000002cba000 0000000000140000 0000000000000000 fffffffffffff77f  
> [ 7320.661118] GPR20: c0000007e4042780 0000000000000001 c0000000169c59b0 0000000000210d00  
> [ 7320.661118] GPR24: c0000000169c59b0 0000000000000001 0000000000000001 c000000007fdd500  
> [ 7320.661118] GPR28: c0000000169c59b0 000000008075002e c0000000169c59b0 c00c00000005a700  
> [ 7320.661254] NIP [c00000000049b9b8] __slab_free+0x98/0x600 
> [ 7320.661268] LR [c0000000004a161c] kmem_cache_free+0x25c/0x5e0 
> [ 7320.661281] Call Trace: 
> [ 7320.661286] [c0000000a9ae3200] [0000000000000005] 0x5 (unreliable) 
> [ 7320.661301] [c0000000a9ae32c0] [c0000000004a161c] kmem_cache_free+0x25c/0x5e0 
> [ 7320.661316] [c0000000a9ae3380] [c0000000009329f8] bfq_put_queue+0x1d8/0x2d0 
> [ 7320.661330] [c0000000a9ae3420] [c0000000009350ec] bfq_init_rq+0x72c/0xc00 
> [ 7320.661344] [c0000000a9ae3500] [c000000000935aa4] bfq_insert_requests+0x234/0x1600 
> [ 7320.661359] [c0000000a9ae3650] [c0000000008fb264] blk_mq_sched_insert_requests+0xb4/0x1f0 
> [ 7320.661374] [c0000000a9ae36a0] [c0000000008f32c8] blk_mq_flush_plug_list+0x138/0x1f0 
> [ 7320.661390] [c0000000a9ae3710] [c0000000008e1538] blk_flush_plug_list+0x48/0x64 
> [ 7320.661406] [c0000000a9ae3740] [c0000000008f3b1c] blk_mq_submit_bio+0x38c/0x710 
> [ 7320.661421] [c0000000a9ae37e0] [c0000000008e0a94] submit_bio_noacct+0x3d4/0x680 
> [ 7320.661436] [c0000000a9ae3880] [c0000000005e1048] iomap_dio_submit_bio+0xd8/0x100 
> [ 7320.661452] [c0000000a9ae38b0] [c0000000005e1b48] iomap_dio_bio_actor+0x2f8/0x570 
> [ 7320.661467] [c0000000a9ae3950] [c0000000005dbfc4] iomap_apply+0x1f4/0x3e0 
> [ 7320.661481] [c0000000a9ae3a40] [c0000000005e1424] __iomap_dio_rw+0x204/0x5b0 
> [ 7320.661495] [c0000000a9ae3b10] [c0000000005e17f0] iomap_dio_rw+0x20/0x80 
> [ 7320.661509] [c0000000a9ae3b30] [c008000001cb2c34] xfs_file_dio_read+0x11c/0x1a0 [xfs] 
> [ 7320.661694] [c0000000a9ae3b70] [c008000001cb2dd4] xfs_file_read_iter+0x11c/0x170 [xfs] 
> [ 7320.661876] [c0000000a9ae3bb0] [c0000000004f8900] do_iter_readv_writev+0x130/0x240 
> [ 7320.661893] [c0000000a9ae3c10] [c0000000004f935c] do_iter_read+0x14c/0x2b0 
> [ 7320.661906] [c0000000a9ae3c60] [c0000000004f9568] vfs_readv+0x78/0xb0 
> [ 7320.661918] [c0000000a9ae3d50] [c0000000004f9630] do_readv+0x90/0x1a0 
> [ 7320.661931] [c0000000a9ae3db0] [c00000000002da24] system_call_exception+0x104/0x270 
> [ 7320.661947] [c0000000a9ae3e10] [c00000000000d25c] system_call_common+0xec/0x278 
> [ 7320.661961] --- interrupt: c00 at 0x7fff8699e724 
> [ 7320.661970] NIP:  00007fff8699e724 LR: 00000000100051f0 CTR: 0000000000000000 
> [ 7320.661982] REGS: c0000000a9ae3e80 TRAP: 0c00   Tainted: G           OE      (5.12.0-rc7) 
> [ 7320.661994] MSR:  800000000000d033 <SF,EE,PR,ME,IR,DR,RI,LE>  CR: 24002444  XER: 00000000 
> [ 7320.662020] IRQMASK: 0  
> [ 7320.662020] GPR00: 0000000000000091 00007fffc2bb1e90 00007fff86a87100 0000000000000004  
> [ 7320.662020] GPR04: 000001001f590470 0000000000000014 0000000000000000 0000000000000000  
> [ 7320.662020] GPR08: 0000000000000000 0000000000000000 0000000000000000 0000000000000000  
> [ 7320.662020] GPR12: 0000000000000000 00007fff86b7a990 0000000000000000 0000000000000000  
> [ 7320.662020] GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000  
> [ 7320.662020] GPR20: 0000000000000000 0000000000000022 0000000000000000 0000000000000022  
> [ 7320.662020] GPR24: 0000000000000004 0000000000000003 000001001f590470 000001001f5905c0  
> [ 7320.662020] GPR28: 00000000100505c8 0000000000010000 0000000002b7a000 0000000000000014  
> [ 7320.662149] NIP [00007fff8699e724] 0x7fff8699e724 
> [ 7320.662158] LR [00000000100051f0] 0x100051f0 
> [ 7320.662167] --- interrupt: c00 
> [ 7320.662174] Instruction dump: 
> [ 7320.662182] fbc100b0 3ee00021 7f390734 62f70d00 3b400001 ebdf0020 811b0028 ebbf0028  
> [ 7320.662205] 7fc9e278 7cdc4214 7d290074 7929d182 <0b090000> 78c50022 54c7c03e e93b00b8  
> [ 7320.662229] ---[ end trace 817605fc936991f3 ]--- 
> [ 7320.666013] 
> 
>  fstests
> [ 1271.938817] run fstests generic/421 at 2021-04-16 16:03:05 
> [ 1272.963661] EXT4-fs (sda3): mounted filesystem with ordered data mode. Opts: acl,user_xattr. Quota mode: none. 
> [ 1272.969886] xfs_io (pid 192274) is setting deprecated v1 encryption policy; recommend upgrading to v2. 
> [ 1273.764719] EXT4-fs (sda3): mounted filesystem with ordered data mode. Opts: acl,user_xattr. Quota mode: none. 
> [ 1274.735175] Kernel attempted to read user page (30) - exploit attempt? (uid: 0) 
> [ 1274.735210] BUG: Kernel NULL pointer dereference on read at 0x00000030 
> [ 1274.735223] Faulting instruction address: 0xc00000000092cc78 
> [ 1274.735246] Oops: Kernel access of bad area, sig: 11 [#1] 
> [ 1274.735257] LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA PowerNV 
> [ 1274.735273] Modules linked in: dm_flakey rfkill i40iw ib_uverbs ib_core sunrpc joydev ses at24 enclosure i40e scsi_transport_sas regmap_i2c ofpart ipmi_powernv ipmi_devintf powernv_flash rtc_opal ipmi_msghandler crct10dif_vpmsum mtd opal_prd i2c_opal fuse zram ip_tables xfs ast i2c_algo_bit drm_vram_helper drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops cec drm_ttm_helper ttm drm vmx_crypto crc32c_vpmsum i2c_core aacraid drm_panel_orientation_quirks 
> [ 1274.735391] CPU: 48 PID: 192676 Comm: xfs_io Not tainted 5.12.0-rc7 #1 
> [ 1274.735406] NIP:  c00000000092cc78 LR: c000000000933c90 CTR: c000000000026570 
> [ 1274.735430] REGS: c00000002e13f340 TRAP: 0300   Not tainted  (5.12.0-rc7) 
> [ 1274.735443] MSR:  9000000000009033 <SF,HV,EE,ME,IR,DR,RI,LE>  CR: 84024800  XER: 20040000 
> [ 1274.735478] CFAR: c00000000092cc58 DAR: 0000000000000030 DSISR: 40000000 IRQMASK: 1  
> [ 1274.735478] GPR00: c000000000933c90 c00000002e13f5e0 c000000001f10400 c000000050803000  
> [ 1274.735478] GPR04: c000000050806a60 0000000002045800 c00000002e13f688 0000000000000000  
> [ 1274.735478] GPR08: 0000000000000002 c000000069474ef8 0000000000000000 0000000100017cbc  
> [ 1274.735478] GPR12: 0000000000002000 c000001ffffbb800 0000000000000000 c0000008438c9ff8  
> [ 1274.735478] GPR16: c0000008438c9e80 0000000000010000 0000000000000000 0000000000080001  
> [ 1274.735478] GPR20: c00000002e13fa40 c000000042b04f60 0000000000060800 c000000069474ec0  
> [ 1274.735478] GPR24: c0000000425ed800 0000000000000000 c000000841f10d20 0000000002045800  
> [ 1274.735478] GPR28: c0000000508031f0 c000000069474ec0 0000000000000000 c000000050803000  
> [ 1274.735694] NIP [c00000000092cc78] bfq_rq_pos_tree_lookup+0x48/0x110 
> [ 1274.735729] LR [c000000000933c90] bfq_setup_cooperator+0x2f0/0x4f0 
> [ 1274.735763] Call Trace: 
> [ 1274.735779] [c00000002e13f5e0] [c000000000934ed4] bfq_init_rq+0x534/0xc00 (unreliable) 
> [ 1274.735819] [c00000002e13f650] [c000000000933c90] bfq_setup_cooperator+0x2f0/0x4f0 
> [ 1274.735845] [c00000002e13f6c0] [c000000000935bfc] bfq_insert_requests+0x3ac/0x1600 
> [ 1274.735872] [c00000002e13f810] [c0000000008fb244] blk_mq_sched_insert_requests+0xb4/0x1f0 
> [ 1274.735888] [c00000002e13f860] [c0000000008f32a8] blk_mq_flush_plug_list+0x138/0x1f0 
> [ 1274.735906] [c00000002e13f8d0] [c0000000008de4a8] blk_finish_plug+0x68/0x90 
> [ 1274.735922] [c00000002e13f900] [c0000000003d64e0] read_pages+0x1d0/0x4b0 
> [ 1274.735938] [c00000002e13f980] [c0000000003d6e8c] page_cache_ra_unbounded+0x21c/0x300 
> [ 1274.735954] [c00000002e13fa20] [c0000000003c778c] filemap_get_pages+0x11c/0x7d0 
> [ 1274.735981] [c00000002e13fae0] [c0000000003c7f30] filemap_read+0xf0/0x4a0 
> [ 1274.735997] [c00000002e13fc10] [c00000000063e3ac] ext4_file_read_iter+0x7c/0x270 
> [ 1274.736013] [c00000002e13fc60] [c0000000004f85c0] new_sync_read+0x120/0x190 
> [ 1274.736038] [c00000002e13fd00] [c0000000004fb5a0] vfs_read+0x1d0/0x240 
> [ 1274.736052] [c00000002e13fd50] [c0000000004fb6c4] sys_pread64+0xb4/0xe0 
> [ 1274.736067] [c00000002e13fdb0] [c00000000002da24] system_call_exception+0x104/0x270 
> [ 1274.736084] [c00000002e13fe10] [c00000000000d25c] system_call_common+0xec/0x278 
> [ 1274.736100] --- interrupt: c00 at 0x7fff8d1899c8 
> [ 1274.736111] NIP:  00007fff8d1899c8 LR: 00007fff8d1899a8 CTR: 0000000000000000 
> [ 1274.736125] REGS: c00000002e13fe80 TRAP: 0c00   Not tainted  (5.12.0-rc7) 
> [ 1274.736138] MSR:  900000000280f033 <SF,HV,VEC,VSX,EE,PR,FP,ME,IR,DR,RI,LE>  CR: 24022400  XER: 00000000 
> [ 1274.736178] IRQMASK: 0  
> [ 1274.736178] GPR00: 00000000000000b3 00007fffeb4b8360 00007fff8d1b7f00 0000000000000003  
> [ 1274.736178] GPR04: 000000013a220000 0000000000001000 0000000000200000 00007fff8d256d08  
> [ 1274.736178] GPR08: 0000000000000000 0000000000000000 0000000000000000 0000000000000000  
> [ 1274.736178] GPR12: 0000000000000000 00007fff8d25e400 0000000000000000 0000000000000000  
> [ 1274.736178] GPR16: 0000000000000000 00007fff8d0601a8 0000000000000000 0000000000000000  
> [ 1274.736178] GPR20: 0000000000000000 0000000000000000 0000000000000003 0000000000000000  
> [ 1274.736178] GPR24: 0000000000200000 000000010aac3508 0000000000000000 0000000000200000  
> [ 1274.736178] GPR28: 0000000000000000 0000000000000003 0000000000001000 0000000000200000  
> [ 1274.736407] NIP [00007fff8d1899c8] 0x7fff8d1899c8 
> [ 1274.736509] LR [00007fff8d1899a8] 0x7fff8d1899a8 
> [ 1274.736560] --- interrupt: c00 
> [ 1274.736695] Instruction dump: 
> [ 1274.736767] f8010010 f821ff91 e9240000 2c290000 4082001c 480000d4 ebe90008 38890008  
> [ 1274.736885] 2c3f0000 4182002c 7fe9fb78 e9490028 <e94a0030> 7c255040 4181ffe0 4080005c  
> [ 1274.737028] ---[ end trace 25f5fc09dd3c38dd ]--- 
> 
> 
> blktests nvme-tcp/011
> 
> [10384.975371] run blktests nvme/011 at 2021-04-16 16:43:37 
> [10385.017040] nvmet: adding nsid 1 to subsystem blktests-subsystem-1 
> [10385.028953] nvmet_tcp: enabling port 0 (
> 127.0.0.1:4420
> ) 
> [10385.186103] nvmet: creating controller 1 for subsystem blktests-subsystem-1 for NQN nqn.2014-08.org.nvmexpress:uuid:688053f17e1241f385cf0632cd1d173a. 
> [10385.200147] nvmet: unhandled identify cns 6 on qid 0 
> [10385.205293] nvme nvme0: creating 128 I/O queues. 
> [10385.224831] nvme nvme0: mapped 128/0/0 default/read/poll queues. 
> [10385.259232] nvme nvme0: new ctrl: NQN "blktests-subsystem-1", addr 
> 127.0.0.1:4420
>  
> [10387.700391] ------------[ cut here ]------------ 
> [10387.705007] kernel BUG at mm/slub.c:314! 
> [10387.708920] Internal error: Oops - BUG: 0 [#1] SMP 
> [10387.713701] Modules linked in: nvme_tcp nvme_fabrics nvmet_tcp nvmet nvme nvme_core loop dm_log_writes dm_flakey rpcrdma rdma_ucm ib_srpt ib_isert iscsi_target_mod target_core_mod ib_iser libiscsi ib_umad scsi_transport_iscsi rdma_cm iw_cm ib_ipoib ib_cm rfkill mlx5_ib ib_uverbs ib_core sunrpc coresight_etm4x i2c_smbus coresight_tmc coresight_replicator coresight_tpiu joydev mlx5_core mlxfw acpi_ipmi ipmi_ssif ipmi_devintf ipmi_msghandler coresight_funnel thunderx2_pmu coresight vfat fat fuse zram ip_tables xfs ast i2c_algo_bit drm_vram_helper crct10dif_ce ghash_ce drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops cec drm_ttm_helper ttm drm gpio_xlp i2c_xlp9xx uas usb_storage aes_neon_bs [last unloaded: nvmet] 
> [10387.777601] CPU: 13 PID: 0 Comm: swapper/13 Not tainted 5.12.0-rc7 #1 
> [10387.784030] Hardware name: HPE Apollo 70             /C01_APACHE_MB         , BIOS L50_5.13_1.16 07/29/2020 
> [10387.793756] pstate: 60400089 (nZCv daIf +PAN -UAO -TCO BTYPE=--) 
> [10387.799751] pc : set_freepointer+0x30/0x34 
> [10387.8038 39] lr : kmem_cf fff000886aea300  x24: ffff8000126 aea300 x20: fff00 000000000001 x113: 0000000000000068 x12: ffff8000110e84e8  
> [10387.859102] x11: ffff800012973d90 x10: 000000000000002f  
> [10387.864402] x9 : ffff8000106fd7ac x8 : 0000000000000017  
> [10387.869702] x7 : 0000000000000001 x6 : 000001eca7a44c15  
> [10387.875002] x5 : 00ffffffffffffff x4 : 0000000000000118  
> [10387.880302] x3 : ffff000886aea418 x2 : ffff000886aea300  
> [10387.885602] x1 : ffff000886aea300 x0 : ffff00080814f400  
> [10387.890903] Call trace: 
> [10387.893337]  set_freepointer+0x30/0x34 
> [10387.897075]  bfq_put_queue+0x15c/0x1e4 
> [10387.900815]  bfq_finish_requeue_request+0xdc/0x114 
> [10387.905594]  blk_mq_free_request+0x4c/0x1c0 
> [10387.909767]  __blk_mq_end_request+0x108/0x13c 
> [10387.914111]  scsi_end_request+0xcc/0x1f0 
> [10387.918023]  scsi_io_completion+0x70/0x180 
> [10387.922107]  scsi_finish_command+0x108/0x140 
> [10387.926367]  scsi_softirq_done+0x7c/0x190 
> [10387.930364]  blk_complete_reqs+0x5c/0x74 
> [10387.934277]  blk_done_softirq+0x2c/0x3c 
> [10387.938100]  __do_softirq+0x128/0x388 
> [10387.941752]  __irq_exit_rcu+0x16c/0x174 
> [10387.945578]  irq_exit+0x1c/0x30 
> [10387.948708]  __handle_domain_irq+0x8c/0xec 
> [10387.952794]  gic_handle_irq+0x5c/0xdc 
> [10387.956443]  el1_irq+0xb8/0x140 
> [10387.959573]  arch_cpu_idle+0x18/0x30 
> [10387.963138]  default_idle_call+0x4c/0x160 
> [10387.967137]  cpuidle_idle_call+0x14c/0x1a0 
> [10387.971224]  do_idle+0xb4/0x104 
> [10387.974354]  cpu_startup_entry+0x30/0x9c 
> [10387.978265]  secondary_start_kernel+0xf4/0x120 
> [10387.982699]  0x0 
> [10387.984530] Code: ca000042 ca030042 f8246822 d65f03c0 (d4210000)  
> [10387.990612] ---[ end trace fcc292b65af7c325 ]--- 
> [10387.995218] Kernel panic - not syncing: Oops - BUG: Fatal exception in interrupt 
> [10388.002601] SMP: stopping secondary CPUs 
> [10389.075031] SMP: failed to stop secondary CPUs 0-12,14-223 
> [10389.080513] Kernel Offset: disabled 
> [10389.083988] CPU features: 0x00046002,63000838 
> [10389.088333] Memory Limit: none 
> [10389.091454] ---[ end Kernel panic - not syncing: Oops - BUG: Fatal exception in interrupt ]--- 
> 





[Index of Archives]     [Linux RAID]     [Linux SCSI]     [Linux ATA RAID]     [IDE]     [Linux Wireless]     [Linux Kernel]     [ATH6KL]     [Linux Bluetooth]     [Linux Netdev]     [Kernel Newbies]     [Security]     [Git]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Device Mapper]

  Powered by Linux