Re: v4.16-rc1 + dm-mpath + BFQ

+Jens, Mike

> On 30 Mar 2018, at 01:16, Bart Van Assche <Bart.VanAssche@xxxxxxx> wrote:
> 
> On Thu, 2018-03-29 at 11:02 +0200, Paolo Valente wrote:
>>> On 1 Mar 2018, at 02:35, Bart Van Assche <bart.vanassche@xxxxxxx> wrote:
>>> Thank you for having shared your kernel config off-list. After having
>>> made the following changes to your kernel config I was able to run the
>>> srp-test software:
>>> * Enable CONFIG_DM_MULTIPATH_QL, CONFIG_DM_MULTIPATH_ST,
>>> CONFIG_SCSI_DH_RDAC, CONFIG_SCSI_DH_EMC and CONFIG_SCSI_DH_ALUA.
>>> * Disable CONFIG_KASAN. Apparently there is an incompatibility between the
>>> rdma_rxe driver and KASAN. I'm still analyzing this.
>>> 
>>> Please let me know whether these changes also allow you to run the srp-test
>>> software and whether you can reproduce what I reported at the start of this
>>> e-mail thread.
>>> 
>> 
>> Thanks for these new instructions, and sorry for the long delay.  I've
>> modified the config as per your suggestions (you can find my new
>> config attached), and retried.
>> 
>> Unfortunately, same failure:
>> $ sudo ./run_tests -c -d -r 10 -t 02-mq -e bfq
>> Unloaded the ib_srpt kernel module
>> Unloaded the rdma_rxe kernel module
>> SoftRoCE network interfaces: rxe0
>> Zero-initializing /dev/ram0 ... done
>> Zero-initializing /dev/ram1 ... done
>> mkdir: cannot create directory "021c:42ff:fe4c:fac9": Invalid argument
>> Retrying with old port name format
>> mkdir: cannot create directory "0xfe80000000000000021c42fffe4cfac9": Invalid argument
> 
> Hello Paolo,
> 

Hi

> With your kernel config and I/O scheduler "none" srp-test runs reliably
> on my test setup.

I tried with "none" too, but it fails as well:
$ sudo ./run_tests -c -d -r 10 -t 02-mq -e none
[sudo] password for paolo: 
Unloaded the ib_srpt kernel module
Unloaded the rdma_rxe kernel module
SoftRoCE network interfaces: rxe0
insmod: ERROR: could not insert module /lib/modules/4.16.0-rc1+/kernel/drivers/infiniband/ulp/srpt/ib_srpt.ko: File exists

> The result for the BFQ scheduler is available below.


Thanks for pasting it.

According to the stack trace, the cause of the problem may still be
some missing initialization in request cloning, like the one I
reported in [1], in a thread that you started after a failure rather
similar to the present one.

Mike and Jens took care of solving that issue (which had more general
implications than just driving BFQ crazy).  Unfortunately I can't
remember how that story ended, and I got lost among the threads
while trying to reconstruct it.

Mike, Jens, I guess you ended up making a fix; if so, do you have any
idea how your fix relates to this new (?) issue?  This one occurs
after an end_clone_request, instead of after a dm_mq_queue_rq as the
previous one did.  Or, more generally, does this issue ring a bell?

[1] https://www.spinics.net/lists/dm-devel/msg32088.html
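
To make the comparison between the two traces concrete, here is a small
sketch, in Python, that extracts the symbol names from the call trace
quoted at the end of this message and checks which of the two functions
shows up.  The regular expression, the helper names (frames,
reliable_symbols) and the abbreviated TRACE excerpt are only an
illustration; they are not part of srp-test or of any kernel tooling.

```python
import re

# Excerpt of the quoted call trace (innermost frame first), abbreviated.
TRACE = """\
> elv_rb_del+0x24/0x30
> bfq_remove_request+0x9a/0x2e0 [bfq]
> bfq_finish_requeue_request+0x2e1/0x3b0 [bfq]
> blk_mq_free_request+0x5f/0x1a0
> blk_put_request+0x23/0x60
> multipath_release_clone+0xe/0x10
> dm_softirq_done+0xe3/0x270
> __blk_mq_complete_request+0xfd/0x190
> blk_mq_complete_request+0x69/0xa0
> dm_complete_request+0x22/0x30
> end_clone_request+0x1d/0x20
> ? scsi_dec_host_busy+0xa6/0x130
"""

# "> [? ]symbol+0xOFFSET/0xSIZE [module]"; a leading '?' marks a frame
# that the unwinder considered unreliable.
FRAME = re.compile(
    r"^>\s*(\?\s*)?(?P<sym>[A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+"
)


def frames(text):
    """Return (symbol, reliable) pairs in the order they appear."""
    out = []
    for line in text.splitlines():
        m = FRAME.match(line)
        if m:
            out.append((m.group("sym"), m.group(1) is None))
    return out


def reliable_symbols(text):
    """Return only the symbols of frames not prefixed with '?'."""
    return [sym for sym, ok in frames(text) if ok]


syms = reliable_symbols(TRACE)
print("end_clone_request" in syms, "dm_mq_queue_rq" in syms)
```

On this excerpt, end_clone_request is found and dm_mq_queue_rq is not,
which is just the observation made above, restated mechanically.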

> If
> the srp-test software did not start on your setup I assume that you are
> using another kernel version? Which kernel version did you use?
> 

Still 4.16-rc1, since that is the version for which you reported this
issue in the first place.


Thanks,
Paolo

> Thanks,
> 
> Bart.
> 
> 
> 
> 
> BUG: unable to handle kernel NULL pointer dereference at 0000000000000200
> IP: rb_erase+0x284/0x380
> PGD 0 P4D 0 
> Oops: 0002 [#1] SMP PTI
> Modules linked in: ib_srp libcrc32c scsi_transport_srp ib_srpt target_core_iblock target_core_mod rdma_cm iw_cm ib_cm scsi_debug brd rdma_rxe ip6_udp_tunnel udp_tunnel ib_umad ib_uverbs ib_core kyber_iosched bfq crct10dif_pclmul crc32_pclmul ghash_clmulni_intel serio_raw virtio_balloon virtio_console multipath virtio_net virtio_blk virtio_scsi ata_generic crc32c_intel virtio_pci virtio_ring virtio pata_acpi [last unloaded: ip6_udp_tunnel]
> CPU: 3 PID: 28 Comm: ksoftirqd/3 Not tainted 4.16.0-rc7-dbg+ #2
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.0.0-prebuilt.qemu-project.org 04/01/2014
> RIP: 0010:rb_erase+0x284/0x380
> RSP: 0000:ffffa5ad0040f908 EFLAGS: 00010206
> RAX: ffffde9f81e9b700 RBX: ffff9445775b1380 RCX: 0000000000000000
> RDX: ffffde9f81e9b700 RSI: ffff9445652e1380 RDI: ffff9445775b13e0
> RBP: ffffa5ad0040f908 R08: 0000000000000200 R09: 0000000000000002
> R10: 0000000000000001 R11: ffffffffaf25f020 R12: ffff9445775b13e0
> R13: ffff944564376800 R14: ffff944576328000 R15: 0000000000000001
> FS:  0000000000000000(0000) GS:ffff94457fd80000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 0000000000000200 CR3: 000000006b210001 CR4: 00000000003606e0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> Call Trace:
> elv_rb_del+0x24/0x30
> bfq_remove_request+0x9a/0x2e0 [bfq]
> bfq_finish_requeue_request+0x2e1/0x3b0 [bfq]
> blk_mq_free_request+0x5f/0x1a0
> blk_put_request+0x23/0x60
> multipath_release_clone+0xe/0x10
> dm_softirq_done+0xe3/0x270
> __blk_mq_complete_request+0xfd/0x190
> blk_mq_complete_request+0x69/0xa0
> dm_complete_request+0x22/0x30
> end_clone_request+0x1d/0x20
> __blk_mq_end_request+0x5b/0x70
> scsi_end_request+0xba/0x220
> scsi_io_completion+0x4f1/0x700
> ? scsi_dec_host_busy+0xa6/0x130
> scsi_finish_command+0xef/0x140
> scsi_softirq_done+0x11f/0x170
> __blk_mq_complete_request+0xfd/0x190
> blk_mq_complete_request+0x69/0xa0
> scsi_mq_done+0x34/0x100
> srp_recv_done+0x2f6/0xa40 [ib_srp]
> ? rxe_poll_cq+0x13a/0x150 [rdma_rxe]
> __ib_process_cq+0x83/0xc0 [ib_core]
> ib_poll_handler+0x2b/0x80 [ib_core]
> irq_poll_softirq+0x90/0x140
> __do_softirq+0xcf/0x4b1
> run_ksoftirqd+0x33/0x50
> smpboot_thread_fn+0xfc/0x170
> kthread+0x121/0x140
> ? sort_range+0x30/0x30
> ? kthread_create_worker_on_cpu+0x70/0x70
> ret_from_fork+0x3a/0x50
> Code: 83 e2 01 0f 85 45 fe ff ff 5d c3 4c 89 0e 4d 85 d2 0f 84 28 fe ff ff 48 83 c8 01 48 89 0a 49 89 02 5d c3 4d 85 c0 4c 89 06 74 9c <49> 89 10 5d c3 48 89 0e 5d c3 4d 89 48 10 eb d3 4d 8b 50 08 4c
> RIP: rb_erase+0x284/0x380 RSP: ffffa5ad0040f908
> CR2: 0000000000000200
> ---[ end trace 29e2f703ddaa3232 ]---
> Kernel panic - not syncing: Fatal exception in interrupt
> Kernel Offset: 0x2d000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
> ---[ end Kernel panic - not syncing: Fatal exception in interrupt
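
As a reading aid for the oops quoted above, here is a minimal sketch,
in Python, that decodes the "Oops: 0002" error code using the standard
x86 page-fault error-code bits.  The bit meanings are architectural;
the constant and function names (PF_*, decode_oops_code) are chosen
for this example only.  It says nothing about the root cause; it only
spells out what kind of access faulted.

```python
# x86 page-fault error-code bits (architectural, not specific to this bug).
PF_PROT = 1 << 0   # 0: page not present, 1: protection violation
PF_WRITE = 1 << 1  # 0: read access,      1: write access
PF_USER = 1 << 2   # 0: kernel mode,      1: user mode
PF_RSVD = 1 << 3   # reserved bit set in a page-table entry
PF_INSTR = 1 << 4  # fault happened on an instruction fetch


def decode_oops_code(code_hex):
    """Decode the hexadecimal number printed after 'Oops:'."""
    code = int(code_hex, 16)
    return {
        "cause": "protection violation" if code & PF_PROT else "page not present",
        "access": "write" if code & PF_WRITE else "read",
        "mode": "user" if code & PF_USER else "kernel",
        "reserved_bit": bool(code & PF_RSVD),
        "instruction_fetch": bool(code & PF_INSTR),
    }


# Values copied from the quoted oops.
info = decode_oops_code("0002")
cr2 = int("0000000000000200", 16)  # faulting address (CR2)
r08 = int("0000000000000200", 16)  # R08 at the time of the fault

print(info["mode"], info["access"], info["cause"], hex(cr2), cr2 == r08)
```

For this oops the result is a kernel-mode write to a non-present page
at address 0x200, and CR2 matches the value of R08 in the register
dump.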