Hi,

My system has been failing with recent kernels (4.12.x and 4.13-rc2)
with a NULL pointer dereference at the stack trace given at the end of
this email. This happens when simply running 'ib_write_bw -R <server>'
with a Chelsio T6 (cxgb4).

I've bisected (log attached) to find the offending commit to be:

commit 1e7710f3f6563940bb6bbc94aa8eadfd344a86af
Author: Matan Barak <matanb@xxxxxxxxxxxx>

    IB/core: Change completion channel to use the reworked objects schema

Reverting this commit (and the dependent commits db1b5ddd53365 and
e0fcc61113c that also fix other bugs with this commit) from v4.12.3
fixes the issue.

I did the bisect with the userspace libraries in Debian Stretch but I
also had this bug with rdma-core v14. I was pretty sure v4.12 kernels
worked for me in the past but likely only before I upgraded from Jessie
to Stretch.

Thanks,

Logan

PS. As a side rant, this bug was found after a very *frustrating* day of
what was supposed to be the 20 minute task of getting my RDMA cards
plugged in again. I tried with both CX4s and the T6s (and I'm still not
sure if my CX4s work yet). Instead, it turns out there's a whole mess of
bugs in the kernel I had to go up against. I went back and forth between
different versions of the userspace libraries because I was sure 4.11
worked -- but it turned out 4.11.10+, 4.12.x and who knows what other
stable kernels are currently broken by the bug fixed in [1]. And there
was a whole other bug that broke things that was fixed in the 4.12-rc
series that I had to carefully bisect around to find the one reported
above. So frustrating!!

[1] 5a7a88f1b488e4ee49eb3d5b82612d4d9ffdf2c3

--

[ 53.320439] iwpm_register_pid: Unable to send a nlmsg (client = 2)
[ 54.738579] BUG: unable to handle kernel NULL pointer dereference at 0000000000000058
[ 54.747439] IP: _raw_spin_lock_irqsave+0x10/0x30
[ 54.752719] PGD 0
[ 54.752721] P4D 0
[ 54.755049]
[ 54.759109] Oops: 0002 [#1] SMP
[ 54.762699] Modules linked in:
[ 54.766195] CPU: 0 PID: 5 Comm: kworker/u16:0 Not tainted 4.13.0-rc2.direct #708
[ 54.774536] Hardware name: Supermicro SYS-7047GR-TRF/X9DRG-QF, BIOS 3.0a 12/05/2013
[ 54.783182] Workqueue: iw_cxgb4 process_work
[ 54.788036] task: ffff880276a5ee80 task.stack: ffffc900000c4000
[ 54.794728] RIP: 0010:_raw_spin_lock_irqsave+0x10/0x30
[ 54.800552] RSP: 0018:ffffc900000c7c70 EFLAGS: 00010046
[ 54.806473] RAX: 0000000000000000 RBX: 0000000000000002 RCX: 0000000000000000
[ 54.814524] RDX: 0000000000000001 RSI: 0000000000000058 RDI: 0000000000000058
[ 54.822583] RBP: ffff880470484600 R08: 0000000000000001 R09: 0000000000000001
[ 54.830663] R10: 0000000000000040 R11: ffff88047420b400 R12: 0000000000000282
[ 54.838744] R13: ffffc900000c7dc0 R14: 0000000000000001 R15: ffff880470484600
[ 54.846825] FS: 0000000000000000(0000) GS:ffff880277c00000(0000) knlGS:0000000000000000
[ 54.855997] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 54.862522] CR2: 0000000000000058 CR3: 0000000001e0a000 CR4: 00000000000406f0
[ 54.870602] Call Trace:
[ 54.873442] ? ib_uverbs_comp_handler+0x20/0xe0
[ 54.878610] ? flush_qp+0x6e/0x2b0
[ 54.882514] ? c4iw_modify_qp+0x11c2/0x1870
[ 54.887295] ? close_con_rpl+0xe7/0x170
[ 54.891686] ? kfree_skb+0x33/0x90
[ 54.895592] ? skb_dequeue+0x52/0x60
[ 54.899690] ? process_work+0x4a/0x60
[ 54.903887] ? process_one_work+0x1c2/0x3e0
[ 54.908664] ? worker_thread+0x47/0x3d0
[ 54.913056] ? kthread+0xfc/0x130
[ 54.916864] ? create_worker+0x180/0x180
[ 54.921353] ? kthread_create_on_node+0x40/0x40
[ 54.926521] ? ret_from_fork+0x22/0x30
[ 54.930811] Code: c0 74 05 e8 b3 1c 73 ff 48 89 d8 5b c3 0f 1f 40 00 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90 53 9c 5b fa 31 c0 ba 01 00 00 00 <f0> 0f b1 17 85 c0 75 05 48 89 d8 5b c3 89 c6 e8 9c 09 73 ff 48
[ 54.952099] RIP: _raw_spin_lock_irqsave+0x10/0x30 RSP: ffffc900000c7c70
[ 54.959598] CR2: 0000000000000058
[ 54.963405] ---[ end trace 896cfe0234c949d2 ]---
[ 102.633421] random: crng init done

git bisect start
# good: [a351e9b9fc24e982ec2f0e76379a49826036da12] Linux 4.11
git bisect good a351e9b9fc24e982ec2f0e76379a49826036da12
# bad: [2ea659a9ef488125eb46da6eb571de5eae5c43f6] Linux 4.12-rc1
git bisect bad 2ea659a9ef488125eb46da6eb571de5eae5c43f6
# good: [221656e7c4ce342b99c31eca96c1cbb6d1dce45f] Merge tag 'sound-4.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
git bisect good 221656e7c4ce342b99c31eca96c1cbb6d1dce45f
# bad: [c6a677c6f37bb7abc85ba7e3465e82b9f7eb1d91] Merge tag 'staging-4.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging
git bisect bad c6a677c6f37bb7abc85ba7e3465e82b9f7eb1d91
# bad: [e579dde654fc2c6b0d3e4b77a9a4b2d2405c510e] Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
git bisect bad e579dde654fc2c6b0d3e4b77a9a4b2d2405c510e
# bad: [a96480723c287c502b02659f4b347aecaa651ea1] Merge tag 'for-linus-4.12b-rc0b-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip
git bisect bad a96480723c287c502b02659f4b347aecaa651ea1
# good: [16a12fa9aed176444fc795b09e796be41902bb08] Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input
git bisect good 16a12fa9aed176444fc795b09e796be41902bb08
# bad: [1684096b1ed813f621fb6cbd06e72235c1c2a0ca] Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma
git bisect bad 1684096b1ed813f621fb6cbd06e72235c1c2a0ca
# bad: [e821303c428eedcc20746224d590b11c7000a7e5] iw_cxgb4: Use dsgl by default
git bisect bad e821303c428eedcc20746224d590b11c7000a7e5
# bad: [515ed4f3aab4e8a0855d0cdfd9753a419ccfb297] IB/IPoIB: Separate control and data related initializations
git bisect bad 515ed4f3aab4e8a0855d0cdfd9753a419ccfb297
# bad: [f7b42633720deb5ca8f4bcb175c7dc2933057e7f] IB/hfi1: Ensure VL index is within bounds
git bisect bad f7b42633720deb5ca8f4bcb175c7dc2933057e7f
# bad: [8688426ba6464f7079649f52cf9108856c419415] IB/hfi1: Cache registers during state change
git bisect bad 8688426ba6464f7079649f52cf9108856c419415
# good: [cf8966b3477d5e6545393bb4499f2051ea554c62] IB/core: Add support for fd objects
git bisect good cf8966b3477d5e6545393bb4499f2051ea554c62
# bad: [771a52584096c45e4565e8aabb596eece9d73d61] IB/IPoIB: ibX: failed to create mcg debug file
git bisect bad 771a52584096c45e4565e8aabb596eece9d73d61
# bad: [cd6ce4a5737829052abc4ffc8befd0adfff8998d] IB/hns: Explicitly include linux/of.h
git bisect bad cd6ce4a5737829052abc4ffc8befd0adfff8998d
# bad: [1e7710f3f6563940bb6bbc94aa8eadfd344a86af] IB/core: Change completion channel to use the reworked objects schema
git bisect bad 1e7710f3f6563940bb6bbc94aa8eadfd344a86af
# first bad commit: [1e7710f3f6563940bb6bbc94aa8eadfd344a86af] IB/core: Change completion channel to use the reworked objects schema