Re: NVMe over fabrics Kernel warning with RDMA over RXE

On Wed, Jul 25, 2018 at 12:59:25PM -0700, Vijay Immanuel wrote:
> On Wed, Jul 25, 2018 at 09:11:34AM +0000, Nalla, Pradeep wrote:
> > Hello Vijay
> > 
> > 
> > Thanks for the help; it worked. "fio" random write is working fine.
> > 
> > But now I am facing the following error when I try a sequential write:
> > 
> >
> 
> Hi Pradeep,
> 
> Looks like you are running into packet drop and error recovery. I have a couple of patches in review that may help. Please pick up the following three patches from this mailing list and retry the test:
> [PATCH] IB/rxe: avoid back-to-back retries
> [PATCH] IB/rxe: fixes for rdma read retry
> [PATCH] IB/rxe: fix for duplicate request processing and ack psns
> 
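Picking up patches from the list usually means saving each "[PATCH] IB/rxe: ..." mail as an mbox file and applying them with `git am` on top of the tree under test. A self-contained sketch of that workflow (a throwaway repo stands in for the 4.17-rc4 kernel tree; the file and patch names are illustrative):

```shell
set -e
tmp=$(mktemp -d) && cd "$tmp"
git init -q repo && cd repo
git config user.email you@example.com
git config user.name "you"
echo base > rxe_comp.c
git add rxe_comp.c && git commit -qm "base tree"

# Stand-in for one mail saved from the list: make the fix, export it as
# an mbox-format patch, then rewind to the unpatched tree.
echo fixed > rxe_comp.c
git commit -aqm "IB/rxe: avoid back-to-back retries"
git format-patch -1 -o ../patches >/dev/null
git reset -q --hard HEAD~1

# Apply the saved mail(s) exactly as you would the three from the list.
git am -q ../patches/*.patch
git log --oneline -1
```

On a real tree the only difference is where the mbox files come from (saved from your mail client or the list archive); `git am` preserves the authorship and changelog of each patch.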

Hi Pradeep,

Did the patches help? Were you able to get sequential writes to work?

Thanks,

Vijay

> > fio --filename=/dev/nvme0n1 --rw=write --ioengine=libaio --direct=1 --blocksize=128K --size=10G --iodepth=32 --group_reporting --name=myjob
> > 
> > ----------------------------------------
> > 
> > [ 1632.960773] CPU: 4 PID: 0 Comm: swapper/4 Kdump: loaded Not tainted 4.17.0-rc4 #1
> > [ 1632.960774] Hardware name: Dell Inc. PowerEdge T130/06FW8M, BIOS 2.1.4 04/13/2017
> > [ 1632.960775] RIP: 0010:rxe_completer+0xb3b/0xbd0 [rdma_rxe]
> > [ 1632.960776] RSP: 0018:ffff8d05efd03e80 EFLAGS: 00010246
> > [ 1632.960777] RAX: 0000000000000000 RBX: ffff8d051d3dfa28 RCX: ffff8d05d7eda000
> > [ 1632.960777] RDX: ffffffffc0deda67 RSI: 0000000000000002 RDI: 0000000000000008
> > [ 1632.960778] RBP: ffffb109c3600580 R08: 00000000000000dd R09: 0000000000000020
> > [ 1632.960778] R10: 0000012800000000 R11: 0000000000037400 R12: 0000000000000000
> > [ 1632.960779] R13: 000000000000000c R14: ffff8d051d3dfa00 R15: ffff8d051ce8a040
> > [ 1632.960779] FS:  0000000000000000(0000) GS:ffff8d05efd00000(0000) knlGS:0000000000000000
> > [ 1632.960780] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [ 1632.960781] CR2: 0000560cc00e2948 CR3: 00000003f300a003 CR4: 00000000003606e0
> > [ 1632.960782] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > [ 1632.960782] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > [ 1632.960782] Call Trace:
> > [ 1632.960783]  <IRQ>
> > [ 1632.960786]  rxe_do_task+0x8b/0x100 [rdma_rxe]
> > [ 1632.960788]  tasklet_action_common.isra.20+0xf3/0x100
> > [ 1632.960790]  __do_softirq+0xd2/0x280
> > [ 1632.960791]  irq_exit+0xd5/0xe0
> > [ 1632.960792]  do_IRQ+0x4c/0xd0
> > [ 1632.960793]  common_interrupt+0xf/0xf
> > [ 1632.960794]  </IRQ>
> > [ 1632.960795] RIP: 0010:cpuidle_enter_state+0xd9/0x260
> > [ 1632.960796] RSP: 0018:ffffb109c195be88 EFLAGS: 00000246 ORIG_RAX: ffffffffffffffd5
> > [ 1632.960797] RAX: ffff8d05efd22500 RBX: ffff8d05efd2bb00 RCX: 000000000000001f
> > [ 1632.960797] RDX: 0000000000000000 RSI: ffffffe8710e09a3 RDI: 0000000000000000
> > [ 1632.960798] RBP: 0000000000000001 R08: 0000000000000004 R09: 00000000ffffffff
> > [ 1632.960798] R10: 0000000000000032 R11: 0000000000000008 R12: 0000000000000004
> > [ 1632.960798] R13: 0000017c3406fe48 R14: 0000000000000004 R15: 0000017c340749ac
> > [ 1632.960800]  ? cpuidle_enter_state+0xc7/0x260
> > [ 1632.960802]  do_idle+0x1d8/0x280
> > [ 1632.960803]  cpu_startup_entry+0x6f/0x80
> > [ 1632.960805]  start_secondary+0x1a5/0x200
> > [ 1632.960806]  secondary_startup_64+0xa5/0xb0
> > 
> > I am doing this testing to find the bottlenecks in Soft RoCE performance.
> > 
> > Thanks
> > 
> > Pradeep.
> > 
> > ________________________________
> > From: Vijay Immanuel <vijayi@xxxxxxxxxxxxxxxxx>
> > Sent: Wednesday, July 25, 2018 1:18:23 AM
> > To: Nalla, Pradeep
> > Cc: linux-rdma@xxxxxxxxxxxxxxx
> > Subject: Re: NVMe over fabrics Kernel warning with RDMA over RXE
> > 
> > External Email
> > 
> > On Tue, Jul 24, 2018 at 03:14:05PM +0000, Nalla, Pradeep wrote:
> > > Hi all,
> > >
> > >
> > > I am testing NVMe over fabrics on linux-4.17.0-rc4 (on CentOS Linux release 7.4) with Soft RoCE as transport.
> > >
> > > I used nvme-cli to connect to the NVMe target over fabrics, and was successful in connecting and listing the device.
> > >
> > > ./nvme connect -t rdma -n testsubsystem -a 15.15.15.2 -s 4420
> > > ./nvme list
> > > Node             SN                   Model                                    Namespace Usage                      Format           FW Rev
> > > ---------------- -------------------- ---------------------------------------- --------- -------------------------- ---------------- --------
> > > /dev/nvme0n1     be76ebbcf555a121     Linux                                    1         400.09  GB / 400.09  GB    512   B +  0 B   4.17.0-r
> > >
> > >
> > > But when I use "fio" to do random writes to the NVMe device, I see a kernel warning, and after some time the target server is inaccessible.
> > >
> > > fio --filename=/dev/nvme0n1 --ioengine=libaio --direct=1 --norandommap --randrepeat=0 --runtime=600 --blocksize=4K --rw=randwrite --iodepth=32 --numjobs=8 --group_reporting --name=myjob
> > >
> > >
> > > -----------------------------------------------------------------------------
> > >
> > > Jul 24 20:08:54 compute-559 kernel: CPU: 5 PID: 0 Comm: swapper/5 Not tainted 4.17.0-rc4 #1
> > > Jul 24 20:08:54 compute-559 kernel: Hardware name: Dell Inc. PowerEdge T130/06FW8M, BIOS 2.1.4 04/13/2017
> > > Jul 24 20:08:54 compute-559 kernel: RIP: 0010:__local_bh_enable_ip+0x35/0x60
> > > Jul 24 20:08:54 compute-559 kernel: RSP: 0018:ffff9889afd43a78 EFLAGS: 00010006
> > > Jul 24 20:08:54 compute-559 kernel: RAX: 0000000080010200 RBX: ffff98898e80aa08 RCX: 0000000000000000
> > > Jul 24 20:08:54 compute-559 kernel: RDX: 000000000000003c RSI: 0000000000000200 RDI: ffffffffc015bbb2
> > > Jul 24 20:08:54 compute-559 kernel: RBP: ffff98899b44fc1e R08: 0000000000000001 R09: ffff98899a892a00
> > > Jul 24 20:08:54 compute-559 kernel: R10: ffff9889977163c0 R11: ffffffffc09d1300 R12: ffff98898e80aa78
> > > Jul 24 20:08:54 compute-559 kernel: R13: ffffffffc0160618 R14: ffff9888fdfe1d00 R15: ffff98898f702000
> > > Jul 24 20:08:54 compute-559 kernel: FS:  0000000000000000(0000) GS:ffff9889afd40000(0000) knlGS:0000000000000000
> > > Jul 24 20:08:54 compute-559 kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > Jul 24 20:08:54 compute-559 kernel: CR2: 00007f2c7b5565b0 CR3: 00000003be00a001 CR4: 00000000003606e0
> > > Jul 24 20:08:54 compute-559 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > > Jul 24 20:08:54 compute-559 kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > > Jul 24 20:08:54 compute-559 kernel: Call Trace:
> > > Jul 24 20:08:54 compute-559 kernel: <IRQ>
> > > Jul 24 20:08:54 compute-559 kernel: ipt_do_table+0x34e/0x650 [ip_tables]
> > > Jul 24 20:08:54 compute-559 kernel: ? unwind_get_return_address+0x1c/0x30
> > > Jul 24 20:08:54 compute-559 kernel: ? __save_stack_trace+0x75/0x100
> > > Jul 24 20:08:54 compute-559 kernel: ? nf_ct_get_tuple+0x61/0xa0 [nf_conntrack]
> > > Jul 24 20:08:54 compute-559 kernel: ? udp_packet+0x79/0x80 [nf_conntrack]
> > > Jul 24 20:08:54 compute-559 kernel: ? nf_conntrack_in+0x1ba/0x540 [nf_conntrack]
> > > Jul 24 20:08:54 compute-559 kernel: iptable_mangle_hook+0x7d/0xf0 [iptable_mangle]
> > > Jul 24 20:08:54 compute-559 kernel: nf_hook_slow+0x3d/0xb0
> > > Jul 24 20:08:54 compute-559 kernel: __ip_local_out+0xf6/0x120
> > > Jul 24 20:08:54 compute-559 kernel: ? neigh_key_eq32+0x10/0x10
> > > Jul 24 20:08:54 compute-559 kernel: ip_local_out+0x17/0x40
> > > Jul 24 20:08:54 compute-559 kernel: rxe_send+0x9a/0x110 [rdma_rxe]
> > > Jul 24 20:08:54 compute-559 kernel: rxe_requester+0x97e/0x11f0 [rdma_rxe]
> > > Jul 24 20:08:54 compute-559 kernel: rxe_do_task+0x8b/0x100 [rdma_rxe]
> > > Jul 24 20:08:54 compute-559 kernel: rxe_post_send+0x3f4/0x550 [rdma_rxe]
> > > Jul 24 20:08:54 compute-559 kernel: nvmet_rdma_queue_response+0xeb/0x1a0 [nvmet_rdma]
> > > Jul 24 20:08:54 compute-559 kernel: ? i40e_clean_rx_irq+0x3b5/0xcf0 [i40e]
> > > Jul 24 20:08:54 compute-559 kernel: nvmet_req_complete+0x11/0x40 [nvmet]
> > > Jul 24 20:08:54 compute-559 kernel: nvmet_bio_done+0x2b/0x40 [nvmet]
> > > Jul 24 20:08:54 compute-559 kernel: blk_update_request+0x95/0x2f0
> > > Jul 24 20:08:54 compute-559 kernel: blk_mq_end_request+0x1a/0xc0
> > > Jul 24 20:08:54 compute-559 kernel: blk_mq_complete_request+0xa1/0x110
> > > Jul 24 20:08:54 compute-559 kernel: nvme_irq+0x12f/0x1e0 [nvme]
> > > Jul 24 20:08:54 compute-559 kernel: __handle_irq_event_percpu+0x40/0x1a0
> > > Jul 24 20:08:54 compute-559 kernel: handle_irq_event_percpu+0x30/0x70
> > > Jul 24 20:08:54 compute-559 kernel: handle_irq_event+0x36/0x60
> > > Jul 24 20:08:54 compute-559 kernel: handle_edge_irq+0x90/0x190
> > > Jul 24 20:08:54 compute-559 kernel: handle_irq+0xb1/0x130
> > > Jul 24 20:08:54 compute-559 kernel: ? tick_irq_enter+0x9c/0xb0
> > > Jul 24 20:08:54 compute-559 kernel: do_IRQ+0x43/0xd0
> > > Jul 24 20:08:54 compute-559 kernel: common_interrupt+0xf/0xf
> > > ----------------------------------------------------
> > >
> > >
> > > Could anyone please let me know a way out?
> > >
> > > Thanks for the support
> > >
> > >
> > > Regards,
> > >
> > > Pradeep.
> > 
> > You'll need commit <1661d3b0e2183ce90f6611641c350a5aa02aaa80>. Please upgrade to v4.17 and retry.
> > 
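A quick way to tell whether a fix commit is already reachable from the tree you are running is `git merge-base --is-ancestor`, which exits 0 when the first commit is an ancestor of the second. A self-contained sketch (the two-commit repo built here is a stand-in; on a real kernel checkout you would test 1661d3b0e218... against HEAD):

```shell
set -e
tmp=$(mktemp -d) && cd "$tmp"
git init -q repo && cd repo
git config user.email you@example.com
git config user.name "you"

git commit -q --allow-empty -m "IB/rxe: fix"   # stand-in for the fix commit
fix=$(git rev-parse HEAD)
git commit -q --allow-empty -m "later work"

# Exit status 0 means the fix is already in your history.
if git merge-base --is-ancestor "$fix" HEAD; then
    echo "fix present, no need to backport"
fi
```

If the check fails on your tree, upgrading to a tag that contains the commit (v4.17 in this case) or cherry-picking it are the usual options.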


