On Sun, Mar 10, 2019 at 07:37:56PM +0000, Parav Pandit wrote:
>
>
> > -----Original Message-----
> > From: Parav Pandit
> > Sent: Sunday, March 10, 2019 2:24 PM
> > To: 'Yuval Shaia' <yuval.shaia@xxxxxxxxxx>
> > Cc: Bart Van Assche <bvanassche@xxxxxxx>; Ira Weiny
> > <ira.weiny@xxxxxxxxx>; Leon Romanovsky <leon@xxxxxxxxxx>; Dennis
> > Dalessandro <dennis.dalessandro@xxxxxxxxx>; linux-rdma@xxxxxxxxxxxxxxx;
> > Marcel Apfelbaum <marcel.apfelbaum@xxxxxxxxx>; Kamal Heib
> > <kheib@xxxxxxxxxx>
> > Subject: RE: [EXPERIMENTAL v1 0/4] RDMA loopback device
> >
> >
> >
> > > -----Original Message-----
> > > From: Yuval Shaia <yuval.shaia@xxxxxxxxxx>
> > > Sent: Sunday, March 10, 2019 2:23 PM
> > > To: Parav Pandit <parav@xxxxxxxxxxxx>
> > > Cc: Bart Van Assche <bvanassche@xxxxxxx>; Ira Weiny
> > > <ira.weiny@xxxxxxxxx>; Leon Romanovsky <leon@xxxxxxxxxx>; Dennis
> > > Dalessandro <dennis.dalessandro@xxxxxxxxx>;
> > > linux-rdma@xxxxxxxxxxxxxxx; Marcel Apfelbaum
> > > <marcel.apfelbaum@xxxxxxxxx>; Kamal Heib <kheib@xxxxxxxxxx>
> > > Subject: Re: [EXPERIMENTAL v1 0/4] RDMA loopback device
> > >
> > > > (hint, as a starting point please provide a fix to avoid crash in
> > > > memory registration in rxe:-) ).
> > >
> > > I'm not aware of a crash in memory registration, can you describe the
> > > steps to reproduce?
> > >
> > ib_send_bw -x 1 -d rxe0 -a
> > ib_send_bw -x 1 -d rxe0 -a <ip_address>
> I did a quick run now on 5.0.0-rc7, it is not crashing, which used to crash for me on 5.0.0-rc5.
> Seems better now.
>
> It's running at 1.6Gbps compared to loopback at 50Gbps, but hey, we can ignore the 50x performance. :-)

No, we can't ignore it - this is a huge motivation to enhance RXE with memcpy!!

>
> With write bw I hit a soft lockup,
> kernel:watchdog: BUG: soft lockup - CPU#63 stuck for 22s! [ksoftirqd/63:328]
>
> kernel: irq event stamp: 354570533
> kernel: hardirqs last enabled at (354570532): [<ffffffff92c23f12>] _raw_read_unlock_irqrestore+0x32/0x60
> kernel: hardirqs last disabled at (354570533): [<ffffffff92403717>] trace_hardirqs_off_thunk+0x1a/0x1c
> kernel: softirqs last enabled at (20353810): [<ffffffff93000325>] __do_softirq+0x325/0x3cf
> kernel: softirqs last disabled at (20353815): [<ffffffff9249eea5>] run_ksoftirqd+0x35/0x50
> kernel: CPU: 32 PID: 173 Comm: ksoftirqd/32 Kdump: loaded Tainted: G L 5.0.0-rc7-vdevbus+ #2
> kernel: Hardware name: Supermicro SYS-6028U-TR4+/X10DRU-i+, BIOS 2.0b 08/09/2016
> kernel: rxe_responder+0x941/0x1ff0 [rdma_rxe]
> kernel: ? __lock_acquire+0x240/0xf60
> kernel: ? find_held_lock+0x31/0xa0
> kernel: ? find_held_lock+0x31/0xa0
> kernel: ? rxe_do_task+0x7e/0xf0 [rdma_rxe]
> kernel: ? _raw_spin_unlock_irqrestore+0x32/0x51
> kernel: rxe_do_task+0x85/0xf0 [rdma_rxe]
> kernel: rxe_rcv+0x346/0x840 [rdma_rxe]
> kernel: ? copy_data+0x113/0x240 [rdma_rxe]
> kernel: rxe_requester+0x7c8/0x1060 [rdma_rxe]
> kernel: rxe_do_task+0x85/0xf0 [rdma_rxe]
> kernel: tasklet_action_common.isra.19+0x187/0x1a0
> kernel: __do_softirq+0xd0/0x3cf
> kernel: run_ksoftirqd+0x35/0x50
> kernel: smpboot_thread_fn+0xfe/0x150
> kernel: kthread+0xf5/0x130
> kernel: ? sort_range+0x20/0x20
> kernel: ? kthread_bind+0x10/0x10
> kernel: ret_from_fork+0x24/0x30
> kernel: rcu: INFO: rcu_sched self-detected stall on CPU
> kernel: rcu: #01132-....: (64452 ticks this GP) idle=586/1/0x4000000000000002 softirq=184257/184259 fqs=16251
> kernel: rcu: #011 (t=65008 jiffies g=8870789 q=3260)
> kernel: NMI backtrace for cpu 32
> kernel: CPU: 32 PID: 173 Comm: ksoftirqd/32 Kdump: loaded Tainted: G L 5.0.0-rc7-vdevbus+ #2

Is this dump from rc5, or is it still happening with rc7?

>