insight into a WARNING from softROCE

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi folks,

Can somebody give me an insight into to following WARNING (at the end
of the message)  that I see logged in var log messages while using
softROCE (NFSoRDMA)? This is typically associated with a hiccup in
communication I see happening over RDMA (long delays).

It's coming form the WARN here in rxe_comp.c:

                case COMPST_ERROR:
                        WARN_ON_ONCE(wqe->status == IB_WC_SUCCESS);
                        do_complete(qp, wqe);
                        rxe_qp_error(qp);

                        if (pkt) {
                                rxe_drop_ref(pkt->qp);

With a little bit of printks I tracked it to:
COMPST_ERROR is coming from "retrying counter exceeding"
(RXE_CNT_RETRY_EXCEEDED)  in COMPST_ERROR_RETRY. COMPST_ERROR_RETRY is
coming from check_psn(). I see that packet psn is greater then the wqe
psn. I have noticed that can happen (but not always) after
update_wqe_psn() has number of packets left to send some number larger
than 1.

Goal is to figure out why the hiccups are happening and I think this is a clue.

Thank you for any info.

Dec  5 16:42:16 localhost kernel: ------------[ cut here ]------------
Dec  5 16:42:16 localhost kernel: WARNING: CPU: 0 PID: 0 at
drivers/infiniband/sw/rxe/rxe_comp.c:741 rxe_completer+0xd84/0xe30
[rdma_rxe]
Dec  5 16:42:16 localhost kernel: Modules linked in: rpcrdma ib_ucm
ib_umad rdma_rxe ip6_udp_tunnel udp_tunnel rdma_ucm rdma_cm iw_cm
ib_cm ib_uverbs ib_core rfcomm fuse ip6t_rpfilter ipt_REJECT
nf_reject_ipv4 ip6t_REJECT nf_reject_ipv6 xt_conntrack ebtable_nat
ebtable_broute bridge stp llc ip6table_nat nf_conntrack_ipv6
nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security
ip6table_raw iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4
nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw
ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter bnep
snd_seq_midi snd_seq_midi_event coretemp crc32_pclmul ext4
ghash_clmulni_intel mbcache jbd2 aesni_intel snd_ens1371
snd_ac97_codec glue_helper ppdev lrw ac97_bus snd_seq gf128mul
uvcvideo ablk_helper cryptd vmw_balloon videobuf2_vmalloc
videobuf2_memops
Dec  5 16:42:16 localhost kernel: btusb snd_pcm videobuf2_core pcspkr
btrtl videodev btbcm btintel snd_timer snd_rawmidi bluetooth
snd_seq_device snd vmw_vmci rfkill shpchp i2c_piix4 soundcore
parport_pc parport nfsd auth_rpcgss nfs_acl lockd grace sunrpc
ip_tables xfs libcrc32c sr_mod cdrom vmwgfx sd_mod crc_t10dif
crct10dif_generic drm_kms_helper ata_generic syscopyarea sysfillrect
sysimgblt fb_sys_fops ttm drm pata_acpi crct10dif_pclmul ahci
crct10dif_common mptspi crc32c_intel libahci scsi_transport_spi
mptscsih serio_raw ata_piix libata mptbase e1000 i2c_core dm_mirror
dm_region_hash dm_log dm_mod
Dec  5 16:42:16 localhost kernel: CPU: 0 PID: 0 Comm: swapper/0 Not
tainted 3.10.0 #2
Dec  5 16:42:16 localhost kernel: Hardware name: VMware, Inc. VMware
Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00
07/02/2015
Dec  5 16:42:16 localhost kernel: Call Trace:
Dec  5 16:42:16 localhost kernel: <IRQ>  [<ffffffff94cb9865>]
dump_stack+0x19/0x1b
Dec  5 16:42:16 localhost kernel: [<ffffffff94686968>] __warn+0xd8/0x100
Dec  5 16:42:16 localhost kernel: [<ffffffff94686aad>]
warn_slowpath_null+0x1d/0x20
Dec  5 16:42:16 localhost kernel: [<ffffffffc0a73734>]
rxe_completer+0xd84/0xe30 [rdma_rxe]
Dec  5 16:42:16 localhost kernel: [<ffffffffc0a7f24f>]
rxe_do_task+0x9f/0x110 [rdma_rxe]
Dec  5 16:42:16 localhost kernel: [<ffffffffc0a7f3b8>]
rxe_run_task+0x18/0x40 [rdma_rxe]
Dec  5 16:42:16 localhost kernel: [<ffffffffc0a729a5>]
rxe_comp_queue_pkt+0x45/0x50 [rdma_rxe]
Dec  5 16:42:16 localhost kernel: [<ffffffffc0a77bf8>]
rxe_rcv+0x2a8/0x920 [rdma_rxe]
Dec  5 16:42:16 localhost kernel: [<ffffffffc03bcc5f>] ?
ipt_do_table+0x31f/0x4f0 [ip_tables]
Dec  5 16:42:16 localhost kernel: [<ffffffffc0a7fa10>] ?
net_to_rxe+0x80/0x80 [rdma_rxe]
Dec  5 16:42:16 localhost kernel: [<ffffffffc0a7fa73>]
rxe_udp_encap_recv+0x63/0xa0 [rdma_rxe]
Dec  5 16:42:16 localhost kernel: [<ffffffffc0a7fa73>] ?
rxe_udp_encap_recv+0x63/0xa0 [rdma_rxe]
Dec  5 16:42:16 localhost kernel: [<ffffffff94c148bb>]
udp_queue_rcv_skb+0x1bb/0x4a0
Dec  5 16:42:16 localhost kernel: [<ffffffff94c15108>]
__udp4_lib_rcv+0x568/0xb90
Dec  5 16:42:16 localhost kernel: [<ffffffffc09281de>] ?
ipv4_confirm+0x4e/0x100 [nf_conntrack_ipv4]
Dec  5 16:42:16 localhost kernel: [<ffffffff94c15b9a>] udp_rcv+0x1a/0x20
Dec  5 16:42:16 localhost kernel: [<ffffffff94be30ce>]
ip_local_deliver_finish+0x8e/0x1d0
Dec  5 16:42:16 localhost kernel: [<ffffffff94be33b9>]
ip_local_deliver+0x59/0xd0
Dec  5 16:42:16 localhost kernel: [<ffffffff94be3040>] ?
ip_rcv_finish+0x300/0x300
Dec  5 16:42:16 localhost kernel: [<ffffffff94be2db8>] ip_rcv_finish+0x78/0x300
Dec  5 16:42:16 localhost kernel: [<ffffffff94be36e6>] ip_rcv+0x2b6/0x410
Dec  5 16:42:16 localhost kernel: [<ffffffff94be2d40>] ?
inet_del_offload+0x40/0x40
Dec  5 16:42:16 localhost kernel: [<ffffffff94b9f9d4>]
__netif_receive_skb_core+0x2e4/0x820
Dec  5 16:42:16 localhost kernel: [<ffffffff94b9ff28>]
__netif_receive_skb+0x18/0x60
Dec  5 16:42:16 localhost kernel: [<ffffffff94b9ffb0>]
netif_receive_skb_internal+0x40/0xc0
Dec  5 16:42:16 localhost kernel: [<ffffffff94ba0b58>]
napi_gro_receive+0xd8/0x100
Dec  5 16:42:16 localhost kernel: [<ffffffffc01f33e8>]
e1000_clean_rx_irq+0x2b8/0x510 [e1000]
Dec  5 16:42:16 localhost kernel: [<ffffffffc01f4078>]
e1000_clean+0x278/0x8d0 [e1000]
Dec  5 16:42:16 localhost kernel: [<ffffffff94ba0483>] net_rx_action+0x123/0x320
Dec  5 16:42:16 localhost kernel: [<ffffffff9468fb4f>] __do_softirq+0xef/0x280
Dec  5 16:42:16 localhost kernel: [<ffffffff94ccc51c>] call_softirq+0x1c/0x30
Dec  5 16:42:16 localhost kernel: [<ffffffff9462c4c5>] do_softirq+0x65/0xa0
Dec  5 16:42:16 localhost kernel: [<ffffffff9468fed5>] irq_exit+0x105/0x110
Dec  5 16:42:16 localhost kernel: [<ffffffff94ccd036>] do_IRQ+0x56/0xe0
Dec  5 16:42:16 localhost kernel: [<ffffffff94cc1b2d>]
common_interrupt+0x6d/0x6d
Dec  5 16:42:16 localhost kernel: <EOI>  [<ffffffff94cc0dd6>] ?
native_safe_halt+0x6/0x10
Dec  5 16:42:16 localhost kernel: [<ffffffff94cc0c6e>] ? default_idle+0x1e/0xc0
Dec  5 16:42:16 localhost kernel: [<ffffffff94633f86>] ? arch_cpu_idle+0x26/0x30
Dec  5 16:42:16 localhost kernel: [<ffffffff946e6efa>] ?
cpu_startup_entry+0x14a/0x1c0
Dec  5 16:42:16 localhost kernel: [<ffffffff94ca8e17>] ? rest_init+0x77/0x80
Dec  5 16:42:16 localhost kernel: [<ffffffff9516c05a>] ?
start_kernel+0x433/0x454
Dec  5 16:42:16 localhost kernel: [<ffffffff9516ba30>] ?
repair_env_string+0x5c/0x5c
Dec  5 16:42:16 localhost kernel: [<ffffffff9516b120>] ?
early_idt_handler_array+0x120/0x120
Dec  5 16:42:16 localhost kernel: [<ffffffff9516b5ef>] ?
x86_64_start_reservations+0x24/0x26
Dec  5 16:42:16 localhost kernel: [<ffffffff9516b740>] ?
x86_64_start_kernel+0x14f/0x172
Dec  5 16:42:16 localhost kernel: [<ffffffff946001a5>] ? start_cpu+0x5/0x14
Dec  5 16:42:16 localhost kernel: ---[ end trace c96ed928ed9503ca ]---
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Photo]     [Yosemite News]     [Yosemite Photos]     [Linux Kernel]     [Linux SCSI]     [XFree86]
  Powered by Linux