Hi folks, Can somebody give me an insight into to following WARNING (at the end of the message) that I see logged in var log messages while using softROCE (NFSoRDMA)? This is typically associated with a hiccup in communication I see happening over RDMA (long delays). It's coming form the WARN here in rxe_comp.c: case COMPST_ERROR: WARN_ON_ONCE(wqe->status == IB_WC_SUCCESS); do_complete(qp, wqe); rxe_qp_error(qp); if (pkt) { rxe_drop_ref(pkt->qp); With a little bit of printks I tracked it to: COMPST_ERROR is coming from "retrying counter exceeding" (RXE_CNT_RETRY_EXCEEDED) in COMPST_ERROR_RETRY. COMPST_ERROR_RETRY is coming from check_psn(). I see that packet psn is greater then the wqe psn. I have noticed that can happen (but not always) after update_wqe_psn() has number of packets left to send some number larger than 1. Goal is to figure out why the hiccups are happening and I think this is a clue. Thank you for any info. Dec 5 16:42:16 localhost kernel: ------------[ cut here ]------------ Dec 5 16:42:16 localhost kernel: WARNING: CPU: 0 PID: 0 at drivers/infiniband/sw/rxe/rxe_comp.c:741 rxe_completer+0xd84/0xe30 [rdma_rxe] Dec 5 16:42:16 localhost kernel: Modules linked in: rpcrdma ib_ucm ib_umad rdma_rxe ip6_udp_tunnel udp_tunnel rdma_ucm rdma_cm iw_cm ib_cm ib_uverbs ib_core rfcomm fuse ip6t_rpfilter ipt_REJECT nf_reject_ipv4 ip6t_REJECT nf_reject_ipv6 xt_conntrack ebtable_nat ebtable_broute bridge stp llc ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter bnep snd_seq_midi snd_seq_midi_event coretemp crc32_pclmul ext4 ghash_clmulni_intel mbcache jbd2 aesni_intel snd_ens1371 snd_ac97_codec glue_helper ppdev lrw ac97_bus snd_seq gf128mul uvcvideo ablk_helper cryptd vmw_balloon videobuf2_vmalloc videobuf2_memops Dec 5 16:42:16 localhost kernel: btusb snd_pcm videobuf2_core pcspkr btrtl videodev btbcm btintel snd_timer snd_rawmidi bluetooth snd_seq_device snd vmw_vmci rfkill shpchp i2c_piix4 soundcore parport_pc parport nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs libcrc32c sr_mod cdrom vmwgfx sd_mod crc_t10dif crct10dif_generic drm_kms_helper ata_generic syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm pata_acpi crct10dif_pclmul ahci crct10dif_common mptspi crc32c_intel libahci scsi_transport_spi mptscsih serio_raw ata_piix libata mptbase e1000 i2c_core dm_mirror dm_region_hash dm_log dm_mod Dec 5 16:42:16 localhost kernel: CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.10.0 #2 Dec 5 16:42:16 localhost kernel: Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/02/2015 Dec 5 16:42:16 localhost kernel: Call Trace: Dec 5 16:42:16 localhost kernel: <IRQ> [<ffffffff94cb9865>] dump_stack+0x19/0x1b Dec 5 16:42:16 localhost kernel: [<ffffffff94686968>] __warn+0xd8/0x100 Dec 5 16:42:16 localhost kernel: [<ffffffff94686aad>] warn_slowpath_null+0x1d/0x20 Dec 5 16:42:16 localhost kernel: [<ffffffffc0a73734>] rxe_completer+0xd84/0xe30 [rdma_rxe] Dec 5 16:42:16 localhost kernel: [<ffffffffc0a7f24f>] rxe_do_task+0x9f/0x110 [rdma_rxe] Dec 5 16:42:16 localhost kernel: [<ffffffffc0a7f3b8>] rxe_run_task+0x18/0x40 [rdma_rxe] Dec 5 16:42:16 localhost kernel: [<ffffffffc0a729a5>] rxe_comp_queue_pkt+0x45/0x50 [rdma_rxe] Dec 5 16:42:16 localhost kernel: [<ffffffffc0a77bf8>] rxe_rcv+0x2a8/0x920 [rdma_rxe] Dec 5 16:42:16 localhost kernel: [<ffffffffc03bcc5f>] ? ipt_do_table+0x31f/0x4f0 [ip_tables] Dec 5 16:42:16 localhost kernel: [<ffffffffc0a7fa10>] ? net_to_rxe+0x80/0x80 [rdma_rxe] Dec 5 16:42:16 localhost kernel: [<ffffffffc0a7fa73>] rxe_udp_encap_recv+0x63/0xa0 [rdma_rxe] Dec 5 16:42:16 localhost kernel: [<ffffffffc0a7fa73>] ? rxe_udp_encap_recv+0x63/0xa0 [rdma_rxe] Dec 5 16:42:16 localhost kernel: [<ffffffff94c148bb>] udp_queue_rcv_skb+0x1bb/0x4a0 Dec 5 16:42:16 localhost kernel: [<ffffffff94c15108>] __udp4_lib_rcv+0x568/0xb90 Dec 5 16:42:16 localhost kernel: [<ffffffffc09281de>] ? ipv4_confirm+0x4e/0x100 [nf_conntrack_ipv4] Dec 5 16:42:16 localhost kernel: [<ffffffff94c15b9a>] udp_rcv+0x1a/0x20 Dec 5 16:42:16 localhost kernel: [<ffffffff94be30ce>] ip_local_deliver_finish+0x8e/0x1d0 Dec 5 16:42:16 localhost kernel: [<ffffffff94be33b9>] ip_local_deliver+0x59/0xd0 Dec 5 16:42:16 localhost kernel: [<ffffffff94be3040>] ? ip_rcv_finish+0x300/0x300 Dec 5 16:42:16 localhost kernel: [<ffffffff94be2db8>] ip_rcv_finish+0x78/0x300 Dec 5 16:42:16 localhost kernel: [<ffffffff94be36e6>] ip_rcv+0x2b6/0x410 Dec 5 16:42:16 localhost kernel: [<ffffffff94be2d40>] ? inet_del_offload+0x40/0x40 Dec 5 16:42:16 localhost kernel: [<ffffffff94b9f9d4>] __netif_receive_skb_core+0x2e4/0x820 Dec 5 16:42:16 localhost kernel: [<ffffffff94b9ff28>] __netif_receive_skb+0x18/0x60 Dec 5 16:42:16 localhost kernel: [<ffffffff94b9ffb0>] netif_receive_skb_internal+0x40/0xc0 Dec 5 16:42:16 localhost kernel: [<ffffffff94ba0b58>] napi_gro_receive+0xd8/0x100 Dec 5 16:42:16 localhost kernel: [<ffffffffc01f33e8>] e1000_clean_rx_irq+0x2b8/0x510 [e1000] Dec 5 16:42:16 localhost kernel: [<ffffffffc01f4078>] e1000_clean+0x278/0x8d0 [e1000] Dec 5 16:42:16 localhost kernel: [<ffffffff94ba0483>] net_rx_action+0x123/0x320 Dec 5 16:42:16 localhost kernel: [<ffffffff9468fb4f>] __do_softirq+0xef/0x280 Dec 5 16:42:16 localhost kernel: [<ffffffff94ccc51c>] call_softirq+0x1c/0x30 Dec 5 16:42:16 localhost kernel: [<ffffffff9462c4c5>] do_softirq+0x65/0xa0 Dec 5 16:42:16 localhost kernel: [<ffffffff9468fed5>] irq_exit+0x105/0x110 Dec 5 16:42:16 localhost kernel: [<ffffffff94ccd036>] do_IRQ+0x56/0xe0 Dec 5 16:42:16 localhost kernel: [<ffffffff94cc1b2d>] common_interrupt+0x6d/0x6d Dec 5 16:42:16 localhost kernel: <EOI> [<ffffffff94cc0dd6>] ? native_safe_halt+0x6/0x10 Dec 5 16:42:16 localhost kernel: [<ffffffff94cc0c6e>] ? default_idle+0x1e/0xc0 Dec 5 16:42:16 localhost kernel: [<ffffffff94633f86>] ? arch_cpu_idle+0x26/0x30 Dec 5 16:42:16 localhost kernel: [<ffffffff946e6efa>] ? cpu_startup_entry+0x14a/0x1c0 Dec 5 16:42:16 localhost kernel: [<ffffffff94ca8e17>] ? rest_init+0x77/0x80 Dec 5 16:42:16 localhost kernel: [<ffffffff9516c05a>] ? start_kernel+0x433/0x454 Dec 5 16:42:16 localhost kernel: [<ffffffff9516ba30>] ? repair_env_string+0x5c/0x5c Dec 5 16:42:16 localhost kernel: [<ffffffff9516b120>] ? early_idt_handler_array+0x120/0x120 Dec 5 16:42:16 localhost kernel: [<ffffffff9516b5ef>] ? x86_64_start_reservations+0x24/0x26 Dec 5 16:42:16 localhost kernel: [<ffffffff9516b740>] ? x86_64_start_kernel+0x14f/0x172 Dec 5 16:42:16 localhost kernel: [<ffffffff946001a5>] ? start_cpu+0x5/0x14 Dec 5 16:42:16 localhost kernel: ---[ end trace c96ed928ed9503ca ]--- -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html