Hi Mike-

> On Jul 14, 2021, at 12:40 PM, Marciniszyn, Mike <mike.marciniszyn@xxxxxxxxxxxxxxxxxxxx> wrote:
>
> Chuck,
>
> We are now seeing this in the first RC:
>
>
> [31868.644165] ------------[ cut here ]------------
> [31868.650059] failed to drain recv queue: -22
> [31868.655191] WARNING: CPU: 32 PID: 559 at drivers/infiniband/core/verbs.c:2738 __ib_drain_rq+0x163/0x1a0 [ib_core]
> [31868.657234] ------------[ cut here ]------------
> [31868.667133] Modules linked in: nfsv3
> [31868.672832] failed to drain send queue: -22
> [31868.677279] nfs_acl rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache netfs tcp_diag udp_diag raw_diag inet_diag rfkill ib_isert iscsi_target_mod target_core_mod rpcrdma ib_iser rdma_ucm opa_vnic rdma_cm ib_umad libiscsi ib_ipoib scsi_transport_iscsi ib_cm iw_cm sunrpc hfi1 mgag200 intel_rapl_msr intel_rapl_common drm_kms_helper sb_edac syscopyarea rdmavt x86_pkg_temp_thermal sysfillrect intel_powerclamp ipmi_si ib_uverbs sysimgblt coretemp fb_sys_fops cec ipmi_devintf drm crct10dif_pclmul crc32_pclmul iTCO_wdt iTCO_vendor_support ghash_clmulni_intel ib_core mei_me rapl intel_cstate mei lpc_ich mxm_wmi i2c_i801
> [31868.682425] WARNING: CPU: 65 PID: 608575 at drivers/infiniband/core/verbs.c:2705 __ib_drain_sq+0x14d/0x190 [ib_core]

The above warnings tell us ib_modify_qp() is returning -EINVAL,
twice in a row. ib_drain_qp() is not able to put the QP in the
ERR state, so it didn't try to post the drain sentinels.

> On the same tests, the mount command fails with a connection refused...
>
> Any ideas on this?
>
> 5.13.1 (the first 5.13.y release) tests fine.

There is exactly one change to the client components in
net/sunrpc/xprtrdma/ in v5.14-rc1:

e86be3a04bc4 ("SUNRPC: More fixes for backlog congestion")

Based on these two facts, my first inclination is that this
is a problem with the verbs provider, not with rpcrdma.ko.

Let's collect a little more information.
Enable tracing on your client before trying your test again:

 # trace-cmd record -e sunrpc -e rpcrdma -e rdma_core -e rdma_cma

When the test fails, ^C the trace-cmd, and have a look at
the trace.dat file (and/or, send it to me).

--
Chuck Lever
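
As a rough aid for a first pass over the kernel log, here is a minimal
sketch that condenses the "failed to drain ... queue" warnings into one
line per warning, showing which queue failed and with what errno. The
helper name drain_summary and its output format are illustrative
assumptions, not existing tooling; it assumes the message format seen in
the quoted log above.

```shell
# Hypothetical helper (name and output format are illustrative assumptions).
# Reads kernel log text on stdin and prints "<queue> <errno>" for each
# "failed to drain <queue> queue: <errno>" warning it finds.
drain_summary() {
    sed -nE 's/.*failed to drain (send|recv) queue: (-?[0-9]+).*/\1 \2/p'
}

# Example using the two warning lines from the quoted log:
printf '%s\n' \
    '[31868.650059] failed to drain recv queue: -22' \
    '[31868.672832] failed to drain send queue: -22' | drain_summary
```

On the client this would typically be fed from the live log, for example
"dmesg | drain_summary". On Linux, errno 22 is EINVAL. The captured
events themselves can be read with "trace-cmd report -i trace.dat".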