Re: NFS dmesg errors in 5.14-rc1

Hi Mike-

> On Jul 14, 2021, at 12:40 PM, Marciniszyn, Mike <mike.marciniszyn@xxxxxxxxxxxxxxxxxxxx> wrote:
> 
> Chuck,
> 
> We are now seeing this in the first RC:
> 
> 
> [31868.644165] ------------[ cut here ]------------
> [31868.650059] failed to drain recv queue: -22
> [31868.655191] WARNING: CPU: 32 PID: 559 at drivers/infiniband/core/verbs.c:2738 __ib_drain_rq+0x163/0x1a0 [ib_core]
> [31868.657234] ------------[ cut here ]------------
> [31868.667133] Modules linked in: nfsv3
> [31868.672832] failed to drain send queue: -22
> [31868.677279]  nfs_acl rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache netfs tcp_diag udp_diag raw_diag inet_diag rfkill ib_isert iscsi_target_mod target_core_mod rpcrdma ib_iser rdma_ucm opa_vnic rdma_cm ib_umad libiscsi ib_ipoib scsi_transport_iscsi ib_cm iw_cm sunrpc hfi1 mgag200 intel_rapl_msr intel_rapl_common drm_kms_helper sb_edac syscopyarea rdmavt x86_pkg_temp_thermal sysfillrect intel_powerclamp ipmi_si ib_uverbs sysimgblt coretemp fb_sys_fops cec ipmi_devintf drm crct10dif_pclmul crc32_pclmul iTCO_wdt iTCO_vendor_support ghash_clmulni_intel ib_core mei_me rapl intel_cstate mei lpc_ich mxm_wmi i2c_i801
> [31868.682425] WARNING: CPU: 65 PID: 608575 at drivers/infiniband/core/verbs.c:2705 __ib_drain_sq+0x14d/0x190 [ib_core]

The warnings above tell us that ib_modify_qp() returned -EINVAL
(-22) twice in a row. ib_drain_qp() was not able to put the QP
into the ERR state, so it never tried to post the drain sentinels.
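
As a rough illustration of that reasoning, here is a minimal
user-space sketch in Python. It is not kernel code: FakeQP,
drain_queue() and drain_qp() are hypothetical names invented for
this example, and the send-then-receive ordering is an assumption.
The sketch only models the control flow described above: when the
state change to ERR is rejected, a warning carrying the negative
errno is produced and no sentinel is posted.

```python
import errno


class FakeQP:
    """Hypothetical stand-in for a queue pair (not a real kernel object)."""

    def __init__(self, modify_result=0):
        # 0 models success; a negative errno models a provider rejection.
        self.modify_result = modify_result
        self.state = "RTS"
        self.posted = []  # sentinels posted after a successful state change

    def modify_to_err(self):
        """Model the attempt to move the QP into the ERR state."""
        if self.modify_result == 0:
            self.state = "ERR"
        return self.modify_result


def drain_queue(qp, name):
    """Model one drain step: move the QP to ERR, then post a sentinel."""
    ret = qp.modify_to_err()
    if ret:
        # Same message shape as the dmesg lines quoted above.
        return f"failed to drain {name} queue: {ret}"
    qp.posted.append(f"{name}-sentinel")
    return None


def drain_qp(qp):
    """Drain the send queue, then the receive queue; return any warnings."""
    results = (drain_queue(qp, "send"), drain_queue(qp, "recv"))
    return [warning for warning in results if warning]


if __name__ == "__main__":
    qp = FakeQP(modify_result=-errno.EINVAL)
    for warning in drain_qp(qp):
        print(warning)
    print("sentinels posted:", qp.posted)
```

With modify_result set to -errno.EINVAL, both drain steps report
the failure and the list of posted sentinels stays empty, which
matches the pair of warnings in the log.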


> On the same tests, the mount command fails with a connection refused...
> 
> Any ideas on this?
> 
> 5.13.1 (the first 5.13.y release) tests fine.

There is exactly one change to the client components in
net/sunrpc/xprtrdma/ in v5.14-rc1:

  e86be3a04bc4 ("SUNRPC: More fixes for backlog congestion")

Based on these two facts, my first inclination is that this is
a problem with the verbs provider, not with rpcrdma.ko.

Let's collect a little more information. Enable tracing on
your client before trying your test again:

 # trace-cmd record -e sunrpc -e rpcrdma -e rdma_core -e rdma_cma

When the test fails, stop trace-cmd with ^C and have a look at
the resulting trace.dat file (or send it to me).


--
Chuck Lever






