> I am attempting to use NFS over RDMA (over InfiniBand), but there is some
> problem. The NFS filesystem can be mounted on the client, and things
> will work for some time (I can read, modify, etc. the files over the mount),
> but then (at a seemingly random time) the NFS server will dump these
> lines to the logs:
>
> [ 4380.623922] svcrdma: Error fast registering memory for xprt ffff8803307d7400
> [ 4413.343161] svcrdma: error fast registering xdr for xprt ffff8803319edc00

Digging into it further, it seems like the Mellanox InfiniBand driver
could somehow be involved. After adding some traces to the code, it's
obvious something like this is happening:

At some point sq_cq_reap() is called, which ends up like this:

sq_cq_reap()
  ib_poll_cq()
    mlx4_ib_poll_cq()
      mlx4_ib_poll_one()
        mlx4_ib_handle_error_cqe()
          - which then sets wc->status to IB_WC_WR_FLUSH_ERR rather often,
            but the killer blow seems to be when IB_WC_REM_ACCESS_ERR is set
  - because of that error, sq_cq_reap() sets the XPT_CLOSE flag

Then, some time later:

fast_reg_read_chunks()
  svc_rdma_fastreg()
    svc_rdma_send()
      - XPT_CLOSE is set and hence -ENOTCONN is returned
  - since svc_rdma_fastreg() had an error, fast_reg_read_chunks() bails
    and the client seems to then hang

(Rough sketches of both code paths, as I read them in 2.6.34, are
appended at the end of this mail.)

I'd like to ask the InfiniBand guys: what do IB_WC_WR_FLUSH_ERR and
IB_WC_REM_ACCESS_ERR mean? Are they something drastic that should
result in hangs?

nog.

> Both client and server are running the latest vanilla 2.6.34.1 kernel
> with Mellanox ConnectX InfiniBand cards. If more information is
> required, please do ask.
>
> BTW: I can reproduce the problem quite reliably by running the bonnie++
> "benchmark" on the NFS-mounted filesystem.
>
> nog.
>
> ps: I'm not subscribed to the list, please CC me on all replies.
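
Appendix: for reference, a minimal sketch of the first path, assuming the
2.6.34 layout of net/sunrpc/xprtrdma/svc_rdma_transport.c. This is my
simplified reading of the code, not the literal source (the "_sketch"
name is mine); the point is only that any completion with wc.status !=
IB_WC_SUCCESS makes the send-CQ reaper flag the whole transport for close:

/* Simplified sketch (not the literal 2.6.34 source) of the send-CQ
 * reap path: an error completion from mlx4 arrives here with
 * wc.status set, and the transport gets marked XPT_CLOSE. */
#include <rdma/ib_verbs.h>
#include <linux/sunrpc/svc_xprt.h>
#include <linux/sunrpc/svc_rdma.h>

static void sq_cq_reap_sketch(struct svcxprt_rdma *xprt)
{
	struct ib_wc wc;
	struct ib_cq *cq = xprt->sc_sq_cq;

	while (ib_poll_cq(cq, 1, &wc) > 0) {
		if (wc.status != IB_WC_SUCCESS) {
			/* e.g. IB_WC_WR_FLUSH_ERR or IB_WC_REM_ACCESS_ERR,
			 * as reported via mlx4_ib_handle_error_cqe() */
			set_bit(XPT_CLOSE, &xprt->sc_xprt.xpt_flags);
		}
		/* ... per-WR completion handling elided ... */
	}
}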
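
And a similarly hedged sketch of the second path: once XPT_CLOSE is set,
svc_rdma_send() refuses to post anything and the fast-registration attempt
fails with -ENOTCONN, which is roughly where the "fast registering" log
lines on the server come from. Again this is my paraphrase of the 2.6.34
code (the "_sketch" names are mine), not a verbatim copy:

/* Sketch of why the fast-registration path fails once the transport
 * has been flagged: svc_rdma_send() bails out early. */
#include <linux/errno.h>
#include <rdma/ib_verbs.h>
#include <linux/sunrpc/svc_xprt.h>
#include <linux/sunrpc/svc_rdma.h>

static int svc_rdma_send_sketch(struct svcxprt_rdma *xprt,
				struct ib_send_wr *wr)
{
	if (test_bit(XPT_CLOSE, &xprt->sc_xprt.xpt_flags))
		return -ENOTCONN;	/* flag was set earlier by sq_cq_reap() */

	/* ... normal ib_post_send() path elided ... */
	return 0;
}

static int svc_rdma_fastreg_sketch(struct svcxprt_rdma *xprt,
				   struct ib_send_wr *fastreg_wr)
{
	/* The read-chunk setup sees this error, logs the
	 * "fast registering" messages and gives up, after which
	 * the client appears to hang. */
	return svc_rdma_send_sketch(xprt, fastreg_wr);
}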