> On Aug 15, 2020, at 1:45 AM, Dan Aloni <dan@xxxxxxxxxxxx> wrote:
>
> On Fri, Aug 14, 2020 at 04:21:54PM -0400, Chuck Lever wrote:
>>
>>
>>> On Aug 14, 2020, at 3:10 PM, Dan Aloni <dan@xxxxxxxxxxxx> wrote:
>>>
>>> On Fri, Aug 14, 2020 at 02:12:48PM -0400, Chuck Lever wrote:
>>>> Hi Dan-
>>>>
>>>>> On Aug 14, 2020, at 1:37 PM, Dan Aloni <dan@xxxxxxxxxxxx> wrote:
>>>>>
>>>>> It was observed that on disconnections, these unmaps don't occur. The
>>>>> relevant path is rpcrdma_mrs_destroy(), being called from
>>>>> rpcrdma_xprt_disconnect().
>>>>
>>>> MRs are supposed to be unmapped right after they are used, so
>>>> during disconnect they should all be unmapped already. How often
>>>> do you see a DMA mapped MR in this code path? Do you have a
>>>> reproducer I can try?
>>>
>>> These are not graceful disconnections but abnormal ones, where many large
>>> IOs are still in flight, while the remote server suddenly breaks the
>>> connection, the remote IP is still reachable but refusing to accept new
>>> connections only for a few seconds.
>>
>> Ideally that's not supposed to matter. I'll see if I can reproduce
>> with my usual tricks.
>>
>> Why is your server behaving this way?
>
> It's a dedicated storage cluster under a specific testing scenario,
> implementing floating IPs. Haven't tried, but maybe the same scenario
> can be reproduced with a standard single Linux NFSv3 server by fiddling
> with nfsd open ports.

Hi Dan, I was able to reproduce the DMA-map leak with a simple server-side
disconnect injection test. I'll try some root cause analysis tomorrow.

--
Chuck Lever