> On Jan 29, 2018, at 3:01 PM, Sagi Grimberg <sagi@xxxxxxxxxxx> wrote:
>
> Hi Chuck,
>
>> For NFS/RDMA, I think of the "failover" case where a device is
>> removed, then a new one is plugged in (or an existing cold
>> replacement is made available) with the same IP configuration.
>> On a "hard" NFS mount, we want the upper layers to wait for
>> a new suitable device to be made available, and then to use
>> it to resend any pending RPCs. The workload should continue
>> after a new device is available.
>
> Really? so the context is held forever (in case the device never
> comes back)?

I didn't say this was the best approach :-) And it certainly can
change if we have something better.

But yes, with a hard mount, the NFS and RPC client stack keeps
the pending RPCs around and continues to attempt reconnection
with the NFS server.

The idea is that after an unplug, another device with the
proper IP configuration can be made available, and then
rdma_resolve_addr() can figure out how to reconnect. The
associated NFS workload will be suspended until it can
reconnect.

Now on the NFS server (target) an unplug results in connection
abort. Any context at the transport layer is gone, though the
NFS server maintains duplicate reply caches that can hold RPC
replies for some time. Those are all bounded in size. The
clients continue to attempt to reconnect until there is another
device available that can allow the server to accept
connections.

--
Chuck Lever
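
For illustration only, here is a toy Python model of the hard-mount
behavior described above: pending RPCs are kept across a device
removal, and are resent once a reconnect attempt succeeds. This is
not the kernel implementation. The names HardMountClient, reconnect
and availability are hypothetical, and the list of booleans merely
stands in for "is a suitable device present on this attempt?".

```python
from collections import deque


class HardMountClient:
    """Toy model of an RPC client on a hard mount (hypothetical)."""

    def __init__(self):
        self.pending = deque()  # RPCs without a reply yet; never dropped
        self.sent = []          # log of every transmission, in order
        self.connected = False

    def call(self, rpc):
        # Always remember the RPC; transmit only if a connection exists.
        self.pending.append(rpc)
        if self.connected:
            self.sent.append(rpc)

    def device_removed(self):
        # The connection is lost, but the pending RPCs are retained.
        self.connected = False


def reconnect(client, availability):
    """Try to reconnect once per entry in `availability`.

    Returns the number of attempts made. A real hard mount has no
    attempt limit; the finite iterable only keeps this demo bounded.
    """
    attempts = 0
    for device_available in availability:
        attempts += 1
        if device_available:
            client.connected = True
            # Resend everything that is still awaiting a reply.
            client.sent.extend(client.pending)
            break
    return attempts


client = HardMountClient()
client.connected = True
client.call("READ")        # transmitted, reply still outstanding
client.device_removed()    # unplug
client.call("WRITE")       # queued while disconnected
attempts = reconnect(client, [False, False, True])
```

If the device never comes back (availability is all False), the
client simply stays disconnected with its pending queue intact,
which is the "held forever" case Sagi asked about.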