Thanks for the comments, I'm working on them. On Wed, Jun 16, 2021 at 12:07:50PM +0200, Aurélien Aptel wrote: > I guess it looks ok as a quick fix for the issue at hand but: > - should we check for more error codes (use is_retryable_error()?) > - all syscalls will need something similar, the session can be closed at > any point > * either we add a loop everywhere > * or we change cifs_send_receive to retry (most?) calls > * might be worth looking at what nfs does here Some syscall can return EAGAIN, so we would need to check which operation is retryable and if the error is retryable. I don't know if all Linux syscalls mappable to smb2 operations in the wire, but it may be worth mapping. NFS requeues the calls, and fails after a configurable timeout for soft mounts, or issues an error message and requeues the request for hard mounts (retrans and timeo mount options). > - Should this be done for both soft (default) and hard mounts? I guess > for hard we would retry indefinitely Good point, but the correct option is to retry on hard mounts until the operation succeeds. NFS hard mounts create issues like being unable to do a clean shutdown on a server failure, because oustanding I/O blocks it. The NFS has atempted to fix this by adding options to kill all outstanding I/O, but I'm not current on the efforts/issues in this front. So this would create the same issue with outstanding mounts on cifs shares, so a similar solution to NFS may be adapted/created in the future. One thing that I forgot to mention in the patch is that this uncovered a bug in compuond requests, where wait_for_compund_request will block the readdir request with EDEADLK, if the share needs reconnect. I'm looking into this patch, and will submmit it separatelly, as this looks like a corner case uncovered by the patch and specific conditions in the tests. > Cheers, Best regards, Thiago