On 7/19/2023 8:10 AM, Shyam Prasad N wrote:
> Hi all,
>
> cifs.ko seems to be leaking handles occasionally when put under some
> stress testing. I was scanning the code for potential places where we
> could be leaking, and one particular instance caught my attention.
>
> In _cifsFileInfo_put, when the ref count of a cifs_file reaches 0, we
> remove it from the lists and then send the close request over the
> wire...
>
>         if (!tcon->need_reconnect && !cifs_file->invalidHandle) {
>                 struct TCP_Server_Info *server = tcon->ses->server;
>                 unsigned int xid;
>
>                 xid = get_xid();
>                 if (server->ops->close_getattr)
>                         server->ops->close_getattr(xid, tcon, cifs_file);
>                 else if (server->ops->close)
>                         server->ops->close(xid, tcon, &cifs_file->fid);
>                 _free_xid(xid);
>         }
>
> But as you can see above, we do not get a return value from these
> handlers.
Yeah, that's a problem. The most obvious case is if the network becomes
partitioned and the close(s) fail. Are you injecting that kind of error?

Still, the logic is going to grow some serious hair if we start retrying
every error. What will bound the retries, and what kind of error(s) might
be considered fatal? Tying up credits, message IDs, handles, etc. can be
super problematic.

Also, have you considered using some sort of laundromat? Or is it
critical that these closes happen before proceeding?

Tom.
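[A "laundromat" in this sense would be a background worker that
periodically sweeps a list of orphaned handles and retries the close out
of band, with the retries bounded so nothing is tied up forever. Below is
a minimal sketch of that pattern for the SMB2 case. All names here
(orphan_fid, fid_laundromat_work, FID_MAX_RETRIES) are hypothetical, it
glosses over tcon lifetime and interaction with reconnect, and it is not
actual cifs.ko code:]

/*
 * Hypothetical laundromat for handles whose wire close failed.
 * Assumes the usual cifs client headers for cifs_tcon/cifs_fid,
 * get_xid()/free_xid() and SMB2_close().
 */
#include <linux/list.h>
#include <linux/slab.h>
#include <linux/spinlock.h>
#include <linux/workqueue.h>

#define FID_MAX_RETRIES 5       /* bound the retries, then give up */

struct orphan_fid {
        struct list_head list;
        struct cifs_tcon *tcon;
        struct cifs_fid fid;
        int retries;
};

static LIST_HEAD(orphan_fid_list);
static DEFINE_SPINLOCK(orphan_fid_lock);

static void fid_laundromat_work(struct work_struct *work);
static DECLARE_DELAYED_WORK(fid_laundromat, fid_laundromat_work);

/* Called when a wire close fails: remember the handle, kick the worker. */
static void orphan_fid_add(struct cifs_tcon *tcon, struct cifs_fid *fid)
{
        struct orphan_fid *ofid = kmalloc(sizeof(*ofid), GFP_KERNEL);

        if (!ofid)
                return; /* out of memory; the handle is leaked after all */
        ofid->tcon = tcon;
        ofid->fid = *fid;
        ofid->retries = 0;

        spin_lock(&orphan_fid_lock);
        list_add_tail(&ofid->list, &orphan_fid_list);
        spin_unlock(&orphan_fid_lock);

        schedule_delayed_work(&fid_laundromat, 0);
}

static void fid_laundromat_work(struct work_struct *work)
{
        struct orphan_fid *ofid, *tmp;
        LIST_HEAD(batch);
        bool again = false;

        /* Detach the current batch so the lock isn't held over the wire. */
        spin_lock(&orphan_fid_lock);
        list_splice_init(&orphan_fid_list, &batch);
        spin_unlock(&orphan_fid_lock);

        list_for_each_entry_safe(ofid, tmp, &batch, list) {
                unsigned int xid = get_xid();
                int rc = SMB2_close(xid, ofid->tcon,
                                    ofid->fid.persistent_fid,
                                    ofid->fid.volatile_fid);
                free_xid(xid);

                if (rc == 0 || ++ofid->retries >= FID_MAX_RETRIES) {
                        /* Closed, or giving up: stop tracking either way. */
                        list_del(&ofid->list);
                        kfree(ofid);
                } else {
                        again = true;
                }
        }

        /* Put any stragglers back and take another pass later. */
        spin_lock(&orphan_fid_lock);
        list_splice(&batch, &orphan_fid_list);
        spin_unlock(&orphan_fid_lock);

        if (again)
                schedule_delayed_work(&fid_laundromat, 30 * HZ);
}

[The point of the pattern is that retries are bounded and happen off the
put path, so _cifsFileInfo_put never blocks waiting on a flaky server.]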
> What would happen if the above close_getattr or close calls fail?
> In particular, what would happen if the failure happens even before
> the request is put in flight?
>
> In this stress testing, we have the server giving out fewer credits,
> so the credit counters on the active connections are expected to be
> low in general. I'm assuming that the above condition will leak
> handles.
>
> I was thinking about making a change to let the above handlers return
> rc rather than void, and then to check the status. On failure, create
> a delayed work for closing the remote handle. But I'm not convinced
> that this is the right approach. I'd like to hear more comments on
> this.
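[To make that proposal concrete, here is a minimal sketch of the
direction being described, assuming the ->close_getattr and ->close
operations in smb_version_operations were changed to return int.
cifs_queue_deferred_close() is a hypothetical helper, which could look
much like orphan_fid_add() in the laundromat sketch above, that hands the
handle to delayed work:]

        if (!tcon->need_reconnect && !cifs_file->invalidHandle) {
                struct TCP_Server_Info *server = tcon->ses->server;
                unsigned int xid;
                int rc = 0;

                xid = get_xid();
                if (server->ops->close_getattr)
                        rc = server->ops->close_getattr(xid, tcon,
                                                        cifs_file);
                else if (server->ops->close)
                        rc = server->ops->close(xid, tcon,
                                                &cifs_file->fid);
                _free_xid(xid);

                /*
                 * Hypothetical: if the close failed (possibly before it
                 * was ever put on the wire), hand the handle to delayed
                 * work for bounded retries instead of leaking it.
                 */
                if (rc)
                        cifs_queue_deferred_close(tcon, &cifs_file->fid);
        }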