On 7/19/2023 8:10 AM, Shyam Prasad N wrote:
> Hi all,
>
> cifs.ko seems to be leaking handles occasionally when put under some
> stress testing. I was scanning the code for potential places where we
> could be leaking, and one particular instance caught my attention.
>
> In _cifsFileInfo_put, when the ref count of a cifs_file reaches 0, we
> remove it from the lists and then send the close request over the
> wire...
>
>         if (!tcon->need_reconnect && !cifs_file->invalidHandle) {
>                 struct TCP_Server_Info *server = tcon->ses->server;
>                 unsigned int xid;
>
>                 xid = get_xid();
>                 if (server->ops->close_getattr)
>                         server->ops->close_getattr(xid, tcon, cifs_file);
>                 else if (server->ops->close)
>                         server->ops->close(xid, tcon, &cifs_file->fid);
>                 _free_xid(xid);
>         }
>
> But as you can see above, we do not get a return value from these
> handlers.
Yeah, that's a problem. The most obvious case is if the network becomes
partitioned and the close(s) fail. Are you injecting that kind of error?

Still, the logic is going to grow some serious hair if we start retrying
every error. What will bound the retries, and what kind of error(s) might
be considered fatal? Tying up credits, message IDs, handles, etc. can be
super problematic.

Also, have you considered using some sort of laundromat? Or is it
critical that these closes happen before proceeding?

Tom.
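[A "laundromat" in this sense would be a background worker that
periodically sweeps a list of orphaned handles and retries the close out
of band, with the retries bounded so nothing is tied up forever. Below is
a minimal sketch of that pattern for the SMB2 case. All names here
(orphan_fid, fid_laundromat_work, FID_MAX_RETRIES) are hypothetical, it
glosses over tcon lifetime and interaction with reconnect, and it is not
actual cifs.ko code:]

/*
 * Hypothetical laundromat for handles whose wire close failed.
 * Assumes the usual cifs client headers for cifs_tcon/cifs_fid,
 * get_xid()/free_xid() and SMB2_close().
 */
#include <linux/list.h>
#include <linux/slab.h>
#include <linux/spinlock.h>
#include <linux/workqueue.h>

#define FID_MAX_RETRIES 5       /* bound the retries, then give up */

struct orphan_fid {
        struct list_head list;
        struct cifs_tcon *tcon;
        struct cifs_fid fid;
        int retries;
};

static LIST_HEAD(orphan_fid_list);
static DEFINE_SPINLOCK(orphan_fid_lock);

static void fid_laundromat_work(struct work_struct *work);
static DECLARE_DELAYED_WORK(fid_laundromat, fid_laundromat_work);

/* Called when a wire close fails: remember the handle, kick the worker. */
static void orphan_fid_add(struct cifs_tcon *tcon, struct cifs_fid *fid)
{
        struct orphan_fid *ofid = kmalloc(sizeof(*ofid), GFP_KERNEL);

        if (!ofid)
                return; /* out of memory; the handle is leaked after all */
        ofid->tcon = tcon;
        ofid->fid = *fid;
        ofid->retries = 0;

        spin_lock(&orphan_fid_lock);
        list_add_tail(&ofid->list, &orphan_fid_list);
        spin_unlock(&orphan_fid_lock);

        schedule_delayed_work(&fid_laundromat, 0);
}

static void fid_laundromat_work(struct work_struct *work)
{
        struct orphan_fid *ofid, *tmp;
        LIST_HEAD(batch);
        bool again = false;

        /* Detach the current batch so the lock isn't held over the wire. */
        spin_lock(&orphan_fid_lock);
        list_splice_init(&orphan_fid_list, &batch);
        spin_unlock(&orphan_fid_lock);

        list_for_each_entry_safe(ofid, tmp, &batch, list) {
                unsigned int xid = get_xid();
                int rc = SMB2_close(xid, ofid->tcon,
                                    ofid->fid.persistent_fid,
                                    ofid->fid.volatile_fid);
                free_xid(xid);

                if (rc == 0 || ++ofid->retries >= FID_MAX_RETRIES) {
                        /* Closed, or giving up: stop tracking either way. */
                        list_del(&ofid->list);
                        kfree(ofid);
                } else {
                        again = true;
                }
        }

        /* Put any stragglers back and take another pass later. */
        spin_lock(&orphan_fid_lock);
        list_splice(&batch, &orphan_fid_list);
        spin_unlock(&orphan_fid_lock);

        if (again)
                schedule_delayed_work(&fid_laundromat, 30 * HZ);
}

[The point of the pattern is that retries are bounded and happen off the
put path, so _cifsFileInfo_put never blocks waiting on a flaky server.]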
> What would happen if the above close_getattr or close calls fail?
> In particular, what would happen if the failure happens even before
> the request is put in flight?
>
> In this stress testing, we have the server giving out fewer credits,
> so the credit counters on the active connections are expected to be
> low in general. I'm assuming that the above condition will leak
> handles.
>
> I was thinking about making a change to let the above handlers return
> rc rather than void, and then to check the status. On failure, create
> a delayed work for closing the remote handle. But I'm not convinced
> that this is the right approach. I'd like to hear more comments on
> this.
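[To make that proposal concrete, here is a minimal sketch of the
direction being described, assuming the ->close_getattr and ->close
operations in smb_version_operations were changed to return int.
cifs_queue_deferred_close() is a hypothetical helper, which could look
much like orphan_fid_add() in the laundromat sketch above, that hands the
handle to delayed work:]

        if (!tcon->need_reconnect && !cifs_file->invalidHandle) {
                struct TCP_Server_Info *server = tcon->ses->server;
                unsigned int xid;
                int rc = 0;

                xid = get_xid();
                if (server->ops->close_getattr)
                        rc = server->ops->close_getattr(xid, tcon,
                                                        cifs_file);
                else if (server->ops->close)
                        rc = server->ops->close(xid, tcon,
                                                &cifs_file->fid);
                _free_xid(xid);

                /*
                 * Hypothetical: if the close failed (possibly before it
                 * was ever put on the wire), hand the handle to delayed
                 * work for bounded retries instead of leaking it.
                 */
                if (rc)
                        cifs_queue_deferred_close(tcon, &cifs_file->fid);
        }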