> Some code relies on server->* values so if you swap the server pointer > it at the last moment I'm not sure those values will match the new > server ptr. I'm not sure that I understand this. I'm not suggesting that we swap. I'm only saying that on getting a EDEADLK error from compound_send_recv(), try another channel instead of returning that to userspace. Please let me know if I'm missing something here. Regards, Shyam On Mon, Mar 8, 2021 at 5:22 PM Aurélien Aptel <aaptel@xxxxxxxx> wrote: > > Hi Shyam, > > Thanks for the review. I also realized we cannot make this failproof so > I went in with the idea to just avoid picking easy, non-confusing cases > of unusable channels. If that's not good we can drop the patch all > together. > > Shyam Prasad N <nspmangalore@xxxxxxxxx> writes: > > Spent some time in this code path. Seems like this is more complicated > > than I initially thought. > > @Aurélien Aptel A few things to consider: > > 1. What if the credits that will be needed by the request is more than > > 3 (or any constant). We may still end up returning -EDEADLK to the > > user when we don't find enough credits, and there are not enough > > in_flight to satisfy the request. Ideally, we should still try other > > channels. > > Yes you're right, it won't prevent failing if more credits are > needed. The patch wasn't meant to be failproof, just to avoid most > occurences of the problem. It's a simple sanity check with some > false-positives and false-negatives. > > > 2. Echo and oplock use 1 reserved credit each, which the regular > > operations can steal, in case of shortage. > > There's a dedicated server->echo_credit I think. > > > 3. Reading server->credits without a lock can result in wildly > > different values, since some CPU architectures may not update it > > atomically. At worse, we may select a channel which may not have > > enough credits when we get to it > > If we are just doing a read on an aligned int, at least on x86 you will > get either a stale value or the new value, you cannot get a garbage mix > of both. Which CPU architecture doesn't provide cache coherency at that > level? This is a complex question I couldn't find an easy answer to. > > In any case cifs.ko is already assuming it's valid in various places. We > are doing it for some usage of the server->tcpStatus, ses->status, > tcon->tidStatus at least. > > The problems of atomic read in pick_channel() aside, the credits might > change from another process between the moment the channel is picked and > the moment the credits needed are spent (server->credit -= XYZ). In > which case it will also EDEADLK later on. > > > What if we do this... > > wait_for_compound_request() can return -EDEADLK when it doesn't get > > enough credits, and there are no requests in_flight. > > In compound_send_recv(), after we call wait_for_compound_request(), we > > can check it's return value. If it's -EDEADLK, we keep calling > > cifs_pick_another_channel(ses, bitmask) (bitmask has bits set for > > channels already selected and rejected) and > > wait_for_compound_request() in a loop till we find a channel which has > > enough credits, or we run out of channels; in which case we can return > > -EDEADLK. > > Makes sense? Do you see a problem with that approach? > > Some code relies on server->* values so if you swap the server pointer > it at the last moment I'm not sure those values will match the new > server ptr. > > Cheers, > -- > Aurélien Aptel / SUSE Labs Samba Team > GPG: 1839 CB5F 9F5B FB9B AA97 8C99 03C8 A49B 521B D5D3 > SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg, DE > GF: Felix Imendörffer, Mary Higgins, Sri Rasiah HRB 247165 (AG München) > -- Regards, Shyam