Re: [PATCH] Revert "mountd: handle allocation failures in auth_unix_ip upcall"

Chuck Lever <chuck.lever@xxxxxxxxxx> · Mon, 26 Nov 2012 18:10:18 -0500

On Nov 26, 2012, at 5:51 PM, "J. Bruce Fields" <bfields@xxxxxxxxxxxx> wrote:

> On Mon, Nov 26, 2012 at 05:38:49PM -0500, Chuck Lever wrote:
>> 
>> On Nov 26, 2012, at 5:15 PM, "J. Bruce Fields" <bfields@xxxxxxxxxxxx> wrote:
>> 
>>> On Mon, Nov 26, 2012 at 05:05:22PM -0500, Chuck Lever wrote:
>>>> 
>>>> On Nov 26, 2012, at 5:03 PM, "J. Bruce Fields" <bfields@xxxxxxxxxxxx> wrote:
>>>> 
>>>>> From: "J. Bruce Fields" <bfields@xxxxxxxxxx>
>>>>> 
>>>>> This reverts commit 485f7a21e1649797f29317b865cbb094c1f6a71d.  The
>>>>> failures handled there could be any sort of name resolution failure, not
>>>>> just an allocation, and failing to downcall (hence leaving the client
>>>>> hanging) is not the correct thing to do in those cases.
>>>> 
>>>> The problem is in the kernel, then: a downcall should be allowed to fail, IMO.
>>> 
>>> In this case, after a revert, a failure here will result in the downcall
>>> passing down a client named "DEFAULT".  Presumably that won't be
>>> permitted access to the export, so the client will end up getting an
>>> error.
>> 
>> "A failure here" can mean either malloc() returned NULL in client_resolve() or client_compose(), or . . . ?
> 
> Looks like it'd also fail if we couldn't map the client's ip address to
> a name.

That's the common failure mode here, which is now treated just like a malloc(3) failure.
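
For reference, a rough sketch of the two behaviors under discussion, as I understand them. This is not the actual nfs-utils code: resolve_client(), reply_after_revert(), reply_before_revert(), the addresses, and the reply line layout are all made up for illustration. The only detail taken from this thread is that, after the revert, a failed lookup results in a reply naming the client "DEFAULT", while the current code sends no reply at all.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the name lookup. Returns NULL on any
 * failure, whether an allocation failure or a missing reverse mapping. */
static const char *resolve_client(const char *ip)
{
    if (strcmp(ip, "192.0.2.10") == 0)
        return "client.example.com";
    return NULL;
}

/* After the revert: always reply, falling back to "DEFAULT".
 * The line layout here is illustrative, not the real downcall format. */
static int reply_after_revert(const char *ip, char *buf, size_t len)
{
    const char *name = resolve_client(ip);

    if (name == NULL)
        name = "DEFAULT";
    return snprintf(buf, len, "%s %s\n", ip, name);
}

/* Current code (before the revert): on failure nothing is written,
 * so the kernel never sees a reply for this request. */
static int reply_before_revert(const char *ip, char *buf, size_t len)
{
    const char *name = resolve_client(ip);

    if (name == NULL) {
        buf[0] = '\0';
        return 0;
    }
    return snprintf(buf, len, "%s %s\n", ip, name);
}
```

In this sketch an unresolvable address yields a "DEFAULT" line in the first case and an empty buffer in the second, which is the difference the patch description should spell out.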

>> What exactly is the problem with the current code?

Anyway, it would help my addled brain if the problem description in the next version of this patch were clearer about what the code is doing wrong now.

>> The kernel won't get any downcall reply in that
>> case!  Is that what you are trying to fix?
>> 
>> WRT my original objection: In general I don't see how to make it
>> impossible for mountd to fail.
> 
> Sure, but mountd is required for the server to function, so it's just a
> question of how we fail.
> 
>> Thus the kernel needs to be better about recovering when mountd
>> suddenly disappears.
> 
> Currently it drops and lets the client retry.  I suspect that's the
> correct thing to do, but alternatives are welcomed.

By "drops," do you mean the server drops the NFS request and the client retransmits it?  That's actually pretty unfriendly behavior, IMO, since an application on a client is typically stuck at that point until that RPC gets some sort of result (possibly after several retransmit timeouts).

Maybe the server could signal an NFS error of some kind and let the client decide if it wants to retry until the server is working again, or fail that request immediately.

(Also, if NFSv4 depends on a mountd upcall, it is not supposed to drop a request, AFAIK.)

OK, but this is a separate issue from the case you are trying to fix.

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com
