Re: Inconsistency when mounting a directory that 'world' cannot access.

Steve Dickson <SteveD@xxxxxxxxxx> · Mon, 08 Oct 2012 07:42:34 -0400

On 08/10/12 02:03, NeilBrown wrote:
> On Thu, 4 Oct 2012 12:07:39 -0400 "J. Bruce Fields" <bfields@xxxxxxxxxxxx>
> wrote:
> 
>> On Thu, Oct 04, 2012 at 08:46:59AM +1000, NeilBrown wrote:
>>> On Wed, 3 Oct 2012 12:27:28 -0400 "J. Bruce Fields" <bfields@xxxxxxxxxxxx>
>>> wrote:
>>>
>>>> On Wed, Oct 03, 2012 at 03:48:43PM +0000, Myklebust, Trond wrote:
>>>>> On Wed, 2012-10-03 at 11:13 -0400, J. Bruce Fields wrote:
>>>>>> On Wed, Oct 03, 2012 at 01:46:29PM +1000, NeilBrown wrote:
>>>>>>> On Tue, 2 Oct 2012 10:33:34 -0400 "J. Bruce Fields" <bfields@xxxxxxxxxxxx>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> I guess you're right.  So it starts to sound more like: "you have a
>>>>>>>> confusing setup.  Your export configuration says one thing, and your
>>>>>>>> filesystem permissions say another.  Under NFSv3 the confusion didn't
>>>>>>>> matter, but now it does--time to fix it."
>>>>>>>>
>>>>>>>
>>>>>>> That's the best I could come to - I'm glad to have it confirmed.  Thanks!
>>>>>>>
>>>>>>> It is unfortunate that Linux NFS uses an anon credential to mount when krb5
>>>>>>> is in use, and uses 'root' when auth_sys is used (which might be anon if
>>>>>>> "root_squash" is active, but might not).
>>>>>>> I wonder if it would work to use auth_none for the mount-time lookup, just
>>>>>>> for consistency..
>>>>>>>
>>>>>>> Is the following appropriate?  Is there somewhere better to put this caveat?
>>>>>>
>>>>>> Unfortunately, it's more complicated than this, as it depends on client
>>>>>> implementation and configuration details.
>>>>>>
>>>>>> Something like this would be more accurate but possibly too long:
>>>>>>
>>>>>> 	Note that under NFSv2 and NFSv3, the mount path is traversed by
>>>>>> 	mountd acting as root, but under NFSv4 the mount path is looked
>>>>>> 	up using the client's credentials.  This means that, for
>>>>>> 	example, if a client mounts using a krb5 credential that the
>>>>>> 	server maps to an "anonymous" user, then the mount will only
>>>>>> 	succeed if that directory and all its parents allow eXecute
>>>>>> 	permissions.
>>>>>
>>>>> So you're listing this as a "feature" rather than a bug? There should be
>>>>> no reason to constrain the pseudofs to use the permission checks from
>>>>> the underlying filesystem.
>>>>
>>>> I'd be fine with that.
>>>>
>>>> (That still leaves some subtle v3/v4 difference in the case of mount
>>>> paths underneath an export?
>>>>
>>>> What *is* the existing mountd behavior there, exactly?  I'm inclined to
>>>> think allowing mounts of arbitrary subdirectories is a bug, but maybe
>>>> there's some historical reason for it or maybe someone already depends
>>>> on it.)
>>>>
>>>> --b.
>>>
>>> The behaviour is simply that you mount a filehandle (typically belonging to a
>>> directory) and that filehandle can be anything inside any exported filesystem.
>>
>> It's not the nfsd behavior that bothers me--there's nothing we can do
>> about the fact that access by filehandle can bypass directory
>> permissions.
>>
>> What bothers me is that mountd will apparently allow anyone to do a lookup
>> anywhere in an exported filesystem.
> 
> Not anyone - it requires a privileged source port from a known host.
> So it is only "anyone who can get 'root'".
> 
>>
>> I don't know--maybe I shouldn't be so concerned about the possibility a
>> rogue user could figure out that my "Music" directory includes an
>> unreasonable number of Miles Davis titles.
>>
>>> Yes, please do depend on being able to mount filehandles that aren't the root
>>> of a filesystem.
>>>
>>> The case that brought this issue to my attention involved the server having
>>> a directory containing hundreds of home directories.  This directory is
>>> exported.
>>>
>>> If they mount that top level directory they get horrible performance.  If
>>> they use an automounter to just mount the homes that are accessed it works
>>> better.  They weren't able to explain why but my guess is that some tools
>>> (GUI filesystem browser) would occasionally do the equivalent of "ls  -l" of
>>> the top level directory which would hammer nfs-idmapd and probably ldap....
>>> though you would think that would get cached and not be a problem for long.
>>> So maybe it is more subtle than that.
>>
>> Getting all the id->name mappings for a 100-entry directory is going to
>> require 100 serialized upcalls to idmapd (and then possibly ldap), and
>> by default it looks like the idmapd cache will go cold after 10
>> minutes....  Not hard to imagine that could be a problem.
>>
>> Running multiple idmapd processes would be easy and might help?  Though
>> not if the client's just giving us the getattrs one at a time.
>>
>> Or maybe the problem's somewhere else entirely, but that's a real bug if
>> we aren't giving good performance on /home.
> 
> I did some experimenting..
> On both 'client' and 'server':
>   for i in `seq 2000 3000`; do echo u$i:x:$i:1000::/nohome:/bin/false; done >> /etc/passwd
> 
> On server in suitable directory
> 
>   for i in `seq 2000 3000`; do mkdir $i ; chown u$i $i ; done
> 
> Mount that directory onto the client with NFSv3 and "time ls -l" takes a
> little under 4 seconds.
> Mount with NFSv4 and it takes about the same.  However:
> 
> .....
> drwxr-xr-x 2 4294967294 root 4096 Oct  8 16:19 2974
> drwxr-xr-x 2 4294967294 root 4096 Oct  8 16:19 2975
> drwxr-xr-x 2 4294967294 root 4096 Oct  8 16:19 2976
> drwxr-xr-x 2 4294967294 root 4096 Oct  8 16:19 2977
> drwxr-xr-x 2 4294967294 root 4096 Oct  8 16:19 2978
> drwxr-xr-x 2 u2979      root 4096 Oct  8 16:19 2979
> drwxr-xr-x 2 u2980      root 4096 Oct  8 16:19 2980
> drwxr-xr-x 2 4294967294 root 4096 Oct  8 16:19 2981
> drwxr-xr-x 2 4294967294 root 4096 Oct  8 16:19 2982
> drwxr-xr-x 2 4294967294 root 4096 Oct  8 16:19 2983
> drwxr-xr-x 2 4294967294 root 4096 Oct  8 16:19 2984
> drwxr-xr-x 2 4294967294 root 4096 Oct  8 16:19 2985
> drwxr-xr-x 2 4294967294 root 4096 Oct  8 16:19 2986
> ....
> 
> 
> tcpdump shows the server is returning the right stuff, but something is going
> wrong on the client.  I've tried unmounting/remounting and killing/restarting
> rpc.idmapd.
> I had some config problems previously .. is there any chance that these
> unknown entries are in a cache?  Any easy way to view or flush the cache?
Assuming you are using the keyring based idmapper, "nfsidmap -cv" will
clear the keyring of user and group ids. See nfsidmap(5).

If you are using rpc.idmapd, I believe
    echo `date +'%s'` > /proc/net/rpc/nfs4.idtoname/flush
will do the trick. The CITI FAQ
    http://www.citi.umich.edu/projects/nfsv4/linux/faq/
has a section on working with this cache.
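
For repeated testing, the flush above could be wrapped in a small shell
function. This is only a sketch: the name flush_idmap_cache and its optional
path argument are made up for illustration, and the default path is simply
the one quoted above. It writes the current epoch time into the flush file
and fails if that file is not writable.

```shell
#!/bin/sh
# Hypothetical helper (not part of nfs-utils): flush an rpc.idmapd-backed
# id mapping cache by writing the current epoch time into its "flush" file.
# The optional first argument overrides the path, which is an assumption
# made here so the function can be pointed at another cache or a test file.
flush_idmap_cache() {
    flush_file="${1:-/proc/net/rpc/nfs4.idtoname/flush}"

    # Bail out early if the file is missing or not writable (e.g. not root).
    if [ ! -w "$flush_file" ]; then
        echo "cannot write $flush_file" >&2
        return 1
    fi

    # Same effect as: echo `date +'%s'` > "$flush_file"
    date +%s > "$flush_file"
}
```

Run "flush_idmap_cache" as root with no argument to use the default path,
then repeat the "ls -l" to see whether the unknown entries go away.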

steved.

> 
> Of course this is with text-file password lookup.  LDAP might be slower but
> I'd be surprised if it was much slower.
> 
> NeilBrown
> 
> 
> 
>>
>> --b.
>>
>>> I've built similar setups before.  There is something attractive about
>>> everyone's home directory being /home/$USERNAME even though they are on
>>> different servers and different filesystems.
>>>
>>> In the particular problem scenario, local policy requires that the 'staff'
>>> directory on the server not be world-accessible, but they still want to
>>> mount the individual home directories from there onto client machines as
>>> required.
>>> I cannot easily justify that policy, but the point is that it works with
>>> NFSv3 and with AUTH_SYS/no_root_squash, but not with NFSv4/krb5.  I don't
>>> think we can fix this inconsistency but maybe we can explain it.
>>>
>>> I think your text is more accurate than mine, but also a little more vague so
>>> the important point may not be immediately obvious.  That might be a price we have
>>> to pay for accuracy.
> 