-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 02/14/2014 11:45 AM, J. Bruce Fields wrote: > On Fri, Feb 14, 2014 at 11:38:25AM -0500, Josef Bacik wrote: > > > On 02/14/2014 11:14 AM, J. Bruce Fields wrote: >>>> On Fri, Feb 14, 2014 at 07:49:35AM -0800, Eric W. Biederman >>>> wrote: >>>>> "J. Bruce Fields" <bfields@xxxxxxxxxxxx> writes: >>>>> >>>>>> On Thu, Feb 13, 2014 at 08:25:43PM -0800, Eric W. >>>>>> Biederman wrote: >>>>>>> "J. Bruce Fields" <bfields@xxxxxxxxxxxx> writes: >>>>>>> >>>>>>>> On Thu, Feb 13, 2014 at 03:45:16PM -0800, Eric W. >>>>>>>> Biederman wrote: >>>>>>>>> "J. Bruce Fields" <bfields@xxxxxxxxxxxx> writes: >>>>>>>>> >>>>>>>>>> Yesterday you passed on a report of this printk >>>>>>>>>> from nfsdfh.c firing: >>>>>>>>>> >>>>>>>>>> printk("nfsd: find_fh_dentry returned a >>>>>>>>>> DISCONNECTED directory: %pd2\n", dentry); >>>>>>>>>> >>>>>>>>>> I think the dentry probably comes from the >>>>>>>>>> FILEID_ROOT case of: >>>>>>>>>> >>>>>>>>>> if (fileid_type == FILEID_ROOT) dentry = >>>>>>>>>> dget(exp->ex_path.dentry); else { dentry = >>>>>>>>>> exportfs_decode_fh(exp->ex_path.mnt, fid, >>>>>>>>>> data_left, fileid_type, nfsd_acceptable, exp); } >>>>>>>>>> >>>>>>>>>> In that case the dentry was found using ordinary >>>>>>>>>> filesystem lookups, so doesn't go through the >>>>>>>>>> same DISCONNECTED-clearing logic as in the case >>>>>>>>>> of lookups by filehandle. >>>>>>>>>> >>>>>>>>>> Probably they have an export root that's not a >>>>>>>>>> filesystem root, and the lookups happened in the >>>>>>>>>> right order? >>>>>>>>>> >>>>>>>>>> I suspect that's fine, and that the printk is >>>>>>>>>> just stupid, but maybe we should clear >>>>>>>>>> DISCONNECTED when possible on normal lookups. >>>>>>>>>> The following is my attempt, though I'm not sure >>>>>>>>>> if d_alloc is the right place to do this. In any >>>>>>>>>> case it might help confirm this is what's >>>>>>>>>> happening. >>>>>>>>>> >>>>>>>>>> So if you pass along this patch to the person who >>>>>>>>>> was seeing that printk I'd be interested in the >>>>>>>>>> results. >>>>>>>>> >>>>>>>>> I have been reading through the dentry code for >>>>>>>>> other reasons and your patch definitely won't >>>>>>>>> change anything. __d_alloc sets d_flags = 0. >>>>>>>>> Therefore d_alloc always returns with d_flags == >>>>>>>>> 0. >>>>>>>> >>>>>>>> You're right, of course. I wasn't thinking >>>>>>>> straight. >>>>>>>> >>>>>>>> So the only dentries with DISCONNECTED set are those >>>>>>>> created with d_obtain_alias, which is normally only >>>>>>>> used when you're looking up by filehandle. >>>>>>>> >>>>>>>> Except btrfs has a weird use in get_default_root(). >>>>>>>> So maybe they were running into the dentry that >>>>>>>> created? >>>>>>>> >>>>>>>> So btrfs should probably be using something else, I'm >>>>>>>> not sure what. >>>>>>> >>>>>>> The nfs client also has the case where it uses >>>>>>> DISCONNECTED dentries for directories that are not root >>>>>>> on the server. Which seems very similiar to the btrfs >>>>>>> case. >>>>>> >>>>>> I don't think there's any reason for those to be flagged >>>>>> DISCONNECTED either. >>>>> >>>>> The only practical difference between the two cases is how >>>>> quickly it is desirable to connect the entries. >>>>> >>>>> The disconnected dentries processed by exportfs are >>>>> dentries that we want to connect immediately, and it is an >>>>> error/problem to have the disconnected after processing. >>>>> >>>>> The dentries that are the roots of file systems we want to >>>>> connect them if we get the chance with d_materialise_unique >>>>> but we don't care if they go long periods without being >>>>> connected. >>>>> >>>>> I believe we want both groups of dentries on the s_anon >>>>> list so that if they remain disconnected when the >>>>> filesystem is unmounted we can find them and deal with >>>>> them. >>>> >>>> Note it's IS_ROOT(), not DCACHE_DISCONNECTED, that >>>> determines whether a hashed dentry is on s_anon or not. (See >>>> 7632e465feb182cadc3c9aa1282a057201818a8c for more detailed >>>> discussion.) >>>> >>>>> I can see distinguishing between dentries that are supposed >>>>> to be disconnected for a short time, and dentries that are >>>>> supposed to be disconnected indefinitely but we currently >>>>> (as of 3.14-rc1) don't have that distinction. >>>> >>>> I believe we do: DCACHE_DISCONNECTED is for the former, and >>>> the latter should be IS_ROOT(), !DCACHE_DISCONNECTED. >>>> >>>> DCACHE_DISCONNECTED was intended to be used only for >>>> dentries created while performing lookup-by-filehandle which >>>> have not yet been confirmed to be linked back to filesystem >>>> root by a chain of ->d_parent pointers. >>>> >>>>> But a blanket statement that the long term disconnected >>>>> dentries are doing it wrong seems off base. >>>>> >>>>> If those dentires can tolerate not being on the s_anon list >>>>> d_alloc_pseudo or d_make_root looks like it will serve >>>>> just as well >>>> >>>> The difference is that d_alloc_pseudo and d_make_root >>>> unconditionally create new dentries, whereas >>>> d_materialise_unique lets us reuse an existing dentry. >>>> >>>> (In the NFS client case, I believe this happens for example >>>> when you mount export A from a server, then export B from the >>>> same server, and then one day you look up A/foo and find out >>>> that it's the same directory as B's root. You don't want to >>>> duplicate the inode or give it multiple dentries, so you >>>> instead reuse the existing IS_ROOT dentry for B to represent >>>> A/foo. >>>> >>>> In the btrfs case I guess it's a similar situation but with >>>> two subvolumes instead of two exports?) >>>> >>>>> from the perspective of d_materialise_unique, but that >>>>> leaves me with the queasy feeling that we will leak >>>>> dentries and inodes when we unmount the filesystems in >>>>> question, if those dentries have never been attached. >>>> >>>> In the NFS server (lookup-by-filehandle) case, I believe >>>> dentries in the process of being reconnected are all children >>>> of some IS_ROOT dentry which is on the s_anon list, so >>>> everyone's accounted for. >>>> >>>> >>>> Thanks for looking at this as I've found myself easily >>>> confused by it all.... (And judging by some of the code in >>>> the tree I'm not alone.) >>>> > > Ok using d_make_root makes us leak inodes on umount (I don't know > why, I'm not drunk enough yet.) > >> Oh, well, a little early in the morning, but that's a solveable >> problem. > > I haven't tried it yet but I'm almost 100% positive if I just clear > out DISCONNECTED we will BUG_ON() in d_splice_alias() when we go to > walk into the subvol we've mounted with the default subvol option. > >> I'd expect it not to BUG_ON(), but instead to fall into the !new >> case and create a second alias for that directory. Which >> probably actually causes the same problems d_make_root does. > (weird that thunderbird is treating my replies as the most recent reply, but whatever) So neither of these cases happened, I just didn't see _anything_ in the directory when I mounted the fs. Once I mounted the parent subvol and walked into the default subvol I could see everything properly. This is with clearing DISCONNECTED and still using d_splice_alias() in lookup. Once I switched d_splice_alias() to d_materialise_unique() in btrfs_lookup() everything went back to working as normal. So I'm calling it a win and forgetting we ever had this conversation. Thanks, Josef -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/ iQIcBAEBAgAGBQJS/kwfAAoJEANb+wAKly3BHSIP/jx01XDxT98Tk4TCRz2R12Cb HmgWZ8etu82MPp/KChuXnidxJU00H7BOfaY4vs1q8XFz+MZg3ZCiazp5zu2RVPct yjMYRY7XPEQcce0Oj5h40dfhclJ61eSupuUd7u+gNpNErsUdaFCpbtXGqVeGAuI4 lMbi8Tqd5YKNa3e+bs8OHEGRG+pQv1Ak2D+dcS+KDn3/Np38wC9eGRsQsZocubQf EpjERcX3R8ug3k5C/sX15SkpVfHWZS2ydL0ypnOVAI5XZD9ufT+HyyAqhgFZFPlK WhDTWH54gnq6e2RxU3bKN+q7VzwBD2L/82wHk0/bykXDL+56JRJ0MMptBimWt2M2 QH7+aUN3pM/hm0TiW3qC6WkfM4WBdEME/Mzf8s5vVJT4gQj7u0Xh3RlOX0Y+OEwV MvICzlXNNT9jbp7EHHzDeJuSbvpLmvVOL24Tf9UmjYO2WXHIJK2bSW1ai2V5v+zx jZ+NeAg8tfkpxKANbDxtOAzrehHTk8gPimsO3QXFlRvUyUggZdd524Chl07dxnQZ JNvn4C9JrRkYE32d4dKrjmJvnQXsFKw1KlCvp14nLpQP6q5+J70KRo85NtcpK3fo tUU24jzCD7m6W+Y5tT5rJgDLz7IAmTZ/1WTI9bc9YMk61ci7GlLE0g4hYbTWfS5h Q0A5+xddyQT/I5HtCoPZ =dBiR -----END PGP SIGNATURE----- -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html