Re: __d_find_alias() problem

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, 2011-12-22 at 08:11 -0800, Sage Weil wrote:
> On Thu, 22 Dec 2011, Alex Elder wrote:
> > I'll send all this to ceph-devel later, I just wanted
> [added cc]
> 
> > to document it while it's fresh and send it to *someone*
> > and you made this change so I'm sending it to you.  The
> > change in question is:
> > 
> >     be655596b3de5873f994ddbe205751a5ffb4de39
> >     ceph: use i_ceph_lock instead of i_lock
> > 
> > First, after updating my ceph tree and rebuilding I found
> > that every time I completed a mount in my UML environment
> > it would immediately crash the UML.
> > 
> > This led to me learning how to attach a debugger to
> > the UML (good) and this morning I was able to capture
> > this situation in gdb.
> > 
> > The problem is in ceph_dir_set_complete(), which is
> > blindly dereferencing the return value of
> > __d_find_any_alias(inode), and that function is
> > returning NULL because the inode's i_dentry list
> > is empty.
> > 
> > So that's problem 1.
> 
> Yeah, looks like I broke that with 
> 9f6f8f50663e6cf42cca64f1f5ee0ca438c33c46.  Just moving the dentry->d_sb 
> inside the if should do it!  Strange that this didn't trigger on sepia... 
> :/

I have a patch that fixes this one and I have confirmed I
can now get my UML filesystem mounted with it applied.

> > In looking at this I find that __d_find_any_alias()
> > is duplicated in ceph, to (almost) match the Linux
> > implementation which is not exported.  But unlike
> > the Linux version, the ceph version does NOT grab
> > a reference to the found alias.  I consider this a
> > bug, because a function with an identical name should
> > perform identically to the original, otherwise it
> > should be named something different.
> > 
> > That's problem 2.
> > Furthermore, I looked at the locking that's done for
> > the Linux version of __d_find_any_alias, and that
> > truly does require taking the Linux inode lock--the
> > (new) ceph inode i_ceph_lock is not sufficient.  Note
> > that __d_find_any_alias() really only operates on the
> > Linux inode and does nothing with the ceph inode, so
> > it may be that simply changing the locking used *in
> > this case* back to using the Linux inode lock is
> > enough to fix the problem.  But I'm not familiar
> > enough at this point to know the locking context
> > where these calls are made, so can't be sure doing
> > this would cause problems (like lock inversions).
> > 
> > There are three places this __d_find_any_alias()
> > is called:  ceph_dir_{set,clear,test}_complete().
> > Those are straightforward, but the calls to *those*
> > functions are mostly embedded in long strings of
> > logical conditions so it's going to take a little
> > work to unwind this.
> > 
> > So that's problem 3, which is the biggest one.
> 
> I think the __d_find_any_alias() needs to take the i_lock and a reference 
> to make it fully safe.  i_lock is ordered inside i_ceph_lock, now, so that 
> should be safe in all of the call chains.  Better yet, we should just 
> export the VFS version and use that.

I'll put together a patch that both adds the reference,
and fixes callers so they release it properly.

> What do you think?  A simple fix is best so we can get it into 3.2 (which 
> is about a week away?).

What I described is simplest.  I don't want to suggest
a VFS change at this point.

> > Potential problem 4 is that this suggests the
> > mechanical switch to using i_ceph_lock is a little
> > suspect and might need a closer look.
> 
> Yeah.  Well, this is one of the few cases where i_lock where we muddle 
> with the dcache internal structure.  The others should pretty much all be 
> in dir.c, and maybe inode.c's splice_dentry() and friends.  The dcache 
> readdir is by far the most suspect of all of those, but also probably the 
> lowest priority at the moment.  Maybe your visit in January would be a 
> good time to go over it in detail?

Sure.  Or I could review the committed patch.  For now
I'll get something togehter for the 3.2 release.

					-Alex

--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Index of Archives]     [CEPH Users]     [Ceph Large]     [Information on CEPH]     [Linux BTRFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]
  Powered by Linux