Re: ->d_lock FUBAR (was Re: Linux 3.0-rc6)

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, Jul 12, 2011 at 08:56:35PM -0700, Linus Torvalds wrote:

> It would take the locks a few more times, but it would avoid the nasty
> lock ordering issues exactly because it drops them in between rather
> than try to hold them all at once. And d_move() is *not* a performance
> critical area, afaik, so the point would be to make the locking more
> straightforward and avoid the horrible subtlety.
> 
> Crazy? Stupid? What am I missing? It's probably something obvious.

	It's loops over d_subdirs of given dentry.  IOW, dentry can
be found not just via the hash chains and those iterators (ones outside
of fs/dcache.c) don't play with rename seqlock.  They just rely on ->d_lock
on dentry being enough to stabilize the list of children.

It's not all that horrible, since i_mutex on parent (held in quite a
few of those) is enough to prevent d_move() altogether.  Or iterator
sits on fs that never does d_move/d_materialise_unique at all.  There
are exceptions. The most generic one is __fsnotify_update_child_dentry_flags(),
but that's not all.  E.g. coda ->d_revalidate(), possibly some ceph code...
Overall, it might be doable, but it'll take some massage in strange places.
Hell knows; I think at that point in release cycle we don't want to go
there.  After looking through ->d_lock users I'm mostly convinced that
unlazy_walk() fix + making damn sure we don't try to create loops in
d_materialise_unique() + adding missing ->i_mutex where needed is the
way to go for now.  We have several places where ->d_lock overlap is
not parent-to-child, but these really can't lead to deadlocks - fs is
specialized enough to prevent them.

However, looking through these loops over ->d_subdirs shows *another* pile of
bugs, like this one:
        mutex_lock(&bus->d_inode->i_mutex);

        list_for_each_entry(dev, &bus->d_subdirs, d_u.d_child)
                if (dev->d_inode)
                        update_dev(dev);

        mutex_unlock(&bus->d_inode->i_mutex);
in drivers/usb/core/inode.c:update_bus().  See the problem?  Sure,
it's ramfs-like: all dentries are pinned down until unlink/rmdir,
which is blocked by i_mutex on parent.  And yes, d_move() is also
not a problem (ditto).  However, if you open a file and unlink it,
dentry will be unpinned, but stay around.  It will be unhashed,
but it'll be on ->d_subdir.  Close the file and dentry will be killed,
without i_mutex on parent.  Note that ->d_locked will be taken -
both on victim and parent.  But the quoted sucker takes no ->d_lock
at all.  Bummer...

IOW, iterators relying _purely_ on i_mutex on parent are currently fucked.
We have more such cases.  Another one is debugfs_remove_recursive().

I wonder if we simply ought to evict the sucker from d_subdirs/d_child
in simple_unlink()...  It would require verifying that users of that
animal are OK with that, but I suspect that most of them will be happy.
It might simplify life for readdir() and llseek() on such directories,
while we are at it...  The "hunt for dentry leaks" logics on umount
might become slightly less useful for those filesystems (we'll get
"parent is misteriously busy" instead of "child is unlinked, but somehow
busy on fs shutdown"), but I doubt it's that serious.  Hell knows -
I never had to hunt dentry leaks on debugfs/usbfs/etc.

BTW, usbfs tree iterators should've been killed a long time ago - we
simply need ->permission() and ->getattr() added for usbfs to DTRT
instead of this "iterate over all inodes, patch new i_uid/i_gid/i_mode
when remount asks to do that" crap.

BTW^2: in clk_debugfs_register_one() in assorted arch/arm/mach-*/clock.c
        d = c->dent;
        list_for_each_entry_safe(child, child_tmp, &d->d_subdirs, d_u.d_child)
                debugfs_remove(child);
        debugfs_remove(c->dent);
should be debugfs_remove_recursive().  Here we don't even bother with
i_mutex...

BTW^3: sel_remove_classes() plays very unsafe games - no ->i_mutex at all, no
->d_lock until we get leaf directories.  Root-only, but...
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [Samba]     [Device Mapper]     [CEPH Development]
  Powered by Linux