Re: [PATCH v3 00/13] VFS hot tracking

Al Viro <viro@xxxxxxxxxxxxxxxxxx> · Wed, 3 Jul 2013 14:30:51 +0100

On Mon, Jul 01, 2013 at 09:19:08PM +0800, Zhi Yong Wu wrote:

> >         * lookups (both for inode and range) leak on race with unlink.
> > You grab a reference, drop the lock, check if HOT_DELETING is set and
> > return an error if it is; how the hell could the caller possibly undo
> > that reference grab, when it has no idea what object had been grabbed
> I don't get what you mean, do you elaborate it with more details?
> which caller? which scenario will it take place in?

Sigh...  Suppose that test has failed and you've returned ERR_PTR(-ENOENT).
If that ever happens (and I don't see what would prevent that, seeing
that e.g. unlink(2) can happen at any point, that you are not holding any
locks at that point and that unlink(2) wouldn't care for any of your locks
anyway) you are going to have a leak - you've grabbed a reference a few
lines above and once you return, there's no way to tell which object had
been grabbed, let alone do the matching kref_put().

> > debugfs-related issues - debugfs is completely unsuitable for dynamic
> > objects and you step into that big way, in addition to races of your
> **** Is debugfs also completely unsuitable for dynamic objects even
> though we fix the following issues listed by you? Do you have any
> better way about this?

The last issue in that list (IO hours after object removal) is just about
unfixable without debugfs overhaul.

> >         * creation of hot_debugfs_root is racy - WTF prevents
> hot_debugfs_root is public for all disk volumes, and is debugfs root
> for hot tracking.
> Why do you think it is racy?
> > two hot_track_init() in parallel?
> I think that mount() will make sure hot_track_init() will be done once
> for the same super block, right? hot_track_init will be done *only*
> when mount is issued.
> That is, mount() can not make sure hot_track_init() is done serially?

What the devil would serialize mount on completely unrelated devices?
And what for?

> >         * what guarantees that sb->s_id is unique?
> sorry, can you let me know where sb->s_id will be not unique?

On any number of filesystem types it isn't; are you making that a restriction
on fs types that can use your stuff?

> >         * removal on failure - screwed; first of all, list_empty()
> On failure, it must be removed? no.

Then what would eventually remove it?  And why bother with lazy creation,
if so?

> >         * hot_debugfs_exit() - screwed in the same way.
> Ditto.

Again, if you don't mind that thing sticking around indefinitely, just
because somebody happened to do ls on it at the moment it would've been
removed otherwise, WTF remove it at all?

> >         * debugfs files get ->i_private set to corresponding sb->s_hot_root.
> > It's copied to seq->private on open() and used by iterators.  WTF prevents
> > open on debugfs, followed by umount of corresponding btrfs volume, freeing of
> > sb->s_hot_root and then read() on our file stepping into kfree'd hot_root
> > in ->start()?
> Good catch, will fix it.

How?

As for the lifetime rules suggestions - depends on what you want to achieve.
How long should those objects live?  In which cases can we get an attempt
of ...unlink... more than once on the same object?
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html