Re: [PATCH v3 00/13] VFS hot tracking

Zhi Yong Wu <zwu.kernel@xxxxxxxxx> · Mon, 1 Jul 2013 21:19:08 +0800



On Sat, Jun 29, 2013 at 12:03 AM, Al Viro <viro@xxxxxxxxxxxxxxxxxx> wrote:
> On Fri, Jun 21, 2013 at 08:17:09PM +0800, zwu.kernel@xxxxxxxxx wrote:
>> From: Zhi Yong Wu <wuzhy@xxxxxxxxxxxxxxxxxx>
>>
>>   The patchset is trying to introduce hot tracking function in
>> VFS layer, which will keep track of real disk I/O in memory.
>> By it, you will easily know more details about disk I/O, and
>> then detect where disk I/O hot spots are. Also, specific FS
>> can take use of it to do accurate defragment, and hot relocation
>> support, etc.
>>
>>   After V1 was sent out, Chandra Seetharaman has reviewed and
>> made a lot of comments, thanks a lot to him. Now it's time to
>> send out its V3 for external review, any comments or ideas are
>> appreciated, thanks.
>
> First of all, my apologies for obscenely long delay with review.  I've
It is so exciting when you made the comments, thanks a lot.
> started doing it several times and dropped these attempts getting mired
> in the nightmare of lifetime rules in there.  What I should've done was
> sending the obvious low-hanging fruits right then and waiting for the
> variant with that stuff sanitized ;-/
>
> First of all, one general observation: please, separate inode and range stuff
> clearly; do *not* use the same functions on both, it only makes harder to read.
> I.e. kill hot_comm_item and split the functions that take it.  This code is
OK, i will follow up with it.
> trying to be too generic for its own good and ends up being obfuscated to hell
> and back...
>
> refcounting:
>         * the whole refcount + "DELETING" flag approach is a bad idea.
Do you have any better suggestion? i adopt this way mainly to void
this object which has been deleted will be relinked to the hot list
when the hot worker is issued.
> hot_comm_item_unlink() tries to be idempotent *and* includes dropping the
> reference on its first call.  Schemes like that tend to be either pointless
> (i.e. we know that it won't be called twice for the same object) or prone
> to stepping on dangling pointers.  This one does the latter (at least).
Do you mean we should the condition determination move
test_and_set_bit() out of hot_comm_item_unlink()?
>         * lookups (both for inode and range) leak on race with unlink.
> You grab a reference, drop the lock, check if HOT_DELETING is set and
> return an error if it is; how the hell could the caller possibly undo
> that reference grab, when it has no idea what object had been grabbed
I don't get what you mean, do you elaborate it with more details?
which caller? which scenario will it take place in?
> in the first place?  Moreover, that check does *not* prevent getting
> an object with HOT_DELETING from those functions - unlink coming just
> after that check will be unnoticed.
Good catch, thanks.
>
> hot_inode_item_delete() mess:
>         * hot_inode_item_delete() is done on unlink(2), no matter how many
> links are there.  Why?
Good catch, will fix it.
>         * hot_inode_item_delete() is done even if unlink() fails (e.g. with
> EBUSY, or on whatever error ->unlink() might return).
Good catch, will fix it.
>         * hot_inode_item_delete() grabs a reference to hot_inode_item,
> *drops* *it*, then does hot_comm_item_unlink().  What protects it from being
> freed right as we drop the damn reference?
Good catch, we should do hot_comm_item_unlink() at first, then *drop* this ref.
>
> debugfs-related issues - debugfs is completely unsuitable for dynamic
> objects and you step into that big way, in addition to races of your
**** Is debugfs also completely unsuitable for dynamic objects even
though we fix the following issues listed by you? Do you have any
better way about this?
> own:
>         * creation of hot_debugfs_root is racy - WTF prevents
hot_debugfs_root is public for all disk volumes, and is debugfs root
for hot tracking.
Why do you think it is racy?
> two hot_track_init() in parallel?
I think that mount() will make sure hot_track_init() will be done once
for the same super block, right? hot_track_init will be done *only*
when mount is issued.
That is, mount() can not make sure hot_track_init() is done serially?

>         * if creation fails, we leave hot_debugfs_root ERR_PTR(something);
> from that point on, no attempts to create it will be done (check for
> hot_debugfs_root being *NULL*)
Good catch, will fix it.
>         * what guarantees that sb->s_id is unique?
sorry, can you let me know where sb->s_id will be not unique?
>         * removal on failure - screwed; first of all, list_empty()
On failure, it must be removed? no.
> will *not* be true if we have somebody open it (cursors are inserted into
yes, in this case, it will not be removed. what issue come here?
> that list).  Moreover, what's to stop another hot_debugfs_init() from
> being called just as we are doing debugfs_remove(hot_debugfs_root) and
> see that it's non-NULL *before* we get to assigning NULL there?
Good catch, it need to be reentrance.
>         * hot_debugfs_exit() - screwed in the same way.
Ditto.
>         * debugfs files get ->i_private set to corresponding sb->s_hot_root.
> It's copied to seq->private on open() and used by iterators.  WTF prevents
> open on debugfs, followed by umount of corresponding btrfs volume, freeing of
> sb->s_hot_root and then read() on our file stepping into kfree'd hot_root
> in ->start()?
Good catch, will fix it.


--
Regards,

Zhi Yong Wu
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html