On Sat, Jun 29, 2013 at 12:03 AM, Al Viro <viro@xxxxxxxxxxxxxxxxxx> wrote: > On Fri, Jun 21, 2013 at 08:17:09PM +0800, zwu.kernel@xxxxxxxxx wrote: >> From: Zhi Yong Wu <wuzhy@xxxxxxxxxxxxxxxxxx> >> >> The patchset is trying to introduce hot tracking function in >> VFS layer, which will keep track of real disk I/O in memory. >> By it, you will easily know more details about disk I/O, and >> then detect where disk I/O hot spots are. Also, specific FS >> can take use of it to do accurate defragment, and hot relocation >> support, etc. >> >> After V1 was sent out, Chandra Seetharaman has reviewed and >> made a lot of comments, thanks a lot to him. Now it's time to >> send out its V3 for external review, any comments or ideas are >> appreciated, thanks. > > First of all, my apologies for obscenely long delay with review. I've It is so exciting when you made the comments, thanks a lot. > started doing it several times and dropped these attempts getting mired > in the nightmare of lifetime rules in there. What I should've done was > sending the obvious low-hanging fruits right then and waiting for the > variant with that stuff sanitized ;-/ > > First of all, one general observation: please, separate inode and range stuff > clearly; do *not* use the same functions on both, it only makes harder to read. > I.e. kill hot_comm_item and split the functions that take it. This code is OK, i will follow up with it. > trying to be too generic for its own good and ends up being obfuscated to hell > and back... > > refcounting: > * the whole refcount + "DELETING" flag approach is a bad idea. Do you have any better suggestion? i adopt this way mainly to void this object which has been deleted will be relinked to the hot list when the hot worker is issued. > hot_comm_item_unlink() tries to be idempotent *and* includes dropping the > reference on its first call. Schemes like that tend to be either pointless > (i.e. we know that it won't be called twice for the same object) or prone > to stepping on dangling pointers. This one does the latter (at least). Do you mean we should the condition determination move test_and_set_bit() out of hot_comm_item_unlink()? > * lookups (both for inode and range) leak on race with unlink. > You grab a reference, drop the lock, check if HOT_DELETING is set and > return an error if it is; how the hell could the caller possibly undo > that reference grab, when it has no idea what object had been grabbed I don't get what you mean, do you elaborate it with more details? which caller? which scenario will it take place in? > in the first place? Moreover, that check does *not* prevent getting > an object with HOT_DELETING from those functions - unlink coming just > after that check will be unnoticed. Good catch, thanks. > > hot_inode_item_delete() mess: > * hot_inode_item_delete() is done on unlink(2), no matter how many > links are there. Why? Good catch, will fix it. > * hot_inode_item_delete() is done even if unlink() fails (e.g. with > EBUSY, or on whatever error ->unlink() might return). Good catch, will fix it. > * hot_inode_item_delete() grabs a reference to hot_inode_item, > *drops* *it*, then does hot_comm_item_unlink(). What protects it from being > freed right as we drop the damn reference? Good catch, we should do hot_comm_item_unlink() at first, then *drop* this ref. > > debugfs-related issues - debugfs is completely unsuitable for dynamic > objects and you step into that big way, in addition to races of your **** Is debugfs also completely unsuitable for dynamic objects even though we fix the following issues listed by you? Do you have any better way about this? > own: > * creation of hot_debugfs_root is racy - WTF prevents hot_debugfs_root is public for all disk volumes, and is debugfs root for hot tracking. Why do you think it is racy? > two hot_track_init() in parallel? I think that mount() will make sure hot_track_init() will be done once for the same super block, right? hot_track_init will be done *only* when mount is issued. That is, mount() can not make sure hot_track_init() is done serially? > * if creation fails, we leave hot_debugfs_root ERR_PTR(something); > from that point on, no attempts to create it will be done (check for > hot_debugfs_root being *NULL*) Good catch, will fix it. > * what guarantees that sb->s_id is unique? sorry, can you let me know where sb->s_id will be not unique? > * removal on failure - screwed; first of all, list_empty() On failure, it must be removed? no. > will *not* be true if we have somebody open it (cursors are inserted into yes, in this case, it will not be removed. what issue come here? > that list). Moreover, what's to stop another hot_debugfs_init() from > being called just as we are doing debugfs_remove(hot_debugfs_root) and > see that it's non-NULL *before* we get to assigning NULL there? Good catch, it need to be reentrance. > * hot_debugfs_exit() - screwed in the same way. Ditto. > * debugfs files get ->i_private set to corresponding sb->s_hot_root. > It's copied to seq->private on open() and used by iterators. WTF prevents > open on debugfs, followed by umount of corresponding btrfs volume, freeing of > sb->s_hot_root and then read() on our file stepping into kfree'd hot_root > in ->start()? Good catch, will fix it. -- Regards, Zhi Yong Wu -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html