On Thu, 2008-08-14 at 12:05 -0700, Andrew Morton wrote: > On Thu, 14 Aug 2008 14:13:40 -0400 > Chris Mason <chris.mason@xxxxxxxxxx> wrote: > > > write_cache_pages uses i_mapping->writeback_index to pick up where it > > left off the last time a given inode was found by pdflush or > > balance_dirty_pages (or anyone else who sets wbc->range_cyclic) > > > > alloc_inode should set it to a sane value so that writeback doesn't start > > in the middle of a file. It is somewhat difficult to notice the bug since > > write_cache_pages will loop around to the start of the file and the > > elevator helps hide the resulting seeks. > > > > For whatever reason, Btrfs hits this often. Unpatched, untarring 30 copies of > > the linux kernel in series runs at 47MB/s on a single sata drive. With > > this fix, it jumps to 62MB/s. [ ... ] > inode_cachep's ctor memsets the whole inode* to zero. > > Does btrfs have an ->alloc_inode handler? If so, perhaps it is btrfs > which needs to zap that field. > Yes, but I wasn't touching ->i_mapping at all. I do have a patch in btrfs now to zero the writeback_index field for now though. > Formally, the way slab works (which is arguably silly from a CPU cache > efficiency POV) is that the object should be returned to slab > (kmem_cache_free()) in a "constructed" manner. Which means that this > zeroing should happen in destroy_inode(). > > otoh, the fs/inode.c inode creation/destruction code isn't designed > that way - it chose to initialise everything synchronously in > alloc_inode(), which is arguably wrong. > At least for filesystems, it seems much less error prone to force us to init on alloc. > One wonders if that big memset in inode_init_once() actually does > anything useful. > > > Ho hum, so what to do? I'm suspecting that we should just zap the > kmem_cachep constructor altogether, use a synchronous call to > inode_init_once() to initialise all inode fields and then remove all > those open-coded zeroings. > > > Something like this? > [ looks good ] > > Note that this assumes that a filesystem which implements > ->alloc_inode() will call inode_init_once() within its ->alloc_inode(). > > Which means, I think, that we can just remove inode_init_once() > altogether and move its initialisations into alloc_inode() along with > all the existing ones. > > What do you think? > There's the silent breakage of out of tree FS, but, your patch seems much cleaner to me. > > I suppose that's all a fun project but doesn't preclude merging your > patchlet. I'll do that for now ;) > ;) thanks. -chris -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html