On Fri 22-04-11 10:32:26, Wu Fengguang wrote:
> On Fri, Apr 22, 2011 at 12:41:54AM +0800, Jan Kara wrote:
> > On Thu 21-04-11 14:05:56, Wu Fengguang wrote:
> > > On Thu, Apr 21, 2011 at 12:39:40PM +0800, Christoph Hellwig wrote:
> > > > On Thu, Apr 21, 2011 at 11:33:25AM +0800, Wu Fengguang wrote:
> > > > > I collected the writeback_single_inode() traces (patch attached for
> > > > > your reference) each for several test runs, and find much more
> > > > > I_DIRTY_PAGES after patchset. Dave, do you know why there are so many
> > > > > I_DIRTY_PAGES (or radix tag) remaining after the XFS ->writepages() call,
> > > > > even for small files?
> > > >
> > > > What is your definition of a small file? As soon as it has multiple
> > > > extents or holes there's absolutely no way to clean it with a single
> > > > writepage call.
> > >
> > > It's writing a kernel source tree to XFS. You can find in the below
> > > trace that it often leaves more dirty pages behind (indicated by the
> > > I_DIRTY_PAGES flag) after writing as little as 1 page (indicated by the
> > > wrote=1 field).
> >
> > As Dave said, it's probably just a race since XFS redirties the inode on
> > IO completion. So I think the inodes are just small so they have only a few
> > dirty pages so you don't have much to write and they are written and
> > redirtied before you check the I_DIRTY flags. You could use radix tree
> > dirty tag to verify whether there are really dirty pages or not...
>
> Yeah, Dave and Christoph root caused it in the other email -- XFS sets
> I_DIRTY which accidentally sets I_DIRTY_PAGES. We can safely bet there
> are no real dirty pages -- otherwise it would have turned up as
> performance regressions.
  Yes, but then the question of what we actually do better is still open, right?
:) I'm really curious what it could be, because especially in your
copy-kernel case it should not make much difference. Maybe we occasionally
managed to block on PageLock behind the writing thread and now we don't
because we queue the inode later, but I find that highly unlikely.

> > BTW a quick check of the kernel tree shows the following distribution of
> > sizes (in KB):
> >   Count  Size (KB)  Cumulative %
> >     257       0        0.9%
> >   13309       4       45%
> >    5553       8       63%
> >    2997      12       73%
> >    1879      16       80%
> >    1275      20       84%
> >     987      24       87%
> >     685      28       89%
> >     540      32       91%
> >     387      36      ...
> >     309      40
> >     264      44
> >     249      48
> >     170      52
> >     143      56
> >     144      60
> >     132      64
> >     100      68
> >     ...
> >   Total 30155
> >
> > And the distribution of your 'wrote=xxx' roughly corresponds to this...
>
> Nice numbers! How did you manage to count them? :)
  An easy shell command (I hand-computed the percentages because I was too
lazy to write a script for that):

  find . -type f -name "*.[ch]" -exec du {} \; | cut -f 1 | sort -n | uniq -c

								Honza
--
Jan Kara <jack@xxxxxxx>
SUSE Labs, CR
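Computing the cumulative column by hand is the part that could be scripted. Below is a minimal sketch, assuming POSIX sh and awk, and input in the "count size" form that `uniq -c` prints, already sorted by size. The function name `cumulative_percent` is hypothetical and not from the original mail. It prints one decimal place, so a few values can differ slightly from the rounded hand-computed ones.

```shell
#!/bin/sh
# Hypothetical helper: read "count size" lines (as printed by `uniq -c`,
# sorted by size) and print each row with a running cumulative percentage,
# followed by the total number of files.
cumulative_percent() {
    awk '
        # Remember every row and accumulate the total count.
        { count[NR] = $1; size[NR] = $2; total += $1 }
        END {
            running = 0
            for (i = 1; i <= NR; i++) {
                running += count[i]
                printf "%d %d %.1f%%\n", count[i], size[i], 100 * running / total
            }
            printf "Total %d\n", total
        }
    '
}

# Example use (GNU du separates the size and the file name with a tab,
# which is cut's default delimiter):
#   find . -type f -name "*.[ch]" -exec du {} \; | cut -f 1 | sort -n \
#       | uniq -c | cumulative_percent
```

The whole table is buffered in awk arrays because the percentages need the grand total before any row can be printed. The number of distinct sizes is small, so this costs next to nothing.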