On Fri, Aug 08, 2014 at 08:18:45PM -0700, Darrick J. Wong wrote:
> Hi all,
>
> Since I sent this email last week, I rewrote the prefetch algorithms
> for pass 1 and 2 and separated thread support into a separate patch.

"Since I last replied to the e2fsck readahead patch last week..."

> Upon discovering that issuing a POSIX_FADV_DONTNEED call caused a
> noticeable increase (of about 2-5 percentage points) in fsck runtime,
> I dropped that part out.
>
> In pass 1, we now walk the group descriptors looking for inode table
> blocks to read until we have found enough to issue a $readahead_kb
> sized readahead command.  The patch also computes the number of the
> first inode of the last inode buffer block of the last group in the
> readahead window, and schedules the next readahead to occur when we
> reach that inode.  This keeps the readahead running at closer to full
> speed and eliminates conflicting IOs between the checker thread and
> the readahead.
>
> For pass 2, readahead is broken up into $readahead_kb sized chunks
> instead of being issued all at once.  This should increase the
> likelihood that a block is not evicted before pass 2 tries to read it.
>
> Pass 4's readahead remains unchanged.
>
> The raw numbers from my performance evaluation of the new code live here:
> https://docs.google.com/spreadsheets/d/1hTCfr30TebXcUV8HnSatNkm4OXSyP9ezbhtMbB_UuLU
>
> This time, I repeatedly ran e2fsck -Fnfvtt with various sizes of
> readahead buffer to see how that affected fsck runtime.  The run times
> are listed in the table at row 22, and I've created a table at row 46
> to show the % reduction in e2fsck runtime.  I tried (mostly)
> power-of-two buffer sizes from 1MB to 1GB; as you can see, even a
> small amount of readahead speeds things up quite a lot, though the
> returns diminish as the buffer sizes grow exponentially larger.  USB
> disks suffer across the board, probably due to their slow single-issue
> nature.  Hopefully UAS will eliminate that gap, though currently it
> just crashes my machines.
>
> Note that all of these filesystems are formatted ext4 with a per-group
> inode table size of 2MB, which is probably why readahead=2MB seems to
> win most often.  I think 2MB is a small enough amount that we needn't
> worry about thrashing memory in the case of parallel e2fsck,
> particularly because with a small readahead amount, e2fsck is most
> likely going to demand the blocks fairly soon anyway.  The design of
> the new pass 1 RA code won't issue RA for a fraction of a block
> group's inode table blocks, so I propose setting RA to blocksize *
> inode_blocks_per_group.

I forgot to mention that I'll disable RA if the buffer size is greater
than 1/100th of RAM.

--D

> On a lark I fired up an old ext3 filesystem to see what would happen,
> and the results generally follow the ext4 results.  I haven't done
> much digging into ext3, though.  Potentially, one could prefetch the
> block map blocks when reading in another inode_buffer_block's worth of
> inode tables.
>
> Will send patches soon.
>
> --D
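
For concreteness, here is a minimal sketch of the pass 1 scheme
described above.  It is not the actual patch: the toy_geom fields and
the itable_start[] array are hypothetical stand-ins for what libext2fs
provides, it assumes $readahead_kb covers at least one whole group's
inode table, and it approximates the inode-buffer-block bookkeeping
with plain inode-table blocks.

    #define _XOPEN_SOURCE 600
    #include <fcntl.h>
    #include <stdint.h>

    struct toy_geom {
            uint32_t blocksize;       /* fs block size in bytes */
            uint32_t inode_size;      /* on-disk inode size in bytes */
            uint32_t inodes_per_group;
            uint32_t itable_blocks;   /* inode table blocks per group */
            uint32_t group_count;
    };

    /*
     * Read ahead whole-group inode tables starting at *group until
     * readahead_kb worth has been requested; advance *group past the
     * groups covered.
     */
    static uint64_t pass1_readahead(int fd, const struct toy_geom *g,
                                    const uint64_t *itable_start,
                                    uint32_t *group,
                                    uint64_t readahead_kb)
    {
            uint64_t budget = readahead_kb * 1024;
            uint64_t extent = (uint64_t)g->itable_blocks * g->blocksize;
            uint32_t grp = *group;

            /* Never read ahead a fraction of a group's inode table. */
            while (grp < g->group_count && budget >= extent) {
                    posix_fadvise(fd,
                                  (off_t)(itable_start[grp] * g->blocksize),
                                  (off_t)extent, POSIX_FADV_WILLNEED);
                    budget -= extent;
                    grp++;
            }
            *group = grp;

            /*
             * First inode of the last inode-table block of the last
             * group we touched; the checker reaching this inode
             * triggers the next call, so prefetch stays ahead of the
             * scan instead of competing with it for the disk.
             */
            uint32_t per_block = g->blocksize / g->inode_size;
            return (uint64_t)(grp - 1) * g->inodes_per_group +
                   (uint64_t)(g->itable_blocks - 1) * per_block + 1;
    }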
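
Similarly, a sketch of the chunked pass 2 readahead; the blocks[]
array is an invented stand-in for the sorted directory block list that
pass 2 actually iterates:

    #define _XOPEN_SOURCE 600
    #include <fcntl.h>
    #include <stdint.h>
    #include <stddef.h>

    /*
     * Issue WILLNEED for the next readahead_kb worth of directory
     * blocks starting at blocks[pos]; return the resume index for the
     * following chunk, so no chunk is requested long before pass 2
     * can get to it.
     */
    static size_t pass2_readahead_chunk(int fd, uint32_t blocksize,
                                        const uint64_t *blocks,
                                        size_t nr_blocks, size_t pos,
                                        uint64_t readahead_kb)
    {
            uint64_t budget = readahead_kb * 1024;

            while (pos < nr_blocks && budget >= blocksize) {
                    posix_fadvise(fd, (off_t)(blocks[pos] * blocksize),
                                  (off_t)blocksize, POSIX_FADV_WILLNEED);
                    budget -= blocksize;
                    pos++;
            }
            return pos;
    }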
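
And the sizing policy (default RA of blocksize * inode_blocks_per_group,
disabled above 1/100th of RAM) might look like the following; the helper
name is made up, and _SC_PHYS_PAGES is a glibc extension:

    #include <unistd.h>
    #include <stdint.h>

    /*
     * Default the buffer to one group's inode table; disable
     * readahead outright when that exceeds 1/100th of physical RAM.
     */
    static uint64_t default_readahead_bytes(uint32_t blocksize,
                                            uint32_t inode_blocks_per_group)
    {
            uint64_t ra = (uint64_t)blocksize * inode_blocks_per_group;
            uint64_t ram = (uint64_t)sysconf(_SC_PHYS_PAGES) *
                           (uint64_t)sysconf(_SC_PAGESIZE);

            /*
             * e.g. 4096-byte blocks * 512 inode table blocks per
             * group = 2MiB, matching the 2MB per-group inode tables
             * (and the winning readahead size) discussed above.
             */
            return (ra > ram / 100) ? 0 : ra;
    }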