On Mon, Mar 01, 2010 at 01:34:35PM +0100, Michal Schmidt wrote:
> mpage_da_submit_io() may process tens of thousands of pages at a time.
> Unless full preemption is enabled, it causes scheduling latencies in the order
> of tens of milliseconds.
>
> It can be reproduced simply by writing a big file on ext4 repeatedly with
> dd if=/dev/zero of=/tmp/dummy bs=10M count=50
>
> The patch fixes it by allowing to reschedule in the loop.
>
> cyclictest can be used to measure the latency. I tested with:
> $ cyclictest -t1 -p 80 -n -i 5000 -m -l 20000
>
> The results from an UP AMD Turion 2GHz with voluntary preemption:
>
> Without the patch:
> T: 0 ( 2535) P:80 I:5000 C: 20000 Min: 12 Act: 23 Avg: 3166 Max: 70524
> (i.e. Average latency was more than 3 ms. Max observed latency was 71 ms.)
>
> With the patch:
> T: 0 ( 2588) P:80 I:5000 C: 20000 Min: 13 Act: 33 Avg: 49 Max: 11009
> (i.e. Average latency was only 49 us. Max observed latency was 11 ms.)

Have you tested for any performance regressions as a result of this
patch, using some file system benchmarks?

I don't think this is the best way to fix this problem, though. The
real right answer is to change how the code is structured. All of the
call sites that call mpage_da_submit_io() are immediately preceded by
mpage_da_map_blocks(). These two functions should be combined, and
instead of calling ext4_writepage() for each page,
mpage_da_map_and_write_blocks() should make a single call to
submit_bio() for each extent.

That should be far more CPU efficient, solving your scheduling latency
issue as well as helping benchmarks that strive to stress both the
disk and CPU simultaneously (such as, for example, the TPC benchmarks).
This will also make our blktrace results much more compact, and Chris
Mason will be very happy about that!
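To make the per-extent idea concrete, here is a rough userspace sketch, not kernel code and not the proposed patch. It only illustrates why one submission per extent means far fewer calls than one per page. The names struct extent and coalesce_extents() are hypothetical, and it assumes one block per page and that the per-page block numbers are already known.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one physically contiguous run of blocks. */
struct extent {
	unsigned long start;	/* first physical block of the run */
	size_t len;		/* number of pages (blocks) in the run */
};

/*
 * Coalesce per-page physical block numbers into extents.
 *
 * blocks[i] is the physical block backing page i (assumption: one
 * block per page). Up to max_out extents are stored in out[].
 * Returns the number of extents stored, which is the number of
 * submissions an extent-at-a-time writer would issue, versus
 * nr_pages submissions for a page-at-a-time writer.
 */
size_t coalesce_extents(const unsigned long *blocks, size_t nr_pages,
			struct extent *out, size_t max_out)
{
	size_t n = 0;

	assert(nr_pages == 0 || blocks != NULL);

	for (size_t i = 0; i < nr_pages; i++) {
		/* Page continues the current run: grow it, no new submission. */
		if (n > 0 && blocks[i] == out[n - 1].start + out[n - 1].len) {
			out[n - 1].len++;
			continue;
		}
		/* Output array is full: stop rather than overflow. */
		if (n == max_out)
			break;
		/* Discontiguous block: start a new extent. */
		out[n].start = blocks[i];
		out[n].len = 1;
		n++;
	}
	return n;
}
```

For pages backed by blocks 100, 101, 102, 103, 500, 501, 900, this yields three extents (lengths 4, 2 and 1), so three submissions instead of seven. In the real code each extent would correspond to one bio; how large a bio can get is a separate limit that this sketch ignores.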
						- Ted