On Tue 24-01-12 15:13:40, Jeff Moyer wrote:
> Jan Kara <jack@xxxxxxx> writes:
>
> > On Tue 24-01-12 14:14:14, Jeff Moyer wrote:
> >> Chris Mason <chris.mason@xxxxxxxxxx> writes:
> >>
> >> >> All three filesystems use the generic mpages code for reads, so they
> >> >> all get the same (bad) I/O patterns.  Looks like we need to fix this up
> >> >> ASAP.
> >> >
> >> > Can you easily run btrfs through the same rig?  We don't use mpages and
> >> > I'm curious.
> >>
> >> The readahead code was to blame, here.  I wonder if we can change the
> >> logic there to not break larger I/Os down into smaller sized ones.
> >> Fengguang, doing a dd if=file of=/dev/null bs=1M results in 128K I/Os,
> >> when 128KB is the read_ahead_kb value.  Is there any heuristic you could
> >> apply to not break larger I/Os up like this?  Does that make sense?
> >   Well, not breaking up I/Os would be fairly simple as ondemand_readahead()
> > already knows how much we want to read. We just trim the submitted I/O to
> > read_ahead_kb artificially. And that is done so that you don't thrash the
> > page cache (possibly evicting pages you have not yet copied to userspace)
> > when there are several processes doing large reads.
>
> Do you really think applications issue large reads and then don't use
> the data?  I mean, I've seen some bad programming, so I can believe that
> would be the case.  Still, I'd like to think it doesn't happen.  ;-)
  No, I meant a cache thrashing problem. Suppose that we always read ahead
as much as the user asks for and there are, say, 100 processes each wanting
to read 4 MB. Then you need to find 400 MB in the page cache so that all the
reads can fit. And if you don't have them, reads for process 50 may evict
pages we already preread for process 1 before process 1 got onto the CPU to
copy the data to its userspace buffer. So that read is wasted.

> > Maybe 128 KB is too small a default these days but OTOH no one prevents
> > you from raising it (e.g. SLES uses 1 MB as a default).
>
> For some reason, I thought it had been bumped to 512KB by default.  Must
> be that overactive imagination I have...  Anyway, if all of the distros
> start bumping the default, don't you think it's time to consider bumping
> it upstream, too?  I thought there was a lot of work put into not being
> too aggressive on readahead, so the downside of having a larger
> read_ahead_kb setting was fairly small.
  Yeah, I believe 512KB should be pretty safe these days except for the
embedded world. OTOH the average desktop user doesn't really care, so it's
mostly servers with beefy storage that care... (Note that I wrote that we
raised read_ahead_kb for SLES, but not for openSUSE or SLED, the enterprise
desktop distro.)

								Honza
-- 
Jan Kara <jack@xxxxxxx>
SUSE Labs, CR

--
dm-devel mailing list
dm-devel@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/dm-devel
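
For reference, the numbers in this thread can be checked with a small back-of-the-envelope sketch. It is not kernel code: the helper names are made up for illustration, and it assumes each read is simply split into chunks capped at read_ahead_kb, which simplifies what ondemand_readahead() really does.

```python
# Back-of-the-envelope arithmetic for the figures quoted in the thread.
# Hypothetical helpers, not part of any kernel or library API.
KB = 1024
MB = 1024 * KB


def ios_per_read(read_size, read_ahead_bytes):
    """I/Os issued for one read if each I/O is capped at the readahead window."""
    return -(-read_size // read_ahead_bytes)  # ceiling division


def cache_needed(num_readers, read_size):
    """Page cache needed to hold every reader's full request at the same time."""
    return num_readers * read_size


if __name__ == "__main__":
    # dd bs=1M with read_ahead_kb=128, then with the 512KB and 1 MB values
    for ra_kb in (128, 512, 1024):
        print(ra_kb, "KB window:", ios_per_read(1 * MB, ra_kb * KB), "I/Os per 1 MB read")
    # 100 processes each reading 4 MB with no trimming at all
    print(cache_needed(100, 4 * MB) // MB, "MB of page cache")
```

With the 128KB default a 1 MB read is split into 8 I/Os; with no trimming, the 100 readers in the example above need 400 MB of page cache at once, which is the thrashing case.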