On Wed, Jan 25, 2012 at 04:40:23PM +0000, Steven Whitehouse wrote:
> Hi,
>
> On Wed, 2012-01-25 at 11:22 -0500, Loke, Chetan wrote:
> > > If the reason for not setting a larger readahead value is just that it
> > > might increase memory pressure and thus decrease performance, is it
> > > possible to use a suitable metric from the VM in order to set the value
> > > automatically according to circumstances?
> > >
> >
> > How about tracking heuristics for 'read-hits from previous read-aheads'?
> > If the hits are in an acceptable range (user-configurable knob?) then
> > keep seeking, else back off a little on the read-ahead?
> >
> > > Steve.
> >
> > Chetan Loke
>
> I'd been wondering about something similar to that. The basic scheme
> would be:
>
>  - Set a page flag when readahead is performed
>  - Clear the flag when the page is read (or on page fault for mmap),
>    i.e. when it is first used after readahead
>
> Then when the VM scans for pages to eject from cache, check the flag and
> keep an exponential average (probably on a per-cpu basis) of the rate at
> which such flagged pages are ejected. That number can then be used to
> reduce the max readahead value.
>
> The questions are whether this would provide a fast enough reduction in
> readahead size to avoid problems, and whether the extra complication is
> worth it compared with using an overall metric for memory pressure.
>
> There may well be better solutions though,

The caveat is that on a consistently thrashed machine, the readahead
size is better determined per read stream.

Repeated readahead thrashing typically happens on a file server with a
large number of concurrent clients. For example, if there are 1000 read
streams each doing 1MB readahead, then since each stream has two
readahead windows, there can be up to 2GB of readahead pages, which are
sure to be thrashed on a server with only 1GB of memory.

Typically those 1000 clients will have different read speeds. A few of
them may be doing 1MB/s, while most others do 100KB/s. In that case we
should only decrease the readahead size for the 100KB/s clients. The
1MB/s clients won't actually see readahead thrashing at all, and we
want them to keep doing large 1MB I/O to achieve good disk utilization.

So we need something better than a "global feedback" scheme, and we do
have such a solution ;) As said in my other email, the number of
history pages that remain in the page cache is a good estimate of that
particular read stream's thrashing-safe readahead size.

Thanks,
Fengguang
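
[Editor's note: a minimal user-space sketch of the per-stream idea above,
for illustration only. The premise is that the number of a stream's
history pages still present in the page cache bounds how much that stream
can read ahead without being thrashed. The function name
thrashing_safe_ra(), the halving policy and the MIN/MAX constants are all
assumptions made up for this sketch; this is not the actual kernel code.]

#include <stdio.h>

#define MAX_RA_PAGES	256	/* 1MB readahead with 4KB pages */
#define MIN_RA_PAGES	4

/*
 * history_pages: how many of this stream's recently read pages are
 * still cached, i.e. survived reclaim since the previous readahead.
 * A heavily thrashed stream will find few of them left; an untouched
 * stream will find all of them.
 */
static unsigned long thrashing_safe_ra(unsigned long history_pages)
{
	unsigned long ra = history_pages / 2;	/* leave some headroom */

	if (ra > MAX_RA_PAGES)
		ra = MAX_RA_PAGES;
	if (ra < MIN_RA_PAGES)
		ra = MIN_RA_PAGES;
	return ra;
}

int main(void)
{
	/* A fast 1MB/s client keeps all 256 history pages cached... */
	printf("fast stream: ra = %lu pages\n", thrashing_safe_ra(256));
	/* ...while a thrashed 100KB/s client has only ~16 of them left. */
	printf("slow stream: ra = %lu pages\n", thrashing_safe_ra(16));
	return 0;
}

With this kind of per-stream estimate, the fast clients keep their large
readahead while only the slow, thrashed streams are scaled back, which is
exactly the asymmetry a global memory-pressure feedback cannot express.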