On Thu, Apr 8, 2021 at 8:57 PM riteshh <riteshh@xxxxxxxxxxxxx> wrote:
>
> On 21/04/07 09:18PM, Matthew Wilcox (Oracle) wrote:
> > As requested, fix up readahead_expand() so as to not confuse the ondemand
> > algorithm. Also make the documentation slightly better. Dave, could you
> > put in some debug and check this actually works? I don't generally test
> > with any filesystems that use readahead_expand(), but printing (index,
> > nr_to_read, lookahead_size) in page_cache_ra_unbounded() would let a human
> > (such as your good self) determine whether it's working approximately
> > as designed.
>
> Hello,
>
> Sorry about the silly question here, since I don't know the readahead
> algorithm code path in much detail.
>
> 1. Do we know of a way to measure the efficiency of readahead in Linux?
> 2. And is there any way to validate that readahead is working correctly
>    and as intended in Linux?

I created a bpftrace tool for measuring readahead efficiency for my LSFMM
2019 keynote; it showed the age of readahead pages at the time they were
finally used:

https://www.slideshare.net/brendangregg/lsfmm-2019-bpf-observability-143092820/29

If the pages were mostly young, one might conclude that readahead is not
only working, but could be tuned higher. If they were mostly old, one
might conclude that readahead was tuned too high and was reading too many
pages (pages that were used later on, unrelated to the original workload).

I think my tool is just the start. What else should we measure for
understanding readahead efficiency? Code it today as a bpftrace tool (and
share it)! A sketch of my tool's core is further down in this mail.

> Like, is there anything already designed to measure the above two things?
> If not, are there any stats which can be collected and later parsed to
> show how efficiently readahead is working in different use cases, and
> also to verify that it is working correctly?
>
> I guess we can already do point 1 from below. What about points 2 & 3?
>
> 1. Turn readahead on/off and measure file read timings for different
>    patterns. - I guess this is already doable.
>
> 2. Collect a runtime histogram showing how the readahead window
>    increases/decreases based on changing read patterns, and how many
>    IOs it takes to increase/decrease the readahead size.
>    Are there any tracepoints that need to be enabled for this?
>
> 3. I guess this won't be possible w/o a way to also measure page cache
>    efficiency, e.g. whether, under memory pressure, a page that was read
>    via readahead is evicted only to be re-read again.
>    So a way to measure page cache efficiency will also be required.
>
> Any ideas from others on this?
>
> I do see the page below [1] by Brendan showing some ways to measure page
> cache efficiency using cachestat. But there are also some problems
> mentioned in the conclusion section, and I am not sure what the latest
> state of those is. Also, it doesn't discuss measuring readahead
> efficiency much.
>
> [1]: http://www.brendangregg.com/blog/2014-12-31/linux-page-cache-hit-ratio.html

Coincidentally, during the same LSFMMBPF keynote I showed cachestat and
described it as a "sandcastle," as kernel changes easily wash it away. The
MM folk discussed the various issues in measuring this accurately: while
cachestat worked for my workloads, I think there's a lot more work to do
to make it a robust tool for all workloads. I still think it should be
/proc metrics instead, as I commonly want a page cache hit ratio metric
(whereas many of my other tracing tools are more niche, and can stay as
tracing tools).
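As for that sketch: the core of my readahead tool is small enough to show
here. Treat it as a sketch only, with the same caveat as the cachestat
script below: the kernel function names are from the kernel version I
wrote it against, and some have since been renamed (e.g.,
__do_page_cache_readahead became do_page_cache_ra), so it will likely
need adjusting before it runs on your kernel.

#!/usr/bin/env bpftrace
/*
 * Sketch: histogram the time from a page being read in by readahead to
 * its first use, and count readahead pages that were never used. The
 * traced function names are kernel-version-dependent (see above).
 */

/* flag threads while they are inside the readahead path */
kprobe:__do_page_cache_readahead { @in_readahead[tid] = 1; }
kretprobe:__do_page_cache_readahead { delete(@in_readahead[tid]); }

/* a page allocated during readahead: record its birth timestamp */
kretprobe:__page_cache_alloc
/@in_readahead[tid]/
{
	@birth[retval] = nsecs;
	@rapages++;
}

/* first access to one of those pages: record its age */
kprobe:mark_page_accessed
/@birth[arg0]/
{
	@age_ms = hist((nsecs - @birth[arg0]) / 1000000);
	delete(@birth[arg0]);
	@rapages--;
}

END
{
	printf("\nReadahead unused pages: %d\n", @rapages);
	printf("\nReadahead used page age (ms):\n");
	print(@age_ms); clear(@age_ms);
	clear(@birth); clear(@in_readahead); clear(@rapages);
}

Weight in the low buckets of the histogram is the "young age" case I
described above; weight in the high buckets, or a large unused-page count
at exit, suggests readahead is tuned too high.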
I don't think there's a video of the talk, but there was a writeup:

https://lwn.net/Articles/787131/

People keep porting my cachestat tool and building other things upon it,
but aren't updating the code, which is getting a bit annoying. You're all
assuming I solved it. But in my original Ftrace cachestat code I thought
I made it clear that it was a proof of concept for 3.13!:

#!/bin/bash
#
# cachestat - show Linux page cache hit/miss statistics.
#             Uses Linux ftrace.
#
# This is a proof of concept using Linux ftrace capabilities on older
# kernels, and works by using function profiling for in-kernel counters.
# Specifically, four kernel functions are traced:
#
#	mark_page_accessed() for measuring cache accesses
#	mark_buffer_dirty() for measuring cache writes
#	add_to_page_cache_lru() for measuring page additions
#	account_page_dirtied() for measuring page dirties
#
# It is possible that these functions have been renamed (or are different
# logically) for your kernel version, and this script will not work as-is.
# This script was written on Linux 3.13. This script is a sandcastle: the
# kernel may wash some away, and you'll need to rebuild.

[...]

Brendan

-- 
Brendan Gregg, Senior Performance Architect, Netflix