On Mon, Jan 29 2024 at 5:12P -0500,
Dave Chinner <david@xxxxxxxxxxxxx> wrote:

> On Mon, Jan 29, 2024 at 12:19:02PM -0500, Mike Snitzer wrote:
> > While I'm sure this legacy application would love to not have to
> > change its code at all, I think we can all agree that we need to just
> > focus on how best to advise applications that have mixed workloads
> > accomplish efficient mmap+read of both sequential and random.
> >
> > To that end, I heard Dave clearly suggest 2 things:
> >
> > 1) update MADV/FADV_SEQUENTIAL to set file->f_ra.ra_pages to
> >    bdi->io_pages, not bdi->ra_pages * 2
> >
> > 2) Have the application first issue MADV_SEQUENTIAL to convey that
> >    the following MADV_WILLNEED is for sequential file load (so it is
> >    desirable to use larger ra_pages)
> >
> > This overrides the default of bdi->ra_pages and _should_ provide the
> > required per-file duality of control for readahead, correct?
>
> I just discovered MADV_POPULATE_READ - see my reply to Ming
> up-thread about that. The application should use that instead of
> MADV_WILLNEED because it gives cache population guarantees that
> WILLNEED doesn't. Then we can look at optimising the performance of
> MADV_POPULATE_READ (if needed) as there is constrained scope we can
> optimise within in ways that we cannot do with WILLNEED.

Nice find!

Given commit 4ca9b3859dac ("mm/madvise: introduce
MADV_POPULATE_(READ|WRITE) to prefault page tables"), I've cc'd David
Hildenbrand just so he's in the loop.

FYI, I proactively raised feedback and questions to the reporter of
this issue:

CONTEXT: madvise(WILLNEED) doesn't convey the nature of the access,
sequential vs random, just the range that may be accessed.

Q1: Is your application's sequential vs random (or smaller sequential)
access split on a per-file basis? Or is the same file accessed both
sequentially and randomly?

A1: The same files can be accessed either randomly or sequentially,
depending on certain access patterns and optimizing logic.

Q2: Can the application be changed to use madvise() MADV_SEQUENTIAL
and MADV_RANDOM to indicate its access pattern?

A2: No, the application is a Java application. Java does not expose
the madvise API directly. Our application uses the Java NIO API via
MappedByteBuffer.load()
(https://docs.oracle.com/javase/8/docs/api/java/nio/MappedByteBuffer.html#load--)
that calls MADV_WILLNEED at the low level. There is no way for us to
switch this behavior, but we take advantage of this behavior to
optimize large file sequential I/O with great success.

So it's looking like it'll be hard to help this reporter avoid
changes... but that's not upstream's problem!

Mike
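
For reference, the sequence discussed above (MADV_SEQUENTIAL first,
then MADV_POPULATE_READ in place of MADV_WILLNEED) could look roughly
like this from a C application. This is only a sketch under stated
assumptions: map_and_populate() is a hypothetical helper name, the
fallback to MADV_WILLNEED on EINVAL (kernels before 5.14) is an
assumption about what an application would want, and whether the
sequential hint actually changes the readahead size used while
populating depends on the kernel behaviour still being discussed in
this thread.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Older libc headers may lack the constant; 22 is the Linux UAPI value. */
#ifndef MADV_POPULATE_READ
#define MADV_POPULATE_READ 22
#endif

/*
 * Hypothetical helper: map `path` read-only, hint sequential access,
 * then prefault the whole mapping.
 * Returns the mapping (caller munmap()s *len_out bytes) or NULL on error.
 */
static void *map_and_populate(const char *path, size_t *len_out)
{
    assert(path != NULL && len_out != NULL);

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return NULL;

    struct stat st;
    if (fstat(fd, &st) < 0 || st.st_size <= 0) {
        close(fd);
        return NULL;
    }
    size_t len = (size_t)st.st_size;

    void *addr = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
    close(fd); /* the mapping keeps its own reference to the file */
    if (addr == MAP_FAILED)
        return NULL;

    /* Advisory only, so a failure here is not treated as fatal. */
    (void)madvise(addr, len, MADV_SEQUENTIAL);

    /* Prefault the range; EINVAL means the kernel lacks POPULATE_READ. */
    if (madvise(addr, len, MADV_POPULATE_READ) != 0) {
        /* Assumed fallback: best-effort WILLNEED, without the guarantee. */
        if (errno != EINVAL || madvise(addr, len, MADV_WILLNEED) != 0) {
            munmap(addr, len);
            return NULL;
        }
    }

    *len_out = len;
    return addr;
}
```

Unlike WILLNEED, a failing MADV_POPULATE_READ reports the error to the
caller, which is why the sketch checks its return value but ignores
that of the MADV_SEQUENTIAL hint.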