On Mon, Nov 21, 2022 at 09:45:49AM -0500, Brian Foster wrote:
> On Tue, Nov 15, 2022 at 10:29:00AM -0800, Nhat Pham wrote:
> > Implement a new syscall that queries cache state of a file and
> > summarizes the number of cached pages, number of dirty pages, number of
> > pages marked for writeback, number of (recently) evicted pages, etc. in
> > a given range.
> >
> > NAME
> > cachestat - query the page cache status of a file.
> >
> > SYNOPSIS
> > #include <sys/mman.h>
> >
> > struct cachestat {
> >         unsigned long nr_cache;
> >         unsigned long nr_dirty;
> >         unsigned long nr_writeback;
> >         unsigned long nr_evicted;
> >         unsigned long nr_recently_evicted;
> > };
> >
> > int cachestat(unsigned int fd, off_t off, size_t len,
> >               struct cachestat *cstat);
> >
>
> Do you have a strong use case for a user specified range vs. just
> checking the entire file? If not, have you considered whether it might
> be worth expanding statx() to include this data? That call is already
> designed to include "extended" file status and avoids the need for a new
> syscall. For example, the fields could be added individually with
> multiple flags, or the entire struct tied to a new STATX_CACHE flag or
> some such.

Whole-file stats are only useful when the data is organized as a tree
of files and directories. They don't work for structured files, for
example when a database wants to understand (and subsequently advise or
influence) readahead and dirty flushing in specific sections of a
larger data file. fadvise(), madvise(), sync_file_range() etc. already
let the user influence cache behavior in sub-ranges, so it makes sense
to also allow querying at that granularity.

> > DESCRIPTION
> > cachestat() queries the number of cached pages, number of dirty
> > pages, number of pages marked for writeback, number of (recently)
> > evicted pages, in the bytes range given by `off` and `len`.
> >
> > These values are returned in a cachestat struct, whose address is
> > given by the `cstat` argument.
> >
> > The `off` argument must be a non-negative integer. If `off` + `len`
> > >= `off`, the queried range is [`off`, `off` + `len`]. Otherwise, we
> > will query in the range from `off` to the end of the file.
> >
>
> (off + len < off) is an error condition on some (most?) other syscalls.
> At least some calls (e.g. fadvise(), sync_file_range()) use len == 0 to
> explicitly specify "to EOF."

Good point; it would make sense to stick to that precedent.
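To make the len == 0 convention concrete, here is a rough sketch of how the (off, len) pair could be resolved into a byte range. resolve_range() and its file_size parameter are hypothetical, not code from the patch. Rejecting an off + len overflow with -EINVAL is an assumed policy, chosen to match the note above that overflow is an error for most syscalls.

```c
#include <errno.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of the patch or any libc API): resolve a
 * user-supplied (off, len) pair into a half-open byte range [start, end),
 * clamped to a caller-supplied, non-negative file_size.
 *   - off must be non-negative, otherwise -EINVAL
 *   - len == 0 means "from off to EOF", as with fadvise()/sync_file_range()
 *   - an off + len overflow is rejected with -EINVAL (assumed policy)
 */
static int resolve_range(int64_t off, uint64_t len, int64_t file_size,
                         int64_t *start, int64_t *end)
{
	uint64_t last;

	if (off < 0)
		return -EINVAL;

	if (len == 0) {
		/* "to EOF": use the largest representable offset */
		last = (uint64_t)INT64_MAX;
	} else {
		/* off + len must not exceed INT64_MAX */
		if (len > (uint64_t)INT64_MAX - (uint64_t)off)
			return -EINVAL;
		last = (uint64_t)off + len;
	}

	*start = off;
	/* Clamp the end of the range to the current file size */
	*end = last < (uint64_t)file_size ? (int64_t)last : file_size;
	/* A range that starts at or past EOF is simply empty */
	if (*end < *start)
		*end = *start;
	return 0;
}
```

For a 1 MiB file, resolve_range(4096, 8192, ...) yields [4096, 12288), while resolve_range(4096, 0, ...) covers everything from 4096 to EOF. Clamping to the file size keeps a "to EOF" query from walking page cache indices that cannot exist.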