On Fri, 16 Dec 2022 11:21:48 -0800 Nhat Pham <nphamcs@xxxxxxxxx> wrote:

> Implement a new syscall that queries cache state of a file and
> summarizes the number of cached pages, number of dirty pages, number of
> pages marked for writeback, number of (recently) evicted pages, etc. in
> a given range.
>
> NAME
>     cachestat - query the page cache status of a file.
>
> SYNOPSIS
>     #include <sys/mman.h>
>
>     struct cachestat {
>         __u64 nr_cache;
>         __u64 nr_dirty;
>         __u64 nr_writeback;
>         __u64 nr_evicted;
>         __u64 nr_recently_evicted;
>     };
>
>     int cachestat(unsigned int fd, off_t off, size_t len,
>                   size_t cstat_size, struct cachestat *cstat,
>                   unsigned int flags);
>
> DESCRIPTION
>     cachestat() queries the number of cached pages, number of dirty
>     pages, number of pages marked for writeback, number of (recently)
>     evicted pages, in the bytes range given by `off` and `len`.

I suggest this be spelled out better: "number of evicted and number of
recently evicted pages".

I suggest this clearly tell readers what an "evicted" page is - they
aren't kernel programmers!

What is the benefit of the "recently evicted" pages?  "recently" seems
very vague - what use is this to anyone?

>     These values are returned in a cachestat struct, whose address is
>     given by the `cstat` argument.
>
>     The `off` and `len` arguments must be non-negative integers. If
>     `len` > 0, the queried range is [`off`, `off` + `len`]. If `len` ==
>     0, we will query in the range from `off` to the end of the file.
>
>     `cstat_size` allows users to obtain partial results. The syscall
>     will copy the first `csstat_size` bytes to the specified userspace
>     memory. `cstat_size` must be a non-negative value that is no larger
>     than the current size of the cachestat struct.
>
>     The `flags` argument is unused for now, but is included for future
>     extensibility. User should pass 0 (i.e no flag specified).

Why is `flags` here?  We could add an unused flags arg to any syscall,
but we don't.  What's the plan?

Are there security implications?  If I know that some process has a
file open, I can use cachestat() to infer which parts of that file
they're looking at (like mincore(), I guess).  And I can infer which
parts they're writing to, unlike mincore().

I suggest the [patch 1/4] fixup be separated from this series.
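
For reference on the mincore() comparison above, here is a rough sketch
of how an unprivileged caller can already count resident pages of a file
it can open.  The helper name count_resident_pages() is made up for
illustration and is not part of the patch.  It only shows residency;
mincore() has no equivalent of the dirty/writeback counts.  Note that
recent kernels, as far as I recall, limit what mincore() reports for
files the caller does not own and cannot write to.

```c
#define _DEFAULT_SOURCE         /* for mincore() on glibc */
#include <assert.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: count how many pages in the first `len` bytes of
 * the file behind `fd` are resident in memory, according to mincore().
 * Returns the count, or -1 on error (including len == 0).
 */
long count_resident_pages(int fd, size_t len)
{
	long page_size = sysconf(_SC_PAGESIZE);
	size_t npages, i;
	unsigned char *vec;
	void *map;
	long resident = -1;

	assert(page_size > 0);
	if (len == 0)
		return -1;

	/* mincore() fills one byte per page, rounding the length up. */
	npages = (len + (size_t)page_size - 1) / (size_t)page_size;

	map = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
	if (map == MAP_FAILED)
		return -1;

	vec = malloc(npages);
	if (vec && mincore(map, len, vec) == 0) {
		resident = 0;
		/* Only the least significant bit of each byte is defined. */
		for (i = 0; i < npages; i++)
			resident += vec[i] & 1;
	}

	free(vec);
	munmap(map, len);
	return resident;
}
```

Calling this repeatedly on a file some other process has open shows how
the resident set changes over time, which is the kind of inference I'm
asking about.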