Re: [PATCH v13 2/3] cachestat: implement cachestat syscall

On Thu, May 4, 2023 at 10:26 AM Geert Uytterhoeven <geert@xxxxxxxxxxxxxx> wrote:
>
> Hi Nhat,
>
> On Wed, May 3, 2023 at 3:38 AM Nhat Pham <nphamcs@xxxxxxxxx> wrote:
> > There is currently no good way to query the page cache state of large
> > file sets and directory trees. There is mincore(), but it scales poorly:
> > the kernel writes out a lot of bitmap data that userspace has to
> > aggregate, when the user really does not care about per-page
> > information in that case. The user also needs to mmap and unmap each
> > file as it goes along, which can be quite slow as well.
> >
> > Some use cases where this information could come in handy:
> >   * Allowing a database to decide whether to perform an index scan or
> >     direct table queries based on the in-memory cache state of the
> >     index.
> >   * Visibility into the writeback algorithm, for diagnosing performance
> >     issues.
> >   * Workload-aware writeback pacing: estimating IO fulfilled by page
> >     cache (and IO to be done) within a range of a file, allowing for
> >     more frequent syncing when and where there is IO capacity, and
> >     batching when there is not.
> >   * Computing memory usage of large files/directory trees, analogous to
> >     the du tool for disk usage.
> >
> > More information about these use cases can be found in the following
> > thread:
> >
> > https://lore.kernel.org/lkml/20230315170934.GA97793@xxxxxxxxxxx/
> >
> > This patch implements a new syscall that queries the cache state of a
> > file and summarizes the number of cached pages, number of dirty pages,
> > number of pages marked for writeback, number of (recently) evicted
> > pages, etc. in a given range. Currently, the syscall is only wired in
> > for the x86 architecture.
> >
> > NAME
> >     cachestat - query the page cache statistics of a file.
> >
> > SYNOPSIS
> >     #include <sys/mman.h>
> >
> >     struct cachestat_range {
> >         __u64 off;
> >         __u64 len;
> >     };
> >
> >     struct cachestat {
> >         __u64 nr_cache;
> >         __u64 nr_dirty;
> >         __u64 nr_writeback;
> >         __u64 nr_evicted;
> >         __u64 nr_recently_evicted;
> >     };
> >
> >     int cachestat(unsigned int fd, struct cachestat_range *cstat_range,
> >         struct cachestat *cstat, unsigned int flags);
> >
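No C library wrapper is part of this patch, so a caller has to go through syscall(2). A minimal sketch under stated assumptions: the wrapper name `cachestat_sys` is hypothetical, `uint64_t` stands in for `__u64`, and the fallback number 451 is assumed from the x86 syscall tables this patch touches. Where the headers do not define `__NR_cachestat` (for example, an architecture that is not wired up yet), the sketch fails with ENOSYS instead of guessing a number.

```c
#define _GNU_SOURCE            /* make sure syscall() is declared */
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Mirrors of the structs in the SYNOPSIS (uint64_t used in place of __u64). */
struct cachestat_range {
    uint64_t off;
    uint64_t len;
};

struct cachestat {
    uint64_t nr_cache;
    uint64_t nr_dirty;
    uint64_t nr_writeback;
    uint64_t nr_evicted;
    uint64_t nr_recently_evicted;
};

/* The kernel copies these layouts verbatim, so pin their sizes. */
static_assert(sizeof(struct cachestat_range) == 16, "unexpected cachestat_range size");
static_assert(sizeof(struct cachestat) == 40, "unexpected cachestat size");

/*
 * Assumption: 451 is the number used in the x86 tables; x32 and other
 * architectures may differ, so no fallback is provided for them.
 */
#ifndef __NR_cachestat
#  if (defined(__x86_64__) && !defined(__ILP32__)) || defined(__i386__)
#    define __NR_cachestat 451
#  endif
#endif

/* Hypothetical wrapper: returns 0 on success, -1 with errno set on error. */
static int cachestat_sys(unsigned int fd, struct cachestat_range *range,
                         struct cachestat *cstat, unsigned int flags)
{
#ifdef __NR_cachestat
    return (int)syscall(__NR_cachestat, fd, range, cstat, flags);
#else
    /* These headers do not know the syscall number for this architecture. */
    (void)fd; (void)range; (void)cstat; (void)flags;
    errno = ENOSYS;
    return -1;
#endif
}
```

Checking for ENOSYS at runtime lets a caller fall back to mincore() on kernels or architectures that do not provide the syscall.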
> > DESCRIPTION
> >     cachestat() queries the number of cached pages, number of dirty
> >     pages, number of pages marked for writeback, number of evicted
> >     pages, and number of recently evicted pages in the byte range given
> >     by `off` and `len`.
> >
> >     An evicted page is a page that was previously in the page cache but
> >     has since been evicted. A page is recently evicted if its last
> >     eviction was recent enough that its reentry to the cache would
> >     indicate that it is actively being used by the system, and that
> >     there is memory pressure on the system.
> >
> >     These values are returned in a cachestat struct, whose address is
> >     given by the `cstat` argument.
> >
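The fields of struct cachestat are page counts. For the du-style use case in the changelog, a caller would sum them over many files and convert the result to bytes. A minimal sketch, assuming the counts are in units of the base page size reported by sysconf(_SC_PAGESIZE); the helper names are hypothetical:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

/* Mirror of the struct in the SYNOPSIS (uint64_t used in place of __u64). */
struct cachestat {
    uint64_t nr_cache;
    uint64_t nr_dirty;
    uint64_t nr_writeback;
    uint64_t nr_evicted;
    uint64_t nr_recently_evicted;
};

/* Hypothetical helper: page count to bytes, assuming base-page units. */
static uint64_t pages_to_bytes(uint64_t pages)
{
    long page_size = sysconf(_SC_PAGESIZE);

    assert(page_size > 0);
    return pages * (uint64_t)page_size;
}

/* Hypothetical helper: total cached bytes over several query results. */
static uint64_t total_cached_bytes(const struct cachestat *results, size_t n)
{
    uint64_t pages = 0;

    for (size_t i = 0; i < n; i++)
        pages += results[i].nr_cache;
    return pages_to_bytes(pages);
}
```

Summing pages first and converting once keeps the arithmetic in a single unit until the final report.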
> >     The `off` and `len` arguments must be non-negative integers. If
> >     `len` > 0, the queried byte range is [`off`, `off` + `len`). If
> >     `len` == 0, the range from `off` to the end of the file is queried.
> >
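A minimal sketch of how such a byte range maps onto page indices, assuming 4 KiB pages and that the query covers `len` bytes starting at `off` (so the last byte is `off + len - 1`), with `len == 0` meaning "to the end of the file". The type and function names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 4 KiB pages. Real code should derive this from sysconf(_SC_PAGESIZE). */
#define SKETCH_PAGE_SHIFT 12

/* Hypothetical type: inclusive range of page indices covered by a query. */
struct page_span {
    uint64_t first;
    uint64_t last;   /* UINT64_MAX stands for "to the end of the file" */
};

static struct page_span byte_range_to_pages(uint64_t off, uint64_t len)
{
    struct page_span s;

    /* off + len - 1 must not wrap around. */
    assert(len == 0 || len - 1 <= UINT64_MAX - off);

    s.first = off >> SKETCH_PAGE_SHIFT;
    s.last = (len == 0) ? UINT64_MAX : (off + len - 1) >> SKETCH_PAGE_SHIFT;
    return s;
}
```

With 4 KiB pages, a query with `off` = 4095 and `len` = 2 touches two pages even though it covers only two bytes.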
> >     The `flags` argument is unused for now, but is included for future
> >     extensibility. Callers should pass 0 (i.e., no flags specified).
> >
> >     Currently, hugetlbfs is not supported.
> >
> >     Because the status of a page can change after cachestat() checks it
> >     but before it returns to the application, the returned values may
> >     contain stale information.
> >
> > RETURN VALUE
> >     On success, cachestat returns 0. On error, -1 is returned, and errno
> >     is set to indicate the error.
> >
> > ERRORS
> >     EFAULT cstat or cstat_range points to an invalid address.
> >
> >     EINVAL invalid flags.
> >
> >     EBADF  invalid file descriptor.
> >
> >     EOPNOTSUPP the file descriptor refers to a hugetlbfs file.
> >
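A caller might turn the errno values listed above into diagnostics along these lines. This is a sketch with a hypothetical helper name; ENOSYS is not in the patch's list and is included as an assumption, since it is what syscall(2) reports when the running kernel or architecture does not provide the syscall.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical helper: explain errno values a cachestat() caller may see. */
static const char *cachestat_error_reason(int err)
{
    switch (err) {
    case EFAULT:     return "cstat or cstat_range points to an invalid address";
    case EINVAL:     return "invalid flags";
    case EBADF:      return "invalid file descriptor";
    case EOPNOTSUPP: return "hugetlbfs files are not supported";
    /* Not in the patch's list: syscall missing on this kernel/architecture. */
    case ENOSYS:     return "cachestat is not available here";
    default:         return NULL;
    }
}
```

Returning NULL for anything else lets the caller fall back to strerror(3).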
> > Signed-off-by: Nhat Pham <nphamcs@xxxxxxxxx>
> > ---
> >  arch/x86/entry/syscalls/syscall_32.tbl |   1 +
> >  arch/x86/entry/syscalls/syscall_64.tbl |   1 +
>
> This should be wired up on each and every architecture.
> Currently we're getting
>
>     <stdin>:1567:2: warning: #warning syscall cachestat not implemented [-Wcpp]
>
> in linux-next for all the missing architectures.

Hi Geert,

I saw several instances where a syscall was wired up for these
architectures in separate patches, so I was doing something similar.

For example:

ARM: wire up process_vm_writev and process_vm_readv syscalls
(e5489847d6fc0ff176048b6e1cf5034507bf703a)

MIPS: Hook up process_vm_readv and process_vm_writev system calls.
(8ff8584e51d4d3fbe08ede413c4a221223766323)

As for these non-x86 wiring patches, I can give it a shot and
cross-compile to check that they build, but my ability to do runtime
testing is limited since I don't have machines with these
architectures. I would really appreciate it if arch folks could help
wire it up.

(cc-ing linux-arch as well)


>
> Gr{oetje,eeting}s,
>
>                         Geert
>
> --
> Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@xxxxxxxxxxxxxx
>
> In personal conversations with technical people, I call myself a hacker. But
> when I'm talking to journalists I just say "programmer" or something like that.
>                                 -- Linus Torvalds
