Re: [PATCH v5 1/2] fsx: support reads/writes from buffers backed by hugepages

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, Feb 3, 2025 at 8:21 PM Darrick J. Wong <djwong@xxxxxxxxxx> wrote:
>
> On Mon, Feb 03, 2025 at 01:54:32PM -0800, Darrick J. Wong wrote:
> > On Mon, Feb 03, 2025 at 01:40:39PM -0800, Joanne Koong wrote:
> > > On Mon, Feb 3, 2025 at 12:01 PM Darrick J. Wong <djwong@xxxxxxxxxx> wrote:
> > > >
> > > > On Mon, Feb 03, 2025 at 11:57:23AM -0800, Joanne Koong wrote:
> > > > > On Mon, Feb 3, 2025 at 11:41 AM Darrick J. Wong <djwong@xxxxxxxxxx> wrote:
> > > > > >
> > > > > > On Mon, Feb 03, 2025 at 11:23:20AM -0800, Joanne Koong wrote:
> > > > > > > On Mon, Feb 3, 2025 at 10:59 AM Darrick J. Wong <djwong@xxxxxxxxxx> wrote:
> > > > > > > >
> > > > > > > > On Mon, Feb 03, 2025 at 10:04:04AM -0800, Joanne Koong wrote:
> > > > > > > > > On Sun, Feb 2, 2025 at 6:25 AM Zorro Lang <zlang@xxxxxxxxxx> wrote:
> > > > > > > > > >
> > > > > > > > > > On Tue, Jan 21, 2025 at 01:56:40PM -0800, Joanne Koong wrote:
> > > > > > > > > > > Add support for reads/writes from buffers backed by hugepages.
> > > > > > > > > > > This can be enabled through the '-h' flag. This flag should only be used
> > > > > > > > > > > on systems where THP capabilities are enabled.
> > > > > > > > > > >
> > > > > > > > > > > This is motivated by a recent bug that was due to faulty handling of
> > > > > > > > > > > userspace buffers backed by hugepages. This patch is a mitigation
> > > > > > > > > > > against problems like this in the future.
> > > > > > > > > > >
> > > > > > > > > > > Signed-off-by: Joanne Koong <joannelkoong@xxxxxxxxx>
> > > > > > > > > > > Reviewed-by: Brian Foster <bfoster@xxxxxxxxxx>
> > > > > > > > > > > Reviewed-by: Darrick J. Wong <djwong@xxxxxxxxxx>
> > > > > > > > > > > ---
> > > > > > > > > >
> > > > > > > > > > Those two test cases fail on old system which doesn't support
> > > > > > > > > > MADV_COLLAPSE:
> > > > > > > > > >
> > > > > > > > > >    fsx -N 10000 -l 500000 -h
> > > > > > > > > >   -fsx -N 10000 -o 8192 -l 500000 -h
> > > > > > > > > >   -fsx -N 10000 -o 128000 -l 500000 -h
> > > > > > > > > >   +MADV_COLLAPSE not supported. Can't support -h
> > > > > > > > > >
> > > > > > > > > > and
> > > > > > > > > >
> > > > > > > > > >    fsx -N 10000 -l 500000 -r PSIZE -t BSIZE -w BSIZE -Z -R -W -h
> > > > > > > > > >   -fsx -N 10000 -o 8192 -l 500000 -r PSIZE -t BSIZE -w BSIZE -Z -R -W -h
> > > > > > > > > >   -fsx -N 10000 -o 128000 -l 500000 -r PSIZE -t BSIZE -w BSIZE -Z -R -W -h
> > > > > > > > > >   +mapped writes DISABLED
> > > > > > > > > >   +MADV_COLLAPSE not supported. Can't support -h
> > > > > > > > > >
> > > > > > > > > > I'm wondering ...
> > > > > > > > > >
> > > > > > > > > > >  ltp/fsx.c | 166 +++++++++++++++++++++++++++++++++++++++++++++++++-----
> > > > > > > > > > >  1 file changed, 153 insertions(+), 13 deletions(-)
> > > > > > > > > > >
> > > > > > > > > > > diff --git a/ltp/fsx.c b/ltp/fsx.c
> > > > > > > > > > > index 41933354..3be383c6 100644
> > > > > > > > > > > --- a/ltp/fsx.c
> > > > > > > > > > > +++ b/ltp/fsx.c
> > > > > > > > > > >  static struct option longopts[] = {
> > > > > > > > > > >       {"replay-ops", required_argument, 0, 256},
> > > > > > > > > > >       {"record-ops", optional_argument, 0, 255},
> > > > > > > > > > > @@ -2883,7 +3023,7 @@ main(int argc, char **argv)
> > > > > > > > > > >       setvbuf(stdout, (char *)0, _IOLBF, 0); /* line buffered stdout */
> > > > > > > > > > >
> > > > > > > > > > >       while ((ch = getopt_long(argc, argv,
> > > > > > > > > > > -                              "0b:c:de:fg:i:j:kl:m:no:p:qr:s:t:uw:xyABD:EFJKHzCILN:OP:RS:UWXZ",
> > > > > > > > > > > +                              "0b:c:de:fg:hi:j:kl:m:no:p:qr:s:t:uw:xyABD:EFJKHzCILN:OP:RS:UWXZ",
> > > > > > > > > > >                                longopts, NULL)) != EOF)
> > > > > > > > > > >               switch (ch) {
> > > > > > > > > > >               case 'b':
> > > > > > > > > > > @@ -2916,6 +3056,14 @@ main(int argc, char **argv)
> > > > > > > > > > >               case 'g':
> > > > > > > > > > >                       filldata = *optarg;
> > > > > > > > > > >                       break;
> > > > > > > > > > > +             case 'h':
> > > > > > > > > > > +#ifndef MADV_COLLAPSE
> > > > > > > > > > > +                     fprintf(stderr, "MADV_COLLAPSE not supported. "
> > > > > > > > > > > +                             "Can't support -h\n");
> > > > > > > > > > > +                     exit(86);
> > > > > > > > > > > +#endif
> > > > > > > > > > > +                     hugepages = 1;
> > > > > > > > > > > +                     break;
> > > > > > > > > >
> > > > > > > > > > ...
> > > > > > > > > > if we could change this part to:
> > > > > > > > > >
> > > > > > > > > > #ifdef MADV_COLLAPSE
> > > > > > > > > >         hugepages = 1;
> > > > > > > > > > #endif
> > > > > > > > > >         break;
> > > > > > > > > >
> > > > > > > > > > to avoid the test failures on old systems.
> > > > > > > > > >
> > > > > > > > > > Or any better ideas from you :)
> > > > > > > > >
> > > > > > > > > Hi Zorro,
> > > > > > > > >
> > > > > > > > > It looks like MADV_COLLAPSE was introduced in kernel version 6.1. What
> > > > > > > > > do you think about skipping generic/758 and generic/759 if the kernel
> > > > > > > > > version is older than 6.1? That to me seems more preferable than the
> > > > > > > > > paste above, as the paste above would execute the test as if it did
> > > > > > > > > test hugepages when in reality it didn't, which would be misleading.
> > > > > > > >
> > > > > > > > Now that I've gotten to try this out --
> > > > > > > >
> > > > > > > > There's a couple of things going on here.  The first is that generic/759
> > > > > > > > and 760 need to check if invoking fsx -h causes it to spit out the
> > > > > > > > "MADV_COLLAPSE not supported" error and _notrun the test.
> > > > > > > >
> > > > > > > > The second thing is that userspace programs can ensure the existence of
> > > > > > > > MADV_COLLAPSE in multiple ways.  The first way is through sys/mman.h,
> > > > > > > > which requires that the underlying C library headers are new enough to
> > > > > > > > include a definition.  glibc 2.37 is new enough, but even things like
> > > > > > > > Debian 12 and RHEL 9 aren't new enough to have that.  Other C libraries
> > > > > > > > might not follow glibc's practice of wrapping and/or redefining symbols
> > > > > > > > in a way that you hope is the same as...
> > > > > > > >
> > > > > > > > The second way is through linux/mman.h, which comes from the kernel
> > > > > > > > headers package; and the third way is for the program to define it
> > > > > > > > itself if nobody else does.
> > > > > > > >
> > > > > > > > So I think the easiest way to fix the fsx.c build is to include
> > > > > > > > linux/mman.h in addition to sys/mman.h.  Sorry I didn't notice these
> > > > > > >
> > > > > > > Thanks for your input. Do we still need sys/mman.h if linux/mman.h is added?
> > > > > >
> > > > > > Yes, because glibc provides the mmap() function that wraps
> > > > > > syscall(__NR_mmap, ...);
> > > > > >
> > > > > > > For generic/758 and 759, does it suffice to gate this on whether the
> > > > > > > kernel version if 6.1+ and _notrun if not? My understanding is that
> > > > > > > any kernel version 6.1 or newer will have MADV_COLLAPSE in its kernel
> > > > > > > headers package and support the feature.
> > > > > >
> > > > > > No, because some (most?) vendors backport new features into existing
> > > > > > kernels without revving the version number of that kernel.
> > > > >
> > > > > Oh okay, I see. That makes sense, thanks for the explanation.
> > > > >
> > > > > >
> > > > > > Maybe the following fixes things?
> > > > > >
> > > > > > --D
> > > > > >
> > > > > > generic/759,760: fix MADV_COLLAPSE detection and inclusion
> > > > > >
> > > > > > On systems with "old" C libraries such as glibc 2.36 in Debian 12, the
> > > > > > MADV_COLLAPSE flag might not be defined in any of the header files
> > > > > > pulled in by sys/mman.h, which means that the fsx binary might not get
> > > > > > built with any of the MADV_COLLAPSE code.  If the kernel supports THP,
> > > > > > the test will fail with:
> > > > > >
> > > > > > >  QA output created by 760
> > > > > > >  fsx -N 10000 -l 500000 -r PSIZE -t BSIZE -w BSIZE -Z -R -W -h
> > > > > > > -fsx -N 10000 -o 8192 -l 500000 -r PSIZE -t BSIZE -w BSIZE -Z -R -W -h
> > > > > > > -fsx -N 10000 -o 128000 -l 500000 -r PSIZE -t BSIZE -w BSIZE -Z -R -W -h
> > > > > > > +mapped writes DISABLED
> > > > > > > +MADV_COLLAPSE not supported. Can't support -h
> > > > > >
> > > > > > Fix both tests to detect fsx binaries that don't support MADV_COLLAPSE,
> > > > > > then fix fsx.c to include the mman.h from the kernel headers (aka
> > > > > > linux/mman.h) so that we can actually test IOs to and from THPs if the
> > > > > > kernel is newer than the rest of userspace.
> > > > > >
> > > > > > Cc: <fstests@xxxxxxxxxxxxxxx> # v2025.02.02
> > > > > > Fixes: 627289232371e3 ("generic: add tests for read/writes from hugepages-backed buffers")
> > > > > > Signed-off-by: "Darrick J. Wong" <djwong@xxxxxxxxxx>
> > > > > > ---
> > > > > >  ltp/fsx.c         |    1 +
> > > > > >  tests/generic/759 |    3 +++
> > > > > >  tests/generic/760 |    3 +++
> > > > > >  3 files changed, 7 insertions(+)
> > > > > >
> > > > > > diff --git a/ltp/fsx.c b/ltp/fsx.c
> > > > > > index 634c496ffe9317..cf9502a74c17a7 100644
> > > > > > --- a/ltp/fsx.c
> > > > > > +++ b/ltp/fsx.c
> > > > > > @@ -20,6 +20,7 @@
> > > > > >  #include <strings.h>
> > > > > >  #include <sys/file.h>
> > > > > >  #include <sys/mman.h>
> > > > > > +#include <linux/mman.h>
> > > > > >  #include <sys/uio.h>
> > > > > >  #include <stdbool.h>
> > > > > >  #ifdef HAVE_ERR_H
> > > > > > diff --git a/tests/generic/759 b/tests/generic/759
> > > > > > index 6c74478aa8a0e0..3549c5ed6a9299 100755
> > > > > > --- a/tests/generic/759
> > > > > > +++ b/tests/generic/759
> > > > > > @@ -14,6 +14,9 @@ _begin_fstest rw auto quick
> > > > > >  _require_test
> > > > > >  _require_thp
> > > > > >
> > > > > > +$here/ltp/fsx -N 0 -h $TEST_DIR 2>&1 | grep -q 'MADV_COLLAPSE not supported' && \
> > > > > > +       _notrun "fsx binary does not support MADV_COLLAPSE"
> > > > > > +
> > > > > >  run_fsx -N 10000            -l 500000 -h
> > > > > >  run_fsx -N 10000  -o 8192   -l 500000 -h
> > > > > >  run_fsx -N 10000  -o 128000 -l 500000 -h
> > > > > > diff --git a/tests/generic/760 b/tests/generic/760
> > > > > > index c71a196222ad3b..2fbd28502ae678 100755
> > > > > > --- a/tests/generic/760
> > > > > > +++ b/tests/generic/760
> > > > > > @@ -15,6 +15,9 @@ _require_test
> > > > > >  _require_odirect
> > > > > >  _require_thp
> > > > > >
> > > > > > +$here/ltp/fsx -N 0 -h $TEST_DIR 2>&1 | grep -q 'MADV_COLLAPSE not supported' && \
> > > > > > +       _notrun "fsx binary does not support MADV_COLLAPSE"
> > > > > > +
> > > > >
> > > > > I tried this out locally and it works for me:
> > > > >
> > > > > generic/759 8s ... [not run] fsx binary does not support MADV_COLLAPSE
> > > > > Ran: generic/759
> > > > > Not run: generic/759
> > > > > Passed all 1 tests
> > > > >
> > > > > SECTION       -- fuse
> > > > > =========================
> > > > > Ran: generic/759
> > > > > Not run: generic/759
> > > > > Passed all 1 tests
> > > >
> > > > Does the test actually run if you /do/ have kernel/libc headers that
> > > > define MADV_COLLAPSE?  And if so, does that count as a Tested-by?
> > > >
> > >
> > > I'm not sure if I fully understand the subtext of your question but
> > > yes, the test runs on my system when MADV_COLLAPSE is defined in the
> > > kernel/libc headers.
> >
> > Yep, you understood me correctly. :)
> >
> > > For sanity checking the inverse case, (eg MADV_COLLAPSE not defined),
> > > I tried this out by modifying the ifdef/ifndef checks in fsx to
> > > MADV_COLLAPSE_ to see if the '$here/ltp/fsx -N 0 -h $TEST_DIR 2>&1 |
> > > grep -q 'MADV_COLLAPSE not supported'' line catches that.
> >
> > <nod> Sounds good then; I'll add this to my test clod and run it
> > overnight.
>
> The arm64 vms with 64k base pages spat out this:
>
> --- /run/fstests/bin/tests/generic/759.out      2025-02-02 08:36:28.007248055 -0800
> +++ /var/tmp/fstests/generic/759.out.bad        2025-02-03 16:51:34.862964640 -0800
> @@ -1,4 +1,5 @@
>  QA output created by 759
>  fsx -N 10000 -l 500000 -h
> -fsx -N 10000 -o 8192 -l 500000 -h
> -fsx -N 10000 -o 128000 -l 500000 -h
> +Seed set to 1
> +madvise collapse for buf: Cannot allocate memory
> +init_hugepages_buf failed for good_buf: Cannot allocate memory
>
> Note that it was trying to create a 512M page(!) on a VM with 8G of
> memory.  Normally one doesn't use large base page size in low memory
> environments, but this /is/ fstestsland. 8-)
>
> From commit 7d8faaf155454f ("mm/madvise: introduce MADV_COLLAPSE sync
> hugepage collapse") :
>
>         ENOMEM  Memory allocation failed or VMA not found
>         EBUSY   Memcg charging failed
>         EAGAIN  Required resource temporarily unavailable.  Try again
>                 might succeed.
>         EINVAL  Other error: No PMD found, subpage doesn't have Present
>                 bit set, "Special" page no backed by struct page, VMA
>                 incorrectly sized, address not page-aligned, ...
>
> It sounds like ENOMEM/EBUSY/EINVAL should result in
> _notrun "Could not generate hugepage" ?  What are your thoughts?
>

Thanks for running it overnight. This sounds good to me, but will this
be robust against false negatives? For example, if it succeeds when
we're doing the

$here/ltp/fsx -N 0 -h $TEST_DIR 2>&1 | grep -q 'MADV_COLLAPSE not
supported' &&  _notrun "fsx binary does not support MADV_COLLAPSE"

check, does that guarantee that the actual run won't error out with
ENOMEM/EBUSY/EINVAL? It seems like there could be the case where it
passes the check but then when it actually runs, the system state
memory state may have changed and now memcg charging or the memory
allocation fails? EAGAIN seems a bit iffy to me - hopefully this
doesn't happen often but if it does, it would likely be a false
negative fail for the generic test?

Maybe instead of "$here/ltp/fsx -N 0 -h $TEST_DIR 2>&1 | grep -q
'MADV_COLLAPSE not supported' &&  _notrun "fsx binary does not support
MADV_COLLAPSE"", we should run the test and then if the output to the
.out file is "MADV_COLLAPSE not supported", then we deem that a
pass/success? Not sure if this is possible in the fstest
infrastructure though for checking against two possible outputs in the
.out file. What are your thoughts?


Thanks,
Joanne

> --D
>
> > --D
> >
> > >
> > > > --D
> > > >
> > > > >
> > > > > Thanks,
> > > > > Joanne
> > > > >
> > > > > >  psize=`$here/src/feature -s`
> > > > > >  bsize=`$here/src/min_dio_alignment $TEST_DIR $TEST_DEV`
> > > > > >
> > > > >
> >





[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [NTFS 3]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [NTFS 3]     [Samba]     [Device Mapper]     [CEPH Development]

  Powered by Linux