Re: [PATCH v5 1/2] fsx: support reads/writes from buffers backed by hugepages

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, Feb 03, 2025 at 11:23:20AM -0800, Joanne Koong wrote:
> On Mon, Feb 3, 2025 at 10:59 AM Darrick J. Wong <djwong@xxxxxxxxxx> wrote:
> >
> > On Mon, Feb 03, 2025 at 10:04:04AM -0800, Joanne Koong wrote:
> > > On Sun, Feb 2, 2025 at 6:25 AM Zorro Lang <zlang@xxxxxxxxxx> wrote:
> > > >
> > > > On Tue, Jan 21, 2025 at 01:56:40PM -0800, Joanne Koong wrote:
> > > > > Add support for reads/writes from buffers backed by hugepages.
> > > > > This can be enabled through the '-h' flag. This flag should only be used
> > > > > on systems where THP capabilities are enabled.
> > > > >
> > > > > This is motivated by a recent bug that was due to faulty handling of
> > > > > userspace buffers backed by hugepages. This patch is a mitigation
> > > > > against problems like this in the future.
> > > > >
> > > > > Signed-off-by: Joanne Koong <joannelkoong@xxxxxxxxx>
> > > > > Reviewed-by: Brian Foster <bfoster@xxxxxxxxxx>
> > > > > Reviewed-by: Darrick J. Wong <djwong@xxxxxxxxxx>
> > > > > ---
> > > >
> > > > Those two test cases fail on old system which doesn't support
> > > > MADV_COLLAPSE:
> > > >
> > > >    fsx -N 10000 -l 500000 -h
> > > >   -fsx -N 10000 -o 8192 -l 500000 -h
> > > >   -fsx -N 10000 -o 128000 -l 500000 -h
> > > >   +MADV_COLLAPSE not supported. Can't support -h
> > > >
> > > > and
> > > >
> > > >    fsx -N 10000 -l 500000 -r PSIZE -t BSIZE -w BSIZE -Z -R -W -h
> > > >   -fsx -N 10000 -o 8192 -l 500000 -r PSIZE -t BSIZE -w BSIZE -Z -R -W -h
> > > >   -fsx -N 10000 -o 128000 -l 500000 -r PSIZE -t BSIZE -w BSIZE -Z -R -W -h
> > > >   +mapped writes DISABLED
> > > >   +MADV_COLLAPSE not supported. Can't support -h
> > > >
> > > > I'm wondering ...
> > > >
> > > > >  ltp/fsx.c | 166 +++++++++++++++++++++++++++++++++++++++++++++++++-----
> > > > >  1 file changed, 153 insertions(+), 13 deletions(-)
> > > > >
> > > > > diff --git a/ltp/fsx.c b/ltp/fsx.c
> > > > > index 41933354..3be383c6 100644
> > > > > --- a/ltp/fsx.c
> > > > > +++ b/ltp/fsx.c
> > > > >  static struct option longopts[] = {
> > > > >       {"replay-ops", required_argument, 0, 256},
> > > > >       {"record-ops", optional_argument, 0, 255},
> > > > > @@ -2883,7 +3023,7 @@ main(int argc, char **argv)
> > > > >       setvbuf(stdout, (char *)0, _IOLBF, 0); /* line buffered stdout */
> > > > >
> > > > >       while ((ch = getopt_long(argc, argv,
> > > > > -                              "0b:c:de:fg:i:j:kl:m:no:p:qr:s:t:uw:xyABD:EFJKHzCILN:OP:RS:UWXZ",
> > > > > +                              "0b:c:de:fg:hi:j:kl:m:no:p:qr:s:t:uw:xyABD:EFJKHzCILN:OP:RS:UWXZ",
> > > > >                                longopts, NULL)) != EOF)
> > > > >               switch (ch) {
> > > > >               case 'b':
> > > > > @@ -2916,6 +3056,14 @@ main(int argc, char **argv)
> > > > >               case 'g':
> > > > >                       filldata = *optarg;
> > > > >                       break;
> > > > > +             case 'h':
> > > > > +#ifndef MADV_COLLAPSE
> > > > > +                     fprintf(stderr, "MADV_COLLAPSE not supported. "
> > > > > +                             "Can't support -h\n");
> > > > > +                     exit(86);
> > > > > +#endif
> > > > > +                     hugepages = 1;
> > > > > +                     break;
> > > >
> > > > ...
> > > > if we could change this part to:
> > > >
> > > > #ifdef MADV_COLLAPSE
> > > >         hugepages = 1;
> > > > #endif
> > > >         break;
> > > >
> > > > to avoid the test failures on old systems.
> > > >
> > > > Or any better ideas from you :)
> > >
> > > Hi Zorro,
> > >
> > > It looks like MADV_COLLAPSE was introduced in kernel version 6.1. What
> > > do you think about skipping generic/758 and generic/759 if the kernel
> > > version is older than 6.1? That to me seems more preferable than the
> > > paste above, as the paste above would execute the test as if it did
> > > test hugepages when in reality it didn't, which would be misleading.
> >
> > Now that I've gotten to try this out --
> >
> > There's a couple of things going on here.  The first is that generic/759
> > and 760 need to check if invoking fsx -h causes it to spit out the
> > "MADV_COLLAPSE not supported" error and _notrun the test.
> >
> > The second thing is that userspace programs can ensure the existence of
> > MADV_COLLAPSE in multiple ways.  The first way is through sys/mman.h,
> > which requires that the underlying C library headers are new enough to
> > include a definition.  glibc 2.37 is new enough, but even things like
> > Debian 12 and RHEL 9 aren't new enough to have that.  Other C libraries
> > might not follow glibc's practice of wrapping and/or redefining symbols
> > in a way that you hope is the same as...
> >
> > The second way is through linux/mman.h, which comes from the kernel
> > headers package; and the third way is for the program to define it
> > itself if nobody else does.
> >
> > So I think the easiest way to fix the fsx.c build is to include
> > linux/mman.h in addition to sys/mman.h.  Sorry I didn't notice these
> 
> Thanks for your input. Do we still need sys/mman.h if linux/mman.h is added?

Yes, because glibc provides the mmap() function that wraps
syscall(__NR_mmap, ...);

> For generic/758 and 759, does it suffice to gate this on whether the
> kernel version if 6.1+ and _notrun if not? My understanding is that
> any kernel version 6.1 or newer will have MADV_COLLAPSE in its kernel
> headers package and support the feature.

No, because some (most?) vendors backport new features into existing
kernels without revving the version number of that kernel.

Maybe the following fixes things?

--D

generic/759,760: fix MADV_COLLAPSE detection and inclusion

On systems with "old" C libraries such as glibc 2.36 in Debian 12, the
MADV_COLLAPSE flag might not be defined in any of the header files
pulled in by sys/mman.h, which means that the fsx binary might not get
built with any of the MADV_COLLAPSE code.  If the kernel supports THP,
the test will fail with:

>  QA output created by 760
>  fsx -N 10000 -l 500000 -r PSIZE -t BSIZE -w BSIZE -Z -R -W -h
> -fsx -N 10000 -o 8192 -l 500000 -r PSIZE -t BSIZE -w BSIZE -Z -R -W -h
> -fsx -N 10000 -o 128000 -l 500000 -r PSIZE -t BSIZE -w BSIZE -Z -R -W -h
> +mapped writes DISABLED
> +MADV_COLLAPSE not supported. Can't support -h

Fix both tests to detect fsx binaries that don't support MADV_COLLAPSE,
then fix fsx.c to include the mman.h from the kernel headers (aka
linux/mman.h) so that we can actually test IOs to and from THPs if the
kernel is newer than the rest of userspace.

Cc: <fstests@xxxxxxxxxxxxxxx> # v2025.02.02
Fixes: 627289232371e3 ("generic: add tests for read/writes from hugepages-backed buffers")
Signed-off-by: "Darrick J. Wong" <djwong@xxxxxxxxxx>
---
 ltp/fsx.c         |    1 +
 tests/generic/759 |    3 +++
 tests/generic/760 |    3 +++
 3 files changed, 7 insertions(+)

diff --git a/ltp/fsx.c b/ltp/fsx.c
index 634c496ffe9317..cf9502a74c17a7 100644
--- a/ltp/fsx.c
+++ b/ltp/fsx.c
@@ -20,6 +20,7 @@
 #include <strings.h>
 #include <sys/file.h>
 #include <sys/mman.h>
+#include <linux/mman.h>
 #include <sys/uio.h>
 #include <stdbool.h>
 #ifdef HAVE_ERR_H
diff --git a/tests/generic/759 b/tests/generic/759
index 6c74478aa8a0e0..3549c5ed6a9299 100755
--- a/tests/generic/759
+++ b/tests/generic/759
@@ -14,6 +14,9 @@ _begin_fstest rw auto quick
 _require_test
 _require_thp
 
+$here/ltp/fsx -N 0 -h $TEST_DIR 2>&1 | grep -q 'MADV_COLLAPSE not supported' && \
+	_notrun "fsx binary does not support MADV_COLLAPSE"
+
 run_fsx -N 10000            -l 500000 -h
 run_fsx -N 10000  -o 8192   -l 500000 -h
 run_fsx -N 10000  -o 128000 -l 500000 -h
diff --git a/tests/generic/760 b/tests/generic/760
index c71a196222ad3b..2fbd28502ae678 100755
--- a/tests/generic/760
+++ b/tests/generic/760
@@ -15,6 +15,9 @@ _require_test
 _require_odirect
 _require_thp
 
+$here/ltp/fsx -N 0 -h $TEST_DIR 2>&1 | grep -q 'MADV_COLLAPSE not supported' && \
+	_notrun "fsx binary does not support MADV_COLLAPSE"
+
 psize=`$here/src/feature -s`
 bsize=`$here/src/min_dio_alignment $TEST_DIR $TEST_DEV`
 




[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [NTFS 3]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [NTFS 3]     [Samba]     [Device Mapper]     [CEPH Development]

  Powered by Linux