Re: [PATCH 1/2] fsx: support reads/writes from buffers backed by hugepages

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, Dec 19, 2024 at 6:27 AM Brian Foster <bfoster@xxxxxxxxxx> wrote:
>
> On Wed, Dec 18, 2024 at 01:01:21PM -0800, Joanne Koong wrote:
> > Add support for reads/writes from buffers backed by hugepages.
> > This can be enabled through the '-h' flag. This flag should only be used
> > on systems where THP capabilities are enabled.
> >
> > Signed-off-by: Joanne Koong <joannelkoong@xxxxxxxxx>
> > ---
>
> Firstly, thanks for taking the time to add this. This seems like a nice
> idea. It might be nice to have an extra sentence or two in the commit
> log on the purpose/motivation. For example, has this been used to detect
> a certain class of problem?

Hi Brian,

Thanks for reviewing this. That's a good idea - I'll include the
sentence from the cover letter to this commit message as well: "This
is motivated by a recent bug that was due to faulty handling for
userspace buffers backed by hugepages."

>
> A few other quick comments below...
>
> >  ltp/fsx.c | 100 +++++++++++++++++++++++++++++++++++++++++++++++++-----
> >  1 file changed, 92 insertions(+), 8 deletions(-)
> >
> > diff --git a/ltp/fsx.c b/ltp/fsx.c
> > index 41933354..3656fd9f 100644
> > --- a/ltp/fsx.c
> > +++ b/ltp/fsx.c
> > @@ -190,6 +190,7 @@ int       o_direct;                       /* -Z */
> >  int  aio = 0;
> > +}
> > +
> > +static void *
> > +init_hugepages_buf(unsigned len, long hugepage_size)
> > +{
> > +     void *buf;
> > +     long buf_size = roundup(len, hugepage_size);
> > +
> > +     if (posix_memalign(&buf, hugepage_size, buf_size)) {
> > +             prterr("posix_memalign for buf");
> > +             return NULL;
> > +     }
> > +     memset(buf, '\0', len);
>
> I'm assuming it doesn't matter, but did you want to use buf_size here to
> clear the whole buffer?

I only saw buf being used up to len in the rest of the code so I
didn't think it was necessary, but I also don't feel strongly about
this and am happy to change this to clear the entire buffer if
preferred.

>
> > +     if (madvise(buf, buf_size, MADV_COLLAPSE)) {
> > +             prterr("madvise collapse for buf");
> > +             free(buf);
> > +             return NULL;
> > +     }
> > +
> > +     return buf;
> > +}
> > @@ -3232,12 +3287,41 @@ main(int argc, char **argv)
> >       original_buf = (char *) malloc(maxfilelen);
> >       for (i = 0; i < maxfilelen; i++)
> >               original_buf[i] = random() % 256;
> > -     good_buf = (char *) malloc(maxfilelen + writebdy);
> > -     good_buf = round_ptr_up(good_buf, writebdy, 0);
> > -     memset(good_buf, '\0', maxfilelen);
> > -     temp_buf = (char *) malloc(maxoplen + readbdy);
> > -     temp_buf = round_ptr_up(temp_buf, readbdy, 0);
> > -     memset(temp_buf, '\0', maxoplen);
> > +     if (hugepages) {
> > +             long hugepage_size;
> > +
> > +             hugepage_size = get_hugepage_size();
> > +             if (hugepage_size == -1) {
> > +                     prterr("get_hugepage_size()");
> > +                     exit(99);
> > +             }
> > +
> > +             if (writebdy != 1 && writebdy != hugepage_size)
> > +                     prt("ignoring write alignment (since -h is enabled)");
> > +
> > +             if (readbdy != 1 && readbdy != hugepage_size)
> > +                     prt("ignoring read alignment (since -h is enabled)");
>
> I'm a little unclear on what these warnings mean. The alignments are
> still used in the read/write paths afaics. The non-huge mode seems to
> only really care about the max size of the buffers in this code.
>
> If your test doesn't actually use read/write alignments and the goal is
> just to keep things simple, perhaps it would be cleaner to add something
> like an if (hugepages && (writebdy != 1 || readbdy != 1)) check after
> option processing and exit out as an unsupported combination..?

My understanding of the 'writebdy' and 'readbdy' options are that
they're for making reads/writes aligned to the passed-in value, which
depends on the starting address of the buffer being aligned to that
value as well. However for hugepages buffers, they must be aligned to
the system hugepage size (eg 2 MiB) or the madvise(... MADV_COLLAPSE)
call will fail. As such, it is not guaranteed that the requested
alignment will actually be abided by. For that reason, I thought it'd
be useful to print this out to the user so they know requested
alignments will be ignored, but it didn't seem severe enough of an
issue to error out and exit altogether. But maybe it'd be less
confusing for the user if this instead does just error out if the
alignment isn't a multiple of the hugepage size.

>
> BTW, it might also be nice to factor out this whole section of buffer
> initialization code (including original_buf) into an init_buffers() or
> some such. That could be done as a prep patch, but just a suggestion
> either way.

Good idea - i'll do this refactoring for v2.


Thanks,
Joanne
>
> Brian
>
> > +
> > +             good_buf = init_hugepages_buf(maxfilelen, hugepage_size);
> > +             if (!good_buf) {
> > +                     prterr("init_hugepages_buf failed for good_buf");
> > +                     exit(100);
> > +             }
> > +
> > +             temp_buf = init_hugepages_buf(maxoplen, hugepage_size);
> > +             if (!temp_buf) {
> > +                     prterr("init_hugepages_buf failed for temp_buf");
> > +                     exit(101);
> > +             }
> > +     } else {
> > +             good_buf = (char *) malloc(maxfilelen + writebdy);
> > +             good_buf = round_ptr_up(good_buf, writebdy, 0);
> > +             memset(good_buf, '\0', maxfilelen);
> > +
> > +             temp_buf = (char *) malloc(maxoplen + readbdy);
> > +             temp_buf = round_ptr_up(temp_buf, readbdy, 0);
> > +             memset(temp_buf, '\0', maxoplen);
> > +     }
> >       if (lite) {     /* zero entire existing file */
> >               ssize_t written;
> >
> > --
> > 2.47.1
> >
> >
>





[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [NTFS 3]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [NTFS 3]     [Samba]     [Device Mapper]     [CEPH Development]

  Powered by Linux