On Thu, Dec 19, 2024 at 02:34:01PM -0800, Joanne Koong wrote: > On Thu, Dec 19, 2024 at 6:27 AM Brian Foster <bfoster@xxxxxxxxxx> wrote: > > > > On Wed, Dec 18, 2024 at 01:01:21PM -0800, Joanne Koong wrote: > > > Add support for reads/writes from buffers backed by hugepages. > > > This can be enabled through the '-h' flag. This flag should only be used > > > on systems where THP capabilities are enabled. > > > > > > Signed-off-by: Joanne Koong <joannelkoong@xxxxxxxxx> > > > --- > > > > Firstly, thanks for taking the time to add this. This seems like a nice > > idea. It might be nice to have an extra sentence or two in the commit > > log on the purpose/motivation. For example, has this been used to detect > > a certain class of problem? > > Hi Brian, > > Thanks for reviewing this. That's a good idea - I'll include the > sentence from the cover letter to this commit message as well: "This > is motivated by a recent bug that was due to faulty handling for > userspace buffers backed by hugepages." > Thanks. Got a link or anything, for my own curiosity? Also, I presume the followup fstest is a reproducer? > > > > A few other quick comments below... > > > > > ltp/fsx.c | 100 +++++++++++++++++++++++++++++++++++++++++++++++++----- > > > 1 file changed, 92 insertions(+), 8 deletions(-) > > > > > > diff --git a/ltp/fsx.c b/ltp/fsx.c > > > index 41933354..3656fd9f 100644 > > > --- a/ltp/fsx.c > > > +++ b/ltp/fsx.c > > > @@ -190,6 +190,7 @@ int o_direct; /* -Z */ > > > int aio = 0; > > > +} > > > + > > > +static void * > > > +init_hugepages_buf(unsigned len, long hugepage_size) > > > +{ > > > + void *buf; > > > + long buf_size = roundup(len, hugepage_size); > > > + > > > + if (posix_memalign(&buf, hugepage_size, buf_size)) { > > > + prterr("posix_memalign for buf"); > > > + return NULL; > > > + } > > > + memset(buf, '\0', len); > > > > I'm assuming it doesn't matter, but did you want to use buf_size here to > > clear the whole buffer? > > I only saw buf being used up to len in the rest of the code so I > didn't think it was necessary, but I also don't feel strongly about > this and am happy to change this to clear the entire buffer if > preferred. > Yeah.. at first it looked like a bug to me, then I realized the same thing later. I suspect it might be wise to just clear it entirely to avoid any future landmines, but that could just be my internal bias talking too. No big deal either way. > > > > > + if (madvise(buf, buf_size, MADV_COLLAPSE)) { > > > + prterr("madvise collapse for buf"); > > > + free(buf); > > > + return NULL; > > > + } > > > + > > > + return buf; > > > +} > > > @@ -3232,12 +3287,41 @@ main(int argc, char **argv) > > > original_buf = (char *) malloc(maxfilelen); > > > for (i = 0; i < maxfilelen; i++) > > > original_buf[i] = random() % 256; > > > - good_buf = (char *) malloc(maxfilelen + writebdy); > > > - good_buf = round_ptr_up(good_buf, writebdy, 0); > > > - memset(good_buf, '\0', maxfilelen); > > > - temp_buf = (char *) malloc(maxoplen + readbdy); > > > - temp_buf = round_ptr_up(temp_buf, readbdy, 0); > > > - memset(temp_buf, '\0', maxoplen); > > > + if (hugepages) { > > > + long hugepage_size; > > > + > > > + hugepage_size = get_hugepage_size(); > > > + if (hugepage_size == -1) { > > > + prterr("get_hugepage_size()"); > > > + exit(99); > > > + } > > > + > > > + if (writebdy != 1 && writebdy != hugepage_size) > > > + prt("ignoring write alignment (since -h is enabled)"); > > > + > > > + if (readbdy != 1 && readbdy != hugepage_size) > > > + prt("ignoring read alignment (since -h is enabled)"); > > > > I'm a little unclear on what these warnings mean. The alignments are > > still used in the read/write paths afaics. The non-huge mode seems to > > only really care about the max size of the buffers in this code. > > > > If your test doesn't actually use read/write alignments and the goal is > > just to keep things simple, perhaps it would be cleaner to add something > > like an if (hugepages && (writebdy != 1 || readbdy != 1)) check after > > option processing and exit out as an unsupported combination..? > > My understanding of the 'writebdy' and 'readbdy' options are that > they're for making reads/writes aligned to the passed-in value, which > depends on the starting address of the buffer being aligned to that > value as well. However for hugepages buffers, they must be aligned to > the system hugepage size (eg 2 MiB) or the madvise(... MADV_COLLAPSE) > call will fail. As such, it is not guaranteed that the requested > alignment will actually be abided by. For that reason, I thought it'd > be useful to print this out to the user so they know requested > alignments will be ignored, but it didn't seem severe enough of an > issue to error out and exit altogether. But maybe it'd be less > confusing for the user if this instead does just error out if the > alignment isn't a multiple of the hugepage size. > Ahh, I see. I missed the round_ptr_up() adjustments. That makes more sense now. IMO it would be a little cleaner to just bail out earlier as such. But either way, I suppose if you could add a small comment with this alignment context you've explained above with the error checks then that is good enough for me. Thanks! Brian > > > > BTW, it might also be nice to factor out this whole section of buffer > > initialization code (including original_buf) into an init_buffers() or > > some such. That could be done as a prep patch, but just a suggestion > > either way. > > Good idea - i'll do this refactoring for v2. > > > Thanks, > Joanne > > > > Brian > > > > > + > > > + good_buf = init_hugepages_buf(maxfilelen, hugepage_size); > > > + if (!good_buf) { > > > + prterr("init_hugepages_buf failed for good_buf"); > > > + exit(100); > > > + } > > > + > > > + temp_buf = init_hugepages_buf(maxoplen, hugepage_size); > > > + if (!temp_buf) { > > > + prterr("init_hugepages_buf failed for temp_buf"); > > > + exit(101); > > > + } > > > + } else { > > > + good_buf = (char *) malloc(maxfilelen + writebdy); > > > + good_buf = round_ptr_up(good_buf, writebdy, 0); > > > + memset(good_buf, '\0', maxfilelen); > > > + > > > + temp_buf = (char *) malloc(maxoplen + readbdy); > > > + temp_buf = round_ptr_up(temp_buf, readbdy, 0); > > > + memset(temp_buf, '\0', maxoplen); > > > + } > > > if (lite) { /* zero entire existing file */ > > > ssize_t written; > > > > > > -- > > > 2.47.1 > > > > > > > > >