On 4/28/22 3:01 PM, Daniel P. Berrangé wrote: > On Wed, Apr 27, 2022 at 11:13:20PM +0200, Claudio Fontana wrote: >> this is in preparation for a minor refactoring of the copy >> function itself out of runIO(). >> >> Signed-off-by: Claudio Fontana <cfontana@xxxxxxx> >> --- >> src/util/iohelper.c | 95 +++++++++++++++++++++++++-------------------- >> 1 file changed, 53 insertions(+), 42 deletions(-) >> >> diff --git a/src/util/iohelper.c b/src/util/iohelper.c >> index 2c91bf4f93..c13746a547 100644 >> --- a/src/util/iohelper.c >> +++ b/src/util/iohelper.c >> @@ -45,6 +45,16 @@ >> # define O_DIRECT 0 >> #endif >> >> +struct runIOParams { >> + bool isBlockDev; >> + bool isDirect; >> + bool isWrite; >> + int fdin; >> + const char *fdinname; >> + int fdout; >> + const char *fdoutname; >> +}; >> + >> static int >> runIO(const char *path, int fd, int oflags) >> { >> @@ -53,13 +63,9 @@ runIO(const char *path, int fd, int oflags) >> size_t buflen = 1024*1024; >> intptr_t alignMask = 64*1024 - 1; >> int ret = -1; >> - int fdin, fdout; >> - const char *fdinname, *fdoutname; >> - unsigned long long total = 0; >> - bool direct = O_DIRECT && ((oflags & O_DIRECT) != 0); >> - off_t end = 0; >> + off_t total = 0; >> struct stat sb; >> - bool isBlockDev = false; >> + struct runIOParams p; >> >> #if WITH_POSIX_MEMALIGN >> if (posix_memalign(&base, alignMask + 1, buflen)) >> @@ -77,34 +83,23 @@ runIO(const char *path, int fd, int oflags) >> fd, path); >> goto cleanup; >> } >> - isBlockDev = S_ISBLK(sb.st_mode); >> + p.isBlockDev = S_ISBLK(sb.st_mode); >> + p.isDirect = O_DIRECT && (oflags & O_DIRECT); >> >> switch (oflags & O_ACCMODE) { >> case O_RDONLY: >> - fdin = fd; >> - fdinname = path; >> - fdout = STDOUT_FILENO; >> - fdoutname = "stdout"; >> - /* To make the implementation simpler, we give up on any >> - * attempt to use O_DIRECT in a non-trivial manner. */ >> - if (!isBlockDev && direct && ((end = lseek(fd, 0, SEEK_CUR)) != 0)) { >> - virReportSystemError(end < 0 ? errno : EINVAL, "%s", >> - _("O_DIRECT read needs entire seekable file")); >> - goto cleanup; >> - } >> + p.isWrite = false; >> + p.fdin = fd; >> + p.fdinname = path; >> + p.fdout = STDOUT_FILENO; >> + p.fdoutname = "stdout"; >> break; >> case O_WRONLY: >> - fdin = STDIN_FILENO; >> - fdinname = "stdin"; >> - fdout = fd; >> - fdoutname = path; >> - /* To make the implementation simpler, we give up on any >> - * attempt to use O_DIRECT in a non-trivial manner. */ >> - if (!isBlockDev && direct && (end = lseek(fd, 0, SEEK_END)) != 0) { >> - virReportSystemError(end < 0 ? errno : EINVAL, "%s", >> - _("O_DIRECT write needs empty seekable file")); >> - goto cleanup; >> - } >> + p.isWrite = true; >> + p.fdin = STDIN_FILENO; >> + p.fdinname = "stdin"; >> + p.fdout = fd; >> + p.fdoutname = path; >> break; >> >> case O_RDWR: >> @@ -114,6 +109,22 @@ runIO(const char *path, int fd, int oflags) >> (oflags & O_ACCMODE)); >> goto cleanup; >> } >> + /* To make the implementation simpler, we give up on any >> + * attempt to use O_DIRECT in a non-trivial manner. */ >> + if (!p.isBlockDev && p.isDirect) { >> + off_t off; >> + if (p.isWrite) { >> + if ((off = lseek(fd, 0, SEEK_END)) != 0) { >> + virReportSystemError(off < 0 ? errno : EINVAL, "%s", >> + _("O_DIRECT write needs empty seekable file")); >> + goto cleanup; >> + } >> + } else if ((off = lseek(fd, 0, SEEK_CUR)) != 0) { >> + virReportSystemError(off < 0 ? errno : EINVAL, "%s", >> + _("O_DIRECT read needs entire seekable file")); >> + goto cleanup; >> + } > > I'm puzzelled about the current code doing SEEK_END in one case > and SEEK_CUR in the other case. I think this method should simply > use SEEK_CUR only as a sanity check for being ablt to use O_DIRECT. I think the answer is in the original commit: commit 12291656b135ed788e41dadbd2d15e613ddea9b5 Author: Eric Blake <eblake@xxxxxxxxxx> Date: Tue Jul 12 08:35:05 2011 -0600 save: let iohelper work on O_DIRECT fds Required for a coming patch where iohelper will operate on O_DIRECT fds. There, the user-space memory must be aligned to file system boundaries (at least 512, but using page-aligned works better, and some file systems prefer 64k). Made tougher by the fact that VIR_ALLOC won't work on void *, but posix_memalign won't work on char * and isn't available everywhere. This patch makes some simplifying assumptions - namely, output to an O_DIRECT fd will only be attempted on an empty seekable file (hence, no need to worry about preserving existing data on a partial block, and ftruncate will work to undo the effects of having to round up the size of the last block written), and input from an O_DIRECT fd will only be attempted on a complete seekable file with the only possible short read at EOF. ---- Here the relevant part being "empty seekable file". The check not only ensures the file is seekable, but also double checks that it is empty. I can add a comment to make it explicit. > > If we actually need to position the file offset, then the caller > should do a SEEK_END itself. > >> + } >> >> while (1) { >> ssize_t got; >> @@ -124,16 +135,16 @@ runIO(const char *path, int fd, int oflags) >> * writes will be aligned. >> * In other cases using saferead reduces number of syscalls. >> */ >> - if (fdin == fd && direct) { >> - if ((got = read(fdin, buf, buflen)) < 0 && >> + if (!p.isWrite && p.isDirect) { >> + if ((got = read(p.fdin, buf, buflen)) < 0 && >> errno == EINTR) >> continue; >> } else { >> - got = saferead(fdin, buf, buflen); >> + got = saferead(p.fdin, buf, buflen); >> } > > The distinction for read vs saferead is for correctness with O_DIRECT. > > We only want to use read on the plain file / blockdev, and saferead > on the socket/pipe. Yes that's clear. > > We already call fstat to let us figure out the fd type, so can use > that decide whether to use read vs saferead. > > This ties into my comment on the later patch about simplifying the > parameters passed in, to remove the distinction of read vs write. > As mentioned elsewhere I don't think it can be simplified quite that way, but I think there is an alternative that could work, will propose in the next spin. Thanks Claudio