On Mon, Jun 26, 2023 at 8:14 PM Ahelenia Ziemiańska <nabijaczleweli@xxxxxxxxxxxxxxxxxx> wrote: > > On Mon, Jun 26, 2023 at 07:21:16PM +0300, Amir Goldstein wrote: > > On Mon, Jun 26, 2023 at 6:12 PM Ahelenia Ziemiańska > > <nabijaczleweli@xxxxxxxxxxxxxxxxxx> wrote: > > > > > > On Mon, Jun 26, 2023 at 05:53:46PM +0300, Amir Goldstein wrote: > > > > > So is it really true that the only way to poll a pipe is a > > > > > sleep()/read(O_NONBLOCK) loop? > > > > I don't think so, but inotify is not the way. > > > So what is? What do the kernel developers recommend as a way to see if a > > > file is written to, and that file happens to be a pipe? > > > > > > FTR, I've opened the symmetric Debian#1039488: > > > https://bugs.debian.org/1039488 > > > against coreutils, since, if this is expected, and writing to a pipe > > > should not generate write events on that pipe, then tail -f is currently > > > broken on most systems. > > First of all, it is better to mention that you are trying to fix a real > > world use case when you are reporting a kernel misbehavior. > I hadn't actually realised this affected coreutils tail as well before > re-testing it today. > > > What this makes me wonder is, if tail -f <fifo> is broken as you claim > > it is, how is it that decades go by without anyone noticing this problem? > Most people don't use cat(1) that splice(2)s, I do; > even if they do, they probably haven't filled the whole buffer so the > missed splice(2) write was potentially covered by a later write(2) write. > > > When looking at tail source code I see: > > > > /* Mark as '.ignore'd each member of F that corresponds to a > > pipe or fifo, and return the number of non-ignored members. */ > > static size_t > > ignore_fifo_and_pipe (struct File_spec *f, size_t n_files) > > { > > /* When there is no FILE operand and stdin is a pipe or FIFO > > POSIX requires that tail ignore the -f option. > > Since we allow multiple FILE operands, we extend that to say: with -f, > > ignore any "-" operand that corresponds to a pipe or FIFO. */ > > > > and it looks like tail_forever_inotify() is not being called unless > > there are non pipes: > > > > if (forever && ignore_fifo_and_pipe (F, n_files)) > > { > > > > The semantics of tail -f on a pipe input would be very odd, because > > the writer would need to close before tail can figure out which are > > the last lines. > The semantics of tail -f for FIFOs are formalised in POSIX, which says > (Issue 8 Draft 3): > 115551 −f If the input file is a regular file or if the file operand specifies a FIFO, do not > 115552 terminate after the last line of the input file has been copied, but read and copy > 115553 further bytes from the input file when they become available. If no file operand is > 115554 specified and standard input is a pipe or FIFO, the −f option shall be ignored. If the > 115555 input file is not a FIFO, pipe, or regular file, it is unspecified whether or not the −f > 115556 option shall be ignored. > coreutils sensibly interprets these in accordance with > https://www.mail-archive.com/austin-group-l@xxxxxxxxxxxxx/msg11402.html > > There are no special provisions for pipes/FIFOs before the input is > exhausted, correct: tail is two programs in one; the first bit reads the > input(s) to completion and outputs the bit you wanted, the second bit > (-f) keeps reading the inputs from where the first bit left off. > > (Note that tail with -c +123 and -n +123 doesn't "care" what lines are > last, and just copies from byte/line 123, but that's beside the point. > Indeed, "tail -c+1 fifo > log" is the idealised log collection program > from before: many programs may write to fifo, and all output is > collected in log.) > > But yes: tail -f fifo first reads the entire "contents" of fifo > (where for pipes this is defined as "until all writers hang up"), > then continues reading fifo and copying whatever it reads. > On a strict single-file implementation you can get away with reading and > then sleeping when you get 0 (this is what traditional UNIX tails do). > > When you have multiple files, well, you want to poll them, and since > pipes are unpollable, to avoid waking up every second and reading every > unpollable input file to see if you got something (regular files, fifos), > you use inotify(7) (coreutils) or kqueue(2) (NetBSD, probably others) > to tell you when there's data. > > If inotify(7) for pipes worked, the entire coreutils tail -f semantic > is implementable as a single poll(2): > * of each individual pollable (sockets, chardevs) > * of inotify of unpollables (pipes, regular files) > * of pidfd (if --pid) > this is very attractive. Naturally, I could also fall back to just a > poll of pollables and pidfd with a second timeout if there are pipes in > the inputs, but you see how this is sub-optimal for no real good reason. > And, well, coreutils tail doesn't do this, so it's vulnerable. > Thanks for the explanation. > > So honestly, we could maybe add IN_ACCESS/IN_MODIFY for the > > splice_pipe_to_pipe() case, but I would really like to know what > > the real use case is. > And splice_file_to_pipe() which is what we're hitting here. > The real use case is as I said: I would like to be able to poll pipes > with inotify instead of with sleep()/read(). > > > Another observation is that splice(2) never used to report any > > inotify events at all until a very recent commit in v6.4 > > 983652c69199 ("splice: report related fsnotify events") > > but this commit left out the splice_pipe_to_pipe() case. > > > > CC the author of the patch to ask why this case was left > > out and whether he would be interested in fixing that. > I'm reading the discussion following > <20230322062519.409752-1-cccheng@xxxxxxxxxxxx> as > "people just forget to add inotify hooks to their I/O routines as a rule", > thus my guess on why it was left out was "didn't even cross my mind" > (or, perhaps "didn't think we even supported fsnotify for pipes"). > > Below you'll find a scissor-patch based on current linus HEAD; > I've tested it works as-expected for both tty-to-pipe and pipe-to-pipe > splices in my original reproducer. > -- >8 -- > From: =?UTF-8?q?Ahelenia=20Ziemia=C5=84ska?= > <nabijaczleweli@xxxxxxxxxxxxxxxxxx> > Date: Mon, 26 Jun 2023 19:02:28 +0200 > Subject: [PATCH] splice: always fsnotify_access(in), fsnotify_modify(out) on > success > > The current behaviour caused an asymmetry where some write APIs > (write, sendfile) would notify the written-to/read-from objects, > but splice wouldn't. > > This affected userspace which used inotify, like coreutils tail -f. > > Link: https://lore.kernel.org/linux-fsdevel/jbyihkyk5dtaohdwjyivambb2gffyjs3dodpofafnkkunxq7bu@jngkdxx65pux/t/#u > Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@xxxxxxxxxxxxxxxxxx> (Create dependency in case they should be backported) Fixes: 983652c69199 ("splice: report related fsnotify events") Makes sense. Reviewed-by: Amir Goldstein <amir73il@xxxxxxxxx> > --- > fs/splice.c | 21 ++++++++++++--------- > 1 file changed, 12 insertions(+), 9 deletions(-) > > diff --git a/fs/splice.c b/fs/splice.c > index 3e06611d19ae..94fae24f9d54 100644 > --- a/fs/splice.c > +++ b/fs/splice.c > @@ -1154,7 +1154,8 @@ long do_splice(struct file *in, loff_t *off_in, struct file *out, > if ((in->f_flags | out->f_flags) & O_NONBLOCK) > flags |= SPLICE_F_NONBLOCK; > > - return splice_pipe_to_pipe(ipipe, opipe, len, flags); > + ret = splice_pipe_to_pipe(ipipe, opipe, len, flags); > + goto notify; > } > > if (ipipe) { > @@ -1182,15 +1183,12 @@ long do_splice(struct file *in, loff_t *off_in, struct file *out, > ret = do_splice_from(ipipe, out, &offset, len, flags); > file_end_write(out); > > - if (ret > 0) > - fsnotify_modify(out); > - > if (!off_out) > out->f_pos = offset; > else > *off_out = offset; > > - return ret; > + goto notify; > } > > if (opipe) { > @@ -1209,18 +1207,23 @@ long do_splice(struct file *in, loff_t *off_in, struct file *out, > > ret = splice_file_to_pipe(in, opipe, &offset, len, flags); > > - if (ret > 0) > - fsnotify_access(in); > - > if (!off_in) > in->f_pos = offset; > else > *off_in = offset; > > - return ret; > + goto notify; > } > > return -EINVAL; > + > +notify: > + if (ret > 0) { > + fsnotify_access(in); > + fsnotify_modify(out); > + } > + > + return ret; > } > > static long __do_splice(struct file *in, loff_t __user *off_in, > -- > 2.39.2 >