On Tue, Jan 26, 2021 at 9:34 AM Dave Chinner <david@xxxxxxxxxxxxx> wrote: > > On Mon, Jan 25, 2021 at 03:54:31PM +0800, Nicolas Boichat wrote: > > Hi copy_file_range experts, > > > > We hit this interesting issue when upgrading Go compiler from 1.13 to > > 1.15 [1]. Basically we use Go's `io.Copy` to copy the content of > > `/sys/kernel/debug/tracing/trace` to a temporary file. > > > > Under the hood, Go now uses `copy_file_range` syscall to optimize the > > copy operation. However, that fails to copy any content when the input > > file is from sysfs/tracefs, with an apparent size of 0 (but there is > > still content when you `cat` it, of course). > > > > A repro case is available in comment7 (adapted from the man page), > > also copied below [2]. > > > > Output looks like this (on kernels 5.4.89 (chromeos), 5.7.17 and > > 5.10.3 (chromeos)) > > $ ./copyfrom /sys/kernel/debug/tracing/trace x > > 0 bytes copied > > That's basically telling you that copy_file_range() was unable to > copy anything. The man page says: > > RETURN VALUE > Upon successful completion, copy_file_range() will return > the number of bytes copied between files. This could be less > than the length originally requested. If the file offset > of fd_in is at or past the end of file, no bytes are copied, > and copy_file_range() returns zero. > > THe man page explains it perfectly. I'm not that confident the explanation is perfect ,-) How does one define "EOF"? The read manpage (https://man7.org/linux/man-pages/man2/read.2.html) defines it as a zero return value. I don't think using the inode file size is standard. Seems like the kernel is not even trying to read from the source file here. In any case, I can fix this issue by dropping the count check here: https://elixir.bootlin.com/linux/latest/source/fs/read_write.c#L1445 . I'll send a patch so that we can discuss based on that. > Look at the trace file you are > trying to copy: > > $ ls -l /sys/kernel/debug/tracing/trace > -rw-r--r-- 1 root root 0 Jan 19 12:17 /sys/kernel/debug/tracing/trace > $ cat /sys/kernel/debug/tracing/trace > tracer: nop > # > # entries-in-buffer/entries-written: 0/0 #P:8 > # > # _-----=> irqs-off > # / _----=> need-resched > # | / _---=> hardirq/softirq > # || / _--=> preempt-depth > # ||| / delay > # TASK-PID CPU# |||| TIMESTAMP FUNCTION > # | | | |||| | | > > Yup, the sysfs file reports it's size as zero length, so the CFR > syscall is saying "there's nothing to copy from this empty file" and > so correctly is returning zero without even trying to copy anything > because the file offset is at EOF... > > IOWs, there's no copy_file_range() bug here - it's behaving as > documented. > > 'cat' "works" in this situation because it doesn't check the file > size and just attempts to read unconditionally from the file. Hence > it happily returns non-existent stale data from busted filesystem > implementations that allow data to be read from beyond EOF... `cp` also works, so does `dd` and basically any other file operation. (and I wouldn't call procfs, sysfs, debugfs and friends "busted", they are just... special) > > Cheers, > > Dave. > -- > Dave Chinner > david@xxxxxxxxxxxxx