On Mon, Feb 21, 2022 at 2:56 AM Noah Misch <noah@xxxxxxxxxxxx> wrote:
>
> Hello,
>
> I originally reported this to Debian
> (https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1006157), which
> advised me to re-report it upstream. The context is an ext4
> filesystem on a sparc64 host. I've observed this with each of the
> three sparc64 Debian kernels that I've tested. Those kernels were
> 5.16.0-1-sparc64-smp, 5.15.0-2-sparc64-smp, and 4.9.0-13-sparc64-smp.

Tested on sparc64 5.17.0-rc5, still the same behaviour.

PS: added linux-ext4@ as well; for the test program, see the original email
https://lore.kernel.org/sparclinux/20220220213131.GA3754799@xxxxxxxxxxxxxxxx/

>
> * What exactly did you do (or not do) that was effective (or
> ineffective)?
>
> See the included file for a minimal test program. It creates two
> processes, each of which loops indefinitely. One opens a file, writes
> 0x1 to a 256-byte region, and closes the file. The other process
> opens the same file, reads the same region, and prints a message if
> any byte is not 0x1.
>
> This thread has more discussion and a more-configurable test program:
> https://postgr.es/m/flat/20220116071210.GA735692@xxxxxxxxxxxxxxxx
>
> * What was the outcome of this action?
>
> The program prints messages, at least ten per second. The mismatch
> always appears at an offset divisible by eight. Some offsets are more
> common than others.
> Here's output from 300s of runtime, filtered through
> "sort -nk3 | uniq -c":
>
>    1729 mismatch at 8: got 0, want 1
>    1878 mismatch at 16: got 0, want 1
>    1030 mismatch at 24: got 0, want 1
>      41 mismatch at 40: got 0, want 1
>     373 mismatch at 48: got 0, want 1
>      24 mismatch at 56: got 0, want 1
>     349 mismatch at 64: got 0, want 1
>   13525 mismatch at 72: got 0, want 1
>     401 mismatch at 80: got 0, want 1
>     365 mismatch at 88: got 0, want 1
>       1 mismatch at 96: got 0, want 1
>      32 mismatch at 104: got 0, want 1
>      34 mismatch at 112: got 0, want 1
>      19 mismatch at 120: got 0, want 1
>      34 mismatch at 128: got 0, want 1
>     253 mismatch at 136: got 0, want 1
>     149 mismatch at 144: got 0, want 1
>     138 mismatch at 152: got 0, want 1
>       1 mismatch at 160: got 0, want 1
>       4 mismatch at 168: got 0, want 1
>       7 mismatch at 176: got 0, want 1
>       4 mismatch at 184: got 0, want 1
>       1 mismatch at 192: got 0, want 1
>      83 mismatch at 200: got 0, want 1
>      58 mismatch at 208: got 0, want 1
>    3301 mismatch at 216: got 0, want 1
>       2 mismatch at 232: got 0, want 1
>       1 mismatch at 248: got 0, want 1
>
> If I run the program atop an xfs filesystem (still with sparc64), it
> prints nothing. If I run it with x86_64 or powerpc64 (atop ext4), it
> prints nothing.
>
> * What outcome did you expect instead?
>
> I expected the program to print nothing, indicating that the reader
> process observes only 0x1 bytes. That is how x86_64+ext4 behaves.
>
> POSIX is stricter, requiring read() and write() implementations such
> that "each call shall either see all of the specified effects of the
> other call, or none of them"
> (https://pubs.opengroup.org/onlinepubs/9699919799/functions/V2_chap02.html#tag_15_09_07).
> ext4 does not conform, which may be pragmatic. However, with x86_64
> and powerpc64, readers see each byte as either its before-write value
> or its after-write value. They don't see a zero in an offset that
> will have been nonzero both before and after the ongoing write().
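
The weaker, per-byte property described in the last quoted paragraph can be
written down as a small check. This is only a sketch and is not taken from the
original test program: the name first_unexpected_offset and its parameters are
hypothetical, and it assumes the caller knows the byte value before and after
the write (both 0x1 in the reported scenario).

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper, not part of sparc64-ext4-zeros.c.
 *
 * Scan a buffer returned by read() and return the first offset whose byte
 * is neither the before-write value nor the after-write value.  Return -1
 * if every byte is one of the two acceptable values.
 */
long
first_unexpected_offset(const unsigned char *buf, size_t len,
                        unsigned char before, unsigned char after)
{
    assert(buf != NULL);

    for (size_t i = 0; i < len; i++)
    {
        if (buf[i] != before && buf[i] != after)
            return (long) i;
    }
    return -1;
}
```

With before = after = 1, a return value of 72 would correspond to a
"mismatch at 72: got 0, want 1" line in the quoted output. Note that this
checks only the per-byte property, not the stricter all-or-nothing rule
quoted from POSIX.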
>
>
> === sparc64-ext4-zeros.c
> /*
>  * Stress-test read() and write() to detect a problem seen with sparc64+ext4.
>  * Readers see zeros when they read concurrently with a write, even if the
>  * file had no zero at that offset before or after the write. This program
>  * runs indefinitely and will print "mismatch ..." each time that happens.
>  */