On Tue, Aug 18, 2020 at 11:49:29AM -0400, Mike Marshall wrote:
> upstream commit id: ec95f1dedc9c64ac5a8b0bdb7c276936c70fdedd
>
> I verified that ec95f1de "orangefs: get rid of knob code..."
> will apply to 5.4 and I compiled and ran a patched 5.4 kernel
> against my normal xfstests... I wish that ec95f1de could be
> in the 5.4 long term stable kernel.
>
> ec95f1de went upstream in 5.7. When I sent up the patch it was
> just a theoretical race condition to me: I accepted what Christoph
> said about it. We now have experienced in-the-real-world how
> important the patch is...
>
> Someone was trying to read a whole large (more than 100 meg)
> file from orangefs into some kind of cloud bucket. The
> resulting read failed with a "Bad address" error. I
> immediately thought of this patch. I reproduced the
> "Bad address" error with dd in kernel versions that
> lack ec95f1de. The "Bad address" error does not occur
> in kernels that include ec95f1de:
>
> 5.7.11-100.fc31.x86_64:
>
> $ ./wr.sh 10000000 > /pvfsmnt/wr.10000000
> $ dd if=/pvfsmnt/wr.10000000 of=/tmp/wr.10000000 count=10 bs=419430400
> $ ls -l /pvfsmnt/wr.10000000 /tmp/wr.10000000
> -rw-rw-r--. 1 hubcap hubcap 498888897 Aug 14 15:41 /pvfsmnt/wr.10000000
> -rw-rw-r--. 1 hubcap hubcap 498888897 Aug 14 16:51 /tmp/wr.10000000
> $ md5sum /pvfsmnt/wr.10000000 /tmp/wr.10000000
> 669daa04f91f561f5fb2851fb30e4ffe /pvfsmnt/wr.10000000
> 669daa04f91f561f5fb2851fb30e4ffe /tmp/wr.10000000
>
> 5.6.0hubcap:
>
> $ ./wr.sh 10000000 > /pvfsmnt/wr.10000000
> $ dd if=/pvfsmnt/wr.10000000 of=/tmp/wr.10000000 count=10 bs=419430400
> dd: error reading '/pvfsmnt/wr.10000000': Bad address
> 0+0 records in
> 0+0 records out
> 0 bytes copied, 10.3365 s, 0.0 kB/s

Sounds reasonable, I'll queue this up after this next round of releases
in the next few days, thanks!

greg k-h