On 15.05.23 13:31, Lorenzo Stoakes wrote:
> On Sun, May 14, 2023 at 10:14:46PM -0700, Christoph Hellwig wrote:
>> On Sun, May 14, 2023 at 08:20:04PM +0100, Lorenzo Stoakes wrote:
>>> As discussed at LSF/MM, on the flight over I wrote a little repro [0] which
>>> reliably triggers the ext4 warning by recreating the scenario described
>>> above, using a small userland program and kernel module.
>>>
>>> This code is not perfect (plane code :) but does seem to do the job
>>> adequately. Obviously, it should only be run in a VM environment
>>> where data loss is acceptable (in my case a small qemu instance).
>> It would be really awesome if you could wire it up and submit it
>> to xfstests.
> Sure, I am happy to take a look at that! Also happy if David finds it useful in any
> way for his unit tests.

I played with a simple selftest that would reuse the existing gup_test
infrastructure (adding PIN_LONGTERM_TEST_WRITE) and try to reproduce an
actual data corruption.

So far, I was not able to reproduce any corruption easily without your
patches, because d824ec2a1546 ("mm: do not reclaim private data from
pinned page") seems to mitigate most of it.

So ... without my patches (adding PIN_LONGTERM_TEST_WRITE) I cannot test
it from a selftest; with d824ec2a1546 ("mm: do not reclaim private data
from pinned page") I cannot reproduce it; and with your patches, long-term
pinning just fails.

Long story short: I'll most probably not add such a test, but will instead
keep testing that long-term pinning now works/fails as expected, depending
on the FS type.

> The kernel module interface is a bit sketchy (it takes a user address which it
> blindly pins for you), so it's not something that should be run in any unsafe
> environment, but as long as we are OK with that :)

I can submit the PIN_LONGTERM_TEST_WRITE extension, which would allow
testing with a stock kernel that has the module compiled in. It won't allow
!longterm, though (it would be kind of hacky to have !longterm
controlled by user space, even if it's a GUP test module).

Finding an actual reproducer using existing pinning functionality would
be preferable. For example, using O_DIRECT (which should be possible even
before it starts using FOLL_PIN instead of FOLL_GET). That would then be
highly racy, but most probably not impossible.
Such (racy) tests are not a good fit for selftests.

Maybe I'll try to reproduce it with O_DIRECT later.

--
Thanks,
David / dhildenb