On 14/01/2021 21:50, Martin Raiber wrote:
> On 10.01.2021 17:50 Martin Raiber wrote:
>> On 09.01.2021 21:32 Pavel Begunkov wrote:
>>> On 09/01/2021 16:58, Martin Raiber wrote:
>>>> On 09.01.2021 17:23 Jens Axboe wrote:
>>>>> On 1/8/21 4:39 PM, Martin Raiber wrote:
>>>>>> Hi,
>>>>>>
>>>>>> I have a gnarly issue with io_uring and fixed buffers (fixed
>>>>>> read/write). It seems the contents of those buffers contain old data in
>>>>>> some rare cases under memory pressure after a read/during a write.
>>>>>>
>>>>>> Specifically I use io_uring with fuse and to confirm this is not some
>>>>>> user space issue let fuse print the unique id it adds to each request.
>>>>>> Fuse adds this request data to a pipe, and when the pipe buffer is later
>>>>>> copied to the io_uring fixed buffer it has the id of a fuse request
>>>>>> returned earlier using the same buffer while returning the size of the
>>>>>> new request. Or I set the unique id in the buffer, write it to fuse (via
>>>>>> writing to a pipe, then splicing) and then fuse returns with e.g.
>>>>>> ENOENT, because the unique id is not correct because in kernel it reads
>>>>>> the id of the previous, already completed, request using this buffer.
>>>>>>
>>>>>> To make reproducing this faster running memtester (which mlocks a
>>>>>> configurable amount of memory) with a large amount of user memory every
>>>>>> 30s helps. So it has something to do with swapping? It seems to not
>>>>>> occur if no swap space is active. Problem occurs without warning when
>>>>>> the kernel is build with KASAN and slab debugging.
>>>>>>
>>>>>> If I don't use the _FIXED opcodes (which is easy to do), the problem
>>>>>> does not occur.
>>>>>>
>>>>>> Problem occurs with 5.9.16 and 5.10.5.
>>>>> Can you mention more about what kind of IO you are doing, I'm assuming
>>>>> it's O_DIRECT? I'll see if I can reproduce this.
>>>> It's writing to/reading from pipes (nonblocking, no O_DIRECT).
>>> A blind guess, does it handle short reads and writes? If not, can you
>>> check whether they happen or not?
>>
>> Something like this was what I suspected at first as well. It does check for short read/writes and I added (unnecessary -- because the fuse request structure is 40 bytes and it does io in page sizes) code for retrying short reads at some point. I also checked for the pipes to be empty before they are used at some point and let the kernel log allocation failures (idea was that it was short pipe read/writes because of allocation failure or that something doesn't get rewound properly in this case). Beyond that three things that make a user space problem unlikely:
>>
>> - occurs only when using fixed buffers and does not occur when running same code without fixed buffer opcodes
>> - doesn't occur when there is no memory pressure
>> - I added print(k/f) logging that pointed me in this direction as well
>>
>>>> I can reproduce it with https://github.com/uroni/fuseuring on e.g. a 2GB VPS. Modify bench.sh so that fio loops. Add swap, then run 1400M memtester while it runs (so it swaps, I guess). I can try further reducing the reproducer, but I wanted to avoid that work in case it is something obvious. The next step would be to remove fuse from the equation -- it does try to move the pages from the pipe when splicing to it, for example.
>
> When I use 5.10.7 with 09854ba94c6aad7886996bfbee2530b3d8a7f4f4 ("mm: do_wp_page() simplification"), 1a0cf26323c80e2f1c58fc04f15686de61bfab0c ("mm/ksm: Remove reuse_ksm_page()") and be068f29034fb00530a053d18b8cf140c32b12b3 ("mm: fix misplaced unlock_page in do_wp_page()") reverted the issue doesn't seem to occur.

Thanks for tracking it down. Was it reported to Linus and Peter?

-- 
Pavel Begunkov