On Wed, Mar 9, 2022 at 10:42 AM Andreas Gruenbacher <agruenba@xxxxxxxxxx> wrote:
>
> From what I took from the previous discussion, probing at a sub-page
> granularity won't be necessary for bytewise copying: when the address
> we're trying to access is poisoned, fault_in_*() will fail; when we get
> a short result, that will take us to the poisoned address in the next
> iteration.

Sadly, that isn't actually the case.

It's not the case for GUP (that page aligns things), and it's not the
case for fault_in_writeable() itself (that also page aligns things).

But more importantly, it's not actually the case for the *users*
either. Not all of the users are byte-stream oriented, and I think it
was btrfs that had a case of "copy a struct at the beginning of the
stream". And if that copy failed, it wouldn't advance by as many bytes
as it got - it would require that struct to be all fetched, and start
from the beginning.

So we do need to probe at least a minimum set of bytes. Probably a
fairly small minimum, but still...

> With a large enough buffer, a simple malloc() will return unmapped
> pages, and reading into such a buffer will result in fault-in. So page
> faults during read() are actually pretty normal, and it's not the user's
> fault.

Agreed. But that wasn't the case here:

> In my test case, the buffer was pre-initialized with memset() to avoid
> those kinds of page faults, which meant that the page faults in
> gfs2_file_read_iter() only started to happen when we were out of memory.
> But that's not the common case.

Exactly. I do not think this is a case that we should - or need to -
optimize for.

And doing too much pre-faulting is actually counter-productive.

> * Get rid of max_size: it really makes no sense to second-guess what the
> caller needs.

It's not about "what caller needs". It's literally about latency
issues. If you can force a busy loop in kernel space by having one
unmapped page and then do a 2GB read(), that's a *PROBLEM*.

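To put rough numbers on the latency concern, here is a minimal userspace sketch of what a cap like max_size buys. It is not kernel code: every `sim_*` name is hypothetical, the 4096-byte page size is an assumption, and the 1 MiB cap is an arbitrary illustrative value rather than whatever the patch under discussion uses.

```c
#include <assert.h>
#include <stddef.h>

#define SIM_PAGE_SIZE  4096u            /* assumed page size */
#define SIM_MAX_WINDOW (1024u * 1024u)  /* illustrative cap, not the real value */

/* Number of pages a single fault-in call over len bytes has to touch. */
static size_t sim_pages_touched(size_t len)
{
    return (len + SIM_PAGE_SIZE - 1) / SIM_PAGE_SIZE;
}

/* Clamp the span handed to one fault-in call; max_size == 0 means "no cap". */
static size_t sim_fault_in_window(size_t remaining, size_t max_size)
{
    if (max_size != 0 && remaining > max_size)
        return max_size;
    return remaining;
}

/* Walk a request window by window; returns how many fault-in calls it takes. */
static size_t sim_fault_in_calls(size_t total, size_t max_size)
{
    size_t calls = 0;

    while (total > 0) {
        size_t window = sim_fault_in_window(total, max_size);

        assert(window > 0);  /* each call must make progress */
        total -= window;
        calls++;             /* a real loop gets a chance to back off here */
    }
    return calls;
}
```

Under these assumptions, an uncapped 2 GiB request is a single call that has to walk 524288 pages, while the capped version splits into 2048 calls of 256 pages each, so the work done between two points where the caller regains control stays bounded.
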
Now, we can try this thing, because I think we end up having other
size limitations in the IO subsystem that means that the filesystem
won't actually do that, but the moment I hear somebody talk about
latencies, that max_size goes back.

                Linus
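
The "copy a struct at the beginning of the stream" problem described above can be illustrated with a small userspace simulation. This is a sketch under stated assumptions, not the kernel's or btrfs's actual code: all `sim_*` names are hypothetical, "poison" is modelled as a byte range that always faults, and the retry budget exists only so the simulated livelock terminates.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SIM_PAGE_SIZE 4096u  /* assumed page size */

/* Simulated user buffer: bytes in [poison_start, poison_end) always fault. */
struct sim_user_buf {
    const unsigned char *data;
    size_t len;
    size_t poison_start;
    size_t poison_end;
};

/* Hypothetical all-or-nothing header read from the start of the stream. */
struct sim_header {
    unsigned char bytes[64];
};

static bool sim_byte_ok(const struct sim_user_buf *u, size_t off)
{
    return off < u->len && !(off >= u->poison_start && off < u->poison_end);
}

/* Bytewise copy; returns the number of bytes NOT copied. */
static size_t sim_copy_from_user(void *dst, const struct sim_user_buf *u,
                                 size_t off, size_t n)
{
    size_t i;

    for (i = 0; i < n && sim_byte_ok(u, off + i); i++)
        ((unsigned char *)dst)[i] = u->data[off + i];
    return n - i;
}

/*
 * Probe the first min_probe bytes one by one, then one byte per following
 * page. min_probe == 1 models a purely page-granular fault-in.
 * Returns the number of bytes NOT faulted in.
 */
static size_t sim_fault_in(const struct sim_user_buf *u, size_t off,
                           size_t n, size_t min_probe)
{
    size_t end = off + n;
    size_t i, p;

    for (i = 0; i < min_probe && i < n; i++)
        if (!sim_byte_ok(u, off + i))
            return n - i;

    for (p = (off / SIM_PAGE_SIZE + 1) * SIM_PAGE_SIZE; p < end; p += SIM_PAGE_SIZE)
        if (!sim_byte_ok(u, p))
            return end - p;
    return 0;
}

/*
 * Returns 0 on success, -1 (standing in for -EFAULT) when fault-in reports
 * the problem, -2 when the retry budget runs out (the simulated livelock).
 */
static int sim_read_header(const struct sim_user_buf *u, struct sim_header *h,
                           size_t min_probe, int max_retries)
{
    int tries;

    assert(u != NULL && h != NULL);
    for (tries = 0; tries < max_retries; tries++) {
        if (sim_copy_from_user(h, u, 0, sizeof(*h)) == 0)
            return 0;
        /* Short copy: the struct is all-or-nothing, so restart at offset 0. */
        if (sim_fault_in(u, 0, sizeof(*h), min_probe) != 0)
            return -1;
    }
    return -2;
}
```

With bytes 16..31 of the buffer poisoned, calling `sim_read_header()` with `min_probe` set to 1 never sees the fault-in fail, because only the first byte of the page is probed, so it keeps retrying the same short copy until the budget is exhausted. Setting `min_probe` to `sizeof(struct sim_header)` makes the probe hit the poisoned byte and the call fails cleanly instead, which is the "minimum set of bytes" argument in miniature.
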