Re: Buffered I/O broken on s390x with page faults disabled (gfs2)

Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> · Tue, 8 Mar 2022 16:22:35 -0800

On Tue, Mar 8, 2022 at 3:25 PM Andreas Gruenbacher <agruenba@xxxxxxxxxx> wrote:
>
> Seems to be working on s390x for this test case at least; the kind of
> trace I'm getting is:

Good.

> This shows bursts of successful fault-ins in gfs2_file_read_iter
> (read_fault). The pauses in between might be caused by the storage;
> I'm not sure.

Don't know about the pauses, but the burst size might be impacted by that

+       const size_t max_size = 4 * PAGE_SIZE;

thing that limits how many calls to fixup_user_fault() we do per
fault_in_safe_writeable().

So it might be worth checking if that value seems to make any difference.

> I'd still let the caller of fault_in_safe_writeable() decide how much
> memory to fault in; the tight cap in fault_in_safe_writeable() doesn't
> seem useful.

Well, there are real latency concerns there - fixup_user_fault() is
not necessarily all that low-cost.

And it's actually going to be worse when we have the sub-page coloring
issues happening, and will need to probe at a 128-byte granularity
(not on s390, but on arm64).

At that point, we almost certainly will need to have a "probe user
space non-destructibly for writability" instruction (possibly
extending on our current arch_futex_atomic_op_inuser()
infrastructure).

So honestly, if you do IO on areas that will get page faults on them,
to some degree it's a "you are doing something stupid, you get what
you deserve". This code should _work_, it should not have to worry
about users having bad patterns (where "read data into huge cold
mappings under enough memory pressure that it causes access bit faults
in the middle of the read" very much counts as such a bad pattern).

> Also, you want to put in an extra L here:
> > Signed-off-by: linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>

Heh. Fixed locally.

                 Linus