Granted, perhaps you'd only _ever_ be reading sequentially within a
specific VMA's boundaries, rather than going from one to another
(excluding PROT_NONE guards obviously) and that's very possible, if
that's what you mean. But otherwise, surely this is a thing? And might we
therefore be imposing unnecessary cache misses?

Which is why I suggest...

> So yes, sequential read of a memory mapping of a file fragmented into many
> VMAs will be somewhat slower. My impression is such use is rare (sequential
> readers tend to use read(2) rather than mmap) but I could be wrong.
>
> > What about shared libraries with r/o parts and exec parts?
> >
> > I think we'd really need to do some pretty careful checking to ensure this
> > wouldn't break some real world use cases esp. if we really do mostly
> > readahead data from page cache.
>
> So I'm not sure if you are not conflating two things here because the above
> sentence doesn't make sense to me :). Readahead is the mechanism that
> brings data from underlying filesystem into the page cache. Fault-around is
> the mechanism that maps into page tables pages present in the page cache
> although they were not possibly requested by the page fault. By "do mostly
> readahead data from page cache" are you speaking about fault-around? That
> currently does not cross VMA boundaries anyway as far as I'm reading
> do_fault_around()...

...that we test this and see how it behaves :) Which is literally all I
am saying in the above. Ideally with representative workloads.

I mean, I think this shouldn't be a controversial point right? Perhaps
again I didn't communicate this well. But this is all I mean here.

BTW, I understand the difference between readahead and fault-around, you
can run git blame on do_fault_around() if you have doubts about that ;)
And yes fault-around is constrained to the VMA (and actually avoids
crossing PTE boundaries).

> > > Regarding controlling readahead for various portions of the file - I'm
> > > skeptical. In my opinion it would require too much bookeeping on the kernel
> > > side for such a niche usecache (but maybe your numbers will show it isn't
> > > such a niche as I think :)). I can imagine you could just completely
> > > turn off kernel readahead for the file and do your special readahead from
> > > userspace - I think you could use either userfaultfd for triggering it or
> > > new fanotify FAN_PREACCESS events.
> >
> > I'm opposed to anything that'll proliferate VMAs (and from what Kalesh
> > says, he is too!) I don't really see how we could avoid having to do that
> > for this kind of case, but I may be missing something...
>
> I don't see why we would need to be increasing number of VMAs here at all.
> With FAN_PREACCESS you get notification with file & offset when it's
> accessed, you can issue readahead(2) calls based on that however you like.
> Similarly you can ask for userfaults for the whole mapped range and handle
> those. Now thinking more about this, this approach has the downside that
> you cannot implement async readahead with it (once PTE is mapped to some
> page it won't trigger notifications either with FAN_PREACCESS or with
> UFFD). But with UFFD you could at least trigger readahead on minor faults.

Yeah we're talking past each other on this, sorry I missed your point
about fanotify there! uffd is probably not reasonably workable given
overhead I would have thought.
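(For what it's worth, the readahead(2) half of what you describe is simple
enough from userspace - the interesting part is the notification plumbing.
A minimal sketch, with that plumbing waved away entirely: on_access_hint()
is a made-up name and the 2 MiB window is arbitrary, so treat this as
illustrative only:

/*
 * Sketch only: kernel readahead is switched off for the file with
 * POSIX_FADV_RANDOM and we drive our own window with readahead(2)
 * whenever we learn an offset is about to be touched.  How we learn
 * that (FAN_PREACCESS, uffd minor faults, application hints) is
 * deliberately left out.
 */
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#define RA_WINDOW	(2UL << 20)	/* arbitrary 2 MiB prefetch window */

/* Hypothetical hook: called with an offset we expect to be read soon. */
static void on_access_hint(int fd, off_t offset)
{
	/* Align down to the window boundary and prefetch one window. */
	off_t start = offset & ~((off_t)RA_WINDOW - 1);

	if (readahead(fd, start, RA_WINDOW) < 0)
		perror("readahead");
}

int main(int argc, char **argv)
{
	int fd, err;

	if (argc < 2)
		return 1;

	fd = open(argv[1], O_RDONLY);
	if (fd < 0) {
		perror("open");
		return 1;
	}

	/* Ask the kernel not to do its own readahead for this file. */
	err = posix_fadvise(fd, 0, 0, POSIX_FADV_RANDOM);
	if (err)
		fprintf(stderr, "posix_fadvise: %s\n", strerror(err));

	on_access_hint(fd, 0);	/* e.g. prefetch the first window */

	close(fd);
	return 0;
}

Whether driving it like that actually beats the kernel's own readahead is
exactly the kind of thing the representative-workload testing I mention
above ought to tell us.)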
I am really unaware of how fanotify works so I mean cool if you can find
a solution this way, awesome :)

I'm just saying, if we need to somehow retain state about regions which
should have adjusted readahead behaviour at a VMA level, I can't see how
this could be done without VMA fragmentation and I'd rather we didn't. If
we can avoid that great!

>                                                                 Honza
> --
> Jan Kara <jack@xxxxxxxx>
> SUSE Labs, CR
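P.S. to make the fragmentation point concrete for anyone skimming: we
already pay for per-region readahead hints with VMAs today, because
madvise() keeps MADV_RANDOM/MADV_SEQUENTIAL in vm_flags, so hinting part
of a mapping has to split it. A quick illustrative demo (nothing to do
with any actual patch), just watch the /proc/self/maps entry count grow:

/*
 * Demo of the fragmentation concern: per-region readahead hints set with
 * madvise() are stored in the VMA itself (VM_RAND_READ/VM_SEQ_READ), so
 * hinting only part of a mapping forces a VMA split.
 */
#define _DEFAULT_SOURCE
#include <stdio.h>
#include <sys/mman.h>

/* Count lines in /proc/self/maps, i.e. the number of VMAs we have. */
static int count_vmas(void)
{
	FILE *f = fopen("/proc/self/maps", "r");
	int c, lines = 0;

	if (!f)
		return -1;
	while ((c = fgetc(f)) != EOF)
		if (c == '\n')
			lines++;
	fclose(f);
	return lines;
}

int main(void)
{
	size_t len = 16UL << 20;	/* one 16 MiB anonymous mapping */
	char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (p == MAP_FAILED)
		return 1;

	printf("VMAs before hint: %d\n", count_vmas());

	/* Hint only the middle 4 MiB: the kernel must split the VMA in three. */
	madvise(p + (4UL << 20), 4UL << 20, MADV_RANDOM);

	printf("VMAs after hint:  %d\n", count_vmas());
	return 0;
}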