Re: [RFC PATCH 0/6] Add support for shared PTEs across processes

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, Jan 26, 2022 at 04:04:48AM +0000, Matthew Wilcox wrote:
> On Tue, Jan 25, 2022 at 06:59:50PM +0000, Matthew Wilcox wrote:
> > On Tue, Jan 25, 2022 at 09:57:05PM +0300, Kirill A. Shutemov wrote:
> > > On Tue, Jan 25, 2022 at 02:09:47PM +0000, Matthew Wilcox wrote:
> > > > > I think zero-API approach (plus madvise() hints to tweak it) is worth
> > > > > considering.
> > > > 
> > > > I think the zero-API approach actually misses out on a lot of
> > > > possibilities that the mshare() approach offers.  For example, mshare()
> > > > allows you to mmap() many small files in the shared region -- you
> > > > can't do that with zeroAPI.
> > > 
> > > Do you consider a use-case for many small files to be common? I would
> > > think that the main consumer of the feature to be mmap of huge files.
> > > And in this case zero enabling burden on userspace side sounds like a
> > > sweet deal.
> > 
> > mmap() of huge files is certainly the Oracle use-case.  With occasional
> > funny business like mprotect() of a single page in the middle of a 1GB
> > hugepage.
> 
> Bill and I were talking about this earlier and realised that this is
> the key point.  There's a requirement that when one process mprotects
> a page that it gets protected in all processes.  You can't do that
> without *some* API because that's different behaviour than any existing
> API would produce.

"hurr, durr, we are Oracle" :P

Sounds like a very niche requirement. I doubt there will more than single
digit user count for the feature. Maybe only the DB.

> So how about something like this ...
> 
> int mcreate(const char *name, int flags, mode_t mode);
> 
> creates a new mm_struct with a refcount of 2.  returns an fd (one
> of the two refcounts) and creates a name for it (inside msharefs,
> holds the other refcount).
> 
> You can then mmap() that fd to attach it to a chunk of your address
> space.  Once attached, you can start to populate it by calling
> mmap() and specifying an address inside the attached mm as the first
> argument to mmap().

That is not what mmap() would normally do to an existing mapping. So it
requires special treatment.

In general mmap() of a mm_struct scares me. I can't wrap my head around
implications.

Like how does it work on fork()?

How accounting works? What happens on OOM?

What prevents creating loops, like mapping a mm_struct inside itself?

What mremap()/munmap() do to such mapping? Will it affect mapping of
mm_struct or will it target mapping inside the mm_sturct?

Maybe it just didn't clicked for me, I donno.

> Maybe mcreate() is just a library call, and it's really a thin wrapper
> around open() that happens to know where msharefs is mounted.

-- 
 Kirill A. Shutemov




[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux