Re: [PATCH 00/20] Add support for shared PTEs across processes

David Hildenbrand <david@xxxxxxxxxx> · Tue, 28 Jan 2025 10:21:23 +0100

On 27.01.25 23:33, Andrew Morton wrote:
On Fri, 24 Jan 2025 15:54:34 -0800 Anthony Yznaga <anthony.yznaga@xxxxxxxxxx> wrote:

Memory pages shared between processes require page table entries
(PTEs) for each process. Each of these PTEs consumes some of
the memory and as long as the number of mappings being maintained
is small enough, this space consumed by page tables is not
objectionable. When very few memory pages are shared between
processes, the number of PTEs to maintain is mostly constrained by
the number of pages of memory on the system. As the number of shared
pages and the number of times pages are shared go up, the amount of
memory consumed by page tables starts to become significant. This
issue does not apply to threads. Any number of threads can share the
same pages inside a process while sharing the same PTEs. Extending
this same model to sharing pages across processes can eliminate this
issue for sharing across processes as well.

...

API
===

mshare does not introduce a new API. It instead uses existing APIs
to implement page table sharing. The steps to use this feature are:

1. Mount msharefs on /sys/fs/mshare -
         mount -t msharefs msharefs /sys/fs/mshare

2. mshare regions have alignment and size requirements. The start
    address of a region must be aligned to a fixed boundary, and the
    region size must be a multiple of that same value. The value can
    be obtained by reading the file /sys/fs/mshare/mshare_info,
    which returns a number in text format.

3. For the process creating an mshare region:
         a. Create a file on /sys/fs/mshare, for example -
                 fd = open("/sys/fs/mshare/shareme",
                                 O_RDWR|O_CREAT|O_EXCL, 0600);

         b. Establish the starting address and size of the region
                 struct mshare_info minfo;

                 minfo.start = TB(2);
                 minfo.size = BUFFER_SIZE;
                 ioctl(fd, MSHAREFS_SET_SIZE, &minfo);

         c. Map some memory in the region
                 struct mshare_create mcreate;

                 mcreate.addr = TB(2);
                 mcreate.size = BUFFER_SIZE;
                 mcreate.offset = 0;
                 mcreate.prot = PROT_READ | PROT_WRITE;
                 mcreate.flags = MAP_ANONYMOUS | MAP_SHARED | MAP_FIXED;
                 mcreate.fd = -1;

                 ioctl(fd, MSHAREFS_CREATE_MAPPING, &mcreate);
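
The alignment rule in step 2 can be checked in userspace before the
MSHAREFS_SET_SIZE call. The following is only a sketch: the helper names
(parse_mshare_align, read_mshare_align, mshare_region_ok) and the TB()
definition are assumptions made for illustration and are not part of the
patch set, and it assumes mshare_info holds a single decimal number.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Assumed definition of the TB() macro used in the steps above. */
#define TB(x) ((uint64_t)(x) << 40)

/* Parse the text read from mshare_info (assumed decimal).
 * Returns 0 if no number could be parsed. */
uint64_t parse_mshare_align(const char *text)
{
    char *end;
    unsigned long long v;

    errno = 0;
    v = strtoull(text, &end, 10);
    if (errno != 0 || end == text)
        return 0;
    return (uint64_t)v;
}

/* Hypothetical helper: read the alignment value from the given path,
 * e.g. /sys/fs/mshare/mshare_info. Returns 0 on any error. */
uint64_t read_mshare_align(const char *path)
{
    char buf[64];
    FILE *f = fopen(path, "r");

    if (!f)
        return 0;
    if (!fgets(buf, sizeof(buf), f)) {
        fclose(f);
        return 0;
    }
    fclose(f);
    return parse_mshare_align(buf);
}

/* A region is acceptable if its start is aligned to `align` and its
 * size is a nonzero multiple of `align`. */
int mshare_region_ok(uint64_t start, uint64_t size, uint64_t align)
{
    assert(align != 0);
    return size != 0 && start % align == 0 && size % align == 0;
}
```

A caller would read the value once with read_mshare_align(), treat a
return of 0 as "msharefs not mounted or unreadable", and pass the result
to mshare_region_ok() together with the intended minfo.start and
minfo.size.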

I'm not really understanding why step a exists.  It's basically an
mmap() so why can't this be done within step d?

Conceptually, it defines the content of the virtual file by creating,
unmapping, or changing mappings. Some applications will require
multiple different mappings in such a virtual file.

Processes mmap the resulting virtual file.
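
As a rough illustration of that last step, the sketch below shows an
ordinary open() plus mmap(MAP_SHARED) of an existing file. It is not taken
from the patch set: map_shared_file is a hypothetical helper, and whether
an mshare file needs a specific address or additional flags is not stated
in this thread, so the address is passed through as a plain hint.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Open an existing file and map `size` bytes of it shared and
 * read/write. `hint` is passed to mmap() as the address hint and may
 * be NULL. Returns MAP_FAILED on any error. */
void *map_shared_file(const char *path, void *hint, size_t size)
{
    void *p;
    int fd = open(path, O_RDWR);

    if (fd < 0)
        return MAP_FAILED;

    p = mmap(hint, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);

    /* The mapping stays valid after the descriptor is closed. */
    close(fd);
    return p;
}
```

With a regular file this gives each caller its own page tables for the
shared pages; the point of the series is that mapping an mshare file
would share the PTEs as well.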

--
Cheers,

David / dhildenb