Re: [LSF/MM/BPF TOPIC] Single Owner Memory

On Tue, Feb 21, 2023 at 12:16:27PM -0500, Pasha Tatashin wrote:
> > > > Obviously once we get to dynamically allocated memdescs, this whole
> > > > thing goes away, so I'm not excited about making big changes to the
> > > > kernel to support this.
> > >
> > > This is why the changes that I am thinking about are going to be
> > > mostly localized in a separate driver and do not alter the core mm
> > > much. However, even with memdesc, today the Single Owner Memory is not
> > > singled out from the rest of memory types (shared, anon, named), so I
> > > do not expect the memdescs can provide saving or optimizations for
> > > this specific use case.
> >
> > With memdescs, let's suppose the malloc library asks for a 256kB
> > allocation.  You end up using 8 bytes per page for the memdesc pointer
> > (512 bytes) plus around 96 bytes for the folio that's used by the anon
> > memory (assuming appropriate hinting / heuristics that says "Hey, treat
> > this as a single allocation").
> 
> Also, the 256kB should be physically contiguous, right? Hopefully,
> fragmentation is not going to be an issue, but we might need to look
> into increasing the page migration enforcement in order to reduce
> fragmentation during allocations, and thus reduce the memory overheads.
> Today, fragmentation can potentially reduce performance when THPs
> are not available, but in the future with memdescs, fragmentation
> might also affect the memory overhead. We might need to look into
> changing some of the migration policies.

Yes, folios are always physically, virtually and logically contiguous,
and aligned, just like compound pages are today.  No plans to change that.

With more parts of the kernel using larger allocations, larger allocations
are easier to come by.  Clean pagecache is the easiest type of memory
to reclaim, and if the filesystem is using 64kB allocations instead of
4kB allocations, finding a contiguous 256kB only needs four consecutive
allocations to be freed rather than 64.  And if the page cache is trying
to allocate large contiguous amounts of memory, it's going to be kicking
kcompactd to make those happen more often.
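To put numbers on that paragraph, here is a back-of-envelope sketch (plain Python arithmetic, using only the sizes quoted above):

```python
# How many consecutive allocations must be freed to reclaim a
# contiguous 256kB region, as a function of the allocation size
# the filesystem is using for its page cache.
TARGET = 256 * 1024  # bytes of contiguous memory we want

for alloc_size in (4 * 1024, 64 * 1024):
    frees_needed = TARGET // alloc_size
    print(f"{alloc_size // 1024:3}kB allocations: "
          f"{frees_needed} consecutive frees yield 256kB")
```

With 4kB allocations, 64 consecutive frees are needed; with 64kB allocations, only 4.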

We're never going to beat fragmentation all the time, but when we
lose to it, it just means that we end up allocating smaller folios,
not failing entirely.

> 1. Independent memory pool.
> While /dev/som itself always manages memory in 2M chunks, it can be
> configured to use memory from HugeTLB (2M or 1G), devdax, or kernel
> external memory (i.e., memory that is not part of System RAM).

I'm not certain that's a good idea.  That memory isn't part of System
RAM for a reason; maybe it has worse performance (or it's being saved
for something that needs better performance).  Handing it out to random
users is going to cause unexpected performance problems.

> 2. Low overhead
> /dev/som will allocate memory from the pool in 1G chunks and manage
> it in 2M chunks. This will allow low memory overhead management via
> bitmaps. A list/tree of 2M chunks is going to be kept per user
> process, from which faults on som vmas are going to be handled.
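For scale, the bitmap bookkeeping described above is cheap to sketch.  This is purely hypothetical Python, not the actual /dev/som design; the pool class and method names are made up:

```python
# Hypothetical bookkeeping for one 1G pool carved into 2M chunks.
# 1G / 2M = 512 chunks, so one pool needs a 512-bit (64-byte) bitmap.
POOL_SIZE = 1 << 30    # 1G
CHUNK_SIZE = 2 << 20   # 2M
NCHUNKS = POOL_SIZE // CHUNK_SIZE  # 512

class SomPool:
    """Toy model of a 1G pool tracked by a single allocation bitmap."""

    def __init__(self):
        self.bitmap = 0  # bit i set => 2M chunk i is allocated

    def alloc_chunk(self):
        """Claim the first free 2M chunk, or return None if the pool is full."""
        for i in range(NCHUNKS):
            if not (self.bitmap >> i) & 1:
                self.bitmap |= 1 << i
                return i
        return None

    def free_chunk(self, i):
        """Return chunk i to the pool."""
        self.bitmap &= ~(1 << i)

pool = SomPool()
first = pool.alloc_chunk()   # claims chunk 0
second = pool.alloc_chunk()  # claims chunk 1
pool.free_chunk(first)
reused = pool.alloc_chunk()  # chunk 0 is handed out again
```

At this granularity the metadata cost is 64 bytes of bitmap per 1G of pool, which is where the "low overhead" claim comes from.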

I'm not sure that it's going to be lower overhead than memdescs.
I have no idea what your data structures are going to be, so I can't do
an estimate.  I should warn you that I have a version of memdescs in mind
that has even more memory savings than the initial version (one pointer
per allocation instead of one pointer per page), so there's definitely
room for improvement.
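Using the 256kB example from earlier in the thread, the per-page versus per-allocation bookkeeping compares roughly like this (a sketch; the 8-byte memdesc pointer and ~96-byte folio figures are the ones quoted above):

```python
# Rough memdesc overhead for a single 256kB anon allocation.
ALLOC = 256 * 1024
PAGE = 4 * 1024
PTR = 8     # bytes per memdesc pointer
FOLIO = 96  # approximate folio size, per the thread

pages = ALLOC // PAGE                   # 64 pages in the allocation
per_page_version = pages * PTR + FOLIO  # one pointer per page
per_alloc_version = PTR + FOLIO         # one pointer per allocation

print(per_page_version, per_alloc_version)
```

The per-page scheme costs 608 bytes for this allocation; the per-allocation scheme costs 104 bytes.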

One benefit that you don't mention is that /dev/som can almost certainly
be implemented faster than memdescs.  But memdescs are going to give you
better savings, so it's really going to be up to you which you want to
work on.

If it's a Google special that you keep internal, then I don't mind.  I'm
not sure whether we would still want to support the /dev/som interface
upstream after memdescs lands.  Maybe not; there always has to be a
userspace fallback to a som-less approach for older kernels, so
perhaps it can just be deleted after memdescs lands.
