Re: [LSF/MM/BPF TOPIC] Single Owner Memory

On Wed, Feb 22, 2023 at 11:18 AM Matthew Wilcox <willy@xxxxxxxxxxxxx> wrote:
>
> On Tue, Feb 21, 2023 at 12:16:27PM -0500, Pasha Tatashin wrote:
> > > > > Obviously once we get to dynamically allocated memdescs, this whole
> > > > > thing goes away, so I'm not excited about making big changes to the
> > > > > kernel to support this.
> > > >
> > > > This is why the changes that I am thinking about are going to be
> > > > mostly localized in a separate driver and do not alter the core mm
> > > > much. However, even with memdescs, Single Owner Memory is not
> > > > singled out from the rest of the memory types (shared, anon, named)
> > > > today, so I do not expect that memdescs can provide savings or
> > > > optimizations for this specific use case.
> > >
> > > With memdescs, let's suppose the malloc library asks for a 256kB
> > > allocation.  You end up using 8 bytes per page for the memdesc pointer
> > > (512 bytes) plus around 96 bytes for the folio that's used by the anon
> > > memory (assuming appropriate hinting / heuristics that says "Hey, treat
> > > this as a single allocation").
> >
> > Also, the 256kB should be physically contiguous, right? Hopefully,
> > fragmentation is not going to be an issue, but we might need to look
> > into strengthening the page migration enforcement in order to reduce
> > fragmentation during allocations, and thus reduce the memory overhead.
> > Today, fragmentation can potentially reduce performance when THPs
> > are not available, but in the future with memdescs fragmentation
> > might also affect the memory overhead. We might need to look into
> > changing some of the migration policies.
>
> Yes, folios are always physically, virtually and logically contiguous,
> and aligned, just like compound pages are today.  No plans to change that.
>
> With more parts of the kernel using larger allocations, larger allocations
> are easier to come by.  Clean pagecache is the easiest type of memory
> to reclaim, and if the filesystem is using 64kB allocations instead of
> 4kB allocations, finding a contiguous 256kB only needs four consecutive
> allocations to be freed rather than 64.  And if the page cache is trying
> to allocate large contiguous amounts of memory, it's going to be kicking
> kcompactd to make those happen more often.
>
> We're never going to beat fragmentation all the time, but when we
> lose to it, it just means that we end up allocating smaller folios,
> not failing entirely.
>
> > 1. Independent memory pool.
> > While /dev/som itself always manages memory in 2M chunks, it can be
> > configured to use memory from HugeTLB (2M or 1G), devdax, or kernel
> > external memory (i.e. memory that is not part of System RAM).
>
> I'm not certain that's a good idea.  That memory isn't part of system
> ram for a reason; maybe it's worse performance (or it's being saved
> for better performance).  Handing it out to random users is going to
> give unexpected performance problems.

Even today such memory can be given to users via hot-plug: convert
pmem into devdax, and hotplug that into memory. However, that is not
ideal for several reasons: struct page overhead, mixing the memory
with the rest of the pages in the system, no easy way to enforce
latency policies, etc.
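
For reference, the hot-plug path I am referring to looks roughly like
this (device names illustrative):

  # carve a devdax namespace out of the pmem region
  ndctl create-namespace --mode=devdax
  # hand the resulting dax device back to the kernel as hot-plugged RAM
  daxctl reconfigure-device --mode=system-ram dax0.0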

Also, the kernel external memory does not have to be different from
regular RAM: it can be regular memory where the memmap kernel
parameter was used to remove most of the memory from kernel
management, for faster booting and lower overhead. /dev/som can use
such memory.
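
A minimal sketch of that setup, assuming we want to hide 16G starting
at the 4G physical address from the kernel (addresses illustrative;
note that '$' usually needs escaping in the bootloader config):

  memmap=16G$0x100000000

The region then shows up as reserved in /proc/iomem and is left alone
by the page allocator, so a driver like /dev/som could manage it on
its own.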

>
> > 2. Low overhead
> > /dev/som will allocate memory from the pool in 1G chunks, and manage
> > it in 2M chunks. This will allow low-overhead memory management via
> > bitmaps. A list/tree of 2M chunks is going to be kept per user
> > process, from which the faults on som vmas are going to be handled.
>
> I'm not sure that it's going to be lower overhead than memdescs.
> I have no idea what your data structures are going to be, so I can't do
> an estimate.  I should warn you that I have a version of memdescs in mind
> that has even more memory savings than the initial version (one pointer
> per allocation instead of one pointer per page), so there's definitely
> room for improvement.
>
> One benefit that you don't mention is that /dev/som can almost certainly
> be implemented faster than memdescs.  But memdescs are going to give you
> better savings, so it's really going to be up to you which you want to
> work on.

This is right, but I specifically did not mention this benefit
because I actually want /dev/som to be compatible with memdescs in
the long run as well. Would you like to chat about the other
potential memory savings that you have in mind for memdescs during
LSF/MM, perhaps in a brainstorming session?
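
Since you mentioned you cannot estimate the overhead without seeing
the data structures, here is a minimal sketch of the bitmap
bookkeeping I have in mind (names hypothetical, not actual code):

#include <linux/bitmap.h>
#include <linux/list.h>
#include <linux/types.h>

#define SOM_CHUNK_SHIFT		21	/* manage in 2M chunks */
#define SOM_BLOCK_SHIFT		30	/* allocate from the pool in 1G blocks */
#define SOM_CHUNKS_PER_BLOCK	(1 << (SOM_BLOCK_SHIFT - SOM_CHUNK_SHIFT))

struct som_block {
	struct list_head	list;	/* per-process list of 1G blocks */
	phys_addr_t		base;	/* physical base of this 1G block */
	/* one bit per 2M chunk: 512 bits, i.e. 64 bytes, per 1G */
	DECLARE_BITMAP(used, SOM_CHUNKS_PER_BLOCK);
};

/* Pick a free 2M chunk to back a faulting som vma address. */
static phys_addr_t som_alloc_chunk(struct som_block *b)
{
	unsigned long i;

	i = find_first_zero_bit(b->used, SOM_CHUNKS_PER_BLOCK);
	if (i == SOM_CHUNKS_PER_BLOCK)
		return 0;	/* block is full, caller tries the next one */
	set_bit(i, b->used);
	return b->base + ((phys_addr_t)i << SOM_CHUNK_SHIFT);
}

That works out to 64 bytes of bitmap per 1G of pool memory, plus the
list linkage, which is where the low overhead claim comes from.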

>
> If it's a Google special that you keep internal, then I don't mind.  I'm
> not sure whether we would still want to support the /dev/som interface
> upstream after memdescs lands.  Maybe not; there always has to be a
> userspace fallback to a som-less approach for older kernels, so
> perhaps it can just be deleted after memdescs lands.

I have several benefits of /dev/som envisioned for a virtualized
environment that I am not sure how to achieve without it. We might
need it even after memdescs are fully implemented.

Thanks,
Pasha



