Re: [LSF/MM/BPF TOPIC] SLUB allocator, mainly the sheaves caching layer

On Mon, Feb 24, 2025 at 10:12 PM Shakeel Butt <shakeel.butt@xxxxxxxxx> wrote:
>
> On Mon, Feb 24, 2025 at 07:46:52PM +0100, Mateusz Guzik wrote:
> > On Mon, Feb 24, 2025 at 10:02:09AM -0800, Shakeel Butt wrote:
> > > What about pre-memcg-charged sheaves? We had to disable memcg charging
> > > of some kernel allocations and I think sheaves can help in reenabling
> > > it.
> >
> > It has been several months since last I looked at memcg, so details are
> > fuzzy and I don't have time to refresh everything.
> >
> > However, if memory serves right the primary problem was the irq on/off
> > trip associated with them (sometimes happening twice, second time with
> > refill_obj_stock()).
> >
> > I think the real fix(tm) would recognize only some allocations need
> > interrupt safety -- as in some slabs should not be allowed to be used
> > outside of the process context. This is somewhat what sheaves is doing,
> > but can be applied without fronting the current kmem caching mechanism.
> > This may be a tough sell and even then it plays whackamole with patching
> > up all consumers.
> >
> > Suppose it is not an option.
> >
> > Then there are 2 ways that I considered.
> >
> > The easiest splits memcg accounting for irq and process level -- similar
> > to what the localtry thing is doing. This would only cost a preemption
> > off/on trip in the common case and a branch on the current state. But
> > suppose this is a no-go as well.
>
> Have you seen 559271146efc ("mm/memcg: optimize user context object
> stock access")? It got reverted for RT (or something). Maybe we can look
> at it again.
>

Huh. I have not seen it, but it does look like the same core idea.

Even if RT itself is the problem, perhaps this could be made build
time conditional on it?

> >
> > My primary idea was using hand-rolled sequence counters and local 8-byte
> > cmpxchg (*without* the lock prefix, also not to be confused with 16-byte
> > used by the current slub fast path). Should this work, it would be
> > significantly faster than irq trips.
> >
> > The irq thing is there only to facilitate several fields being updated
> > or memcg itself getting replaced in an atomic manner for process vs
> > interrupt context.
> >
> > The observation is that all values which are getting updated are 4
> > bytes. Then perhaps an additional counter can be added next to each one
> > so that an 8-byte cmpxchg is going to fail should an irq swoop in and
> > change stuff from under us.
> >
> > The percpu state would have a sequence counter associated with the
> > assigned memcg_stock_pcp. The memcg_stock_pcp object would have the same
> > value replicated inside for every var which can be updated in the fast
> > path.
> >
> > Then the fast path would only succeed if the value read off from per-cpu
> > did not change vs what's in the stock thing.
> >
> > Any change to memcg_stock_pcp (e.g., rolling up bytes after passing the
> > page size threshold) would disable interrupts and modify all these
> > counters.
> >
> > There is some more work needed to make sure the stock obj can be safely
> > swapped out for a new one and not accidentally have a value which lines
> > up with the previous one, I don't remember what I had for that (and yes,
> > I recognize a 4 byte value will invariably roll over and *in principle*
> > a conflict will be possible).
> >
> > This is a rough outline since Vlasta keeps prodding me about it.
>
> By chance do you have this code lying around somewhere? Not saying this
> is the way to go but wanted to take a look.

Sorry mate, there was a lot of handwaving produced around this and
kmem fast paths, but no code. :)

Conceptually though I think this is pretty straightforward.
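
To make the outline above a bit more concrete, here is a minimal
userspace sketch of the "4-byte value plus 4-byte sequence in one 8-byte
word" idea. Everything in it is hypothetical: the names (pcpu_sketch,
stock_sketch, consume_fast and so on) are made up for illustration and
are not memcg code. It also uses C11 atomics, which on x86 emit a locked
cmpxchg; an in-kernel version would presumably want a CPU-local
non-locked one, something along the lines of this_cpu_cmpxchg(). The
slow path here simply bumps the sequence; in the kernel it would do so
with interrupts disabled.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

static_assert(sizeof(uint64_t) == 8, "the scheme relies on an 8-byte word");

/* Hypothetical stand-in for one fast-path field of memcg_stock_pcp:
 * sequence number in the high 32 bits, the 4-byte value in the low 32. */
struct stock_sketch {
	_Atomic uint64_t word;
};

/* Hypothetical per-CPU state: the sequence of the currently assigned stock. */
struct pcpu_sketch {
	uint32_t seq;
	struct stock_sketch stock;
};

static inline uint64_t pack(uint32_t seq, uint32_t val)
{
	return ((uint64_t)seq << 32) | val;
}

static inline uint32_t stock_value(struct pcpu_sketch *pc)
{
	return (uint32_t)atomic_load_explicit(&pc->stock.word,
					      memory_order_relaxed);
}

/* Commit step: succeeds only if the snapshot still matches both the
 * per-CPU sequence and the full 8-byte word (value and sequence). */
static bool stock_commit(struct pcpu_sketch *pc, uint64_t snap, uint32_t bytes)
{
	uint32_t seq = (uint32_t)(snap >> 32);
	uint32_t val = (uint32_t)snap;

	if (seq != pc->seq || val < bytes)
		return false;	/* stale or insufficient: take the slow path */

	return atomic_compare_exchange_strong_explicit(&pc->stock.word, &snap,
			pack(seq, val - bytes),
			memory_order_relaxed, memory_order_relaxed);
}

/* Fast path: snapshot the word, then try to commit the new value. */
static bool consume_fast(struct pcpu_sketch *pc, uint32_t bytes)
{
	uint64_t snap = atomic_load_explicit(&pc->stock.word,
					     memory_order_relaxed);
	return stock_commit(pc, snap, bytes);
}

/* Slow path (irqs off in the kernel): bump the sequence everywhere. */
static void refill_slow(struct pcpu_sketch *pc, uint32_t bytes)
{
	pc->seq++;
	atomic_store_explicit(&pc->stock.word, pack(pc->seq, bytes),
			      memory_order_relaxed);
}
```

The point of replicating the sequence next to the value is that a
commit based on a stale snapshot fails even when an interrupt left the
value itself looking unchanged. The 4-byte rollover caveat mentioned
above applies to this sketch as well.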

Anyhow, I forgot to mention another angle: perhaps a kernel-equivalent
of rseq could be somehow employed here?

As in you prep the op. Should an interrupt come in, it can detect you
were going to execute it and redirect your IP to a fallback or just
restart. I have no idea how feasible this is here, food for thought.
-- 
Mateusz Guzik <mjguzik gmail.com>




