Re: [PATCH RFC 1/4] fs/locks: Fix file lock cache accounting, again

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Linus,

On Sun, Jan 21, 2024 at 9:16 PM Linus Torvalds
<torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
>
> On Wed, 17 Jan 2024 at 14:56, Shakeel Butt <shakeelb@xxxxxxxxxx> wrote:
> > >
> > > So I don't see how we can make it really cheap (say, less than 5% overhead)
> > > without caching pre-accounted objects.
> >
> > Maybe this is what we want. Now we are down to just SLUB, maybe such
> > caching of pre-accounted objects can be in SLUB layer and we can
> > decide to keep this caching per-kmem-cache opt-in or always on.
>
> So it turns out that we have another case of SLAB_ACCOUNT being quite
> a big expense, and it's actually the normal - but failed - open() or
> execve() case.
>
> See the thread at
>
>     https://lore.kernel.org/all/CAHk-=whw936qzDLBQdUz-He5WK_0fRSWwKAjtbVsMGfX70Nf_Q@xxxxxxxxxxxxxx/
>
> and to see the effect in profiles, you can use this EXTREMELY stupid
> test program:
>
>     #include <fcntl.h>
>
>     int main(int argc, char **argv)
>     {
>         for (int i = 0; i < 10000000; i++)
>                 open("nonexistent", O_RDONLY);
>     }
>
> where the point of course is that the "nonexistent" pathname doesn't
> actually exist (so don't create a file called that for the test).
>
> What happens is that open() allocates a 'struct file *' early from the
> filp kmem_cache, which has SLAB_ACCOUNT set. So we'll do accounting
> for it, failt the pathname open, and free it again, which uncharges
> the accounting.
>
> Now, in this case, I actually have a suggestion: could we please just
> make SLAB_ACCOUNT be something that we do *after* the allocation, kind
> of the same way the zeroing works?
>
> IOW, I'd love to get rid of slab_pre_alloc_hook() entirely, and make
> slab_post_alloc_hook() do all the "charge the memcg if required".
>
> Obviously that means that now a failure to charge the memcg would have
> to then de-allocate things, but that's an uncommon path and would be
> marked unlikely and not be in the hot path at all.
>
> Now, the reason I would prefer that is that the *second* step would be to
>
>  (a) expose a "kmem_cache_charge()" function that takes a
> *non*-accounted slab allocation, and turns it into an accounted one
> (and obviously this is why you want to do everything in the post-alloc
> hook: just try to share this code)
>
>  (b) remote the SLAB_ACCOUNT from the filp_cachep, making all file
> allocations start out unaccounted.
>
>  (c) when we have *actually* looked up the pathname and open the file
> successfully, at *that* point we'd do a
>
>         error = kmem_cache_charge(filp_cachep, file);
>
>     in do_dentry_open() to turn the unaccounted file pointer into an
> accounted one (and if that fails, we do the cleanup and free it, of
> course, exactly like we do when file_get_write_access() fails)
>
> which means that now the failure case doesn't unnecessarily charge the
> allocation that never ends up being finalized.
>
> NOTE! I think this would clean up mm/slub.c too, simply because it
> would get rid of that memcg_slab_pre_alloc_hook() entirely, and get
> rid of the need to carry the "struct obj_cgroup **objcgp" pointer
> along until the post-alloc hook: everything would be done post-alloc.
>
> The actual kmem_cache_free() code already deals with "this slab hasn't
> been accounted" because it obviously has to deal with allocations that
> were done without __GFP_ACCOUNT anyway. So there's no change needed on
> the freeing path, it already has to handle this all gracefully.
>
> I may be missing something, but it would seem to have very little
> downside, and fix a case that actually is visible right now.
>

Thanks for the idea. Actually I had a discussion with Vlastimil at LPC
'22 on a similar idea but for a different problem i.e. allocation in
irq context or more specifically charging sockets for incoming TCP
connections. If you see inet_csk_accept() we solved this for TCP
memory by charging at accept() syscall but the kernel memory holding
socket is still uncharged.

So, I think having the framework you suggested would help more
use-cases. I will take a stab at this in the next couple of days
(sorry stuck in some other stuff atm).

thanks,
Shakeel





[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Security]     [Bugtraq]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]     [Monitors]

  Powered by Linux