On Wed, 17 Jan 2024 at 14:56, Shakeel Butt <shakeelb@xxxxxxxxxx> wrote: > > > > So I don't see how we can make it really cheap (say, less than 5% overhead) > > without caching pre-accounted objects. > > Maybe this is what we want. Now we are down to just SLUB, maybe such > caching of pre-accounted objects can be in SLUB layer and we can > decide to keep this caching per-kmem-cache opt-in or always on. So it turns out that we have another case of SLAB_ACCOUNT being quite a big expense, and it's actually the normal - but failed - open() or execve() case. See the thread at https://lore.kernel.org/all/CAHk-=whw936qzDLBQdUz-He5WK_0fRSWwKAjtbVsMGfX70Nf_Q@xxxxxxxxxxxxxx/ and to see the effect in profiles, you can use this EXTREMELY stupid test program: #include <fcntl.h> int main(int argc, char **argv) { for (int i = 0; i < 10000000; i++) open("nonexistent", O_RDONLY); } where the point of course is that the "nonexistent" pathname doesn't actually exist (so don't create a file called that for the test). What happens is that open() allocates a 'struct file *' early from the filp kmem_cache, which has SLAB_ACCOUNT set. So we'll do accounting for it, failt the pathname open, and free it again, which uncharges the accounting. Now, in this case, I actually have a suggestion: could we please just make SLAB_ACCOUNT be something that we do *after* the allocation, kind of the same way the zeroing works? IOW, I'd love to get rid of slab_pre_alloc_hook() entirely, and make slab_post_alloc_hook() do all the "charge the memcg if required". Obviously that means that now a failure to charge the memcg would have to then de-allocate things, but that's an uncommon path and would be marked unlikely and not be in the hot path at all. Now, the reason I would prefer that is that the *second* step would be to (a) expose a "kmem_cache_charge()" function that takes a *non*-accounted slab allocation, and turns it into an accounted one (and obviously this is why you want to do everything in the post-alloc hook: just try to share this code) (b) remote the SLAB_ACCOUNT from the filp_cachep, making all file allocations start out unaccounted. (c) when we have *actually* looked up the pathname and open the file successfully, at *that* point we'd do a error = kmem_cache_charge(filp_cachep, file); in do_dentry_open() to turn the unaccounted file pointer into an accounted one (and if that fails, we do the cleanup and free it, of course, exactly like we do when file_get_write_access() fails) which means that now the failure case doesn't unnecessarily charge the allocation that never ends up being finalized. NOTE! I think this would clean up mm/slub.c too, simply because it would get rid of that memcg_slab_pre_alloc_hook() entirely, and get rid of the need to carry the "struct obj_cgroup **objcgp" pointer along until the post-alloc hook: everything would be done post-alloc. The actual kmem_cache_free() code already deals with "this slab hasn't been accounted" because it obviously has to deal with allocations that were done without __GFP_ACCOUNT anyway. So there's no change needed on the freeing path, it already has to handle this all gracefully. I may be missing something, but it would seem to have very little downside, and fix a case that actually is visible right now. Linus