On Sun, Oct 8, 2017 at 11:24 PM, Michal Hocko <mhocko@xxxxxxxxxx> wrote:
> On Fri 06-10-17 12:33:03, Shakeel Butt wrote:
>> >> names_cachep = kmem_cache_create("names_cache", PATH_MAX, 0,
>> >> - SLAB_HWCACHE_ALIGN|SLAB_PANIC, NULL);
>> >> + SLAB_HWCACHE_ALIGN|SLAB_PANIC|SLAB_ACCOUNT, NULL);
>> >
>> > I might be wrong but isn't name cache only holding temporary objects
>> > used for path resolution which are not stored anywhere?
>> >
>>
>> Even though they're temporary, many containers can together use a
>> significant amount of transient uncharged memory. We've seen machines
>> with 100s of MiBs in names_cache.
>
> Yes that might be possible but are we prepared for random ENOMEM from
> vfs calls which need to allocate a temporary name?
>

I looked at all the syscalls that allocate from 'names_cache' and tried
to narrow down the ones whose man pages do not mention that they can
return ENOMEM. I found a couple of syscalls, such as truncate(),
readdir() and getdents(), whose man pages do not mention ENOMEM but
which this patch would make able to return ENOMEM.

>>
>> >> filp_cachep = kmem_cache_create("filp", sizeof(struct file), 0,
>> >> - SLAB_HWCACHE_ALIGN | SLAB_PANIC, NULL);
>> >> + SLAB_HWCACHE_ALIGN | SLAB_PANIC | SLAB_ACCOUNT, NULL);
>> >> percpu_counter_init(&nr_files, 0, GFP_KERNEL);
>> >> }
>> >
>> > Don't we have a limit for the maximum number of open files?
>> >
>>
>> Yes, there is a system limit of maximum number of open files. However
>> this limit is shared between different users on the system and one
>> user can hog this resource. To cater that, we set the maximum limit
>> very high and let the memory limit of each user limit the number of
>> files they can open.
>
> Similarly here. Are all syscalls allocating a fd prepared to return
> ENOMEM?

For filp, I found _sysctl(); however, its man page says not to use it.
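To make "prepared for ENOMEM" concrete from the userspace side, here is a minimal sketch, assuming a caller that wants to cope with a transient ENOMEM from a path-based syscall such as truncate(). The names run_retry_enomem(), do_truncate(), struct truncate_args and flaky_op() are hypothetical and introduced only for illustration. The bounded-retry policy is also an assumption: it only helps if whatever memory pressure caused the ENOMEM has gone away by the next attempt.

```c
#define _POSIX_C_SOURCE 200809L /* for truncate() */
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical callback: one attempt at a filesystem operation.
 * Returns 0 on success or -1 with errno set, like a libc syscall wrapper. */
typedef int (*fs_op_fn)(void *arg);

/* Hypothetical argument bundle so truncate() fits the callback shape. */
struct truncate_args {
    const char *path;
    off_t length;
};

static int do_truncate(void *arg)
{
    const struct truncate_args *a = arg;
    return truncate(a->path, a->length);
}

/* Run `op`, making up to `max_tries` attempts in total while it fails with
 * ENOMEM. Returns 0 on success or the negated errno of the last failure;
 * any error other than ENOMEM is returned immediately without retrying. */
static int run_retry_enomem(fs_op_fn op, void *arg, int max_tries)
{
    int err = 0;

    assert(op != NULL);
    if (max_tries < 1)
        max_tries = 1;
    for (int i = 0; i < max_tries; i++) {
        if (op(arg) == 0)
            return 0;
        err = errno;
        if (err != ENOMEM)
            return -err;
    }
    return -err; /* still ENOMEM after every attempt */
}

/* Hypothetical test double: fails with ENOMEM while *remaining > 0,
 * decrementing it each time, then succeeds. */
static int flaky_op(void *arg)
{
    int *remaining = arg;

    if (*remaining > 0) {
        (*remaining)--;
        errno = ENOMEM;
        return -1;
    }
    return 0;
}
```

For example, with flaky_op() set to fail twice, run_retry_enomem(flaky_op, &n, 3) succeeds on the third attempt, while do_truncate() on a path under a missing directory fails with ENOENT, which is passed straight back without retrying.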
On Tue, Oct 10, 2017 at 2:10 AM, Michal Hocko <mhocko@xxxxxxxxxx> wrote:
> On Mon 09-10-17 20:17:54, Michal Hocko wrote:
>> the primary concern for this patch was whether we really need/want to
>> charge short therm objects which do not outlive a single syscall.
>
> Let me expand on this some more. What is the benefit of kmem accounting
> of such an object? It cannot stop any runaway as a syscall lifetime
> allocations are bound to number of processes which we kind of contain by
> other means.

We could contain it by limiting the number of processes or threads, but
for us, applications with thousands of threads are very common, so
limiting the number of threads/processes will not work.

> If we do account then we put a memory pressure due to
> something that cannot be reclaimed by no means. Even the memcg OOM
> killer would simply kick a single path while there might be others
> to consume the same type of memory.
>
> So what is the actual point in accounting these? Does it help to contain
> any workload better? What kind of workload?
>

I think the benefits will be isolation and more accurate billing. As I
have said before, we have observed 100s of MiBs in names_cache on many
machines, and the cumulative amount is not something we can dismiss as
mere memory overhead.

> Or am I completely wrong and name objects can outlive a syscall
> considerably?
>

No, I didn't find any instance of name objects outliving the syscall.

Anyway, we can discuss names_cache further; do you have any objection
to charging filp?
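A back-of-the-envelope way to see why short-lived names_cache objects can still add up with thousands of threads: each in-flight name is a PATH_MAX-sized buffer (4096 bytes on Linux), so N threads inside path lookup at the same moment hold roughly N × 4096 bytes, even though every buffer is freed when its syscall returns. The sketch below is a toy userspace model under that assumption, with one name per thread (some syscalls resolve two paths). struct toy_memcg, toy_charge(), toy_uncharge() and toy_concurrent_lookups() are hypothetical and are not the kernel's memcg API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a memory cgroup's kmem counter; not the kernel's
 * struct mem_cgroup. It only tracks usage against a hard limit. */
struct toy_memcg {
    size_t usage; /* bytes currently charged */
    size_t limit; /* hard limit in bytes */
    size_t peak;  /* high-water mark of usage */
};

/* Try to charge `bytes`; returns false (the ENOMEM case) if that would
 * exceed the limit. Relies on usage <= limit, which this function keeps. */
static bool toy_charge(struct toy_memcg *cg, size_t bytes)
{
    if (bytes > cg->limit - cg->usage)
        return false;
    cg->usage += bytes;
    if (cg->usage > cg->peak)
        cg->peak = cg->usage;
    return true;
}

/* Release an earlier charge, as when the temporary name is freed. */
static void toy_uncharge(struct toy_memcg *cg, size_t bytes)
{
    assert(bytes <= cg->usage);
    cg->usage -= bytes;
}

/* Model `nthreads` threads that are all inside a path-resolving syscall at
 * the same moment, each holding one `name_size`-byte buffer. Returns how
 * many buffers were charged; all charges are then dropped, mirroring the
 * buffers being freed when the syscalls return. */
static size_t toy_concurrent_lookups(struct toy_memcg *cg, size_t nthreads,
                                     size_t name_size)
{
    size_t charged = 0;

    for (size_t i = 0; i < nthreads; i++)
        if (toy_charge(cg, name_size))
            charged++;
    for (size_t i = 0; i < charged; i++)
        toy_uncharge(cg, name_size);
    return charged;
}
```

For example, with a 100 MiB limit, 10,000 concurrent lookups of 4096 bytes fit (about 39 MiB at peak), while with 30,000 only 25,600 buffers fit under the limit and the remaining 4,400 fail, which in this toy model corresponds to the ENOMEM case. In both runs usage returns to 0 afterwards, since no charge outlives its "syscall".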