Re: [PATCH v10 29/35] memcg: per-memcg kmem shrinking

On 06/06/2013 01:49 PM, Andrew Morton wrote:
> On Thu, 6 Jun 2013 12:35:33 +0400 Glauber Costa <glommer@xxxxxxxxxxxxx> wrote:
> 
>> On 06/06/2013 03:08 AM, Andrew Morton wrote:
>>>> +
>>>>> +		/*
>>>>> +		 * We will try to shrink kernel memory present in caches. We
>>>>> +		 * are sure that we can wait, so we will. The duration of our
>>>>> +		 * wait is determined by congestion, the same way as vmscan.c
>>>>> +		 *
>>>>> +		 * If we are in FS context, though, then although we can wait,
>>>>> +		 * we cannot call the shrinkers. Most fs shrinkers (which
>>>>> +		 * comprises most of our kmem data) will not run without
>>>>> +		 * __GFP_FS since they can deadlock. The solution is to
>>>>> +		 * synchronously run that in a different context.
>>> But this is pointless.  Calling a function via a different thread and
>>> then waiting for it to complete is equivalent to calling it directly.
>>>
>> Not in this case. We are in wait-capable context (we check for this
>> right before we reach this), but we are not in fs capable context.
>>
>> So the reason we do this (which I tried to cover in the changelog) is
>> to escape from the GFP_FS limitation that our call chain has, not the
>> wait limitation.
> 
> But that's equivalent to calling the code directly.  Look:
> 
> some_fs_function()
> {
> 	lock(some-fs-lock);
> 	...
> }
> 
> some_other_fs_function()
> {
> 	lock(some-fs-lock);
> 	alloc_pages(GFP_NOFS);
> 	->...
> 	  ->schedule_work(some_fs_function);
> 	    flush_scheduled_work();
> }
> 
> that flush_scheduled_work() won't complete until some_fs_function() has
> completed.  But some_fs_function() won't complete, because we're
> holding some-fs-lock.
> 
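Andrew's point can be reproduced outside the kernel. The sketch below is a userspace analog, not kernel code, and every name in it is hypothetical: a pthread mutex stands in for some-fs-lock, and pthread_create()/pthread_join() stand in for schedule_work()/flush_scheduled_work(). So that the demo terminates instead of hanging, the worker uses pthread_mutex_timedlock() with a 100 ms timeout and reports whether it could ever have taken the lock.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <time.h>

/* Stand-in for "some-fs-lock" in the example above (hypothetical). */
static pthread_mutex_t some_fs_lock = PTHREAD_MUTEX_INITIALIZER;

/*
 * Plays the role of some_fs_function() running in a worker thread.
 * A real fs function would block on the lock forever; this one gives up
 * after 100 ms so the demo can report what happened.
 */
static void *some_fs_function(void *arg)
{
	int *result = arg;
	struct timespec deadline;

	clock_gettime(CLOCK_REALTIME, &deadline);	/* timedlock uses CLOCK_REALTIME */
	deadline.tv_nsec += 100L * 1000 * 1000;
	if (deadline.tv_nsec >= 1000000000L) {
		deadline.tv_sec += 1;
		deadline.tv_nsec -= 1000000000L;
	}
	*result = pthread_mutex_timedlock(&some_fs_lock, &deadline);
	if (*result == 0)
		pthread_mutex_unlock(&some_fs_lock);
	return NULL;
}

/*
 * Plays the role of some_other_fs_function(): optionally takes the lock,
 * hands some_fs_function() to another thread and waits for it to finish.
 * Returns 0 if the worker got the lock, ETIMEDOUT if it could not
 * (without the timeout, this call would never return).
 */
static int hand_off_and_wait(int hold_lock)
{
	pthread_t worker;
	int result = -1;

	if (hold_lock)
		pthread_mutex_lock(&some_fs_lock);
	if (pthread_create(&worker, NULL, some_fs_function, &result) == 0)
		pthread_join(worker, NULL);	/* the "flush": wait for completion */
	if (hold_lock)
		pthread_mutex_unlock(&some_fs_lock);
	return result;
}
```

hand_off_and_wait(1) returns ETIMEDOUT, while hand_off_and_wait(0) returns 0: moving the work to another thread changes nothing about which locks it needs, only who is waiting for them.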

In my experience with this series, most of the kmem allocated here is
filesystem-related. This means that it is allocated under the GFP_FS
limitation (that is, with GFP_NOFS). If we don't do something like this,
reclaim is almost pointless, since it will almost never free anything
(only occasionally, when the allocation does not come from an fs).
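Why reclaim frees almost nothing in that case can be shown with a toy model. As the patch comment says, most fs shrinkers refuse to run without __GFP_FS. In the sketch below, TOY_GFP_WAIT, TOY_GFP_FS, struct toy_cache and both functions are hypothetical stand-ins for the kernel's gfp flags and shrinker callbacks, with made-up values; this is not the kernel API.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's gfp bits (values are made up). */
#define TOY_GFP_WAIT	0x1u			/* caller may sleep */
#define TOY_GFP_FS	0x2u			/* caller may re-enter the fs */
#define TOY_GFP_NOFS	(TOY_GFP_WAIT)
#define TOY_GFP_KERNEL	(TOY_GFP_WAIT | TOY_GFP_FS)

struct toy_cache {
	size_t objects;		/* reclaimable objects, e.g. dentries */
};

/* Like most fs shrinkers: do nothing unless the caller allows fs recursion. */
static size_t toy_fs_shrink(struct toy_cache *c, unsigned int gfp_mask,
			    size_t nr_to_scan)
{
	size_t freed;

	if (!(gfp_mask & TOY_GFP_FS))
		return 0;
	freed = nr_to_scan < c->objects ? nr_to_scan : c->objects;
	c->objects -= freed;
	return freed;
}

/*
 * Simulates `total` allocations hitting the kmem limit, the first `nofs`
 * of which use TOY_GFP_NOFS; each triggers a direct shrink of `batch`
 * objects. Returns the total number of objects freed.
 */
static size_t toy_reclaim_run(struct toy_cache *c, size_t total,
			      size_t nofs, size_t batch)
{
	size_t i, freed = 0;

	for (i = 0; i < total; i++) {
		unsigned int mask = i < nofs ? TOY_GFP_NOFS : TOY_GFP_KERNEL;

		freed += toy_fs_shrink(c, mask, batch);
	}
	return freed;
}
```

With a 1000-object cache, 100 allocations of which 95 are NOFS, and a batch of 10, only the 5 fs-capable allocations free anything, 50 objects in total; if all 100 are NOFS, nothing is freed.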

It tends to work just fine like this. That may very well be because fs
developers mark everything as NOFS to be safe and we aren't *really*
holding any locks in common situations, but it will blow up in our faces
in a subtle way (which none of us want).
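The approach from the patch comment (run the shrinkers synchronously from a different context when the caller can wait but lacks __GFP_FS) can be sketched as below. This is a toy model under stated assumptions: a pthread stands in for the kernel worker, all names and flag values are hypothetical, and the toy shrinker takes no locks. That last assumption is what makes it work; had the shrinker needed a lock the caller holds, the pthread_join() would never return, which is exactly Andrew's objection.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's gfp bits (values are made up). */
#define TOY_GFP_WAIT	0x1u
#define TOY_GFP_FS	0x2u

struct toy_cache {
	size_t objects;
};

/* Toy fs shrinker: needs the FS bit and, unlike real ones, takes no locks. */
static size_t toy_fs_shrink(struct toy_cache *c, unsigned int gfp_mask,
			    size_t nr_to_scan)
{
	size_t freed;

	if (!(gfp_mask & TOY_GFP_FS))
		return 0;
	freed = nr_to_scan < c->objects ? nr_to_scan : c->objects;
	c->objects -= freed;
	return freed;
}

struct shrink_request {
	struct toy_cache *cache;
	size_t nr_to_scan;
	size_t freed;
};

/* Runs in the worker, outside the caller's NOFS section, so FS is allowed. */
static void *toy_shrink_worker(void *arg)
{
	struct shrink_request *req = arg;

	req->freed = toy_fs_shrink(req->cache, TOY_GFP_WAIT | TOY_GFP_FS,
				   req->nr_to_scan);
	return NULL;
}

/* Sketch of the reclaim decision described in the patch comment. */
static size_t toy_kmem_reclaim(struct toy_cache *c, unsigned int gfp_mask,
			       size_t nr_to_scan)
{
	struct shrink_request req = { c, nr_to_scan, 0 };
	pthread_t worker;

	if (!(gfp_mask & TOY_GFP_WAIT))
		return 0;			/* cannot wait: give up */
	if (gfp_mask & TOY_GFP_FS)
		return toy_fs_shrink(c, gfp_mask, nr_to_scan);	/* call directly */

	/* Can wait but cannot enter the fs: hand off and wait for completion. */
	if (pthread_create(&worker, NULL, toy_shrink_worker, &req) != 0)
		return 0;
	pthread_join(worker, NULL);
	return req.freed;
}
```

A NOFS-but-waitable caller now frees objects through the worker, a caller that cannot wait frees nothing, and an fs-capable caller shrinks directly. The hand-off only changes the gfp mask the shrinker sees; it does not change which locks the shrinker takes.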

That said, suggestions are more than welcome.




--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



