On 06/29/2015 12:38 PM, Michal Hocko wrote:
> On Mon 29-06-15 12:23:16, Nikolay Borisov wrote:
>> On 06/29/2015 12:16 PM, Michal Hocko wrote:
>>> On Mon 29-06-15 12:07:54, Nikolay Borisov wrote:
>>>> On 06/29/2015 11:32 AM, Michal Hocko wrote:
>>>>> On Thu 25-06-15 18:27:10, Nikolay Borisov wrote:
>>>>>> On 06/25/2015 06:18 PM, Michal Hocko wrote:
>>>>>>> On Thu 25-06-15 17:34:22, Nikolay Borisov wrote:
>>>>>>>> On 06/25/2015 05:05 PM, Michal Hocko wrote:
>>>>>>>>> On Thu 25-06-15 16:49:43, Nikolay Borisov wrote:
>>>>>>>>> [...]
>>>>>>>>>> How would you advise me to rectify such a situation?
>>>>>>>>>
>>>>>>>>> As I've said: check the OOM victim traces and see whether it is
>>>>>>>>> holding any of those locks.
>>>>>>>>
>>>>>>>> As mentioned previously, all OOM traces are identical to the one
>>>>>>>> I've sent - the OOM killer is invoked from the page fault path.
>>>>>>>
>>>>>>> By identical do you mean that all of them kill the same task? Or
>>>>>>> just that the path is the same (which wouldn't be surprising, as
>>>>>>> this is the only path which triggers the memcg OOM killer)?
>>>>>>
>>>>>> The code path is the same; the tasks being killed are different.
>>>>>
>>>>> Is the OOM killer triggered only for a single memcg, or do others
>>>>> misbehave as well?
>>>>
>>>> Generally, OOM would be triggered for whichever memcg runs out of
>>>> resources, but so far I've only observed the D-state issue in a
>>>> single container.
>>>
>>> It is not clear whether it is the OOM memcg which has the tasks in
>>> the D state. Anyway, I think it all smells like one memcg throttling
>>> others on another shared resource - the journal, in your case.
>>
>> Be that as it may, how do I find which cgroup is the culprit?
>
> Ted has already described that. You have to check all the running
> tasks and try to find which of them is doing the operation which
> blocks the others. The transaction commit sounds like the first one
> to check.

One other, fairly crucial detail: each and every container is on a
separate block device, so the journals of the different containers are
not shared, since the journal is per block device. I guess this means
that whatever is happening is more or less constrained to a single
block device, and thus the possibility of different memcgs competing
for the same journal can be eliminated?
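
To act on the advice above, here is a minimal sketch (my own, not
something from the thread) that lists the tasks stuck in
uninterruptible sleep and dumps their kernel stacks. It assumes a
kernel built with CONFIG_STACKTRACE (so /proc/<pid>/stack exists) and
root privileges:

#!/usr/bin/env python3
# Sketch: find tasks in D (uninterruptible sleep) state and print their
# kernel stacks, to spot which operation (e.g. a jbd2 transaction
# commit) is blocking the others. Requires root and CONFIG_STACKTRACE.
import os

for pid in filter(str.isdigit, os.listdir('/proc')):
    try:
        with open('/proc/%s/stat' % pid) as f:
            stat = f.read()
        # comm is parenthesised and may contain spaces; the task state
        # is the first field after the closing parenthesis.
        comm = stat[stat.index('(') + 1:stat.rindex(')')]
        state = stat[stat.rindex(')') + 2]
        if state != 'D':
            continue
        with open('/proc/%s/stack' % pid) as f:
            stack = f.read()
    except (IOError, OSError):
        continue  # task exited meanwhile, or stack unreadable
    print('--- pid %s (%s) ---' % (pid, comm))
    print(stack)

A D-state task whose stack ends in something like
jbd2_log_wait_commit or wait_transaction_locked is merely waiting on
the journal; the task the commit itself is waiting for is the more
interesting one.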
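
And to sanity-check the per-device journal layout, the same /proc
parsing can list the jbd2 commit kthreads - there is one per mounted
ext4 filesystem, named jbd2/<dev>-<n> - together with their states.
Again only a sketch, under the same assumptions; if only one device's
commit thread is stuck, the problem really is confined to that
journal:

#!/usr/bin/env python3
# Sketch: list jbd2 journal commit kthreads (one per ext4 block
# device, named "jbd2/<dev>-<n>") with their scheduler states.
import os

for pid in filter(str.isdigit, os.listdir('/proc')):
    try:
        with open('/proc/%s/stat' % pid) as f:
            stat = f.read()
        comm = stat[stat.index('(') + 1:stat.rindex(')')]
        state = stat[stat.rindex(')') + 2]
    except (IOError, OSError):
        continue
    if comm.startswith('jbd2/'):
        print('%-20s pid %-6s state %s' % (comm, pid, state))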