On Sun, Mar 01, 2015 at 02:36:35PM -0500, Theodore Ts'o wrote:
> On Sun, Mar 01, 2015 at 11:15:06AM -0500, Johannes Weiner wrote:
> >
> > We had these lockups in cgroups with just a handful of threads, which
> > all got stuck in the allocator and there was nobody left to volunteer
> > unreclaimable memory.  When this was being addressed, we knew that the
> > same can theoretically happen on the system-level but weren't aware of
> > any reports.  Well now, here we are.
>
> I think the "few threads in a small cgroup" problem is a little
> different, because in those cases very often the global system has
> enough memory, and there is always the possibility that we might relax
> the memory cgroup guarantees a little in order to allow forward
> progress.

That's exactly how we fixed it.  __GFP_NOFAIL allocations are allowed
to simply bypass the cgroup memory limits when reclaim within the group
fails to make room for the allocation.  I'm just mentioning that
because the global case doesn't have the same out, but is susceptible
to the same deadlock situation when there are no other threads
volunteering pages.

If your machines are loaded with hundreds or thousands of threads,
chances are good that a thread stuck in the allocator will be bailed
out by the other threads in the system (or that you run into CPU limits
first), but if you have only a handful of memory-intensive tasks, this
might not be the case.  The cgroup problem was closer to that second
scenario, where a few threads split all available memory between them.
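
To make the __GFP_NOFAIL bypass described above concrete, here is a toy
userspace model of the charge path.  It is only a sketch under stated
assumptions, not the actual memcg code: struct toy_memcg, TOY_NOFAIL,
toy_reclaim() and toy_try_charge() are all hypothetical names, and
whether the kernel accounts the bypassed pages elsewhere or lets the
group's counter exceed its limit is a detail this model does not try
to mirror.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a memory cgroup: a page counter with a hard limit. */
struct toy_memcg {
	long usage;       /* pages currently charged to the group */
	long limit;       /* configured limit, in pages */
	long reclaimable; /* charged pages that reclaim could still free */
};

/* Hypothetical flag, loosely modelled on __GFP_NOFAIL. */
#define TOY_NOFAIL 0x1u

/* Pretend reclaim: free up to nr_pages reclaimable pages, return freed. */
static long toy_reclaim(struct toy_memcg *cg, long nr_pages)
{
	long freed = cg->reclaimable < nr_pages ? cg->reclaimable : nr_pages;

	cg->reclaimable -= freed;
	cg->usage -= freed;
	return freed;
}

/*
 * Try to charge nr_pages to the group.  Reclaim within the group is
 * attempted first; if that cannot make room, a TOY_NOFAIL request is
 * charged anyway (the "bypass"), while any other request fails.
 */
static bool toy_try_charge(struct toy_memcg *cg, long nr_pages,
			   unsigned int flags)
{
	assert(cg != NULL && nr_pages > 0);

	if (cg->usage + nr_pages > cg->limit)
		toy_reclaim(cg, cg->usage + nr_pages - cg->limit);

	if (cg->usage + nr_pages <= cg->limit) {
		cg->usage += nr_pages;	/* fits under the limit */
		return true;
	}

	if (flags & TOY_NOFAIL) {
		cg->usage += nr_pages;	/* bypass: limit is exceeded */
		return true;
	}

	return false;			/* caller has to handle failure */
}
```

A group at its limit with nothing left to reclaim rejects an ordinary
charge but still accepts a TOY_NOFAIL one.  The global case has no
equivalent of that last branch: there is no larger pool to borrow
from, which is why it cannot use the same way out.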