Re: XFS: possible memory allocation deadlock in kmem_alloc on glusterfs setup

Dave Chinner <david@xxxxxxxxxxxxx> · Mon, 5 Dec 2016 18:46:45 +1100

On Sun, Dec 04, 2016 at 05:47:04PM -0800, Cyril Peponnet wrote:
> > On Dec 4, 2016, at 5:22 PM, Dave Chinner <david@xxxxxxxxxxxxx> wrote:
> >
> > What deadlock is that? XFS is reporting memory allocation
> > issues, not that there is a filesystem deadlock. Your comments
> > that dropping caches make the problem go away indicate that
> > there isn't any deadlock, just blocking on memory allocation
> > that is taking a long time to resolve…
> 
> You are right. By deadlock I meant that the mount point hangs, for
> ls commands for instance (on the server itself), and the glusterfs
> mount on the hypervisors also hangs for all VMs (which makes the
> VMs remount read-only).

So everything is /blocked/, not deadlocked. If the memory allocation
then makes progress, we're all ok.

> On this server I have 1200 live snapshots, usually without much
> write activity except for some of them…
> 
> Is there a way to optimize the memory allocation? Or we likely
> need to add more ram…

More RAM won't help - it's likely a memory fragmentation problem
made worse by the fact that older kernels won't do memory compaction
(i.e. memory defrag) when high order memory allocation fails. If the
problem is file fragmentation, then the best you can probably do
right now is limit fragmentation.
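
One way to see whether free memory is fragmented is /proc/buddyinfo,
which lists, per zone, the number of free blocks at each allocation
order. The following is a minimal Python sketch, assuming the usual
"Node N, zone NAME c0 c1 ..." line layout. The function names and the
default threshold of order 4 are illustrative assumptions, not anything
taken from this thread.

```python
def parse_buddyinfo(text):
    """Return {(node, zone): [free block count per order, from order 0 up]}."""
    zones = {}
    for line in text.splitlines():
        parts = line.split()
        # Assumed layout: Node 0, zone Normal c0 c1 ... cN
        if len(parts) < 5 or parts[0] != "Node" or parts[2] != "zone":
            continue
        node = int(parts[1].rstrip(","))
        zones[(node, parts[3])] = [int(c) for c in parts[4:]]
    return zones


def free_high_order_blocks(counts, min_order=4):
    """Count free blocks of order >= min_order (threshold is an arbitrary choice)."""
    return sum(counts[min_order:])


def read_buddyinfo(path="/proc/buddyinfo"):
    """Read and parse the live file; only meaningful on a Linux host."""
    with open(path) as f:
        return parse_buddyinfo(f.read())
```

If the higher-order columns are at or near zero while the low-order
columns are large, there is plenty of free memory but very little of it
is contiguous, which is the situation described above.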

Stopping qcow2 from fragmenting the crap out of the files will
prevent the problem from re-occurring. This is the primary use case
for extent size hints in XFS these days, though I know you can't
actually control the gluster back end to use this. Perhaps there are
other tweaks to qcow formats (cluster size?) or gluster config to
limit the amount of fragmentation...
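
To get a rough feel for why the allocation unit matters, here is a
back-of-the-envelope Python sketch. It assumes, as a simplification,
that in the worst case every allocation unit (a qcow2 cluster, or an
extent size hint) ends up as its own extent. Real files will usually
do better than this, and the 100 GiB image size is only an example.
The 64 KiB figure is the qcow2 default cluster size; 2 MiB is used
here as an example of a larger setting.

```python
KiB = 1024
MiB = 1024 * KiB
GiB = 1024 * MiB


def worst_case_extents(file_size, alloc_unit):
    """Upper bound on extent count if every allocation unit is a separate extent."""
    return -(-file_size // alloc_unit)  # ceiling division


if __name__ == "__main__":
    image = 100 * GiB  # hypothetical image size
    for unit in (64 * KiB, 2 * MiB):
        print(unit // KiB, "KiB ->", worst_case_extents(image, unit), "extents")
```

Under that assumption a 100 GiB image can reach about 1.6 million
extents with 64 KiB units, against about 51 thousand with 2 MiB units.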

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx