Re: XFS: possible memory allocation deadlock in kmem_alloc on glusterfs setup

Dave Chinner <david@xxxxxxxxxxxxx> · Wed, 7 Dec 2016 17:16:46 +1100

On Tue, Dec 06, 2016 at 09:54:37AM -0800, Cyril Peponnet wrote:
> 
> > On Dec 5, 2016, at 1:45 PM, Dave Chinner <david@xxxxxxxxxxxxx> wrote:
> > 
> > On Mon, Dec 05, 2016 at 07:51:45AM -0800, Cyril Peponnet wrote:
> >> I had the issue again but I don’t have more output in dmesg or
> >> journalctl even with the echo 11 > /proc/sys/fs/xfs/error_level
> >> set.
> > 
> > Which means your kernel does not have this commit:
> > 
> > commit 847f9f6875fb02b576035e3dc31f5e647b7617a7
> > Author: Eric Sandeen <sandeen@xxxxxxxxxx>
> > Date:   Mon Oct 12 16:04:45 2015 +1100
> > 
> >    xfs: more info from kmem deadlocks and high-level error msgs
> > 
> >    In an effort to get more useful out of "possible memory
> >    allocation deadlock" messages, print the size of the
> >    requested allocation, and dump the stack if the xfs error
> >    level is tuned high.
> > 
> >    The stack dump is implemented in define_xfs_printk_level()
> >    for error levels >= LOGLEVEL_ERR, partly because it
> >    seems generically useful, and also because kmem.c has
> >    no knowledge of xfs error level tunables or other such bits,
> >    it's very kmem-specific.
> > 
> >    Signed-off-by: Eric Sandeen <sandeen@xxxxxxxxxx>
> >    Reviewed-by: Dave Chinner <dchinner@xxxxxxxxxx>
> >    Signed-off-by: Dave Chinner <david@xxxxxxxxxxxxx>
> 
> Indeed we should plan an upgrade window.
> 
> > 
> >> Is there another location I should look at?
> > 
> > Nope, there's nothing in your kernel we can use to identify the
> > source of memory allocations. I'm pretty sure that RH have used
> > systemtap scripts to pull this information from these kernels for
> > RHEL customers - we've added additional debug help here to avoid
> > that need, but your kernel doesn't have that code....
> > 
> > Essentially, best guess is that it's file fragmentation causing
> > problems with extent list allocation. Finding out why that one
> > snapshot is fragmenting so much and mitigating it is probably the
> > only thing you can do right now (i.e. extent size hints). Long term
> > is to get gluster to do the mitigation for VM images automatically.
> > 
> 
> Looks like it’s better since I disabled the VM that was taking a lot of disk space:
> 
> qemu-img info disk0.snapshot.qcow2
> image: disk0.snapshot.qcow2
> file format: qcow2
> virtual size: 265G (284541583360 bytes)
> disk size: 798G
> cluster_size: 65536
> backing file: base.qcow2
> 
> Note the virtual size vs. the disk size; it looks pretty fragmented.
> 
> I will follow up with the gluster guys.
> 
> One dumb question: can the extent size hint be set at the root
> level?

Yes. Just set it on the root directory inode immediately after mkfs,
and everything created in the filesystem from then on will inherit
that extent size hint at create time.
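
As a rough sketch of what that looks like in practice, the hint can be
set with the extsize command in xfs_io. The mount point /mnt/brick and
the 1m hint value below are assumptions for illustration only, not
values from this thread, and the set_root_extsize helper and its
DRY_RUN switch are hypothetical names. The right hint size depends on
the workload.

```shell
#!/bin/sh
# Sketch: set a default extent size hint on the root directory of an XFS
# filesystem so that files and directories created afterwards inherit it.
# The helper name and the DRY_RUN switch are hypothetical.

set_root_extsize() {
    mnt="$1"    # mount point of the XFS filesystem (its root directory)
    hint="$2"   # hint size, e.g. 1m; must be a multiple of the fs block size

    if [ "${DRY_RUN:-0}" = "1" ]; then
        # Only print the command that would be run.
        printf 'xfs_io -c "extsize %s" %s\n' "$hint" "$mnt"
        return 0
    fi

    # Set the hint on the root directory inode.
    xfs_io -c "extsize $hint" "$mnt" || return 1

    # Read the hint back to confirm it was applied.
    xfs_io -c "extsize" "$mnt"
}
```

Running "DRY_RUN=1 set_root_extsize /mnt/brick 1m" prints the command
without touching the filesystem. Dropping DRY_RUN applies it. Files
that already exist keep whatever hint they had, which is why this is
best done right after mkfs.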

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx