On Mon, Nov 04, 2019 at 03:38:12PM -0800, Chris Holcombe wrote:
> After upgrading from Scientific Linux 6 -> CentOS 7, I'm starting to
> see a sharp uptick in dmesg lines about XFS hitting a possible memory
> allocation deadlock.  All the searching I did through previous mailing
> list archives and blog posts points to large files having too many
> extents.
> I don't think that is the case with these servers, so I'm reaching out
> in the hope of getting an answer to what is going on.  The largest
> files I can find on the servers are roughly 15GB with maybe 9 extents
> total; the vast majority are small, with only a few extents.
> I've set up a cron job to drop the cache every 5 minutes, which is
> helping but not eliminating the problem.  These servers are dedicated
> to storing data that is written through nginx WebDAV.  AFAIK an nginx
> WebDAV PUT does not use sparse files.
>
> Some info about the servers this issue is occurring on:
>
> nginx is writing to 82TB filesystems:
>
> xfs_info /dev/sdb1
> meta-data=/dev/sdb1              isize=512    agcount=82, agsize=268435424 blks
>          =                       sectsz=4096  attr=2, projid32bit=1
>          =                       crc=1        finobt=0 spinodes=0
> data     =                       bsize=4096   blocks=21973302784, imaxpct=1
>          =                       sunit=16     swidth=144 blks
> naming   =version 2              bsize=65536  ascii-ci=0 ftype=1
> log      =internal               bsize=4096   blocks=521728, version=2
>          =                       sectsz=4096  sunit=1 blks, lazy-count=1
> realtime =none                   extsz=4096   blocks=0, rtextents=0
>
> xfs_db -r /dev/sdb1
> xfs_db> frag
> actual 6565, ideal 5996, fragmentation factor 8.67%
> Note, this number is largely meaningless.
> Files on this filesystem average 1.09 extents per file
>
> I see dmesg lines with various size numbers in the line:
>
> [6262080.803537] XFS: nginx(2514) possible memory allocation deadlock size 50184 in kmem_alloc (mode:0x250)

Full kernel logs, please.  There's not enough info here to tell what's
trying to grab a 50K memory buffer.

--D

> Typical extents for the largest files on the filesystem are:
>
> find /mnt/jbod/ -type f -size +15G -printf '%s %p\n' -exec xfs_bmap -vp {} \; | tee extents
> 17093242444 /mnt/jbod/boxfiler3038-sdb1/data/220190411/ephemeral/2019-08-12/18/0f6bee4d6ee0136af3b58eef611e2586.enc
> /mnt/jbod/boxfiler3038-sdb1/data/220190411/ephemeral/2019-08-12/18/0f6bee4d6ee0136af3b58eef611e2586.enc:
>  EXT: FILE-OFFSET            BLOCK-RANGE              AG AG-OFFSET                 TOTAL FLAGS
>    0: [0..1919]:             51660187008..51660188927 24 (120585600..120587519)     1920 00010
>    1: [1920..8063]:          51660189056..51660195199 24 (120587648..120593791)     6144 00011
>    2: [8064..4194175]:       51660210816..51664396927 24 (120609408..124795519)  4186112 00001
>    3: [4194176..11552759]:   51664560768..51671919351 24 (124959360..132317943)  7358584 00101
>    4: [11552760..33385239]:  51678355840..51700188319 24 (138754432..160586911) 21832480 00111
>
> Memory size:
>
> free -m
>               total        used        free      shared  buff/cache   available
> Mem:          64150        6338         421           2       57390       57123
> Swap:          2047           6        2041
>
> cat /etc/redhat-release
> CentOS Linux release 7.6.1810 (Core)
>
> cat /proc/buddyinfo
> Node 0, zone      DMA      0      0      1      0      1      0      0      0      0      1      3
> Node 0, zone    DMA32  31577     88      2      0      0      0      0      0      0      0      0
> Node 0, zone   Normal  33331   3323    582     87      0      0      0      0      0      0      0
> Node 1, zone   Normal  51121   6343    822     77      1      0      0      0      0      0      0
>
> tuned-adm shows 'balanced' as the current tuning profile.
>
> Thanks for your help!
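
[Editor's note: the poster mentions a cron job that drops the cache every
5 minutes but does not show it.  For readers unfamiliar with the mechanism,
a minimal sketch of what such a job typically looks like, assuming an
/etc/cron.d entry and the standard /proc/sys/vm/drop_caches interface
(the file name and exact command are illustrative, not the poster's):

  # /etc/cron.d/drop-caches -- hypothetical example, not the poster's actual job.
  # Every 5 minutes: flush dirty pages to disk, then drop the page cache
  # plus reclaimable dentries and inodes (echo 3; "echo 1" would drop only
  # the page cache).
  */5 * * * * root sync && echo 3 > /proc/sys/vm/drop_caches

Dropping caches this way is a workaround, not a fix; it only frees clean,
reclaimable memory and does not address whatever is making the large
kmem_alloc requests fail.]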