On 6/16/17 1:57 PM, Brian Matheson wrote:
> Thanks for the reply, Eric.
>
> I don't have data for each of the files handy, but this particular
> filesystem was at 46% fragmentation before our first run and went
> down to 35% after. It's currently at 24%. The fsr run reports that
> many of the files are fully defragmented, but some have as many as
> 40,000 extents.

Well, see
http://xfs.org/index.php/XFS_FAQ#Q:_The_xfs_db_.22frag.22_command_says_I.27m_over_50.25._Is_that_bad.3F

That number is pretty meaningless. What matters in this case is the
fragmentation of the individual files.

> Preventing the fragmentation by setting the extent size would be
> great, but I understand that operation only works if there are no
> extents in the file at the time of the operation. Since we're
> creating the files on a hypervisor that's nfs-mounting the xfs fs,
> it would be tricky to insert a step to set the extent size hints at
> file creation time.

Just set it on the parent directory, and new files will inherit it.
This is all documented in the xfs_io manpage, FWIW.

> We'd prefer to avoid dropping the caches, and maybe instead tune
> vm.vfs_cache_pressure or use some other mechanism to prevent these
> problems. We're not in a position to experiment right now, though,
> and are looking for recommendations.
>
> Do you think fragmentation is the root of the problem, even at 24%
> fragmentation for the fs?

Again, the fs-wide number is largely pointless ;)

Try setting the fs.xfs.error_level sysctl to 11; it should dump out a
stack the next time you get the message, and the stack trace will
help us know for sure what type of allocation is happening.
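Concretely (this is a one-shot setting and won't survive a reboot, so
add it to sysctl.conf if it turns out to be useful):

    sysctl fs.xfs.error_level=11

    # or, equivalently:
    echo 11 > /proc/sys/fs/xfs/error_level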
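To make the extent size hint concrete, something along these lines
should do; the 1m hint and the /export/vmstorage path are only
placeholders here, so size the hint to match your guests' write
pattern:

    # Set an inheritable hint on the exported directory; files created
    # under it from then on pick it up (existing files are unaffected):
    xfs_io -c "extsize 1m" /export/vmstorage

    # Check what a freshly created file inherited:
    xfs_io -c "extsize" /export/vmstorage/new-guest.img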
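For perspective on why the fs-wide number tells you so little: IIRC
the frag factor is 1 - (ideal extents / actual extents), so 24% works
out to about 1 / (1 - 0.24), i.e. ~1.3 extents per file on average,
which is harmless. It's the individual 40,000-extent files that hurt,
and you can count extents per file directly (path is again just an
example):

    # one output line per extent (or hole), minus the filename header:
    xfs_bmap /export/vmstorage/guest.img | tail -n +2 | wc -l

    # or the full verbose extent map:
    xfs_bmap -v /export/vmstorage/guest.img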
-Eric

> On Fri, Jun 16, 2017 at 2:14 PM, Eric Sandeen <sandeen@xxxxxxxxxxx> wrote:
>>
>> On 6/16/17 12:10 PM, Brian Matheson wrote:
>>> Hi all,
>>>
>>> I'm writing to get some information about a problem we're seeing on
>>> our nfs servers. We're using XFS on an LVM volume backed by an LSI
>>> RAID card (raid 6, with 24 SSDs). We're nfs-exporting the volume to
>>> a number of hypervisors. We're seeing messages like the following:
>>>
>>> Jun 16 09:22:30 ny2r3s1 kernel: [15259176.032579] XFS: nfsd(2301)
>>> possible memory allocation deadlock size 68256 in kmem_alloc
>>> (mode:0x2400240)
>>>
>>> These messages are followed by nfsd failures, as indicated by log
>>> messages like:
>>>
>>> Jun 16 09:22:39 ny2r3s1 kernel: [15259184.933311] nfsd: peername
>>> failed (err 107)!
>>>
>>> Dropping the caches on the box fixes the problem immediately. Based
>>> on a little research, we thought that the problem could be occurring
>>> due to file fragmentation, so we're running xfs_fsr periodically to
>>> defragment. At the moment we're also periodically dropping the cache
>>> in an attempt to prevent the problem from occurring.
>>
>> A better approach might be to set extent size hints on the fragmented
>> files in question, to avoid the fragmentation in the first place.
>>
>> drop caches is a pretty big hammer, and xfs_fsr can have other side
>> effects w.r.t. filesystem aging and freespace fragmentation.
>>
>> How badly fragmented were the files in question?
>>
>> -Eric
>>
>>> Any help appreciated, and if this query belongs on a different
>>> mailing list, please let me know.
>>>
>>> The systems are running ubuntu 14.04 with a 4.4.0 kernel (Linux
>>> ny2r3s1 4.4.0-53-generic #74~14.04.1-Ubuntu SMP Fri Dec 2 03:43:31
>>> UTC 2016 x86_64 x86_64 x86_64 GNU/Linux). xfs_repair is version
>>> 3.1.9. They have 64G of RAM, most of which is used by cache, and
>>> 12 cpu cores. As mentioned, we're using ssds connected to an lsi
>>> raid card. xfs_info reports:
>>>
>>> meta-data=/dev/mapper/VMSTORAGE_SSD-XFS_VHD isize=256 agcount=62,
>>>          agsize=167772096 blks
>>>          =                  sectsz=512   attr=2
>>> data     =                  bsize=4096   blocks=10311515136, imaxpct=5
>>>          =                  sunit=64     swidth=256 blks
>>> naming   =version 2         bsize=4096   ascii-ci=0
>>> log      =internal          bsize=4096   blocks=521728, version=2
>>>          =                  sectsz=512   sunit=64 blks, lazy-count=1
>>> realtime =none              extsz=4096   blocks=0, rtextents=0
>>>
>>> At the moment, slabtop reports this:
>>>
>>> Active / Total Objects (% used)    : 5543699 / 5668921 (97.8%)
>>> Active / Total Slabs (% used)      : 157822 / 157822 (100.0%)
>>> Active / Total Caches (% used)     : 77 / 144 (53.5%)
>>> Active / Total Size (% used)       : 1110436.20K / 1259304.73K (88.2%)
>>> Minimum / Average / Maximum Object : 0.01K / 0.22K / 18.50K
>>>
>>>    OBJS  ACTIVE  USE OBJ SIZE  SLABS OBJ/SLAB CACHE SIZE NAME
>>> 4382508 4382508 100%    0.10K 112372       39    449488K buffer_head
>>>  348152  348152 100%    0.57K  12434       28    198944K radix_tree_node
>>>  116880   83796  71%    4.00K  14610        8    467520K kmalloc-4096
>>>  114492   88855  77%    0.09K   2726       42     10904K kmalloc-96
>>>  108640   86238  79%    0.12K   3395       32     13580K kmalloc-128
>>>   51680   51680 100%    0.12K   1520       34      6080K kernfs_node_cache
>>>   49536   29011  58%    0.06K    774       64      3096K kmalloc-64
>>>   46464   46214  99%    0.03K    363      128      1452K kmalloc-32
>>>   44394   34860  78%    0.19K   1057       42      8456K dentry
>>>   40188   38679  96%    0.04K    394      102      1576K ext4_extent_status
>>>   33150   31649  95%    0.05K    390       85      1560K ftrace_event_field
>>>   26207   25842  98%    0.05K    359       73      1436K Acpi-Parse
>>>   23142   20528  88%    0.38K    551       42      8816K mnt_cache
>>>   21756   21515  98%    0.19K    518       42      4144K kmalloc-192
>>>   20160   20160 100%    0.07K    360       56      1440K Acpi-Operand
>>>   19800   19800 100%    0.18K    450       44      3600K xfs_log_ticket
>>>
>>> Thanks much,
>>> Brian Matheson
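P.S. If you do keep the periodic cache drop as a stopgap while you
test the hints, keep in mind how blunt it is: it discards the page
cache and the reclaimable dentry/inode slabs system-wide, not just
XFS metadata.

    # 1 = pagecache, 2 = dentries and inodes, 3 = both:
    echo 3 > /proc/sys/vm/drop_caches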