On Wed, Jan 06, 2016 at 10:15:25AM -0500, Mark Seger wrote:
> I've recently found the performance our development swift system is
> degrading over time as the number of objects/files increases. This is a
> relatively small system, each server has 3 400GB disks. The system I'm
> currently looking at has about 70GB tied up in slabs alone, close to 55GB
> in xfs inodes and ili, and about 2GB free. The kernel
> is 3.14.57-1-amd64-hlinux.

So you've got 50M cached inodes in memory, and a relatively old kernel.

> Here's the way the filesystems are mounted:
>
> /dev/sdb1 on /srv/node/disk0 type xfs
> (rw,noatime,nodiratime,attr2,nobarrier,inode64,logbufs=8,logbsize=256k,sunit=512,swidth=1536,noquota)
>
> I can do about 2000 1K file creates/sec when running 2 minute PUT tests at
> 100 threads. If I repeat that tests for multiple hours, I see the number
> of IOPS steadily decreasing to about 770 and the very next run it drops to
> 260 and continues to fall from there. This happens at about 12M files.

According to the numbers you've provided:

            lookups     creates     removes
Fast:          1550        1350         300
Slow:          1000         900         250

This is pretty much what I'd expect at the XFS level when going from a
small, empty filesystem to one containing 12M 1k files. That does not
correlate to your numbers above, so it's not at all clear that there is
really a problem here at the XFS level.

> The directory structure is 2 tiered, with 1000 directories per tier so we
> can have about 1M of them, though they don't currently all exist.

That's insane. The XFS directory structure is much, much more space, time,
IO and memory efficient than a directory hierarchy like this. The only
thing you need a directory hash hierarchy for is to provide sufficient
concurrency for your operations, which you would probably get from a
single level with one or two subdirs per filesystem AG.

What you are doing is spreading the IO over thousands of different regions
on the disks, and then randomly seeking between them on every operation.
i.e. your workload is seek bound, and your directory structure has the
effect of /maximising/ seeks per operation...
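To make the layout difference concrete, here is a small Python sketch
(illustrative only, not Swift's actual object placement code) contrasting a
two-tier 1000x1000 hash hierarchy with a single-level scheme sized to the
filesystem's allocation group count. AGCOUNT, SUBDIRS_PER_AG and both path
helpers are hypothetical names; a real AG count would come from the
agcount value xfs_info reports for the filesystem in question.

import hashlib

AGCOUNT = 4          # hypothetical: agcount reported by xfs_info
SUBDIRS_PER_AG = 2   # "one or two subdirs per filesystem AG"

def two_tier_path(name):
    """Current scheme: up to ~1M directories spread all over the disk."""
    h = hashlib.md5(name.encode()).hexdigest()
    tier1 = int(h[:8], 16) % 1000
    tier2 = int(h[8:16], 16) % 1000
    return "%03d/%03d/%s" % (tier1, tier2, name)

def flat_ag_path(name):
    """Suggested scheme: one level, just enough directories for concurrency."""
    h = hashlib.md5(name.encode()).hexdigest()
    bucket = int(h[:8], 16) % (AGCOUNT * SUBDIRS_PER_AG)
    return "%02d/%s" % (bucket, name)

for obj in ("object-0001", "object-0002", "object-0003"):
    print(two_tier_path(obj), "->", flat_ag_path(obj))

The point is not the hashing itself but the number of distinct directories
(and hence disk regions) the create workload has to touch: 8 in this sketch
versus up to a million.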
> I've written a collectl plugin that lets me watch many of the xfs stats in

/me sighs and points at PCP: http://pcp.io

> real-time and also have a test script that exercises the swift PUT code
> directly and so eliminates all the inter-node communications. This script
> also allows me to write to the existing swift directories as well as
> redirect to an empty structure so mimics clean environment with no existing
> subdirectories.

Yet that doesn't behave like an empty filesystem, which is clearly shown by
the fact that the caches are full of inodes that aren't being used by the
test. It also points out that allocation of new inodes will follow the old
logarithmic search speed degradation, because your kernel is sufficiently
old that it doesn't support the free inode btree feature...

> I'm attaching some xfs stats during the run and hope they're readable.
> These values are in operations/sec and each line is 1 second's worth of
> data. The first set of numbers is on the clean directory and the second on
> the existing 12M file one. At the bottom of these stats are also the xfs
> slab allocations as reported by collectl. I can also watch these during a
> test and can see the number of inode and ilo objects steadily grow at about
> 1K/sec, which is curious since I'm only creating about 300.

It grows at exactly the rate of the lookups being done, which is what is
expected. i.e. for each create being done, there are other lookups being
done first, e.g. directories, other objects to determine where to create
the new one, lookups that have to be done before removes (of which there
are a significant number), etc.

> If there is anything else I can provide just let me know.
>
> I don't fully understand all the xfs stats but what does jump out at me is
> the XFS read/write ops have increased by a factor of about 5 when the
> system is slower.

Which means your application is reading/writing 5x as much information from
the filesystem when it is slow. That's not a filesystem problem - your
application is having to traverse/modify 5x as much information for each
object it is creating/modifying. There's a good chance that's a result of
your massively wide object store directory hierarchy....

i.e. you need to start by understanding what your application is doing in
terms of IO, configuration and algorithms, and determine whether that is
optimal before you start looking at whether the filesystem is actually the
bottleneck.

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx
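As a rough illustration of the kind of measurement being suggested here,
the following Python sketch samples the global XFS directory-operation
counters and prints per-second deltas, which is essentially the data source
the collectl plugin and PCP read. It assumes the counters are exposed at
/proc/fs/xfs/stat on a "dir" line ordered lookup, create, remove, getdents;
the script itself is hypothetical and not part of any tool mentioned above.

import time

STATS = "/proc/fs/xfs/stat"

def dir_counters():
    """Return cumulative (lookup, create, remove, getdents) counts."""
    with open(STATS) as f:
        for line in f:
            fields = line.split()
            if fields and fields[0] == "dir":
                return tuple(int(v) for v in fields[1:5])
    raise RuntimeError("no 'dir' line found in " + STATS)

prev = dir_counters()
print("lookups/s  creates/s  removes/s  getdents/s")
while True:
    time.sleep(1)
    cur = dir_counters()
    print("  ".join("%9d" % (c - p) for c, p in zip(cur, prev)))
    prev = cur

Comparing these per-second rates against the number of objects the test
actually creates makes the per-object lookup amplification visible directly.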