On Thu, Sep 26, 2013 at 11:26:47AM -0400, Jay Ashworth wrote:
> ----- Original Message -----
> > From: "Joe Landman" <joe.landman@xxxxxxxxx>
> >
> > > takes. The folders are image folders that have anywhere between 5 to
> > > 10 million images in each folder.
> >
> > The combination of very large folders and virtualization is working
> > against you. Couple that with an old (ancient by Linux standards) xfs
> > in the virtual CentOS 5.9 system, and you aren't going to have much
> > joy with this without changing a few things.
> >
> > Can you change from one single large folder to a hierarchical set of
> > folders? The single large folder means any metadata operation (ls,
> > stat, open, close) has a huge set of lists to traverse. It will work,
> > albeit slowly. As a rule of thumb, we try to make sure our users don't
> > go much beyond 10k files/folder. If they need to, building a hierarchy
> > of folders slightly increases management complexity, but keeps the
> > lists that need to be traversed much smaller.
> >
> > A strategy for doing this: if your files are named "aaaa0001",
> > "aaaa0002" ... "zzzz9999" or similar, then you can chop off the first
> > letter, make a directory of it, and then put all files starting with
> > that letter in that directory. Then within each of those directories,
> > do the same thing with the second letter. This gets you 676
> > directories and about 15k files per directory. Much faster directory
> > operations. Much smaller lists to traverse.
>
> While this problem isn't *near* as bad on XFS as it was on older
> filesystems, where over maybe 500-1000 files would result in 'ls'
> commands taking over a minute...

Assuming a worst case, 500-1000 files requires 700-1200 IOs for ls to
complete. If that's taking over a minute, then you're getting less than
10-20 IOPS for the workload, which is about 10% of the capability of a
typical SATA drive. This sounds to me like there was lots of other stuff
competing for IO bandwidth at the same time, or something else wrong, to
result in such poor performance for ls.

> It's still a good idea to filename-hash large collections of files of
> similar types into a directory tree, as Joe recommends. The best
> approach I myself have seen to this is to hash a filename of
>
> 835bfak3f89yu12.jpg
>
> into
>
> 8/3/5/b/835bfak3f89yu12.jpg
> 8/3/5/b/f/835bfak3f89yu12.jpg
> 8/3/5/b/f/a/835bfak3f89yu12.jpg

No, not on XFS. Here you have a fanout per level of 16; i.e. consider a
tree with a fanout of 16. To move from level to level, it takes 2 IOs.

Let's consider the internal hash btree in XFS. A 4k directory block fits
about 500 entries - call it 512 to make the math easy - so it is a tree
with a fanout per level of 512. To move from level to level, it takes
1 IO.

> 8/3/5/b/f/a/835bfak3f89yu12.jpg

Here we have 6 levels of hash, that's 16^6 = 16.7M fanout. With a fanout
of 512, the internal XFS hash btree needs only 3 levels (64 * 512 * 512)
to index the same number of directory entries.

So, for a lookup on the hashed tree, it takes 12 IOs to get to the leaf
directory, then as many IOs as are required to look up the entry in the
leaf directory. For a single large XFS directory, it takes 3 IOs to find
the dirent and another 1 to read the dirent and return it to userspace,
i.e. 4 IOs total vs 12 + N IOs for the equivalent 16-way hash of the
same depth...
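For anyone who wants to poke at the arithmetic, here's a rough
back-of-the-envelope sketch in Python. It's purely illustrative: the
fanout and IOs-per-level figures are the ones assumed above, not
measured numbers, and the function names are just made up for the
example.

def levels_needed(total_entries, fanout):
    # Number of tree levels needed to index total_entries entries
    # with the given per-level fanout.
    levels, capacity = 1, fanout
    while capacity < total_entries:
        capacity *= fanout
        levels += 1
    return levels

entries = 16 ** 6    # ~16.7M files, as in the 6-level hash example

# 6-level, 16-way filename hash: ~2 IOs to cross each level, plus
# however many IOs the lookup in the leaf directory costs (the "N").
deep_hash_ios = levels_needed(entries, 16) * 2

# Single large XFS directory: ~512-way internal btree fanout at ~1 IO
# per level, plus 1 IO to read the dirent block back to userspace.
xfs_dir_ios = levels_needed(entries, 512) * 1 + 1

print("deep 16-way hash : ~%d IOs + N for the leaf directory" % deep_hash_ios)
print("single XFS dir   : ~%d IOs total" % xfs_dir_ios)

Running that reproduces the 12 + N vs 4 IO comparison above.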
What I am trying to point out is that on XFS deep hashing will not
improve performance like it might on ext4 - on XFS you should look to
use wide, shallow directory hashing with relatively large numbers of
entries in each leaf directory, because the internal directory
structure is much more efficient from an IO perspective than hashing
is...

And then, of course, if directory IO is still the limiting factor with
large numbers of leaf entries (e.g. you're indexing billions of files),
you have the option of using larger directory blocks and making the
internal directory fanout up to 16x wider than in this example...

> Going as deep as necessary to reduce the size of the directories. What
> you lose in needing to cache the extra directory levels outweighs
> (probably far outweighs) having to handle Directories Of Unusual Size.

On XFS, a directory with a million entries is not an unusual size - with
a 4k directory block size the algorithms are still pretty CPU-efficient
at that point, though it's going to be at roughly half the performance
of an empty directory. It's once you get above several million entries
that the modification cost starts to dominate performance
considerations, and at that point a wider hash, not a deeper one, should
be considered.

> Note that I didn't actually trim the filename proper; the final file
> still has its full name. This hash is easy to build, as long as you
> fix the number of layers in advance... and if you need to make it
> deeper, later, it's easy to build a shell script that crawls the
> current tree and adds the next layer.

Avoiding the need to rebalance a directory hash is one of the reasons
for designing it around a scalable directory structure in the first
place. It pretty much means the only consideration for the width of the
hash and the underlying filesystem layout is the concurrency your
application requires.

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx

_______________________________________________
xfs mailing list
xfs@xxxxxxxxxxx
http://oss.sgi.com/mailman/listinfo/xfs