On Tue, Feb 14, 2012 at 01:32:00PM +0100, Richard Ems wrote:
> On 02/14/2012 01:09 AM, Dave Chinner wrote:
> >> I am asking because I am seeing very long times while removing big
> >> directory trees. I thought on kernels above 3.0 removing dirs and files
> >> had improved a lot, but I don't see that improvement.
> >
> > You won't if the directory traversal is seek bound and that is the
> > limiting factor for performance.
>
> *Seek bound*? *When* is the directory traversal *seek bound*?

Whenever you are traversing a directory structure that is not already
hot in the cache. IOWs, almost always.

> >> This is a backup system running dirvish, so most files in the dirs I am
> >> removing are hard links. Almost all of the files do have ACLs set.
> >
> > The unlink will have an extra IO to read per inode - the out-of-line
> > attribute block, so you've just added 11 million IOs to the 800,000
> > the traversal already takes to the unlink overhead. So it's going to
> > take roughly ten hours because the unlink is going to be read IO seek
> > bound....
>
> It took 110 minutes and not 10 hours. All files and dirs there had ACLs set.

I was basing that on your "find dir" time of 100 minutes, which was
the only number you gave, and making the assumption it didn't read
the attribute blocks and that it was seeing worst case seek times
(i.e. avg seek times) for every IO.

Given the way locality works in XFS, I'd suggest that the typical
seek time will be much less (a few blocks, not half the disk
platter) and not necessarily on the same disk (due to RAID) so the
average seek time for your workload is likely to be much lower. If
it's at 1ms (closer to track-to-track seek times) instead of the
5ms, then that 10hrs becomes 2hrs for that many IOs....

> > Also, for large directories like this (millions of entries) you
> > should also consider using a larger directory block size (mkfs -n
> > size=xxxx option) as that can be scaled independently of the
> > filesystem block size.
> > This will significantly decrease the amount
> > of IO and fragmentation large directories cause. Peak modification
> > performance of small directories will be reduced because larger
> > block size directories consume more CPU to process, but for large
> > directories performance will be significantly better as they will
> > spend much less time waiting for IO.
>
> This was not ONE directory with that many files, but a directory
> containing 834591 subdirectories (deeply nested, not all in the same
> dir!) and 10539154 files.

So you've got a directory *tree* that indexes 11 million inodes, not
"one directory with 11 million files and dirs in it" as you
originally described. Both Christoph and I have interpreted your
original description as "one large directory", but there's no need
to shout at us because it's difficult to understand any given
configuration from just a few lines of text. IOWs, details like "one
directory" vs "one directory tree" might seem insignificant to you,
but they mean an awful lot to us developers and can easily lead us
down the wrong path.

FWIW, directory tree traversal is even more read IO latency
sensitive than a single large directory traversal because we can't
do readahead across directory boundaries to hide seek latencies as
much as possible and the locality on individual directories can be
very different depending on the allocation policy the filesystem is
using. As it is, large directory blocks can also reduce the amount
of IO needed in this sort of situation and speed up traversals....

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx

_______________________________________________
xfs mailing list
xfs@xxxxxxxxxxx
http://oss.sgi.com/mailman/listinfo/xfs
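
For reference, the seek-bound reasoning in the message above (total
time is roughly the number of read IOs multiplied by the per-IO seek
latency) can be sketched as below. This is only a back-of-envelope
model, not anything taken from XFS itself: the function names are
hypothetical, and it assumes every IO pays one full seek with no
caching, readahead, or parallelism across RAID members.

```python
def seek_bound_hours(io_count, seek_ms):
    """Hours needed if every IO waits for one seek of seek_ms milliseconds.

    Assumption: IOs are strictly serial and nothing is served from cache.
    """
    return io_count * seek_ms / 1000 / 3600


def rescale_hours(hours, old_seek_ms, new_seek_ms):
    """Rescale an existing estimate when only the per-IO seek time changes.

    In a purely seek-bound workload the total time is linear in the
    seek latency, so the estimate scales by new_seek_ms / old_seek_ms.
    """
    return hours * new_seek_ms / old_seek_ms


if __name__ == "__main__":
    # The 10 hour figure assumed 5ms seeks; with 1ms seeks the same
    # number of IOs takes one fifth of the time.
    print(rescale_hours(10.0, 5.0, 1.0))
```

The point of the second helper is that the estimate is only as good
as the assumed seek time: shorter seeks from good on-disk locality
shrink the total proportionally.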