On Mon, May 17, 2021 at 04:06:58PM +0100, David Howells wrote:
> Hi,
>
> With filesystems like ext4, xfs and btrfs, what are the limits on directory
> capacity, and how well are they indexed?
>
> The reason I ask is that inside of cachefiles, I insert fanout directories
> inside index directories to divide up the space for ext2 to cope with the
> limits on directory sizes and that it did linear searches (IIRC).

Don't do that for XFS. XFS directories have internal hashed btree
indexes that are far more space efficient than using fanout in
userspace. i.e. the XFS hash index uses 8 bytes per dirent, so with a
4kB directory block size it can index about 500 entries per block.
And being O(log N) for lookup, insert and remove, the fan-out within
the directory hash per IO operation is an order of magnitude higher
than using directories in userspace.

The capacity limit for XFS is 32GB of dirent data, which generally
equates to somewhere around 300-500 million dirents depending on
filename size. The hash index is separate from this limit (it has its
own 32GB address segment, as does the internal freespace map for the
directory).

The other design characteristic of XFS directories is that readdir is
always a sequential read through the dirent data with built-in
readahead. It does not need to look up the hash index to determine
where to read the next dirents from - that's a straight "file offset
to physical location" lookup in the extent btree, which is always
cached in memory. So that's generally not a limiting factor, either.

> For some applications, I need to be able to cache over 1M entries (render
> farm) and even a kernel tree has over 100k.

Not a problem for XFS with a single directory, but could definitely be
a problem for others, especially as the directory grows and shrinks.
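
As a rough sketch of the index arithmetic for those entry counts: the
helper names below are made up for illustration, the calculation
ignores per-block headers, and it assumes interior index blocks fan
out about as widely as the leaf blocks. It is a back-of-the-envelope
estimate, not a description of the on-disk format.

```python
import math

DIR_BLOCK_SIZE = 4096   # 4kB directory block size, as quoted above
INDEX_ENTRY_SIZE = 8    # bytes of hash index per dirent, as quoted above


def entries_per_block(block_size=DIR_BLOCK_SIZE, entry_size=INDEX_ENTRY_SIZE):
    """Upper bound on index entries per block (assumption: no headers)."""
    return block_size // entry_size


def index_levels(dirents, fanout):
    """Estimate how many index levels are needed to cover `dirents` entries.

    Assumption: every level, leaf or interior, holds `fanout` entries
    per block.
    """
    levels = 1
    blocks = math.ceil(dirents / fanout)
    while blocks > 1:
        blocks = math.ceil(blocks / fanout)
        levels += 1
    return levels


if __name__ == "__main__":
    print("raw entries per 4kB block:", entries_per_block())
    # Use the rounded-down "about 500" figure to leave room for headers.
    for count in (100_000, 1_000_000, 10_000_000):
        print(count, "dirents ->", index_levels(count, 500), "index levels")
```

Under these assumptions, going from 100k to 1M entries adds only one
index level, which is the sense in which lookup cost grows as O(log N)
rather than with directory size.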
Last I measured, ext4 directory perf drops off at about 80-90k entries
using 40 byte file names, but you can get an idea of XFS directory
scalability with large entry counts in commit 756c6f0f7efe ("xfs:
reverse search directory freespace indexes"). I'll reproduce the table
using a 4kB directory block size here:

	File count	create time (sec) / rate (files/s)
	 10k		  0.41 / 24.3k
	 20k		  0.75 / 26.7k
	100k		  3.27 / 30.6k
	200k		  6.71 / 29.8k
	  1M		 37.67 / 26.5k
	  2M		 79.55 / 25.2k
	 10M		552.89 / 18.1k

So that's single threaded file create, which shows the rough limits of
insert into a large directory. There really isn't a major drop-off
in performance until there are several million entries in the
directory. Remove is roughly the same speed for the same dirent count.

> What I'd like to do is remove the fanout directories, so that for each logical
> "volume"[*] I have a single directory with all the files in it.  But that
> means sticking massive amounts of entries into a single directory and hoping
> it (a) isn't too slow and (b) doesn't hit the capacity limit.

Note that if you use a single directory, you are effectively single
threading modifications to your file index. You still need to use
fanout directories if you want concurrency during modification for the
cachefiles index, but that's a different design criterion compared to
directory capacity and modification/lookup scalability.

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx