On May 17, 2021, at 9:06 AM, David Howells <dhowells@xxxxxxxxxx> wrote:
> With filesystems like ext4, xfs and btrfs, what are the limits on directory
> capacity, and how well are they indexed?
>
> The reason I ask is that inside of cachefiles, I insert fanout directories
> inside index directories to divide up the space, since ext2 had limits on
> directory sizes and did linear searches (IIRC).
>
> For some applications, I need to be able to cache over 1M entries (render
> farm) and even a kernel tree has over 100k.
>
> What I'd like to do is remove the fanout directories, so that for each
> logical "volume"[*] I have a single directory with all the files in it.
> But that means sticking massive amounts of entries into a single directory
> and hoping it (a) isn't too slow and (b) doesn't hit the capacity limit.

Ext4 can comfortably handle ~12M entries in a single directory, if the
filenames are not too long (e.g. 32 bytes or so).  With the "large_dir"
feature (available since 4.13, but not enabled by default), a single
directory can hold around 4B entries, basically all the inodes of a
filesystem.

There are performance knees as the index grows to a new level (at roughly
50k and 10M entries, depending on filename length).

As described elsewhere in the thread, allowing concurrent create and unlink
in a directory (rename probably not needed) would be invaluable for scaling
multi-threaded workloads.  Neil Brown posted a prototype patch to add this
to the VFS for NFS:
https://lore.kernel.org/lustre-devel/8736rsbdx1.fsf@xxxxxxxxxxxxxxxxxxxxxxxx/

Maybe it's time to restart that discussion?

Cheers, Andreas
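
PS: for anyone who wants to experiment, large_dir can be set per-filesystem
with e2fsprogs; a minimal sketch (the device name is just a placeholder):

    # assumes e2fsprogs with large_dir support (1.44+); /dev/sdX is a
    # placeholder for the target block device
    mkfs.ext4 -O large_dir /dev/sdX           # enable at mkfs time
    tune2fs -O large_dir /dev/sdX             # or add it to an existing fs
    dumpe2fs -h /dev/sdX | grep -i features   # confirm the feature flag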
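
And a rough way to see the knees for yourself is to time batched creates as
the directory grows (paths and batch sizes here are arbitrary; note the
filenames are short digit strings, so the knees shift somewhat relative to
the ~32-byte-name numbers above):

    # hypothetical micro-benchmark: time each batch of 100k creates and
    # watch for jumps in the per-batch time around ~50k and ~10M entries
    mkdir -p /mnt/test/bigdir && cd /mnt/test/bigdir
    for batch in $(seq 1 120); do
            start=$(date +%s.%N)
            seq $(( (batch - 1) * 100000 )) $(( batch * 100000 - 1 )) | xargs touch
            echo "batch $batch: $(echo "$(date +%s.%N) - $start" | bc)s"
    done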