Just to add, I can attest that moving the files from the old directory to the new one as described improves performance on my machines dramatically. In our service we end up with directories of 150k+ files which are generally touched only as they are added, though every file will be touched several times over a month. The files are each around 50kB. Once the directory entry gets to be about 4MB it begins to take a long time for remote machines to copy files into the directory, maybe 4 seconds for a 50kB file on a switched 100BaseT network. The performance hit is worst for remote machines using SMB.

Compressing the directory entry with

    mkdir new
    cp old/* new/
    rm -rf old
    mv old new

definitely improves things, but generally once there are more than 200k files we have to roll over to a new directory to keep things moving. I suspect the remote machines are effectively downloading the directory entry with each copy to the server, but I also see the smbd tasks pegging on the server; I never really investigated it. We see this with both ext2 and ext3.

I'm not really looking for a solution here, just offering the info, but if anyone has a quick fix please share it. I may try reiserfs someday, but for now we just use thousands of directories for the files.

Mike

> "Alan R. Becker" <beckera@mail-now.com> wrote:
> >
> > (1) Is the assumption that directories don't compress when deleting
> > files correct? How is this handled (in general terms)?
>
> That is correct. A deleted file leaves a "hole" in the directory
> which a new addition can fill (if it fits).
>
> > (2) Is there any difference between ext2 and ext3?
>
> No.
>
> > (3) Does the htree code change the picture any (even
> > though I don't use it, and won't until it is production)?
>
> No, htree will not release directory blocks.
>
> > (4) Is it possible that the directories themselves
> > were fragmented?
>
> Yes, very probable.
>
> However, to understand why things slowed down, a bit more info is needed.
>
> It is probable that the many little files in one typical directory are
> splattered all over the disk. Does your workload regularly touch all the
> files in these directories? If so then it may be suffering from this lack
> of inter-file locality.
>
> If not then yes, perhaps the problem is due to large, fragmented
> directories.
>
> How many bytes does a typical directory consume? If you have the disk
> space, and are confident that (say) 64k is "enough", then perhaps you
> could grow each user's mail directory to (say) 64k when that user is
> created. This way they will have a nice unfragmented directory for all
> time.
>
> > (5) After doing a "mkdir" to create a new directory, how many
> > file entries can it hold before it would be expanded to accept
> > another file?
>
> 4 kilobytes. Each directory entry consumes eight bytes, plus the length
> of the name rounded up to a multiple of 4 bytes.
>
> > When a directory is expanded, how many additional
> > file entries can be stored before needing another expansion?
>
> Another 4 kilobytes.
>
> > (6) Say I have a directory containing some files, then I delete
> > some files, and finally I start adding files. Will new file
> > entries use empty or vacated directory slots before expanding
> > the directory?
>
> Deletion causes holes. Holes are coalesced within a 4k block, and new
> entries are allocated from holes on a first-fit basis.
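[Mike again, interjecting.] If anyone wants to watch the grow-but-never-shrink
behaviour for themselves, something like the following should show it. This is
just a sketch, assuming GNU coreutils stat and a bash-style shell, with demo/
as a throwaway name:

    mkdir demo
    stat -c '%s bytes' demo    # a fresh directory occupies one 4k block
    touch demo/f.{1..2000}     # add 2000 entries via brace expansion
    stat -c '%s bytes' demo    # the directory has grown by several 4k blocks
    rm demo/f.*                # delete every file
    stat -c '%s bytes' demo    # unchanged: ext2/ext3 never release the blocks

The third number matches the second, which is exactly why the
copy-into-a-fresh-directory trick helps.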
> > (7) I am aware of e2defrag (latest version I have found is 0.73).
> > Does this program (or any other tool) perform any
> > directory optimization that would affect this problem?
>
> It's obsolete.
>
> For your purposes, all you'd need to do to defrag a directory is
>
>     mkdir new
>     ln old/* new/
>     rm -rf old
>     mv old new
>
> If you use `cp' instead of `ln' then you'll defrag the files themselves,
> and lay them out close to each other. This is only important if your app
> regularly touches lots of files in a single directory. It probably does
> not.
>
> > (8) If e2defrag would be helpful, has it/is it being brought
> > forward to operate correctly with current (RH 8/9) systems?
> > I see some warnings about blocksize restrictions, etc.
>
> I haven't heard of anyone using it in ages.
>
> > (9) In designing new systems, are there some useful guidelines
> > about the maximum number of files that can exist in a single
> > directory without significant performance loss?
> > I am interested in ext2, ext3, and htree.
>
> Non-htree gets awkward at a few thousand files. htree appears to be OK
> up to hundreds of thousands. Its practical scalability is unknown, really.
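For what it's worth, the "thousands of directories" workaround I mentioned at
the top boils down to hashing each file name into a small bucket directory so
that no single directory ever grows large. A rough sketch, untested as written
(the spool/ path and the store name are invented for illustration, and it
assumes coreutils md5sum):

    # Hash a file into one of 256 bucket directories. Taking the first
    # three hex characters instead (cut -c1-3) gives 4096 buckets.
    store() {
        f=$1
        bucket=$(printf '%s' "$f" | md5sum | cut -c1-2)
        mkdir -p "spool/$bucket"
        mv -- "$f" "spool/$bucket/"
    }

With 150k files spread over 4096 buckets, each directory holds a few dozen
entries, which fits comfortably within a single 4k directory block.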