Re: ext2/ext3 directory handling

On Sunday 18 May 2003 18:01, Michael Harris wrote:
> Just to add, I can attest that moving the files from the old dir to the new
> as described improves performance on my machines dramatically. In our
> service we end up with directories of 150k+ files which are generally
> touched only as they are added, though every file will be touched several
> times over a month. The files are each around 50kB. When the directory
> entry gets to be about 4MB it begins to take a long time for remote
> machines to copy files into the directory, maybe 4 seconds for a 50kB file
> on a switched 100 base network. The performance hit is worst with remote
> machines using SMB. Compressing the directory entry with
>  	mkdir new
>  	cp old/* new/
>  	rm -rf old
>  	mv new old
> definitely improves things, but generally when there gets to be more than
> 200k files we have to roll over to a new directory to keep things moving. I
> suspect the remote machines are effectively downloading the directory entry
> with each copy to the server. I also see the smbd tasks pegging on the
> server, but I never really investigated it. We see this with ext2 and
> ext3. I'm not really looking for a solution here, just offering the info,
> but if anyone has a quick fix please share it. I may try reiserfs someday,
> but for now we just use thousands of directories for the files.
> Mike
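
The "thousands of directories" workaround can be made deterministic by hashing each file name to pick its subdirectory. What follows is only a minimal sketch, assuming POSIX sh and cksum. NBUCKETS, bucket_for and store_file are hypothetical names chosen for illustration, not anything taken from the setup described above.

```shell
#!/bin/sh
# Hypothetical bucket count; choose it so each bucket stays small.
NBUCKETS=1024

# Print a zero-padded bucket name for a file name (illustrative helper).
bucket_for() {
    # cksum prints "<crc> <byte count>"; keep only the CRC.
    crc=$(printf '%s' "$1" | cksum | cut -d ' ' -f 1)
    printf '%04d\n' "$(( crc % NBUCKETS ))"
}

# Move a file into its bucket below a tree root (illustrative helper).
# $1 = source file, $2 = root directory of the bucketed tree
store_file() {
    bucket=$(bucket_for "$(basename "$1")")
    mkdir -p "$2/$bucket" && mv "$1" "$2/$bucket/"
}
```

With 1024 buckets, 200k files average out to roughly 200 per directory, and any reader can recompute the bucket from the file name alone.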

How do you normally push the files when you are not using SMB?

>
> > "Alan R.Becker" <beckera@mail-now.com> wrote:
> > > (1) Is the assumption that directories don't compress when deleting
> > > files correct?  How is this handled (in general terms)?
> >
> > That is correct.  A deleted file leaves a "hole" in the directory
> > which a new addition can fill (if it fits).
> >
> > > (2) Is there any difference between ext2 and ext3?
> >
> > No.
> >
> > > (3) Does the htree code change the picture any (even
> > > though I don't use it, and won't until it is production-ready)?
> >
> > No, htree will not release directory blocks.
> >
> > > (4) Is it possible that the directories themselves
> > > were fragmented?
> >
> > Yes, very probable.
> >
> > However to understand why things slowed down a bit more info is needed.
> >
> > It is probable that the many little files in one typical directory are
> > splattered all over the disk.  Does your workload regularly touch all the
> > files in these directories?  If so then it may be suffering from this lack
> > of inter-file locality.
> >
> > If not then yes, perhaps the problem is due to large, fragmented
> > directories.
> >
> > How many bytes does a typical directory consume?  If you have the disk
> > space, and are confident that (say) 64k is "enough" then perhaps you
> > could grow each user's mail directory to (say) 64k when that user is
> > created. This way they will have a nice unfragmented directory for all
> > time.
> >
> > > (5) After doing a "mkdir" to create a new directory, how many
> > > file entries can it hold before it would be expanded to accept
> > > another file?
> >
> > 4 kilobytes.  Each directory entry consumes eight bytes, plus the length
> > of the name rounded up to a multiple of 4 bytes.
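
The arithmetic above can be sketched as a small shell calculation. This assumes the 4 KB directory block and the 8-byte-plus-padded-name entry size quoted in the answer. dirent_size and entries_per_block are hypothetical helper names, and the result is an upper bound, since it ignores the "." and ".." entries and any slack at the end of a block.

```shell
#!/bin/sh
BLOCK_SIZE=4096   # directory block size quoted above

# Bytes taken by one entry: 8-byte header plus the name padded
# up to a multiple of 4 bytes. $1 = length of the file name.
dirent_size() {
    echo $(( 8 + ( ($1 + 3) / 4 ) * 4 ))
}

# Upper bound on how many entries with that name length fit in one block.
entries_per_block() {
    echo $(( BLOCK_SIZE / $(dirent_size "$1") ))
}

dirent_size 12         # prints 20
entries_per_block 12   # prints 204
```

As a rough cross-check, and assuming names of about 17 to 20 characters (28 bytes per entry), 150k files come to about 4.2 MB of directory, close to the 4MB figure reported earlier in the thread.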
> >
> > > When a directory is expanded, how many additional
> > > file entries can be stored before needing another expansion?
> >
> > Another 4 kilobytes.
> >
> > > (6) Say I have a directory containing some files, then I delete
> > > some files, and finally I start adding files.  Will new file
> > > entries use empty or vacated directory slots before expanding
> > > the directory?
> >
> > Deletion causes holes.  Holes are coalesced within a 4k block.  New
> > entries are allocated from holes on a first-fit basis.
> >
> > > (7) I am aware of e2defrag (latest version I have found is 0.73).
> > > Does this program (or any other tool) perform any
> > > directory optimization that would affect this problem?
> >
> > It's obsolete.
> >
> > For your purposes, all you'd need to do to defrag a directory is
> >
> > 	mkdir new
> > 	ln old/* new/
> > 	rm -rf old
> > 	mv new old
> >
> > If you use `cp' instead of `ln' then you'll defrag the files themselves,
> > and lay them out close to each other.  That is only important if your app
> > regularly touches lots of files in a single directory.  It probably does
> > not.
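
For very large directories the same rebuild can be wrapped in a small function, sketched here for POSIX sh. It makes several assumptions: the directory is flat, nothing is writing to it during the rebuild, and the scratch directory (the hypothetical name "$dir.new") sits on the same filesystem so that hard links work. rebuild_dir is an illustrative name, not an existing tool.

```shell
#!/bin/sh
# Rebuild a flat directory so its entries are packed into fresh blocks.
# $1 = directory to rebuild
rebuild_dir() {
    dir=${1%/}                 # drop a trailing slash, if any
    new=$dir.new               # hypothetical scratch name
    mkdir "$new" || return 1
    # Loop instead of "ln old/* new/" so a huge directory cannot
    # overflow the argument list of a single ln invocation.
    for f in "$dir"/* "$dir"/.[!.]* "$dir"/..?*; do
        # Skip patterns that matched nothing.
        [ -e "$f" ] || [ -L "$f" ] || continue
        # Hard link: file data is not copied. ln refuses directories,
        # so a non-flat directory aborts here with the original intact.
        ln "$f" "$new/" || return 1
    done
    rm -rf "$dir" && mv "$new" "$dir"
}
```

Replacing ln with cp -p in the loop would also rewrite the file data, which matters only in the case described above where the application regularly touches many files in one directory.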
> >
> > > (8) If e2defrag would be helpful, has it/is it being brought
> > > forward to operate correctly with current (RH 8/9) systems?
> > > I see some warnings about blocksize restrictions, etc.
> >
> > I haven't heard of anyone using it in ages.
> >
> > > (9) In designing new systems, are there some useful guidelines
> > > about the maximum number of files that can exist in a single
> > > directory without significant performance loss?
> > > I am interested in ext2, ext3, and htree.
> >
> > Non-htree gets awkward at a few thousand.  htree appears to be OK up to
> > hundreds of thousands.  Its practical scalability is unknown, really.
> >
> >
> > _______________________________________________
> > 
> > Ext3-users@redhat.com
> > https://www.redhat.com/mailman/listinfo/ext3-users

-- 
e-admin internet gmbh
Andreas Gietl                                            tel +49 941 3810884
Ludwig-Thoma-Strasse 35                      fax +49 89 244329104
93051 Regensburg                                  mobil +49 171 6070008

PGP/GPG-Key unter http://www.e-admin.de/gpg.html




