Re: The maximum number of files under a folder

What does the h stand for in h-tree? Like the b in btree is binary tree?



Stephen Samuel wrote:
The OS will have to search the directory to see if the file already exists before creating it.

Well, if you hash it such that it splits up something like:
jobid(upper part)/jobid(lower part)[/-]timestamp-process,
you'll find that your access times will be much faster (especially if you don't use H-Trees). This also applies if you're just creating a file, because you'll have to search the entire directory to see if that filename exists.
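
As a rough illustration of that split, here is a minimal Python sketch. It assumes the job id is an integer and that the "lower part" is its low 8 bits; the function name, the hex directory names, and the root directory are made up for the example and are not from any real tool.

```python
import posixpath

def hashed_path(root, job_id, timestamp, process, low_bits=8):
    """Build root/<upper>/<lower>/<timestamp>-<process> from a numeric job id.

    Hypothetical helper: the 8-bit lower part and hex naming are assumptions.
    """
    upper = job_id >> low_bits                # upper part of the job id
    lower = job_id & ((1 << low_bits) - 1)    # lower part (low_bits wide)
    leaf = f"{timestamp}-{process}"           # file name inside the leaf directory
    return posixpath.join(root, f"{upper:x}", f"{lower:02x}", leaf)
```

Each lookup then only has to scan the small upper and lower directories instead of one huge flat one.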

With regular directories, the time to search for an existing file increases linearly with the number of entries. If you hash on 3 levels with 8 bits per level, you'll have to open 2 or 3 extra inodes, but you'll cut your directory search times down by a factor of roughly 20000 to 1 (= 2^24/256/3). You'll also skip having to deal with any sort of directory-size limit.
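
To spell out where the 2^24/256/3 figure comes from, here is the arithmetic as a small Python sketch. It assumes the worst case of a linear scan that looks at every entry, and the variable names are just for illustration.

```python
LEVELS = 3            # directory levels in the hashed layout
BITS_PER_LEVEL = 8    # 8 bits per level, so 256 entries per directory

entries_per_dir = 2 ** BITS_PER_LEVEL            # 256
total_entries = 2 ** (LEVELS * BITS_PER_LEVEL)   # 2^24 entries in one flat directory

flat_scan = total_entries                # entries examined in a single flat directory
hashed_scan = LEVELS * entries_per_dir   # 3 directories of 256 entries each

speedup = flat_scan / hashed_scan        # 2^24 / 256 / 3
print(f"about {speedup:.0f} to 1")
```

That works out to a bit over 21000 to 1, which is where the "20000" round number comes from.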

I did something similar on a Solaris box which had 200000 emails in the /var/spool/mqueue directory. That many messages were slowing the system to a crawl. I hashed it into 100 directories with 2000 entries each, and it sped things up *enormously.*
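
For anyone wanting to try the same thing, a minimal Python sketch of spreading files over 100 subdirectories might look like this. The CRC32 hash and the two-digit bucket names are assumptions for the example, not necessarily what was done on that Solaris box.

```python
import posixpath
import zlib

def bucket_path(spool_dir, filename, buckets=100):
    """Map a file name to spool_dir/<bucket>/<filename>.

    Hypothetical helper: any stable hash of the name would do; CRC32 is
    used here only because it is in the standard library.
    """
    bucket = zlib.crc32(filename.encode("utf-8")) % buckets
    return posixpath.join(spool_dir, f"{bucket:02d}", filename)
```

Because the bucket is computed from the name, the same file always lands in (and is looked up from) the same subdirectory, so nothing ever has to scan all 100.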

On Tue, Mar 18, 2008 at 3:56 PM, Andreas Dilger <adilger@xxxxxxx> wrote:

    On Mar 17, 2008  09:32 -0400, Theodore Ts'o wrote:
    > On Mon, Mar 17, 2008 at 03:40:36PM +0800, liuyue wrote:
    > > Theodore Tso,
    > >
    > >     In 64bit system, directory size can not be bigger than 2GB?
    >
    > No, because the high 32-bits for i_size are overloaded to store the
    > directory creation acl.

    I think we should change the code (kernel and e2fsprogs) to allow
    i_size_high for directories also.

    > In practice, you really don't want to have a directory that huge
    > anyway.  Iterating through it all with readdir() gets horribly slow,
    > and applications that try to do anything with really huge directories
    > would be well advised to use a database, because they will get *much*
    > better performance that way....

    Actually, for many HPC applications they never do readdir at all.
    The job creates 1 file/process and always uses a predefined filename
    like {job}-{timestamp}-{process} that it will directly look up.

    Cheers, Andreas




--
Stephen Samuel http://www.bcgreen.com
778-861-7641

_______________________________________________
Ext3-users mailing list
Ext3-users@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/ext3-users
