Re: Question about optimal filesystem with many small files.

oooooooooooo ooooooooooooo wrote:
>> You can hash it and still keep the original filename, and you don't
>> even need a MySQL database to do lookups.
> 
> There is an issue I forgot to mention: the original file name can be up to 1023 characters long. As Linux only allows 255 characters per path component, I could have a (very small) number of collisions; that's why my original idea was a hash->filename table. So I'm not sure if I could implement that idea in my scenario.
> 
>> For instance: example.txt ->
>> e7/6f/example.txt. That might (or might not) give you a better
>> performance.
> 
> After a quick calculation, that could put around 3200 files per directory (I have around 15 million files). I think that above 1000 files per directory the performance will start to degrade significantly; anyway, it would be a matter of doing some benchmarks.
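The idea is roughly this (a quick Python sketch of my own, not code from any of the tools mentioned below; the NAME_MAX handling and the two-level depth are just assumptions for illustration):

import hashlib
import os

NAME_MAX = 255   # per-component limit on typical Linux filesystems

def hashed_path(root, name):
    # Spread files over root/xx/yy/ using the first two bytes of
    # the MD5 of the name; the original name is kept as the leaf.
    digest = hashlib.md5(name.encode("utf-8")).hexdigest()
    stored = name
    if len(stored.encode("utf-8")) > NAME_MAX:
        # Too long to store verbatim: truncate and append the hash,
        # so truncated names can't collide with each other.
        raw = stored.encode("utf-8")[:NAME_MAX - 33]
        stored = raw.decode("utf-8", "ignore") + "-" + digest
    return os.path.join(root, digest[:2], digest[2:4], stored)

print(hashed_path("/var/spool/files", "example.txt"))

Note that two hex-byte levels give 256 * 256 = 65536 leaf directories, so 15 million files works out to roughly 230 per directory, well under the 1000 you mention. And since an over-long name carries its own hash after truncation, those rare collisions shouldn't need a separate lookup table.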

There's C code to do this in squid, and backuppc does it in perl (for a 
pool directory where all identical files are hardlinked).  Source for 
both is available, and it might be worth looking at their choices of 
tree depth and collision handling (backuppc actually hashes the file 
content, not the name, though).
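The pooling idea is roughly this (a toy Python sketch of content-hash
hardlinking; the layout and names are illustrative, not backuppc's
actual scheme):

import hashlib
import os

def pool_file(pool_root, src):
    # Hash the file *content*, not the name.
    h = hashlib.sha1()
    with open(src, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    digest = h.hexdigest()
    pooled = os.path.join(pool_root, digest[:2], digest[2:4], digest)
    os.makedirs(os.path.dirname(pooled), exist_ok=True)
    if not os.path.exists(pooled):
        os.link(src, pooled)   # first copy seeds the pool
    else:
        os.unlink(src)
        os.link(pooled, src)   # identical content becomes a hardlink
    return pooled

(src and the pool have to live on the same filesystem for the
hardlinks to work, of course.)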

-- 
   Les Mikesell
    lesmikesell@xxxxxxxxx

_______________________________________________
CentOS mailing list
CentOS@xxxxxxxxxx
http://lists.centos.org/mailman/listinfo/centos
