Re: Fwd: problems with large directories?

On 03/09/2010 09:36 AM, Charles Riley wrote:
Sorry, I meant to send this to the list, not just Ric.


----- Forwarded Message -----
From: "Charles Riley"<criley@xxxxxxxx>
To: "Ric Wheeler"<rwheeler@xxxxxxxxxx>
Sent: Tuesday, March 9, 2010 9:34:25 AM GMT -05:00 US/Canada Eastern
Subject: Re: problems with large directories?




----- "Ric Wheeler"<rwheeler@xxxxxxxxxx>  wrote:

On 03/08/2010 08:23 PM, Mitch Trachtenberg wrote:
Hi,

I have an application that deals with 100,000 to 1,000,000 image files.

I initially structured it to use multiple directories, so that file 123456
would be stored in /12/34/123456.  I'm now wondering if that's pointless,
as it would simplify things to store the file directly in /123456.

Can anyone indicate whether I'm gaining anything by using smaller
directories in ext3/ext4?  Thanks.

Mitch


I think that breaking up your files into subdirectories makes it easier to
navigate the tree and find files from a human point of view. Even better if
the path components reflect something like year/month/day/hour/min (assuming
your pathname has a date-based GUID or similar encoding).
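
A rough sketch of that kind of date-based layout (the helper name and the
directory depth below are illustrative only, not anything from this thread):

    import os
    from datetime import datetime, timezone

    def date_based_path(root, filename, when=None):
        """Place a file under root/YYYY/MM/DD/HH/MM/ so every directory
        stays small and a human can navigate the tree by date."""
        when = when or datetime.now(timezone.utc)
        return os.path.join(root, when.strftime("%Y/%m/%d/%H/%M"), filename)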

You can have a million files in one large directory, but be careful to
iterate and copy them in a sorted order (sorted by inode) to avoid nasty
performance issues that are side effects of the way we hash file names in
ext3/4.

Good luck!

Ric


Hi Ric,

Can you elaborate on the performance issues you mention above?

We use RHEL4/ext3 on our PACS (medical imaging) servers.
We ran into ext3's limit of roughly 32,000 subdirectories per directory a couple of years back, when our first customer hit the 31,999th study, at which point we implemented a directory hashing algorithm.  Now we store images for a given patient's study in a path something like:
aa/ab/ac/1.2.3/

where 1.2.3 is the DICOM study instance UID (a world-wide unique identifier for a medical study)
and aa/ab/ac/ is the directory hash we derived from that study instance UID.

The above is a simplified example for illustration purposes only; 1.2.3 does not really hash to aa/ab/ac/.
Within aa/ab/ac/1.2.3/ there can be anywhere from three to a couple of thousand DICOM object files.
Images are initially created in a non-hashed temporary directory and then copied to their permanent home, e.g. aa/ab/ac/1.2.3/.
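
For illustration only, here is one way such a fixed-depth hash path could be
derived from a study instance UID; the actual hashing algorithm used on the
PACS servers is not described in this thread, and derive_study_path() is a
made-up name:

    import hashlib
    import os

    def derive_study_path(root, study_uid, depth=3, width=2):
        """Map a DICOM study instance UID such as '1.2.3' to a path like
        root/xx/yy/zz/1.2.3/ so that no single directory grows too large."""
        digest = hashlib.md5(study_uid.encode("ascii")).hexdigest()
        parts = [digest[i * width:(i + 1) * width] for i in range(depth)]
        return os.path.join(root, *parts, study_uid)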

In this context, would we gain filesystem performance by sorting by inode before copying?
Do the performance issues you refer to only apply to the copy process itself or do they contribute to long term filesystem performance?

Thanks for any insight you can provide,

Charles



Hi Charles,

The big issue with touching a lot of files (reading, stating, unlinking them) in ext3/4 is that readdir gives us back a list in effectively random order. This makes the accesses very seeky.

Not an issue with a handful of files (say a couple of hundred), but when you get to thousands (or millions) of files, performance really tanks.

To avoid that, you can sort the list returned by readdir() into ascending order by inode in reasonably large batches and get your performance up.
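
A minimal sketch of that approach in Python, assuming a caller-supplied
process_file() callback (the function name and batch size are illustrative):

    import itertools
    import os

    def for_each_in_inode_order(path, process_file, batch_size=10000):
        """Hand regular files under `path` to process_file() in ascending
        inode order, sorting one reasonably large batch at a time."""
        with os.scandir(path) as entries:
            while True:
                batch = list(itertools.islice(entries, batch_size))
                if not batch:
                    break
                # d_ino comes straight from the directory entry, so no extra
                # stat() is needed to sort; processing in inode order keeps
                # the later stat()/read()/copy calls roughly sequential on
                # disk instead of seeking in readdir()'s hash order.
                batch.sort(key=lambda e: e.inode())
                for entry in batch:
                    if entry.is_file(follow_symlinks=False):
                        process_file(entry.path)

With a callback such as lambda p: shutil.copy2(p, "/destination"), the copies
then walk the disk in roughly sequential inode order rather than hash order.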

The maintainers of several core tools have been looking at doing this automatically, but it is important for any home-grown applications as well.

In your scenario with the directory hierarchy, I suspect that you won't hit this. If you had one very large directory, you certainly would.

Best regards,

Ric

_______________________________________________
Ext3-users mailing list
Ext3-users@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/ext3-users
