Re: Re: Poor Performance When Number of Files > 1M

John Kalucki <ext3@xxxxxxxxxxx> · Wed, 11 Jun 2008 15:04:17 -0700

Eric Sandeen wrote:
John Kalucki wrote:

Performance seems to always map directly to the number of files in the 
ext3 filesystem.

After some initial run-fast time, perhaps once dirty pages begin to be 
written aggressively, my file creation rate tends to drop by about one 
file/sec for every 5,000 files added. So, depending on the variables (say, 
with 6 RAID10 spindles), I might start at ~700 files/sec, quickly drop, 
then more slowly drop to ~300 files/sec at perhaps 1 million files, then 
see 299 files/sec for the next 5,000 creations, 298 files/sec, and so on.
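To put rough numbers on that description, here is a hypothetical straight-line model. The constants (~300 files/sec at 1 million files, minus one file/sec per 5,000 files) come from the paragraph above. The names START_FILES, START_RATE, FILES_PER_STEP, rate_at() and seconds_to_create() are made up for illustration, and treating the decline as exactly linear is an assumption.

```python
# Hypothetical straight-line model of the reported slowdown.
# Constants come from the post; exact linearity is an assumption.

START_FILES = 1_000_000  # files already present when the model starts
START_RATE = 300         # files/sec observed at START_FILES
FILES_PER_STEP = 5_000   # files created per 1 file/sec drop


def rate_at(n_files: int) -> int:
    """Modelled creation rate (files/sec) once n_files >= START_FILES exist."""
    steps = (n_files - START_FILES) // FILES_PER_STEP
    return START_RATE - steps


def seconds_to_create(extra_files: int) -> float:
    """Modelled wall-clock time to add extra_files beyond START_FILES."""
    total = 0.0
    created = 0
    while created < extra_files:
        # Each 5,000-file batch runs at a single, ever lower, rate.
        batch = min(FILES_PER_STEP, extra_files - created)
        rate = rate_at(START_FILES + created)
        if rate <= 0:
            raise ValueError("model predicts throughput has reached zero")
        total += batch / rate
        created += batch
    return total


print(rate_at(1_005_000))                   # 299
print(round(seconds_to_create(10_000), 2))  # 5000/300 + 5000/299 = 33.39
```

Taken literally, the model reaches zero throughput at 1,000,000 + 300 × 5,000 = 2,500,000 files. So it can only describe a limited range. Its main use is showing that cumulative time grows faster than the file count.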

As you'd expect, there isn't much CPU utilization, other than iowait, 
and some kjournald activity.

Is this a known limitation of ext3? Is expecting to write to 
O(10^6)-O(10^7) files in something approaching constant time expecting 
too much from a filesystem? What, exactly, am I stressing to cause this 
unbounded performance degradation?

I think this is a linear search through the block groups for the new
inode allocation, which always starts at the parent directory's block
group and starts over from there each time.  See find_group_other().

So if the parent's group is full and so are the next 1000 block groups,
it will search 1000 groups and find space in the 1001st.  On the next
inode allocation it will re-search(!) those 1000 groups, and again find
space in the 1001st.  And so on.  Until the 1001st is full, and then
it'll search 1001 groups and find space in the 1002nd, etc.  (If I'm
remembering/reading correctly.  This does jibe with what you see, though.)
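The cost pattern described above can be reproduced with a small simulation. This is a simplified model under stated assumptions, not the kernel code. Only the restart-from-the-parent's-group linear walk is modelled, and the parent is placed in group 0. The helper names allocate_from() and total_probes() are hypothetical. The real find_group_other() is not reproduced.

```python
# Simplified model of the search pattern described above. Assumption:
# only a linear walk starting at the parent's group is modelled; the
# real find_group_other() in the ext3 sources is not reproduced here.

def allocate_from(free_inodes: list[int], start_group: int) -> tuple[int, int]:
    """Take one inode, scanning groups forward from start_group.

    Returns (group used, number of groups probed).
    """
    n = len(free_inodes)
    for probes in range(1, n + 1):
        group = (start_group + probes - 1) % n
        if free_inodes[group] > 0:
            free_inodes[group] -= 1
            return group, probes
    raise RuntimeError("no free inodes left in any group")


def total_probes(num_groups: int, inodes_per_group: int,
                 full_groups: int, creations: int) -> int:
    """Probes needed when every allocation restarts at the parent's group 0."""
    free = [inodes_per_group] * num_groups
    for g in range(full_groups):  # groups 0..full_groups-1 start out full
        free[g] = 0
    total = 0
    for _ in range(creations):
        _, probes = allocate_from(free, start_group=0)
        total += probes
    return total


# Parent's group plus the next 999 groups are full (1,000 in all):
# each of the first 10 creations probes 1,001 groups, the 11th probes 1,002.
print(total_probes(num_groups=2_000, inodes_per_group=10,
                   full_groups=1_000, creations=11))  # 11012
```

Every allocation repeats the same walk over the full groups. So in this model the per-file cost grows with the number of full groups instead of staying constant.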

I've toyed with keeping track (in the parent's inode) of where the last
successful child allocation happened, and starting the search there.  I'm a
bit leery of how this might age, though, and I'm not sure whether it
should be on-disk or just in memory.  But this behavior clearly needs
some help.  I should probably just get it sent out for comment.
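A hypothetical sketch of that idea follows, using the same simplified linear-walk model (parent in group 0). The post leaves open whether the hint would live on disk or in memory. This sketch keeps it as a plain in-memory variable. The names allocate_from(), total_probes() and hint are illustrative only.

```python
# Hypothetical sketch: remember the group of the last successful child
# allocation (in memory only) and start the next search there instead
# of at the parent's group. Simplified linear-walk model, not ext3 code.

def allocate_from(free_inodes: list[int], start_group: int) -> tuple[int, int]:
    """Take one inode scanning forward from start_group; return (group, probes)."""
    n = len(free_inodes)
    for probes in range(1, n + 1):
        group = (start_group + probes - 1) % n
        if free_inodes[group] > 0:
            free_inodes[group] -= 1
            return group, probes
    raise RuntimeError("no free inodes left in any group")


def total_probes(use_hint: bool, num_groups: int, inodes_per_group: int,
                 full_groups: int, creations: int) -> int:
    """Count probes with or without the last-allocation hint."""
    free = [inodes_per_group] * num_groups
    for g in range(full_groups):  # groups 0..full_groups-1 start out full
        free[g] = 0
    parent_group = 0
    hint = parent_group           # hypothetical "last successful group"
    total = 0
    for _ in range(creations):
        start = hint if use_hint else parent_group
        group, probes = allocate_from(free, start)
        hint = group
        total += probes
    return total


for use_hint in (False, True):
    print(use_hint, total_probes(use_hint, 2_000, 10, 1_000, 11))
# False 11012
# True 1012
```

Because the scan wraps around, inodes freed later in groups behind the hint are only found once a search comes all the way round. That is one way the aging concern above could show up.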

-Eric

This is the best explanation I've read so far. There does indeed appear 
to be some O(n) behavior that is exacerbated by having many directories 
in the working set (not open, just referenced often) and perhaps 
moderate fragmentation. I read up on ext3 inode allocation and its 
attempt to place files in the same block group (what FFS calls a cylinder 
group) as their directory. To work with this scheme, I started on a fresh 
filesystem and flattened the directory depth to just 4 levels. With that, 
I've managed to boost performance greatly and flatten the degradation 
curve quite a bit.

I can get to about 2,800,000 files before performance starts to slowly 
drop from a nearly constant ~1,700 files/sec. At ~4,000,000 files, I see 
~1,500 files/sec, and after that I start to see the old, steeper 
decline. By 5,500,000 files, it's down to 1,230 files/sec. I've used 9% 
of the space and 8% of the inodes at this point.
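Treating those rounded figures as exact (an assumption, since the post gives them as approximate), the average slope between data points can be compared with the "about one file/sec per 5,000 files" reported earlier. The names observations and drop_per_5000() are illustrative.

```python
# The rounded figures reported above: (files created, files/sec).
observations = [(2_800_000, 1_700), (4_000_000, 1_500), (5_500_000, 1_230)]


def drop_per_5000(points: list[tuple[int, int]]) -> list[float]:
    """Average throughput loss (files/sec) per 5,000 files between points."""
    slopes = []
    for (n0, r0), (n1, r1) in zip(points, points[1:]):
        batches = (n1 - n0) / 5_000
        slopes.append((r0 - r1) / batches)
    return slopes


print([round(s, 2) for s in drop_per_5000(observations)])  # [0.83, 0.9]
```

On these numbers the slope past 2.8 million files is somewhat below one file/sec per 5,000 files. So the decline has not gone away, but it starts from a much higher baseline.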

Changing journal size and /proc/sys/fs/file-max had no effect. Even 
dir_index had only marginal impact, as my directories have only about 
300 files each.

I think the biggest factor in making performance nearly linear is the 
number of directories in the working set. If this grows too large, the 
linear allocation behavior is magnified, and performance drops. My 
version of RHEL doesn't seem to allow tweaking of directory cache 
behavior, perhaps a deprecated feature from the 2.4 days.

If I discover anything else, I'll be sure to update this thread.
-John

_______________________________________________
Ext3-users mailing list
Ext3-users@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/ext3-users