Re: Large directories and poor order correlation

On 03/15/2011 07:06 AM, Theodore Tso wrote:
On Mar 15, 2011, at 3:59 AM, Florian Weimer wrote:

* Eric Sandeen:

No, because htree (dir_index) dirs return names in hash-value
order, not inode number order, i.e. "at random."

As you say, sorting by inode number will work much better...

The dpkg folks tested this and it turns out that you get better
results if you open the file and use FIBMAP to get the first block
number, and sort by that.  You could sort by inode number before the
open/fstat calls, but it does not seem to help much.

It depends on which problem you are trying to solve.  If this is a cold
cache situation, and the inode cache is empty, then sorting by inode
number will help since otherwise you'll be seeking all over just to
read in the inode structures.   This is true for any kind of readdir+stat
combination, whether it's ls -l, or du or readdir + FIBMAP (I'd
recommend using FIEMAP these days, though).

However, if you need to suck in the information for a large number of
small files (such as all of the files in /var/lib/dpkg/info), then sure, sorting
on the block number can help reduce seeks on the data blocks side of
things.

So in an absolutely cold cache situation, what I'd recommend is readdir,
sort by inode, FIEMAP, sort by block, and then read in the dpkg files.

Of course an RPM partisan might say, "it would help if you guys had
used a real database instead of ab(using) the file system."  And then
the dpkg guys could complain about what happens when RPM has to
deal with a corrupted RPM database, and how this allows dpkg to use
shell scripts to access their package information.  Life is full of tradeoffs.
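
The sequence recommended above (readdir, sort by inode, FIEMAP, sort by block, read) could be sketched roughly as follows. This is only an illustration under stated assumptions, not anything dpkg actually ships: the function names first_physical_offset and read_in_disk_order are hypothetical, the ioctl number and struct layouts are transcribed from the Linux headers <linux/fs.h> and <linux/fiemap.h>, and files for which FIEMAP fails (for example on a filesystem that does not support it) are simply read last, in inode order.

```python
import fcntl
import os
import struct

# FS_IOC_FIEMAP is _IOWR('f', 11, struct fiemap) in <linux/fs.h>.
FS_IOC_FIEMAP = 0xC020660B
# struct fiemap header: fm_start, fm_length, fm_flags,
# fm_mapped_extents, fm_extent_count, fm_reserved (32 bytes).
FIEMAP_HDR = struct.Struct("=QQIIII")
# struct fiemap_extent: fe_logical, fe_physical, fe_length,
# fe_reserved64[2], fe_flags, fe_reserved[3] (56 bytes).
FIEMAP_EXTENT = struct.Struct("=QQQQQIIII")


def first_physical_offset(path):
    """Physical byte offset of the file's first extent, or None if unknown."""
    buf = bytearray(FIEMAP_HDR.size + FIEMAP_EXTENT.size)
    # Ask about the whole file, but leave room for only one extent.
    FIEMAP_HDR.pack_into(buf, 0, 0, 0xFFFFFFFFFFFFFFFF, 0, 0, 1, 0)
    try:
        fd = os.open(path, os.O_RDONLY)
    except OSError:
        return None
    try:
        fcntl.ioctl(fd, FS_IOC_FIEMAP, buf)  # kernel fills buf in place
    except OSError:
        return None  # FIEMAP not supported here
    finally:
        os.close(fd)
    if FIEMAP_HDR.unpack_from(buf, 0)[3] == 0:  # fm_mapped_extents
        return None  # empty file, or no mapped extents reported
    return FIEMAP_EXTENT.unpack_from(buf, FIEMAP_HDR.size)[1]  # fe_physical


def read_in_disk_order(dirpath):
    """Read every regular file in dirpath, ordered to reduce seeks."""
    # Steps 1 and 2: readdir, then sort by the inode number that comes
    # with each directory entry.
    with os.scandir(dirpath) as it:
        entries = sorted(it, key=lambda e: e.inode())
    # is_file() normally uses d_type, but may stat if the type is unknown.
    paths = [e.path for e in entries if e.is_file(follow_symlinks=False)]
    # Step 3: FIEMAP each file, visiting inodes in sorted order.
    located = [(first_physical_offset(p), p) for p in paths]
    # Step 4: sort by first block; the sort is stable, so files without
    # a known offset stay in inode order at the end.
    located.sort(key=lambda t: (t[0] is None, t[0] or 0))
    # Step 5: read the data in that order.
    contents = {}
    for _, p in located:
        with open(p, "rb") as f:
            contents[os.path.basename(p)] = f.read()
    return contents
```

Something like read_in_disk_order("/var/lib/dpkg/info") would then return the file contents keyed by name, in the order they were read. Whether this beats a plain inode sort depends on the filesystem and on how cold the cache really is.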

-- Ted


I have tested both sorting techniques with very large directories.

Most of the gain came with the simple sorting by inode number, but of course this relies on the file system allocation policy having a correlation between the inode numbers and layout (i.e., higher inode numbers correspond to higher block numbers).

Note that you can get the inode number used in this sorting without doing any stat calls.
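
As a rough illustration of that point: readdir already returns the inode number (d_ino) with each name, so the sort can happen before any per-file syscall. The sketch below assumes Python's os.scandir, whose DirEntry.inode() is taken from the directory entry on Linux; the helper names entries_by_inode and stat_in_inode_order are made up for this example.

```python
import os


def entries_by_inode(dirpath):
    """Return (inode, name) pairs sorted by inode number, without stat."""
    with os.scandir(dirpath) as it:
        # inode() comes from the directory entry itself, so listing
        # and sorting does not touch the inode table.
        return sorted((e.inode(), e.name) for e in it)


def stat_in_inode_order(dirpath):
    """lstat every entry, visiting inodes in ascending order."""
    results = []
    for _ino, name in entries_by_inode(dirpath):
        st = os.lstat(os.path.join(dirpath, name))
        results.append((name, st.st_size))
    return results
```

The only per-file syscall left is the lstat itself, issued in inode order rather than in the hash order that the htree directory hands back.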

Sorting by first block number also works well, but does have that extra syscall (probably two: open and FIBMAP?) per file.

Ric
