Re: Storing inodes in a separate block device?

On May 22, 2008  09:53 -0500, Nathan Roberts wrote:
> Has a feature ever been considered (or already exist) for storing inodes in 
> a block device separate from the data? Is it even a "reasonable" thing to 
> do or are there major pitfalls that one would run into?

There was a filesystem called "dualfs" that implemented this - I believe
it was a hacked version of ext3.  It showed quite decent results.
Similarly, Lustre splits filesystem metadata (path, permissions,
attributes) onto different filesystems from the data.

For ext4 there is work being done on the FLEX_BG feature, which will allow
clustering of the metadata into larger groups inside the filesystem, in
order to reduce seeking when doing filesystem scans.  This could be taken
to extremes to group all of the metadata into a single area, and use LVM
to place that area on a separate disk.
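
As a rough illustration of the grouping idea, here is a small Python sketch.
It is not the ext4 allocator; it only assumes that the bitmaps and inode
tables of every block group in a flex group are packed into the first block
group of that flex group.  The function names and the group counts used
below are made up for the example.

```python
# Sketch only: models the *idea* of FLEX_BG clustering, not the real
# ext4 on-disk layout or allocator.  All names here are hypothetical.

def flex_group_of(block_group: int, groups_per_flex: int) -> int:
    """Index of the flex group that a block group belongs to."""
    return block_group // groups_per_flex


def metadata_home(block_group: int, groups_per_flex: int) -> int:
    """Block group assumed to hold this group's bitmaps and inode table.

    Assumption: metadata is packed into the first block group of the
    flex group.
    """
    return flex_group_of(block_group, groups_per_flex) * groups_per_flex


def metadata_regions(total_groups: int, groups_per_flex: int) -> list[int]:
    """Distinct block groups that end up holding metadata."""
    return sorted(
        {metadata_home(g, groups_per_flex) for g in range(total_groups)}
    )


if __name__ == "__main__":
    # 64 block groups, 16 per flex group: metadata lands in 4 regions.
    print(metadata_regions(64, 16))
    # The "extreme" case: one flex group spanning the whole filesystem,
    # so all metadata sits in a single area that LVM could remap.
    print(metadata_regions(64, 64))
```

In this model, raising groups_per_flex until it equals the total number of
block groups is what "taken to extremes" would mean: one metadata region
instead of one per block group.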

Putting the journal on a separate disk would also help reduce the seeking
during writes.  Using flash for the journal would not help much, though,
because journal IO is almost exclusively linear and involves no seeking.

> The rationale behind this question comes from use cases where a file system 
> is storing very large numbers of files. Reading files in these file systems 
> will essentially incur at least two seeks: one for the inode, one for the 
> data blocks. If the seek to the inode were more efficient, dramatic 
> performance gains could be achieved for such use cases.
>
> Fast seeking devices (such as flash based devices) are becoming much more 
> mainstream these days and would seem like a reasonable device for the 
> inodes. The $/GB is not as good as disks but it's much better than DRAM. 
> For many use cases, the number of these "fast access" inodes that would 
> need to be cached in RAM is near 0. So, RAM savings are also a potential 
> benefit.
>
> I've run some basic tests using ext4 on a SATA array plus a USB thumb drive 
> for the inodes. Even with the slowness of a thumb drive, I was able to see 
> encouraging results ( >50% read throughput improvement for a mixture of 
> 4K-8K files).

Ah, are you using FLEX_BG for this test?  It would also be interesting to
see if splitting the metadata onto a normal disk had the same effect,
just by virtue of allowing the data and metadata to seek independently.
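
To make the two-seek argument concrete, here is a back-of-the-envelope
model in Python.  It is only a sketch: it assumes each small-file read
costs one inode access plus one data seek plus a transfer, all serialized,
and the millisecond figures are illustrative guesses, not measurements
from the test above.

```python
# Sketch only: a serialized cost model for reading many small files.
# The timing values used below are assumptions, not measured numbers.

def reads_per_second(inode_ms: float, data_ms: float,
                     transfer_ms: float) -> float:
    """Files read per second if every read pays all three costs in turn."""
    return 1000.0 / (inode_ms + data_ms + transfer_ms)


def relative_gain(baseline: float, split: float) -> float:
    """Fractional throughput gain of the split layout over the baseline."""
    return (split - baseline) / baseline


if __name__ == "__main__":
    DISK_SEEK_MS = 8.0     # assumed average seek on a rotating disk
    FAST_INODE_MS = 1.0    # assumed inode access time on a faster device
    TRANSFER_MS = 0.1      # assumed time to transfer a 4K-8K file

    # Inode and data on the same disk: two full seeks per file.
    same_disk = reads_per_second(DISK_SEEK_MS, DISK_SEEK_MS, TRANSFER_MS)
    # Inode on a separate, faster device: one full seek per file.
    split = reads_per_second(FAST_INODE_MS, DISK_SEEK_MS, TRANSFER_MS)

    print(f"same disk: {same_disk:.1f} files/s")
    print(f"split:     {split:.1f} files/s")
    print(f"gain:      {relative_gain(same_disk, split):.0%}")
```

Plugging in a measured inode access time for a second ordinary disk in
place of FAST_INODE_MS would give a first estimate of how much of the gain
comes from the split itself rather than from the faster device.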

There is also work to aggregate the allocation of ext4 file allocation
metadata.  While this speeds up unlink, the current algorithm hurts read
performance.  Having the file allocation metadata on a separate disk may
avoid this performance hit.  It may also be that we just need to make
more effort to do readahead on this metadata.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
