Thanks all for your feedback. Summarizing the discussion so far, there
seem to be three main solutions suggested for replicating metadata:

1) Use the mke2fs hack to store all metadata in the 1st block group, and
   use dm and raid1 to mirror the 1st block group (most of the metadata).
   Pros: Simple approach that does not require any ext4 changes.
   Cons: Added overhead of raid and device mapper will be significant
         for fast SSDs.
   Cons: Management overhead on a large number of machines.
   Cons: Need to add support in raid to read from the mirror if the
         primary fails.

2) Have a separate metadata device and access all ext4 metadata from it.
   This device could be raid1 or whatever.
   Pros: No need for device mapper.
   Pros: Solves many other problems (SSDs can be used to cache metadata
         for disks, etc.).
   Cons: Will need to significantly over-allocate space (running out of
         space on this device potentially means no more writes to the
         filesystem).
   Cons: A lot of ext4 code changes.

3) A replica inode that resides on either the same device or an external
   device (this proposal).
   Pros: No need for device mapper or other additional layers.
   Pros: Simpler management in production.
   Cons: Not generic (ext4 specific).
   Cons: Complicates ext4 for questionable gain (especially with the
         inode being on the same device).

#2 seems to be the ideal solution, but it would take a substantial
amount of effort and require a lot of ext4 changes.

One other alternative that comes to mind is to have an external "replica
device" (a hybrid of ideas #2 and #3) instead of an entire "metadata
device". All metadata writes that go to the original would also go to
the replica device, and the filesystem could optionally read from the
replica first. With this, we get the benefits of #2 and #3 without
needing a lot of ext4 (or any other filesystem) changes.

What do you think? Could this be implemented without much intrusion
into the ext4 codebase?
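To make the "replica device" idea a bit more concrete, here is a rough
user-space sketch in Python. It is only a toy model of the write and read
paths described above, not ext4 code: BlockDevice, MirroredMetadata, and
the per-block CRC used to detect a bad copy are all assumptions made up
for illustration.

```python
import zlib


class BlockDevice:
    """Toy in-memory block device (hypothetical stand-in for a real disk)."""

    def __init__(self):
        self.blocks = {}  # block number -> (data, crc32 of data)

    def write(self, blkno, data):
        self.blocks[blkno] = (data, zlib.crc32(data))

    def read(self, blkno):
        if blkno not in self.blocks:
            raise OSError(f"block {blkno} not present")
        data, crc = self.blocks[blkno]
        if zlib.crc32(data) != crc:
            raise OSError(f"block {blkno} failed checksum")
        return data

    def corrupt(self, blkno):
        """Flip the first byte of a block to simulate media corruption."""
        data, crc = self.blocks[blkno]
        self.blocks[blkno] = (bytes([data[0] ^ 0xFF]) + data[1:], crc)


class MirroredMetadata:
    """Every metadata write goes to both devices; reads try one device
    first and fall back to the other if that copy is missing or bad."""

    def __init__(self, primary, replica, read_replica_first=True):
        self.primary = primary
        self.replica = replica
        self.read_replica_first = read_replica_first

    def write(self, blkno, data):
        # Metadata writes that go to the original also go to the replica.
        self.primary.write(blkno, data)
        self.replica.write(blkno, data)

    def read(self, blkno):
        if self.read_replica_first:
            order = (self.replica, self.primary)
        else:
            order = (self.primary, self.replica)
        for dev in order:
            try:
                return dev.read(blkno)
            except OSError:
                continue  # try the other copy
        raise OSError(f"block {blkno} unreadable on both devices")


primary, replica = BlockDevice(), BlockDevice()
meta = MirroredMetadata(primary, replica)
meta.write(7, b"inode table block")
replica.corrupt(7)          # replica copy goes bad
recovered = meta.read(7)    # falls back to the primary copy
```

The sketch ignores everything that makes this hard in a real filesystem
(write ordering, journaling, resyncing a stale replica); it only shows
the dual-write and read-fallback behaviour.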
Thanks,

On Fri, Oct 21, 2011 at 8:54 AM, Christoph Hellwig <hch@xxxxxxxxxxxxx> wrote:
> On Fri, Oct 21, 2011 at 10:52:11AM -0500, Eric Sandeen wrote:
>> With an SSD, you -really- don't know the independent failure domains,
>> with all the garbage collection & remapping that they may do, right?
>
> In fact some popular consumer SSDs do some fairly efficient data
> de-duplication which completly runs any metadata redundancy on a single
> of these devices void.
>
>

--
Aditya