I can't imagine that this isn't a frequently asked question, but with my
poor search skills, I've come up completely empty on this.

I'm trying to understand why the newer "sophisticated" filesystems (e.g.
btrfs) are implementing raid redundancy as part of the filesystem rather
than the traditional approach of a separate virtual-block-device layer such
as md, with a filesystem on top of it as a distinct layer. In addition to
duplicating effort and code [again and again for each new filesystem
implementation that comes along], the integrated approach seems to mix too
much functionality into one monolithic layer, increasing complexity and,
inevitably, the number of bugs and the difficulty of debugging.

Of course, the people working on these filesystems aren't idiots, so I
assume that there _are_ reasons, but when I speculate about what they might
be, I don't come up with much that seems to outweigh the inherent
disadvantages of the integrated approach. The primary thing that I've
thought of is the ability to use filesystem-specific information to optimize
raid operations, such as knowing that a given block is not currently in use
by any file and so does not need to be stored redundantly (e.g. it could be
skipped during a resync operation). Yet surely an API could be created to
allow cross-layer communication of this sort of information, similar to the
way the TRIM command tells an SSD which blocks are no longer in use.
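
To make the kind of cross-layer hint I have in mind concrete, here is a rough
Python sketch. Everything in it is hypothetical: the names resync_mirror and
is_block_in_use are made up for illustration and do not correspond to any
real md or btrfs interface, and the "disks" are just in-memory lists. It only
shows a separate raid layer asking the filesystem above it whether a block is
allocated, and skipping the copy when it is not.

```python
from typing import Callable, List


def resync_mirror(source: List[bytes],
                  target: List[bytes],
                  is_block_in_use: Callable[[int], bool]) -> int:
    """Rebuild `target` from `source`, copying only blocks that the
    filesystem reports as allocated. Returns the number of blocks copied.

    `is_block_in_use` stands in for the hypothetical cross-layer API.
    """
    copied = 0
    for block_no, data in enumerate(source):
        if not is_block_in_use(block_no):
            continue  # free block: nothing worth mirroring
        target[block_no] = data
        copied += 1
    return copied


# Toy example: an 8-block mirror where only blocks 1, 3 and 4 hold file data.
good_disk = [bytes([i]) * 4 for i in range(8)]
new_disk = [b"\x00" * 4 for _ in range(8)]
allocated = {1, 3, 4}

blocks_copied = resync_mirror(good_disk, new_disk, lambda n: n in allocated)
```

In this toy case the resync touches 3 of the 8 blocks instead of all of them,
which is the sort of saving I assume the integrated designs are after.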
Can anyone shed light on the relative advantages and disadvantages of
integrating raid redundancy functionality directly into a filesystem?

Thank you very much in advance,
David