Filesystem-based RAID vs. device-based RAID

I can't imagine that this isn't a frequently asked question, but with my poor search skills, I've come up completely empty on this.

I'm trying to understand why the newer "sophisticated" filesystems (e.g. btrfs) implement RAID redundancy as part of the filesystem, rather than taking the traditional approach of a separate virtual-block-device layer such as md with the filesystem sitting on top as a distinct layer. Besides duplicating effort and code (again and again, for each new filesystem implementation that comes along), it seems to mix too much functionality into one monolithic layer, increasing complexity and with it the inevitable bugs and the difficulty of debugging them.

Of course, the people working on these filesystems aren't idiots, so I assume there _are_ reasons; but when I speculate about what they might be, I don't come up with much that seems to outweigh the inherent disadvantages of the integrated approach. The main thing I've thought of is the ability to use filesystem-specific information to optimize RAID operations, such as knowing that a given block is not currently in use by any file and therefore does not need to be redundantly stored (e.g. during a resync operation). Yet surely an API could be created to pass this sort of information across layers, much like the TRIM command for SSDs.
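
To make the kind of interface I mean a bit more concrete: filesystems already push "these blocks are unused" hints downward via the FITRIM ioctl, which gets translated into discard requests for the underlying device. I'm only offering it as an existing precedent for a cross-layer hint; as far as I know, md does not use this information to skip regions during resync today. A minimal user-space sketch (the mount point /mnt/test is just a placeholder for any mounted filesystem):

    /* Sketch: the existing FITRIM ioctl as an example of a cross-layer
     * "these blocks are unused" hint.  The filesystem turns this into
     * discard requests for the device below it; an md-style layer could
     * in principle use the same information to skip unused regions
     * during a resync (hypothetical -- md does not do this today, as
     * far as I know).
     */
    #include <stdio.h>
    #include <stdint.h>
    #include <fcntl.h>
    #include <unistd.h>
    #include <sys/ioctl.h>
    #include <linux/fs.h>     /* FITRIM, struct fstrim_range */

    int main(void)
    {
        struct fstrim_range range = {
            .start  = 0,
            .len    = UINT64_MAX,   /* cover the whole filesystem */
            .minlen = 0,
        };
        int fd = open("/mnt/test", O_RDONLY);   /* any mounted fs */

        if (fd < 0 || ioctl(fd, FITRIM, &range) < 0) {
            perror("FITRIM");
            return 1;
        }
        /* On return, the kernel fills in how many bytes were trimmed. */
        printf("trimmed %llu bytes\n", (unsigned long long)range.len);
        close(fd);
        return 0;
    }

In principle the same "this range is free" information could flow to an md-style layer just as well as to an SSD's firmware, which is why I don't see the lack of such an API as a fundamental obstacle.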

Can anyone shed light on the relative advantages and disadvantages of integrating RAID redundancy functionality directly into a filesystem?

Thank you very much in advance,
David


