On Tue, Sep 9, 2014 at 6:10 PM, Blair Bethwaite <blair.bethwaite at gmail.com> wrote:
> Hi Sage,
>
> Thanks for weighing into this directly and allaying some concerns.
>
> It would be good to get a better understanding about where the rough
> edges are - if deployers have some knowledge of those then they can be
> worked around to some extent.

It's just a very long process to qualify a filesystem, even in this
limited sense. We're still at the point where we're solving bugs that
the open-source community brings us rather than setting out to make it
stable for a particular identified workload.
For the moment most of our development effort is focused on
1) instrumentation that makes it possible for users (and developers!)
to identify the cause of problems we run across
2) basic mechanisms for fixing "ephemeral" bugs (things like booting
dead clients, restarting hung metadata ops, etc)
3) general usability issues that our newer developers and users are
reporting to us
4) the beginnings of fsck (correctness checking for now, no fixing yet)

> E.g., for our use-case it may be that
> whilst Inktank/RedHat won't provide support for CephFS that we are
> better off using it in a tightly controlled fashion (e.g., no
> snapshots, restricted set of native clients acting as presentation
> layer with others coming in via SAMBA & Ganesha, no dynamic metadata
> tree/s, ???) where we're less likely to run into issues.

Well, snapshots are definitely going to break your install (they're
disabled by default, now). Multi-mds is unstable enough that nobody
should be running with it.
We run samba and NFS tests in our nightlies and they mostly work,
although we've got some odd issues we've not tracked down when
*ending* the samba process or unmounting nfs. (Our best guess on these
is test or environment issues, rather than actual FS issues.)
But these are probably not complete.

> Related, given there is no fsck, how would one go about backing up the
> metadata in order to facilitate DR? Is there even a way for that to
> make sense given the decoupling of data & metadata pools...?

Uh, depends on the kind of DR you're going for, I guess. There are
lots of things that will backup a generic filesystem; you could do
something smarter with a bit of custom scripting using Ceph's rstats.
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com
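
For illustration only, the rstats idea mentioned above could be sketched
roughly like this. It assumes a mounted CephFS client that exposes the
recursive-ctime virtual xattr "ceph.dir.rctime" on directories, and it
treats only the whole-seconds part of that value as meaningful. The
function names (rctime_seconds, changed_dirs) are made up for this
sketch and are not part of Ceph. It only finds subtrees changed since a
previous backup; it does not back up the metadata pool itself.

```python
import os

# Assumption: CephFS exposes recursive ctime on directories under this xattr name.
RCTIME = "ceph.dir.rctime"


def rctime_seconds(path, getxattr=None):
    """Return the recursive ctime of a directory as whole seconds."""
    getxattr = getxattr or os.getxattr  # os.getxattr is Linux-only
    raw = getxattr(path, RCTIME)
    # The value looks like "<seconds>.<nanoseconds>"; keep the seconds only.
    return int(raw.decode().split(".", 1)[0])


def subdirs(path):
    """List immediate subdirectories of path, without following symlinks."""
    with os.scandir(path) as entries:
        return sorted(e.path for e in entries if e.is_dir(follow_symlinks=False))


def changed_dirs(root, since, getxattr=None, list_subdirs=subdirs):
    """Return directories under root whose subtree changed after `since`.

    A subtree whose rctime is not newer than `since` is skipped without
    being walked, which is the saving over a plain recursive scan.
    """
    if rctime_seconds(root, getxattr) <= since:
        return []
    changed = [root]
    for child in list_subdirs(root):
        changed.extend(changed_dirs(child, since, getxattr, list_subdirs))
    return changed
```

A backup script might call changed_dirs("/mnt/cephfs", last_backup_epoch)
(both arguments hypothetical) and hand the resulting directories to
whatever generic backup tool is already in use.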