We've been doing a lot of work on CephFS over the past few months. This is an update on the current state of things as of Giant.

What we've been working on:

* better MDS/CephFS health reports to the monitor
* an MDS journal dump/repair tool
* many kernel and ceph-fuse/libcephfs client bug fixes
* file size recovery improvements
* client session management fixes (and tests)
* admin socket commands for diagnosis and admin intervention
* many bug fixes

We started using CephFS to back the teuthology (QA) infrastructure in the lab about three months ago. We fixed a bunch of stuff over the first month or two (several kernel bugs, a few MDS bugs) and have had no problems for the last month or so. We're currently running 0.86 (the Giant release candidate) with a single MDS and ~70 OSDs. Clients are running a 3.16 kernel plus several fixes that went into 3.17.

With Giant, we are at a point where we would ask that everyone try things out for any non-production workloads. We are very interested in feedback around stability, usability, feature gaps, and performance. We recommend:

* A single active MDS. You can run any number of standby MDSes, but we are not focusing on multi-MDS bugs just yet (and our existing multi-MDS test suite is already hitting several).
* No snapshots. These are disabled by default and require a scary admin command to enable. Although they mostly work, there are several known issues that we haven't addressed, and they complicate things immensely. Please avoid them for now.
* Either the kernel client (kernel 3.17 or later) or the userspace clients (ceph-fuse or libcephfs); both are in good working order. (A quick mount example for each is included below.)

The key missing feature right now is fsck (both check and repair). This is *the* development focus for Hammer.

Here's a more detailed rundown of the status of various features:

* multi-MDS: implemented. limited test coverage. several known issues. use only for non-production workloads and expect some stability issues that could lead to data loss.
* snapshots: implemented. limited test coverage. several known issues. use only for non-production workloads and expect some stability issues that could lead to data loss.
* hard links: stable. no known issues, but test coverage is somewhat limited (we don't test creating huge link farms).
* direct I/O: implemented and tested for the kernel client. no special support needed for ceph-fuse (the kernel FUSE driver handles it).
* xattrs: implemented, stable, tested. no known issues (for both kernel and userspace clients).
* ACLs: implemented and tested for the kernel client. not implemented for ceph-fuse.
* file locking (fcntl, flock): supported and tested for the kernel client, though test coverage is limited and there is one known minor kernel issue with a fix pending. implementation in progress for ceph-fuse/libcephfs.
* kernel fscache support: implemented. no test coverage. used in production by Adfin.
* Hadoop bindings: implemented. limited test coverage. a few known issues.
* Samba VFS integration: implemented. limited test coverage.
* Ganesha NFS integration: implemented. no test coverage.
* kernel NFS reexport: implemented. limited test coverage. no known issues.

Anybody who has experienced bugs in the past should be excited by:

* new MDS admin socket commands to look at pending operations and client session states. (Check them out with "ceph daemon mds.a help"!) These will make diagnosing, debugging, and even fixing issues a lot simpler.
* cephfs-journal-tool, which can manipulate MDS journal state without resorting to difficult exports/imports and hexedit.

A rough example of both follows below.
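To give a flavor of what those look like, here is a rough sketch ("mds.a" is just an example daemon name, and the exact sub-command names may vary a bit between versions):

  # list everything the MDS admin socket supports
  ceph daemon mds.a help

  # a couple of the new diagnostic commands
  ceph daemon mds.a session ls
  ceph daemon mds.a dump_ops_in_flight

  # inspect or back up the MDS journal (normally run while the MDS is offline)
  cephfs-journal-tool journal inspect
  cephfs-journal-tool journal export backup.bin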
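To actually try things out per the client recommendations above, mounting looks roughly like this (the monitor address, secret file, and mount point are placeholders; adjust for your own auth setup):

  # kernel client (3.17 or later)
  mount -t ceph mon-host:6789:/ /mnt/cephfs -o name=admin,secretfile=/etc/ceph/admin.secret

  # ceph-fuse userspace client
  ceph-fuse -m mon-host:6789 /mnt/cephfs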
Thanks!
sage