Re: [PATCH 0/3] block I/O when cluster is full

Josh Durgin <josh.durgin@xxxxxxxxxxx> · Mon, 09 Dec 2013 16:11:34 -0800

On 12/06/2013 06:24 PM, Gregory Farnum wrote:
On Fri, Dec 6, 2013 at 6:16 PM, Josh Durgin <josh.durgin@xxxxxxxxxxx> wrote:
On 12/05/2013 08:58 PM, Gregory Farnum wrote:

On Thu, Dec 5, 2013 at 5:47 PM, Josh Durgin <josh.durgin@xxxxxxxxxxx> wrote:

On 12/03/2013 03:12 PM, Josh Durgin wrote:

These patches allow rbd to block writes instead of returning errors
when OSDs are full enough that the FULL flag is set in the osd map.
This avoids filesystems on top of rbd getting confused by transient
EIOs if the cluster oscillates between full and non-full.
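
As a rough illustration of the blocking decision described above, the following Python sketch models it as a predicate over osd map flags. The flag names and values, the should_block() helper, and the choice that FULL holds back only writes while reads proceed are all assumptions made for illustration; this is not the libceph code from the patches.

```python
from enum import IntFlag


class MapFlag(IntFlag):
    # Hypothetical flag values, chosen only for this sketch.
    FULL = 1
    PAUSERD = 2
    PAUSEWR = 4


def should_block(map_flags: MapFlag, is_write: bool) -> bool:
    """Return True if a request should wait instead of being sent or failed."""
    if is_write:
        # Assumption: a FULL map holds back writes just like PAUSEWR does.
        return bool(map_flags & (MapFlag.PAUSEWR | MapFlag.FULL))
    # Assumption: reads are only held back by PAUSERD.
    return bool(map_flags & MapFlag.PAUSERD)
```

A blocked request would stay queued and be re-evaluated whenever a new map arrives, so a filesystem on top sees a stall rather than an EIO.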

These are also available in the wip-full branch of ceph-client.git.

Josh Durgin (3):
     libceph: block I/O when PAUSE or FULL osd map flags are set
     libceph: add an option to configure client behavior when osds are
       full
     rbd: document rbd-specific options

Due to a race condition between clients and osds in handling maps
marked FULL, it's not feasible to offer the 'error' option, so patches
2 and 3 can be ignored.

http://tracker.ceph.com/issues/6938

It's not clear to me — are you going to assume all ENOSPC means the
map is marked as full and intercept it, or that you can't reliably
block IO so don't bother trying?

Don't bother trying to stop ENOSPC on the client side, since it'd need some
restructuring on the kernel side and would be prone to screwing up
write ordering.

Instead drop writes on the osd side when they have a map marked full,
and have clients resend all writes when a map transitions from
full -> nonfull. The userspace side is https://github.com/ceph/ceph/pull/914
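
The drop-and-resend scheme above can be sketched as a toy model in Python. Everything here (the OsdMap, Osd, and Client classes, their methods, and run_scenario()) is hypothetical and only meant to show the ordering of events, including the case where the osd sees the FULL map before the client does; it is not the code from the pull request or the kernel client.

```python
from dataclasses import dataclass, field


@dataclass
class OsdMap:
    epoch: int
    full: bool


@dataclass
class Osd:
    osdmap: OsdMap
    applied: list = field(default_factory=list)

    def handle_write(self, op: str) -> bool:
        # While the osd's map is FULL, drop the write without any reply.
        if self.osdmap.full:
            return False
        self.applied.append(op)
        return True


@dataclass
class Client:
    osd: Osd
    osdmap: OsdMap
    in_flight: list = field(default_factory=list)

    def write(self, op: str) -> None:
        self.in_flight.append(op)
        # Only send while the client's own map is not FULL.
        if not self.osdmap.full:
            self._send(op)

    def _send(self, op: str) -> None:
        # An op leaves in_flight only once the osd has applied it.
        if self.osd.handle_write(op):
            self.in_flight.remove(op)

    def handle_map(self, new_map: OsdMap) -> None:
        was_full = self.osdmap.full
        self.osdmap = new_map
        # On a full -> nonfull transition, resend every pending write in order.
        if was_full and not new_map.full:
            for op in list(self.in_flight):
                self._send(op)


def run_scenario():
    osd = Osd(osdmap=OsdMap(epoch=2, full=True))      # osd got the FULL map first
    client = Client(osd=osd, osdmap=OsdMap(epoch=1, full=False))
    client.write("w1")                                # sent, silently dropped
    client.handle_map(OsdMap(epoch=2, full=True))     # client catches up
    client.write("w2")                                # held back on the client
    osd.osdmap = OsdMap(epoch=3, full=False)          # cluster has space again
    client.handle_map(OsdMap(epoch=3, full=False))    # triggers the resend
    return osd.applied, client.in_flight
```

In this model both writes end up applied in their original order once the map goes nonfull. A client without the resend step in handle_map() would leave "w1" pending forever, which corresponds to the hang for old clients discussed below.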

Do previous client implementations already satisfy that requirement?
We can't drop requests if older clients expect a response...

No, previous clients do not do this. For old rbd clients, this turns a
potential corruption into a hang, which is a good trade-off imo.

For userspace clients, this only happens when the osd gets the FULL map
first, and rejects a write in flight before the client got a FULL map.

The kernel client already rejects writes at the fs layer when the FULL
flag is set, so kcephfs will only be affected when it hits this race as
well.