Re: RBD journal draft design

Jason Dillaman <dillaman@xxxxxxxxxx> · Thu, 4 Jun 2015 11:08:08 -0400 (EDT)

> >> >A successful append will indicate whether or not the journal is now full
> >> >(larger than the max object size), indicating to the client that a new
> >> >journal object should be used.  If the journal is too large, an error
> >> >code response would alert the client that it needs to write to the current
> >> >active journal object.  In practice, the only time the journaler should
> >> >expect to see such a response would be in the case where multiple clients
> >> >are using the same journal and the active object update notification has
> >> >yet to be received.
> >>
> >> I'm confused. How does this work with the splay count thing you
> >> mentioned above? Can you define <splay count>?
> >
> > Similar to the stripe width.
> 
> Okay, that sort of makes sense but I don't see how you could legally
> be writing to different "sets" so why not just make it an explicit
> striping thing and move all journal entries for that "set" at once?
> 
> ...Actually, doesn't *not* forcing a coordinated move from one object
> set to another mean that you don't actually have an ordering guarantee 
> across tags if you replay the journal objects in order?

The ordering between tags was meant to be a soft ordering guarantee (since any number of delays could throw off the actual order as delivered from the OS).  In the case of a VM using multiple RBD images sharing the same journal, this provides an ordering guarantee per device but not between devices.

This is no worse than the case where each RBD image uses its own journal instead of sharing one, and the behavior doesn't seem too different from a non-RBD case when submitting requests to two different physical devices (e.g. an SSD device and a NAS device will commit data at different latencies). Without the forced coordinated move, the potential gap in request ordering between two devices would increase by the round-trip latency of the notify message, but it avoids the need to potentially resend journal entries to a new journal object.
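
To make the trade-off concrete, here is a rough sketch of the behavior I have in mind. Everything in it is hypothetical: the class names, the tag-to-slot mapping (tag modulo splay count), and the ">= max size" test for "full" are assumptions for illustration, not part of the draft or of librbd. Each slot of the splay advances to its next object on its own, with no coordinated move of the whole set.

```python
# Hypothetical sketch: none of these names come from librbd or the draft.
from dataclasses import dataclass, field


@dataclass
class JournalObject:
    max_size: int
    size: int = 0
    entries: list = field(default_factory=list)

    def append(self, tag, payload):
        """Return (ok, full); ok is False when the object was already full."""
        if self.size >= self.max_size:
            return False, True            # stands in for the error code response
        self.entries.append((tag, payload))
        self.size += len(payload)
        return True, self.size >= self.max_size


class Journal:
    def __init__(self, splay_count, max_size):
        self.splay_count = splay_count
        self.max_size = max_size
        self.objects = {}                     # object number -> JournalObject
        self.active_set = [0] * splay_count   # per slot: no coordinated move

    def append(self, tag, payload):
        slot = tag % self.splay_count         # assumed tag-to-slot mapping
        while True:
            number = self.active_set[slot] * self.splay_count + slot
            obj = self.objects.setdefault(number, JournalObject(self.max_size))
            ok, full = obj.append(tag, payload)
            if full:
                self.active_set[slot] += 1    # only this slot moves on
            if ok:
                return number

    def replay(self):
        """Replay entries object by object, in ascending object number."""
        out = []
        for number in sorted(self.objects):
            out.extend(self.objects[number].entries)
        return out


journal = Journal(splay_count=2, max_size=4)
submitted = [(0, b"aa"), (0, b"bb"), (1, b"c"), (0, b"dd"), (1, b"e")]
for tag, payload in submitted:
    journal.append(tag, payload)
replayed = journal.replay()
```

In this run tag 0 fills its first object and moves on to the next one while tag 1 is still writing to its original object. Replaying the objects in order keeps each tag's entries in submission order, but the last two entries (one from each tag) come back swapped relative to how they were submitted. That is the soft ordering described above.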