Re: Understanding Ceph

On Sun, Dec 18, 2011 at 8:43 AM, Bill Hastings <bllhastings@xxxxxxxxx> wrote:
> Thanks for the response. What if a write of 16 bytes was successful at
> nodes A and B and failed at C, perhaps because C was momentarily unreachable
> via the network? How is the Ceph client prevented from performing the
> next read at C? Also what if the writes to OSDs were successful but

In that case the client wouldn't have gotten a successful response in
the first place. The client sends the writes to the primary osd
handling that pg, and will get the following responses from it:
 - ack message when the request is in the page/buffer cache on all replicas
 - commit message when the request is on stable storage on all replicas

(Depending on the setup, in some cases it'll get only a commit message,
which implies the ack anyway.)

The primary osd is responsible for making sure the data was written to all
replicas, and the client won't get the commit response until then. For rbd,
clients wait for the commit message as acknowledgment of write completion.
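To make the ack/commit rule concrete, here is a toy Python model of the primary's bookkeeping for a single replicated write. It is a sketch under stated assumptions, not the actual OSD code: the class and method names are hypothetical, the primary is counted as one of the replicas, and each replica is assumed to report "applied" (in the page/buffer cache) and "committed" (on stable storage) back to the primary. If a replica such as C never reports, the client simply never receives a reply.

```python
from dataclasses import dataclass, field


@dataclass
class PendingWrite:
    """Toy model (hypothetical names) of a primary tracking one replicated write."""

    replicas: frozenset                          # every osd in the pg, primary included
    applied: set = field(default_factory=set)    # replicas with the data in cache
    committed: set = field(default_factory=set)  # replicas with the data on stable storage
    sent: list = field(default_factory=list)     # replies sent to the client, in order

    def on_applied(self, osd):
        self.applied.add(osd)
        self._maybe_reply()

    def on_committed(self, osd):
        # Data on stable storage is also applied, so record both.
        self.applied.add(osd)
        self.committed.add(osd)
        self._maybe_reply()

    def _maybe_reply(self):
        if "commit" in self.sent:
            return  # nothing left to tell the client
        if self.committed >= self.replicas:
            # If no ack went out yet, the commit alone implies it.
            self.sent.append("commit")
        elif "ack" not in self.sent and self.applied >= self.replicas:
            self.sent.append("ack")


w = PendingWrite(frozenset({"A", "B", "C"}))
w.on_applied("A")
w.on_applied("B")
print(w.sent)  # [] because C has not reported yet
w.on_applied("C")
print(w.sent)  # ['ack']
```

The key design point is that both replies are gated on the full replica set, so a write that failed at C can never look successful to the client.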

> the metadata update fails? How is this managed if at all? How are

What kind of metadata are you referring to? For rbd there is no metadata update.

> writes that straddle chunk boundaries handled from a transactional
> perspective? I am just in the process of investigation so please
> forgive me if the questions are very naive.

It depends on which client we're talking about. The short answer is
that the client will only get a response after all chunks have been
written and acknowledged.
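A write that straddles a chunk boundary, such as the 16-byte write from the question, becomes one sub-request per object. The sketch below assumes a plain layout with a fixed object size (4 MiB here, a common rbd default) and no additional striping; split_extent and StraddlingWrite are hypothetical names, not librbd or kernel APIs. The request is treated as complete only once every piece has been acknowledged.

```python
OBJECT_SIZE = 4 * 1024 * 1024  # assumed object size; real images may use another


def split_extent(offset, length, object_size=OBJECT_SIZE):
    """Split an image-level write into (object_no, offset_in_object, length) pieces."""
    pieces = []
    end = offset + length
    while offset < end:
        object_no, obj_off = divmod(offset, object_size)
        n = min(object_size - obj_off, end - offset)  # stop at the object boundary
        pieces.append((object_no, obj_off, n))
        offset += n
    return pieces


class StraddlingWrite:
    """Hypothetical tracker: the client is answered only when every piece is acked."""

    def __init__(self, pieces):
        self.outstanding = set(range(len(pieces)))
        self.complete = not self.outstanding

    def on_piece_acked(self, idx):
        self.outstanding.discard(idx)
        self.complete = not self.outstanding


pieces = split_extent(OBJECT_SIZE - 8, 16)
print(pieces)  # [(0, 4194296, 8), (1, 0, 8)]
```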

However, there are currently two different implementations: one is in
the Linux kernel and the other is based on librbd. In the kernel, the
write is acknowledged in the byte order of the request. That is, the
second chunk is acked only after the first one has been acked, even if
the osds responded in a different order.
In librbd there's an additional complexity, since requests can be cached
and answered with an early ack. Ignoring that, the client will only get a
response after all chunks have been applied and acked.


Yehuda