On 11/01/2012 04:18 PM, Gandalf Corvotempesta wrote:
2012/10/31 Stefan Kleijkers <stefan@xxxxxxxxxxxx>:
As far as I know, this is correct. You get an ACK (on the write) back only after
it has landed on ALL three journals (and/or OSDs, in the case of BTRFS in parallel
mode). So if you lose one node, the data is still on the other two nodes and they
will commit it to disk. After the missing node/OSD recovers, it will get
the data from one of the other nodes. So you won't lose any data.
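The acknowledgement rule described above can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not Ceph's code: `Node`, `replicated_write`, `recover` and `fail` are hypothetical names, a node's "journal" is just a Python list, and re-peering of a degraded placement group is not modelled (here a write issued while a replica is down simply gets no ACK).

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Toy stand-in for an OSD; its journal is just a list of object names."""
    name: str
    up: bool = True
    journal: list = field(default_factory=list)

    def journal_write(self, obj: str) -> bool:
        # A node that is down cannot persist anything.
        if not self.up:
            return False
        self.journal.append(obj)
        return True

    def fail(self) -> None:
        # Model losing the node entirely, including its local data.
        self.up = False
        self.journal.clear()


def replicated_write(nodes: list, obj: str) -> bool:
    """Return True (the client ACK) only if every replica journaled obj."""
    # Build a full list so every replica is attempted; all() on a generator would stop early.
    results = [node.journal_write(obj) for node in nodes]
    return all(results)


def recover(target: Node, nodes: list) -> None:
    """Copy objects the target is missing from the first surviving peer."""
    for peer in nodes:
        if peer.up and peer is not target:
            for obj in peer.journal:
                if obj not in target.journal:
                    target.journal.append(obj)
            return


if __name__ == "__main__":
    nodes = [Node("a"), Node("b"), Node("c")]
    print(replicated_write(nodes, "obj1"))  # True: all three journals have it
    nodes[0].fail()                         # lose one node after the ACK
    print([n.journal for n in nodes])       # [[], ['obj1'], ['obj1']]
    nodes[0].up = True                      # node returns empty and recovers
    recover(nodes[0], nodes)
    print(nodes[0].journal)                 # ['obj1']
```

Because the ACK is only returned once every replica holds the object, losing any single node afterwards still leaves at least one copy to recover from.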
In this case I suppose that Ceph's write speed is bound by the
journal's write speed and never by the OSD disks.
Eventually you will need to write all of that data out to disk, and
writes to the journal will have to stop to allow the underlying disk to
catch up. In cases like that you will often see performance that looks
fast for a while and then, all of a sudden, long pauses and possibly
chaotic performance characteristics.
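The stall pattern described above can be reproduced with a toy simulation. It is a sketch under loudly stated assumptions: `simulate` and its parameters are hypothetical, journal space is assumed to be reclaimed only at periodic sync points (roughly the idea that journal entries can be discarded only once the backing filesystem has synced them), and all rates are invented round numbers. It shows client throughput running at journal speed until the journal fills, then alternating between short bursts and pauses that average out to the disk's speed.

```python
def simulate(seconds: int, client_mb_s: int, disk_mb_s: int,
             journal_mb: int, sync_interval: int) -> list:
    """Return how many MB the journal accepted from the client in each second.

    Toy assumption: journal space is reclaimed only at sync points, every
    sync_interval seconds, by up to disk_mb_s * sync_interval MB.
    """
    journal_used = 0
    accepted = []
    for t in range(1, seconds + 1):
        # Accept what the client offers, limited by free journal space.
        take = min(client_mb_s, journal_mb - journal_used)
        journal_used += take
        accepted.append(take)
        if t % sync_interval == 0:
            # Periodic sync: data flushed to the slow disk frees journal space.
            journal_used = max(0, journal_used - disk_mb_s * sync_interval)
    return accepted


if __name__ == "__main__":
    # Hypothetical numbers: 200 MB/s client, 50 MB/s disk, 1000 MB journal, 5 s sync.
    print(simulate(15, 200, 50, 1000, 5))
    # [200, 200, 200, 200, 200, 200, 50, 0, 0, 0, 200, 50, 0, 0, 0]
```

After the first six seconds at full speed, each five-second sync window accepts only 250 MB in total, i.e. the assumed 50 MB/s disk rate, delivered as a burst followed by a pause.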
Let's assume a 150GB journal capable of writing at 200MB/s on a
2Gbit/s network (LACP across two gigabit ports), no replication between
OSDs, and a very, very slow SATA disk (5400 RPM, for example, much slower
than the journal). Just a single OSD.
Ceph will write at 200MB/s and flush the journal to disk in the
background, right?
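For the scenario above, a back-of-the-envelope estimate of how long the journal can absorb writes is its capacity divided by the difference between the ingest rate and the drain rate. The 60 MB/s disk rate is an assumed figure for a 5400 RPM SATA disk (the thread gives none), decimal units are used (1 GB = 1000 MB), draining is treated as continuous, and the helper names are hypothetical.

```python
import math


def link_mb_s(gbit_s: float) -> float:
    """Raw link capacity in MB/s (decimal units, ignoring protocol overhead)."""
    return gbit_s * 1000 / 8


def seconds_until_journal_full(journal_gb: float, ingest_mb_s: float,
                               disk_mb_s: float) -> float:
    """Time until the journal fills, assuming it drains continuously at disk_mb_s."""
    net_fill = ingest_mb_s - disk_mb_s
    if net_fill <= 0:
        # The disk keeps up, so the journal never fills.
        return math.inf
    return journal_gb * 1000 / net_fill


if __name__ == "__main__":
    ASSUMED_DISK_MB_S = 60  # hypothetical 5400 RPM SATA throughput
    print(link_mb_s(2))  # 250.0, so a 2Gbit/s link does not cap 200MB/s
    t = seconds_until_journal_full(150, 200, ASSUMED_DISK_MB_S)
    print(f"{t:.0f} s (~{t / 60:.1f} min)")  # 1071 s (~17.9 min)
```

Under these assumptions the client sees 200MB/s for roughly 18 minutes; once the journal is full, sustained throughput can be no better than the disk's own rate.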
It will do that for a while, depending on how you've tuned the flush
intervals and the various journal settings that determine how much data Ceph
will allow to sit in the journal while still accepting new requests.
I can assume that the journal acts as a buffer and RBD will write only to it.
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html