Like a lot of system monitoring stuff, this is the kind of thing that
in an ideal world we wouldn't have to worry about, but the experience
in practice is that people deploy big distributed storage systems
without having really good monitoring in place.
Not to become completely off-topic, but do you have any suggestions for
such "really good monitoring" that could help monitor the many-to-many
communication pattern that is typical for a Ceph cluster? Especially the
performance part, not only the functional part.

stijn
We (the people providing storage) have an especially strong motivation
to provide basic network issue detection, because without it we can be
blamed for network issues ("The storage is slow!" ... one week later ...
"No, your InfiniBand cable is kinked").

That said, the fact that we're motivated to write it doesn't mean it has
to be physically built into things like the OSD and the mon; it makes
sense to keep things like this a bit separate.
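As a minimal sketch of the kind of external check described above (kept
separate from the OSDs and mons, e.g. run from cron on an admin box), a
script could poll `ceph osd perf` and flag outliers. The threshold here is
arbitrary, and the JSON field names match roughly Hammer-era output; they
may differ between releases, so check `ceph osd perf --format=json` on
your own cluster first.

#!/usr/bin/env python3
"""Poll 'ceph osd perf' and flag OSDs with unusually high commit latency."""
import json
import subprocess

LATENCY_THRESHOLD_MS = 100  # arbitrary alert threshold for this sketch


def osd_commit_latencies():
    """Return {osd_id: commit_latency_ms} parsed from 'ceph osd perf'."""
    out = subprocess.check_output(
        ["ceph", "osd", "perf", "--format=json"])
    perf = json.loads(out.decode("utf-8"))
    # 'osd_perf_infos' / 'perf_stats' reflect Hammer-era JSON; newer
    # releases may nest or rename these keys.
    return {
        info["id"]: info["perf_stats"]["commit_latency_ms"]
        for info in perf["osd_perf_infos"]
    }


def main():
    latencies = osd_commit_latencies()
    for osd, ms in sorted(latencies.items()):
        if ms > LATENCY_THRESHOLD_MS:
            # One slow OSD often means a bad disk; a whole host's worth
            # of slow OSDs is a hint to go look at the network instead.
            print("osd.%d commit latency %d ms exceeds %d ms"
                  % (osd, ms, LATENCY_THRESHOLD_MS))


if __name__ == "__main__":
    main()

This only catches the functional side indirectly; for the many-to-many
performance picture stijn asks about you would still want per-link
measurements between the hosts themselves.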
A nice option would be to read data from all replicas at once. This would
of course increase load and cause all sorts of issues if abused, but if
you have an app that absolutely-always-without-fail-must-get-data-ASAP,
then you could enable this in the client (and I think that would be an
easy option to add). This is actually used in some systems. The harder
part is failing nicely when writing (for example, waiting only for the
remote network buffers on two nodes to receive the data instead of
waiting for the commit on all three replicas…).
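For what it's worth, the client-side half of this is just a request race.
A sketch of the pattern (the `replicas` objects and their async `read()`
method are hypothetical stand-ins for per-replica connections, not a real
librados API):

import asyncio


async def read_first_replica(replicas, obj_name):
    """Issue the same read to every replica and keep the first answer.

    'replicas' is any iterable of objects exposing an async
    read(obj_name) coroutine; the slower replicas' reads are cancelled
    once one of them completes.
    """
    tasks = [asyncio.ensure_future(r.read(obj_name)) for r in replicas]
    done, pending = await asyncio.wait(
        tasks, return_when=asyncio.FIRST_COMPLETED)
    for task in pending:
        task.cancel()  # abandon the losers of the race
    return next(iter(done)).result()

The cost is exactly as described above: every read hits all replicas, so
the read load on the cluster is multiplied by the replication factor.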
Parallel reads have been talked about before:
https://wiki.ceph.com/Planning/Blueprints/Hammer/librados%3A_support_parallel_reads
(no idea whether anyone has a working version of it yet).
John
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com