Re: Appending to an erasure coded pool

Hi Gregory,

Many thanks for your reply. I couldn't spot any resources at those links that describe or show how to successfully write or append to an EC pool with the librados API. Do you know of any such examples or resources? Or is it simply not possible?

Best regards,

James Norman

On 6 Oct 2016, at 19:17, Gregory Farnum <gfarnum@xxxxxxxxxx> wrote:

On Thu, Oct 6, 2016 at 4:08 AM, James Norman <james@xxxxxxxxxxxxxxxxxxx> wrote:
Hi there,

I am developing a web application that supports browsing, uploading,
downloading, and moving files in a Ceph RADOS pool. Internally we write
objects with rados_append, as it is often too memory-intensive for us to
hold the full file in memory for a rados_write_full.
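
For reference, here is roughly what our write path looks like, simplified,
assuming an open rados_ioctx_t named io (this is exactly the call pattern
that breaks for us on EC pools):

    #include <errno.h>
    #include <stdio.h>
    #include <rados/librados.h>

    /* Stream a file into a single object in 64 KiB pieces so the whole
     * file never has to sit in memory. This works on replicated pools;
     * on an EC pool each rados_append below is the call that fails. */
    int stream_append(rados_ioctx_t io, const char *oid, FILE *in)
    {
        char buf[64 * 1024];
        size_t n;

        while ((n = fread(buf, 1, sizeof(buf), in)) > 0) {
            int ret = rados_append(io, oid, buf, n);
            if (ret < 0)
                return ret;   /* -EOPNOTSUPP (error 95) on an EC pool */
        }
        return ferror(in) ? -EIO : 0;
    }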

We do not control our customers' Ceph installations, such as whether they
use replicated pools, EC pools, etc. We've found that when dealing with an
EC pool, our rados_append calls return error code 95 and the message
"Operation not supported".

I've had several discussions with members of the IRC chatroom regarding
this, and the general consensus I've gotten is:
1) Use write alignment.
2) Put a replicated pool in front of the EC pool.
3) EC pools have a limited feature set.

Regarding point 1), are there any actual code examples for how you would
handle this in the context of rados_append? I have struggled to find even
one. This seems to me like something that should be handled by either the
API libraries or Ceph itself, not by the client trying to write some data.

librados requires a fair bit of knowledge from the user applications,
yes. One thing you mention that sounds concerning is that you can't
hold the objects in memory: RADOS is not comfortable with very large
objects, and you'll find that things like backfill might not perform as
you expect. (At this point everything will *probably* function, but it
may be so slow as to make no difference to you when it hits that
situation.) Certainly if your objects do not all fit neatly into
buckets of a particular size and you have some that are very large,
you will have a very non-uniform balance.
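
For illustration, one way to keep object sizes uniform is to chunk a
large upload into fixed-size objects on the client side. A hypothetical
sketch, assuming an open rados_ioctx_t (the 4 MiB chunk size and the
naming scheme are arbitrary choices, not anything librados prescribes):

    #include <errno.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <rados/librados.h>

    #define CHUNK_SIZE (4 * 1024 * 1024)   /* 4 MiB per object */

    /* Store a stream as prefix.0000000000000000, prefix.0000000000000001,
     * and so on. Each chunk fits in memory, so rados_write_full is usable
     * again, and every object except the last one is the same size. */
    int store_chunked(rados_ioctx_t io, const char *prefix, FILE *in)
    {
        char *buf = malloc(CHUNK_SIZE);
        unsigned long long idx = 0;
        size_t n;
        int ret = 0;

        if (!buf)
            return -ENOMEM;
        while ((n = fread(buf, 1, CHUNK_SIZE, in)) > 0) {
            char oid[256];
            snprintf(oid, sizeof(oid), "%s.%016llx", prefix, idx++);
            ret = rados_write_full(io, oid, buf, n);
            if (ret < 0)
                break;
        }
        if (ret == 0 && ferror(in))
            ret = -EIO;
        free(buf);
        return ret;
    }

This also sidesteps rados_append entirely, which avoids the EC alignment
problem in the first place.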

But, if you want to learn about EC pools, there is some documentation
at http://docs.ceph.com/docs/master/dev/osd_internals/erasure_coding/
(or in ceph.git/doc/dev/osd_internals/erasure_coding) from when they
were being created.
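
To make the write-alignment point concrete: librados reports whether a
pool requires aligned appends and what that alignment is, so a client can
buffer data and flush it to stripe boundaries. A minimal sketch, assuming
an open rados_ioctx_t (the struct and helper names below are made up, and
error handling is trimmed):

    #include <stdint.h>
    #include <stdlib.h>
    #include <string.h>
    #include <rados/librados.h>

    struct appender {
        rados_ioctx_t io;
        const char *oid;
        uint64_t align;   /* EC stripe width; the pool reports it */
        char *pend;       /* unflushed tail, always < align bytes */
        size_t pend_len;
    };

    int appender_init(struct appender *a, rados_ioctx_t io, const char *oid)
    {
        a->io = io;
        a->oid = oid;
        a->align = rados_ioctx_pool_requires_alignment(io)
                       ? rados_ioctx_pool_required_alignment(io) : 1;
        a->pend = malloc(a->align);
        a->pend_len = 0;
        return a->pend ? 0 : -1;
    }

    /* Buffer data and flush it in alignment-sized appends. */
    int appender_write(struct appender *a, const char *data, size_t len)
    {
        if (a->align == 1)           /* replicated pool: no buffering */
            return rados_append(a->io, a->oid, data, len);

        while (len > 0) {
            size_t take = a->align - a->pend_len;
            if (take > len)
                take = len;
            memcpy(a->pend + a->pend_len, data, take);
            a->pend_len += take;
            data += take;
            len -= take;
            if (a->pend_len == a->align) {
                int ret = rados_append(a->io, a->oid, a->pend, a->align);
                if (ret < 0)
                    return ret;
                a->pend_len = 0;
            }
        }
        return 0;
    }

    /* The final unaligned tail still has to go somewhere: one option is
     * to zero-pad it and record the true object size in an xattr via
     * rados_setxattr(), trimming the padding on read. */
    int appender_finish(struct appender *a)
    {
        int ret = 0;
        if (a->pend_len > 0) {
            memset(a->pend + a->pend_len, 0, a->align - a->pend_len);
            ret = rados_append(a->io, a->oid, a->pend, a->align);
        }
        free(a->pend);
        return ret;
    }

A production version would coalesce multiple aligned stripes into one
rados_append rather than flushing one stripe at a time, but the
aligned-flush idea is the same.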


Regarding point 2), this seems to be a workaround, and generally not
something we want to recommend to our customers. Is it detrimental to use
an EC pool without a replicated pool in front? What are the performance
costs of doing so?

Yeah, don't do that. Cache pools are really tricky to use properly and
turned out not to perform very well.


Regarding point 3), can you point me towards resources that describe what
features / abilities you lose by adopting an EC pool?

Same as above links, apparently. But really, you can read from and
append to them. There are no object classes, no arbitrary overwrites,
no omaps.
-Greg
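
For completeness: the "error code 95" at the top of the thread is
-EOPNOTSUPP surfacing through librados. A hypothetical runtime probe, with
made-up names, that a client could use to detect such a pool and switch to
aligned, append-only I/O (the 1-byte unaligned write below is expected to
be rejected with -EOPNOTSUPP by pools that only accept aligned appends):

    #include <errno.h>
    #include <rados/librados.h>

    /* Attempt a tiny unaligned write. Replicated pools accept it; EC
     * pools, which only take stripe-aligned appends, are expected to
     * reject it with -EOPNOTSUPP, i.e. errno 95. */
    int pool_supports_overwrite(rados_ioctx_t io, const char *probe_oid)
    {
        int ret = rados_write(io, probe_oid, "x", 1, 0);
        if (ret == -EOPNOTSUPP)
            return 0;                    /* use aligned appends instead */
        if (ret < 0)
            return ret;                  /* some unrelated failure */
        rados_remove(io, probe_oid);     /* clean up the probe object */
        return 1;                        /* plain writes are fine */
    }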
