On Mon, Oct 17, 2016 at 3:34 AM, James Norman <james@xxxxxxxxxxxxxxxxxxx> wrote:
> Hi Gregory,
>
> Many thanks for your reply. I couldn't spot any resources that describe/show
> how you can successfully write / append to an EC pool with the librados API
> on those links. Do you know of any such examples or resources? Or is it just
> simply not possible?

If it's not in there I guess it's all "spoken" knowledge and you'll
have to dig through ceph-devel archives (probably for emails from
Sam). I'm not on the RADOS team, but the concepts you need are:
*) objects in EC pools can only be appended or truncated+recreated
*) because otherwise you'd need round-trip read-modify-write operations
*) so all operations must be in the block size you specify (or maybe
it's implicit based on stripe size and EC n count?) at pool create
time
*) including the appends.

I'm afraid that's about all the info I've got on it though.
-Greg

>
> Best regards,
>
> James Norman
>
> On 6 Oct 2016, at 19:17, Gregory Farnum <gfarnum@xxxxxxxxxx> wrote:
>
> On Thu, Oct 6, 2016 at 4:08 AM, James Norman <james@xxxxxxxxxxxxxxxxxxx>
> wrote:
>
> Hi there,
>
> I am developing a web application that supports browsing, uploading,
> downloading, moving files in a Ceph RADOS pool. Internally to write objects we
> use rados_append, as it's often too memory intensive for us to have the full
> file in memory to do a rados_write_full.
>
> We do not control our customer's Ceph installations, such as whether they
> use replicated pools, EC pools etc. We've found that when dealing with an EC
> pool, our rados_append calls return error code 95 and message "Operation not
> supported".
>
> I've had several discussions with members in the IRC chatroom regarding
> this, and the general consensus I've got is:
> 1) Use write alignment.
> 2) Put a replicated pool in front of the EC pool
> 3) EC pools have a limited feature set
>
> Regarding point 1), are there any actual code examples for how you would
> handle this in the context of rados_append? I have struggled to find even
> one. This seems to me something that should be handled by either the API
> libraries, or Ceph itself, not the client trying to write some data.
>
>
> librados requires a fair bit of knowledge from the user applications,
> yes. One thing you mention that sounds concerning is that you can't
> hold the objects in-memory -- RADOS is not comfortable with very large
> objects and you'll find that things like backfill might not perform as
> you expect. (At this point everything will *probably* function, but it
> may be so slow as to make no difference to you when it hits that
> situation.) Certainly if your objects do not all fit neatly into
> buckets of a particular size and you have some that are very large,
> you will have a very non-uniform balance.
>
> But, if you want to learn about EC pools there is some documentation
> at http://docs.ceph.com/docs/master/dev/osd_internals/erasure_coding/
> (or in ceph.git/doc/dev/osd_internals/erasure_coding) from when they
> were being created.
>
>
> Regarding point 2) This seems to be a workaround, and generally not
> something we want to recommend to our customers. Is it detrimental to use an
> EC pool without a replicated pool? What are the performance costs of doing
> so?
>
>
> Yeah, don't do that. Cache pools are really tricky to use properly and
> turned out not to perform very well.
>
>
> Regarding point 3) Can you point me towards resources that describe what
> features / abilities you lose by adopting an EC pool?
>
>
> Same as above links, apparently. But really, you can read from and
> append to them. There are no object classes, no arbitrary overwrites,
> no omaps.
> -Greg
>
>
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
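
Since the thread asks for an actual example of "write alignment" with
appends, here is a minimal sketch of the buffering idea behind the "*)"
points above. It is not taken from librados or from anyone on this
thread. The class name AlignedAppender, the append callable and the
alignment argument are all hypothetical. In a real client the callable
would wrap rados_append for one object, and the alignment would be
whatever the pool reports (I believe the C API exposes this as
rados_ioctx_pool_required_alignment, but check your librados version).
It also assumes that only the final append of an object may be shorter
than the alignment, which is worth verifying against your cluster.

```python
from typing import Callable


class AlignedAppender:
    """Buffer incoming chunks and forward them in alignment-sized multiples.

    Hypothetical helper: `append` stands in for a call such as rados_append
    on a single object; `alignment` stands in for the pool's required
    append alignment in bytes.
    """

    def __init__(self, append: Callable[[bytes], None], alignment: int) -> None:
        if alignment <= 0:
            raise ValueError("alignment must be positive")
        self._append = append
        self._alignment = alignment
        self._pending = bytearray()

    def write(self, chunk: bytes) -> None:
        """Accept a chunk of any size; forward only whole alignment units."""
        self._pending += chunk
        # Largest prefix of the buffer that is a multiple of the alignment.
        usable = len(self._pending) - (len(self._pending) % self._alignment)
        if usable:
            self._append(bytes(self._pending[:usable]))
            del self._pending[:usable]

    def close(self) -> None:
        """Flush the unaligned tail as the last append (assumed to be allowed)."""
        if self._pending:
            self._append(bytes(self._pending))
            self._pending.clear()


if __name__ == "__main__":
    sent = []  # records each forwarded append instead of talking to a cluster
    appender = AlignedAppender(sent.append, alignment=4)
    for piece in (b"ab", b"cdefg", b"hij"):
        appender.write(piece)
    appender.close()
    print(sent)
```

Memory use is bounded by one incoming chunk plus at most alignment - 1
buffered bytes, so the full file never has to be held in memory, which
was the original reason for preferring rados_append over
rados_write_full.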