Re: maximum object size

"HEWLETT, Paul (Paul)" <paul.hewlett@xxxxxxxxxxxxxxxxxx> · Wed, 9 Sep 2015 08:22:16 +0000

By setting the parameter osd_max_write_size to 2047...
This normally defaults to 90 (MB).

Setting to 2048 exposes a bug in Ceph where signed overflow occurs...
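
To make the 2047 vs. 2048 boundary concrete, here is a minimal sketch of
the arithmetic. The helper names mb_to_bytes and overflows_int32 are my
own for illustration and are not Ceph functions. The only assumption is
what the original mail states: osd_max_write_size is a 32-bit signed
integer holding a value in MB, which the OSD shifts left by 20 bits.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helper (not Ceph code): convert MB to bytes in 64 bits,
// i.e. cast first and shift afterwards, as FileJournal.cc does.
constexpr int64_t mb_to_bytes(int32_t mb) {
    return static_cast<int64_t>(mb) << 20;
}

// Hypothetical helper: true if the byte count does not fit in a 32-bit
// signed int, the type the shift is evaluated in when the cast is missing.
constexpr bool overflows_int32(int32_t mb) {
    return mb_to_bytes(mb) > std::numeric_limits<int32_t>::max();
}

// 90 MB (the default) and 2047 MB still fit in 32 bits ...
static_assert(!overflows_int32(90), "90 MB fits in int32");
static_assert(!overflows_int32(2047), "2047 MB fits in int32");
// ... but 2048 MB is exactly 2^31 bytes, one more than INT32_MAX.
static_assert(overflows_int32(2048), "2048 MB overflows int32");
static_assert(mb_to_bytes(2048) == 2147483648LL, "2048 MB == 2^31 bytes");
```

The checks run at compile time, so the snippet compiles only if the
stated bounds hold. The cast before the shift is what keeps 2048 MB
representable.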

Part of the problem is my expectations. Ilya pointed out that one can use
libradosstriper to stripe a large object over many OSDs. I expected this
to happen automatically for any object > osd_max_write_size (=90MB) but it
does not. Instead one has to set special attributes to trigger striping.

Additionally, the interaction with erasure coding is unclear: apparently
the error is raised when the total file size exceeds the limit. If EC is
enabled, then maybe a better solution would be to test the size of the
chunk written to each OSD, which is only part of the total file size.
Or do I have that wrong?

If EC is being used, would the individual chunks produced by splitting
the file then be erasure coded? I.e., if we decide to split a large file
into 5 striped chunks, does Ceph then EC the individual chunks?
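
For reference, the chunk-size arithmetic behind the question can be
sketched as below. This is only an illustration under my own
assumptions: ec_chunk_bytes is a hypothetical helper, it splits the
object evenly across the k data chunks and ignores any stripe-unit
alignment or padding, and it says nothing about which size Ceph actually
tests against osd_max_write_size.

```cpp
#include <cstdint>

constexpr int64_t kMiB = int64_t{1} << 20;

// Hypothetical helper (not Ceph code): bytes per data chunk when an
// object of object_bytes is split evenly over k data chunks, rounded up.
// Assumption: no stripe-unit alignment or padding is applied.
constexpr int64_t ec_chunk_bytes(int64_t object_bytes, int k) {
    return (object_bytes + k - 1) / k;
}

// A 2 GB object under 4+1 EC: each of the 4 data chunks (and the one
// coding chunk, which has the same size) holds 512 MiB.
static_assert(ec_chunk_bytes(2048 * kMiB, 4) == 512 * kMiB,
              "2 GB over k=4 gives 512 MiB per chunk");
// The total object exceeds a 2047 MB limit, while a single chunk does not.
static_assert(2048 * kMiB > 2047 * kMiB, "total exceeds the limit");
static_assert(ec_chunk_bytes(2048 * kMiB, 4) < 2047 * kMiB,
              "one chunk is below the limit");
```

So with 4+1 the per-OSD chunk would be well under the limit even when
the total is not, which is why it matters which of the two sizes is
checked.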

Striping is not really documented...

Paul

On 08/09/2015 17:53, "Somnath Roy" <Somnath.Roy@xxxxxxxxxxx> wrote:

>I think the limit is 90 MB on the OSD side, isn't it?
>If so, how are you able to write objects up to 1.99 GB?
>Am I missing anything?
>
>Thanks & Regards
>Somnath
>
>-----Original Message-----
>From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf Of
>HEWLETT, Paul (Paul)
>Sent: Tuesday, September 08, 2015 8:55 AM
>To: ceph-users@xxxxxxxxxxxxxx
>Subject:  maximum object size
>
>Hi All
>
>We have recently encountered a problem on Hammer (0.94.2) whereby we
>cannot write objects > 2GB in size to the rados backend.
>(NB not RadosGW, CephFS or RBD)
>
>I found the following issue
>https://wiki.ceph.com/Planning/Blueprints/Firefly/Object_striping_in_librados
>which seems to address this but no progress reported.
>
>What are the implications of writing such large objects to RADOS? What
>impact is expected on the XFS backend particularly regarding the size and
>location of the journal?
>
>Any prospect of progressing the issue reported in the enclosed link?
>
>Interestingly I could not find anywhere in the ceph documentation that
>describes the 2GB limitation. The implication of most of the website docs
>is that there is no limit on objects stored in Ceph. The only hint is
>that osd_max_write_size is a 32 bit signed integer.
>
>If we use erasure coding will this reduce the impact? I.e. 4+1 EC will
>only write 500MB to each OSD and then this value will be tested against
>the chunk size instead of the total file size?
>
>The relevant code in Ceph is:
>
>src/FileJournal.cc:
>
>  needed_space = ((int64_t)g_conf->osd_max_write_size) << 20;
>  needed_space += (2 * sizeof(entry_header_t)) + get_top();
>  if (header.max_size - header.start < needed_space) {
>    derr << "FileJournal::create: OSD journal is not large enough to hold "
>    << "osd_max_write_size bytes!" << dendl;
>    ret = -ENOSPC;
>    goto free_buf;
>  }
>
>src/osd/OSD.cc:
>
>    // too big?
>    if (cct->_conf->osd_max_write_size &&
>    m->get_data_len() > cct->_conf->osd_max_write_size << 20) {
>    // journal can't hold commit!
>     derr << "handle_op msg data len " << m->get_data_len()
>     << " > osd_max_write_size " << (cct->_conf->osd_max_write_size << 20)
>     << " on " << *m << dendl;
>    service.reply_op_error(op, -OSD_WRITETOOBIG);
>    return;
>  }
>
>Interestingly the code in OSD.cc looks like a bug - the max_write value
>should be cast to an int64_t before shifting left 20 bits (which is done
>correctly in FileJournal.cc). Otherwise overflow may occur and negative
>values generated.
>
>
>Any comments welcome - any help appreciated.
>
>Regards
>Paul
>
>
>_______________________________________________
>ceph-users mailing list
>ceph-users@xxxxxxxxxxxxxx
>http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
