By setting the parameter osd_max_write_size to 2047... This normally defaults to 90 (MB). Setting it to 2048 exposes a bug in Ceph where signed overflow occurs...

Part of the problem is my expectations. Ilya pointed out that one can use libradosstriper to stripe a large object over many OSDs. I expected this to happen automatically for any object > osd_max_write_size (=90MB), but it does not. Instead one has to set special attributes to trigger striping.

Additionally, the interaction with erasure coding is unclear. Apparently the error is reached when the total file size exceeds the limit. If EC is enabled, then maybe a better solution would be to test the size of the chunk written to the OSD, which will be only part of the total file size. Or do I have that wrong?

If EC is being used, would the individual chunks after splitting the file then be erasure coded? I.e. if we decide to split a large file into 5 striped chunks, does Ceph then EC the individual chunks?

Striping is not really documented...

Paul

On 08/09/2015 17:53, "Somnath Roy" <Somnath.Roy@xxxxxxxxxxx> wrote:

>I think the limit is 90 MB from OSD side, isn't it ?
>If so, how are you able to write object till 1.99 GB ?
>Am I missing anything ?
>
>Thanks & Regards
>Somnath
>
>-----Original Message-----
>From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf Of
>HEWLETT, Paul (Paul)
>Sent: Tuesday, September 08, 2015 8:55 AM
>To: ceph-users@xxxxxxxxxxxxxx
>Subject: maximum object size
>
>Hi All
>
>We have recently encountered a problem on Hammer (0.94.2) whereby we
>cannot write objects > 2GB in size to the rados backend.
>(NB not RadosGW, CephFS or RBD)
>
>I found the following issue
>https://wiki.ceph.com/Planning/Blueprints/Firefly/Object_striping_in_librados
>which seems to address this but no progress reported.
>
>What are the implications of writing such large objects to RADOS? What
>impact is expected on the XFS backend particularly regarding the size and
>location of the journal?
>
>Any prospect of progressing the issue reported in the enclosed link?
>
>Interestingly I could not find anywhere in the ceph documentation that
>describes the 2GB limitation. The implication of most of the website docs
>is that there is no limit on objects stored in Ceph. The only hint is
>that osd_max_write_size is a 32 bit signed integer.
>
>If we use erasure coding will this reduce the impact? I.e. 4+1 EC will
>only write 500MB to each OSD and then this value will be tested against
>the chunk size instead of the total file size?
>
>The relevant code in Ceph is:
>
>src/FileJournal.cc:
>
>  needed_space = ((int64_t)g_conf->osd_max_write_size) << 20;
>  needed_space += (2 * sizeof(entry_header_t)) + get_top();
>  if (header.max_size - header.start < needed_space) {
>    derr << "FileJournal::create: OSD journal is not large enough to hold "
>         << "osd_max_write_size bytes!" << dendl;
>    ret = -ENOSPC;
>    goto free_buf;
>  }
>
>src/osd/OSD.cc:
>
>  // too big?
>  if (cct->_conf->osd_max_write_size &&
>      m->get_data_len() > cct->_conf->osd_max_write_size << 20) {
>    // journal can't hold commit!
>    derr << "handle_op msg data len " << m->get_data_len()
>         << " > osd_max_write_size " << (cct->_conf->osd_max_write_size << 20)
>         << " on " << *m << dendl;
>    service.reply_op_error(op, -OSD_WRITETOOBIG);
>    return;
>  }
>
>Interestingly the code in OSD.cc looks like a bug - the max_write value
>should be cast to an int64_t before shifting left 20 bits (which is done
>correctly in FileJournal.cc). Otherwise overflow may occur and negative
>values generated.
>
>
>Any comments welcome - any help appreciated.
>
>Regards
>Paul
>
>
>_______________________________________________
>ceph-users mailing list
>ceph-users@xxxxxxxxxxxxxx
>http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
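
For illustration, here is a minimal sketch of the shift discussed in this thread. It only reproduces the arithmetic of converting a megabyte setting into bytes at 32-bit versus 64-bit width. The helper names limit_bytes_32bit and limit_bytes_64bit are hypothetical and not part of Ceph, and the 32-bit variant deliberately does its arithmetic on an unsigned type so the sketch does not itself rely on undefined behaviour.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not Ceph code) mirroring the OSD.cc expression quoted
// in the thread, where the shift is carried out at 32-bit width. The shift is
// done on uint32_t and converted back to int32_t; on two's-complement
// platforms this yields the same value a wrapped signed shift would show.
inline int32_t limit_bytes_32bit(int32_t max_write_mb) {
  return static_cast<int32_t>(static_cast<uint32_t>(max_write_mb) << 20);
}

// Hypothetical helper (not Ceph code) mirroring the FileJournal.cc expression
// quoted in the thread: widen to 64 bits first, then shift.
inline int64_t limit_bytes_64bit(int32_t max_write_mb) {
  return static_cast<int64_t>(max_write_mb) << 20;
}
```

With these helpers, a setting of 2047 gives 2146435072 bytes from both versions. A setting of 2048 gives 2147483648 from the 64-bit version, while the 32-bit version comes out negative (-2147483648 on a two's-complement platform), which matches the observation that 2047 works and 2048 does not.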