maximum object size

"HEWLETT, Paul (Paul)" <paul.hewlett@xxxxxxxxxxxxxxxxxx> · Tue, 8 Sep 2015 15:54:48 +0000

Hi All

We recently hit a problem on Hammer (0.94.2): we cannot write objects
larger than 2GB directly to the RADOS backend.
(NB: this is plain RADOS, not RadosGW, CephFS or RBD.)

I found the following blueprint, which seems to address this, but no
progress has been reported:
https://wiki.ceph.com/Planning/Blueprints/Firefly/Object_striping_in_librados

What are the implications of writing such large objects to RADOS? What
impact is expected on the XFS backend particularly regarding the size and
location of the journal?

Any prospect of progressing the issue reported in the enclosed link?

Interestingly, I could not find anything in the Ceph documentation that
describes the 2GB limitation. Most of the website docs imply that there
is no limit on the size of objects stored in Ceph. The only hint is that
osd_max_write_size is a 32-bit signed integer.

If we use erasure coding, will this reduce the impact? I.e. with 4+1 EC
a 2GB object would put only about 500MB on each OSD; would the limit
then be tested against the chunk size instead of the total object size?
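
To make the arithmetic behind that question concrete, here is a minimal
sketch. The helper names are hypothetical and are not Ceph APIs; it
assumes the object is split evenly across the k data chunks and ignores
stripe-width alignment and padding. It only shows the sizes involved and
does not settle which size the OSD-side check actually sees.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not Ceph code): bytes of object data that each
// data OSD receives when an object is split evenly across k data chunks.
// Rounds up; ignores stripe alignment and padding.
int64_t ec_data_chunk_bytes(int64_t object_bytes, int k) {
    assert(object_bytes >= 0 && k > 0);
    return (object_bytes + k - 1) / k;
}

// Hypothetical helper: total raw bytes written across all k + m OSDs,
// assuming every coding chunk has the same size as a data chunk.
int64_t ec_raw_bytes(int64_t object_bytes, int k, int m) {
    assert(m >= 0);
    return ec_data_chunk_bytes(object_bytes, k) * (k + m);
}
```

For a 2 GiB object with k=4, m=1 this gives 512 MiB per OSD and 2.5 GiB
of raw data in total.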

The relevant code in Ceph is:

src/os/FileJournal.cc:

  needed_space = ((int64_t)g_conf->osd_max_write_size) << 20;
  needed_space += (2 * sizeof(entry_header_t)) + get_top();
  if (header.max_size - header.start < needed_space) {
    derr << "FileJournal::create: OSD journal is not large enough to hold "
         << "osd_max_write_size bytes!" << dendl;
    ret = -ENOSPC;
    goto free_buf;
  }

src/osd/OSD.cc:

  // too big?
  if (cct->_conf->osd_max_write_size &&
      m->get_data_len() > cct->_conf->osd_max_write_size << 20) {
    // journal can't hold commit!
    derr << "handle_op msg data len " << m->get_data_len()
         << " > osd_max_write_size " << (cct->_conf->osd_max_write_size << 20)
         << " on " << *m << dendl;
    service.reply_op_error(op, -OSD_WRITETOOBIG);
    return;
  }

Interestingly, the check in OSD.cc looks like a bug: osd_max_write_size
should be cast to int64_t before being shifted left by 20 bits (as is
done correctly in FileJournal.cc). Otherwise the shift is evaluated in
32-bit arithmetic, so a setting of 2048 (MB) or more may overflow and
produce a negative or wrapped value.
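
A small standalone sketch of the effect, for illustration only. The
function names are hypothetical and this is not Ceph code. Because
overflowing a signed shift is not something to rely on, the 32-bit case
is emulated with unsigned arithmetic and then reinterpreted as signed.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: what "max_write_mb << 20" yields if the shift is
// carried out in 32 bits. The unsigned shift wraps modulo 2^32, which is
// well defined; converting back to int32_t gives the two's complement
// value (guaranteed from C++20, and what common compilers do anyway).
int32_t limit_bytes_32(int32_t max_write_mb) {
    assert(max_write_mb >= 0);
    uint32_t wrapped = static_cast<uint32_t>(max_write_mb) << 20;
    return static_cast<int32_t>(wrapped);
}

// Hypothetical helper: the same limit with the cast applied first, as in
// the FileJournal.cc snippet. The shift is then done in 64 bits.
int64_t limit_bytes_64(int32_t max_write_mb) {
    assert(max_write_mb >= 0);
    return static_cast<int64_t>(max_write_mb) << 20;
}
```

With a setting of 2047 both versions agree (2146435072 bytes). With 2048
the 64-bit version gives 2147483648, while the 32-bit version wraps to
-2147483648; with 4096 the 32-bit version wraps to 0.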

Any comments welcome - any help appreciated.

Regards
Paul
