Re: Rados maximum object size issue since Luminous?


 



On Mon, Jul 3, 2017 at 10:17 AM, Martin Emrich
<martin.emrich@xxxxxxxxxxx> wrote:
> Hi!
>
>
>
> Having to interrupt my bluestore test, I have another issue since upgrading
> from Jewel to Luminous: My backup system (Bareos with RadosFile backend) can
> no longer write Volumes (objects) larger than around 128MB.
>
> (Of course, I did not test that on my test cluster prior to upgrading the
> production one :/ )
>
>
>
> At first, I suspected an incompatibility between the Bareos storage daemon
> and the newer Ceph version, but I could replicate it with the rados tool:
>
>
>
> Create a large file (1GB)
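> (e.g. with something like: dd if=/dev/zero of=rados-testfile-1G bs=1M count=1024)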
>
>
>
> Put it with rados
>
>
>
> rados --pool backup put rados-testfile rados-testfile-1G
>
> error putting backup-fra1/rados-testfile: (27) File too large
>
>
>
> Read it back:
>
>
>
> rados  --pool backup get rados-testfile rados-testfile-readback
>
>
>
> Indeed, it wrote just about 128MB
>
>
>
> Adding the "--striper" option to both the get and put command lines, it works:
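> (i.e. something like:
>
> rados --pool backup --striper put rados-testfile rados-testfile-1G
>
> rados --pool backup --striper get rados-testfile rados-testfile-readback )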
>
>
>
> -rw-r--r-- 1 root root 1073741824  3. Jul 18:47 rados-testfile-1G
>
> -rw-r--r-- 1 root root  134217728  3. Jul 19:12 rados-testfile-readback
>
>
>
> The error message I get from the backup system looks similar:
>
> block.c:659-29028 === Write error. fd=0 size=64512 rtn=-1 dev_blk=134185235
> blk_blk=10401 errno=28: ERR=Auf dem Gerät ist kein Speicherplatz mehr
> verfügbar
>
>
>
> (German for „No space left on device”)
>
>
>
> The service worked fine with Ceph jewel, nicely writing 50GB objects. Did
> the API change somehow?

We set a default maximum object size (of 128MB, probably?) in order to
prevent people from storing individual objects that are too large for the
system to handle well. It is configurable (I don't remember how; you'll
need to look it up, hopefully in the docs but probably in the source),
but there's generally not a good reason to create huge single objects
instead of sharding the data across smaller ones. 50GB objects probably
work fine for archival, but if e.g. you have an OSD failure you won't be
able to do any IO on objects that are being backfilled or recovered, and
for a 50GB object that will take a while.
-Greg
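
For reference, the option Greg refers to appears to be osd_max_object_size, whose default dropped to 128MB in Luminous. Assuming that name is correct, a rough sketch of raising it, either in ceph.conf or injected into the running OSDs:

[osd]
# hedged example: raise the per-object size cap to 1GB
# (osd_max_object_size is believed to be the limit described above)
osd max object size = 1073741824

ceph tell osd.* injectargs '--osd_max_object_size 1073741824'

Raising the cap works around the error, but per the point above, striping or sharding large payloads (e.g. via --striper) is the safer approach.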
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



