Re: CEPH bluestore space consumption with small objects

Don't forget that at those sizes the internal journals and rocksdb size tunings are likely to be a significant fixed cost.
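To see how much of that is fixed overhead versus per-object padding, you can compare what BlueStore reports as logically stored versus physically allocated through the admin socket. A quick check, assuming the bluestore_stored/bluestore_allocated perf counters present in current builds:

  # run on an OSD host with access to that OSD's admin socket
  ceph daemon osd.0 perf dump | grep -E '"bluestore_(stored|allocated)"'

The gap between the two numbers is per-object allocation overhead; anything consumed beyond that goes to the DB and WAL.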

On Thu, Aug 3, 2017 at 3:13 AM Wido den Hollander <wido@xxxxxxxx> wrote:

> On 2 August 2017 at 17:55, Marcus Haarmann <marcus.haarmann@xxxxxxxxx> wrote:
>
>
> Hi,
> we are doing some tests here with a Kraken setup using the bluestore backend (on Ubuntu 64 bit).
> We are trying to store > 10 million very small objects using RADOS.
> (no CephFS, no RBD, only OSDs and monitors)
>
> The setup was done with ceph-deploy, using the standard bluestore option, no separate devices
> for the WAL. The test cluster spans 3 virtual machines, each with 100GB of storage for the OSD.
>
> We are now in the following situation (used pool is "test"):
> rados df
> POOL_NAME USED OBJECTS CLONES COPIES MISSING_ON_PRIMARY UNFOUND DEGRADED RD_OPS RD WR_OPS WR
> rbd 0 2 0 6 0 0 0 49452 39618k 855 12358k
> test 17983M 595427 0 1786281 0 0 0 29 77824 596426 17985M
>
> total_objects 595429
> total_used 141G
> total_avail 158G
> total_space 299G
>
> ceph osd df
> ID WEIGHT REWEIGHT SIZE USE AVAIL %USE VAR PGS
> 0 0.09760 1.00000 102298M 50763M 51535M 49.62 1.00 72
> 1 0.09760 1.00000 102298M 50799M 51499M 49.66 1.00 72
> 2 0.09760 1.00000 102298M 50814M 51484M 49.67 1.00 72
> TOTAL 299G 148G 150G 49.65
> MIN/MAX VAR: 1.00/1.00 STDDEV: 0.02
>
> As you can see, about 18GB of data is stored in ~595,000 objects now.
> The actual space consumption is about 150GB, which fills about half of the storage.
>

Not really. Each OSD uses ~50GB; since you replicate 3 times (the default), the cluster is storing 150GB of raw data spread over 3 OSDs.

So your data is 18GB, but each copy consumes 50GB. That's still ~2.8x, which is a lot, but a lot less than 150GB.
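
Spelled out:

  total_used         150 GB  (raw, across all 3 OSDs)
  / replica count      3
  = per-copy usage   ~50 GB
  / logical data      18 GB
  = overhead         ~2.8x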

> Objects have been added with a test script using the rados command line (put).
>
> Obviously, the stored objects are counted byte by byte in the rados df command,
> but the real space allocation is higher by about a factor of 8.
>

As written above, it's ~2.8x per copy, not 8x.

> The stored objects are a mixture of 2 KB, 10 KB, 50 KB and 100 KB objects.
>
> Is there any recommended way to configure bluestore with a block size
> better suited to such small objects? I cannot find any configuration option
> that would allow modifying bluestore's internal block handling.
> Would Luminous allow more specific configuration?
>

Could you try this with the Luminous RC as well? I don't know the answer here, but a LOT has been improved in BlueStore since Kraken.
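
A likely contributor is the allocation unit: BlueStore rounds every object up to bluestore_min_alloc_size (if I remember the defaults of that era right, 64 KB on HDD and 16 KB on SSD), so a 2 KB object still occupies a full unit. A minimal sketch of tuning it, with the caveat that the value is baked in when an OSD is created, so it only takes effect on newly (re)deployed OSDs:

  # ceph.conf on the OSD hosts, set *before* (re)creating the OSDs
  [osd]
  # allocation unit; smaller values reduce padding for tiny objects
  # at the cost of more metadata kept in RocksDB
  bluestore min alloc size hdd = 4096
  bluestore min alloc size ssd = 4096

That trade-off (less padding, more RocksDB metadata) is worth benchmarking before rolling it out.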

Wido

> Thank you all in advance for your support.
>
> Marcus Haarmann
>
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
