RE: BlueStore metadata write overhead

Yes, it seems something is wrong. Here is the 'ceph df' output on a freshly created cluster with no data.

root@stormeap-1:~/fio_rbd/fio/examples# ceph df
2016-08-19 09:47:45.714539 7f7fd3ed9700 -1 WARNING: the following dangerous and experimental features are enabled: bluestore,rocksdb
2016-08-19 09:47:45.717589 7f7fd3ed9700 -1 WARNING: the following dangerous and experimental features are enabled: bluestore,rocksdb
2016-08-19 09:47:45.718583 7f7fd3ed9700 -1 WARNING: the following dangerous and experimental features are enabled: bluestore,rocksdb
GLOBAL:
    SIZE     AVAIL       RAW USED     %RAW USED
    106T     100536G        8742G          8.00
POOLS:
    NAME              ID     USED     %USED     MAX AVAIL     OBJECTS
    recovery_test     1        16         0        50268G           3

It is saying ~8742 GB RAW used with no data in the cluster, which is almost exactly the ~8743 GB of extra space computed below. I will take a look.

Thanks & Regards
Somnath

-----Original Message-----
From: Sage Weil [mailto:sweil@xxxxxxxxxx]
Sent: Friday, August 19, 2016 7:00 AM
To: Somnath Roy
Cc: ceph-devel@xxxxxxxxxxxxxxx
Subject: Re: BlueStore metadata write overhead

On Thu, 18 Aug 2016, Somnath Roy wrote:
> Sage, here is a rough estimate of how much extra space we are writing
> with BlueStore. Since RocksDB has little space amplification with
> level-style compaction, this is mostly the metadata BlueStore is
> writing.
>
> BlueStore:
> ----------------
> root@stormeap-1:~/fio_rbd/fio/examples# ceph df
> 2016-08-17 11:00:57.936952 7f5c39a2b700 -1 WARNING: the following
> dangerous and experimental features are enabled: bluestore,rocksdb
> 2016-08-17 11:00:57.939969 7f5c39a2b700 -1 WARNING: the following
> dangerous and experimental features are enabled: bluestore,rocksdb
> 2016-08-17 11:00:57.941269 7f5c39a2b700 -1 WARNING: the following
> dangerous and experimental features are enabled: bluestore,rocksdb
> GLOBAL:
>     SIZE     AVAIL      RAW USED     %RAW USED
>     106T     22411G       86867G         79.49
> POOLS:
>     NAME              ID     USED       %USED     MAX AVAIL     OBJECTS
>     recovery_test     1      39062G     71.49         6290G     10000003
>
>
> So, if we trust the statfs implementation of BlueStore, it is writing ~8743 GB more (86867 GB raw used - 78124 GB of data). Total data = image size of 39062 GB * replication 2 = ~78124 GB, so that is ~11.19% extra.
> BTW, this is after preconditioning with 1MB writes only; filling with a 4K block size will add more metadata.
>
> Filestore:
> -----------
> GLOBAL:
>     SIZE     AVAIL      RAW USED     %RAW USED
>     109T     34443G       78147G         69.41
> POOLS:
>     NAME              ID     USED       %USED     MAX AVAIL     OBJECTS
>     recovery_test     2      39062G     69.39        12930G     10000003
>
> So, in a similar setup, FileStore is writing only ~23 GB extra (78147 GB
> raw used - 78124 GB of data), i.e. ~0.029%.

This seems like a lot for bluestore.  The statfs output from bluestore should show how much of the space is bluefs vs bluestore.

Hmm, my guess is that bluestore is counting all of the space that it has given to bluefs as used, even though bluefs isn't using it.  Probably just need to make BlueStore::statfs() call BlueFS::statfs() and correct for the bluefs unused space...
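
Something along these lines, maybe (untested, just to sketch the idea; the
alloc/bluefs accessor names here are from memory, so don't take this as the
actual patch):

  int BlueStore::statfs(struct statfs *buf)
  {
    memset(buf, 0, sizeof(*buf));
    uint64_t bsize = bdev->get_block_size();
    buf->f_bsize = bsize;
    buf->f_blocks = bdev->get_size() / bsize;

    // space the bluestore allocator still has free
    uint64_t free = alloc->get_free();

    // credit back whatever part of the bluefs allocation bluefs has not
    // actually used yet, so it doesn't show up as "raw used"
    if (bluefs) {
      free += bluefs->get_free(bluefs_shared_bdev);
    }

    buf->f_bfree = buf->f_bavail = free / bsize;
    return 0;
  }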

sage