On 10/22/2018 10:30 PM, Mark Nelson wrote:
On 10/22/18 2:12 PM, Sage Weil wrote:
On Mon, 22 Oct 2018, Mark Nelson wrote:
On 10/22/18 12:18 PM, Igor Fedotov wrote:
On 10/22/2018 7:49 PM, Sage Weil wrote:
On Mon, 22 Oct 2018, Igor Fedotov wrote:
Hi folks,
While doing the last cleanup for https://github.com/ceph/ceph/pull/19454
I realized that we still include the space of the separate DB volume in the
total space reported by BlueStore (and hence treat it as used, too).
It seems we had such a discussion a while ago, but unfortunately I don't
recall the results.
See BlueStore::statfs(...):

  if (bluefs) {
    // include dedicated db, too, if that isn't the shared device.
    if (bluefs_shared_bdev != BlueFS::BDEV_DB) {
      buf->total += bluefs->get_total(BlueFS::BDEV_DB);
    }
    ...
  }
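To make the two accounting choices concrete, here is a minimal, self-contained sketch. It does not use the real BlueStore or BlueFS classes: `DeviceSizes`, `total_with_db` and `total_without_db` are hypothetical names introduced purely for illustration.

```cpp
#include <cstdint>

// Hypothetical stand-in for the per-OSD device sizes; not a real BlueStore type.
struct DeviceSizes {
  uint64_t block_bytes;  // main (slow) block device
  uint64_t db_bytes;     // dedicated DB device, 0 if the DB shares the main device
};

// Behaviour of the snippet quoted above: a dedicated DB device is added to 'total'.
constexpr uint64_t total_with_db(const DeviceSizes& d) {
  return d.block_bytes + d.db_bytes;
}

// Alternative under discussion: 'total' covers the main block device only.
constexpr uint64_t total_without_db(const DeviceSizes& d) {
  return d.block_bytes;
}
```

For an OSD with a 10 GiB block device and a 1 GiB DB device, the first function reports 11 GiB and the second 10 GiB.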
I'm not sure there is any rationale behind that, and I have a strong desire
to remove it from the 'total' calculation.
I just want to share two options for the new 'ceph df' output.
3x OSD config: 10 GB block device + 1 GB DB device + 1 GB WAL device.
1) DB space included.
GLOBAL:
    SIZE       AVAIL      USED        RAW USED    %RAW USED
    33 GiB     27 GiB     2.9 GiB     5.9 GiB     18.01
POOLS:
    NAME                 ID    STORED     OBJECTS    USED       %USED    MAX AVAIL
    cephfs_data_a        1     0 B        0          0 B        0        8.9 GiB
    cephfs_metadata_a    2     2.2 KiB    22         384 KiB    0        8.9 GiB
2) DB space isn't included.
GLOBAL:
    SIZE       AVAIL      USED        RAW USED    %RAW USED
    30 GiB     27 GiB     1.9 MiB     3.0 GiB     10.01
POOLS:
    NAME                 ID    STORED     OBJECTS    USED       %USED    MAX AVAIL
    cephfs_data_a        1     0 B        0          0 B        0        8.9 GiB
    cephfs_metadata_a    2     2.2 KiB    22         384 KiB    0        8.9 GiB
So in the first case GLOBAL SIZE includes the space of both the block and DB
devices. RAW USED includes 3 GB for the separate BlueFS volumes and ~3 GB
permanently reserved for BlueFS on the slow device, as per the
bluestore_bluefs_min_free config parameter, plus the space actually
allocated for user data. AVAIL is equal to SIZE - RAW USED.
For the second option (the one I'm inclined towards), all the numbers exclude
the DB device space and hence, IMO, provide a more consistent picture.
Please note that the 3x 1 GB allocated for WAL isn't taken into account in
either case.
So the question is: which variant do we prefer? Are there any reasons to
account for DB space here?
Doesn't it make sense to export BlueFS stats (total/avail) separately, which
(along with the existing "internal_metadata" and "omap_allocated" fields)
would allow one to build a perfect and consistent view of DB space usage if
needed?
I think this is the key. The problem is that the db space is 'available'
space, but only for omap (or metadata). I think the way to complete the
picture may be to start with (1), but then also expose separate
data_available and omap_available values?
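One possible reading of this suggestion is sketched below. The field names `data_available` and `omap_available` come from the text above; everything else (`FreeSpace`, `AvailReport`, `split_available`) is hypothetical. The sketch assumes user data can only land on the main device, while omap/metadata can use the dedicated DB device and, beyond that, the main device as well.

```cpp
#include <cstdint>

// Hypothetical per-OSD free-space inputs; not the real statfs structure.
struct FreeSpace {
  uint64_t main_free;  // free bytes on the main block device
  uint64_t db_free;    // free bytes on the dedicated DB device
};

struct AvailReport {
  uint64_t data_available;
  uint64_t omap_available;
};

// Assumption: data is limited to the main device; omap may use the DB
// device plus whatever is free on the main device.
constexpr AvailReport split_available(const FreeSpace& f) {
  return AvailReport{f.main_free, f.db_free + f.main_free};
}
```

Under this reading the two values overlap, so they cannot be summed to obtain a single AVAIL figure.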
Hence, if we go with (1), we would get something like this one day (given
that the DB uses 1 GB):
GLOBAL:
    SIZE      AVAIL     DATA_AVAIL    USED       RAW USED    %RAW USED
    33 GiB    29 GiB    27 GiB        2.9 GiB    5.9 GiB     18.01
Not very transparent IMO.
I'd prefer to split data and metadata usage and have something like the
following:
DATA:
    SIZE      AVAIL     USED       RAW USED    %RAW USED
    30 GiB    27 GiB    1.9 MiB    3.0 GiB     10.01
META:
    SIZE     AVAIL    USED     OMAP_USED    %USED
    6 GiB    5 GiB    1 GiB    256 MiB      ZZZ
That would be fantastic! I'm all for it.
I don't think it makes sense to separate it out completely like that.
The main part of the device can be consumed by either omap or data, so you
would end up including it in both the SIZE for data and meta.
I think it would be simpler (and more accurate) to show something like:
    SIZE      DATA_USED    DATA_AVAIL    OMAP_USED    OMAP_AVAIL    RAW USED    %RAW USED
    33 GiB    1 GiB        27 GiB        6 GiB        2.9 GiB       8.1 GiB     38.01
or similar?
sage
On reflection the first questions I often get asked are "What's the
difference between the DB and the WAL?" and "How big should the DB and
WAL Partitions be?" Usually when people hear that the WAL is
relatively small they lose interest in it, but a semi-common followup
question is "How do I tell how full the DB partition is?" I'm not
sure how many of our users really understand or care about
OMAP/onodes/blobs/extents/etc. I think it's more about trying to
verify that they've provisioned a reasonable amount of flash storage per
node and they want us to give them some metric by which to define
"reasonable". Right now that metric seems to be "DB fits entirely on
flash".
Yeah, and this makes me dream of a 'df' output that shows whether the "DB
fits entirely on flash".
Mark
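A check along the lines of "DB fits entirely on flash" could look roughly like the sketch below. It is not based on any existing Ceph structure: `BlueFsPlacement` and both functions are hypothetical, and it assumes that "fits" means no BlueFS data has spilled over to the slow device.

```cpp
#include <cstdint>

// Hypothetical summary of where BlueFS data currently lives; these fields
// are illustrative and do not correspond to an existing Ceph structure.
struct BlueFsPlacement {
  uint64_t db_total_bytes;   // size of the dedicated (flash) DB device
  uint64_t db_used_bytes;    // BlueFS bytes stored on the DB device
  uint64_t slow_used_bytes;  // BlueFS bytes that spilled to the slow device
};

// Assumption: "DB fits entirely on flash" means nothing has spilled over
// to the slow device and the DB device is not over-committed.
constexpr bool db_fits_on_flash(const BlueFsPlacement& p) {
  return p.slow_used_bytes == 0 && p.db_used_bytes <= p.db_total_bytes;
}

// How full the DB device is, in percent (0 if there is no DB device).
constexpr double db_used_percent(const BlueFsPlacement& p) {
  return p.db_total_bytes == 0
             ? 0.0
             : 100.0 * static_cast<double>(p.db_used_bytes) /
                   static_cast<double>(p.db_total_bytes);
}
```

A 'df'-style report could then print the percentage next to a yes/no flag, which would answer the "how full is the DB partition?" question directly.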