Re: Include BlueFS DB space in total/used stats or not?

On Mon, 22 Oct 2018, Mark Nelson wrote:
> On 10/22/18 12:18 PM, Igor Fedotov wrote:
> > 
> > 
> > On 10/22/2018 7:49 PM, Sage Weil wrote:
> > > On Mon, 22 Oct 2018, Igor Fedotov wrote:
> > > > Hi folks,
> > > > 
> > > > doing the last cleanup for https://github.com/ceph/ceph/pull/19454
> > > > 
> > > > I realized that we still include the space of a separate DB volume in
> > > > the total (and hence treat it as used too) space reported by BlueStore.
> > > > 
> > > > It seems we had such a discussion a while ago but unfortunately I don't
> > > > recall
> > > > the results.
> > > > 
> > > > 
> > > > See BlueStore::statfs(...)
> > > > 
> > > >   if (bluefs) {
> > > >     // include dedicated db, too, if that isn't the shared device.
> > > >     if (bluefs_shared_bdev != BlueFS::BDEV_DB) {
> > > >       buf->total += bluefs->get_total(BlueFS::BDEV_DB);
> > > >     }
> > > > 
> > > > I'm not sure if there is any rationale behind that, and I have a strong
> > > > desire to remove it from the 'total' calculation.
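> > > > 
> > > > Roughly speaking (a sketch only, reusing names from the snippet above,
> > > > not a tested patch), option (2) would boil down to something like:
> > > > 
> > > >   buf->total = bdev->get_size();   // main (slow) device only
> > > >   if (bluefs) {
> > > >     // nothing added to buf->total for a dedicated BlueFS::BDEV_DB;
> > > >     // its total/available could be reported separately, see further down
> > > >   }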
> > > > 
> > > > Just want to share two options for the new 'ceph df' output.
> > > > 
> > > > 3x OSD config: 10 GB block device + 1 GB DB device + 1 GB WAL device.
> > > > 
> > > > 1) DB space included.
> > > > 
> > > > GLOBAL:
> > > >      SIZE       AVAIL      USED        RAW USED     %RAW USED
> > > >      33 GiB     27 GiB     2.9 GiB      5.9 GiB         18.01
> > > > POOLS:
> > > >      NAME                  ID     STORED      OBJECTS     USED        %USED     MAX AVAIL
> > > >      cephfs_data_a         1          0 B           0        0 B           0      8.9 GiB
> > > >      cephfs_metadata_a     2      2.2 KiB          22    384 KiB           0      8.9 GiB
> > > > 
> > > > 2) DB space isn't included.
> > > > 
> > > > GLOBAL:
> > > >      SIZE       AVAIL      USED        RAW USED     %RAW USED
> > > >      30 GiB     27 GiB     1.9 MiB      3.0 GiB         10.01
> > > > POOLS:
> > > >      NAME                  ID     STORED      OBJECTS     USED        %USED     MAX AVAIL
> > > >      cephfs_data_a         1          0 B           0        0 B           0      8.9 GiB
> > > >      cephfs_metadata_a     2      2.2 KiB          22    384 KiB           0      8.9 GiB
> > > > 
> > > > So for the first case GLOBAL SIZE includes the space of both the block
> > > > and DB devices. RAW USED includes 3 GB for the separate BlueFS volumes
> > > > and ~3 GB permanently reserved for BlueFS at the slow device as per the
> > > > bluestore_bluefs_min_free config parameter, not to mention the space
> > > > actually allocated for user data.
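> > > > In numbers: SIZE = 3 x (10 GB block + 1 GB DB) ~= 33 GiB, and RAW USED =
> > > > 3 x 1 GB (DB volumes) + 3 x ~1 GB (reserved at the slow device) ~= 5.9 GiB.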
> > > > 
> > > > And AVAIL is equal to SIZE - RAW USED.
> > > > 
> > > > For the second option (which I'm inclined to prefer) all the numbers
> > > > exclude that DB device space and hence IMO provide a more consistent
> > > > picture.
> > > > 
> > > > Please note that the 3x1 GB allocated for WAL isn't taken into account
> > > > in either case.
> > > > 
> > > > So the question is: which variant do we prefer? Are there any reasons
> > > > to account for DB space here?
> > > > 
> > > > Doesn't it make sense to export BlueFS stats (total/avail) separately,
> > > > which (along with the existing "internal_metadata" and "omap_allocated"
> > > > fields) would allow building a complete and consistent view of DB space
> > > > usage if needed?
> > > I think this is the key.  The problem is that the db space is 'available'
> > > space, but only for omap (or metadata).  I think the way to complete the
> > > picture may be to start with (1), but then also expose separate
> > > data_available and omap_available values?
> > Hence if going with (1) we'd get something like this one day (given that
> > the DB uses 1 GB):
> > 
> > GLOBAL:
> >      SIZE       AVAIL      DATA_AVAIL   USED        RAW USED     %RAW USED
> >      33 GiB     29 GiB     27 GIB       2.9 GiB      5.9 GiB         18.01
> > 
> > Not very transparent IMO.
> > I'd prefer to split data and metadata usage and have something like the
> > following:
> > 
> > DATA:
> >      SIZE       AVAIL      USED        RAW USED     %RAW USED
> >      30 GiB     27 GiB     1.9 MiB      3.0 GiB         10.01
> > META:
> >      SIZE       AVAIL      USED        OMAP_USED  %USED
> >      6 GiB      5 GiB     1 GiB        256 MiB    ZZZ
> > 
> 
> That would be fantastic!  I'm all for it.

I don't think it makes sense to separate it out completely like that.  
The main part of the device can be consumed by either omap or data, so you 
would end up including it in both the SIZE for data and meta.

I think it would be simpler (and more accurate) to show something 
like:

SIZE    DATA_USED  DATA_AVAIL  OMAP_USED  OMAP_AVAIL    RAW USED     %RAW USED
33 GiB      1 GiB      27 GiB      6 GiB     2.9 GiB     8.1 GiB         38.01

or similar?
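
For illustration only, a rough sketch (made-up helper, not the actual statfs
code) of how those columns could be derived from the per-device numbers, i.e.
with the free space of the main device counted towards both the data and the
omap availability:

  #include <cstdint>

  struct df_line {
    uint64_t size, data_used, data_avail, omap_used, omap_avail, raw_used;
  };

  // slow_*: main (block) device, db_*: dedicated DB volume, all in bytes.
  df_line make_df_line(uint64_t slow_total, uint64_t slow_free,
                       uint64_t db_total, uint64_t db_free,
                       uint64_t data_used, uint64_t omap_used)
  {
    df_line l;
    l.size = slow_total + db_total;      // each device counted exactly once
    l.data_used = data_used;
    l.data_avail = slow_free;            // data can only live on the slow device
    l.omap_used = omap_used;
    l.omap_avail = slow_free + db_free;  // omap can spill from DB to slow
    l.raw_used = l.size - slow_free - db_free;
    return l;
  }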

sage
