Were they intentionally marked 'out', or did this happen for some unknown
reason?
On 11/26/2020 10:08 PM, Dan van der Ster wrote:
Hey that's it!
I stopped the up but out OSDs (100 and 177), and now the stats are correct!
# ceph df
RAW STORAGE:
    CLASS     SIZE        AVAIL       USED        RAW USED    %RAW USED
    hdd       5.5 PiB     1.2 PiB     4.3 PiB     4.3 PiB     78.62
    TOTAL     5.5 PiB     1.2 PiB     4.3 PiB     4.3 PiB     78.62

POOLS:
    POOL      ID     STORED      OBJECTS     USED        %USED     MAX AVAIL
    public    68     2.9 PiB     143.56M     4.3 PiB     84.55     538 TiB
    test      71     29 MiB      6.56k       1.2 GiB     0         269 TiB
    foo       72     1.2 GiB     308         3.6 GiB     0         269 TiB
On Thu, Nov 26, 2020 at 8:02 PM Dan van der Ster <dan@xxxxxxxxxxxxxx> wrote:
There are a couple gaps, yes: https://termbin.com/9mx1
What should I do?
-- dan
On Thu, Nov 26, 2020 at 7:52 PM Igor Fedotov <ifedotov@xxxxxxx> wrote:
Does "ceph osd df tree" show stats properly for all the daemons (i.e., no
evident gaps like unexpected zero values)?
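A quick way to scan for such gaps programmatically (a sketch; it assumes the `nodes` array shape of `ceph osd df -f json`, and the sample data below is made up rather than taken from a live cluster):

```python
import json

# In practice you would feed in live output, e.g.:
#   data = json.loads(subprocess.check_output(["ceph", "osd", "df", "-f", "json"]))
# Here a made-up sample stands in for it.
data = json.loads("""
{"nodes": [
  {"id": 0,   "name": "osd.0",   "status": "up", "kb_used": 123456, "pgs": 97},
  {"id": 100, "name": "osd.100", "status": "up", "kb_used": 0,      "pgs": 0},
  {"id": 177, "name": "osd.177", "status": "up", "kb_used": 0,      "pgs": 0}
]}
""")

# Flag OSDs that are up but report suspicious zero stats.
gaps = [n["name"] for n in data["nodes"]
        if n["status"] == "up" and (n["kb_used"] == 0 or n["pgs"] == 0)]
print(gaps)
```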
1. Anyway, I found something weird...
I created a new 1-PG pool "foo" on a different cluster and wrote some
data to it.
The stored and used are equal.
Thu 26 Nov 19:26:58 CET 2020
RAW STORAGE:
    CLASS     SIZE        AVAIL       USED        RAW USED    %RAW USED
    hdd       5.5 PiB     1.2 PiB     4.3 PiB     4.3 PiB     78.31
    TOTAL     5.5 PiB     1.2 PiB     4.3 PiB     4.3 PiB     78.31

POOLS:
    POOL      ID     STORED      OBJECTS     USED        %USED     MAX AVAIL
    public    68     2.9 PiB     143.54M     2.9 PiB     78.49     538 TiB
    test      71     29 MiB      6.56k       29 MiB      0         269 TiB
    foo       72     1.2 GiB     308         1.2 GiB     0         269 TiB
But when I tried restarting the relevant three OSDs, bytes_used was
temporarily reported correctly:
Thu 26 Nov 19:27:00 CET 2020
RAW STORAGE:
    CLASS     SIZE        AVAIL       USED        RAW USED    %RAW USED
    hdd       5.5 PiB     1.2 PiB     4.3 PiB     4.3 PiB     78.62
    TOTAL     5.5 PiB     1.2 PiB     4.3 PiB     4.3 PiB     78.62

POOLS:
    POOL      ID     STORED      OBJECTS     USED        %USED     MAX AVAIL
    public    68     2.9 PiB     143.54M     4.3 PiB     84.55     538 TiB
    test      71     29 MiB      6.56k       1.2 GiB     0         269 TiB
    foo       72     1.2 GiB     308         3.6 GiB     0         269 TiB
But then a few seconds later it's back to used == stored:
Thu 26 Nov 19:27:03 CET 2020
RAW STORAGE:
    CLASS     SIZE        AVAIL       USED        RAW USED    %RAW USED
    hdd       5.5 PiB     1.2 PiB     4.3 PiB     4.3 PiB     78.47
    TOTAL     5.5 PiB     1.2 PiB     4.3 PiB     4.3 PiB     78.47

POOLS:
    POOL      ID     STORED      OBJECTS     USED        %USED     MAX AVAIL
    public    68     2.9 PiB     143.54M     2.9 PiB     78.49     538 TiB
    test      71     29 MiB      6.56k       29 MiB      0         269 TiB
    foo       72     1.2 GiB     308         1.2 GiB     0         269 TiB
It seems to report the correct stats only while the PG is peering (or in
some other transitional state).
I've restarted all three relevant OSDs now -- the stats are reported
as stored == used.
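For reference, the arithmetic the USED column should follow on a replicated pool (a sketch; the 3x replica count for the foo pool is an assumption based on the pool sizes above):

```python
GiB = 2 ** 30

def expected_raw_used(stored_bytes, replicas=3):
    """On an N-way replicated pool, raw USED should be stored * N."""
    return stored_bytes * replicas

# foo pool above: 1.2 GiB stored on an assumed 3x pool -> 3.6 GiB raw used,
# which matches the value reported right after the OSD restarts; the
# steady-state report of 1.2 GiB (used == stored) is the anomaly.
print(expected_raw_used(1.2 * GiB) / GiB)
```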
2. Another data point -- I found another old cluster that reports
stored/used correctly. I have no idea what might be different about
that cluster -- we updated it just like the others.
Cheers, Dan
On Thu, Nov 26, 2020 at 6:22 PM Igor Fedotov <ifedotov@xxxxxxx> wrote:
For a specific BlueStore instance you can learn the relevant statfs output by
setting debug_bluestore to 20 and leaving the OSD running for 5-10 seconds (or
maybe a couple of minutes -- I don't remember the exact statfs poll period).
Then grep the OSD log for "statfs" and/or "pool_statfs"; the output is
formatted by the following operator (taken from src/osd/osd_types.cc):
ostream& operator<<(ostream& out, const store_statfs_t &s)
{
  out << std::hex
      << "store_statfs(0x" << s.available
      << "/0x" << s.internally_reserved
      << "/0x" << s.total
      << ", data 0x" << s.data_stored
      << "/0x" << s.allocated
      << ", compress 0x" << s.data_compressed
      << "/0x" << s.data_compressed_allocated
      << "/0x" << s.data_compressed_original
      << ", omap 0x" << s.omap_allocated
      << ", meta 0x" << s.internal_metadata
      << std::dec
      << ")";
  return out;
}
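To turn those hex fields back into numbers, a small parser along these lines should work (a sketch; the field names follow the operator above, and the sample line is made up, not real OSD log output):

```python
import re

def parse_store_statfs(line):
    """Parse a store_statfs(...) log line, as printed by the operator
    above, into a dict of integer byte counts (the fields are hex)."""
    m = re.search(
        r"store_statfs\(0x([0-9a-f]+)/0x([0-9a-f]+)/0x([0-9a-f]+), "
        r"data 0x([0-9a-f]+)/0x([0-9a-f]+), "
        r"compress 0x([0-9a-f]+)/0x([0-9a-f]+)/0x([0-9a-f]+), "
        r"omap 0x([0-9a-f]+), meta 0x([0-9a-f]+)\)", line)
    if not m:
        return None
    keys = ["available", "internally_reserved", "total",
            "data_stored", "allocated",
            "data_compressed", "data_compressed_allocated",
            "data_compressed_original",
            "omap_allocated", "internal_metadata"]
    return {k: int(v, 16) for k, v in zip(keys, m.groups())}

# Made-up sample line in the same format:
sample = ("store_statfs(0x1000/0x0/0x2000, data 0x400/0x800, "
          "compress 0x0/0x0/0x0, omap 0x100, meta 0x200)")
stats = parse_store_statfs(sample)
print(stats["data_stored"], stats["allocated"])
```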
But honestly I doubt it's BlueStore that is reporting incorrectly, since
BlueStore doesn't care about replication.
It looks more like missing stats from some replicas, or improper handling
of the PG replication factor...
Perhaps it's legacy vs. newly created pools that matters... Can you try
creating a new pool on the old cluster, filling it with some data (e.g. just
a single 64K object), and checking the stats?
Thanks,
Igor
On 11/26/2020 8:00 PM, Dan van der Ster wrote:
Hi Igor,
No BLUESTORE_LEGACY_STATFS warning, and
bluestore_warn_on_legacy_statfs is the default true on this (and all)
clusters.
I'm quite sure we did the statfs conversion during one of the recent
upgrades (I forget which one exactly).
# ceph tell osd.* config get bluestore_warn_on_legacy_statfs | grep -v true
#
Is there a command to see the statfs reported by an individual OSD?
We have a mix of ~year old and recently recreated OSDs, so I could try
to see if they differ.
Thanks!
Dan
On Thu, Nov 26, 2020 at 5:50 PM Igor Fedotov <ifedotov@xxxxxxx> wrote:
Hi Dan
do you have the BLUESTORE_LEGACY_STATFS alert raised (it might be silenced
by the bluestore_warn_on_legacy_statfs param) on the older cluster?
Thanks,
Igor
On 11/26/2020 7:29 PM, Dan van der Ster wrote:
Hi,
Depending on which cluster I look at (all running v14.2.11), bytes_used
variably reports either raw space or stored (logical) bytes.
Here's a 7 year old cluster:
# ceph df -f json | jq .pools[0]
{
  "name": "volumes",
  "id": 4,
  "stats": {
    "stored": 1229308190855881,
    "objects": 294401604,
    "kb_used": 1200496280133,
    "bytes_used": 1229308190855881,
    "percent_used": 0.4401889145374298,
    "max_avail": 521125025021952
  }
}
Note that stored == bytes_used for that pool (it's a 3x replica pool).
But here's a newer cluster (installed recently with Nautilus):
# ceph df -f json | jq .pools[0]
{
  "name": "volumes",
  "id": 1,
  "stats": {
    "stored": 680977600893041,
    "objects": 163155803,
    "kb_used": 1995736271829,
    "bytes_used": 2043633942351985,
    "percent_used": 0.23379847407341003,
    "max_avail": 2232457428467712
  }
}
In the second cluster, bytes_used is 3x stored.
Does anyone know why these are not reported consistently?
Noticing this just now, I'll update our monitoring to plot stored
rather than bytes_used from now on.
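For monitoring, one quick sanity check is the bytes_used / stored ratio: roughly 1.0 when the pool reports logical (stored) bytes, and roughly the replication factor when it reports raw bytes. A sketch using the two pool samples above:

```python
def used_to_stored_ratio(stats):
    """Ratio of bytes_used to stored: ~1.0 means the pool is reporting
    logical bytes, ~replica-count means it is reporting raw bytes."""
    return stats["bytes_used"] / stats["stored"]

# Numbers from the two `ceph df -f json` samples above.
old = {"stored": 1229308190855881, "bytes_used": 1229308190855881}
new = {"stored": 680977600893041, "bytes_used": 2043633942351985}

print(round(used_to_stored_ratio(old), 2))  # ~1.0 -> logical bytes
print(round(used_to_stored_ratio(new), 2))  # ~3.0 -> raw bytes on a 3x pool
```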
Thanks!
Dan
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx