There are a couple of gaps, yes: https://termbin.com/9mx1

What should I do?

-- dan

On Thu, Nov 26, 2020 at 7:52 PM Igor Fedotov <ifedotov@xxxxxxx> wrote:
>
> Does "ceph osd df tree" show stats properly (I mean, are there no evident
> gaps like unexpected zero values) for all the daemons?
>
> > 1. Anyway, I found something weird...
> >
> > I created a new 1-PG pool "foo" on a different cluster and wrote some
> > data to it.
> >
> > The stored and used bytes are equal:
> >
> > Thu 26 Nov 19:26:58 CET 2020
> > RAW STORAGE:
> >     CLASS     SIZE        AVAIL       USED        RAW USED    %RAW USED
> >     hdd       5.5 PiB     1.2 PiB     4.3 PiB     4.3 PiB         78.31
> >     TOTAL     5.5 PiB     1.2 PiB     4.3 PiB     4.3 PiB         78.31
> >
> > POOLS:
> >     POOL       ID     STORED      OBJECTS     USED        %USED     MAX AVAIL
> >     public     68     2.9 PiB     143.54M     2.9 PiB     78.49     538 TiB
> >     test       71     29 MiB      6.56k       29 MiB      0         269 TiB
> >     foo        72     1.2 GiB     308         1.2 GiB     0         269 TiB
> >
> > But when I tried restarting the relevant three OSDs, the bytes_used was
> > temporarily reported correctly:
> >
> > Thu 26 Nov 19:27:00 CET 2020
> > RAW STORAGE:
> >     CLASS     SIZE        AVAIL       USED        RAW USED    %RAW USED
> >     hdd       5.5 PiB     1.2 PiB     4.3 PiB     4.3 PiB         78.62
> >     TOTAL     5.5 PiB     1.2 PiB     4.3 PiB     4.3 PiB         78.62
> >
> > POOLS:
> >     POOL       ID     STORED      OBJECTS     USED        %USED     MAX AVAIL
> >     public     68     2.9 PiB     143.54M     4.3 PiB     84.55     538 TiB
> >     test       71     29 MiB      6.56k       1.2 GiB     0         269 TiB
> >     foo        72     1.2 GiB     308         3.6 GiB     0         269 TiB
> >
> > But a few seconds later it was back to used == stored:
> >
> > Thu 26 Nov 19:27:03 CET 2020
> > RAW STORAGE:
> >     CLASS     SIZE        AVAIL       USED        RAW USED    %RAW USED
> >     hdd       5.5 PiB     1.2 PiB     4.3 PiB     4.3 PiB         78.47
> >     TOTAL     5.5 PiB     1.2 PiB     4.3 PiB     4.3 PiB         78.47
> >
> > POOLS:
> >     POOL       ID     STORED      OBJECTS     USED        %USED     MAX AVAIL
> >     public     68     2.9 PiB     143.54M     2.9 PiB     78.49     538 TiB
> >     test       71     29 MiB      6.56k       29 MiB      0         269 TiB
> >     foo        72     1.2 GiB     308         1.2 GiB     0         269 TiB
> >
> > It seems to report the correct stats only while the PG is peering (or in
> > some other transitional state).
> >
> > I've restarted all three relevant OSDs now -- the stats are reported
> > as stored == used.
> >
> > 2. Another data point -- I found another old cluster that reports
> > stored/used correctly. I have no idea what might be different about
> > that cluster -- we updated it just like the others.
> >
> > Cheers, Dan
> >
> > On Thu, Nov 26, 2020 at 6:22 PM Igor Fedotov <ifedotov@xxxxxxx> wrote:
> >> For a specific BlueStore instance you can learn the relevant statfs
> >> output by setting debug_bluestore to 20 and leaving the OSD alone for
> >> 5-10 seconds (or maybe a couple of minutes -- I don't remember the
> >> exact statfs poll period).
> >>
> >> Then grep the OSD log for "statfs" and/or "pool_statfs"; the output is
> >> formatted by the following operator (taken from src/osd/osd_types.cc):
> >>
> >> ostream& operator<<(ostream& out, const store_statfs_t &s)
> >> {
> >>   out << std::hex
> >>       << "store_statfs(0x" << s.available
> >>       << "/0x" << s.internally_reserved
> >>       << "/0x" << s.total
> >>       << ", data 0x" << s.data_stored
> >>       << "/0x" << s.allocated
> >>       << ", compress 0x" << s.data_compressed
> >>       << "/0x" << s.data_compressed_allocated
> >>       << "/0x" << s.data_compressed_original
> >>       << ", omap 0x" << s.omap_allocated
> >>       << ", meta 0x" << s.internal_metadata
> >>       << std::dec
> >>       << ")";
> >>   return out;
> >> }
> >>
> >> But honestly I doubt it is BlueStore that reports incorrectly, since
> >> BlueStore doesn't care about replication.
> >>
> >> It looks more like a lack of stats from some replicas, or improper
> >> processing of the pg replica factor...
> >>
> >> Perhaps legacy vs. new pool is what matters... Can you try to create a
> >> new pool on the old cluster, fill it with some data (e.g. just a
> >> single 64K object), and check the stats?
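[As an aside: the grep Igor suggests can be decoded mechanically. This is a minimal sketch, assuming the exact field order shown in the operator from src/osd/osd_types.cc quoted above; the sample log line is fabricated for illustration, not taken from a real OSD log.]

```python
import re

# Field names in the order emitted by the operator<< shown above
# (src/osd/osd_types.cc).
FIELDS = [
    "available", "internally_reserved", "total",
    "data_stored", "allocated",
    "data_compressed", "data_compressed_allocated", "data_compressed_original",
    "omap_allocated", "internal_metadata",
]

# std::hex emits lowercase hex digits, so [0-9a-f]+ is enough.
STATFS_RE = re.compile(
    r"store_statfs\(0x([0-9a-f]+)/0x([0-9a-f]+)/0x([0-9a-f]+)"
    r", data 0x([0-9a-f]+)/0x([0-9a-f]+)"
    r", compress 0x([0-9a-f]+)/0x([0-9a-f]+)/0x([0-9a-f]+)"
    r", omap 0x([0-9a-f]+)"
    r", meta 0x([0-9a-f]+)\)"
)

def parse_store_statfs(line):
    """Decode one store_statfs(...) log line into a dict of ints, or None."""
    m = STATFS_RE.search(line)
    if m is None:
        return None
    return dict(zip(FIELDS, (int(v, 16) for v in m.groups())))

# Fabricated example line shaped like the operator output:
sample = ("store_statfs(0x1d183000000/0x0/0x1d1c0000000, "
          "data 0x2e85f43/0x2f00000, compress 0x0/0x0/0x0, "
          "omap 0x1000, meta 0x40000)")
stats = parse_store_statfs(sample)
```

[Running something like this over the grepped log lines would make it easy to compare data_stored and allocated across OSDs, per Igor's suggestion.]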
> >>
> >> Thanks,
> >>
> >> Igor
> >>
> >> On 11/26/2020 8:00 PM, Dan van der Ster wrote:
> >>> Hi Igor,
> >>>
> >>> No BLUESTORE_LEGACY_STATFS warning, and
> >>> bluestore_warn_on_legacy_statfs is the default true on this (and all)
> >>> clusters.
> >>> I'm quite sure we did the statfs conversion during one of the recent
> >>> upgrades (I forget which one exactly).
> >>>
> >>> # ceph tell osd.* config get bluestore_warn_on_legacy_statfs | grep -v true
> >>> #
> >>>
> >>> Is there a command to see the statfs reported by an individual OSD?
> >>> We have a mix of ~year-old and recently recreated OSDs, so I could
> >>> try to see if they differ.
> >>>
> >>> Thanks!
> >>>
> >>> Dan
> >>>
> >>> On Thu, Nov 26, 2020 at 5:50 PM Igor Fedotov <ifedotov@xxxxxxx> wrote:
> >>>> Hi Dan,
> >>>>
> >>>> Don't you have a BLUESTORE_LEGACY_STATFS alert raised (it might be
> >>>> silenced by the bluestore_warn_on_legacy_statfs param) for the older
> >>>> cluster?
> >>>>
> >>>> Thanks,
> >>>>
> >>>> Igor
> >>>>
> >>>> On 11/26/2020 7:29 PM, Dan van der Ster wrote:
> >>>>> Hi,
> >>>>>
> >>>>> Depending on which cluster I look at (all running v14.2.11),
> >>>>> bytes_used variably reports either raw space or stored bytes.
> >>>>>
> >>>>> Here's a 7-year-old cluster:
> >>>>>
> >>>>> # ceph df -f json | jq .pools[0]
> >>>>> {
> >>>>>   "name": "volumes",
> >>>>>   "id": 4,
> >>>>>   "stats": {
> >>>>>     "stored": 1229308190855881,
> >>>>>     "objects": 294401604,
> >>>>>     "kb_used": 1200496280133,
> >>>>>     "bytes_used": 1229308190855881,
> >>>>>     "percent_used": 0.4401889145374298,
> >>>>>     "max_avail": 521125025021952
> >>>>>   }
> >>>>> }
> >>>>>
> >>>>> Note that stored == bytes_used for that pool. (This is a 3x replica
> >>>>> pool.)
> >>>>>
> >>>>> But here's a newer cluster (installed recently with Nautilus):
> >>>>>
> >>>>> # ceph df -f json | jq .pools[0]
> >>>>> {
> >>>>>   "name": "volumes",
> >>>>>   "id": 1,
> >>>>>   "stats": {
> >>>>>     "stored": 680977600893041,
> >>>>>     "objects": 163155803,
> >>>>>     "kb_used": 1995736271829,
> >>>>>     "bytes_used": 2043633942351985,
> >>>>>     "percent_used": 0.23379847407341003,
> >>>>>     "max_avail": 2232457428467712
> >>>>>   }
> >>>>> }
> >>>>>
> >>>>> In the second cluster, bytes_used is 3x stored.
> >>>>>
> >>>>> Does anyone know why these are not reported consistently?
> >>>>> Having noticed this just now, I'll update our monitoring to plot
> >>>>> stored rather than bytes_used from now on.
> >>>>>
> >>>>> Thanks!
> >>>>>
> >>>>> Dan
> >>>>> _______________________________________________
> >>>>> ceph-users mailing list -- ceph-users@xxxxxxx
> >>>>> To unsubscribe send an email to ceph-users-leave@xxxxxxx
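[The discrepancy described in this thread is easy to check for mechanically: on a healthy replicated pool, bytes_used should be roughly the replication factor times stored. A minimal sketch; the helper name is mine, not a Ceph API, and the numbers are the ones from the two `ceph df -f json` outputs quoted above:]

```python
def used_to_stored_ratio(pool_stats):
    """Raw bytes_used divided by logical stored bytes for one pool.

    On a healthy size=3 replicated pool this should be close to 3.0;
    a ratio near 1.0 means bytes_used is just echoing stored.
    """
    return pool_stats["bytes_used"] / pool_stats["stored"]

# The "volumes" pool stats reported by `ceph df -f json` on the two
# clusters quoted in this thread:
old_cluster = {"stored": 1229308190855881, "bytes_used": 1229308190855881}
new_cluster = {"stored": 680977600893041, "bytes_used": 2043633942351985}

print(used_to_stored_ratio(old_cluster))  # 1.0 -- suspicious for a 3x pool
print(used_to_stored_ratio(new_cluster))  # ~3.0 -- the expected value
```

[A check like this in a monitoring pipeline would flag replicated pools whose ratio drifts toward 1.0, which is exactly the symptom reported here.]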