Re: Fwd: “ceph df” pool section metrics is wrong during backfill

Hmm, that seems right. I ran the repair on one of the OSDs, which holds 2.7 TB
of data, and the "ceph df" per-pool stat increased from 9.2 TB to 10 TB, which
roughly matches 2.7 TB / 3 replicas = 0.9 TB.
The repair is super slow, though...


ceph-bluestore-tool repair --path /var/lib/ceph/osd/lvs_ceph_cal-395
--log-level 30

2019-04-10 09:02:39.752 7fcb942ed0c0 -1
bluestore(/var/lib/ceph/osd/lvs_ceph_cal-395) fsck error: legacy
statfs record found, removing
2019-04-10 09:02:39.752 7fcb942ed0c0 -1
bluestore(/var/lib/ceph/osd/lvs_ceph_cal-395) fsck error: missing Pool
StatFS record for pool 4
2019-04-10 09:02:39.752 7fcb942ed0c0 -1
bluestore(/var/lib/ceph/osd/lvs_ceph_cal-395) fsck error: missing Pool
StatFS record for pool ffffffffffffffff
repair success

 ID CLASS WEIGHT  REWEIGHT SIZE    RAW USE DATA    OMAP   META    AVAIL   %USE  VAR  PGS STATUS
395 hdd   5.45741  1.00000 5.6 TiB 2.8 TiB 2.7 TiB 69 MiB 6.1 GiB 2.8 TiB 49.81 1.16   7     up
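
For reference, a minimal sketch of how the same repair could be repeated across
the remaining pre-Nautilus OSDs. The OSD ID list and the systemd unit / data
path naming below are assumptions based on the path used above (this cluster
uses the custom name lvs_ceph_cal), so adjust them for your deployment; the OSD
has to be stopped while ceph-bluestore-tool runs against its store.

#!/bin/bash
# Sketch only, not a tested script: repair each remaining pre-Nautilus OSD
# so it starts reporting the new per-pool statfs records.
OSD_IDS="396 397 398"                          # hypothetical list of old OSDs on this host
for id in $OSD_IDS; do
    systemctl stop "ceph-osd@${id}"            # OSD must be offline during fsck/repair
    # optional read-only consistency check first (fsck does not modify the store)
    ceph-bluestore-tool fsck --path "/var/lib/ceph/osd/lvs_ceph_cal-${id}"
    ceph-bluestore-tool repair --path "/var/lib/ceph/osd/lvs_ceph_cal-${id}"
    systemctl start "ceph-osd@${id}"           # let it rejoin before moving to the next one
done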



Xiaoxi Chen <superdebuger@xxxxxxxxx> wrote on Wed, Apr 10, 2019, at 11:53 PM:
>
> Hi Igor,
>     Thanks for troubleshooting with us. It is acceptable that
> 'ceph-bluestore-tool repair' is non-reversible, since once "ceph osd
> require-osd-release nautilus" has been set after the upgrade, downgrading
> to a pre-Nautilus release is impossible anyway.
>     But it seems we could either make "ceph-bluestore-tool repair" run
> automatically once the require-osd-release nautilus flag is set, and/or
> clearly state in the release notes that "ceph-bluestore-tool repair" is a
> step of the upgrade, just like the ceph-volume import step. Everyone who
> upgrades to Nautilus will eventually hit this issue, either when adding
> capacity or doing break-fix on an OSD.
>
>
> Xiaoxi
>
> Igor Fedotov <ifedotov@xxxxxxx> wrote on Wed, Apr 10, 2019, at 11:06 PM:
> >
> > Hi Xiaoxi,
> >
> > as we learned offline, you currently have a mixture of new OSDs created
> > by Nautilus and old ones created by earlier releases.
> >
> > New OSDs provide per-pool statistics in a different manner than old
> > ones. Merging the two is hardly doable, so once your cluster contains
> > any OSD with the new format, the 'df' report starts to show pool
> > statistics using the new OSDs only.
> >
> > To fix the issue, one has to run the 'ceph-bluestore-tool repair'
> > command on every old OSD.
> >
> > Please note that the repair is a non-reversible OSD upgrade; one won't
> > be able to downgrade to a pre-Nautilus release after that.
> >
> >
> > Thanks,
> >
> > Igor
> >
> > On 4/4/2019 11:48 AM, Xiaoxi Chen wrote:
> > > Hi list,
> > >
> > >
> > >      The fs_data pool was backfilling: 1 out of 16 hosts was rebuilt,
> > > reusing the same OSD IDs. After doing that, the fs_data stored size is
> > > not correct, though it is increasing.
> > >
> > >        RAW STORAGE:
> > >      CLASS     SIZE        AVAIL       USED        RAW USED     %RAW USED
> > >      hdd       1.2 PiB     653 TiB     547 TiB      547 TiB         45.60
> > >      meta       25 TiB      25 TiB      40 GiB      107 GiB          0.42
> > >      ssd       219 TiB     147 TiB      72 TiB       73 TiB         33.11
> > >      TOTAL     1.4 PiB     824 TiB     619 TiB      620 TiB         42.92
> > >
> > > POOLS:
> > >      POOL           ID     STORED      OBJECTS     USED        %USED     MAX AVAIL
> > >      cache_tier      3     8.0 TiB       9.94M      24 TiB     16.80        40 TiB
> > >      fs_data         4     1.6 TiB      63.63M     4.8 TiB      1.12       143 TiB
> > >      fs_meta         5      35 GiB     343.66k      40 GiB      0.18       7.0 TiB
> > >
> > >
> > >      The RAW STORAGE by class is correct.
> > >      Any insight?
> > >
> > > Xiaoxi



