Hi Paul,
There was a recent post from Sage titled "Pool stats issue with upgrades to
nautilus".
That may be what you are seeing if you have added a new OSD or repaired an existing one...
Thanks,
Igor
On 7/17/2019 6:29 PM, Paul Mezzanini wrote:
Sometime after our upgrade to Nautilus, our disk usage statistics went badly wrong. I can't tell you exactly when they broke, but I know they were still correct for at least a while after the initial upgrade.
The correct numbers should be close to the following (copied and pasted from the autoscale-status report):
    POOL             SIZE
    cephfs_metadata  327.1G
    cold-ec          98.36T
    ceph-bulk-3r     142.6T
    cephfs_data      31890G
    ceph-hot-2r      5276G
    kgcoe-cinder     103.2T
    rbd              3098
Instead, we now show:
    POOL             SIZE
    cephfs_metadata  362.9G  (correct)
    cold-ec          607.2G  (wrong)
    ceph-bulk-3r     5186G   (wrong)
    cephfs_data      1654G   (wrong)
    ceph-hot-2r      5884G   (correct I think)
    kgcoe-cinder     5761G   (wrong)
    rbd              128.0k
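To put a number on how far off each pool is, here is a minimal Python sketch that divides the reported size by the expected size for each pool in the two listings above. It assumes the G/T/k suffixes are 1024-based, which is an assumption on my part; `to_bytes` and `ratios` are hypothetical helper names, not part of any Ceph tooling.

```python
# Sizes copied from the two autoscale-status listings in this message.
# Assumption: suffixes are 1024-based (k, M, G, T, P); no suffix means bytes.
UNITS = {"": 1, "k": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4, "P": 1024**5}

expected = {
    "cephfs_metadata": "327.1G",
    "cold-ec": "98.36T",
    "ceph-bulk-3r": "142.6T",
    "cephfs_data": "31890G",
    "ceph-hot-2r": "5276G",
    "kgcoe-cinder": "103.2T",
    "rbd": "3098",
}

reported = {
    "cephfs_metadata": "362.9G",
    "cold-ec": "607.2G",
    "ceph-bulk-3r": "5186G",
    "cephfs_data": "1654G",
    "ceph-hot-2r": "5884G",
    "kgcoe-cinder": "5761G",
    "rbd": "128.0k",
}


def to_bytes(text):
    """Convert a size string such as '98.36T' or '3098' to a byte count."""
    text = text.strip()
    suffix = text[-1] if text[-1].isalpha() else ""
    number = text[:-1] if suffix else text
    return float(number) * UNITS[suffix]


def ratios(expected, reported):
    """Return reported/expected per pool; values near 1.0 look plausible."""
    return {
        pool: to_bytes(reported[pool]) / to_bytes(expected[pool])
        for pool in expected
    }


if __name__ == "__main__":
    # Print the pools from most under-reported to most over-reported.
    for pool, ratio in sorted(ratios(expected, reported).items(), key=lambda kv: kv[1]):
        print(f"{pool:<16} reported/expected = {ratio:.4f}")
```

Pools whose ratio is far below 1.0 are the ones marked "wrong" above; the two marked "correct" come out slightly above 1.0.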
`ceph fs status` reports similar numbers. cold-ec, ceph-hot-2r, and cephfs_data are all CephFS data pools, and cephfs_metadata is, unsurprisingly, the CephFS metadata pool. The remaining pools are all used for RBD.
Interestingly, the raw storage section of the `ceph df` output looks correct for each drive class, while the pool usage is wrong:
RAW STORAGE:
    CLASS         SIZE        AVAIL       USED        RAW USED     %RAW USED
    hdd           6.3 PiB     5.2 PiB     1.1 PiB     1.1 PiB      17.08
    nvme          175 TiB     161 TiB     14 TiB      14 TiB       7.82
    nvme-meta     14 TiB      11 TiB      2.2 TiB     2.5 TiB      18.45
    TOTAL         6.5 PiB     5.4 PiB     1.1 PiB     1.1 PiB      16.84

POOLS:
    POOL                ID     STORED      OBJECTS     USED        %USED     MAX AVAIL
    kgcoe-cinder        24     1.9 TiB     29.49M      5.6 TiB     0.32      582 TiB
    ceph-bulk-3r        32     1.7 TiB     88.28M      5.1 TiB     0.29      582 TiB
    cephfs_data         35     518 GiB     135.68M     1.6 TiB     0.09      582 TiB
    cephfs_metadata     36     363 GiB     5.63M       363 GiB     3.35      3.4 TiB
    rbd                 37     931 B       5           128 KiB     0         582 TiB
    ceph-hot-2r         50     5.7 TiB     18.63M      5.7 TiB     3.72      74 TiB
    cold-ec             51     417 GiB     105.23M     607 GiB     0.02      2.1 PiB
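One way to look at the POOLS table is the ratio of USED to STORED per pool. The sketch below is only a rough check on the figures transcribed from the table above, not a diagnosis; `used_to_stored` is a hypothetical helper name, and rbd is left out because it is too small to be informative.

```python
GIB = 1024**3
TIB = 1024**4

# (STORED, USED) in bytes, transcribed from the POOLS table in this message.
pools = {
    "kgcoe-cinder": (1.9 * TIB, 5.6 * TIB),
    "ceph-bulk-3r": (1.7 * TIB, 5.1 * TIB),
    "cephfs_data": (518 * GIB, 1.6 * TIB),
    "cephfs_metadata": (363 * GIB, 363 * GIB),
    "ceph-hot-2r": (5.7 * TIB, 5.7 * TIB),
    "cold-ec": (417 * GIB, 607 * GIB),
}


def used_to_stored(pools):
    """Return USED divided by STORED for each pool."""
    return {name: used / stored for name, (stored, used) in pools.items()}


if __name__ == "__main__":
    for name, ratio in used_to_stored(pools).items():
        print(f"{name:<16} USED/STORED = {ratio:.2f}")
```

With these figures, the two pools whose sizes look right (cephfs_metadata and ceph-hot-2r) show USED equal to STORED, while the pools that look wrong show USED at a multiple of STORED. That split may help narrow down which pools are affected.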
Everything is on "ceph version 14.2.1 (d555a9489eb35f84f2e1ef49b77e19da9d113972) nautilus (stable)" and kernel 5.0.21 or 5.0.9. I'm actually doing the patching now to pull the ceph cluster up to 5.0.21, same as the clients. I'm not really sure where to dig into this one. Everything is working fine except disk usage reporting. This also completely blows up the autoscaler.
I feel like the question is obvious but I'll state it anyway. How do I get this issue resolved?
Thanks
-paul
--
Paul Mezzanini
Sr Systems Administrator / Engineer, Research Computing
Information & Technology Services
Finance & Administration
Rochester Institute of Technology
o:(585) 475-3245 | pfmeec@xxxxxxx
CONFIDENTIALITY NOTE: The information transmitted, including attachments, is
intended only for the person(s) or entity to which it is addressed and may
contain confidential and/or privileged material. Any review, retransmission,
dissemination or other use of, or taking of any action in reliance upon this
information by persons or entities other than the intended recipient is
prohibited. If you received this in error, please contact the sender and
destroy any copies of this information.
------------------------
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com