Hi everyone,

All current Nautilus releases have an issue where deploying a single new
(Nautilus) BlueStore OSD on an upgraded cluster (i.e. one that was
originally deployed pre-Nautilus) breaks the pool utilization stats
reported by ``ceph df``.  Until all OSDs have been reprovisioned or
updated (via ``ceph-bluestore-tool repair``), the pool stats will show
values that are lower than the true value.  A fix is in the works but
will not appear until 14.2.3.  Users who have upgraded to Nautilus (or
are considering upgrading) may want to delay provisioning new OSDs until
the fix is available in the next release.

This issue will only affect you if:

- You started with a pre-Nautilus cluster and upgraded.
- You then provisioned one or more new BlueStore OSDs, or ran
  'ceph-bluestore-tool repair' on an upgraded OSD.

The symptom is that the pool stats from 'ceph df' are too small.  For
example, the pre-upgrade stats on our test cluster were

  ...
  POOLS:
      POOL     ID     STORED      OBJECTS     USED        %USED     MAX AVAIL
      data      0     63 TiB       44.59M     63 TiB      30.21        48 TiB
  ...

but when one OSD was updated it changed to

  POOLS:
      POOL     ID     STORED      OBJECTS     USED        %USED     MAX AVAIL
      data      0     558 GiB      43.50M     1.7 TiB      1.22        45 TiB

The root cause is that, starting with Nautilus, BlueStore maintains
per-pool usage stats, but this requires a slight on-disk format change;
upgraded OSDs won't have the new stats until you run a
'ceph-bluestore-tool repair'.  The problem is that the mon starts using
the new stats as soon as *any* OSDs are reporting per-pool stats
(instead of waiting until *all* OSDs are doing so).

To avoid the issue, either

- do not provision new BlueStore OSDs after the upgrade, or
- update all OSDs so they keep the new per-pool stats.  An existing
  BlueStore OSD can be converted with

    systemctl stop ceph-osd@$N
    ceph-bluestore-tool repair --path /var/lib/ceph/osd/ceph-$N
    systemctl start ceph-osd@$N

Note that FileStore does not support the new per-pool stats at all, so
if there are FileStore OSDs in your cluster there is no workaround that
doesn't involve replacing those FileStore OSDs with BlueStore ones.

A fix [1] is working its way through QA and will appear in 14.2.3; it
won't quite make the 14.2.2 release.

sage

[1] https://github.com/ceph/ceph/pull/28978
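
For anyone scripting the conversion, here is a minimal sketch of a
per-host loop around the three commands above.  It is only an
illustration and makes a few assumptions that are not part of the
original instructions: the OSD ids to convert are passed as arguments,
the cluster is otherwise HEALTH_OK, and each repair finishes quickly
enough that the OSD is not marked out while it is down.

  #!/bin/bash
  # Sketch: convert the listed BlueStore OSDs on this host, one at a
  # time, so they start keeping the new per-pool stats.
  # Usage: ./repair-osds.sh 0 1 2 ...
  # Assumes the cluster is otherwise HEALTH_OK and that each repair
  # completes well inside the mon's down/out interval.
  set -e

  for id in "$@"; do
      echo "repairing osd.$id ..."
      systemctl stop ceph-osd@"$id"
      ceph-bluestore-tool repair --path /var/lib/ceph/osd/ceph-"$id"
      systemctl start ceph-osd@"$id"

      # Wait for the cluster to settle before touching the next OSD.
      until ceph health | grep -q HEALTH_OK; do
          sleep 10
      done
  done

Converting one OSD at a time and waiting for HEALTH_OK in between keeps
only a single OSD down at any point, which avoids piling up recovery
work while the conversion is in progress.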