Hello list,
my Ceph cluster was upgraded from Nautilus to Octopus last October, which caused
snaptrims to overload the OSDs, so I had to disable them (neither
bluefs_buffered_io=false nor =true helped).
Now I have copied the data elsewhere, removed all clients, and am trying to fix
the cluster. Scrapping it and starting over is possible, but it would be
wonderful if we could figure out what's wrong with it...
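For reference, "disable them" above amounted to roughly the following (a sketch
from memory, not the exact sequence):

  # stop snap trimming cluster-wide via the OSD flag (undone later with "unset")
  ceph osd set nosnaptrim
  # toggle buffered IO for all OSDs; neither value made a difference here
  ceph config set osd bluefs_buffered_io false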
Cheers,
Eric
Ceph health is OK; there are 18 NVMe 4 TB OSDs on 4 hosts.
Is there something wrong with these key statistics in the monitor databases?
(How the counts were produced is sketched after the list.)
151 auth
2 config
11 health
1315 logm
86 mds_health
1 mds_metadata
691 mdsmap
326 mgr
1 mgr_command_descs
3 mgr_metadata
211 mgrstat
1 mkfs
347 mon_config_key
1 mon_sync
6 monitor
1 monitor_store
32 monmap
18 osd_metadata
1 osd_pg_creating
25198 osd_snap
1389 osdmap
661 paxos
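The counts above come from the mon store, roughly as follows (run against a
stopped monitor; the store path is an assumption, adjust to your mon id):

  ceph-monstore-tool /var/lib/ceph/mon/ceph-$(hostname -s) dump-keys \
    | awk '{ print $1 }' | sort | uniq -c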
Of the 25198 osd_snap keys:
- 436 are "osd_snap / purged_epoch"
- 24762 are "osd_snap / purged_snap"
Of the 24762 purged_snap keys:
- 526 are for the current rbd pool
- 24232 are for a cephfs data pool deleted in November, after the migration
  to octopus
- 1 is for other deleted pools
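The per-pool split was done by grepping the dumped keys, something like the
lines below; this assumes the pool id is embedded in the purged_snap key names
(the pool ids here are placeholders):

  ceph-monstore-tool /var/lib/ceph/mon/ceph-$(hostname -s) dump-keys > /tmp/mon-keys.txt
  grep -c 'purged_snap_2_' /tmp/mon-keys.txt   # e.g. the deleted cephfs data pool
  grep -c 'purged_snap_1_' /tmp/mon-keys.txt   # e.g. the current rbd pool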
Yesterday I deleted an old 2 TB RBD snapshot, which resulted in 6 hours of
snaptrimming, with OSD latencies of 8 s and operations blocked for as long as
115 seconds. CPU usage was 100-150% per snaptrimming OSD process, and disk
usage was 100% on the snaptrimming OSDs.
There are now 2 new osd_snap keys in the monitors, not fewer!!
The recovered space is consistent with the size of the deleted snapshot.
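The latency and blocked-ops numbers above were collected with the usual tools,
roughly:

  ceph health detail                          # slow/blocked ops warnings during the trim
  ceph osd perf                               # per-OSD commit/apply latencies
  ceph pg dump pgs_brief | grep -c snaptrim   # how many PGs are trimming
  iostat -x 5                                 # disk utilisation on the OSD hosts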
Reported usage for the rbd pool is wrong: 2 images totalling 4.8 TB result in
a pool size of 27 TiB (replication 2/1).
I will delete the remaining images and see how much is "left behind"...
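The discrepancy is between these two views (pool name assumed to be "rbd"):

  ceph df detail    # per-pool STORED/USED as the cluster reports it
  rbd du -p rbd     # per-image provisioned vs. used space as rbd reports it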
ceph config dump
WHO     MASK  LEVEL     OPTION                                  VALUE      RO
global        advanced  osd_pool_default_pg_autoscale_mode      warn
mon           advanced  auth_allow_insecure_global_id_reclaim   false
mon           advanced  mon_crush_min_required_version          firefly    *
mon           advanced  osd_heartbeat_grace                     1800
mon.*         advanced  mon_cluster_log_file_level              warn
mgr           advanced  mgr/volumes/log_level
osd           advanced  bluefs_buffered_io                      false
osd           advanced  osd_heartbeat_grace                     1800
osd           advanced  osd_max_trimming_pgs                    2
osd           advanced  osd_pg_max_concurrent_snap_trims        2
osd           advanced  osd_snap_trim_priority                  2
osd           advanced  osd_snap_trim_sleep                     0.000000
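The snap-trim options above can be changed at runtime if throttling is
preferable to disabling; for example (values purely illustrative):

  # add a pause between snap trim operations (currently 0, i.e. no throttle)
  ceph config set osd osd_snap_trim_sleep 3.0
  # check what a running OSD actually uses
  ceph tell osd.0 config get osd_snap_trim_sleep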