Dear List,

since my update yesterday from 14.2.18 to 14.2.20 I have an unhealthy cluster. As far as I remember, it appeared after rebooting the second server. There are 7 missing (unfound) objects in PGs of a cache pool (pool 3). This pool's cache mode has been changed from writeback to proxy, and I am not able to flush all objects.
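For reference, the cache-mode change and the flush attempt were done with commands along these lines (the pool name "cachepool" below is only a placeholder for the actual cache pool, pool 3):

    # switch the cache tier from writeback to proxy (no new objects get cached)
    ceph osd tier cache-mode cachepool proxy
    # try to flush and evict all objects from the cache tier
    rados -p cachepool cache-flush-evict-all

The flush never gets through all objects.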
root@scvirt06:/home/urzadmin/ceph_issue# ceph -s
  cluster:
    id:     5349724e-fa96-4fd6-8e44-8da2a39253f7
    health: HEALTH_ERR
            7/15893342 objects unfound (0.000%)
            Possible data damage: 7 pgs recovery_unfound
            Degraded data redundancy: 21/47680026 objects degraded (0.000%), 7 pgs degraded, 7 pgs undersized
            client is using insecure global_id reclaim
            mons are allowing insecure global_id reclaim

  services:
    mon: 3 daemons, quorum scvirt03,scvirt06,scvirt01 (age 19h)
    mgr: scvirt04(active, since 21m), standbys: scvirt03, scvirt02
    mds: scfs:1 {0=scvirt04=up:active} 1 up:standby-replay 1 up:standby
    osd: 54 osds: 54 up (since 17m), 54 in (since 10w); 7 remapped pgs

  task status:
    scrub status:
        mds.scvirt03: idle

  data:
    pools:   5 pools, 704 pgs
    objects: 15.89M objects, 49 TiB
    usage:   139 TiB used, 145 TiB / 285 TiB avail
    pgs:     21/47680026 objects degraded (0.000%)
             7/15893342 objects unfound (0.000%)
             694 active+clean
             7   active+recovery_unfound+undersized+degraded+remapped
             3   active+clean+scrubbing+deep

  io:
    client: 3.7 MiB/s rd, 6.6 MiB/s wr, 40 op/s rd, 31 op/s wr

My cluster:

  scvirt01 - mon, osds
  scvirt02 - mgr, osds
  scvirt03 - mon, mgr, mds, osds
  scvirt04 - mgr, mds, osds
  scvirt05 - osds
  scvirt06 - mon, mds, osds

Log of osd.49:

root@scvirt03:/home/urzadmin# tail -f /var/log/ceph/ceph-osd.49.log
AddFile(GB): cumulative 0.000, interval 0.000
AddFile(Total Files): cumulative 0, interval 0
AddFile(L0 Files): cumulative 0, interval 0
AddFile(Keys): cumulative 0, interval 0
Cumulative compaction: 0.64 GB write, 0.01 MB/s write, 0.54 GB read, 0.01 MB/s read, 6.5 seconds
Interval compaction: 0.00 GB write, 0.00 MB/s write, 0.00 GB read, 0.00 MB/s read, 0.0 seconds
Stalls(count): 0 level0_slowdown, 0 level0_slowdown_with_compaction, 0 level0_numfiles, 0 level0_numfiles_with_compaction, 0 stop for pending_compaction_bytes, 0 slowdown for pending_compaction_bytes, 0 memtable_compaction, 0 memtable_slowdown, interval 0 total count
** File Read Latency Histogram By Level [default] **
2021-06-24 08:53:08.865 7f88ab86c700 -1 log_channel(cluster) log [ERR] : 3.9 has 1 objects unfound and apparently lost
2021-06-24 08:53:08.865 7f88a505f700 -1 log_channel(cluster) log [ERR] : 3.1e has 1 objects unfound and apparently lost
2021-06-24 08:53:40.570 7f88ab86c700 -1 log_channel(cluster) log [ERR] : 3.9 has 1 objects unfound and apparently lost
2021-06-24 08:53:40.570 7f88a9067700 -1 log_channel(cluster) log [ERR] : 3.1e has 1 objects unfound and apparently lost
2021-06-24 08:54:45.042 7f88b487e700  4 rocksdb: [db/db_impl.cc:777] ------- DUMPING STATS -------
2021-06-24 08:54:45.042 7f88b487e700  4 rocksdb: [db/db_impl.cc:778] ** DB Stats **
Uptime(secs): 85202.3 total, 600.0 interval
Cumulative writes: 1148K writes, 8640K keys, 1148K commit groups, 1.0 writes per commit group, ingest: 1.24 GB, 0.01 MB/s
Cumulative WAL: 1148K writes, 546K syncs, 2.10 writes per sync, written: 1.24 GB, 0.01 MB/s
Cumulative stall: 00:00:0.000 H:M:S, 0.0 percent
Interval writes: 369 writes, 1758 keys, 369 commit groups, 1.0 writes per commit group, ingest: 0.41 MB, 0.00 MB/s
Interval WAL: 369 writes, 155 syncs, 2.37 writes per sync, written: 0.00 MB, 0.00 MB/s
Interval stall: 00:00:0.000 H:M:S, 0.0 percent

** Compaction Stats [default] **
Level  Files    Size     Score Read(GB) Rn(GB) Rnp1(GB) Write(GB) Wnew(GB) Moved(GB) W-Amp Rd(MB/s) Wr(MB/s) Comp(sec) CompMergeCPU(sec) Comp(cnt) Avg(sec) KeyIn  KeyDrop
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  L0     3/0   104.40 MB   0.8      0.0    0.0      0.0       0.2      0.2       0.0   1.0      0.0     67.8      2.89              2.70         6    0.482      0       0
  L1     2/0   131.98 MB   0.5      0.2    0.1      0.1       0.2      0.1       0.0   1.8    149.9    120.9      1.53              1.41         1    1.527  2293K    140K
  L2    16/0   871.57 MB   0.3      0.3    0.1      0.3       0.3     -0.0       0.0   5.2    158.1    132.3      2.05              1.93         1    2.052  3997K   1089K
 Sum    21/0     1.08 GB   0.0      0.5    0.2      0.4       0.6      0.2       0.0   3.3     85.5    100.8      6.47              6.03         8    0.809  6290K   1229K
 Int     0/0     0.00 KB   0.0      0.0    0.0      0.0       0.0      0.0       0.0   0.0      0.0      0.0      0.00              0.00         0    0.000      0       0
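Since the log reports the objects as "unfound and apparently lost", the commands I know of for inspecting the affected PGs are roughly the following, using pg 3.1e as an example (I have not run the destructive last step):

    # list the unfound objects and which OSDs have been probed for them
    ceph pg 3.1e list_unfound
    # full PG state, including the up/acting sets and might_have_unfound OSDs
    ceph pg 3.1e query
    # last resort per the docs: give up on the unfound objects (revert or delete)
    # ceph pg 3.1e mark_unfound_lost revert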
If I run "ceph pg repair 3.1e", nothing changes, and I do not understand why these PGs are undersized. All OSDs are up.

ceph.conf:

[global]
     auth_client_required = cephx
     auth_cluster_required = cephx
     auth_service_required = cephx
     cluster_network = 10.10.144.0/24
     filestore_xattr_use_omap = true
     fsid = 5349724e-fa96-4fd6-8e44-8da2a39253f7
     mon_allow_pool_delete = true
     mon_cluster_log_file_level = info
     mon_host = 172.26.8.151,172.26.8.153,172.26.8.156
     osd_journal_size = 5120
     osd_pool_default_min_size = 1
     public_network = 172.26.8.128/26

[client]
     keyring = /etc/pve/priv/$cluster.$name.keyring

[mds]
     keyring = /var/lib/ceph/mds/ceph-$id/keyring

[mds.scvirt03]
     host = scvirt03
     mds_standby_for_rank = 0
     mds_standby_replay = true

[mds.scvirt04]
     host = scvirt04
     mds standby for name = pve

[mds.scvirt06]
     host = scvirt06
     mds_standby_for_rank = 0
     mds_standby_replay = true

[mon.scvirt01]
     public_addr = 172.26.8.151

[mon.scvirt03]
     public_addr = 172.26.8.153

[mon.scvirt06]
     public_addr = 172.26.8.156

ceph health detail:

HEALTH_ERR 7/15893333 objects unfound (0.000%); Possible data damage: 7 pgs recovery_unfound; Degraded data redundancy: 21/47679999 objects degraded (0.000%), 7 pgs degraded, 7 pgs undersized; client is using insecure global_id reclaim; mons are allowing insecure global_id reclaim
OBJECT_UNFOUND 7/15893333 objects unfound (0.000%)
    pg 3.1e has 1 unfound objects
    pg 3.1f has 1 unfound objects
    pg 3.1b has 1 unfound objects
    pg 3.15 has 1 unfound objects
    pg 3.16 has 1 unfound objects
    pg 3.b has 1 unfound objects
    pg 3.9 has 1 unfound objects
PG_DAMAGED Possible data damage: 7 pgs recovery_unfound
    pg 3.9 is active+recovery_unfound+undersized+degraded+remapped, acting [49,52], 1 unfound
    pg 3.b is active+recovery_unfound+undersized+degraded+remapped, acting [43,52], 1 unfound
    pg 3.15 is active+recovery_unfound+undersized+degraded+remapped, acting [44,52], 1 unfound
    pg 3.16 is active+recovery_unfound+undersized+degraded+remapped, acting [43,51], 1 unfound
    pg 3.1b is active+recovery_unfound+undersized+degraded+remapped, acting [43,52], 1 unfound
    pg 3.1e is active+recovery_unfound+undersized+degraded+remapped, acting [49,51], 1 unfound
    pg 3.1f is active+recovery_unfound+undersized+degraded+remapped, acting [48,51], 1 unfound
PG_DEGRADED Degraded data redundancy: 21/47679999 objects degraded (0.000%), 7 pgs degraded, 7 pgs undersized
    pg 3.9 is stuck undersized for 64516.343966, current state active+recovery_unfound+undersized+degraded+remapped, last acting [49,52]
    pg 3.b is stuck undersized for 64516.351507, current state active+recovery_unfound+undersized+degraded+remapped, last acting [43,52]
    pg 3.15 is stuck undersized for 64521.368841, current state active+recovery_unfound+undersized+degraded+remapped, last acting [44,52]
    pg 3.16 is stuck undersized for 64516.351599, current state active+recovery_unfound+undersized+degraded+remapped, last acting [43,51]
    pg 3.1b is stuck undersized for 64517.427120, current state active+recovery_unfound+undersized+degraded+remapped, last acting [43,52]
    pg 3.1e is stuck undersized for 64521.369635, current state active+recovery_unfound+undersized+degraded+remapped, last acting [49,51]
    pg 3.1f is stuck undersized for 64517.426392, current state active+recovery_unfound+undersized+degraded+remapped, last acting [48,51]
AUTH_INSECURE_GLOBAL_ID_RECLAIM client is using insecure global_id reclaim
    client.admin at 172.26.8.154:0/3925203408 is using insecure global_id reclaim
    mds.scvirt04 at [v2:172.26.8.154:6836/3778505565,v1:172.26.8.154:6837/3778505565] is using insecure global_id reclaim
AUTH_INSECURE_GLOBAL_ID_RECLAIM_ALLOWED mons are allowing insecure global_id reclaim
    mon.scvirt03 has auth_allow_insecure_global_id_reclaim set to true
    mon.scvirt06 has auth_allow_insecure_global_id_reclaim set to true
    mon.scvirt01 has auth_allow_insecure_global_id_reclaim set to true

ceph osd tree:

ID  CLASS  WEIGHT    TYPE NAME          STATUS REWEIGHT PRI-AFF
 -1        284.51312 root default
 -2         48.75215     host scvirt01
  0   hdd    9.09560         osd.0          up  1.00000 1.00000
  3   hdd    9.09560         osd.3          up  1.00000 1.00000
  6   hdd    9.09560         osd.6          up  1.00000 1.00000
  9   hdd    9.09560         osd.9          up  1.00000 1.00000
 12   hdd    9.09560         osd.12         up  1.00000 1.00000
 42  nvme    0.97029         osd.42         up  1.00000 1.00000
 43  nvme    0.97029         osd.43         up  1.00000 1.00000
 44  nvme    0.97029         osd.44         up  1.00000 1.00000
 37   ssd    0.36330         osd.37         up  1.00000 1.00000
 -3         48.75215     host scvirt02
  1   hdd    9.09560         osd.1          up  1.00000 1.00000
  4   hdd    9.09560         osd.4          up  1.00000 1.00000
  7   hdd    9.09560         osd.7          up  1.00000 1.00000
 10   hdd    9.09560         osd.10         up  1.00000 1.00000
 13   hdd    9.09560         osd.13         up  1.00000 1.00000
 45  nvme    0.97029         osd.45         up  1.00000 1.00000
 46  nvme    0.97029         osd.46         up  1.00000 1.00000
 47  nvme    0.97029         osd.47         up  1.00000 1.00000
 38   ssd    0.36330         osd.38         up  1.00000 1.00000
 -4         48.75224     host scvirt03
  2   hdd    9.09569         osd.2          up  1.00000 1.00000
  5   hdd    9.09560         osd.5          up  1.00000 1.00000
  8   hdd    9.09560         osd.8          up  1.00000 1.00000
 11   hdd    9.09560         osd.11         up  1.00000 1.00000
 14   hdd    9.09560         osd.14         up  1.00000 1.00000
 48  nvme    0.97029         osd.48         up  1.00000 1.00000
 49  nvme    0.97029         osd.49         up  1.00000 1.00000
 50  nvme    0.97029         osd.50         up  1.00000 1.00000
 39   ssd    0.36330         osd.39         up  1.00000 1.00000
 -9         56.75706     host scvirt04
 15   hdd    9.09560         osd.15         up  1.00000 1.00000
 17   hdd    9.09560         osd.17         up  1.00000 1.00000
 20   hdd    9.09560         osd.20         up  1.00000 1.00000
 22   hdd    9.09560         osd.22         up  1.00000 1.00000
 23   hdd    9.09560         osd.23         up  1.00000 1.00000
 25   hdd    3.63860         osd.25         up  1.00000 1.00000
 26   hdd    3.63860         osd.26         up  1.00000 1.00000
 27   hdd    3.63860         osd.27         up  1.00000 1.00000
 40   ssd    0.36330         osd.40         up  1.00000 1.00000
-11         56.75706     host scvirt05
 16   hdd    9.09560         osd.16         up  1.00000 1.00000
 18   hdd    9.09560         osd.18         up  1.00000 1.00000
 19   hdd    9.09560         osd.19         up  1.00000 1.00000
 21   hdd    9.09560         osd.21         up  1.00000 1.00000
 24   hdd    9.09560         osd.24         up  1.00000 1.00000
 28   hdd    3.63860         osd.28         up  1.00000 1.00000
 29   hdd    3.63860         osd.29         up  1.00000 1.00000
 30   hdd    3.63860         osd.30         up  1.00000 1.00000
 41   ssd    0.36330         osd.41         up  1.00000 1.00000
-13         24.74245     host scvirt06
 31   hdd    3.63860         osd.31         up  1.00000 1.00000
 32   hdd    3.63860         osd.32         up  1.00000 1.00000
 33   hdd    3.63860         osd.33         up  1.00000 1.00000
 34   hdd    3.63860         osd.34         up  1.00000 1.00000
 35   hdd    3.63860         osd.35         up  1.00000 1.00000
 36   hdd    3.63860         osd.36         up  1.00000 1.00000
 51  nvme    0.97029         osd.51         up  1.00000 1.00000
 52  nvme    0.97029         osd.52         up  1.00000 1.00000
 53  nvme    0.97029         osd.53         up  1.00000 1.00000

Regards,

Vadim

--
Vadim Bulst
Universität Leipzig / URZ
04109 Leipzig, Augustusplatz 10

phone: +49-341-97-33380
mail:  vadim.bulst@xxxxxxxxxxxxxx