Complementing the information: I'm using Mimic (13.2) on the cluster. I noticed that during the PG repair the entire cluster was extremely slow, yet there was no visible overhead on the OSD nodes: their load average, normally between 10.00 and 20.00 in production, stayed below 5. When the repair finished (after 4 hours), the cluster went back to normal. Is this result expected?

On Tue, Apr 27, 2021 at 14:16, Gesiel Galvão Bernardes <gesiel.bernardes@xxxxxxxxx> wrote:
> Hi,
>
> I have 3 pools, which I use exclusively for RBD images: two are mirrored
> and one is erasure-coded. Today I received a warning that a PG was
> inconsistent in the erasure-coded pool, so I ran "ceph pg repair <pg>".
> After that the entire cluster became extremely slow, to the point that no
> VM works.
>
> This is the output of "ceph -s":
> # ceph -s
>   cluster:
>     id:     4ea72929-6f9e-453a-8cd5-bb0712f6b874
>     health: HEALTH_ERR
>             1 scrub errors
>             Possible data damage: 1 pg inconsistent, 1 pg repair
>
>   services:
>     mon:         2 daemons, quorum cmonitor,cmonitor2
>     mgr:         cmonitor (active), standbys: cmonitor2
>     osd:         87 osds: 87 up, 87 in
>     tcmu-runner: 10 active daemons
>
>   data:
>     pools:   7 pools, 3072 pgs
>     objects: 30.00M objects, 113 TiB
>     usage:   304 TiB used, 218 TiB / 523 TiB avail
>     pgs:     3063 active+clean
>              8    active+clean+scrubbing+deep
>              1    active+clean+scrubbing+deep+inconsistent+repair
>
>   io:
>     client: 24 MiB/s rd, 23 MiB/s wr, 629 op/s rd, 519 op/s wr
>     cache:  5.9 MiB/s flush, 35 MiB/s evict, 9 op/s promote
>
> Does anyone have any idea how to make it available again?
>
> Regards,
> Gesiel
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx
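For anyone hitting the same symptom: repair is implemented as a (deep-)scrub pass, so its reads can compete with client I/O even when node load looks low. A common approach, sketched below under the assumption of Mimic-era option names (the PG id 2.5a is a placeholder; verify option names against your own cluster with `ceph daemon osd.0 config show | grep scrub`), is to inspect the inconsistency first and throttle scrub I/O before repairing:

```shell
# Show which PG is inconsistent and on which OSDs
ceph health detail

# List the inconsistent objects in the affected PG before repairing
# (2.5a is a placeholder PG id -- substitute yours)
rados list-inconsistent-obj 2.5a --format=json-pretty

# Throttle scrub/repair so it yields more to client traffic:
# sleep between scrub chunks, and skip scrubbing when load is high.
ceph tell 'osd.*' injectargs '--osd_scrub_sleep 0.5'
ceph tell 'osd.*' injectargs '--osd_scrub_load_threshold 0.3'

# Then trigger the repair on the affected PG
ceph pg repair 2.5a
```

Note that `injectargs` changes are runtime-only and revert on OSD restart; the values above are illustrative, not recommendations.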