Hi community, about 10 months ago we discovered an issue after removing a cache tier from a cluster in HEALTH_OK state and started an email thread; as a result, a new bug was created on the tracker by Samuel Just.
Since then, I had been waiting for a good moment to upgrade (after the fix was backported to 0.94.7), and yesterday I upgraded my production cluster.
Of the 28 scrub errors, only 5 remain, so I need to fix them with the ceph-objectstore-tool remove-clone-metadata subcommand.
I tried to do it, but without real results... Can you please advise me on what I'm doing wrong?
My workflow was the following:
1. Identify the problem PGs: ceph health detail | grep inco | grep -v HEALTH | cut -d " " -f 2
2. Start a repair on each of them, to collect information about the errors in the logs: ceph pg repair <pg_id>
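The two steps above can be wrapped in a small loop. A minimal sketch; the `ceph health detail` output embedded below is illustrative sample text (on a live cluster you would capture it with `health=$(ceph health detail)` and actually issue the repair):

```shell
# Illustrative 'ceph health detail' output (sample text, not from my cluster)
health='HEALTH_ERR 2 pgs inconsistent; 5 scrub errors
pg 2.c4 is active+clean+inconsistent, acting [56,22,8]
pg 2.e1 is active+clean+inconsistent, acting [12,56,30]'

# Step 1: extract the PG ids (field 2 of the "pg <id> is ..." lines)
pgs=$(printf '%s\n' "$health" | grep inco | grep -v HEALTH | cut -d ' ' -f 2)

# Step 2: repair each one (echoed here; on a live cluster run: ceph pg repair "$pg")
for pg in $pgs; do
    echo "would run: ceph pg repair $pg"
done
```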
After this, for example, I received the following records in the logs:
2016-07-20 00:32:10.650061 osd.56 10.12.2.5:6800/1985741 25 : cluster [INF] 2.c4 repair starts
2016-07-20 00:33:06.405136 osd.56 10.12.2.5:6800/1985741 26 : cluster [ERR] repair 2.c4 2/22ca30c4/rbd_data.e846e25a70bf7.0000000000000307/snapdir expected clone 2/22ca30c4/rbd_data.e846e25a70bf7.0000000000000307/14d
2016-07-20 00:33:06.405323 osd.56 10.12.2.5:6800/1985741 27 : cluster [ERR] repair 2.c4 2/22ca30c4/rbd_data.e846e25a70bf7.0000000000000307/snapdir expected clone 2/22ca30c4/rbd_data.e846e25a70bf7.0000000000000307/138
2016-07-20 00:33:06.405385 osd.56 10.12.2.5:6800/1985741 28 : cluster [INF] repair 2.c4 2/22ca30c4/rbd_data.e846e25a70bf7.0000000000000307/snapdir 1 missing clone(s)
2016-07-20 00:40:42.457657 osd.56 10.12.2.5:6800/1985741 29 : cluster [ERR] 2.c4 repair 2 errors, 0 fixed
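For reference, the object name and the missing clone id can be pulled out of those [ERR] lines mechanically; the field positions below assume exactly this log layout (the "expected clone" spec is the last whitespace-separated field, formatted as hash_prefix/hash/object/cloneid):

```shell
# One of the [ERR] lines from the OSD log above
line='2016-07-20 00:33:06.405136 osd.56 10.12.2.5:6800/1985741 26 : cluster [ERR] repair 2.c4 2/22ca30c4/rbd_data.e846e25a70bf7.0000000000000307/snapdir expected clone 2/22ca30c4/rbd_data.e846e25a70bf7.0000000000000307/14d'

# The last whitespace-separated field is the expected clone spec
clone_spec=$(printf '%s\n' "$line" | awk '{print $NF}')

# Object name and clone id are the 3rd and 4th '/'-separated parts
obj=$(printf '%s\n' "$clone_spec" | cut -d '/' -f 3)
cloneid=$(printf '%s\n' "$clone_spec" | cut -d '/' -f 4)

echo "$obj $cloneid"
# -> rbd_data.e846e25a70bf7.0000000000000307 14d
```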
So I tried to fix it with the following commands:
stop ceph-osd id=56
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-56/ --journal-path /var/lib/ceph/osd/ceph-56/journal rbd_data.e846e25a70bf7.0000000000000307 remove-clone-metadata 138
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-56/ --journal-path /var/lib/ceph/osd/ceph-56/journal rbd_data.e846e25a70bf7.0000000000000307 remove-clone-metadata 14d
start ceph-osd id=56
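One detail I am not sure about (this is an assumption on my side, please correct me if it's wrong): the clone ids in the OSD log look hexadecimal (14d contains a letter), but I passed them to remove-clone-metadata verbatim. If the tool happens to expect a decimal cloneid, the decimal equivalents would be:

```shell
# Assumption: the log prints clone ids in hex; convert them to decimal
for hexid in 138 14d; do
    printf '%s -> %d\n' "$hexid" "0x$hexid"
done
# -> 138 -> 312
# -> 14d -> 333
```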
Strangely, after running these commands I did not receive messages like the following (according to the sources...):
cout << "Removal of clone " << cloneid << " complete" << std::endl;
cout << "Use pg repair after OSD restarted to correct stat information" << std::endl;
I got silence instead - no output after the command, and each command took about 30-35 minutes to execute...
Of course, I started pg repair again after these actions... But the result is the same - the errors still exist...
So possibly I misunderstand the input format for ceph-objectstore-tool...
Please help with this. :)
Thank you in advance!
_______________________________________________ ceph-users mailing list ceph-users@xxxxxxxxxxxxxx http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com