This was a weird one. Eventually 2 of the 3 files were the correct size and 1 remained incorrect. At that point I just followed the normal manual repair process documented at http://ceph.com/geen-categorie/ceph-manually-repair-object/ .
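For reference, the procedure from that link boils down to roughly the following. This is a sketch rather than a verbatim copy; osd.69, pg 1.959 and the object path are only examples pulled from the output quoted below, so substitute whichever replica the scrub actually flags:

    # on the host of the OSD holding the bad copy (assumed here to be osd.69)
    systemctl stop ceph-osd@69
    # flush the FileStore journal so nothing is still pending for that object
    ceph-osd -i 69 --flush-journal
    # move the suspect object file out of the PG directory (don't delete it)
    mv /var/lib/ceph/osd/ceph-69/current/1.959_head/DIR_9/DIR_5/DIR_9/DIR_E/DIR_5/10004313b01.00000004__head_4EC5E959__1 /root/
    systemctl start ceph-osd@69
    # ask Ceph to rebuild the missing copy from the other replicas
    ceph pg repair 1.959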
Is it possible that the journal just hadn't flushed to disk yet? I thought there was a timeout after which the journal would flush even if it were not full.
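If it helps narrow that down, the FileStore sync interval is configurable and can be read off the OSD's admin socket. A minimal check, assuming admin socket access on the OSD host and using osd.69 as an example:

    # upper and lower bounds on how often the journal is synced to the
    # backing filesystem, even when it is not full
    ceph daemon osd.69 config get filestore_max_sync_interval
    ceph daemon osd.69 config get filestore_min_sync_interval

Whether an unflushed journal could still explain a 0-byte file last touched on May 10 would depend on those values.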
On Fri, May 12, 2017 at 11:51 AM, Brady Deetz <bdeetz@xxxxxxxxx> wrote:
I have a cluster with 1 inconsistent pg. I have attempted the following steps with no luck. What should my next move be?

1. executed ceph health detail to determine which pg was inconsistent

[ceph-admin@admin libr-cluster]$ ceph health detail
HEALTH_ERR 1 pgs inconsistent; 1 scrub errors
pg 1.959 is active+clean+inconsistent, acting [69,252,127]
1 scrub errors

2. executed ceph pg repair 1.959

3. after nothing happened for quite a while, I decided to dig into it a bit. What strikes me as most odd is that all of these files seem to be consistent by size, md5, and sha256. It's also a little concerning that they are all 0 in size.

[root@osd5 ceph-admin]# rados list-inconsistent-pg cephfs_data
["1.959"]
[root@osd5 ceph-admin]# rados list-inconsistent-pg cephfs_metadata
[]
[root@osd5 ceph-admin]# rados list-inconsistent-pg rbd
[]
[root@osd5 ceph-admin]# rados list-inconsistent-pg vmware_ecpool
[]
[root@osd5 ceph-admin]# rados list-inconsistent-pg vmware_cache
[]
[root@osd5 ceph-admin]# rados list-inconsistent-obj 1.959 --format=json-pretty
{
    "epoch": 178113,
    "inconsistents": []
}

[root@osd5 ceph-admin]# grep -Hn 'ERR' /var/log/ceph/ceph-osd.69.log
[root@osd5 ceph]# zgrep -Hn 'ERR' ./ceph-osd.69.log-*
./ceph-osd.69.log-20170512.gz:717:2017-05-11 09:23:11.734142 7ff46cbe4700 -1 log_channel(cluster) log [ERR] : scrub 1.959 1:9a97a372:::10004313b01.00000004:head on disk size (0) does not match object info size (1417216) adjusted for ondisk to (1417216)
./ceph-osd.69.log-20170512.gz:785:2017-05-11 09:26:02.877409 7ff46a3df700 -1 log_channel(cluster) log [ERR] : 1.959 scrub 1 errors
[root@osd0 ceph]# grep -Hn 'ERR' ./ceph-osd.127.log
[root@osd0 ceph]# zgrep -Hn 'ERR' ./ceph-osd.127.log-*
[root@osd11 ceph-admin]# grep -Hn 'ERR' /var/log/ceph/ceph-osd.252.log
[root@osd11 ceph-admin]# zgrep -Hn 'ERR' /var/log/ceph/ceph-osd.252.log-*

[root@osd5 ceph]# find /var/lib/ceph/osd/ceph-69/current/1.959_head/ -name '10004313b01.00000004*' -ls
2737776487    0 -rw-r--r--   1 ceph  ceph  0 May 10 07:01 /var/lib/ceph/osd/ceph-69/current/1.959_head/DIR_9/DIR_5/DIR_9/DIR_E/DIR_5/10004313b01.00000004__head_4EC5E959__1
[root@osd5 ceph-admin]# md5sum /var/lib/ceph/osd/ceph-69/current/1.959_head/DIR_9/DIR_5/DIR_9/DIR_E/DIR_5/10004313b01.00000004__head_4EC5E959__1
d41d8cd98f00b204e9800998ecf8427e  /var/lib/ceph/osd/ceph-69/current/1.959_head/DIR_9/DIR_5/DIR_9/DIR_E/DIR_5/10004313b01.00000004__head_4EC5E959__1
[root@osd5 ceph-admin]# sha256sum /var/lib/ceph/osd/ceph-69/current/1.959_head/DIR_9/DIR_5/DIR_9/DIR_E/DIR_5/10004313b01.00000004__head_4EC5E959__1
e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855  /var/lib/ceph/osd/ceph-69/current/1.959_head/DIR_9/DIR_5/DIR_9/DIR_E/DIR_5/10004313b01.00000004__head_4EC5E959__1
[root@osd5 ceph-admin]# ls -l /var/lib/ceph/osd/ceph-69/current/1.959_head/DIR_9/DIR_5/DIR_9/DIR_E/DIR_5/10004313b01.00000004__head_4EC5E959__1
-rw-r--r--. 1 ceph ceph 0 May 10 07:01 /var/lib/ceph/osd/ceph-69/current/1.959_head/DIR_9/DIR_5/DIR_9/DIR_E/DIR_5/10004313b01.00000004__head_4EC5E959__1

[root@osd0 ceph-admin]# find /var/lib/ceph/osd/ceph-127/current/1.959_head/ -name '10004313b01.00000004*' -ls
2684661064    0 -rw-r--r--   1 ceph  ceph  0 May 10 07:01 /var/lib/ceph/osd/ceph-127/current/1.959_head/DIR_9/DIR_5/DIR_9/DIR_E/DIR_5/10004313b01.00000004__head_4EC5E959__1
[root@osd0 ceph-admin]# md5sum /var/lib/ceph/osd/ceph-127/current/1.959_head/DIR_9/DIR_5/DIR_9/DIR_E/DIR_5/10004313b01.00000004__head_4EC5E959__1
d41d8cd98f00b204e9800998ecf8427e  /var/lib/ceph/osd/ceph-127/current/1.959_head/DIR_9/DIR_5/DIR_9/DIR_E/DIR_5/10004313b01.00000004__head_4EC5E959__1
[root@osd0 ceph-admin]# sha256sum /var/lib/ceph/osd/ceph-127/current/1.959_head/DIR_9/DIR_5/DIR_9/DIR_E/DIR_5/10004313b01.00000004__head_4EC5E959__1
e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855  /var/lib/ceph/osd/ceph-127/current/1.959_head/DIR_9/DIR_5/DIR_9/DIR_E/DIR_5/10004313b01.00000004__head_4EC5E959__1
[root@osd0 ceph-admin]# ls -l /var/lib/ceph/osd/ceph-127/current/1.959_head/DIR_9/DIR_5/DIR_9/DIR_E/DIR_5/10004313b01.00000004__head_4EC5E959__1
-rw-r--r--. 1 ceph ceph 0 May 10 07:01 /var/lib/ceph/osd/ceph-127/current/1.959_head/DIR_9/DIR_5/DIR_9/DIR_E/DIR_5/10004313b01.00000004__head_4EC5E959__1

[root@osd11 ceph-admin]# find /var/lib/ceph/osd/ceph-252/current/1.959_head/ -name '10004313b01.00000004*' -ls
1650915243    0 -rw-r--r--   1 ceph  ceph  0 May 10 07:01 /var/lib/ceph/osd/ceph-252/current/1.959_head/DIR_9/DIR_5/DIR_9/DIR_E/DIR_5/10004313b01.00000004__head_4EC5E959__1
[root@osd11 ceph-admin]# md5sum /var/lib/ceph/osd/ceph-252/current/1.959_head/DIR_9/DIR_5/DIR_9/DIR_E/DIR_5/10004313b01.00000004__head_4EC5E959__1
d41d8cd98f00b204e9800998ecf8427e  /var/lib/ceph/osd/ceph-252/current/1.959_head/DIR_9/DIR_5/DIR_9/DIR_E/DIR_5/10004313b01.00000004__head_4EC5E959__1
[root@osd11 ceph-admin]# sha256sum /var/lib/ceph/osd/ceph-252/current/1.959_head/DIR_9/DIR_5/DIR_9/DIR_E/DIR_5/10004313b01.00000004__head_4EC5E959__1
e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855  /var/lib/ceph/osd/ceph-252/current/1.959_head/DIR_9/DIR_5/DIR_9/DIR_E/DIR_5/10004313b01.00000004__head_4EC5E959__1
[root@osd11 ceph-admin]# ls -l /var/lib/ceph/osd/ceph-252/current/1.959_head/DIR_9/DIR_5/DIR_9/DIR_E/DIR_5/10004313b01.00000004__head_4EC5E959__1
-rw-r--r--. 1 ceph ceph 0 May 10 07:01 /var/lib/ceph/osd/ceph-252/current/1.959_head/DIR_9/DIR_5/DIR_9/DIR_E/DIR_5/10004313b01.00000004__head_4EC5E959__1

[ceph-admin@admin libr-cluster]$ ceph status
    cluster 6f91f60c-7bc0-4aaa-a136-4a90851fbe10
     health HEALTH_ERR
            1 pgs inconsistent
            1 scrub errors
     monmap e17: 5 mons at {mon0=10.124.103.60:6789/0,mon1=10.124.103.61:6789/0,mon2=10.124.103.62:6789/0,osd2=10.124.103.72:6789/0,osd3=10.124.103.73:6789/0}
            election epoch 454, quorum 0,1,2,3,4 mon0,mon1,mon2,osd2,osd3
      fsmap e7005: 1/1/1 up {0=mds0=up:active}, 1 up:standby
     osdmap e178115: 235 osds: 235 up, 235 in
            flags sortbitwise,require_jewel_osds
      pgmap v21842842: 5892 pgs, 5 pools, 305 TB data, 119 Mobjects
            917 TB used, 364 TB / 1282 TB avail
                5863 active+clean
                  16 active+clean+scrubbing+deep
                  12 active+clean+scrubbing
                   1 active+clean+inconsistent
  client io 4076 kB/s rd, 633 kB/s wr, 15 op/s rd, 58 op/s wr

Thanks for any advice!

-Brady Deetz
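Not part of the original thread, but a few follow-up checks that might be worth trying in this situation. rados list-inconsistent-obj appears to report only what the most recent scrub recorded, so the empty result (epoch 178113) may simply be stale. The pool and object names below are taken from the scrub error quoted above:

    # compare the size recorded in the object info with the 0 bytes seen on disk
    rados -p cephfs_data stat 10004313b01.00000004
    # confirm which OSDs the object maps to and which one is primary
    ceph osd map cephfs_data 10004313b01.00000004
    # re-run a deep scrub so the inconsistency report is regenerated, then query again
    ceph pg deep-scrub 1.959
    rados list-inconsistent-obj 1.959 --format=json-pretty
    # once a fresh report exists, try the repair again
    ceph pg repair 1.959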
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com