> From: Eugen Block [mailto:eblock@xxxxxx]
>
> From what I understand, with a rep size of 2 the cluster can't decide
> which object is intact if one is broken, so the repair fails. If you
> had a size of 3, the cluster would see 2 intact objects and repair the
> broken one (I guess). At least we didn't have these inconsistencies
> since we increased the size to 3.

I understand. Anyway, we have a healthy cluster again :)

After a few ERR entries in the logs...

2017-01-26 06:08:48.147132 osd.3 192.168.12.150:6802/5421 129 : cluster [ERR] 10.55 shard 3: soid 10:aa0c6d9c:::ef4069bf-70fb-4414-a9d9-6bf5b32608fb.34127.33_nalazi%2f201607%2fLab_7bd28004-cc9d-4039-9567-7f5c597f6d88.pdf:head data_digest 0xc44df2ba != known data_digest 0xff59029 from auth shard 4
2017-01-26 06:19:55.708510 osd.3 192.168.12.150:6802/5421 130 : cluster [ERR] 10.55 deep-scrub 0 missing, 1 inconsistent objects
2017-01-26 06:19:55.708514 osd.3 192.168.12.150:6802/5421 131 : cluster [ERR] 10.55 deep-scrub 1 errors
2017-01-26 10:00:48.267405 osd.3 192.168.12.150:6806/18501 2 : cluster [ERR] 10.55 shard 3 missing 10:aa0c6d9c:::ef4069bf-70fb-4414-a9d9-6bf5b32608fb.34127.33_nalazi%2f201607%2fLab_7bd28004-cc9d-4039-9567-7f5c597f6d88.pdf:head
2017-01-26 10:06:56.062854 osd.3 192.168.12.150:6806/18501 3 : cluster [ERR] 10.55 scrub 1 missing, 0 inconsistent objects
2017-01-26 10:06:56.062859 osd.3 192.168.12.150:6806/18501 4 : cluster [ERR] 10.55 scrub 1 errors ( 1 remaining deep scrub error(s) )
2017-01-26 12:54:45.748066 osd.3 192.168.12.150:6806/18501 18 : cluster [ERR] 10.55 shard 3: soid 10:aa0c6d9c:::ef4069bf-70fb-4414-a9d9-6bf5b32608fb.34127.33_nalazi%2f201607%2fLab_7bd28004-cc9d-4039-9567-7f5c597f6d88.pdf:head size 0 != known size 52102, missing attr _, missing attr _user.rgw.acl, missing attr _user.rgw.content_type, missing attr _user.rgw.etag, missing attr _user.rgw.idtag, missing attr _user.rgw.manifest, missing attr _user.rgw.pg_ver, missing attr _user.rgw.source_zone, missing attr _user.rgw.x-amz-acl, missing attr _user.rgw.x-amz-date, missing attr snapset
2017-01-26 13:02:18.014584 osd.3 192.168.12.150:6806/18501 19 : cluster [ERR] 10.55 scrub 0 missing, 1 inconsistent objects
2017-01-26 13:02:18.014607 osd.3 192.168.12.150:6806/18501 20 : cluster [ERR] 10.55 scrub 1 errors ( 1 remaining deep scrub error(s) )
2017-01-26 13:16:56.634322 osd.3 192.168.12.150:6806/18501 22 : cluster [ERR] 10.55 shard 3: soid 10:aa0c6d9c:::ef4069bf-70fb-4414-a9d9-6bf5b32608fb.34127.33_nalazi%2f201607%2fLab_7bd28004-cc9d-4039-9567-7f5c597f6d88.pdf:head data_digest 0xffffffff != known data_digest 0xff59029 from auth shard 4, size 0 != known size 52102, missing attr _, missing attr _user.rgw.acl, missing attr _user.rgw.content_type, missing attr _user.rgw.etag, missing attr _user.rgw.idtag, missing attr _user.rgw.manifest, missing attr _user.rgw.pg_ver, missing attr _user.rgw.source_zone, missing attr _user.rgw.x-amz-acl, missing attr _user.rgw.x-amz-date, missing attr snapset

We got this:

2017-01-26 13:31:04.577603 osd.3 192.168.12.150:6806/18501 23 : cluster [ERR] 10.55 repair 0 missing, 1 inconsistent objects
2017-01-26 13:31:04.596102 osd.3 192.168.12.150:6806/18501 24 : cluster [ERR] 10.55 repair 1 errors, 1 fixed

And...
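(Before the status output, a note for anyone who finds this thread with the same scrub errors: this is a minimal sketch of the usual inconsistent-PG workflow on Jewel, using our PG id 10.55 as the example, and not necessarily the exact method discussed earlier in the thread.)

Show which PGs are flagged inconsistent and why:
# ceph health detail
List the damaged object(s) in the PG and which shard disagrees:
# rados list-inconsistent-obj 10.55 --format=json-pretty
Ask the primary OSD to repair from the authoritative copy:
# ceph pg repair 10.55
Re-run a deep scrub afterwards to confirm the PG is clean:
# ceph pg deep-scrub 10.55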
# ceph -s
    cluster 2bf80721-fceb-4b63-89ee-1a5faa278493
     health HEALTH_OK
     monmap e1: 1 mons at {cephadm01=192.168.12.150:6789/0}
            election epoch 7, quorum 0 cephadm01
     osdmap e580: 9 osds: 9 up, 9 in
            flags sortbitwise
      pgmap v11436879: 664 pgs, 13 pools, 1011 GB data, 13900 kobjects
            2143 GB used, 2354 GB / 4497 GB avail
                 661 active+clean
                   3 active+clean+scrubbing

Your method worked! Thank you for your time and help!

I will see if we can add some more disks so we can set the replica size to 3.
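(In case it helps anyone planning the same change: once the extra disks are in, raising a pool to three replicas is one setting per pool. The pool name below is just a placeholder, and min_size 2 is the usual companion value so I/O keeps flowing while the third copy backfills.)

Set three copies on the pool:
# ceph osd pool set <pool-name> size 3
Allow I/O with two intact copies:
# ceph osd pool set <pool-name> min_size 2
Watch the backfill progress:
# ceph -w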