Hi,

tl;dr: something deleted the objects from the .rgw.gc pool, and then the pgs went inconsistent. Is this normal?

Just now we had scrub errors and resulting inconsistencies on many of the pgs belonging to our .rgw.gc pool:

    HEALTH_ERR 119 pgs inconsistent; 119 scrub errors
    pg 11.1f0 is active+clean+inconsistent, acting [35,28,4]
    pg 11.1f8 is active+clean+inconsistent, acting [35,28,4]
    pg 11.1fb is active+clean+inconsistent, acting [11,34,38]
    pg 11.1e0 is active+clean+inconsistent, acting [35,28,4]
    pg 11.1e3 is active+clean+inconsistent, acting [11,34,38]
    …

    [root@ceph-mon1 ~]# ceph osd lspools
    0 data,1 metadata,2 rbd,6 volumes,7 images,9 afs,10 .rgw,11 .rgw.gc,12 .rgw.control,13 .users.uid,14 .users.email,15 .users,16 .rgw.buckets,17 .usage,

On the relevant hosts, I checked what was in those pg directories:

    [root@lxfsrc4906 ~]# ls -l //var/lib/ceph/osd/ceph-35/current/11.1f0_head/ -a
    total 20
    drwxr-xr-x.   2 root root     6 Apr 16 10:48 .
    drwxr-xr-x. 419 root root 12288 Apr 16 11:15 ..

They were all empty like that. I then checked the log files:

    2013-04-18 14:53:56.532054 7fe5457fb700 0 log [ERR] : 11.0 deep-scrub stat mismatch, got 0/3 objects, 0/0 clones, 0/0 bytes.
    2013-04-18 14:53:56.532065 7fe5457fb700 0 log [ERR] : 11.0 deep-scrub 1 errors
    2013-04-18 14:53:59.532401 7fe5457fb700 0 log [ERR] : 11.8 deep-scrub stat mismatch, got 0/3 objects, 0/0 clones, 0/0 bytes.
    2013-04-18 14:53:59.532411 7fe5457fb700 0 log [ERR] : 11.8 deep-scrub 1 errors
    2013-04-18 14:54:01.532602 7fe5457fb700 0 log [ERR] : 11.10 deep-scrub stat mismatch, got 0/3 objects, 0/0 clones, 0/0 bytes.
    2013-04-18 14:54:01.532614 7fe5457fb700 0 log [ERR] : 11.10 deep-scrub 1 errors
    2013-04-18 14:54:02.532839 7fe5457fb700 0 log [ERR] : 11.18 deep-scrub stat mismatch, got 0/3 objects, 0/0 clones, 0/0 bytes.
    2013-04-18 14:54:02.532848 7fe5457fb700 0 log [ERR] : 11.18 deep-scrub 1 errors
    …
    2013-04-18 14:57:14.554431 7fe5457fb700 0 log [ERR] : 11.1f0 deep-scrub stat mismatch, got 0/3 objects, 0/0 clones, 0/0 bytes.
    2013-04-18 14:57:14.554438 7fe5457fb700 0 log [ERR] : 11.1f0 deep-scrub 1 errors

So it looks like something deleted all the objects from those pg directories. Next I tried a repair:

    [root@ceph-mon1 ~]# ceph pg repair 11.1f0
    instructing pg 11.1f0 on osd.35 to repair

    [root@ceph-mon1 ~]# ceph -w
    …
    2013-04-18 15:19:23.676728 osd.35 [ERR] 11.1f0 repair stat mismatch, got 0/3 objects, 0/0 clones, 0/0 bytes.
    2013-04-18 15:19:23.676783 osd.35 [ERR] 11.1f0 repair 1 errors, 1 fixed

    [root@ceph-mon1 ~]# ceph pg deep-scrub 11.1f0
    instructing pg 11.1f0 on osd.35 to deep-scrub

    [root@ceph-mon1 ~]# ceph -w
    …
    2013-04-18 15:20:21.769446 mon.0 [INF] pgmap v31714: 3808 pgs: 3690 active+clean, 118 active+clean+inconsistent; 73284 MB data, 276 GB used, 44389 GB / 44665 GB avail
    2013-04-18 15:20:17.677058 osd.35 [INF] 11.1f0 deep-scrub ok

So the repair did indeed "fix" the problem: there are now only 118 inconsistent pgs, down from 119. Note that there is still nothing in the directory for that pg, as expected:

    [root@lxfsrc4906 ~]# ls -l //var/lib/ceph/osd/ceph-35/current/11.1f0_head/ -a
    total 20
    drwxr-xr-x.   2 root root     6 Apr 16 10:48 .
    drwxr-xr-x. 419 root root 12288 Apr 16 11:15 ..

So my question is: can anyone explain what happened here? It seems that something deleted the objects from the .rgw.gc pool (as one would expect), but the pgs were left inconsistent afterwards.

Best Regards,
Dan van der Ster
CERN IT
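
P.S. The repair above was issued for a single pg by hand. For the remaining inconsistent pgs, the same step could be scripted roughly as follows. This is an untested sketch, not something that was run on this cluster: the function names list_inconsistent_pgs and repair_pgs are made up for illustration, and it assumes `ceph health detail` prints one "pg <id> is active+clean+inconsistent, acting [...]" line per affected pg, as in the output quoted above.

```shell
#!/bin/sh
# Hypothetical helper: read `ceph health detail`-style text on stdin and
# print the ids of inconsistent pgs that belong to pool number $1.
list_inconsistent_pgs() {
    awk -v pool="$1" '
        # Detail lines look like: "pg 11.1f0 is active+clean+inconsistent, acting [35,28,4]"
        $1 == "pg" && $0 ~ /inconsistent/ && index($2, pool ".") == 1 { print $2 }
    '
}

# Hypothetical helper: issue `ceph pg repair` for each pg id read from stdin.
repair_pgs() {
    while read -r pgid; do
        ceph pg repair "$pgid"
    done
}

# Intended use (left commented out so that sourcing this file changes nothing):
#   ceph health detail | list_inconsistent_pgs 11 | repair_pgs
```

Running only the first half of the pipeline (without repair_pgs) prints the pg ids that would be repaired, which is worth checking before letting it loose on 118 pgs.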