Hi Christian,

On Fri, 16 Mar 2012, Christian Brunner wrote:
> This is probably going in the same direction as the report by Oliver
> Francke.
>
> Ceph is reporting an inconsistent PG. Running a scrub on the PG gave
> me the following messages:
>
> 2012-03-16 12:55:17.287415 log 2012-03-16 12:55:12.179529 osd.14
> 10.255.0.63:6818/2014 34280 : [ERR] 2.117 osd.0: soid
> 818bc117/vol-FTU86AEJ.rbd/head size 0 != known size 112
> 2012-03-16 12:55:17.287415 log 2012-03-16 12:55:12.179555 osd.14
> 10.255.0.63:6818/2014 34281 : [ERR] 2.117 scrub 0 missing, 1
> inconsistent objects
> 2012-03-16 12:55:17.287415 log 2012-03-16 12:55:12.181397 osd.14
> 10.255.0.63:6818/2014 34282 : [ERR] scrub 2.117
> 818bc117/vol-FTU86AEJ.rbd/head on disk size (112) does not match
> object info size (0)
> 2012-03-16 12:55:17.287415 log 2012-03-16 12:55:12.181925 osd.14
> 10.255.0.63:6818/2014 34283 : [ERR] 2.117 scrub stat mismatch, got
> 956/955 objects, 0/0 clones, 3951263856/3951263744 bytes.
> 2012-03-16 12:55:17.287415 log 2012-03-16 12:55:12.181947 osd.14
> 10.255.0.63:6818/2014 34284 : [ERR] 2.117 scrub 2 errors
>
> A "PG repair" fixed one error:
>
> 2012-03-16 12:56:27.288690 log 2012-03-16 12:56:21.598432 osd.14
> 10.255.0.63:6818/2014 34285 : [ERR] 2.117 osd.0: soid
> 818bc117/vol-FTU86AEJ.rbd/head size 0 != known size 112
> 2012-03-16 12:56:27.288690 log 2012-03-16 12:56:21.598457 osd.14
> 10.255.0.63:6818/2014 34286 : [ERR] 2.117 repair 0 missing, 1
> inconsistent objects
> 2012-03-16 12:56:27.288690 log 2012-03-16 12:56:21.600277 osd.14
> 10.255.0.63:6818/2014 34287 : [ERR] repair 2.117
> 818bc117/vol-FTU86AEJ.rbd/head on disk size (112) does not match
> object info size (0)
> 2012-03-16 12:56:27.288690 log 2012-03-16 12:56:21.600805 osd.14
> 10.255.0.63:6818/2014 34288 : [ERR] 2.117 repair stat mismatch, got
> 956/955 objects, 0/0 clones, 3951263856/3951263744 bytes.
> 2012-03-16 12:56:27.288690 log 2012-03-16 12:56:21.600849 osd.14
> 10.255.0.63:6818/2014 34289 : [ERR] 2.117 repair 2 errors, 1 fixed

This looks like #2080, which I've only just managed to reproduce with
logs. I'll post what I find to that bug.

sage

> On the filesystem (XFS) I can see the corresponding file:
>
> # ls -l /ceph/osd.014/current/2.117_head/DIR_7/DIR_1/DIR_1/vol-FTU86AEJ.rbd__head_818BC117
> -rw-r--r-- 1 root root 112 Mar 1 20:32 vol-FTU86AEJ.rbd__head_818BC117
>
> and I can read the object with rbd info:
>
> # rbd info vol-FTU86AEJ
> rbd image 'vol-FTU86AEJ':
>         size 102400 MB in 25600 objects
>         order 22 (4096 KB objects)
>         block_name_prefix: rb.0.32
>         parent: (pool -1)
>
> What I do not understand is the fact that ceph seems to think that
> the object should no longer exist.
>
> Any hints on how to proceed? Please note that I can do only limited
> testing, because the cluster is in production.
>
> Thanks,
> Christian
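
To your question about why you can still read it: the 112-byte
vol-FTU86AEJ.rbd file is just the format 1 image header (the data
lives in the rb.0.32.* objects), so rbd info keeps working. Scrub
isn't claiming the object shouldn't exist; it's claiming that the
size recorded in the object info (0) doesn't match the 112 bytes on
disk. Comparing the copies directly should be safe on a production
cluster. A sketch, using the pg id from your logs and assuming
osd.0's data dir is mounted the same way as your osd.014:

  # confirm which OSDs are in the acting set for the inconsistent PG
  ceph pg map 2.117

  # then compare the same object file on each replica, e.g. on the
  # osd.0 host (path assumed; adjust to osd.0's actual data dir)
  ls -l /ceph/osd.000/current/2.117_head/DIR_7/DIR_1/DIR_1/vol-FTU86AEJ.rbd__head_818BC117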
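
The size scrub checks against is kept in the object's xattrs rather
than in the file itself, so you can also look at what is actually
recorded on each copy. A minimal sketch, assuming this FileStore
version keeps the encoded object_info_t in the user.ceph._ xattr:

  # dump the ceph xattrs on the object file; user.ceph._ holds the
  # encoded object_info_t whose size field scrub is reporting as 0
  getfattr -d -m '^user.ceph' -e hex \
      /ceph/osd.014/current/2.117_head/DIR_7/DIR_1/DIR_1/vol-FTU86AEJ.rbd__head_818BC117

Running the same thing on the other replica would show whether the
recorded metadata actually differs between the copies.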
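
If it turns out this isn't #2080 after all, a scrub captured at high
debug levels would be the most useful thing to attach to a new bug.
Roughly like the following; treat it as a sketch, since the exact
injectargs spelling varies a bit between versions:

  # raise logging on the primary, re-run the scrub, then restore the
  # defaults to keep the log volume manageable in production
  ceph tell osd.14 injectargs '--debug-osd 20 --debug-filestore 20'
  ceph pg scrub 2.117
  ceph tell osd.14 injectargs '--debug-osd 0 --debug-filestore 1'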