Also, was there at any point a power failure/power cycle event,
perhaps on osd 56?
-Sam

On Thu, Aug 20, 2015 at 9:23 AM, Samuel Just <sjust@xxxxxxxxxx> wrote:
> Ok, you appear to be using a replicated cache tier in front of a
> replicated base tier. Please scrub both inconsistent pgs and post the
> ceph.log from before when you started the scrub until after. Also,
> what command are you using to take snapshots?
> -Sam
>
> On Thu, Aug 20, 2015 at 3:59 AM, Voloshanenko Igor
> <igor.voloshanenko@xxxxxxxxx> wrote:
>> Hi Samuel, we tried to fix it in a tricky way.
>>
>> We checked all the affected rbd_data chunks from the (OSD) logs, then queried
>> rbd info to find which rbd contains the bad rbd_data. After that we mounted
>> this rbd as rbd0, created an empty rbd, and dd'd all data from the bad volume
>> to the new one.
>>
>> But after that the scrub errors kept growing... There were 15 errors, now 35...
>> We also tried to out the OSD which was the lead, but after rebalancing these
>> 2 pgs still have 35 scrub errors...
>>
>> ceph osd getmap -o <outfile> - attached
>>
>>
>> 2015-08-18 18:48 GMT+03:00 Samuel Just <sjust@xxxxxxxxxx>:
>>>
>>> Is the number of inconsistent objects growing? Can you attach the
>>> whole ceph.log from the 6 hours before and after the snippet you
>>> linked above? Are you using cache/tiering? Can you attach the osdmap
>>> (ceph osd getmap -o <outfile>)?
>>> -Sam
>>>
>>> On Tue, Aug 18, 2015 at 4:15 AM, Voloshanenko Igor
>>> <igor.voloshanenko@xxxxxxxxx> wrote:
>>> > ceph - 0.94.2
>>> > It happened during rebalancing
>>> >
>>> > I also thought that some OSDs were missing a copy, but it looks like all of them are...
>>> > So, any advice on which direction I should go?
>>> >
>>> > 2015-08-18 14:14 GMT+03:00 Gregory Farnum <gfarnum@xxxxxxxxxx>:
>>> >>
>>> >> From a quick peek it looks like some of the OSDs are missing clones of
>>> >> objects.
>>> >> I'm not sure how that could happen and I'd expect the pg
>>> >> repair to handle that but if it's not there's probably something
>>> >> wrong; what version of Ceph are you running? Sam, is this something
>>> >> you've seen, a new bug, or some kind of config issue?
>>> >> -Greg
>>> >>
>>> >> On Tue, Aug 18, 2015 at 6:27 AM, Voloshanenko Igor
>>> >> <igor.voloshanenko@xxxxxxxxx> wrote:
>>> >> > Hi all, at our production cluster, due to high rebalancing ((( we have 2
>>> >> > pgs in an inconsistent state...
>>> >> >
>>> >> > root@temp:~# ceph health detail | grep inc
>>> >> > HEALTH_ERR 2 pgs inconsistent; 18 scrub errors
>>> >> > pg 2.490 is active+clean+inconsistent, acting [56,15,29]
>>> >> > pg 2.c4 is active+clean+inconsistent, acting [56,10,42]
>>> >> >
>>> >> > From OSD logs, after recovery attempt:
>>> >> >
>>> >> > root@test:~# ceph pg dump | grep -i incons | cut -f 1 | while read i; do ceph pg repair ${i} ; done
>>> >> > dumped all in format plain
>>> >> > instructing pg 2.490 on osd.56 to repair
>>> >> > instructing pg 2.c4 on osd.56 to repair
>>> >> >
>>> >> > /var/log/ceph/ceph-osd.56.log:51:2015-08-18 07:26:37.035910 7f94663b3700 -1 log_channel(cluster) log [ERR] : deep-scrub 2.490 f5759490/rbd_data.1631755377d7e.00000000000004da/head//2 expected clone 90c59490/rbd_data.eb486436f2beb.0000000000007a65/141//2
>>> >> > /var/log/ceph/ceph-osd.56.log:52:2015-08-18 07:26:37.035960 7f94663b3700 -1 log_channel(cluster) log [ERR] : deep-scrub 2.490 fee49490/rbd_data.12483d3ba0794b.000000000000522f/head//2 expected clone f5759490/rbd_data.1631755377d7e.00000000000004da/141//2
>>> >> > /var/log/ceph/ceph-osd.56.log:53:2015-08-18 07:26:37.036133 7f94663b3700 -1 log_channel(cluster) log [ERR] : deep-scrub 2.490 a9b39490/rbd_data.12483d3ba0794b.00000000000037b3/head//2 expected clone fee49490/rbd_data.12483d3ba0794b.000000000000522f/141//2
>>> >> > /var/log/ceph/ceph-osd.56.log:54:2015-08-18 07:26:37.036243 7f94663b3700 -1 log_channel(cluster) log [ERR] : deep-scrub 2.490 bac19490/rbd_data.1238e82ae8944a.000000000000032e/head//2 expected clone a9b39490/rbd_data.12483d3ba0794b.00000000000037b3/141//2
>>> >> > /var/log/ceph/ceph-osd.56.log:55:2015-08-18 07:26:37.036289 7f94663b3700 -1 log_channel(cluster) log [ERR] : deep-scrub 2.490 98519490/rbd_data.123e9c2ae8944a.0000000000000807/head//2 expected clone bac19490/rbd_data.1238e82ae8944a.000000000000032e/141//2
>>> >> > /var/log/ceph/ceph-osd.56.log:56:2015-08-18 07:26:37.036314 7f94663b3700 -1 log_channel(cluster) log [ERR] : deep-scrub 2.490 c3c09490/rbd_data.1238e82ae8944a.0000000000000c2b/head//2 expected clone 98519490/rbd_data.123e9c2ae8944a.0000000000000807/141//2
>>> >> > /var/log/ceph/ceph-osd.56.log:57:2015-08-18 07:26:37.036363 7f94663b3700 -1 log_channel(cluster) log [ERR] : deep-scrub 2.490 28809490/rbd_data.edea7460fe42b.00000000000001d9/head//2 expected clone c3c09490/rbd_data.1238e82ae8944a.0000000000000c2b/141//2
>>> >> > /var/log/ceph/ceph-osd.56.log:58:2015-08-18 07:26:37.036432 7f94663b3700 -1 log_channel(cluster) log [ERR] : deep-scrub 2.490 e1509490/rbd_data.1423897545e146.00000000000009a6/head//2 expected clone 28809490/rbd_data.edea7460fe42b.00000000000001d9/141//2
>>> >> > /var/log/ceph/ceph-osd.56.log:59:2015-08-18 07:26:38.548765 7f94663b3700 -1 log_channel(cluster) log [ERR] : 2.490 deep-scrub 17 errors
>>> >> >
>>> >> > So, how can I solve the "expected clone" situation by hand?
>>> >> > Thanks in advance!
>>> >> >
>>> >> > _______________________________________________
>>> >> > ceph-users mailing list
>>> >> > ceph-users@xxxxxxxxxxxxxx
>>> >> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
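
For reference, the first step of the workaround Igor describes (collecting the rbd_data prefixes named in the scrub errors, then matching them to images via rbd info) could be sketched roughly as below. This is only an illustration under assumptions: the function names are hypothetical and not part of Ceph, the pool name is a placeholder, and it assumes each log record sits on a single line in the OSD log (the wrapping in the mails above comes from the mail client). It only identifies the affected images; it does not repair anything.

```shell
#!/usr/bin/env bash
# Sketch only: helper names below are hypothetical, not Ceph tooling.

# Read OSD log lines on stdin and print each distinct rbd_data.<id>
# prefix that appears on an "expected clone" scrub error line.
scrub_error_prefixes() {
  grep 'expected clone' | grep -o 'rbd_data\.[0-9a-f]*' | LC_ALL=C sort -u
}

# Print every image in pool $1 whose block_name_prefix, as reported by
# `rbd info`, is exactly $2. Needs a working rbd client and keyring.
images_with_prefix() {
  local pool="$1" prefix="$2" image
  rbd ls "$pool" | while read -r image; do
    if rbd info "$pool/$image" |
        awk -v p="$prefix" '$1 == "block_name_prefix:" && $2 == p { hit = 1 } END { exit !hit }'
    then
      echo "$pool/$image"
    fi
  done
}
```

Possible usage, with "volumes" standing in for the real pool name:

    scrub_error_prefixes < /var/log/ceph/ceph-osd.56.log |
      while read -r p; do images_with_prefix volumes "$p"; done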