Re: inconsistent PG -> unfound objects on an erasure coded system

Jeffrey McDonald <jmcdonal@xxxxxxx> · Mon, 7 Mar 2016 16:33:48 -0600

Hi Sam, 

I've done as you requested: 

pg 70.459 is active+clean+inconsistent, acting [307,210,273,191,132,450] 

# for i in 307 210 273 191 132 450 ; do  
> ceph tell osd.$i injectargs  '--debug-osd 20 --debug-filestore 20 --debug-ms 1' 
> done 
debug_osd=20/20 debug_filestore=20/20 debug_ms=1/1  
debug_osd=20/20 debug_filestore=20/20 debug_ms=1/1  
debug_osd=20/20 debug_filestore=20/20 debug_ms=1/1  
debug_osd=20/20 debug_filestore=20/20 debug_ms=1/1  
debug_osd=20/20 debug_filestore=20/20 debug_ms=1/1  
debug_osd=20/20 debug_filestore=20/20 debug_ms=1/1  
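
The OSD ids in the loop above are the acting set from the health line. A minimal sketch, assuming Python is available on the admin node, that pulls that set out of the line and builds the same `ceph tell` command strings. The helper names `acting_osds` and `injectargs_commands` are hypothetical, and nothing here is run against the cluster:

```python
import re


def acting_osds(health_line):
    """Return the acting OSD ids found in a 'pg ... acting [a,b,c]' line."""
    match = re.search(r"acting \[([0-9,]+)\]", health_line)
    if match is None:
        raise ValueError("no acting set found in: %r" % health_line)
    return [int(osd) for osd in match.group(1).split(",")]


def injectargs_commands(
        osds, args="--debug-osd 20 --debug-filestore 20 --debug-ms 1"):
    """Build one 'ceph tell osd.N injectargs' command string per OSD."""
    return ["ceph tell osd.%d injectargs '%s'" % (osd, args) for osd in osds]


line = ("pg 70.459 is active+clean+inconsistent, "
        "acting [307,210,273,191,132,450]")

# Only prints the commands; running them is left to the operator.
for cmd in injectargs_commands(acting_osds(line)):
    print(cmd)
```

Passing a different `args` string to `injectargs_commands` would produce the matching commands for lowering the debug levels again once the logs are collected.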

# date  
Mon Mar  7 16:03:38 CST 2016 

# ceph pg deep-scrub 70.459  
instructing pg 70.459 on osd.307 to deep-scrub 

The scrub finished around:
# date 
Mon Mar  7 16:13:03 CST 2016 

I've tarred and gzipped the logs, which can be downloaded from the link below. They start a minute or two after 16:00 today.

https://drive.google.com/folderview?id=0Bzz8TrxFvfema2NQUmotd1BOTnM&usp=sharing

Oddly (to me, anyway), this PG is now active+clean:

# ceph pg dump  | grep 70.459 
dumped all in format plain
70.459	21377	0	0	0	0	64515446306	3088	3088	active+clean	2016-03-07 16:26:57.796537	279563'212832	279602:628151	[307,210,273,191,132,450]	307	[307,210,273,191,132,450]	307	279563'212832	2016-03-07 16:12:30.741984	279563'212832	2016-03-07 16:12:30.741984
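
The row above can be read field by field. A minimal sketch, assuming the plain-format `ceph pg dump` row in this release is tab-separated with the 21 columns named below. The column names are an assumption and are not confirmed anywhere in this thread, and `parse_pg_row` is a hypothetical helper:

```python
# Assumed column order for a plain-format pg dump row (not confirmed here).
PG_DUMP_COLUMNS = [
    "pg_stat", "objects", "mip", "degr", "misp", "unf", "bytes", "log",
    "disklog", "state", "state_stamp", "v", "reported", "up", "up_primary",
    "acting", "acting_primary", "last_scrub", "scrub_stamp",
    "last_deep_scrub", "deep_scrub_stamp",
]


def parse_pg_row(row):
    """Split one tab-separated pg dump row into a dict keyed by column."""
    fields = row.rstrip("\n").split("\t")
    if len(fields) != len(PG_DUMP_COLUMNS):
        raise ValueError("expected %d fields, got %d"
                         % (len(PG_DUMP_COLUMNS), len(fields)))
    return dict(zip(PG_DUMP_COLUMNS, fields))


# The row from the dump above, rebuilt with explicit tabs.
row = "\t".join([
    "70.459", "21377", "0", "0", "0", "0", "64515446306", "3088", "3088",
    "active+clean", "2016-03-07 16:26:57.796537", "279563'212832",
    "279602:628151", "[307,210,273,191,132,450]", "307",
    "[307,210,273,191,132,450]", "307", "279563'212832",
    "2016-03-07 16:12:30.741984", "279563'212832",
    "2016-03-07 16:12:30.741984",
])

pg = parse_pg_row(row)
print(pg["pg_stat"], pg["state"], "unfound:", pg["unf"],
      "deep scrub:", pg["deep_scrub_stamp"])
```

If the column assumption holds, the last deep-scrub stamp (16:12:30) falls inside the scrub window reported above, which would be consistent with the inconsistent flag having been cleared by a deep scrub that found no errors this time.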

Regards,
Jeff

On Mon, Mar 7, 2016 at 4:11 PM, Samuel Just <sjust@xxxxxxxxxx> wrote:
I think the unfound object on repair is fixed by
d51806f5b330d5f112281fbb95ea6addf994324e (not in hammer yet).  I
opened http://tracker.ceph.com/issues/15002 for the backport and to
make sure it's covered in ceph-qa-suite.  No idea at this time why the
objects are disappearing though.
-Sam

On Mon, Mar 7, 2016 at 1:57 PM, Samuel Just <sjust@xxxxxxxxxx> wrote:
> The one just scrubbed and now inconsistent.
> -Sam
>
> On Mon, Mar 7, 2016 at 1:57 PM, Jeffrey McDonald <jmcdonal@xxxxxxx> wrote:
>> Do you want me to enable this for the pg already with unfound objects or the
>> placement group just scrubbed and now inconsistent?
>> Jeff
>>
>> On Mon, Mar 7, 2016 at 3:54 PM, Samuel Just <sjust@xxxxxxxxxx> wrote:
>>>
>>> Can you enable
>>>
>>> debug osd = 20
>>> debug filestore = 20
>>> debug ms = 1
>>>
>>> on all osds in that PG, rescrub, and convey to us the resulting logs?
>>> -Sam
>>>
>>> On Mon, Mar 7, 2016 at 1:36 PM, Jeffrey McDonald <jmcdonal@xxxxxxx> wrote:
>>> > Here is a PG which just went inconsistent:
>>> >
>>> > pg 70.459 is active+clean+inconsistent, acting [307,210,273,191,132,450]
>>> >
>>> > Attached is the result of a pg query on this.   I will wait for your
>>> > feedback before issuing a repair.
>>> >
>>> > From what I read, the inconsistencies are more likely the result of NTP
>>> > problems, but all nodes use the local NTP master and all are showing sync.
>>> >
>>> > Regards,
>>> > Jeff
>>> >
>>> > On Mon, Mar 7, 2016 at 3:15 PM, Gregory Farnum <gfarnum@xxxxxxxxxx> wrote:
>>> >>
>>> >> [ Keeping this on the users list. ]
>>> >>
>>> >> Okay, so next time this happens you probably want to do a pg query on
>>> >> the PG which has been reported as dirty. I can't help much beyond
>>> >> that, but hopefully Kefu or David will chime in once there's a little
>>> >> more for them to look at.
>>> >> -Greg
>>> >>
>>> >> On Mon, Mar 7, 2016 at 1:00 PM, Jeffrey McDonald <jmcdonal@xxxxxxx> wrote:
>>> >> > Hi Greg,
>>> >> >
>>> >> > I'm running Ceph hammer:
>>> >> > ceph version 0.94.5 (9764da52395923e0b32908d83a9f7304401fee43)
>>> >> >
>>> >> > The hardware migration was performed by just setting the CRUSH weight
>>> >> > to zero for the OSDs we wanted to retire.   The system was performing
>>> >> > poorly with these older OSDs and we had a difficult time maintaining
>>> >> > stability of the system.    The old OSDs are still there but all of the
>>> >> > data is now migrated to new and/or existing hardware.
>>> >> >
>>> >> > Thanks,
>>> >> > Jeff
>>> >> >
>>> >> > On Mon, Mar 7, 2016 at 2:56 PM, Gregory Farnum <gfarnum@xxxxxxxxxx> wrote:
>>> >> >>
>>> >> >> On Mon, Mar 7, 2016 at 12:07 PM, Jeffrey McDonald <jmcdonal@xxxxxxx> wrote:
>>> >> >> > Hi,
>>> >> >> >
>>> >> >> > For a while, we've been seeing inconsistent placement groups on our
>>> >> >> > erasure coded system.   The placement groups go from a state of
>>> >> >> > active+clean to active+clean+inconsistent after a deep scrub:
>>> >> >> >
>>> >> >> > 2016-03-07 13:45:42.044131 7f385d118700 -1 log_channel(cluster) log [ERR] : 70.320s0 deep-scrub stat mismatch, got 21446/21428 objects, 0/0 clones, 21446/21428 dirty, 0/0 omap, 0/0 hit_set_archive, 0/0 whiteouts, 64682334170/64624353083 bytes,0/0 hit_set_archive bytes.
>>> >> >> > 2016-03-07 13:45:42.044416 7f385d118700 -1 log_channel(cluster) log [ERR] : 70.320s0 deep-scrub 18 missing, 0 inconsistent objects
>>> >> >> > 2016-03-07 13:45:42.044464 7f385d118700 -1 log_channel(cluster) log [ERR] : 70.320 deep-scrub 73 errors
>>> >> >> >
>>> >> >> > So I tell the placement group to perform a repair:
>>> >> >> >
>>> >> >> > 2016-03-07 13:49:26.047177 7f385d118700  0 log_channel(cluster) log [INF] : 70.320 repair starts
>>> >> >> > 2016-03-07 13:49:57.087291 7f3858b0a700  0 -- 10.31.0.2:6874/13937 >> 10.31.0.6:6824/8127 pipe(0x2e578000 sd=697 :6874
>>> >> >> >
>>> >> >> > The repair finds missing shards and repairs them, but then I have 18
>>> >> >> > 'unfound objects':
>>> >> >> >
>>> >> >> > 2016-03-07 13:51:28.467590 7f385d118700 -1 log_channel(cluster) log [ERR] : 70.320s0 repair stat mismatch, got 21446/21428 objects, 0/0 clones, 21446/21428 dirty, 0/0 omap, 0/0 hit_set_archive, 0/0 whiteouts, 64682334170/64624353083 bytes,0/0 hit_set_archive bytes.
>>> >> >> > 2016-03-07 13:51:28.468358 7f385d118700 -1 log_channel(cluster) log [ERR] : 70.320s0 repair 18 missing, 0 inconsistent objects
>>> >> >> > 2016-03-07 13:51:28.469431 7f385d118700 -1 log_channel(cluster) log [ERR] : 70.320 repair 73 errors, 73 fixed
>>> >> >> >
>>> >> >> > I've traced one of the unfound objects all the way through the system
>>> >> >> > and I've found that they are not really lost.   I can fail over the osd
>>> >> >> > and recover the files.   This is happening quite regularly now after a
>>> >> >> > large migration of data from old hardware to new (migration is now
>>> >> >> > complete).
>>> >> >> >
>>> >> >> > The system sets the PG into 'recovery', but we've seen the system in a
>>> >> >> > recovering state for many days.    Should we just be patient or do we
>>> >> >> > need to dig further into the issue?
>>> >> >>
>>> >> >> You may need to dig into this more, although I'm not sure what the
>>> >> >> issue is likely to be. What version of Ceph are you running? How did
>>> >> >> you do this hardware migration?
>>> >> >> -Greg
>>> >> >>
>>> >> >> >
>>> >> >> > pg 70.320 is stuck unclean for 704.803040, current state active+recovering, last acting [277,101,218,49,304,412]
>>> >> >> > pg 70.320 is active+recovering, acting [277,101,218,49,304,412], 18 unfound
>>> >> >> >
>>> >> >> > There is no indication of any problems with down OSDs or network issues
>>> >> >> > with OSDs.
>>> >> >> >
>>> >> >> > Thanks,
>>> >> >> > Jeff
>>> >> >> >
>>> >> >> > --
>>> >> >> >
>>> >> >> > Jeffrey McDonald, PhD
>>> >> >> > Assistant Director for HPC Operations
>>> >> >> > Minnesota Supercomputing Institute
>>> >> >> > University of Minnesota Twin Cities
>>> >> >> > 599 Walter Library           email: jeffrey.mcdonald@xxxxxxxxxxx
>>> >> >> > 117 Pleasant St SE           phone: +1 612 625-6905
>>> >> >> > Minneapolis, MN 55455        fax:   +1 612 624-8861
>>> >> >> >
>>> >> >> > _______________________________________________
>>> >> >> > ceph-users mailing list
>>> >> >> > ceph-users@xxxxxxxxxxxxxx
>>> >> >> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>> >> >> >

-- 
Jeffrey McDonald, PhD
Assistant Director for HPC Operations
Minnesota Supercomputing Institute
University of Minnesota Twin Cities
599 Walter Library           email: jeffrey.mcdonald@xxxxxxxxxxx
117 Pleasant St SE           phone: +1 612 625-6905
Minneapolis, MN 55455        fax:   +1 612 624-8861

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com