Resent to ceph-users to be under the message size limit....
--
On Tue, Mar 8, 2016 at 6:16 AM, Jeffrey McDonald <jmcdonal@xxxxxxx> wrote:
OK, this is done, and I've observed the state change of 70.459 from active+clean to active+clean+inconsistent after the first scrub.

Files attached: the bash script of commands (setuposddebug.bash), the log produced by that script (setuposddebug.log), and three pg queries: one at the start, one at the end of the first scrub, and one at the end of the second scrub.

At this point, I now have 27 active+clean+inconsistent PGs. While I'm not too concerned about how they are labeled, clients cannot extract objects that are in these PGs and are labeled as unfound. It's important for us to maintain user confidence in the system, so I need a fix as soon as possible.

The log files from the OSDs are here:

Thanks,
Jeff

On Mon, Mar 7, 2016 at 7:26 PM, Samuel Just <sjust@xxxxxxxxxx> wrote:
Yep, just as before. Actually, do it twice (wait for 'scrubbing' to
go away each time).
-Sam
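For reference, a minimal sketch of how such a scrub-twice test could be scripted; this is an assumption about what setuposddebug.bash does, not its actual contents, and it assumes pg 70.459 with acting set [307,210,273,191,132,450]:

    #!/bin/bash
    # raise debug levels on the acting set
    for osd in 307 210 273 191 132 450; do
        ceph tell osd.$osd injectargs '--debug-osd 20 --debug-filestore 20 --debug-ms 1'
    done

    # deep-scrub twice, each time waiting until the pg no longer reports a scrubbing state
    for pass in 1 2; do
        ceph pg deep-scrub 70.459
        sleep 60
        while ceph pg dump pgs_brief 2>/dev/null | grep '^70\.459' | grep -q scrub; do
            sleep 30
        done
    done

    # restore default debug levels
    for osd in 307 210 273 191 132 450; do
        ceph tell osd.$osd injectargs '--debug-osd 0/0 --debug-filestore 0/0 --debug-ms 0/0'
    done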
On Mon, Mar 7, 2016 at 5:25 PM, Jeffrey McDonald <jmcdonal@xxxxxxx> wrote:
> Just to be sure I grab what you need:
>
> 1- set debug logs for the pg 70.459
> 2 - Issue a deep-scrub ceph pg deep-scrub 70.459
> 3- stop once the 70.459 pg goes inconsistent?
>
> Thanks,
> Jeff
>
>
> On Mon, Mar 7, 2016 at 6:52 PM, Samuel Just <sjust@xxxxxxxxxx> wrote:
>>
>> Hmm, I'll look into this a bit more tomorrow. Can you get the tree
>> structure of the 70.459 pg directory on osd.307 (find . will do fine).
>> -Sam
>>
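A sketch of what gathering that tree might look like, assuming the default filestore data path and the s0 shard directory used by the primary of an EC pg (adjust the path if the OSD data lives elsewhere):

    cd /var/lib/ceph/osd/ceph-307/current/70.459s0_head
    find . > /tmp/pg70.459s0_osd307.find.txt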
>> On Mon, Mar 7, 2016 at 4:50 PM, Jeffrey McDonald <jmcdonal@xxxxxxx> wrote:
>> > 307 is on ceph03.
>> > Jeff
>> >
>> > On Mon, Mar 7, 2016 at 6:48 PM, Samuel Just <sjust@xxxxxxxxxx> wrote:
>> >>
>> >> Which node is osd.307 on?
>> >> -Sam
>> >>
>> >> On Mon, Mar 7, 2016 at 4:43 PM, Samuel Just <sjust@xxxxxxxxxx> wrote:
>> >> > ' I didn't see the errors in the tracker on the new nodes, but they
>> >> > were only receiving new data, not migrating it.' -- What do you mean
>> >> > by that?
>> >> > -Sam
>> >> >
>> >> > On Mon, Mar 7, 2016 at 4:42 PM, Jeffrey McDonald <jmcdonal@xxxxxxx>
>> >> > wrote:
>> >> >> The filesystem is xfs everywhere; there are nine hosts. The two new
>> >> >> ceph nodes, 08 and 09, have a newer kernel. I didn't see the errors in
>> >> >> the tracker on the new nodes, but they were only receiving new data,
>> >> >> not migrating it.
>> >> >> Jeff
>> >> >>
>> >> >> ceph2: Linux ceph2 3.13.0-65-generic #106-Ubuntu SMP Fri Oct 2 22:08:27 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
>> >> >> ceph1: Linux ceph1 3.13.0-65-generic #106-Ubuntu SMP Fri Oct 2 22:08:27 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
>> >> >> ceph3: Linux ceph3 3.13.0-65-generic #106-Ubuntu SMP Fri Oct 2 22:08:27 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
>> >> >> ceph03: Linux ceph03 3.13.0-65-generic #106-Ubuntu SMP Fri Oct 2 22:08:27 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
>> >> >> ceph01: Linux ceph01 3.13.0-65-generic #106-Ubuntu SMP Fri Oct 2 22:08:27 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
>> >> >> ceph02: Linux ceph02 3.13.0-65-generic #106-Ubuntu SMP Fri Oct 2 22:08:27 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
>> >> >> ceph06: Linux ceph06 3.13.0-65-generic #106-Ubuntu SMP Fri Oct 2 22:08:27 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
>> >> >> ceph05: Linux ceph05 3.13.0-65-generic #106-Ubuntu SMP Fri Oct 2 22:08:27 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
>> >> >> ceph04: Linux ceph04 3.13.0-65-generic #106-Ubuntu SMP Fri Oct 2 22:08:27 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
>> >> >> ceph08: Linux ceph08 3.19.0-51-generic #58~14.04.1-Ubuntu SMP Fri Feb 26 22:02:58 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
>> >> >> ceph07: Linux ceph07 3.13.0-65-generic #106-Ubuntu SMP Fri Oct 2 22:08:27 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
>> >> >> ceph09: Linux ceph09 3.19.0-51-generic #58~14.04.1-Ubuntu SMP Fri Feb 26 22:02:58 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
>> >> >>
>> >> >>
>> >> >> On Mon, Mar 7, 2016 at 6:39 PM, Samuel Just <sjust@xxxxxxxxxx>
>> >> >> wrote:
>> >> >>>
>> >> >>> What filesystem and kernel are you running on the osds? This (and
>> >> >>> your other bug, actually) could be explained by some kind of weird
>> >> >>> kernel readdir behavior.
>> >> >>> -Sam
>> >> >>>
>> >> >>> On Mon, Mar 7, 2016 at 4:36 PM, Samuel Just <sjust@xxxxxxxxxx>
>> >> >>> wrote:
>> >> >>> > Hmm, so much for that theory, still looking. If you can produce
>> >> >>> > another set of logs (as before) from scrubbing that pg, it might
>> >> >>> > help.
>> >> >>> > -Sam
>> >> >>> >
>> >> >>> > On Mon, Mar 7, 2016 at 4:34 PM, Jeffrey McDonald
>> >> >>> > <jmcdonal@xxxxxxx>
>> >> >>> > wrote:
>> >> >>> >> they're all the same.....see attached.
>> >> >>> >>
>> >> >>> >> On Mon, Mar 7, 2016 at 6:31 PM, Samuel Just <sjust@xxxxxxxxxx>
>> >> >>> >> wrote:
>> >> >>> >>>
>> >> >>> >>> Have you confirmed the versions?
>> >> >>> >>> -Sam
>> >> >>> >>>
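Two quick ways to confirm that (a sketch; the host list is an assumption, adjust it to the actual nodes):

    # installed packages, per node
    for h in ceph1 ceph2 ceph3 ceph0{1..9}; do ssh "$h" ceph --version; done

    # version a running daemon itself reports
    ceph tell osd.307 version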
>> >> >>> >>> On Mon, Mar 7, 2016 at 4:29 PM, Jeffrey McDonald
>> >> >>> >>> <jmcdonal@xxxxxxx>
>> >> >>> >>> wrote:
>> >> >>> >>> > I have one other very strange event happening; I've opened a
>> >> >>> >>> > ticket on it:
>> >> >>> >>> > http://tracker.ceph.com/issues/14766
>> >> >>> >>> >
>> >> >>> >>> > During this migration, OSDs failed probably over 400 times while
>> >> >>> >>> > moving data around. We moved the empty directories and restarted
>> >> >>> >>> > the OSDs. I can't say if this is related; I have no reason to
>> >> >>> >>> > suspect it is.
>> >> >>> >>> >
>> >> >>> >>> > Jeff
>> >> >>> >>> >
>> >> >>> >>> > On Mon, Mar 7, 2016 at 5:31 PM, Shinobu Kinjo
>> >> >>> >>> > <shinobu.kj@xxxxxxxxx>
>> >> >>> >>> > wrote:
>> >> >>> >>> >>
>> >> >>> >>> >> What could cause this kind of unexpected behaviour?
>> >> >>> >>> >> Any assumptions?
>> >> >>> >>> >> Sorry for interrupting you.
>> >> >>> >>> >>
>> >> >>> >>> >> Cheers,
>> >> >>> >>> >> S
>> >> >>> >>> >>
>> >> >>> >>> >> On Tue, Mar 8, 2016 at 8:19 AM, Samuel Just
>> >> >>> >>> >> <sjust@xxxxxxxxxx>
>> >> >>> >>> >> wrote:
>> >> >>> >>> >> > Hmm, at the end of the log, the pg is still inconsistent.
>> >> >>> >>> >> > Can you attach a ceph pg query on that pg?
>> >> >>> >>> >> > -Sam
>> >> >>> >>> >> >
>> >> >>> >>> >> > On Mon, Mar 7, 2016 at 3:05 PM, Samuel Just
>> >> >>> >>> >> > <sjust@xxxxxxxxxx>
>> >> >>> >>> >> > wrote:
>> >> >>> >>> >> >> If so, that strongly suggests that the pg was actually never
>> >> >>> >>> >> >> inconsistent in the first place and that the bug is in scrub
>> >> >>> >>> >> >> itself, presumably getting confused about an object during a
>> >> >>> >>> >> >> write. The next step would be to get logs like the above from
>> >> >>> >>> >> >> a pg as it scrubs, transitioning from clean to inconsistent.
>> >> >>> >>> >> >> If it's really a race between scrub and a write, it's probably
>> >> >>> >>> >> >> just non-deterministic; you could set logging on a set of osds
>> >> >>> >>> >> >> and continuously scrub any pgs which only map to those osds
>> >> >>> >>> >> >> until you reproduce the problem.
>> >> >>> >>> >> >> -Sam
>> >> >>> >>> >> >>
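A rough sketch of that kind of repro loop (it assumes debug logging has already been raised on the pg's acting set, as in the injectargs loop elsewhere in this thread, and uses pg 70.459 only as an example):

    pg=70.459
    until ceph pg dump pgs_brief 2>/dev/null | grep "^${pg}[[:space:]]" | grep -q inconsistent; do
        ceph pg deep-scrub "$pg"
        sleep 900    # give the deep scrub time to finish before re-issuing it
    done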
>> >> >>> >>> >> >> On Mon, Mar 7, 2016 at 2:44 PM, Samuel Just
>> >> >>> >>> >> >> <sjust@xxxxxxxxxx>
>> >> >>> >>> >> >> wrote:
>> >> >>> >>> >> >>> So after the scrub, it came up clean? The
>> >> >>> >>> >> >>> inconsistent/missing
>> >> >>> >>> >> >>> objects reappeared?
>> >> >>> >>> >> >>> -Sam
>> >> >>> >>> >> >>>
>> >> >>> >>> >> >>> On Mon, Mar 7, 2016 at 2:33 PM, Jeffrey McDonald
>> >> >>> >>> >> >>> <jmcdonal@xxxxxxx>
>> >> >>> >>> >> >>> wrote:
>> >> >>> >>> >> >>>> Hi Sam,
>> >> >>> >>> >> >>>>
>> >> >>> >>> >> >>>> I've done as you requested:
>> >> >>> >>> >> >>>>
>> >> >>> >>> >> >>>> pg 70.459 is active+clean+inconsistent, acting
>> >> >>> >>> >> >>>> [307,210,273,191,132,450]
>> >> >>> >>> >> >>>>
>> >> >>> >>> >> >>>> # for i in 307 210 273 191 132 450 ; do
>> >> >>> >>> >> >>>>> ceph tell osd.$i injectargs '--debug-osd 20
>> >> >>> >>> >> >>>>> --debug-filestore 20
>> >> >>> >>> >> >>>>> --debug-ms 1'
>> >> >>> >>> >> >>>>> done
>> >> >>> >>> >> >>>> debug_osd=20/20 debug_filestore=20/20 debug_ms=1/1
>> >> >>> >>> >> >>>> debug_osd=20/20 debug_filestore=20/20 debug_ms=1/1
>> >> >>> >>> >> >>>> debug_osd=20/20 debug_filestore=20/20 debug_ms=1/1
>> >> >>> >>> >> >>>> debug_osd=20/20 debug_filestore=20/20 debug_ms=1/1
>> >> >>> >>> >> >>>> debug_osd=20/20 debug_filestore=20/20 debug_ms=1/1
>> >> >>> >>> >> >>>> debug_osd=20/20 debug_filestore=20/20 debug_ms=1/1
>> >> >>> >>> >> >>>>
>> >> >>> >>> >> >>>>
>> >> >>> >>> >> >>>> # date
>> >> >>> >>> >> >>>> Mon Mar 7 16:03:38 CST 2016
>> >> >>> >>> >> >>>>
>> >> >>> >>> >> >>>>
>> >> >>> >>> >> >>>> # ceph pg deep-scrub 70.459
>> >> >>> >>> >> >>>> instructing pg 70.459 on osd.307 to deep-scrub
>> >> >>> >>> >> >>>>
>> >> >>> >>> >> >>>>
>> >> >>> >>> >> >>>>
>> >> >>> >>> >> >>>> Scrub finished around
>> >> >>> >>> >> >>>>
>> >> >>> >>> >> >>>> # date
>> >> >>> >>> >> >>>> Mon Mar 7 16:13:03 CST 2016
>> >> >>> >>> >> >>>>
>> >> >>> >>> >> >>>>
>> >> >>> >>> >> >>>>
>> >> >>> >>> >> >>>>
>> >> >>> >>> >> >>>> I've tar'd and gzipped the files, which can be downloaded
>> >> >>> >>> >> >>>> from here. The logs start a minute or two after 16:00 today.
>> >> >>> >>> >> >>>>
>> >> >>> >>> >> >>>>
>> >> >>> >>> >> >>>>
>> >> >>> >>> >> >>>>
>> >> >>> >>> >> >>>>
>> >> >>> >>> >> >>>>
>> >> >>> >>> >> >>>> https://drive.google.com/folderview?id=0Bzz8TrxFvfema2NQUmotd1BOTnM&usp=sharing
>> >> >>> >>> >> >>>>
>> >> >>> >>> >> >>>>
>> >> >>> >>> >> >>>> Oddly (to me anyway), this pg is now active+clean:
>> >> >>> >>> >> >>>>
>> >> >>> >>> >> >>>> # ceph pg dump | grep 70.459
>> >> >>> >>> >> >>>> dumped all in format plain
>> >> >>> >>> >> >>>> 70.459 21377 0 0 0 0 64515446306 3088 3088 active+clean 2016-03-07 16:26:57.796537 279563'212832 279602:628151 [307,210,273,191,132,450] 307 [307,210,273,191,132,450] 307 279563'212832 2016-03-07 16:12:30.741984 279563'212832 2016-03-07 16:12:30.741984
>> >> >>> >>> >> >>>>
>> >> >>> >>> >> >>>>
>> >> >>> >>> >> >>>>
>> >> >>> >>> >> >>>> Regards,
>> >> >>> >>> >> >>>> Jeff
>> >> >>> >>> >> >>>>
>> >> >>> >>> >> >>>>
>> >> >>> >>> >> >>>> On Mon, Mar 7, 2016 at 4:11 PM, Samuel Just
>> >> >>> >>> >> >>>> <sjust@xxxxxxxxxx>
>> >> >>> >>> >> >>>> wrote:
>> >> >>> >>> >> >>>>>
>> >> >>> >>> >> >>>>> I think the unfound object on repair is fixed by
>> >> >>> >>> >> >>>>> d51806f5b330d5f112281fbb95ea6addf994324e (not in hammer yet).
>> >> >>> >>> >> >>>>> I opened http://tracker.ceph.com/issues/15002 for the backport
>> >> >>> >>> >> >>>>> and to make sure it's covered in ceph-qa-suite. No idea at this
>> >> >>> >>> >> >>>>> time why the objects are disappearing though.
>> >> >>> >>> >> >>>>> -Sam
>> >> >>> >>> >> >>>>>
>> >> >>> >>> >> >>>>> On Mon, Mar 7, 2016 at 1:57 PM, Samuel Just
>> >> >>> >>> >> >>>>> <sjust@xxxxxxxxxx>
>> >> >>> >>> >> >>>>> wrote:
>> >> >>> >>> >> >>>>> > The one just scrubbed and now inconsistent.
>> >> >>> >>> >> >>>>> > -Sam
>> >> >>> >>> >> >>>>> >
>> >> >>> >>> >> >>>>> > On Mon, Mar 7, 2016 at 1:57 PM, Jeffrey McDonald
>> >> >>> >>> >> >>>>> > <jmcdonal@xxxxxxx>
>> >> >>> >>> >> >>>>> > wrote:
>> >> >>> >>> >> >>>>> >> Do you want me to enable this for the pg that already has
>> >> >>> >>> >> >>>>> >> unfound objects, or for the placement group just scrubbed and
>> >> >>> >>> >> >>>>> >> now inconsistent?
>> >> >>> >>> >> >>>>> >> Jeff
>> >> >>> >>> >> >>>>> >>
>> >> >>> >>> >> >>>>> >> On Mon, Mar 7, 2016 at 3:54 PM, Samuel Just
>> >> >>> >>> >> >>>>> >> <sjust@xxxxxxxxxx>
>> >> >>> >>> >> >>>>> >> wrote:
>> >> >>> >>> >> >>>>> >>>
>> >> >>> >>> >> >>>>> >>> Can you enable
>> >> >>> >>> >> >>>>> >>>
>> >> >>> >>> >> >>>>> >>> debug osd = 20
>> >> >>> >>> >> >>>>> >>> debug filestore = 20
>> >> >>> >>> >> >>>>> >>> debug ms = 1
>> >> >>> >>> >> >>>>> >>>
>> >> >>> >>> >> >>>>> >>> on all osds in that PG, rescrub, and convey to us
>> >> >>> >>> >> >>>>> >>> the
>> >> >>> >>> >> >>>>> >>> resulting
>> >> >>> >>> >> >>>>> >>> logs?
>> >> >>> >>> >> >>>>> >>> -Sam
>> >> >>> >>> >> >>>>> >>>
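For reference, those settings can either live in ceph.conf under the [osd] section (taking effect on daemon restart) or be injected into running daemons without a restart, as done elsewhere in this thread. A sketch:

    [osd]
        debug osd = 20
        debug filestore = 20
        debug ms = 1

    # or, at runtime, per OSD in the acting set:
    ceph tell osd.<id> injectargs '--debug-osd 20 --debug-filestore 20 --debug-ms 1'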
>> >> >>> >>> >> >>>>> >>> On Mon, Mar 7, 2016 at 1:36 PM, Jeffrey McDonald
>> >> >>> >>> >> >>>>> >>> <jmcdonal@xxxxxxx>
>> >> >>> >>> >> >>>>> >>> wrote:
>> >> >>> >>> >> >>>>> >>> > Here is a PG which just went inconsistent:
>> >> >>> >>> >> >>>>> >>> >
>> >> >>> >>> >> >>>>> >>> > pg 70.459 is active+clean+inconsistent, acting
>> >> >>> >>> >> >>>>> >>> > [307,210,273,191,132,450]
>> >> >>> >>> >> >>>>> >>> >
>> >> >>> >>> >> >>>>> >>> > Attached is the result of a pg query on this. I will wait
>> >> >>> >>> >> >>>>> >>> > for your feedback before issuing a repair.
>> >> >>> >>> >> >>>>> >>> >
>> >> >>> >>> >> >>>>> >>> > From what I read, the inconsistencies are more likely the
>> >> >>> >>> >> >>>>> >>> > result of ntp, but all nodes use the local ntp master and
>> >> >>> >>> >> >>>>> >>> > all are showing sync.
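Sync status on each node can be double-checked with ntpq (a sketch):

    ntpq -pn    # the peer marked with '*' is the one the node is currently synced to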
>> >> >>> >>> >> >>>>> >>> >
>> >> >>> >>> >> >>>>> >>> > Regards,
>> >> >>> >>> >> >>>>> >>> > Jeff
>> >> >>> >>> >> >>>>> >>> >
>> >> >>> >>> >> >>>>> >>> > On Mon, Mar 7, 2016 at 3:15 PM, Gregory Farnum
>> >> >>> >>> >> >>>>> >>> > <gfarnum@xxxxxxxxxx>
>> >> >>> >>> >> >>>>> >>> > wrote:
>> >> >>> >>> >> >>>>> >>> >>
>> >> >>> >>> >> >>>>> >>> >> [ Keeping this on the users list. ]
>> >> >>> >>> >> >>>>> >>> >>
>> >> >>> >>> >> >>>>> >>> >> Okay, so next time this happens you probably want to do a
>> >> >>> >>> >> >>>>> >>> >> pg query on the PG which has been reported as dirty. I
>> >> >>> >>> >> >>>>> >>> >> can't help much beyond that, but hopefully Kefu or David
>> >> >>> >>> >> >>>>> >>> >> will chime in once there's a little more for them to look
>> >> >>> >>> >> >>>>> >>> >> at.
>> >> >>> >>> >> >>>>> >>> >> -Greg
>> >> >>> >>> >> >>>>> >>> >>
>> >> >>> >>> >> >>>>> >>> >> On Mon, Mar 7, 2016 at 1:00 PM, Jeffrey
>> >> >>> >>> >> >>>>> >>> >> McDonald
>> >> >>> >>> >> >>>>> >>> >> <jmcdonal@xxxxxxx>
>> >> >>> >>> >> >>>>> >>> >> wrote:
>> >> >>> >>> >> >>>>> >>> >> > Hi Greg,
>> >> >>> >>> >> >>>>> >>> >> >
>> >> >>> >>> >> >>>>> >>> >> > I'm running the ceph version hammer,
>> >> >>> >>> >> >>>>> >>> >> > ceph version 0.94.5
>> >> >>> >>> >> >>>>> >>> >> > (9764da52395923e0b32908d83a9f7304401fee43)
>> >> >>> >>> >> >>>>> >>> >> >
>> >> >>> >>> >> >>>>> >>> >> > The hardware migration was performed by just setting the
>> >> >>> >>> >> >>>>> >>> >> > CRUSH weight to zero for the OSDs we wanted to retire. The
>> >> >>> >>> >> >>>>> >>> >> > system was performing poorly with these older OSDs, and we
>> >> >>> >>> >> >>>>> >>> >> > had a difficult time maintaining stability of the system.
>> >> >>> >>> >> >>>>> >>> >> > The old OSDs are still there, but all of the data has now
>> >> >>> >>> >> >>>>> >>> >> > been migrated to new and/or existing hardware.
>> >> >>> >>> >> >>>>> >>> >> >
>> >> >>> >>> >> >>>>> >>> >> > Thanks,
>> >> >>> >>> >> >>>>> >>> >> > Jeff
>> >> >>> >>> >> >>>>> >>> >> >
>> >> >>> >>> >> >>>>> >>> >> >
>> >> >>> >>> >> >>>>> >>> >> >
>> >> >>> >>> >> >>>>> >>> >> >
>> >> >>> >>> >> >>>>> >>> >> >
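For reference, draining and retiring an OSD by zeroing its weight is typically done with commands along these lines (a sketch; <id> is a placeholder for the OSD number):

    ceph osd crush reweight osd.<id> 0    # data backfills off this OSD
    # once the cluster is back to active+clean:
    ceph osd out <id>
    ceph osd crush remove osd.<id>
    ceph auth del osd.<id>
    ceph osd rm <id>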
>> >> >>> >>> >> >>>>> >>> >> > On Mon, Mar 7, 2016 at 2:56 PM, Gregory
>> >> >>> >>> >> >>>>> >>> >> > Farnum
>> >> >>> >>> >> >>>>> >>> >> > <gfarnum@xxxxxxxxxx>
>> >> >>> >>> >> >>>>> >>> >> > wrote:
>> >> >>> >>> >> >>>>> >>> >> >>
>> >> >>> >>> >> >>>>> >>> >> >> On Mon, Mar 7, 2016 at 12:07 PM, Jeffrey
>> >> >>> >>> >> >>>>> >>> >> >> McDonald
>> >> >>> >>> >> >>>>> >>> >> >> <jmcdonal@xxxxxxx>
>> >> >>> >>> >> >>>>> >>> >> >> wrote:
>> >> >>> >>> >> >>>>> >>> >> >> > Hi,
>> >> >>> >>> >> >>>>> >>> >> >> >
>> >> >>> >>> >> >>>>> >>> >> >> > For a while, we've been seeing inconsistent placement
>> >> >>> >>> >> >>>>> >>> >> >> > groups on our erasure-coded system. The placement groups
>> >> >>> >>> >> >>>>> >>> >> >> > go from a state of active+clean to
>> >> >>> >>> >> >>>>> >>> >> >> > active+clean+inconsistent after a deep scrub:
>> >> >>> >>> >> >>>>> >>> >> >> >
>> >> >>> >>> >> >>>>> >>> >> >> >
>> >> >>> >>> >> >>>>> >>> >> >> > 2016-03-07 13:45:42.044131 7f385d118700 -1 log_channel(cluster) log [ERR] : 70.320s0 deep-scrub stat mismatch, got 21446/21428 objects, 0/0 clones, 21446/21428 dirty, 0/0 omap, 0/0 hit_set_archive, 0/0 whiteouts, 64682334170/64624353083 bytes,0/0 hit_set_archive bytes.
>> >> >>> >>> >> >>>>> >>> >> >> > 2016-03-07 13:45:42.044416 7f385d118700 -1 log_channel(cluster) log [ERR] : 70.320s0 deep-scrub 18 missing, 0 inconsistent objects
>> >> >>> >>> >> >>>>> >>> >> >> > 2016-03-07 13:45:42.044464 7f385d118700 -1 log_channel(cluster) log [ERR] : 70.320 deep-scrub 73 errors
>> >> >>> >>> >> >>>>> >>> >> >> >
>> >> >>> >>> >> >>>>> >>> >> >> > So I tell the placement group to perform a repair:
>> >> >>> >>> >> >>>>> >>> >> >> >
>> >> >>> >>> >> >>>>> >>> >> >> > 2016-03-07 13:49:26.047177 7f385d118700 0 log_channel(cluster) log [INF] : 70.320 repair starts
>> >> >>> >>> >> >>>>> >>> >> >> > 2016-03-07 13:49:57.087291 7f3858b0a700 0 -- 10.31.0.2:6874/13937 >> 10.31.0.6:6824/8127 pipe(0x2e578000 sd=697 :6874
>> >> >>> >>> >> >>>>> >>> >> >> >
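The repair itself is presumably issued with the standard command (shown here only for context):

    ceph pg repair 70.320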
>> >> >>> >>> >> >>>>> >>> >> >> > The repair finds missing shards and repairs them, but then
>> >> >>> >>> >> >>>>> >>> >> >> > I have 18 'unfound objects':
>> >> >>> >>> >> >>>>> >>> >> >> >
>> >> >>> >>> >> >>>>> >>> >> >> >
>> >> >>> >>> >> >>>>> >>> >> >> > 2016-03-07 13:51:28.467590 7f385d118700 -1 log_channel(cluster) log [ERR] : 70.320s0 repair stat mismatch, got 21446/21428 objects, 0/0 clones, 21446/21428 dirty, 0/0 omap, 0/0 hit_set_archive, 0/0 whiteouts, 64682334170/64624353083 bytes,0/0 hit_set_archive bytes.
>> >> >>> >>> >> >>>>> >>> >> >> > 2016-03-07 13:51:28.468358 7f385d118700 -1 log_channel(cluster) log [ERR] : 70.320s0 repair 18 missing, 0 inconsistent objects
>> >> >>> >>> >> >>>>> >>> >> >> > 2016-03-07 13:51:28.469431 7f385d118700 -1 log_channel(cluster) log [ERR] : 70.320 repair 73 errors, 73 fixed
>> >> >>> >>> >> >>>>> >>> >> >> >
>> >> >>> >>> >> >>>>> >>> >> >> >
>> >> >>> >>> >> >>>>> >>> >> >> > I've traced one of the unfound objects all the way through
>> >> >>> >>> >> >>>>> >>> >> >> > the system, and I've found that they are not really lost. I
>> >> >>> >>> >> >>>>> >>> >> >> > can fail over the osd and recover the files. This is
>> >> >>> >>> >> >>>>> >>> >> >> > happening quite regularly now after a large migration of
>> >> >>> >>> >> >>>>> >>> >> >> > data from old hardware to new (the migration is now
>> >> >>> >>> >> >>>>> >>> >> >> > complete).
>> >> >>> >>> >> >>>>> >>> >> >> >
>> >> >>> >>> >> >>>>> >>> >> >> > The system sets the PG into 'recovery', but we've seen the
>> >> >>> >>> >> >>>>> >>> >> >> > system in a recovering state for many days. Should we just
>> >> >>> >>> >> >>>>> >>> >> >> > be patient, or do we need to dig further into the issue?
>> >> >>> >>> >> >>>>> >>> >> >>
>> >> >>> >>> >> >>>>> >>> >> >> You may need to dig into this more, although I'm not sure
>> >> >>> >>> >> >>>>> >>> >> >> what the issue is likely to be. What version of Ceph are you
>> >> >>> >>> >> >>>>> >>> >> >> running? How did you do this hardware migration?
>> >> >>> >>> >> >>>>> >>> >> >> -Greg
>> >> >>> >>> >> >>>>> >>> >> >>
>> >> >>> >>> >> >>>>> >>> >> >> >
>> >> >>> >>> >> >>>>> >>> >> >> >
>> >> >>> >>> >> >>>>> >>> >> >> > pg 70.320 is stuck unclean for 704.803040, current state active+recovering, last acting [277,101,218,49,304,412]
>> >> >>> >>> >> >>>>> >>> >> >> > pg 70.320 is active+recovering, acting [277,101,218,49,304,412], 18 unfound
>> >> >>> >>> >> >>>>> >>> >> >> >
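The objects a PG considers unfound, and the recovery state it is stuck in, can be inspected directly (a sketch):

    ceph pg 70.320 list_missing    # lists the unfound/missing objects and their versions
    ceph pg 70.320 query           # the 'recovery_state' section shows what recovery is waiting on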
>> >> >>> >>> >> >>>>> >>> >> >> > There is no indication of any problems with down OSDs or
>> >> >>> >>> >> >>>>> >>> >> >> > network issues with OSDs.
>> >> >>> >>> >> >>>>> >>> >> >> >
>> >> >>> >>> >> >>>>> >>> >> >> > Thanks,
>> >> >>> >>> >> >>>>> >>> >> >> > Jeff
>> >> >>> >>> >> >>>>> >>> >> >> >
>> >> >>> >>> >> >>>>> >>> >> >> >
>> >> >>> >>> >> >>>>> >>> >> >> > --
>> >> >>> >>> >> >>>>> >>> >> >> >
>> >> >>> >>> >> >>>>> >>> >> >> > Jeffrey McDonald, PhD
>> >> >>> >>> >> >>>>> >>> >> >> > Assistant Director for HPC Operations
>> >> >>> >>> >> >>>>> >>> >> >> > Minnesota Supercomputing Institute
>> >> >>> >>> >> >>>>> >>> >> >> > University of Minnesota Twin Cities
>> >> >>> >>> >> >>>>> >>> >> >> > 599 Walter Library email:
>> >> >>> >>> >> >>>>> >>> >> >> > jeffrey.mcdonald@xxxxxxxxxxx
>> >> >>> >>> >> >>>>> >>> >> >> > 117 Pleasant St SE phone: +1 612
>> >> >>> >>> >> >>>>> >>> >> >> > 625-6905
>> >> >>> >>> >> >>>>> >>> >> >> > Minneapolis, MN 55455 fax: +1 612
>> >> >>> >>> >> >>>>> >>> >> >> > 624-8861
>> >> >>> >>> >> >>>>> >>> >> >> >
>> >> >>> >>> >> >>>>> >>> >> >> >
>> >> >>> >>> >> >>>>> >>> >> >> >
>> >> >>> >>> >> >>>>> >>> >> >> >
>> >> >>> >>> >> >>>>> >>> >> >> >
>> >> >>> >>> >> >>>>> >>> >> >> > _______________________________________________
>> >> >>> >>> >> >>>>> >>> >> >> > ceph-users mailing list
>> >> >>> >>> >> >>>>> >>> >> >> > ceph-users@xxxxxxxxxxxxxx
>> >> >>> >>> >> >>>>> >>> >> >> >
>> >> >>> >>> >> >>>>> >>> >> >> >
>> >> >>> >>> >> >>>>> >>> >> >> >
>> >> >>> >>> >> >>>>> >>> >> >> >
>> >> >>> >>> >> >>>>> >>> >> >> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> >> >>> >>> >> >>>>> >>> >> >> >
>> >> >>> >>> >> >>>>> >>> >> >
>> >> >>> >>> >> >>>>> >>> >> >
>> >> >>> >>> >> >>>>> >>> >> >
>> >> >>> >>> >> >>>>> >>> >> >
>> >> >>> >>> >> >>>>> >>> >> > --
>> >> >>> >>> >> >>>>> >>> >> >
>> >> >>> >>> >> >>>>> >>> >> > Jeffrey McDonald, PhD
>> >> >>> >>> >> >>>>> >>> >> > Assistant Director for HPC Operations
>> >> >>> >>> >> >>>>> >>> >> > Minnesota Supercomputing Institute
>> >> >>> >>> >> >>>>> >>> >> > University of Minnesota Twin Cities
>> >> >>> >>> >> >>>>> >>> >> > 599 Walter Library email:
>> >> >>> >>> >> >>>>> >>> >> > jeffrey.mcdonald@xxxxxxxxxxx
>> >> >>> >>> >> >>>>> >>> >> > 117 Pleasant St SE phone: +1 612
>> >> >>> >>> >> >>>>> >>> >> > 625-6905
>> >> >>> >>> >> >>>>> >>> >> > Minneapolis, MN 55455 fax: +1 612
>> >> >>> >>> >> >>>>> >>> >> > 624-8861
>> >> >>> >>> >> >>>>> >>> >> >
>> >> >>> >>> >> >>>>> >>> >> >
>> >> >>> >>> >> >>>>> >>> >
>> >> >>> >>> >> >>>>> >>> >
>> >> >>> >>> >> >>>>> >>> >
>> >> >>> >>> >> >>>>> >>> >
>> >> >>> >>> >> >>>>> >>> > --
>> >> >>> >>> >> >>>>> >>> >
>> >> >>> >>> >> >>>>> >>> > Jeffrey McDonald, PhD
>> >> >>> >>> >> >>>>> >>> > Assistant Director for HPC Operations
>> >> >>> >>> >> >>>>> >>> > Minnesota Supercomputing Institute
>> >> >>> >>> >> >>>>> >>> > University of Minnesota Twin Cities
>> >> >>> >>> >> >>>>> >>> > 599 Walter Library email:
>> >> >>> >>> >> >>>>> >>> > jeffrey.mcdonald@xxxxxxxxxxx
>> >> >>> >>> >> >>>>> >>> > 117 Pleasant St SE phone: +1 612
>> >> >>> >>> >> >>>>> >>> > 625-6905
>> >> >>> >>> >> >>>>> >>> > Minneapolis, MN 55455 fax: +1 612
>> >> >>> >>> >> >>>>> >>> > 624-8861
>> >> >>> >>> >> >>>>> >>> >
>> >> >>> >>> >> >>>>> >>> >
>> >> >>> >>> >> >>>>> >>> >
>> >> >>> >>> >> >>>>> >>> > _______________________________________________
>> >> >>> >>> >> >>>>> >>> > ceph-users mailing list
>> >> >>> >>> >> >>>>> >>> > ceph-users@xxxxxxxxxxxxxx
>> >> >>> >>> >> >>>>> >>> >
>> >> >>> >>> >> >>>>> >>> >
>> >> >>> >>> >> >>>>> >>> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> >> >>> >>> >> >>>>> >>> >
>> >> >>> >>> >> >>>>> >>
>> >> >>> >>> >> >>>>> >>
>> >> >>> >>> >> >>>>> >>
>> >> >>> >>> >> >>>>> >>
>> >> >>> >>> >> >>>>> >> --
>> >> >>> >>> >> >>>>> >>
>> >> >>> >>> >> >>>>> >> Jeffrey McDonald, PhD
>> >> >>> >>> >> >>>>> >> Assistant Director for HPC Operations
>> >> >>> >>> >> >>>>> >> Minnesota Supercomputing Institute
>> >> >>> >>> >> >>>>> >> University of Minnesota Twin Cities
>> >> >>> >>> >> >>>>> >> 599 Walter Library email:
>> >> >>> >>> >> >>>>> >> jeffrey.mcdonald@xxxxxxxxxxx
>> >> >>> >>> >> >>>>> >> 117 Pleasant St SE phone: +1 612 625-6905
>> >> >>> >>> >> >>>>> >> Minneapolis, MN 55455 fax: +1 612 624-8861
>> >> >>> >>> >> >>>>> >>
>> >> >>> >>> >> >>>>> >>
>> >> >>> >>> >> >>>>
>> >> >>> >>> >> >>>>
>> >> >>> >>> >> >>>>
>> >> >>> >>> >> >>>>
>> >> >>> >>> >> >>>> --
>> >> >>> >>> >> >>>>
>> >> >>> >>> >> >>>> Jeffrey McDonald, PhD
>> >> >>> >>> >> >>>> Assistant Director for HPC Operations
>> >> >>> >>> >> >>>> Minnesota Supercomputing Institute
>> >> >>> >>> >> >>>> University of Minnesota Twin Cities
>> >> >>> >>> >> >>>> 599 Walter Library email:
>> >> >>> >>> >> >>>> jeffrey.mcdonald@xxxxxxxxxxx
>> >> >>> >>> >> >>>> 117 Pleasant St SE phone: +1 612 625-6905
>> >> >>> >>> >> >>>> Minneapolis, MN 55455 fax: +1 612 624-8861
>> >> >>> >>> >> >>>>
>> >> >>> >>> >> >>>>
>> >> >>> >>> >> > _______________________________________________
>> >> >>> >>> >> > ceph-users mailing list
>> >> >>> >>> >> > ceph-users@xxxxxxxxxxxxxx
>> >> >>> >>> >> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> >> >>> >>> >>
>> >> >>> >>> >>
>> >> >>> >>> >>
>> >> >>> >>> >> --
>> >> >>> >>> >> Email:
>> >> >>> >>> >> shinobu@xxxxxxxxx
>> >> >>> >>> >> GitHub:
>> >> >>> >>> >> shinobu-x
>> >> >>> >>> >> Blog:
>> >> >>> >>> >> Life with Distributed Computational System based on
>> >> >>> >>> >> OpenSource
>> >> >>> >>> >
>> >> >>> >>> >
>> >> >>> >>> >
>> >> >>> >>> >
>> >> >>> >>> > --
>> >> >>> >>> >
>> >> >>> >>> > Jeffrey McDonald, PhD
>> >> >>> >>> > Assistant Director for HPC Operations
>> >> >>> >>> > Minnesota Supercomputing Institute
>> >> >>> >>> > University of Minnesota Twin Cities
>> >> >>> >>> > 599 Walter Library email:
>> >> >>> >>> > jeffrey.mcdonald@xxxxxxxxxxx
>> >> >>> >>> > 117 Pleasant St SE phone: +1 612 625-6905
>> >> >>> >>> > Minneapolis, MN 55455 fax: +1 612 624-8861
>> >> >>> >>> >
>> >> >>> >>> >
>> >> >>> >>
>> >> >>> >>
>> >> >>> >>
>> >> >>> >>
>> >> >>> >> --
>> >> >>> >>
>> >> >>> >> Jeffrey McDonald, PhD
>> >> >>> >> Assistant Director for HPC Operations
>> >> >>> >> Minnesota Supercomputing Institute
>> >> >>> >> University of Minnesota Twin Cities
>> >> >>> >> 599 Walter Library email: jeffrey.mcdonald@xxxxxxxxxxx
>> >> >>> >> 117 Pleasant St SE phone: +1 612 625-6905
>> >> >>> >> Minneapolis, MN 55455 fax: +1 612 624-8861
>> >> >>> >>
>> >> >>> >>
>> >> >>
>> >> >>
>> >> >>
>> >> >>
>> >> >> --
>> >> >>
>> >> >> Jeffrey McDonald, PhD
>> >> >> Assistant Director for HPC Operations
>> >> >> Minnesota Supercomputing Institute
>> >> >> University of Minnesota Twin Cities
>> >> >> 599 Walter Library email: jeffrey.mcdonald@xxxxxxxxxxx
>> >> >> 117 Pleasant St SE phone: +1 612 625-6905
>> >> >> Minneapolis, MN 55455 fax: +1 612 624-8861
>> >> >>
>> >> >>
>> >
>> >
>> >
>> >
>> > --
>> >
>> > Jeffrey McDonald, PhD
>> > Assistant Director for HPC Operations
>> > Minnesota Supercomputing Institute
>> > University of Minnesota Twin Cities
>> > 599 Walter Library email: jeffrey.mcdonald@xxxxxxxxxxx
>> > 117 Pleasant St SE phone: +1 612 625-6905
>> > Minneapolis, MN 55455 fax: +1 612 624-8861
>> >
>> >
>
>
>
>
> --
>
> Jeffrey McDonald, PhD
> Assistant Director for HPC Operations
> Minnesota Supercomputing Institute
> University of Minnesota Twin Cities
> 599 Walter Library email: jeffrey.mcdonald@xxxxxxxxxxx
> 117 Pleasant St SE phone: +1 612 625-6905
> Minneapolis, MN 55455 fax: +1 612 624-8861
>
>
--
Jeffrey McDonald, PhD
Assistant Director for HPC Operations
Minnesota Supercomputing Institute
University of Minnesota Twin Cities
599 Walter Library           email: jeffrey.mcdonald@xxxxxxxxxxx
117 Pleasant St SE           phone: +1 612 625-6905
Minneapolis, MN 55455        fax: +1 612 624-8861
ceph version 0.94.5 (9764da52395923e0b32908d83a9f7304401fee43)
    cluster 5221cc73-869e-4c20-950f-18824ddd6692
     health HEALTH_ERR
            11 pgs inconsistent
            5 pgs recovering
            5 pgs stuck unclean
            recovery 912/536222136 objects degraded (0.000%)
            recovery 120/90915640 unfound (0.000%)
            976 scrub errors
     monmap e9: 3 mons at {cephmon1=10.32.16.93:6789/0,cephmon2=10.32.16.85:6789/0,cephmon3=10.32.16.89:6789/0}
            election epoch 112718, quorum 0,1,2 cephmon2,cephmon3,cephmon1
     mdsmap e11408: 1/1/1 up {0=0=up:active}
     osdmap e279602: 449 osds: 449 up, 422 in
      pgmap v26478394: 7788 pgs, 21 pools, 251 TB data, 88784 kobjects
            412 TB used, 2777 TB / 3190 TB avail
            912/536222136 objects degraded (0.000%)
            120/90915640 unfound (0.000%)
                7759 active+clean
                  11 active+clean+inconsistent
                   8 active+clean+scrubbing
                   5 active+recovering
                   5 active+clean+scrubbing+deep
Starting scrub test Mon Mar 7 21:31:20 CST 2016
debug_osd=20/20 debug_filestore=20/20 debug_ms=1/1
debug_osd=20/20 debug_filestore=20/20 debug_ms=1/1
debug_osd=20/20 debug_filestore=20/20 debug_ms=1/1
debug_osd=20/20 debug_filestore=20/20 debug_ms=1/1
debug_osd=20/20 debug_filestore=20/20 debug_ms=1/1
debug_osd=20/20 debug_filestore=20/20 debug_ms=1/1
instructing pg 70.459 on osd.307 to deep-scrub
    cluster 5221cc73-869e-4c20-950f-18824ddd6692
     health HEALTH_ERR
            18 pgs inconsistent
            3 pgs recovering
            3 pgs stuck unclean
            recovery 557/536222133 objects degraded (0.000%)
            recovery 79/90915639 unfound (0.000%)
            1385 scrub errors
     monmap e9: 3 mons at {cephmon1=10.32.16.93:6789/0,cephmon2=10.32.16.85:6789/0,cephmon3=10.32.16.89:6789/0}
            election epoch 112718, quorum 0,1,2 cephmon2,cephmon3,cephmon1
     mdsmap e11408: 1/1/1 up {0=0=up:active}
     osdmap e279613: 449 osds: 449 up, 422 in
      pgmap v26481367: 7788 pgs, 21 pools, 251 TB data, 88784 kobjects
            412 TB used, 2777 TB / 3190 TB avail
            557/536222133 objects degraded (0.000%)
            79/90915639 unfound (0.000%)
                7767 active+clean
                  18 active+clean+inconsistent
                   3 active+recovering
instructing pg 70.459 on osd.307 to deep-scrub
debug_osd=0/0 debug_filestore=0/0 debug_ms=0/0
debug_osd=0/0 debug_filestore=0/0 debug_ms=0/0
debug_osd=0/0 debug_filestore=0/0 debug_ms=0/0
debug_osd=0/0 debug_filestore=0/0 debug_ms=0/0
debug_osd=0/0 debug_filestore=0/0 debug_ms=0/0
debug_osd=0/0 debug_filestore=0/0 debug_ms=0/0
Ending scrub test Mon Mar 7 23:31:25 CST 2016
Attachment: pg70.459queryStart.gz
Description: GNU Zip compressed data

Attachment: pg70.459queryEnd2.gz
Description: GNU Zip compressed data

Attachment: pg70.459queryEnd1.gz
Description: GNU Zip compressed data

Attachment: setuposddebug.bash
Description: Binary data
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com