Jeffrey: can you confirm through the admin socket the versions running
on each of those osds and include the output in your reply? I have a
theory about what's causing the objects to be erroneously reported as
inconsistent, but it requires that osd.307 be running a different
version.
-Sam

On Mon, Mar 7, 2016 at 3:34 PM, Samuel Just <sjust@xxxxxxxxxx> wrote:
> Well, the fact that different objects are being selected as
> inconsistent strongly suggests that the objects are not actually
> inconsistent. Thus, at the moment my assumption is a bug in scrub...
> -Sam
>
> On Mon, Mar 7, 2016 at 3:31 PM, Shinobu Kinjo <shinobu.kj@xxxxxxxxx> wrote:
>> What could cause this kind of unexpected behaviour?
>> Any assumption??
>> Sorry for interrupting you.
>>
>> Cheers,
>> S
>>
>> On Tue, Mar 8, 2016 at 8:19 AM, Samuel Just <sjust@xxxxxxxxxx> wrote:
>>> Hmm, at the end of the log, the pg is still inconsistent. Can you
>>> attach a ceph pg query on that pg?
>>> -Sam
>>>
>>> On Mon, Mar 7, 2016 at 3:05 PM, Samuel Just <sjust@xxxxxxxxxx> wrote:
>>>> If so, that strongly suggests that the pg was actually never
>>>> inconsistent in the first place and that the bug is in scrub itself
>>>> presumably getting confused about an object during a write. The next
>>>> step would be to get logs like the above from a pg as it scrubs
>>>> transitioning from clean to inconsistent. If it's really a race
>>>> between scrub and a write, it's probably just non-deterministic, you
>>>> could set logging on a set of osds and continuously scrub any pgs
>>>> which only map to those osds until you reproduce the problem.
>>>> -Sam
>>>>
>>>> On Mon, Mar 7, 2016 at 2:44 PM, Samuel Just <sjust@xxxxxxxxxx> wrote:
>>>>> So after the scrub, it came up clean? The inconsistent/missing
>>>>> objects reappeared?
>>>>> -Sam
>>>>>
>>>>> On Mon, Mar 7, 2016 at 2:33 PM, Jeffrey McDonald <jmcdonal@xxxxxxx> wrote:
>>>>>> Hi Sam,
>>>>>>
>>>>>> I've done as you requested:
>>>>>>
>>>>>> pg 70.459 is active+clean+inconsistent, acting [307,210,273,191,132,450]
>>>>>>
>>>>>> # for i in 307 210 273 191 132 450 ; do
>>>>>> > ceph tell osd.$i injectargs '--debug-osd 20 --debug-filestore 20 --debug-ms 1'
>>>>>> > done
>>>>>> debug_osd=20/20 debug_filestore=20/20 debug_ms=1/1
>>>>>> debug_osd=20/20 debug_filestore=20/20 debug_ms=1/1
>>>>>> debug_osd=20/20 debug_filestore=20/20 debug_ms=1/1
>>>>>> debug_osd=20/20 debug_filestore=20/20 debug_ms=1/1
>>>>>> debug_osd=20/20 debug_filestore=20/20 debug_ms=1/1
>>>>>> debug_osd=20/20 debug_filestore=20/20 debug_ms=1/1
>>>>>>
>>>>>> # date
>>>>>> Mon Mar 7 16:03:38 CST 2016
>>>>>>
>>>>>> # ceph pg deep-scrub 70.459
>>>>>> instructing pg 70.459 on osd.307 to deep-scrub
>>>>>>
>>>>>> Scrub finished around
>>>>>>
>>>>>> # date
>>>>>> Mon Mar 7 16:13:03 CST 2016
>>>>>>
>>>>>> I've tar'd+gzipped the files, which can be downloaded from here. The logs
>>>>>> start a minute or two after 16:00 today.
>>>>>>
>>>>>> https://drive.google.com/folderview?id=0Bzz8TrxFvfema2NQUmotd1BOTnM&usp=sharing
>>>>>>
>>>>>> Oddly (to me anyways), this pg is now active+clean:
>>>>>>
>>>>>> # ceph pg dump | grep 70.459
>>>>>> dumped all in format plain
>>>>>> 70.459 21377 0 0 0 0 64515446306 3088 3088 active+clean 2016-03-07 16:26:57.796537 279563'212832 279602:628151 [307,210,273,191,132,450] 307 [307,210,273,191,132,450] 307 279563'212832 2016-03-07 16:12:30.741984 279563'212832 2016-03-07 16:12:30.741984
>>>>>>
>>>>>> Regards,
>>>>>> Jeff
>>>>>>
>>>>>> On Mon, Mar 7, 2016 at 4:11 PM, Samuel Just <sjust@xxxxxxxxxx> wrote:
>>>>>>>
>>>>>>> I think the unfound object on repair is fixed by
>>>>>>> d51806f5b330d5f112281fbb95ea6addf994324e (not in hammer yet).
>>>>>>> I opened http://tracker.ceph.com/issues/15002 for the backport and to
>>>>>>> make sure it's covered in ceph-qa-suite. No idea at this time why the
>>>>>>> objects are disappearing though.
>>>>>>> -Sam
>>>>>>>
>>>>>>> On Mon, Mar 7, 2016 at 1:57 PM, Samuel Just <sjust@xxxxxxxxxx> wrote:
>>>>>>> > The one just scrubbed and now inconsistent.
>>>>>>> > -Sam
>>>>>>> >
>>>>>>> > On Mon, Mar 7, 2016 at 1:57 PM, Jeffrey McDonald <jmcdonal@xxxxxxx> wrote:
>>>>>>> >> Do you want me to enable this for the pg already with unfound objects or the
>>>>>>> >> placement group just scrubbed and now inconsistent?
>>>>>>> >> Jeff
>>>>>>> >>
>>>>>>> >> On Mon, Mar 7, 2016 at 3:54 PM, Samuel Just <sjust@xxxxxxxxxx> wrote:
>>>>>>> >>>
>>>>>>> >>> Can you enable
>>>>>>> >>>
>>>>>>> >>> debug osd = 20
>>>>>>> >>> debug filestore = 20
>>>>>>> >>> debug ms = 1
>>>>>>> >>>
>>>>>>> >>> on all osds in that PG, rescrub, and convey to us the resulting logs?
>>>>>>> >>> -Sam
>>>>>>> >>>
>>>>>>> >>> On Mon, Mar 7, 2016 at 1:36 PM, Jeffrey McDonald <jmcdonal@xxxxxxx> wrote:
>>>>>>> >>> > Here is a PG which just went inconsistent:
>>>>>>> >>> >
>>>>>>> >>> > pg 70.459 is active+clean+inconsistent, acting [307,210,273,191,132,450]
>>>>>>> >>> >
>>>>>>> >>> > Attached is the result of a pg query on this. I will wait for your
>>>>>>> >>> > feedback before issuing a repair.
>>>>>>> >>> >
>>>>>>> >>> > From what I read, the inconsistencies are more likely the result of ntp, but
>>>>>>> >>> > all nodes have the local ntp master and all are showing sync.
>>>>>>> >>> >
>>>>>>> >>> > Regards,
>>>>>>> >>> > Jeff
>>>>>>> >>> >
>>>>>>> >>> > On Mon, Mar 7, 2016 at 3:15 PM, Gregory Farnum <gfarnum@xxxxxxxxxx> wrote:
>>>>>>> >>> >>
>>>>>>> >>> >> [ Keeping this on the users list. ]
>>>>>>> >>> >>
>>>>>>> >>> >> Okay, so next time this happens you probably want to do a pg query on
>>>>>>> >>> >> the PG which has been reported as dirty. I can't help much beyond
>>>>>>> >>> >> that, but hopefully Kefu or David will chime in once there's a little
>>>>>>> >>> >> more for them to look at.
>>>>>>> >>> >> -Greg
>>>>>>> >>> >>
>>>>>>> >>> >> On Mon, Mar 7, 2016 at 1:00 PM, Jeffrey McDonald <jmcdonal@xxxxxxx> wrote:
>>>>>>> >>> >> > Hi Greg,
>>>>>>> >>> >> >
>>>>>>> >>> >> > I'm running the ceph version hammer,
>>>>>>> >>> >> > ceph version 0.94.5 (9764da52395923e0b32908d83a9f7304401fee43)
>>>>>>> >>> >> >
>>>>>>> >>> >> > The hardware migration was performed by just setting the crush map to zero
>>>>>>> >>> >> > for the OSD we wanted to retire. The system was performing poorly with
>>>>>>> >>> >> > these older OSDs and we had a difficult time maintaining stability of the
>>>>>>> >>> >> > system. The old OSDs are still there but all of the data is now migrated
>>>>>>> >>> >> > to new and/or existing hardware.
>>>>>>> >>> >> >
>>>>>>> >>> >> > Thanks,
>>>>>>> >>> >> > Jeff
>>>>>>> >>> >> >
>>>>>>> >>> >> > On Mon, Mar 7, 2016 at 2:56 PM, Gregory Farnum <gfarnum@xxxxxxxxxx> wrote:
>>>>>>> >>> >> >>
>>>>>>> >>> >> >> On Mon, Mar 7, 2016 at 12:07 PM, Jeffrey McDonald <jmcdonal@xxxxxxx> wrote:
>>>>>>> >>> >> >> > Hi,
>>>>>>> >>> >> >> >
>>>>>>> >>> >> >> > For a while, we've been seeing inconsistent placement groups on our erasure
>>>>>>> >>> >> >> > coded system.
>>>>>>> >>> >> >> > The placement groups go from a state of active+clean to
>>>>>>> >>> >> >> > active+clean+inconsistent after a deep scrub:
>>>>>>> >>> >> >> >
>>>>>>> >>> >> >> > 2016-03-07 13:45:42.044131 7f385d118700 -1 log_channel(cluster) log [ERR] :
>>>>>>> >>> >> >> > 70.320s0 deep-scrub stat mismatch, got 21446/21428 objects, 0/0 clones,
>>>>>>> >>> >> >> > 21446/21428 dirty, 0/0 omap, 0/0 hit_set_archive, 0/0 whiteouts,
>>>>>>> >>> >> >> > 64682334170/64624353083 bytes,0/0 hit_set_archive bytes.
>>>>>>> >>> >> >> > 2016-03-07 13:45:42.044416 7f385d118700 -1 log_channel(cluster) log [ERR] :
>>>>>>> >>> >> >> > 70.320s0 deep-scrub 18 missing, 0 inconsistent objects
>>>>>>> >>> >> >> > 2016-03-07 13:45:42.044464 7f385d118700 -1 log_channel(cluster) log [ERR] :
>>>>>>> >>> >> >> > 70.320 deep-scrub 73 errors
>>>>>>> >>> >> >> >
>>>>>>> >>> >> >> > So I tell the placement group to perform a repair:
>>>>>>> >>> >> >> >
>>>>>>> >>> >> >> > 2016-03-07 13:49:26.047177 7f385d118700 0 log_channel(cluster) log [INF] :
>>>>>>> >>> >> >> > 70.320 repair starts
>>>>>>> >>> >> >> > 2016-03-07 13:49:57.087291 7f3858b0a700 0 -- 10.31.0.2:6874/13937 >>
>>>>>>> >>> >> >> > 10.31.0.6:6824/8127 pipe(0x2e578000 sd=697 :6874
>>>>>>> >>> >> >> >
>>>>>>> >>> >> >> > The repair finds missing shards and repairs them, but then I have 18
>>>>>>> >>> >> >> > 'unfound objects' :
>>>>>>> >>> >> >> >
>>>>>>> >>> >> >> > 2016-03-07 13:51:28.467590 7f385d118700 -1 log_channel(cluster) log [ERR] :
>>>>>>> >>> >> >> > 70.320s0 repair stat mismatch, got 21446/21428 objects, 0/0 clones,
>>>>>>> >>> >> >> > 21446/21428 dirty, 0/0 omap, 0/0 hit_set_archive, 0/0 whiteouts,
>>>>>>> >>> >> >> > 64682334170/64624353083 bytes,0/0 hit_set_archive bytes.
>>>>>>> >>> >> >> > 2016-03-07 13:51:28.468358 7f385d118700 -1 log_channel(cluster) log [ERR] :
>>>>>>> >>> >> >> > 70.320s0 repair 18 missing, 0 inconsistent objects
>>>>>>> >>> >> >> > 2016-03-07 13:51:28.469431 7f385d118700 -1 log_channel(cluster) log [ERR] :
>>>>>>> >>> >> >> > 70.320 repair 73 errors, 73 fixed
>>>>>>> >>> >> >> >
>>>>>>> >>> >> >> > I've traced one of the unfound objects all the way through the system and
>>>>>>> >>> >> >> > I've found that they are not really lost. I can fail over the osd and
>>>>>>> >>> >> >> > recover the files. This is happening quite regularly now after a large
>>>>>>> >>> >> >> > migration of data from old hardware to new (migration is now complete).
>>>>>>> >>> >> >> >
>>>>>>> >>> >> >> > The system sets the PG into 'recovery', but we've seen the system in a
>>>>>>> >>> >> >> > recovering state for many days. Should we just be patient or do we need
>>>>>>> >>> >> >> > to dig further into the issue?
>>>>>>> >>> >> >>
>>>>>>> >>> >> >> You may need to dig into this more, although I'm not sure what the
>>>>>>> >>> >> >> issue is likely to be. What version of Ceph are you running? How did
>>>>>>> >>> >> >> you do this hardware migration?
>>>>>>> >>> >> >> -Greg
>>>>>>> >>> >> >>
>>>>>>> >>> >> >> >
>>>>>>> >>> >> >> > pg 70.320 is stuck unclean for 704.803040, current state active+recovering,
>>>>>>> >>> >> >> > last acting [277,101,218,49,304,412]
>>>>>>> >>> >> >> > pg 70.320 is active+recovering, acting [277,101,218,49,304,412], 18 unfound
>>>>>>> >>> >> >> >
>>>>>>> >>> >> >> > There is no indication of any problems with down OSDs or network issues
>>>>>>> >>> >> >> > with OSDs.
>>>>>>> >>> >> >> >
>>>>>>> >>> >> >> > Thanks,
>>>>>>> >>> >> >> > Jeff
>>>>>>> >>> >> >> >
>>>>>>> >>> >> >> > --
>>>>>>> >>> >> >> >
>>>>>>> >>> >> >> > Jeffrey McDonald, PhD
>>>>>>> >>> >> >> > Assistant Director for HPC Operations
>>>>>>> >>> >> >> > Minnesota Supercomputing Institute
>>>>>>> >>> >> >> > University of Minnesota Twin Cities
>>>>>>> >>> >> >> > 599 Walter Library       email: jeffrey.mcdonald@xxxxxxxxxxx
>>>>>>> >>> >> >> > 117 Pleasant St SE       phone: +1 612 625-6905
>>>>>>> >>> >> >> > Minneapolis, MN 55455    fax: +1 612 624-8861
>>>>>>> >>> >> >> >
>>>>>>> >>> >> >> > _______________________________________________
>>>>>>> >>> >> >> > ceph-users mailing list
>>>>>>> >>> >> >> > ceph-users@xxxxxxxxxxxxxx
>>>>>>> >>> >> >> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>>>>> >>> >> >> >
>>>>>>> >>> >> >
>>>>>>> >>> >> > --
>>>>>>> >>> >> >
>>>>>>> >>> >> > Jeffrey McDonald, PhD
>>>>>>> >>> >> > Assistant Director for HPC Operations
>>>>>>> >>> >> > Minnesota Supercomputing Institute
>>>>>>> >>> >> > University of Minnesota Twin Cities
>>>>>>> >>> >> > 599 Walter Library       email: jeffrey.mcdonald@xxxxxxxxxxx
>>>>>>> >>> >> > 117 Pleasant St SE       phone: +1 612 625-6905
>>>>>>> >>> >> > Minneapolis, MN 55455    fax: +1 612 624-8861
>>>>>>> >>> >> >
>>>>>>> >>> >
>>>>>>> >>> > --
>>>>>>> >>> >
>>>>>>> >>> > Jeffrey McDonald, PhD
>>>>>>> >>> > Assistant Director for HPC Operations
>>>>>>> >>> > Minnesota Supercomputing Institute
>>>>>>> >>> > University of Minnesota Twin Cities
>>>>>>> >>> > 599 Walter Library       email: jeffrey.mcdonald@xxxxxxxxxxx
>>>>>>> >>> > 117 Pleasant St SE       phone: +1 612 625-6905
>>>>>>> >>> > Minneapolis, MN 55455    fax: +1 612 624-8861
>>>>>>> >>> >
>>>>>>> >>> > _______________________________________________
>>>>>>> >>> > ceph-users mailing list
>>>>>>> >>> > ceph-users@xxxxxxxxxxxxxx
>>>>>>> >>> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>>>>> >>> >
>>>>>>> >>
>>>>>>> >> --
>>>>>>> >>
>>>>>>> >> Jeffrey McDonald, PhD
>>>>>>> >> Assistant Director for HPC Operations
>>>>>>> >> Minnesota Supercomputing Institute
>>>>>>> >> University of Minnesota Twin Cities
>>>>>>> >> 599 Walter Library       email: jeffrey.mcdonald@xxxxxxxxxxx
>>>>>>> >> 117 Pleasant St SE       phone: +1 612 625-6905
>>>>>>> >> Minneapolis, MN 55455    fax: +1 612 624-8861
>>>>>>> >>
>>>>>>
>>>>>> --
>>>>>>
>>>>>> Jeffrey McDonald, PhD
>>>>>> Assistant Director for HPC Operations
>>>>>> Minnesota Supercomputing Institute
>>>>>> University of Minnesota Twin Cities
>>>>>> 599 Walter Library       email: jeffrey.mcdonald@xxxxxxxxxxx
>>>>>> 117 Pleasant St SE       phone: +1 612 625-6905
>>>>>> Minneapolis, MN 55455    fax: +1 612 624-8861
>>>>>>
>>> _______________________________________________
>>> ceph-users mailing list
>>> ceph-users@xxxxxxxxxxxxxx
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>> --
>> Email: shinobu@xxxxxxxxx
>> GitHub: shinobu-x
>> Blog: Life with Distributed Computational System based on OpenSource
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
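
For the version check requested at the top of this message, a minimal sketch follows. It assumes it is run on each OSD host in turn (the admin socket is local to the host), that the sockets live in the default /var/run/ceph directory with the default ceph-osd.<id>.asok naming, and that the "version" admin socket command is available on the running release. SOCK_DIR, CEPH_BIN, and osd_versions are illustrative names, not part of Ceph.

```shell
#!/bin/sh
# Sketch: print the version each local OSD reports over its admin socket.
# SOCK_DIR and CEPH_BIN are hypothetical knobs for this script only;
# the defaults assume a stock admin socket location and the ceph CLI in PATH.
SOCK_DIR="${SOCK_DIR:-/var/run/ceph}"
CEPH_BIN="${CEPH_BIN:-ceph}"

osd_versions() {
    for sock in "$SOCK_DIR"/ceph-osd.*.asok; do
        # Skip the unexpanded glob when no OSD socket exists on this host.
        [ -e "$sock" ] || continue
        # Derive the OSD id from the socket file name (ceph-osd.<id>.asok).
        id=${sock##*/ceph-osd.}
        id=${id%.asok}
        printf 'osd.%s: ' "$id"
        # Ask the running daemon itself, not the installed package.
        "$CEPH_BIN" --admin-daemon "$sock" version
    done
}

osd_versions
```

Only the lines for the OSDs in the acting set of pg 70.459 (307, 210, 273, 191, 132, 450) matter here; pasting them into a reply would show whether osd.307 differs from the others.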