Re: OSD trashed by simple reboot (Debian Jessie, systemd?)

Christian Balzer <chibi@xxxxxxx> · Fri, 5 Jun 2015 13:49:40 +0900



Hello,

On Fri, 05 Jun 2015 16:33:46 +1200 Mark Kirkwood wrote:

Well, whatever it is, I appear to not be the only one after all:
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=773361

> Looking quickly at the relevant code:
> 
> FileJournal::stop_writer() in src/os/FileJournal.cc
> 
> I see that we didn't start seeing the (original) issue until changes in 
> 0.83, which suggests that 0.80 tree might not be doing the same thing. 
> *However* I note that I'm not happy with the placement of the two thread 
> join operations in there - it *looks* to me like 0.80 could in fact be 
> vulnerable to the same journal corrupting problem, so if it occurs again 
> might be interesting to apply the gist of
> https://github.com/ceph/ceph/pull/2764 and see if it helps (ahem - of 
> course would be best if this was on a test system)!
>
Alas this is neither a test cluster, nor do I have things set up to compile
from source here ATM.
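
For anyone following along, the worry about where the thread joins sit can be
illustrated with a generic toy example. This is only a sketch under stated
assumptions, not the Ceph code and not the change in the pull request above:
JournalWriter, submit() and stop() are hypothetical names invented here, and an
in-memory buffer stands in for the journal device. The point is simply that
stop() waits for the writer thread before closing the backing store, so no
queued entry is dropped or written to a closed descriptor.

```python
import io
import queue
import threading


class JournalWriter:
    """Toy journal (hypothetical): a thread drains a queue into a file-like object."""

    def __init__(self, backing):
        self.backing = backing        # any object with write() and close()
        self.pending = queue.Queue()  # entries waiting to be written
        self.written = []             # entries that actually reached the backing store
        self.thread = threading.Thread(target=self._run)
        self.thread.start()

    def submit(self, entry):
        self.pending.put(entry)

    def _run(self):
        while True:
            entry = self.pending.get()
            if entry is None:         # sentinel queued by stop(): nothing more to write
                return
            self.backing.write(entry)
            self.written.append(entry)

    def stop(self):
        # Order matters: signal the writer, wait for it to finish,
        # and only then close the backing store.
        self.pending.put(None)
        self.thread.join()
        self.backing.close()


if __name__ == "__main__":
    writer = JournalWriter(io.BytesIO())
    for i in range(100):
        writer.submit(b"entry %d\n" % i)
    writer.stop()
    print(len(writer.written), "entries written before close")
```

If close() were moved ahead of join() in stop(), the writer thread could still
be mid-write when the backing store goes away, which is the general class of
shutdown race being discussed.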
 
Christian

> Cheers
> 
> Mark
> 
> On 05/06/15 15:28, Christian Balzer wrote:
> >
> > Hello Mark,
> >
> > On Thu, 04 Jun 2015 20:34:55 +1200 Mark Kirkwood wrote:
> >
> >> Sorry Christian,
> >>
> >> I did briefly wonder, then thought, oh yeah, that fix is already
> >> merged in...However - on reflection, perhaps *not* in the 0.80
> >> tree...doh!
> >>
> > No worries, I'm just happy to hear that you think it's the same thing
> > as well.
> >
> > I upgraded to 0.80.9 (fun fact, NO predicted or actual data movement
> > after setting "straw_calc_version 1" and doing a reweight all) today.
> >
> > Should it happen again, I know who and where to poke. ^^
> >
> > Christian
> >
> >> On 04/06/15 18:57, Christian Balzer wrote:
> >>>
> >>> Hello,
> >>>
> >>> Actually after going through the changelogs with a fine comb and the
> >>> ole Mark I eyeball I think I might be seeing this:
> >>> ---
> >>> osd: fix journal direct-io shutdown (#9073 Mark Kirkwood, Ma
> >>> Jianpeng, Somnath Roy)
> >>> ---
> >>>
> >>> The details in the various related bug reports certainly make it look
> >>> related.
> >>> Funny that nobody involved in those bug reports noticed the
> >>> similarity.
> >>>
> >>> Now I wouldn't have installed 0.80.8 due to the speed regression bug
> >>> anyway, but now that 0.80.9 has made it into Jessie backports I shall
> >>> install that tomorrow and hopefully never see that problem again.
> >>>
> >>> Christian
> >>>
> >>> On Thu, 28 May 2015 07:01:15 -0700 Gregory Farnum wrote:
> >>>
> >>>> On Thu, May 28, 2015 at 12:22 AM, Christian Balzer <chibi@xxxxxxx>
> >>>> wrote:
> >>>>>
> >>>>> Hello Greg,
> >>>>>
> >>>>> On Wed, 27 May 2015 22:53:43 -0700 Gregory Farnum wrote:
> >>>>>
> >>>>>> The description of the logging abruptly ending and the journal
> >>>>>> being bad really sounds like part of the disk is going back in
> >>>>>> time. I'm not sure if XFS internally is set up in such a way that
> >>>>>> something like losing part of its journal would allow that?
> >>>>>>
> >>>>> I'm special. ^o^
> >>>>> No XFS, EXT4. As stated in the original thread, below.
> >>>>> And the (OSD) journal is a raw partition on a DC S3700.
> >>>>>
> >>>>> And since there was at least a 30-second pause between the
> >>>>> completion of the "/etc/init.d/ceph stop" and issuing of the
> >>>>> shutdown command, the logging abruptly ending seems unlikely to
> >>>>> be related to the shutdown at all.
> >>>>
> >>>> Oh, sorry...
> >>>> I happened to read this article last night:
> >>>> http://lwn.net/SubscriberLink/645720/01149aa7c58954eb/
> >>>>
> >>>> Depending on configuration (I think you'd need to have a
> >>>> journal-as-file) you could be experiencing that. And again, not many
> >>>> people use ext4 so who knows what other ways there are of things
> >>>> being broken that nobody else has seen yet.
> >>>>
> >>>>>
> >>>>>> If any of the OSD developers have the time it's conceivable a copy
> >>>>>> of the OSD journal would be enlightening (if e.g. the header
> >>>>>> offsets are wrong but there are a bunch of valid journal entries),
> >>>>>> but this is two reports of this issue from you and none very
> >>>>>> similar from anybody else. I'm still betting on something in the
> >>>>>> software or hardware stack misbehaving. (There aren't that many
> >>>>>> people running Debian; there are lots of people running Ubuntu and
> >>>>>> we find bad XFS kernels there not infrequently; I think you're
> >>>>>> hitting something like that.)
> >>>>>>
> >>>>> There should be no file system involved with the raw partition SSD
> >>>>> journal, n'est-ce pas?
> >>>>
> >>>> ...and I guess probably you aren't since you are using partitions.
> >>>>
> >>>>>
> >>>>> The hardware is vastly different, the previous case was on an AMD
> >>>>> system with onboard SATA (SP5100), this one is a SM storage goat
> >>>>> with LSI 3008.
> >>>>>
> >>>>> The only thing they have in common is the Ceph version 0.80.7 (via
> >>>>> the Debian repository, not Ceph) and Debian Jessie as OS with
> >>>>> kernel 3.16 (though there were minor updates on that between those
> >>>>> incidents, backported fixes)
> >>>>>
> >>>>> A copy of the journal would consist of the entire 10GB partition,
> >>>>> since we don't know where in the loop it was at the time, right?
> >>>>
> >>>> Yeah.
> >>>>
> >>>
> >>>
> >>
> >>
> >
> >
> 
> 


-- 
Christian Balzer        Network/Systems Engineer                
chibi@xxxxxxx   	Global OnLine Japan/Fusion Communications
http://www.gol.com/
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com