Re: OSD trashed by simple reboot (Debian Jessie, systemd?)

Hello Mark,

On Thu, 04 Jun 2015 20:34:55 +1200 Mark Kirkwood wrote:

> Sorry Christian,
> 
> I did briefly wonder, then thought, oh yeah, that fix is already merged 
> in...However - on reflection, perhaps *not* in the 0.80 tree...doh!
>
No worries, I'm just happy to hear that you think it's the same thing as
well.

I upgraded to 0.80.9 today (fun fact: NO predicted or actual data movement
after setting "straw_calc_version 1" and doing a reweight-all).

Should it happen again, I know who and where to poke. ^^
  
Christian

> On 04/06/15 18:57, Christian Balzer wrote:
> >
> > Hello,
> >
> > Actually, after going through the changelogs with a fine-tooth comb and
> > the ole Mark I eyeball, I think I might be seeing this:
> > ---
> > osd: fix journal direct-io shutdown (#9073 Mark Kirkwood, Ma Jianpeng,
> > Somnath Roy)
> > ---
> >
> > The details in the various related bug reports certainly make it look
> > related.
> > Funny that nobody involved in those bug reports noticed the similarity.
> >
> > Now I wouldn't have installed 0.80.8 anyway, due to the speed
> > regression bug, but now that 0.80.9 has made it into Jessie backports
> > I shall install that tomorrow and hopefully never see that problem again.
> >
> > Christian
> >
> > On Thu, 28 May 2015 07:01:15 -0700 Gregory Farnum wrote:
> >
> >> On Thu, May 28, 2015 at 12:22 AM, Christian Balzer <chibi@xxxxxxx>
> >> wrote:
> >>>
> >>> Hello Greg,
> >>>
> >>> On Wed, 27 May 2015 22:53:43 -0700 Gregory Farnum wrote:
> >>>
> >>>> The description of the logging abruptly ending and the journal being
> >>>> bad really sounds like part of the disk is going back in time. I'm
> >>>> not sure if XFS internally is set up in such a way that something
> >>>> like losing part of its journal would allow that?
> >>>>
> >>> I'm special. ^o^
> >>> No XFS, EXT4. As stated in the original thread, below.
> >>> And the (OSD) journal is a raw partition on a DC S3700.
> >>>
> >>> And since there was at least a 30-second pause between the
> >>> completion of the "/etc/init.d/ceph stop" and the issuing of the
> >>> shutdown command, the logging abruptly ending seems unlikely to be
> >>> related to the shutdown at all.
> >>
> >> Oh, sorry...
> >> I happened to read this article last night:
> >> http://lwn.net/SubscriberLink/645720/01149aa7c58954eb/
> >>
> >> Depending on configuration (I think you'd need to have a
> >> journal-as-file) you could be experiencing that. And again, not many
> >> people use ext4 so who knows what other ways there are of things being
> >> broken that nobody else has seen yet.
> >>
> >>>
> >>>> If any of the OSD developers have the time it's conceivable a copy
> >>>> of the OSD journal would be enlightening (if e.g. the header
> >>>> offsets are wrong but there are a bunch of valid journal entries),
> >>>> but this is two reports of this issue from you and none very
> >>>> similar from anybody else. I'm still betting on something in the
> >>>> software or hardware stack misbehaving. (There aren't that many
> >>>> people running Debian; there are lots of people running Ubuntu and
> >>>> we find bad XFS kernels there not infrequently; I think you're
> >>>> hitting something like that.)
> >>>>
> >>> There should be no file system involved with the raw partition SSD
> >>> journal, n'est-ce pas?
> >>
> >> ...and I guess probably you aren't since you are using partitions.
> >>
> >>>
> >>> The hardware is vastly different, the previous case was on an AMD
> >>> system with onboard SATA (SP5100), this one is a SM storage goat with
> >>> LSI 3008.
> >>>
> >>> The only thing they have in common is the Ceph version 0.80.7 (via
> >>> the Debian repository, not Ceph) and Debian Jessie as OS with kernel
> >>> 3.16 (though there were minor updates on that between those
> >>> incidents, backported fixes)
> >>>
> >>> A copy of the journal would consist of the entire 10GB partition,
> >>> since we don't know where in the loop it was at the time, right?
> >>
> >> Yeah.
> >>
> >
> >
> 
> 


-- 
Christian Balzer        Network/Systems Engineer                
chibi@xxxxxxx   	Global OnLine Japan/Fusion Communications
http://www.gol.com/
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



