Re: OSD trashed by simple reboot (Debian Jessie, systemd?)

Mark Kirkwood <mark.kirkwood@xxxxxxxxxxxxxxx> · Fri, 05 Jun 2015 16:33:46 +1200

Looking quickly at the relevant code:

FileJournal::stop_writer() in src/os/FileJournal.cc

I see that we didn't start seeing the (original) issue until changes in
0.83, which suggests that the 0.80 tree might not be doing the same thing.
*However*, I note that I'm not happy with the placement of the two thread
join operations in there - it *looks* to me like 0.80 could in fact be
vulnerable to the same journal-corrupting problem, so if it occurs again
it might be interesting to apply the gist of
https://github.com/ceph/ceph/pull/2764 and see if it helps (ahem - of
course it would be best if this was on a test system)!
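
To illustrate in general terms why the placement of two thread joins can
matter at shutdown, here is a minimal sketch in Python. It is not Ceph
code and does not claim to reproduce what the pull request changes. The
names (run_shutdown_demo, writer, completer) are hypothetical. One thread
submits entries and a second thread completes them. The sketch stops and
joins the submitting thread first, and only then stops the completing
thread, so nothing submitted is left unfinished.

```python
import queue
import threading


def run_shutdown_demo(num_entries=100):
    """Stop a writer thread and a completion thread in dependency order.

    Hypothetical illustration only: the writer hands entries to the
    completer, so the completer must outlive the writer.
    """
    stop = object()              # sentinel telling a thread to exit
    pending = queue.Queue()      # entries waiting to be written
    submitted = queue.Queue()    # entries handed to the completion thread
    completed = []               # entries the completion thread finished

    for i in range(num_entries):
        pending.put(i)

    def writer():
        while True:
            item = pending.get()
            if item is stop:
                return
            submitted.put(item)

    def completer():
        while True:
            item = submitted.get()
            if item is stop:
                return
            completed.append(item)

    writer_thread = threading.Thread(target=writer)
    completer_thread = threading.Thread(target=completer)
    writer_thread.start()
    completer_thread.start()

    # 1. Stop the writer and wait until it has submitted everything.
    pending.put(stop)
    writer_thread.join()

    # 2. Only now stop the completion thread; its queue is FIFO, so the
    #    sentinel is seen after every entry submitted before it.
    submitted.put(stop)
    completer_thread.join()

    return completed
```

If the completion thread were told to stop before the writer had been
joined, entries submitted after that point would never be completed.
That is the general class of ordering problem the sketch is meant to show.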

Cheers

Mark

On 05/06/15 15:28, Christian Balzer wrote:

Hello Mark,

On Thu, 04 Jun 2015 20:34:55 +1200 Mark Kirkwood wrote:

Sorry Christian,

I did briefly wonder, then thought, oh yeah, that fix is already merged
in...However - on reflection, perhaps *not* in the 0.80 tree...doh!

No worries, I'm just happy to hear that you think it's the same thing as
well.

I upgraded to 0.80.9 (fun fact, NO predicted and actual data movement
after setting "straw_calc_version 1" and doing a reweight all) today.

Should it happen again, I know who and where to poke. ^^

Christian

On 04/06/15 18:57, Christian Balzer wrote:

Hello,

Actually after going through the changelogs with a fine comb and the
ole Mark I eyeball I think I might be seeing this:
---
osd: fix journal direct-io shutdown (#9073 Mark Kirkwood, Ma Jianpeng,
Somnath Roy)
---

The details in the various related bug reports certainly make it look
related.
Funny that nobody involved in those bug reports noticed the similarity.

Now I wouldn't have installed 0.80.8 due to the speed regression bug
anyway, but now that 0.80.9 has made it into Jessie backports I shall
install that tomorrow and hopefully never see that problem again.

Christian

On Thu, 28 May 2015 07:01:15 -0700 Gregory Farnum wrote:

On Thu, May 28, 2015 at 12:22 AM, Christian Balzer <chibi@xxxxxxx>
wrote:

Hello Greg,

On Wed, 27 May 2015 22:53:43 -0700 Gregory Farnum wrote:

The description of the logging abruptly ending and the journal being
bad really sounds like part of the disk is going back in time. I'm
not sure if XFS internally is set up in such a way that something
like losing part of its journal would allow that?

I'm special. ^o^
No XFS, EXT4. As stated in the original thread, below.
And the (OSD) journal is a raw partition on a DC S3700.

And since there was at least a 30-second pause between the
completion of the "/etc/init.d/ceph stop" and the issuing of the
shutdown command, the logging abruptly ending seems unlikely to be
related to the shutdown at all.

Oh, sorry...
I happened to read this article last night:
http://lwn.net/SubscriberLink/645720/01149aa7c58954eb/

Depending on configuration (I think you'd need to have a
journal-as-file) you could be experiencing that. And again, not many
people use ext4 so who knows what other ways there are of things being
broken that nobody else has seen yet.

If any of the OSD developers have the time it's conceivable a copy
of the OSD journal would be enlightening (if e.g. the header
offsets are wrong but there are a bunch of valid journal entries),
but this is two reports of this issue from you and none very
similar from anybody else. I'm still betting on something in the
software or hardware stack misbehaving. (There aren't that many
people running Debian; there are lots of people running Ubuntu and
we find bad XFS kernels there not infrequently; I think you're
hitting something like that.)

There should be no file system involved with the raw partition SSD
journal, n'est-ce pas?

...and I guess probably you aren't since you are using partitions.

The hardware is vastly different, the previous case was on an AMD
system with onboard SATA (SP5100), this one is a SM storage goat with
LSI 3008.

The only thing they have in common is the Ceph version 0.80.7 (via
the Debian repository, not Ceph) and Debian Jessie as OS with kernel
3.16 (though there were minor updates on that between those
incidents, backported fixes)

A copy of the journal would consist of the entire 10GB partition,
since we don't know where in the loop it was at the time, right?

Yeah.
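
For anyone wanting to capture such a copy, a minimal sketch in Python of
reading a whole raw partition into an image file might look like the
following. The function name copy_device and the device path in the
usage note are hypothetical. It assumes the OSD is stopped so the journal
is not being written during the copy, and that the destination has room
for the full partition.

```python
def copy_device(src_path, dst_path, chunk_size=4 * 1024 * 1024):
    """Copy src_path to dst_path in fixed-size chunks.

    Returns the number of bytes copied. Works on regular files as well
    as on block devices (reading a block device usually needs root).
    """
    copied = 0
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:        # end of file or end of device
                break
            dst.write(chunk)
            copied += len(chunk)
    return copied
```

A call such as copy_device("/dev/sdX2", "/root/osd-journal.img"), with
the real journal partition substituted for the placeholder /dev/sdX2,
would produce an image of the entire partition.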
