Hi,
Is the osd journal flushed completely on a clean shutdown?
The case in question is Jewel with FileStore osds, and a "clean shutdown"
being:
systemctl stop ceph-osd@${osd}
I understand it's documented practice to issue a --flush-journal after
shutting down an osd if you're intending to do anything with the
journal, but herein lies the sorry tale...
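
For reference, the documented sequence I mean is roughly the sketch below. The
`run` wrapper and the `DRY_RUN` variable are hypothetical helpers of my own
(not part of Ceph); they default to printing the commands instead of running
them. Whether the flush step would even succeed against a journal whose
on-disk contents have been discarded is part of what I'm unsure about.

```shell
#!/bin/sh
# Sketch only: stop one FileStore osd, then flush its journal.
# DRY_RUN is a hypothetical safety switch; 1 (the default) only prints commands.
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# Stop the osd with the given id, then flush its journal to the object store.
flush_osd_journal() {
    osd="$1"
    run systemctl stop "ceph-osd@${osd}"
    run ceph-osd -i "${osd}" --flush-journal
}
```

Calling `flush_osd_journal 7` in the default dry-run mode just prints the two
commands for osd 7.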
I've accidentally issued a 'blkdiscard' on a whole SSD device containing
the journals for multiple osds, rather than for a specific partition as
intended.
The affected osds themselves continue to work along happily.
I assume the journals are write-only during normal operation, in which
case it's understandable the osds are oblivious to the underlying
zeroing of the journals (and partition table!).
The GPT partition table and the individual journal partition types and
guids etc. have been recreated, so, in theory at least, a clean shutdown
and restart should be fine *if* the clean shutdown means there's nothing
in the journal to replay on startup.
I've experimented with one of the affected osds (used for "scratch"
purposes, so safe to play with), shutting it down and starting it up
again, and it seems to be happy - somewhat to my surprise. I thought I'd
have to at least use --mkjournal before it would start up again, to
reinstate whatever header/signature is used in the journals.
There are other affected osds which hold live data, so I want to be more
careful there.
One option is to simply kill the affected osds and recreate them, and
allow the data redundancy to take care of things.
However, I'm wondering if things should theoretically be ok if I carefully
shut down and restart each of the remaining osds in turn, or am I taking
some kind of data corruption risk?
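
To be concrete about "in turn", what I have in mind is something like the
following sketch. It assumes that waiting for `ceph health` to report
HEALTH_OK between osds is a sufficient gate, which may not be true here; the
`run` wrapper, `DRY_RUN` variable and function names are hypothetical helpers
of mine, and the 10 second poll interval is arbitrary. It says nothing about
whether the restart itself is safe, which is the actual question.

```shell
#!/bin/sh
# Sketch only: restart the listed osds one at a time.
# DRY_RUN is a hypothetical safety switch; 1 (the default) only prints commands.
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# Poll `ceph health` until it reports HEALTH_OK (only announced in dry-run mode).
wait_for_health_ok() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would wait for: HEALTH_OK"
        return 0
    fi
    until ceph health | grep -q '^HEALTH_OK'; do
        sleep 10
    done
}

# Stop and start each osd id given as an argument, waiting for the
# cluster to settle before moving on to the next one.
restart_osds_in_turn() {
    for osd in "$@"; do
        run systemctl stop "ceph-osd@${osd}"
        run systemctl start "ceph-osd@${osd}"
        wait_for_health_ok
    done
}
```

For example, `restart_osds_in_turn 4 9` in the default dry-run mode prints the
stop, start and wait steps for osd 4 and then for osd 9.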
Tks,
Chris
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com