Success! An issue in my operating system install procedure was corrupting the journals; it was not caused by Ceph. With that bug fixed, the shutdown procedure in this thread has been verified to work as expected. Thanks for all the help.
-Chris
On 03/01/17 15:36, Heller, Chris wrote:
I see. My journal is specified in ceph.conf. I'm not removing it from the OSD, so it sounds like flushing isn't needed in my case.
Okay, but something is not right if it says it's a non-block journal (meaning a file, not a block device). Double-check your ceph.conf: make sure the path works, and make sure the [osd.x] section actually matches that OSD (no idea how to test that, especially if the OSD doesn't start; maybe just increase logging). Or just make a symlink for now, to see if it solves the problem, which would imply the ceph.conf is wrong.
-Chris
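One way to check what the OSD will actually find as its journal is a small shell helper like the one below. It is only a sketch: the function name journal_kind is made up for illustration, and the data directory passed to it (commonly something like /var/lib/ceph/osd/ceph-<id>) is an assumption about the local layout. It looks only at the "journal" entry in the OSD directory and does not read ceph.conf.

```shell
# Hypothetical helper: classify the "journal" entry in an OSD data directory.
# Usage: journal_kind /path/to/osd/data/dir   (the path is an assumption)
journal_kind() {
    osd_dir=$1
    j="$osd_dir/journal"
    if [ -L "$j" ]; then
        # -b and -e follow the symlink, so they describe its target.
        if [ -b "$j" ]; then
            echo "symlink-to-block-device"
        elif [ -e "$j" ]; then
            echo "symlink-to-non-block"
        else
            echo "dangling-symlink"
        fi
    elif [ -b "$j" ]; then
        echo "block-device"
    elif [ -f "$j" ]; then
        # A plain file here matches the "non-block journal" log message.
        echo "regular-file"
    elif [ -e "$j" ]; then
        echo "other"
    else
        echo "missing"
    fi
}
```

If this reports regular-file for an OSD whose journal is supposed to live on its own volume, that would be consistent with the "non-block journal" message in the log.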
On 03/01/17 14:41, Heller, Chris wrote:
That is a good question, and I'm not sure how to answer. The journal is on its own volume, and is not a symlink. Also how does one flush the journal? That seems like an important step when bringing down a cluster safely.
You only need to flush the journal if you are removing it from the OSD, replacing it with a different journal. Since your journal is on its own volume, you need either a symlink in the OSD directory named "journal" that points to the device (ideally not /dev/sdx but /dev/disk/by-.../), or an entry for it in ceph.conf. And since it said you have a non-block journal now, there is probably a file in that place; you should remove it (rename it to journal.junk until you're sure it's not an important file, and delete it later).
This is where I've stopped. All but one OSD came back online. One has this backtrace:
2017-02-28 17:44:54.884235 7fb2ba3187c0 -1 journal FileJournal::_open: disabling aio for non-block journal. Use journal_force_aio to force use of aio anyway
Are the journals inline or separate? If they're separate, the above means the journal symlink/config is missing, so it would possibly make a new journal, which would be bad if you didn't flush the old journal first. Also, a single OSD is easy enough to replace (which I wouldn't do until the cluster has settled down and recovered). So it's lame for it to be broken, but it's still recoverable if that's the only issue.
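For the case where a journal really is being removed or replaced, the flush step can be sketched as below. This is a dry run that only prints the commands so they can be reviewed first. The function name flush_journal_plan is hypothetical, and the systemd unit name ceph-osd@<id> is an assumption; installs using a different init system stop the OSD differently. The --flush-journal option belongs to ceph-osd and should only be run while that OSD is stopped.

```shell
# Hypothetical dry-run helper: print the steps to flush one OSD's journal.
# Nothing is executed; the commands are only echoed for review.
flush_journal_plan() {
    id=$1
    if [ -z "$id" ]; then
        echo "usage: flush_journal_plan <osd-id>" >&2
        return 1
    fi
    # Assumption: the OSD runs as the systemd unit ceph-osd@<id>.
    echo "systemctl stop ceph-osd@$id"
    # Write any pending journal entries to the OSD's data store.
    echo "ceph-osd -i $id --flush-journal"
}
```

Calling flush_journal_plan 3 prints the two commands for osd.3 in the order they would be run.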
--
--------------------------------------------
Peter Maloney
Brockmann Consult
Max-Planck-Str. 2
21502 Geesthacht
Germany
Tel: +49 4152 889 300
Fax: +49 4152 889 333
E-mail: peter.maloney@xxxxxxxxxxxxxxxxxxxx
Internet: http://www.brockmann-consult.de
--------------------------------------------
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com