On 09/07/2012 19:14, Samuel Just wrote:
> Can you restart the node that failed to complete the upgrade with
Well, it's a little bit complicated; I now run those nodes with XFS,
and I have long-running jobs on them right now, so I can't stop the
ceph cluster at the moment.
As I've kept the original broken btrfs volumes, I tried this morning
to run the old osds in parallel, using the $cluster variable. I've only
had partial success.
I tried using different ports for the mons, but ceph wants to use the old
mon map. I can edit it (epoch 1), but it seems to use 'latest' instead;
that format isn't compatible with monmaptool, and I don't know how to
inject the modified map into a non-running cluster.
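For what it's worth, ceph-mon has --extract-monmap and --inject-monmap
options that operate directly on a stopped mon's store, which avoids having
to touch the 'latest' blob by hand. A sketch, assuming a mon id of 'a' and
placeholder paths/addresses (none of these specifics come from this thread):

```shell
# The mon must be stopped first; these options act on its on-disk store.
# Mon id 'a', /tmp/monmap, and 10.0.0.1:6790 are placeholders.
ceph-mon -i a --extract-monmap /tmp/monmap      # dump the current map
monmaptool --print /tmp/monmap                  # inspect it
monmaptool --rm a /tmp/monmap                   # drop the old mon entry
monmaptool --add a 10.0.0.1:6790 /tmp/monmap    # re-add it on a new port
ceph-mon -i a --inject-monmap /tmp/monmap       # write the edited map back
```

The extracted map is in the regular monmap format, so monmaptool can edit it
even when the store's own 'latest' record can't be edited directly.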
Anyway, the osd seems to start fine, and I can reproduce the bug:
> debug filestore = 20
> debug osd = 20
I've put them in [global]; is that sufficient?
> and post the log after an hour or so of running? The upgrade process
> might legitimately take a while.
> -Sam
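Putting the debug lines in [global] does work, since every daemon reads that
section; scoping them under [osd] would limit the extra logging to the OSDs.
A minimal ceph.conf sketch of both placements:

```ini
[global]
        ; read by all daemons (mon, mds, osd)
        debug filestore = 20
        debug osd = 20

; alternatively, restrict the verbosity to OSD daemons only:
[osd]
        debug filestore = 20
        debug osd = 20
```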
After only 15 minutes of running, ceph-osd is consuming lots of CPU, and
an strace shows lots of pread calls.
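To quantify that, strace's summary mode gives per-syscall counts rather than
a raw stream; a sketch, assuming a single ceph-osd per host (pidof may return
several pids otherwise):

```shell
# Attach to a running ceph-osd and count syscalls instead of printing each one.
# Interrupt with Ctrl-C after a while to get the summary table.
strace -c -f -p "$(pidof ceph-osd)"
```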
Here is the log:
[..]
2012-07-10 11:33:29.560052 7f3e615ac780 0 filestore(/CEPH-PROD/data/osd.1) mount syncfs(2) syscall not support by glibc
2012-07-10 11:33:29.560062 7f3e615ac780 0 filestore(/CEPH-PROD/data/osd.1) mount no syncfs(2), but the btrfs SYNC ioctl will suffice
2012-07-10 11:33:29.560172 7f3e615ac780 -1 filestore(/CEPH-PROD/data/osd.1) FileStore::mount : stale version stamp detected: 2. Proceeding, do_update is set, performing disk format upgrade.
2012-07-10 11:33:29.560233 7f3e615ac780 0 filestore(/CEPH-PROD/data/osd.1) mount found snaps <3744666,3746725>
2012-07-10 11:33:29.560263 7f3e615ac780 10 filestore(/CEPH-PROD/data/osd.1) current/ seq was 3746725
2012-07-10 11:33:29.560267 7f3e615ac780 10 filestore(/CEPH-PROD/data/osd.1) most recent snap from <3744666,3746725> is 3746725
2012-07-10 11:33:29.560280 7f3e615ac780 10 filestore(/CEPH-PROD/data/osd.1) mount rolling back to consistent snap 3746725
2012-07-10 11:33:29.839281 7f3e615ac780 5 filestore(/CEPH-PROD/data/osd.1) mount op_seq is 3746725
... and nothing more.
I'll let it run for 3 hours. If another message shows up, I'll let
you know.
Cheers,
--
Yann Dupont - Service IRTS, DSI Université de Nantes
Tel : 02.53.48.49.20 - Mail/Jabber : Yann.Dupont@xxxxxxxxxxxxxx