Hello,
I was on an old version of Ceph, and it showed a warning saying:
crush map has straw_calc_version=0
I read that adjusting it will only trigger a full rebalance, so the admin should choose when to do it. So I went straight ahead and ran:
ceph osd crush tunables optimal
It rebalanced as expected, but then I started to see lots of PGs in a bad state. I discovered that it was because of my OSD1. I thought it was a disk failure, so I added a new OSD6 and the system started to rebalance. In any case, OSD1 was not starting. I thought about wiping it all, but I preferred to leave the disk as it was, with the journal intact, in case I can recover data from it. (See the thread: "Scrub failing all the time, new inconsistencies keep appearing".)
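In hindsight, I understand the warning refers only to the straw_calc_version tunable, so it can probably be addressed without switching to the whole optimal profile; a minimal sketch of the narrower commands as I understand them (untested on my cluster):

    ceph osd crush set-tunable straw_calc_version 1
    ceph osd crush reweight-all   # recomputes the straw weights; this is the step that moves data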
So here's the information, but it already shows OSD1 replaced by OSD3, sorry.
ID WEIGHT  REWEIGHT SIZE  USE   AVAIL %USE  VAR  PGS
 0 1.00000  1.00000  926G  271G  654G 29.34 1.10 369
 2 1.00000  1.00000  460G  284G  176G 61.67 2.32 395
 4 1.00000  1.00000  465G  151G  313G 32.64 1.23 214
 3 1.36380  1.00000 1396G  239G 1157G 17.13 0.64 340
 6 0.90919  1.00000  931G  164G  766G 17.70 0.67 210
              TOTAL 4179G 1111G 3067G 26.60
MIN/MAX VAR: 0.64/2.32  STDDEV: 16.99
As I said, I still have OSD1 intact, so I can do whatever you need with it except re-adding it to the cluster, since I don't know what that would do; it might cause havoc.
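If it is useful, my understanding is that the intact OSD1 can be inspected offline with ceph-objectstore-tool rather than re-added; a rough sketch of the commands I believe apply (paths assume the default OSD data directory, and 10.8c is only an example PG id taken from the crash log below):

    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-1 \
        --journal-path /var/lib/ceph/osd/ceph-1/journal --op list-pgs
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-1 \
        --journal-path /var/lib/ceph/osd/ceph-1/journal \
        --op export --pgid 10.8c --file /tmp/10.8c.export

The OSD daemon has to be stopped while running these, which is already the case here since it refuses to start.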
Best regards,
On 14/09/17 17:12, David Turner wrote:
What do you mean by "updated crush map to 1"? Can you please provide a copy of your crush map and `ceph osd df`?
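If it helps, something like the following should produce both, assuming crushtool is available on the node:

    ceph osd getcrushmap -o crushmap.bin
    crushtool -d crushmap.bin -o crushmap.txt   # decompile to readable text
    ceph osd df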
On Wed, Sep 13, 2017 at 6:39 AM Gonzalo Aguilar Delgado <gaguilar@xxxxxxxxxxxxxxxxxx> wrote:
Hi,
I recently updated the crush map to 1 and did all the relocation of the PGs. At the end I found that one of the OSDs is not starting.
This is what it shows:
2017-09-13 10:37:34.287248 7f49cbe12700 -1 *** Caught signal (Aborted) **
in thread 7f49cbe12700 thread_name:filestore_sync
ceph version 10.2.7 (50e863e0f4bc8f4b9e31156de690d765af245185)
1: (()+0x9616ee) [0xa93c6ef6ee]
2: (()+0x11390) [0x7f49d9937390]
3: (gsignal()+0x38) [0x7f49d78d3428]
4: (abort()+0x16a) [0x7f49d78d502a]
5: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x26b) [0xa93c7ef43b]
6: (FileStore::sync_entry()+0x2bbb) [0xa93c47fcbb]
7: (FileStore::SyncThread::entry()+0xd) [0xa93c4adcdd]
8: (()+0x76ba) [0x7f49d992d6ba]
9: (clone()+0x6d) [0x7f49d79a53dd]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
--- begin dump of recent events ---
-3> 2017-09-13 10:37:34.253808 7f49dac6e8c0 5 osd.1 pg_epoch: 6293 pg[10.8c( v 6220'575937 (4942'572901,6220'575937] local-les=6235 n=282 ec=419 les/c/f 6235/6235/0 6293/6293/6290) [1,2]/[2] r=-1 lpr=0 pi=6234-6292/24 crt=6220'575937 lcod 0'0 inactive NOTIFY NIBBLEWISE] exit Initial 0.029683 0 0.000000
-2> 2017-09-13 10:37:34.253848 7f49dac6e8c0 5 osd.1 pg_epoch: 6293 pg[10.8c( v 6220'575937 (4942'572901,6220'575937] local-les=6235 n=282 ec=419 les/c/f 6235/6235/0 6293/6293/6290) [1,2]/[2] r=-1 lpr=0 pi=6234-6292/24 crt=6220'575937 lcod 0'0 inactive NOTIFY NIBBLEWISE] enter Reset
-1> 2017-09-13 10:37:34.255018 7f49dac6e8c0 5 osd.1 pg_epoch: 6293 pg[10.90(unlocked)] enter Initial
0> 2017-09-13 10:37:34.287248 7f49cbe12700 -1 *** Caught signal (Aborted) **
in thread 7f49cbe12700 thread_name:filestore_sync
ceph version 10.2.7 (50e863e0f4bc8f4b9e31156de690d765af245185)
1: (()+0x9616ee) [0xa93c6ef6ee]
2: (()+0x11390) [0x7f49d9937390]
3: (gsignal()+0x38) [0x7f49d78d3428]
4: (abort()+0x16a) [0x7f49d78d502a]
5: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x26b) [0xa93c7ef43b]
6: (FileStore::sync_entry()+0x2bbb) [0xa93c47fcbb]
7: (FileStore::SyncThread::entry()+0xd) [0xa93c4adcdd]
8: (()+0x76ba) [0x7f49d992d6ba]
9: (clone()+0x6d) [0x7f49d79a53dd]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
--- logging levels ---
0/ 5 none
0/ 1 lockdep
0/ 1 context
1/ 1 crush
1/ 5 mds
1/ 5 mds_balancer
1/ 5 mds_locker
1/ 5 mds_log
1/ 5 mds_log_expire
1/ 5 mds_migrator
0/ 1 buffer
0/ 1 timer
0/ 1 filer
0/ 1 striper
0/ 1 objecter
0/ 5 rados
0/ 5 rbd
0/ 5 rbd_mirror
0/ 5 rbd_replay
0/ 5 journaler
0/ 5 objectcacher
0/ 5 client
0/ 5 osd
0/ 5 optracker
0/ 5 objclass
1/ 3 filestore
1/ 3 journal
0/ 5 ms
1/ 5 mon
0/10 monc
1/ 5 paxos
0/ 5 tp
1/ 5 auth
1/ 5 crypto
1/ 1 finisher
1/ 5 heartbeatmap
1/ 5 perfcounter
1/ 5 rgw
1/10 civetweb
1/ 5 javaclient
1/ 5 asok
1/ 1 throttle
0/ 0 refs
1/ 5 xio
1/ 5 compressor
1/ 5 newstore
1/ 5 bluestore
1/ 5 bluefs
1/ 3 bdev
1/ 5 kstore
4/ 5 rocksdb
4/ 5 leveldb
1/ 5 kinetic
1/ 5 fuse
-2/-2 (syslog threshold)
-1/-1 (stderr threshold)
max_recent 10000
max_new 1000
log_file /var/log/ceph/ceph-osd.1.log
--- end dump of recent events ---
Is there any way to recover it or should I open a bug?
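If a bug report is the way to go, I assume a trace with higher FileStore logging would be needed; a minimal sketch of what I would add to ceph.conf before retrying the start (untested, default config location assumed):

    [osd]
        debug filestore = 20
        debug journal = 20

Then I would try to start the OSD again and attach the resulting /var/log/ceph/ceph-osd.1.log.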
Best regards
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com