Hi, back to work, and I'm facing my problem again.
@Alexandre: AMD Turion, in an HP Microserver N54L.
This server runs OSDs and LXC containers only; no mon runs on it.
After rebooting the whole cluster and attempting to add the same disk a
third time:
ceph osd tree
ID WEIGHT  TYPE NAME          UP/DOWN REWEIGHT PRIMARY-AFFINITY
-1 7.47226 root default
-2 3.65898     host jon
 1 2.29999         osd.1           up  1.00000          1.00000
 3 1.35899         osd.3           up  1.00000          1.00000
-3 0.34999     host daenerys
 0 0.34999         osd.0           up  1.00000          1.00000
-4 1.64969     host tyrion
 2 0.44969         osd.2           up  1.00000          1.00000
 4 1.20000         osd.4           up  1.00000          1.00000
-5 1.81360     host jaime
 5 1.81360         osd.5           up  1.00000          1.00000
 6       0         osd.6         down        0          1.00000
 7       0         osd.7         down        0          1.00000
 8       0         osd.8         down        0          1.00000
osd.6, osd.7 and osd.8 are three attempts with the same disk (which isn't
faulty), all failing the same way.
Any clue?
I'm soon going to try creating the OSD on this disk in another server;
before retrying, I'll clean up the stale entries, e.g. as sketched below.
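A minimal cleanup sketch for the dead osd.6/7/8 entries (assuming Luminous,
where "ceph osd purge" exists; the pre-Luminous three-step equivalent is in
the comment):

ceph osd purge 6 --yes-i-really-mean-it   # removes the osd from crush, auth and the osd map in one go
ceph osd purge 7 --yes-i-really-mean-it
ceph osd purge 8 --yes-i-really-mean-it
# pre-Luminous equivalent, for one id:
# ceph osd crush remove osd.6 && ceph auth del osd.6 && ceph osd rm 6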
Thanks.
Best regards
On 26/07/2017 at 15:53, Alexandre DERUMIER wrote:
Hi Phil,
It's possible that RocksDB currently has a bug on some old CPUs (old Xeons
and some Opterons).
I see the same behaviour on a new cluster when creating mons:
http://tracker.ceph.com/issues/20529
What is your CPU model?
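An "Illegal instruction" crash usually means RocksDB was built with CPU
instructions the machine lacks, SSE4.2 being the usual suspect. A quick
check, assuming a Linux node:

grep -m1 -o 'sse4_2' /proc/cpuinfo || echo "no sse4_2 on this CPU"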
In your log:
sh[1869]: in thread 7f6d85db3c80 thread_name:ceph-osd
sh[1869]: ceph version 12.1.0 (330b5d17d66c6c05b08ebc129d3e6e8f92f73c60) luminous (dev)
sh[1869]: 1: (()+0x9bc562) [0x558561169562]
sh[1869]: 2: (()+0x110c0) [0x7f6d835cb0c0]
sh[1869]: 3: (rocksdb::VersionBuilder::SaveTo(rocksdb::VersionStorageInfo*)+0x871) [0x5585615788b1]
sh[1869]: 4: (rocksdb::VersionSet::Recover(std::vector<rocksdb::ColumnFamilyDescriptor, std::allocator<rocksdb::ColumnFamilyDescriptor> > const&, bool)+0x26bc) [0x55856145ca4c]
sh[1869]: 5: (rocksdb::DBImpl::Recover(std::vector<rocksdb::ColumnFamilyDescriptor, std::allocator<rocksdb::ColumnFamilyDescriptor> > const&, bool, bool, bool)+0x11f) [0x558561423e6f]
sh[1869]: 6: (rocksdb::DB::Open(rocksdb::DBOptions const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std:
sh[1869]: 7: (rocksdb::DB::Open(rocksdb::Options const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, rocksdb:
sh[1869]: 8: (RocksDBStore::do_open(std::ostream&, bool)+0x68e) [0x5585610af76e]
sh[1869]: 9: (RocksDBStore::create_and_open(std::ostream&)+0xd7) [0x5585610b0d27]
sh[1869]: 10: (BlueStore::_open_db(bool)+0x326) [0x55856103c6d6]
sh[1869]: 11: (BlueStore::mkfs()+0x856) [0x55856106d406]
sh[1869]: 12: (OSD::mkfs(CephContext*, ObjectStore*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, uuid_d, int)+0x348) [0x558560bc98f8]
sh[1869]: 13: (main()+0xe58) [0x558560b1da78]
sh[1869]: 14: (__libc_start_main()+0xf1) [0x7f6d825802b1]
sh[1869]: 15: (_start()+0x2a) [0x558560ba4dfa]
sh[1869]: 2017-07-16 14:46:00.763521 7f6d85db3c80 -1 *** Caught signal (Illegal instruction) **
sh[1869]: in thread 7f6d85db3c80 thread_name:ceph-osd
sh[1869]: ceph version 12.1.0 (330b5d17d66c6c05b08ebc129d3e6e8f92f73c60) luminous (dev)
sh[1869]: 1: (()+0x9bc562) [0x558561169562]
----- Original message -----
From: "Phil Schwarz" <infolist@xxxxxxxxxxxxxx>
To: "Udo Lembke" <ulembke@xxxxxxxxxxxx>, "ceph-users" <ceph-users@xxxxxxxxxxxxxx>
Sent: Sunday, July 16, 2017 15:04:16
Subject: Re: Broken Ceph Cluster when adding new one - Proxmox 5.0 & Ceph Luminous
On 15/07/2017 at 23:09, Udo Lembke wrote:
Hi,
On 15.07.2017 16:01, Phil Schwarz wrote:
Hi,
...
While investigating, I wondered about my config.
A question about the /etc/hosts file:
should I use the private replication LAN IPs or the public ones?
The private replication LAN!! And the pve-cluster should use another
network (other NICs) if possible.
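For illustration, that split is what the two network options in ceph.conf
are for (the subnets below are made up; adjust them to your LANs):

[global]
    # mon/client traffic
    public network = 192.168.1.0/24
    # OSD replication and heartbeat traffic
    cluster network = 10.10.10.0/24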
Udo
OK, thanks Udo.
After investigation, I did the following:
- set noout on the OSDs (commands below)
- stopped the CPU-pegging LXC containers
- checked the cabling
- restarted the whole cluster
Everything went fine!
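For reference, the noout part was just the usual flag dance around the
maintenance window:

ceph osd set noout
# ...reboot / maintenance...
ceph osd unset noout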
But when I tried to add a new OSD:
fdisk /dev/sdc --> deleted the partition table
parted /dev/sdc --> mklabel msdos (disk came from a ZFS FreeBSD system)
dd if=/dev/null of=/dev/sdc (note: this writes nothing, since /dev/null
yields no data; /dev/zero is the source that actually zeroes)
ceph-disk zap /dev/sdc
dd if=/dev/zero of=/dev/sdc bs=10M count=1000
Then I recreated the OSD via the web GUI.
Same result: the OSD is known by the node, but not by the cluster.
The logs seem to show an issue with this BlueStore OSD; have a look at the
attached file.
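To pull the node-side failure yourself, the systemd journal is the place to
look (the unit name below is a guess for osd id 6, assuming standard
systemd packaging; the grep fallback catches the ceph-disk wrapper too):

journalctl -b -u ceph-osd@6.service
# or, broader:
journalctl -b | grep -i ceph-osd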
I'm going to give recreating the OSD with FileStore a try, roughly as
sketched below.
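A FileStore-flavoured creation would look something like this (assuming
Luminous ceph-disk, where --filestore selects the objectstore and the data
partition created by prepare is /dev/sdc1):

ceph-disk zap /dev/sdc
ceph-disk prepare --filestore /dev/sdc
ceph-disk activate /dev/sdc1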
Thanks
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com