Found some more info, but it's getting weird... All three OSD nodes show the same "unknown cluster" message on all the OSD disks. I don't know where it came from; all the nodes were configured using ceph-deploy on the admin node. In any case, the OSDs seem to be up and running and the health is OK. No ceph-disk@ services are running on any of the OSD nodes, which I hadn't noticed before, and each node was set up exactly the same, yet different services are listed under systemctl:

OSD NODE 1:
Output in earlier email

OSD NODE 2:
● ceph-disk@dev-sdb1.service loaded failed failed Ceph disk activation: /dev/sdb1
● ceph-disk@dev-sdb2.service loaded failed failed Ceph disk activation: /dev/sdb2
● ceph-disk@dev-sdb5.service loaded failed failed Ceph disk activation: /dev/sdb5
● ceph-disk@dev-sdc2.service loaded failed failed Ceph disk activation: /dev/sdc2
● ceph-disk@dev-sdc4.service loaded failed failed Ceph disk activation: /dev/sdc4

OSD NODE 3:
● ceph-disk@dev-sdb1.service loaded failed failed Ceph disk activation: /dev/sdb1
● ceph-disk@dev-sdb3.service loaded failed failed Ceph disk activation: /dev/sdb3
● ceph-disk@dev-sdb4.service loaded failed failed Ceph disk activation: /dev/sdb4
● ceph-disk@dev-sdb5.service loaded failed failed Ceph disk activation: /dev/sdb5
● ceph-disk@dev-sdc2.service loaded failed failed Ceph disk activation: /dev/sdc2
● ceph-disk@dev-sdc3.service loaded failed failed Ceph disk activation: /dev/sdc3
● ceph-disk@dev-sdc4.service loaded failed failed Ceph disk activation: /dev/sdc4
From my understanding, the disks have already been activated... Should these services even be running or enabled?
Mike
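For reference, a sketch of how the state of those units can be checked on one node (plain systemctl queries, nothing Ceph-specific; ceph-disk@dev-sdb1.service is just one of the failed units listed above):

# systemctl list-units 'ceph-disk@*' --all          # loaded/active/failed state of the activation units
# systemctl is-enabled ceph-disk@dev-sdb1.service
# systemctl is-active ceph-osd@0.service            # the daemons that actually need to stay up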
On Tue, Nov 29, 2016 at 6:33 PM, Mike Jacobacci <mikej@xxxxxxxxxx> wrote:
Sorry about that... Here is the output of ceph-disk list:

ceph-disk list
/dev/dm-0 other, xfs, mounted on /
/dev/dm-1 swap, swap
/dev/dm-2 other, xfs, mounted on /home
/dev/sda :
 /dev/sda2 other, LVM2_member
 /dev/sda1 other, xfs, mounted on /boot
/dev/sdb :
 /dev/sdb1 ceph journal
 /dev/sdb2 ceph journal
 /dev/sdb3 ceph journal
 /dev/sdb4 ceph journal
 /dev/sdb5 ceph journal
/dev/sdc :
 /dev/sdc1 ceph journal
 /dev/sdc2 ceph journal
 /dev/sdc3 ceph journal
 /dev/sdc4 ceph journal
 /dev/sdc5 ceph journal
/dev/sdd :
 /dev/sdd1 ceph data, active, unknown cluster e1d7b4ae-2dcd-40ee-bea5-d103fe1fa9c9, osd.0
/dev/sde :
 /dev/sde1 ceph data, active, unknown cluster e1d7b4ae-2dcd-40ee-bea5-d103fe1fa9c9, osd.1
/dev/sdf :
 /dev/sdf1 ceph data, active, unknown cluster e1d7b4ae-2dcd-40ee-bea5-d103fe1fa9c9, osd.2
/dev/sdg :
 /dev/sdg1 ceph data, active, unknown cluster e1d7b4ae-2dcd-40ee-bea5-d103fe1fa9c9, osd.3
/dev/sdh :
 /dev/sdh1 ceph data, active, unknown cluster e1d7b4ae-2dcd-40ee-bea5-d103fe1fa9c9, osd.4
/dev/sdi :
 /dev/sdi1 ceph data, active, unknown cluster e1d7b4ae-2dcd-40ee-bea5-d103fe1fa9c9, osd.5
/dev/sdj :
 /dev/sdj1 ceph data, active, unknown cluster e1d7b4ae-2dcd-40ee-bea5-d103fe1fa9c9, osd.6
/dev/sdk :
 /dev/sdk1 ceph data, active, unknown cluster e1d7b4ae-2dcd-40ee-bea5-d103fe1fa9c9, osd.7
/dev/sdl :
 /dev/sdl1 ceph data, active, unknown cluster e1d7b4ae-2dcd-40ee-bea5-d103fe1fa9c9, osd.8
/dev/sdm :
 /dev/sdm1 ceph data, active, unknown cluster e1d7b4ae-2dcd-40ee-bea5-d103fe1fa9c9, osd.9

On Tue, Nov 29, 2016 at 6:32 PM, Mike Jacobacci <mikej@xxxxxxxxxx> wrote:
I forgot to add:

On Tue, Nov 29, 2016 at 6:28 PM, Mike Jacobacci <mikej@xxxxxxxxxx> wrote:
So it looks like the journal partition is mounted:

ls -lah /var/lib/ceph/osd/ceph-0/journal
lrwxrwxrwx. 1 ceph ceph 9 Oct 10 16:11 /var/lib/ceph/osd/ceph-0/journal -> /dev/sdb1

Here is the output of journalctl -xe when I try to start the ceph-disk@dev-sdb1 service:

sh[17481]: mount_activate: Failed to activate
sh[17481]: unmount: Unmounting /var/lib/ceph/tmp/mnt.m9ek7W
sh[17481]: command_check_call: Running command: /bin/umount -- /var/lib/ceph/tmp/mnt.m9ek7W
sh[17481]: Traceback (most recent call last):
sh[17481]: File "/usr/sbin/ceph-disk", line 9, in <module>
sh[17481]: load_entry_point('ceph-disk==1.0.0', 'console_scripts', 'ceph-disk')()
sh[17481]: File "/usr/lib/python2.7/site-packages/ceph_disk/main.py", line 5011, in run
sh[17481]: main(sys.argv[1:])
sh[17481]: File "/usr/lib/python2.7/site-packages/ceph_disk/main.py", line 4962, in main
sh[17481]: args.func(args)
sh[17481]: File "/usr/lib/python2.7/site-packages/ceph_disk/main.py", line 4720, in <lambda>
sh[17481]: func=lambda args: main_activate_space(name, args),
sh[17481]: File "/usr/lib/python2.7/site-packages/ceph_disk/main.py", line 3739, in main_activate_space
sh[17481]: reactivate=args.reactivate,
sh[17481]: File "/usr/lib/python2.7/site-packages/ceph_disk/main.py", line 3073, in mount_activate
sh[17481]: (osd_id, cluster) = activate(path, activate_key_template, init)
sh[17481]: File "/usr/lib/python2.7/site-packages/ceph_disk/main.py", line 3220, in activate
sh[17481]: ' with fsid %s' % ceph_fsid)
sh[17481]: ceph_disk.main.Error: Error: No cluster conf found in /etc/ceph with fsid e1d7b4ae-2dcd-40ee-bea5-d103fe1fa9c9
sh[17481]: Traceback (most recent call last):
sh[17481]: File "/usr/sbin/ceph-disk", line 9, in <module>
sh[17481]: load_entry_point('ceph-disk==1.0.0', 'console_scripts', 'ceph-disk')()
sh[17481]: File "/usr/lib/python2.7/site-packages/ceph_disk/main.py", line 5011, in run
sh[17481]: main(sys.argv[1:])
sh[17481]: File "/usr/lib/python2.7/site-packages/ceph_disk/main.py", line 4962, in main
sh[17481]: args.func(args)
sh[17481]: File "/usr/lib/python2.7/site-packages/ceph_disk/main.py", line 4399, in main_trigger
sh[17481]: raise Error('return code ' + str(ret))
sh[17481]: ceph_disk.main.Error: Error: return code 1
systemd[1]: ceph-disk@dev-sdb1.service: main process exited, code=exited, status=1/FAILURE
systemd[1]: Failed to start Ceph disk activation: /dev/sdb1.

I don't understand this error:

ceph_disk.main.Error: Error: No cluster conf found in /etc/ceph with fsid e1d7b4ae-2dcd-40ee-bea5-d103fe1fa9c9

My fsid in ceph.conf is:

fsid = 75d6dba9-2144-47b1-87ef-1fe21d3c58a8

I don't know why the fsid would change or be different. I thought I had a basic cluster setup; I don't understand what's going wrong.

Mike

On Tue, Nov 29, 2016 at 5:15 PM, Mike Jacobacci <mikej@xxxxxxxxxx> wrote:
Hi John,

Thanks, I wasn't sure if something had happened to the journal partitions or not. Right now the ceph-osd.0-9 services are back up and the cluster health is good, but none of the ceph-disk@dev-sd* services are running. How can I get the journal partitions mounted again?

Cheers,
Mike

On Tue, Nov 29, 2016 at 4:30 PM, John Petrini <jpetrini@xxxxxxxxxxxx> wrote:
Also, don't run sgdisk again; that's just for creating the journal partitions. ceph-disk is a service used for prepping disks; only the OSD services need to be running, as far as I know. Are the ceph-osd@x services running now that you've mounted the disks?
___
John Petrini
NOC Systems Administrator // CoreDial, LLC // coredial.com //
Hillcrest I, 751 Arbor Way, Suite 150, Blue Bell PA, 19422
P: 215.297.4400 x232 // F: 215.297.4401 // E: jpetrini@xxxxxxxxxxxx
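Regarding the "No cluster conf found in /etc/ceph with fsid ..." error earlier in the thread, a sketch of the three places the fsid can be compared (ceph_fsid is the copy written into each OSD data directory when the disk was prepared; paths are the Ceph defaults):

# grep fsid /etc/ceph/ceph.conf                 # fsid the config file declares
# ceph fsid                                     # fsid the monitors report
# cat /var/lib/ceph/osd/ceph-0/ceph_fsid        # fsid stamped on the OSD data directory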
On Tue, Nov 29, 2016 at 7:27 PM, John Petrini <jpetrini@xxxxxxxxxxxx> wrote:
What command are you using to start your OSDs?
On Tue, Nov 29, 2016 at 7:19 PM, Mike Jacobacci <mikej@xxxxxxxxxx> wrote:
I was able to bring the OSDs up by looking at my other OSD node, which has the exact same hardware/disks, and finding out how the disks map. But I still can't start any of the ceph-disk@dev-sd* services... When I first installed the cluster and got the OSDs up, I had to run the following:

# sgdisk -t 1:45b0969e-9b03-4f30-b4c6-b4b80ceff106 /dev/sdb
# sgdisk -t 2:45b0969e-9b03-4f30-b4c6-b4b80ceff106 /dev/sdb
# sgdisk -t 3:45b0969e-9b03-4f30-b4c6-b4b80ceff106 /dev/sdb
# sgdisk -t 4:45b0969e-9b03-4f30-b4c6-b4b80ceff106 /dev/sdb
# sgdisk -t 5:45b0969e-9b03-4f30-b4c6-b4b80ceff106 /dev/sdb
# sgdisk -t 1:45b0969e-9b03-4f30-b4c6-b4b80ceff106 /dev/sdc
# sgdisk -t 2:45b0969e-9b03-4f30-b4c6-b4b80ceff106 /dev/sdc
# sgdisk -t 3:45b0969e-9b03-4f30-b4c6-b4b80ceff106 /dev/sdc
# sgdisk -t 4:45b0969e-9b03-4f30-b4c6-b4b80ceff106 /dev/sdc
# sgdisk -t 5:45b0969e-9b03-4f30-b4c6-b4b80ceff106 /dev/sdc
Do I need to run that again?
Cheers,
Mike
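Before re-running those, here is a sketch of how the current type code on a journal partition could be checked first (sgdisk -i prints a single partition's GUID code; 45B0969E-9B03-4F30-B4C6-B4B80CEFF106 is the "ceph journal" type used in the commands above):

# sgdisk -i 1 /dev/sdb       # "Partition GUID code" should already show 45B0969E-9B03-4F30-B4C6-B4B80CEFF106
# sgdisk -i 1 /dev/sdc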
On Tue, Nov 29, 2016 at 4:13 PM, Sean Redmond <sean.redmond1@xxxxxxxxx> wrote:
Normally they mount based on the GPT label. If that's not working, you can mount the disk under /mnt and then cat the file called whoami to find out the OSD number.
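For example, a sketch against one of the "ceph data" partitions from the ceph-disk list output above (/dev/sdd1 here is just an example; substitute whichever partition is unmounted):

# mount /dev/sdd1 /mnt       # any currently unmounted "ceph data" partition
# cat /mnt/whoami            # prints the OSD id, e.g. 0 -> ceph-osd@0
# cat /mnt/ceph_fsid         # the cluster fsid this OSD was prepared for
# umount /mnt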
On 29 Nov 2016 23:56, "Mike Jacobacci" <mikej@xxxxxxxxxx> wrote:
OK, I am in some trouble now and would love some help! After updating, none of the OSDs on the node will come back up:

● ceph-disk@dev-sdb1.service loaded failed failed Ceph disk activation: /dev/sdb1
● ceph-disk@dev-sdb2.service loaded failed failed Ceph disk activation: /dev/sdb2
● ceph-disk@dev-sdb3.service loaded failed failed Ceph disk activation: /dev/sdb3
● ceph-disk@dev-sdb4.service loaded failed failed Ceph disk activation: /dev/sdb4
● ceph-disk@dev-sdb5.service loaded failed failed Ceph disk activation: /dev/sdb5
● ceph-disk@dev-sdc1.service loaded failed failed Ceph disk activation: /dev/sdc1
● ceph-disk@dev-sdc2.service loaded failed failed Ceph disk activation: /dev/sdc2
● ceph-disk@dev-sdc3.service loaded failed failed Ceph disk activation: /dev/sdc3
● ceph-disk@dev-sdc4.service loaded failed failed Ceph disk activation: /dev/sdc4
● ceph-disk@dev-sdc5.service loaded failed failed Ceph disk activation: /dev/sdc5
● ceph-disk@dev-sdd1.service loaded failed failed Ceph disk activation: /dev/sdd1
● ceph-disk@dev-sde1.service loaded failed failed Ceph disk activation: /dev/sde1
● ceph-disk@dev-sdf1.service loaded failed failed Ceph disk activation: /dev/sdf1
● ceph-disk@dev-sdg1.service loaded failed failed Ceph disk activation: /dev/sdg1
● ceph-disk@dev-sdh1.service loaded failed failed Ceph disk activation: /dev/sdh1
● ceph-disk@dev-sdi1.service loaded failed failed Ceph disk activation: /dev/sdi1
● ceph-disk@dev-sdj1.service loaded failed failed Ceph disk activation: /dev/sdj1
● ceph-disk@dev-sdk1.service loaded failed failed Ceph disk activation: /dev/sdk1
● ceph-disk@dev-sdl1.service loaded failed failed Ceph disk activation: /dev/sdl1
● ceph-disk@dev-sdm1.service loaded failed failed Ceph disk activation: /dev/sdm1
● ceph-osd@0.service loaded failed failed Ceph object storage daemon
● ceph-osd@1.service loaded failed failed Ceph object storage daemon
● ceph-osd@2.service loaded failed failed Ceph object storage daemon
● ceph-osd@3.service loaded failed failed Ceph object storage daemon
● ceph-osd@4.service loaded failed failed Ceph object storage daemon
● ceph-osd@5.service loaded failed failed Ceph object storage daemon
● ceph-osd@6.service loaded failed failed Ceph object storage daemon
● ceph-osd@7.service loaded failed failed Ceph object storage daemon
● ceph-osd@8.service loaded failed failed Ceph object storage daemon
● ceph-osd@9.service loaded failed failed Ceph object storage daemon

I did some searching and saw that the issue is that the disks aren't mounting... My question is how can I mount them correctly again (note sdb and sdc are SSDs for cache)? I am not sure which disk maps to ceph-osd@0 and so on. Also, can I add them to /etc/fstab as a workaround?

Cheers,
Mike

On Tue, Nov 29, 2016 at 10:41 AM, Mike Jacobacci <mikej@xxxxxxxxxx> wrote:
Hello,

I would like to install OS updates on the Ceph cluster and activate a second 10Gb port on the OSD nodes, so I wanted to verify the correct steps to perform maintenance on the cluster. We are only using RBD to back our XenServer VMs at this point, and our cluster consists of 3 OSD nodes, 3 mon nodes and 1 admin node... Would these be the correct steps?

1. Shut down VMs?
2. Run "ceph osd set noout" on the admin node.
3. Install updates on each monitor node and reboot one at a time.
4. Install updates on the OSD nodes and activate the second 10Gb port, rebooting one OSD node at a time.
5. Once all nodes are back up, run "ceph osd unset noout".
6. Bring VMs back online.

Does this sound correct?

Cheers,
Mike
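As a sketch, the flag handling in steps 2 and 5 would look like this from the admin node (the updates and reboots themselves are done per node, waiting for the cluster to return to HEALTH_OK before moving on):

# ceph osd set noout        # stop CRUSH from marking the rebooting OSDs out
# ceph -s                   # should show "noout flag(s) set" before the first reboot
# ceph osd unset noout      # after the last node is back and healthy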
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com