Re: osds fails to start with mismatch in id

Hi Ramakrishna,

 

We use the physical path to each disk (it contains the serial number) to avoid complexity and wrong mappings. This path never changes:

/etc/ceph/ceph.conf

                [osd.16]

                devs = /dev/disk/by-id/scsi-SATA_ST4000NM0033-9Z_Z1Z0SDCY-part1

                osd_journal = /dev/disk/by-id/scsi-SATA_INTEL_SSDSC2BA1BTTV330609AU100FGN-part1

                ...

 

regards

Danny

 

 

 

From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf Of Irek Fasikhov
Sent: Tuesday, November 11, 2014 6:36 AM
To: Ramakrishna Nishtala (rnishtal); Gregory Farnum
Cc: ceph-users@xxxxxxxx
Subject: Re: [ceph-users] osds fails to start with mismatch in id

 

Hi, Ramakrishna.

I think you understand what the problem is:

[ceph@ceph05 ~]$ cat /var/lib/ceph/osd/ceph-56/whoami

56

[ceph@ceph05 ~]$ cat /var/lib/ceph/osd/ceph-57/whoami

57
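
The same check can be run over every OSD directory at once. Below is a minimal Python sketch, assuming the default layout /var/lib/ceph/osd/ceph-<id> seen in the commands above. The function name find_whoami_mismatches is hypothetical and not part of Ceph.

```python
import os
import re

def find_whoami_mismatches(osd_root="/var/lib/ceph/osd"):
    """Return (dir_name, id_from_dir_name, id_from_whoami) for each OSD
    directory whose whoami file disagrees with the directory name."""
    mismatches = []
    if not os.path.isdir(osd_root):
        return mismatches  # no OSD directories on this host
    for name in sorted(os.listdir(osd_root)):
        match = re.fullmatch(r"ceph-(\d+)", name)
        if not match:
            continue  # not an OSD data directory in the assumed layout
        whoami_path = os.path.join(osd_root, name, "whoami")
        try:
            with open(whoami_path) as f:
                whoami = f.read().strip()
        except OSError:
            continue  # nothing mounted here, or the file is unreadable
        if whoami != match.group(1):
            mismatches.append((name, match.group(1), whoami))
    return mismatches

if __name__ == "__main__":
    for name, expected, actual in find_whoami_mismatches():
        print(f"{name}: directory says {expected}, whoami says {actual}")
```

Any line printed would point to a data partition mounted under the wrong ceph-<id> directory, which is the situation the "OSD id X != my id Y" messages describe.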

 

 

Tue Nov 11 2014 at 6:01:40, Ramakrishna Nishtala (rnishtal) <rnishtal@xxxxxxxxx>:

Hi Greg,

Thanks for the pointer. I think you are right. The full story is like this.

 

After installation, everything works fine until I reboot. I can see udevadm being triggered in the logs, but the devices do not come up after the reboot. This is exactly the issue described in http://tracker.ceph.com/issues/5194, but according to the ticket it was fixed a while back.

As a workaround, I copied the contents of /proc/mounts to fstab, and that is where I ran into the issue.

 

Following your suggestion, I defined the mounts by UUID in fstab, but hit a similar problem.

blkid.tab now lives on tmpfs and is not consistent even after running blkid explicitly to get the UUIDs. That is in line with the comments in ceph-disk.

 

I decided to reinstall, dd'd the partitions, zapped the disks, etc. It did not help. Strangely, the links below in /dev/disk/by-uuid, /dev/disk/by-partuuid, etc. change across reboots.

 

Before reboot

lrwxrwxrwx 1 root root 10 Nov 10 06:31 11aca3e2-a9d5-4bcc-a5b0-441c53d473b6 -> ../../sdd2

lrwxrwxrwx 1 root root 10 Nov 10 06:31 89594989-90cb-4144-ac99-0ffd6a04146e -> ../../sde2

lrwxrwxrwx 1 root root 10 Nov 10 06:31 c17fe791-5525-4b09-92c4-f90eaaf80dc6 -> ../../sda2

lrwxrwxrwx 1 root root 10 Nov 10 06:31 c57541a1-6820-44a8-943f-94d68b4b03d4 -> ../../sdc2

lrwxrwxrwx 1 root root 10 Nov 10 06:31 da7030dd-712e-45e4-8d89-6e795d9f8011 -> ../../sdb2

 

After reboot

lrwxrwxrwx 1 root root 10 Nov 10 09:50 11aca3e2-a9d5-4bcc-a5b0-441c53d473b6 -> ../../sdd2

lrwxrwxrwx 1 root root 10 Nov 10 09:50 89594989-90cb-4144-ac99-0ffd6a04146e -> ../../sde2

lrwxrwxrwx 1 root root 10 Nov 10 09:50 c17fe791-5525-4b09-92c4-f90eaaf80dc6 -> ../../sda2

lrwxrwxrwx 1 root root 10 Nov 10 09:50 c57541a1-6820-44a8-943f-94d68b4b03d4 -> ../../sdb2

lrwxrwxrwx 1 root root 10 Nov 10 09:50 da7030dd-712e-45e4-8d89-6e795d9f8011 -> ../../sdh2

 

Essentially, the changes are sdb2 -> sdh2 and sdc2 -> sdb2. In fact, I had not partitioned sdh at all before the test. Probably the only difference from the standard procedure is that I pre-created the journal and data partitions with parted.
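
The before/after comparison can also be done mechanically. Here is a small Python sketch that parses `ls -l` style lines and reports the UUIDs whose target device changed. The helper names parse_links and changed_links are hypothetical, and the sample data is copied from the two listings above.

```python
def parse_links(listing):
    """Map link name -> target device name from `ls -l` style lines."""
    links = {}
    for line in listing.splitlines():
        if " -> " not in line:
            continue  # skip blank lines and anything that is not a symlink
        left, target = line.rsplit(" -> ", 1)
        name = left.split()[-1]                           # last field before the arrow
        links[name] = target.strip().rsplit("/", 1)[-1]   # "../../sdd2" -> "sdd2"
    return links

def changed_links(before, after):
    """Return {uuid: (old_device, new_device)} for links that moved."""
    old, new = parse_links(before), parse_links(after)
    return {u: (old[u], new[u]) for u in old if u in new and old[u] != new[u]}

BEFORE = """\
lrwxrwxrwx 1 root root 10 Nov 10 06:31 11aca3e2-a9d5-4bcc-a5b0-441c53d473b6 -> ../../sdd2
lrwxrwxrwx 1 root root 10 Nov 10 06:31 89594989-90cb-4144-ac99-0ffd6a04146e -> ../../sde2
lrwxrwxrwx 1 root root 10 Nov 10 06:31 c17fe791-5525-4b09-92c4-f90eaaf80dc6 -> ../../sda2
lrwxrwxrwx 1 root root 10 Nov 10 06:31 c57541a1-6820-44a8-943f-94d68b4b03d4 -> ../../sdc2
lrwxrwxrwx 1 root root 10 Nov 10 06:31 da7030dd-712e-45e4-8d89-6e795d9f8011 -> ../../sdb2
"""

AFTER = """\
lrwxrwxrwx 1 root root 10 Nov 10 09:50 11aca3e2-a9d5-4bcc-a5b0-441c53d473b6 -> ../../sdd2
lrwxrwxrwx 1 root root 10 Nov 10 09:50 89594989-90cb-4144-ac99-0ffd6a04146e -> ../../sde2
lrwxrwxrwx 1 root root 10 Nov 10 09:50 c17fe791-5525-4b09-92c4-f90eaaf80dc6 -> ../../sda2
lrwxrwxrwx 1 root root 10 Nov 10 09:50 c57541a1-6820-44a8-943f-94d68b4b03d4 -> ../../sdb2
lrwxrwxrwx 1 root root 10 Nov 10 09:50 da7030dd-712e-45e4-8d89-6e795d9f8011 -> ../../sdh2
"""

if __name__ == "__main__":
    for uuid, (was, now) in changed_links(BEFORE, AFTER).items():
        print(f"{uuid}: {was} -> {now}")
```

Note that the UUIDs themselves stay the same in both listings; only the kernel device names they point to differ.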

 

The OSD rules in /lib/udev/rules.d match four different partition GUID codes:

"45b0969e-9b03-4f30-b4c6-5ec00ceff106",

"45b0969e-9b03-4f30-b4c6-b4b80ceff106",

"4fbd7e29-9d25-41b8-afd0-062c0ceff05d",

"4fbd7e29-9d25-41b8-afd0-5ec00ceff05d",

 

But all my partitions, journal and data, have ebd0a0a2-b9e5-4433-87c0-68b6b72699c7 as their partition GUID code.
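
To make the mismatch explicit, here is a short Python sketch that checks a partition type GUID against the four codes listed above. The set contents are copied from this message. The names CEPH_UDEV_TYPE_GUIDS and matches_ceph_udev_rules are hypothetical, and the sketch assumes the udev rules match on exactly these four values and nothing else.

```python
# The four partition GUID codes listed in the udev OSD rules (from this message).
CEPH_UDEV_TYPE_GUIDS = {
    "45b0969e-9b03-4f30-b4c6-5ec00ceff106",
    "45b0969e-9b03-4f30-b4c6-b4b80ceff106",
    "4fbd7e29-9d25-41b8-afd0-062c0ceff05d",
    "4fbd7e29-9d25-41b8-afd0-5ec00ceff05d",
}

def matches_ceph_udev_rules(type_guid):
    """True if the partition type GUID is one the OSD udev rules look for.
    Comparison is case-insensitive, since tools print GUIDs in either case."""
    return type_guid.strip().lower() in CEPH_UDEV_TYPE_GUIDS

if __name__ == "__main__":
    # The GUID code reported on the pre-created journal/data partitions.
    print(matches_ceph_udev_rules("ebd0a0a2-b9e5-4433-87c0-68b6b72699c7"))
```

For the partitions created with parted this prints False. If the rules do key on the type code, that would be consistent with udev not activating the OSDs after a reboot.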

 

Appreciate any help.

 

Regards,

 

Rama

=====

-----Original Message-----
From: Gregory Farnum [mailto:greg@xxxxxxxxxxx]
Sent: Sunday, November 09, 2014 3:36 PM
To: Ramakrishna Nishtala (rnishtal)
Cc: ceph-users@xxxxxxxx
Subject: Re: [ceph-users] osds fails to start with mismatch in id

 

On Sun, Nov 9, 2014 at 3:21 PM, Ramakrishna Nishtala (rnishtal) <rnishtal@xxxxxxxxx> wrote:

> Hi

> I am on ceph 0.87, RHEL 7

> Out of 60 OSDs, a few start and the rest complain about an ID mismatch, as below.

> 2014-11-09 07:09:55.501177 7f4633e01880 -1 OSD id 56 != my id 53

> 2014-11-09 07:09:55.810048 7f636edf4880 -1 OSD id 57 != my id 54

> 2014-11-09 07:09:56.122957 7f459a766880 -1 OSD id 58 != my id 55

> 2014-11-09 07:09:56.429771 7f87f8e0c880 -1 OSD id 0 != my id 56

> 2014-11-09 07:09:56.741329 7fadd9b91880 -1 OSD id 2 != my id 57

> I found one OSD ID in /var/lib/ceph/cluster-id/keyring. To test this, I corrected it manually and also set authentication to none, but that did not help.

> Any clues on how this can be corrected?

 

It sounds like maybe the symlinks to data and journal aren't matching up with where they're supposed to be. This is usually a result of using unstable /dev links that don't always match to the same physical disks. Have you checked that?

-Greg

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
