Re: ceph-disk prepare : UUID=00000000-0000-0000-0000-000000000000

SCHAER Frederic <frederic.schaer@xxxxxx> · Thu, 9 Oct 2014 13:20:26 +0000

Hi Loic,

Here is an example disk, on a machine that I have left untouched until now:

/dev/sdb :
 /dev/sdb1 ceph data, prepared, cluster ceph, osd.44, journal /dev/sdb2
 /dev/sdb2 ceph journal, for /dev/sdb1

[root@ceph1 ~]# ll /dev/disk/by-partuuid/
total 0
lrwxrwxrwx 1 root root 10 Oct  9 15:09 2c27dbda-fbe3-48d6-80fe-b513e1c11702 -> ../../sdb1
lrwxrwxrwx 1 root root 10 Oct  9 15:09 d2352e3b-f7f2-40c7-8273-8bfa8ab4206a -> ../../sdb2

This is the blkid output :

[root@ceph1 ~]# blkid  /dev/sdb2
[root@ceph1 ~]# blkid  /dev/sdb1
/dev/sdb1: UUID="c8feaaad-bd83-41a3-a82a-0a8727d0b067" TYPE="xfs" PARTLABEL="ceph data" PARTUUID="2c27dbda-fbe3-48d6-80fe-b513e1c11702"

If I run "partx -u /dev/sdb", the filesystem gets activated and the OSD starts.
Sometimes it works without any intervention, but that is the exception.
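
For what it's worth, the workaround could be scripted roughly as below. This is only a sketch under assumptions: the function name refresh_partitions, the PARTX_CMD and BY_PARTUUID_DIR variables and the 10 second default timeout are all hypothetical and not part of ceph-disk; only "partx -u" and the /dev/disk/by-partuuid directory come from what I observed above.

```shell
# Hypothetical helper, not part of ceph-disk: names and defaults are illustrative.
PARTX_CMD="${PARTX_CMD:-partx}"
BY_PARTUUID_DIR="${BY_PARTUUID_DIR:-/dev/disk/by-partuuid}"

# refresh_partitions DEVICE PARTUUID [TIMEOUT_SECONDS]
# Asks the kernel to update its view of DEVICE, then waits until the
# by-partuuid entry for PARTUUID exists. Returns 1 on failure or timeout.
refresh_partitions() {
    local dev="$1" partuuid="$2" timeout="${3:-10}"
    local waited=0

    # "partx -u" updates the partitions already known to the kernel.
    "$PARTX_CMD" -u "$dev" || return 1

    # Poll until udev has created the symlink, or give up after the timeout.
    while [ ! -e "$BY_PARTUUID_DIR/$partuuid" ]; do
        [ "$waited" -ge "$timeout" ] && return 1
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}
```

It would be called as root, for instance as "refresh_partitions /dev/sdb 2c27dbda-fbe3-48d6-80fe-b513e1c11702", before attempting to activate the OSD.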

I modified the udev script this morning, so I can show you the output of what happens when things go wrong: the links are created, but somewhere the UUID is wrongly detected by ceph-osd, as far as I understand:

Thu Oct  9 11:15:13 CEST 2014
+ PARTNO=2
+ NAME=sde2
+ PARENT_NAME=sde
++ /usr/sbin/sgdisk --info=2 /dev/sde
++ grep 'Partition GUID code'
++ awk '{print $4}'
++ tr '[:upper:]' '[:lower:]'
+ ID_PART_ENTRY_TYPE=45b0969e-9b03-4f30-b4c6-b4b80ceff106
+ '[' -z 45b0969e-9b03-4f30-b4c6-b4b80ceff106 ']'
++ /usr/sbin/sgdisk --info=2 /dev/sde
++ grep 'Partition unique GUID'
++ awk '{print $4}'
++ tr '[:upper:]' '[:lower:]'
+ ID_PART_ENTRY_UUID=a9e8d490-82a7-48c1-8ef1-aff92351c69c
+ mkdir -p /dev/disk/by-partuuid
+ ln -sf ../../sde2 /dev/disk/by-partuuid/a9e8d490-82a7-48c1-8ef1-aff92351c69c
+ mkdir -p /dev/disk/by-parttypeuuid
+ ln -sf ../../sde2 /dev/disk/by-parttypeuuid/45b0969e-9b03-4f30-b4c6-b4b80ceff106.a9e8d490-82a7-48c1-8ef1-aff92351c69c
+ case $ID_PART_ENTRY_TYPE in
+ /usr/sbin/ceph-disk -v activate-journal /dev/sde2
INFO:ceph-disk:Running command: /usr/bin/ceph-osd -i 0 --get-journal-uuid --osd-journal /dev/sde2
SG_IO: bad/missing sense data, sb[]:  70 00 05 00 00 00 00 0b 00 00 00 00 20 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
DEBUG:ceph-disk:Journal /dev/sde2 has OSD UUID 00000000-0000-0000-0000-000000000000
INFO:ceph-disk:Running command: /sbin/blkid -p -s TYPE -ovalue -- /dev/disk/by-partuuid/00000000-0000-0000-0000-000000000000
error: /dev/disk/by-partuuid/00000000-0000-0000-0000-000000000000: No such file or directory
ceph-disk: Cannot discover filesystem type: device /dev/disk/by-partuuid/00000000-0000-0000-0000-000000000000: Command '/sbin/blkid' returned non-zero exit status 2
+ exit
+ exec
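
In the trace above, the all-zero UUID is passed straight on to the by-partuuid lookup, which then fails in blkid. A guard along the following lines would at least make the failure explicit. This is a minimal sketch, not ceph-disk code: is_nil_uuid, resolve_data_partition and BY_PARTUUID_DIR are hypothetical names introduced for illustration.

```shell
# Hypothetical guard, not part of ceph-disk: names are illustrative.

# is_nil_uuid UUID: succeeds only if UUID is the all-zero UUID.
is_nil_uuid() {
    [ "$1" = "00000000-0000-0000-0000-000000000000" ]
}

# resolve_data_partition UUID
# Prints the by-partuuid path for UUID, or fails early with a message
# on stderr when the journal reported the all-zero UUID.
resolve_data_partition() {
    local uuid="$1"
    if is_nil_uuid "$uuid"; then
        echo "journal reports the nil OSD UUID, skipping by-partuuid lookup" >&2
        return 1
    fi
    echo "${BY_PARTUUID_DIR:-/dev/disk/by-partuuid}/$uuid"
}
```

With such a check, the script could retry after a partition table refresh instead of calling blkid on a path that cannot exist.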

Regards,

Frederic.

P.S.: in your puppet module, it seems impossible to specify OSD disks by path, e.g.:
ceph::profile::params::osds:
  '/dev/disk/by-path/pci-0000\:0a\:00.0-scsi-0\:2\:':
(I tried without the backslashes too)

-----Original Message-----
From: Loic Dachary [mailto:loic@xxxxxxxxxxx]
Sent: Thursday, 9 October 2014 15:01
To: SCHAER Frederic; ceph-users@xxxxxxxxxxxxxx
Subject: Re: ceph-disk prepare : UUID=00000000-0000-0000-0000-000000000000

Hello,

I'm not familiar with RHEL7 but willing to learn ;-) I recently ran into confusing situations regarding the content of /dev/disk/by-partuuid because partprobe was not called when it should have been (on Ubuntu). On RHEL, kpartx is used instead because partprobe reboots, apparently. What is the content of /dev/disk/by-partuuid on your machine?

ls -l /dev/disk/by-partuuid 

Cheers

On 09/10/2014 12:24, SCHAER Frederic wrote:
> Hi,
>
> I am setting up a test ceph cluster on decommissioned hardware (hence not optimal, I know).
>
> I have installed CentOS 7, installed and set up the ceph mons and OSD machines using puppet, and now I'm trying to add OSDs using the servers' OSD disks, and I have issues (of course ;) )
>
> I used the Ceph RHEL7 RPMs (ceph-0.80.6-0.el7.x86_64).
>
> When I run "ceph-disk prepare" for a disk, most of the time (but not always) the partitions get created but not activated:
>
> [root@ceph4 ~]# ceph-disk list|grep sdh
> WARNING:ceph-disk:Old blkid does not support ID_PART_ENTRY_* fields, trying sgdisk; may not correctly identify ceph volumes with dmcrypt
> /dev/sdh :
> /dev/sdh1 ceph data, prepared, cluster ceph, journal /dev/sdh2
> /dev/sdh2 ceph journal, for /dev/sdh1
>
> I tried to debug the udev rules, thinking they were not being launched to activate the OSD, but they are, and they fail with this error:
>
> + ln -sf ../../sdh2 /dev/disk/by-partuuid/5b3bde8f-ccad-4093-a8a5-ad6413ae8931
> + mkdir -p /dev/disk/by-parttypeuuid
> + ln -sf ../../sdh2 /dev/disk/by-parttypeuuid/45b0969e-9b03-4f30-b4c6-b4b80ceff106.5b3bde8f-ccad-4093-a8a5-ad6413ae8931
> + case $ID_PART_ENTRY_TYPE in
> + /usr/sbin/ceph-disk -v activate-journal /dev/sdh2
> INFO:ceph-disk:Running command: /usr/bin/ceph-osd -i 0 --get-journal-uuid --osd-journal /dev/sdh2
> SG_IO: bad/missing sense data, sb[]:  70 00 05 00 00 00 00 0b 00 00 00 00 20 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> DEBUG:ceph-disk:Journal /dev/sdh2 has OSD UUID 00000000-0000-0000-0000-000000000000
> INFO:ceph-disk:Running command: /sbin/blkid -p -s TYPE -ovalue -- /dev/disk/by-partuuid/00000000-0000-0000-0000-000000000000
> error: /dev/disk/by-partuuid/00000000-0000-0000-0000-000000000000: No such file or directory
> ceph-disk: Cannot discover filesystem type: device /dev/disk/by-partuuid/00000000-0000-0000-0000-000000000000: Command '/sbin/blkid' returned non-zero exit status 2
> + exit
> + exec
>
> You'll notice the zeroed UUID.
>
> Because of this, I looked at the output of ceph-disk prepare, and saw that partx complains at the end (this is the partx -a command):
>
> Warning: The kernel is still using the old partition table.
> The new table will be used at the next reboot.
> The operation has completed successfully.
> partx: /dev/sdh: error adding partitions 1-2
>
> And indeed, running "partx -a /dev/sdh" does not change anything.
> But I just discovered that running "partx -u /dev/sdh" fixes everything?!
> I.e., right after I send this update command to the kernel, my debug logs show that the udev rule does everything fine and the OSD starts up.
>
> I'm therefore wondering what I did wrong.
> Is it CentOS 7 that is misbehaving, or the kernel, or ...?
> Is there any reason why partx -a is used instead of partx -u?
>
> I'd be glad to hear others' advice on this!
>
> Thanks && regards
>
> Frederic Schaer
>
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 

-- 
Loïc Dachary, Artisan Logiciel Libre
