Hello,

> If you manually create your journal partition, you need to specify the correct
> Ceph partition GUID in order for the system and Ceph to identify the partition
> as a Ceph journal and assign the correct ownership and permissions at boot via udev.

In my latest run, I let ceph-ansible create the partitions, and everything seems
to be fine.

> I used something like this to create the partition :
> sudo sgdisk --new=1:0G:15G --typecode=1:45B0969E-9B03-4F30-B4C6-B4B80CEFF106
> --partition-guid=$(uuidgen -r) --mbrtogpt -- /dev/sda
>
> 45B0969E-9B03-4F30-B4C6-B4B80CEFF106 being the Ceph journal GUID. More info on
> GPT GUIDs is available on Wikipedia [1].
>
> I think the issue with the label you had was linked to some bugs in the disk
> initialization process. This was discussed a few weeks back on this mailing list.
>
> [1] https://en.wikipedia.org/wiki/GUID_Partition_Table

That's what I read on the IRC channel; it seems to be a common mistake. Might it
be worth mentioning in the documentation or the FAQ?

Yoann

> On Tue, Mar 8, 2016 at 5:21 PM, Yoann Moulin <yoann.moulin@xxxxxxx> wrote:
>
> > Hello Adrien,
> >
> > > I think I faced the same issue setting up my own cluster. If it is the same,
> > > it's one of the many issues people encounter(ed) during disk initialization.
> > > Could you please give the output of :
> > > - ll /dev/disk/by-partuuid/
> > > - ll /var/lib/ceph/osd/ceph-*
> >
> > Unfortunately, I have already reinstalled my test cluster, but I got some
> > information that might explain this issue.
> >
> > I was creating the journal partitions before running the ansible playbook.
> > Firstly, owner and permissions were not persistent at boot (I had to add udev
> > rules). And I strongly suspect a side effect of not letting ceph-disk create
> > the journal partitions.
> >
> > Yoann
> >
> > On Thu, Mar 3, 2016 at 3:42 PM, Yoann Moulin <yoann.moulin@xxxxxxx> wrote:
> >
> > Hello,
> >
> > I'm (almost) a new user of Ceph (a couple of months). At my university, we
> > started doing some tests with Ceph a couple of months ago.
> >
> > We have 2 clusters. Each cluster has 100 OSDs on 10 servers.
> >
> > Each server has this setup :
> >
> > CPU : 2 x Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz
> > Memory : 128GB
> > OS storage : 2 x SSD 240GB Intel S3500 DC (RAID 1)
> > Journal storage : 2 x SSD 400GB Intel S3300 DC (no RAID)
> > OSD disks : 10 x HGST Ultrastar 7K6000 6TB
> > Network : 1 x 10Gb/s
> > OS : Ubuntu 14.04
> > Ceph version : infernalis 9.2.0
> >
> > One cluster gives access to some users through an S3 gateway (the service is
> > still in beta). We call this cluster "ceph-beta".
> >
> > The other cluster is for our internal needs, to learn more about Ceph. We call
> > this cluster "ceph-test". (Those servers will be integrated into the ceph-beta
> > cluster when we need more space.)
> >
> > We have deployed both clusters with the ceph-ansible playbook [1].
> >
> > Journals are raw partitions on the SSDs (400GB Intel S3300 DC) with no RAID,
> > 5 journal partitions on each SSD.
> >
> > OSD disks are formatted with XFS.
> >
> > 1. https://github.com/ceph/ceph-ansible
> >
> > We have an issue: some OSDs go down and don't start. It seems to be related
> > to the fsid of the journal partition :
> >
> > >     -1> 2016-03-03 14:09:05.422515 7f31118d0940 -1 journal FileJournal::open:
> > > ondisk fsid 00000000-0000-0000-0000-000000000000 doesn't match expected
> > > eeadbce2-f096-4156-ba56-dfc634e59106, invalid (someone else's?) journal
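Side note: when I hit this kind of mismatch, the quick cross-check I would run on
a dead OSD is to compare the journal UUID the OSD expects with the partition its
journal symlink actually points to, plus the partition's GPT type code and its
ownership. This is only a rough sketch, assuming a ceph-disk style data directory;
osd.0 and /dev/sdk are purely placeholder names here:

  # partition UUID the OSD expects its journal to have
  cat /var/lib/ceph/osd/ceph-0/journal_uuid

  # where the journal symlink actually points, and what udev created
  ls -l /var/lib/ceph/osd/ceph-0/journal /dev/disk/by-partuuid/

  # GPT unique GUID and type code of the journal partition
  # (partition 1 of /dev/sdk, as an example)
  sudo sgdisk --info=1 /dev/sdk

  # ownership/permissions udev should have set at boot (ceph:ceph on infernalis)
  ls -l /dev/sdk1

If the symlink points at the wrong partition, or the type code is not the Ceph
journal GUID quoted above, or the device is still owned by root, that could
explain a journal header that does not match what the OSD expects.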
> >
> > In attachment, the full log of one of the dead OSDs.
> >
> > We had this issue with 2 OSDs on the ceph-beta cluster, fixed by removing,
> > zapping and re-adding them.
> >
> > Now we have the same issue on the ceph-test cluster, but on 18 OSDs.
> >
> > Here are the stats of this cluster :
> >
> > > root@icadmin004:~# ceph -s
> > >     cluster 4fb4773c-0873-44ad-a65f-269f01bfcff8
> > >      health HEALTH_WARN
> > >             1024 pgs incomplete
> > >             1024 pgs stuck inactive
> > >             1024 pgs stuck unclean
> > >      monmap e1: 3 mons at {iccluster003=10.90.37.4:6789/0,iccluster014=10.90.37.15:6789/0,iccluster022=10.90.37.23:6789/0}
> > >             election epoch 62, quorum 0,1,2 iccluster003,iccluster014,iccluster022
> > >      osdmap e242: 100 osds: 82 up, 82 in
> > >             flags sortbitwise
> > >       pgmap v469212: 2304 pgs, 10 pools, 2206 bytes data, 181 objects
> > >             4812 MB used, 447 TB / 447 TB avail
> > >                 1280 active+clean
> > >                 1024 creating+incomplete
> >
> > We installed this cluster at the beginning of February. We have not used it
> > at all, apart from troubleshooting an issue with ceph-ansible at the
> > beginning. We did not push any data nor create any pools. What could explain
> > this behaviour ?
> >
> > Thanks for your help
> >
> > Best regards,
> >
> > --
> > Yoann Moulin
> > EPFL IC-IT
>
> --
> Yoann Moulin
> EPFL IC-IT

--
Yoann Moulin
EPFL IC-IT

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com