Hello Adrien,

> I think I faced the same issue setting up my own cluster. If it is the same,
> it's one of the many issues people encounter(ed) during disk initialization.
> Could you please give the output of:
> - ll /dev/disk/by-partuuid/
> - ll /var/lib/ceph/osd/ceph-*

Unfortunately, I have already reinstalled my test cluster, but I did gather
some information that might explain this issue.

I was creating the journal partitions myself before running the ansible
playbook. Firstly, the owner and permissions of those partitions were not
persistent across reboots (I had to add udev rules). Secondly, I strongly
suspect a side effect of not letting ceph-disk create the journal partitions
itself. A few command sketches below, before the quoted message, illustrate
what I mean.

Yoann
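For the ownership problem: infernalis runs the OSDs as the "ceph" user, so the
journal partitions have to be owned by ceph:ceph, and that does not survive a
reboot when the partitions were made by hand. A minimal udev rule along these
lines did the trick for me; the rule file name and the device names /dev/sdb
and /dev/sdc are only examples, adjust them to your journal SSDs:

  # /etc/udev/rules.d/99-ceph-journal.rules (example path)
  # hand every partition of the two journal SSDs to ceph:ceph at boot
  KERNEL=="sdb?", SUBSYSTEM=="block", OWNER="ceph", GROUP="ceph", MODE="0660"
  KERNEL=="sdc?", SUBSYSTEM=="block", OWNER="ceph", GROUP="ceph", MODE="0660"

After adding the rule, "udevadm trigger" (or a reboot) should reapply the
ownership.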
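On why letting ceph-disk create the journal partition matters: as far as I
understand, ceph-disk tags the partition with the dedicated "ceph journal" GPT
type code and records its partuuid, which the udev-based activation then
relies on. Roughly, either of the following should do it (device names are
examples, and please double-check the type code before trusting it):

  # let ceph-disk carve a fresh journal partition on the SSD for this OSD disk
  ceph-disk prepare /dev/sdd /dev/sdb        # <data disk> <journal device>

  # or, for an existing hand-made partition, set the ceph journal type code
  sgdisk --typecode=1:45b0969e-9b03-4f30-b4c6-b4b80ceff106 /dev/sdb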
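For anyone else hitting the "ondisk fsid ... doesn't match expected" error,
this is roughly how I would check which journal partition each OSD actually
points at (it is also what the two "ll" commands above show):

  # the journal symlink and the uuid each OSD expects
  # (journal_uuid may be absent when the journal was made by hand)
  ls -l /var/lib/ceph/osd/ceph-*/journal
  cat /var/lib/ceph/osd/ceph-*/journal_uuid

  # which partuuid maps to which device node
  ls -l /dev/disk/by-partuuid/

  # ceph-disk's own view of the data/journal pairing
  ceph-disk list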
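And for reference, the "removing, zapping and re-adding" mentioned in my mail
below was essentially the usual sequence, something like this (osd.12 and the
device names are only placeholders):

  ceph osd out 12
  stop ceph-osd id=12                        # upstart job on Ubuntu 14.04
  ceph osd crush remove osd.12
  ceph auth del osd.12
  ceph osd rm 12
  ceph-disk zap /dev/sdd                     # wipe the OSD data disk
  ceph-disk prepare /dev/sdd /dev/sdb        # re-add, letting ceph-disk make the journal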
> On Thu, Mar 3, 2016 at 3:42 PM, Yoann Moulin <yoann.moulin@xxxxxxx> wrote:
>
> Hello,
>
> I'm (almost) a new user of ceph (a couple of months). At my university, we
> started doing some tests with ceph a couple of months ago.
>
> We have 2 clusters. Each cluster has 100 OSDs on 10 servers.
>
> Each server has this setup :
>
> CPU             : 2 x Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz
> Memory          : 128GB
> OS storage      : 2 x SSD 240GB Intel S3500 DC (RAID 1)
> Journal storage : 2 x SSD 400GB Intel S3300 DC (no RAID)
> OSD disks       : 10 x HGST Ultrastar 7K6000 6TB
> Network         : 1 x 10Gb/s
> OS              : Ubuntu 14.04
> Ceph version    : infernalis 9.2.0
>
> One cluster gives access to some users through an S3 gateway (the service is
> still in beta). We call this cluster "ceph-beta".
>
> The other cluster is for our internal needs, to learn more about ceph. We
> call it "ceph-test". (Those servers will be integrated into the ceph-beta
> cluster when we need more space.)
>
> We deployed both clusters with the ceph-ansible playbook [1].
>
> Journals are raw partitions on the SSDs (400GB Intel S3300 DC), no RAID,
> with 5 journal partitions per SSD.
>
> OSD disks are formatted with XFS.
>
> 1. https://github.com/ceph/ceph-ansible
>
> We have an issue: some OSDs go down and don't start. It seems to be related
> to the fsid of the journal partition :
>
> -1> 2016-03-03 14:09:05.422515 7f31118d0940 -1 journal FileJournal::open:
> ondisk fsid 00000000-0000-0000-0000-000000000000 doesn't match expected
> eeadbce2-f096-4156-ba56-dfc634e59106, invalid (someone else's?) journal
>
> The full log of one of the dead OSDs is attached.
>
> We had this issue with 2 OSDs on the ceph-beta cluster and fixed it by
> removing, zapping and re-adding them.
>
> Now we have the same issue on the ceph-test cluster, but on 18 OSDs.
>
> Here are the current stats of this cluster :
>
> root@icadmin004:~# ceph -s
>     cluster 4fb4773c-0873-44ad-a65f-269f01bfcff8
>      health HEALTH_WARN
>             1024 pgs incomplete
>             1024 pgs stuck inactive
>             1024 pgs stuck unclean
>      monmap e1: 3 mons at {iccluster003=10.90.37.4:6789/0,iccluster014=10.90.37.15:6789/0,iccluster022=10.90.37.23:6789/0}
>             election epoch 62, quorum 0,1,2 iccluster003,iccluster014,iccluster022
>      osdmap e242: 100 osds: 82 up, 82 in
>             flags sortbitwise
>       pgmap v469212: 2304 pgs, 10 pools, 2206 bytes data, 181 objects
>             4812 MB used, 447 TB / 447 TB avail
>                 1280 active+clean
>                 1024 creating+incomplete
>
> We installed this cluster at the beginning of February and have barely used
> it, except at the beginning to troubleshoot an issue with ceph-ansible. We
> did not push any data nor create any pools. What could explain this
> behaviour ?
>
> Thanks for your help
>
> Best regards,
>
> --
> Yoann Moulin
> EPFL IC-IT
>

--
Yoann Moulin
EPFL IC-IT

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com