On Thu, 8 Aug 2013, Joao Pedras wrote:
> Let me just clarify... the prepare process created all 10 partitions in sdg;
> the thing is that only 2 (sdg1, sdg2) would be present in /dev. The partx
> bit is just a hack as I am not familiar with the entire sequence. Initially
> I was deploying this test cluster on 5 nodes, each with 10 spinners, 1 OS
> spinner, 1 ssd for journal. *All* nodes would only bring up the first 2
> osds.
>
> From the start the partitions for journals are there:
>
> ~]# parted /dev/sdg
> GNU Parted 2.1
> Using /dev/sdg
> Welcome to GNU Parted! Type 'help' to view a list of commands.
> (parted) p
> Model: ATA Samsung SSD 840 (scsi)
> Disk /dev/sdg: 512GB
> Sector size (logical/physical): 512B/512B
> Partition Table: gpt
>
> Number  Start   End     Size    File system  Name          Flags
>  1      1049kB  4295MB  4294MB               ceph journal
>  2      4296MB  8590MB  4294MB               ceph journal
>  3      8591MB  12.9GB  4294MB               ceph journal
>  4      12.9GB  17.2GB  4294MB               ceph journal
>  5      17.2GB  21.5GB  4294MB               ceph journal
>  6      21.5GB  25.8GB  4294MB               ceph journal
>  7      25.8GB  30.1GB  4294MB               ceph journal
>  8      30.1GB  34.4GB  4294MB               ceph journal
>  9      34.4GB  38.7GB  4294MB               ceph journal
> 10      38.7GB  42.9GB  4294MB               ceph journal
>
> After partx all the entries show up under /dev and I have been able to
> install the cluster successfully.

This really seems like something that udev should be doing.

I think the next step would be to reproduce the problem directly, by wiping
the partition table (ceph-disk zap /dev/sdg) and running the sgdisk commands
to create the partitions directly from the command line, and then verifying
that the /dev entries are (not) present.  It may be that our ugly
ceph-disk-udev helper is throwing a wrench in things, but I'm not sure
offhand how that would be.  Once you have a sequence that reproduces the
problem, though, we can experiment (e.g. by disabling the ceph helper to
rule that out).

sage
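For concreteness, a rough sketch of that reproduction sequence follows. The
sgdisk arguments are an assumption modeled on what ceph-disk issues for a
journal partition (the size, name, and type GUID may differ from what your
ceph-disk version actually uses), and the udev rule path at the end depends
on the packaging:

  # WARNING: destroys everything on /dev/sdg
  ceph-disk zap /dev/sdg

  # Create one 4 GB "ceph journal" partition by hand, roughly the way
  # ceph-disk prepare would.  The typecode below is the commonly used
  # ceph journal partition GUID; verify it against your ceph-disk.
  sgdisk --new=1:0:+4G \
         --change-name=1:'ceph journal' \
         --typecode=1:45b0969e-9b03-4f30-b4c6-b4b80ceff106 \
         -- /dev/sdg

  # Let udev finish processing the change event, then see whether the
  # kernel and udev actually created the device node.
  udevadm settle
  grep sdg /proc/partitions
  ls -l /dev/sdg*

  # Workaround from earlier in the thread if /dev/sdg1 is missing:
  partx -a /dev/sdg     # or: partprobe /dev/sdg

  # To rule the ceph udev helper in or out, temporarily move the ceph
  # udev rule aside (path is an assumption; check your package) and
  # repeat the sgdisk step:
  #   mv /lib/udev/rules.d/95-ceph-osd.rules /root/
  #   udevadm control --reload-rules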
> The only weirdness happened with only one node. Not everything was entirely
> active+clean. That got resolved after I added the 2nd node.
>
> At the moment with 3 nodes:
> 2013-08-08 17:38:38.328991 mon.0 [INF] pgmap v412: 192 pgs: 192
> active+clean; 9518 bytes data, 1153 MB used, 83793 GB / 83794 GB avail
>
> Thanks,
>
> On Thu, Aug 8, 2013 at 8:17 AM, Sage Weil <sage@xxxxxxxxxxx> wrote:
> On Wed, 7 Aug 2013, Tren Blackburn wrote:
> > On Tue, Aug 6, 2013 at 11:14 AM, Joao Pedras <jppedras@xxxxxxxxx> wrote:
> > Greetings all.
> > I am installing a test cluster using one ssd (/dev/sdg) to hold the
> > journals. Ceph's version is 0.61.7 and I am using ceph-deploy obtained
> > from ceph's git yesterday. This is on RHEL6.4, fresh install.
> >
> > When preparing the first 2 drives, sda and sdb, all goes well and the
> > journals get created in sdg1 and sdg2:
> >
> > $> ceph-deploy osd prepare ceph00:sda:sdg ceph00:sdb:sdg
> > [ceph_deploy.osd][DEBUG ] Preparing cluster ceph disks
> > ceph00:/dev/sda:/dev/sdg ceph00:/dev/sdb:/dev/sdg
> > [ceph_deploy.osd][DEBUG ] Deploying osd to ceph00
> > [ceph_deploy.osd][DEBUG ] Host ceph00 is now ready for osd use.
> > [ceph_deploy.osd][DEBUG ] Preparing host ceph00 disk /dev/sda journal
> > /dev/sdg activate False
> > [ceph_deploy.osd][DEBUG ] Preparing host ceph00 disk /dev/sdb journal
> > /dev/sdg activate False
> >
> > When preparing sdc or any disk after the first 2 I get the following
> > in that osd's log but no errors on ceph-deploy:
> >
> > # tail -f /var/log/ceph/ceph-osd.2.log
> > 2013-08-06 10:51:36.655053 7f5ba701a780  0 ceph version 0.61.7
> > (8f010aff684e820ecc837c25ac77c7a05d7191ff), process ceph-osd, pid 11596
> > 2013-08-06 10:51:36.658671 7f5ba701a780  1
> > filestore(/var/lib/ceph/tmp/mnt.i2NK47) mkfs in /var/lib/ceph/tmp/mnt.i2NK47
> > 2013-08-06 10:51:36.658697 7f5ba701a780  1
> > filestore(/var/lib/ceph/tmp/mnt.i2NK47) mkfs fsid is already set to
> > 5d1beb09-1f80-421d-a88c-57789e2fc33e
> > 2013-08-06 10:51:36.813783 7f5ba701a780  1
> > filestore(/var/lib/ceph/tmp/mnt.i2NK47) leveldb db exists/created
> > 2013-08-06 10:51:36.813964 7f5ba701a780 -1 journal FileJournal::_open:
> > disabling aio for non-block journal. Use journal_force_aio to force
> > use of aio anyway
> > 2013-08-06 10:51:36.813999 7f5ba701a780  1 journal _open
> > /var/lib/ceph/tmp/mnt.i2NK47/journal fd 10: 0 bytes, block size 4096
> > bytes, directio = 1, aio = 0
> > 2013-08-06 10:51:36.814035 7f5ba701a780 -1 journal check: ondisk fsid
> > 00000000-0000-0000-0000-000000000000 doesn't match expected
> > 5d1beb09-1f80-421d-a88c-57789e2fc33e, invalid (someone else's?) journal
> > 2013-08-06 10:51:36.814093 7f5ba701a780 -1
> > filestore(/var/lib/ceph/tmp/mnt.i2NK47) mkjournal error creating
> > journal on /var/lib/ceph/tmp/mnt.i2NK47/journal: (22) Invalid argument
> > 2013-08-06 10:51:36.814125 7f5ba701a780 -1 OSD::mkfs: FileStore::mkfs
> > failed with error -22
> > 2013-08-06 10:51:36.814185 7f5ba701a780 -1 ** ERROR: error creating
> > empty object store in /var/lib/ceph/tmp/mnt.i2NK47: (22) Invalid argument
> >
> > I have cleaned the disks with dd, zapped them and so forth but this
> > always occurs. If doing sdc/sdd first, for example, then sda or
> > whatever follows fails with similar errors.
> >
> > Does anyone have any insight on this issue?
>
> Very strange!
>
> What does the partition table look like at this point?  Does the journal
> symlink in the osd data directory point to the right partition/device on
> the failing osd?
>
> sage
>
> --
> Joao Pedras
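As a footnote to the journal-symlink question quoted above, a minimal sketch
of how one might check it on the failing osd. The paths are illustrative: the
temporary mount suffix (mnt.i2NK47 in the log above) is random per prepare
run, and whether the symlink uses /dev/disk/by-partuuid depends on the
ceph-disk version:

  # See where the journal symlink in the (temporarily mounted) osd data
  # directory points; the mnt.* suffix varies per prepare run.
  ls -l /var/lib/ceph/tmp/mnt.*/journal
  readlink -f /var/lib/ceph/tmp/mnt.*/journal

  # If it is a /dev/disk/by-partuuid/... link, compare the target against
  # the "Partition unique GUID" of the journal partition it should use
  # (partition 3 here is just an example for the third osd):
  sgdisk -i 3 /dev/sdg
  ls -l /dev/disk/by-partuuid/ 2>/dev/null

  # A symlink target that does not exist (as with sdg3..sdg10 missing
  # from /dev in this thread) would leave the osd without a valid block
  # journal, which is consistent with the "non-block journal" and
  # "ondisk fsid ... doesn't match expected" messages in the log above.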