I might be able to give that a shot tomorrow as I will probably reinstall this set.
On Thu, Aug 8, 2013 at 6:19 PM, Sage Weil <sage@xxxxxxxxxxx> wrote:
On Thu, 8 Aug 2013, Joao Pedras wrote:
> Let me just clarify... the prepare process created all 10 partitions in sdg;
> the thing is that only 2 (sdg1, sdg2) would be present in /dev. The partx
> bit is just a hack, as I am not familiar with the entire sequence. Initially
> I was deploying this test cluster on 5 nodes, each with 10 spinners, 1 OS
> spinner, and 1 ssd for journals. *All* nodes would only bring up the first 2
> osds.
>
> From the start the partitions for journals are there:
> ~]# parted /dev/sdg
> GNU Parted 2.1
> Using /dev/sdg
> Welcome to GNU Parted! Type 'help' to view a list of commands.
> (parted) p
> Model: ATA Samsung SSD 840 (scsi)
> Disk /dev/sdg: 512GB
> Sector size (logical/physical): 512B/512B
> Partition Table: gpt
>
> Number  Start   End     Size    File system  Name          Flags
>  1      1049kB  4295MB  4294MB               ceph journal
>  2      4296MB  8590MB  4294MB               ceph journal
>  3      8591MB  12.9GB  4294MB               ceph journal
>  4      12.9GB  17.2GB  4294MB               ceph journal
>  5      17.2GB  21.5GB  4294MB               ceph journal
>  6      21.5GB  25.8GB  4294MB               ceph journal
>  7      25.8GB  30.1GB  4294MB               ceph journal
>  8      30.1GB  34.4GB  4294MB               ceph journal
>  9      34.4GB  38.7GB  4294MB               ceph journal
> 10      38.7GB  42.9GB  4294MB               ceph journal
>
> After partx all the entries show up under /dev and I have been able to
> install the cluster successfully.
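A minimal sketch of that partx workaround, assuming the journal SSD is
/dev/sdg as in this thread:

    # Ask the kernel to add device nodes for any partitions it missed.
    partx -a /dev/sdg
    # (partprobe /dev/sdg is an alternative that re-reads the whole table.)

    # All ten journal partitions should now show up.
    ls /dev/sdg*

partx -a may complain about the partitions that already exist, but it should
be harmless to run while sdg1 and sdg2 are already in use.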
This really seems like something that udev should be doing. I think the
next step would be to reproduce the problem directly, by wiping the
partition table (ceph-disk zap /dev/sdg) and running the sgdisk commands
to create the partitions directly from the command line, and then
verifying that the /dev entries are (not) present.

It may be that our ugly ceph-disk-udev helper is throwing a wrench in
things, but I'm not sure offhand how that would be. Once you have a
sequence that reproduces the problem, though, we can experiment (by e.g.
disabling the ceph helper to rule that out).
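A rough sketch of that reproduction sequence, assuming /dev/sdg is the
journal SSD and using hand-picked 4 GB partitions purely for illustration
(these are not the exact sgdisk invocations ceph-disk runs):

    # Wipe the partition table on the journal SSD.
    ceph-disk zap /dev/sdg

    # Create a few journal-sized partitions by hand.
    sgdisk --new=1:0:+4G --change-name=1:'ceph journal' /dev/sdg
    sgdisk --new=2:0:+4G --change-name=2:'ceph journal' /dev/sdg
    sgdisk --new=3:0:+4G --change-name=3:'ceph journal' /dev/sdg

    # Let udev finish processing the change events, then check the nodes.
    udevadm settle
    ls -l /dev/sdg*

If sdg3 (and anything after it) is again missing from /dev, the problem
reproduces without ceph-deploy in the picture, and temporarily disabling the
ceph udev helper before re-running the same commands would show whether it
is involved.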
sage
>
> The only weirdness happened with one node: not everything was entirely
> active+clean. That got resolved after I added the 2nd node.
>
> At the moment with 3 nodes:
> 2013-08-08 17:38:38.328991 mon.0 [INF] pgmap v412: 192 pgs: 192
> active+clean; 9518 bytes data, 1153 MB used, 83793 GB / 83794 GB avail
>
> Thanks,
>
>
>
> On Thu, Aug 8, 2013 at 8:17 AM, Sage Weil <sage@xxxxxxxxxxx> wrote:
> On Wed, 7 Aug 2013, Tren Blackburn wrote:
> > On Tue, Aug 6, 2013 at 11:14 AM, Joao Pedras <jppedras@xxxxxxxxx> wrote:
> > Greetings all.
> > I am installing a test cluster using one ssd (/dev/sdg) to hold the
> > journals. Ceph's version is 0.61.7 and I am using ceph-deploy obtained
> > from ceph's git yesterday. This is on RHEL6.4, fresh install.
> >
> > When preparing the first 2 drives, sda and sdb, all goes well and the
> > journals get created in sdg1 and sdg2:
> >
> > $> ceph-deploy osd prepare ceph00:sda:sdg ceph00:sdb:sdg
> > [ceph_deploy.osd][DEBUG ] Preparing cluster ceph disks ceph00:/dev/sda:/dev/sdg ceph00:/dev/sdb:/dev/sdg
> > [ceph_deploy.osd][DEBUG ] Deploying osd to ceph00
> > [ceph_deploy.osd][DEBUG ] Host ceph00 is now ready for osd use.
> > [ceph_deploy.osd][DEBUG ] Preparing host ceph00 disk /dev/sda journal /dev/sdg activate False
> > [ceph_deploy.osd][DEBUG ] Preparing host ceph00 disk /dev/sdb journal /dev/sdg activate False
> >
> > When preparing sdc or any disk after the first 2 I get the following
> > in that osd's log but no errors on ceph-deploy:
> >
> > # tail -f /var/log/ceph/ceph-osd.2.log
> > 2013-08-06 10:51:36.655053 7f5ba701a780  0 ceph version 0.61.7 (8f010aff684e820ecc837c25ac77c7a05d7191ff), process ceph-osd, pid 11596
> > 2013-08-06 10:51:36.658671 7f5ba701a780  1 filestore(/var/lib/ceph/tmp/mnt.i2NK47) mkfs in /var/lib/ceph/tmp/mnt.i2NK47
> > 2013-08-06 10:51:36.658697 7f5ba701a780  1 filestore(/var/lib/ceph/tmp/mnt.i2NK47) mkfs fsid is already set to 5d1beb09-1f80-421d-a88c-57789e2fc33e
> > 2013-08-06 10:51:36.813783 7f5ba701a780  1 filestore(/var/lib/ceph/tmp/mnt.i2NK47) leveldb db exists/created
> > 2013-08-06 10:51:36.813964 7f5ba701a780 -1 journal FileJournal::_open: disabling aio for non-block journal. Use journal_force_aio to force use of aio anyway
> > 2013-08-06 10:51:36.813999 7f5ba701a780  1 journal _open /var/lib/ceph/tmp/mnt.i2NK47/journal fd 10: 0 bytes, block size 4096 bytes, directio = 1, aio = 0
> > 2013-08-06 10:51:36.814035 7f5ba701a780 -1 journal check: ondisk fsid 00000000-0000-0000-0000-000000000000 doesn't match expected 5d1beb09-1f80-421d-a88c-57789e2fc33e, invalid (someone else's?) journal
> > 2013-08-06 10:51:36.814093 7f5ba701a780 -1 filestore(/var/lib/ceph/tmp/mnt.i2NK47) mkjournal error creating journal on /var/lib/ceph/tmp/mnt.i2NK47/journal: (22) Invalid argument
> > 2013-08-06 10:51:36.814125 7f5ba701a780 -1 OSD::mkfs: FileStore::mkfs failed with error -22
> > 2013-08-06 10:51:36.814185 7f5ba701a780 -1 ** ERROR: error creating empty object store in /var/lib/ceph/tmp/mnt.i2NK47: (22) Invalid argument
> >
> > I have cleaned the disks with dd, zapped them and so forth but this
> > always occurs. If doing sdc/sdd first, for example, then sda or
> > whatever follows fails with similar errors.
> >
> > Does anyone have any insight on this issue?
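For completeness, a minimal sketch of that kind of cleanup, with /dev/sdc as
the example target (destructive, so the device name obviously needs
double-checking):

    # Zap the GPT/MBR structures on the data disk.
    ceph-disk zap /dev/sdc

    # Or clobber the start of the disk (and its partition table) by hand.
    dd if=/dev/zero of=/dev/sdc bs=1M count=10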
>
> Very strange!
>
> What does the partition table look like at this point? Does the journal
> symlink in the osd data directory point to the right partition/device on
> the failing osd?
>
> sage
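One way to check that on the failing OSD, assuming the data directory is
still mounted under /var/lib/ceph/tmp/ as in the log above:

    # Where does the journal symlink point, and does its target exist?
    ls -l /var/lib/ceph/tmp/mnt.*/journal

    # Compare against the journal SSD's partition table and the
    # by-partuuid links (created by udev or the ceph helper).
    sgdisk --print /dev/sdg
    ls -l /dev/sdg* /dev/disk/by-partuuid/

A symlink pointing at a partition that has no device node would be
consistent with the 'disabling aio for non-block journal ... 0 bytes' and
all-zero fsid messages in the log.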
>
>
>
>
> --
> Joao Pedras
>
>
Joao Pedras