Hi,

Have you ever done a performance comparison between using a journal file
and a journal partition?

Regards,
Leander Yu.

On Tue, Feb 14, 2012 at 8:45 PM, Wido den Hollander <wido@xxxxxxxxx> wrote:
> Hi,
>
> On 02/14/2012 01:39 AM, Paul Pettigrew wrote:
>>
>> G'day all
>>
>> About to commence an R&D eval of the Ceph platform, having been impressed
>> with the momentum achieved over the past 12 months.
>>
>> I have one question re design before rolling out to metal...
>>
>> I will be using 1x SSD drive per storage server node (assume it is
>> /dev/sdb for this discussion), and cannot readily determine the pros and
>> cons of the two methods of using it for the OSD journal, being:
>> #1. place it in the main [osd] stanza and reference the whole drive as a
>> single partition; or
>
> That won't work. If you do that, all OSDs will try to open the journal. The
> journal for each OSD has to be unique.
>
>> #2. partition up the disk, so 1x partition per SATA HDD, and place each
>> partition in the [osd.N] portion
>
> That would be your best option.
>
> I'm doing the same: http://zooi.widodh.nl/ceph/ceph.conf
>
> The VG "data" is placed on an SSD (Intel X25-M).
>
>> So if I were to code #1 in the ceph.conf file, it would be:
>>
>> [osd]
>> osd journal = /dev/sdb
>>
>> Or, #2 would be like:
>>
>> [osd.0]
>> host = ceph1
>> btrfs devs = /dev/sdc
>> osd journal = /dev/sdb5
>> [osd.1]
>> host = ceph1
>> btrfs devs = /dev/sdd
>> osd journal = /dev/sdb6
>> [osd.2]
>> host = ceph1
>> btrfs devs = /dev/sde
>> osd journal = /dev/sdb7
>> [osd.3]
>> host = ceph1
>> btrfs devs = /dev/sdf
>> osd journal = /dev/sdb8
>>
>> I am asking, therefore: is the added work (and constraints) of specifying
>> down to individual partitions per #2 worth it in performance gains? Does it
>> not also have a constraint, in that if I wanted to add more HDDs to the
>> server (we buy 45-bay units, and typically provision HDDs "on demand", i.e.
>> 15x at a time as usage grows), I would have to additionally partition the
>> SSD (taking it offline) - whereas with option #1, I would only have to add
>> more [osd.N] sections (and not have to worry about ending up with 45
>> partitions on the SSD)?
>
> You'd still have to go for #2. However, running 45 OSDs on a single machine
> is a bit tricky imho.
>
> If that machine fails you would lose 45 OSDs at once, which will put a lot
> of stress on the recovery of your cluster.
>
> You'd also need a lot of RAM to accommodate those 45 OSDs, at least 48GB of
> RAM I guess.
>
> A last note: if you use an SSD for your journaling, make sure that you align
> your partitions with the page size of the SSD, otherwise you'd run into the
> write amplification of the SSD, resulting in a performance loss.
>
> Wido
>
>> One final related question: if I were to use the #1 method (which I would
>> prefer if there is no material performance or other reason to use #2), then
>> that SSD disk reference (i.e. "osd journal = /dev/sdb") would have to be
>> identical on all other hardware nodes, yes (I want to use the same
>> ceph.conf file on all servers per the doco recommendations)? What would
>> happen if, for example, the SSD was on /dev/sde on a new node added to the
>> cluster? References to /dev/disk/by-id etc. are clearly no help, so should
>> a symlink be used from the get-go?
>> E.g. something like "ln -s /dev/sdb /srv/ssd" on one box, and
>> "ln -s /dev/sde /srv/ssd" on the other box, so that in the [osd] section
>> we could use this line, which would find the SSD disk on all nodes:
>> "osd journal = /srv/ssd"?
>>
>> Many thanks for any advice provided.
>>
>> Cheers
>>
>> Paul
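
For what it's worth, a middle ground between #1 and #2 is to keep a single
"osd journal" line in the [osd] stanza and let ceph.conf metavariable
expansion ($id, $host, etc.) pick a per-OSD partition. A minimal sketch,
assuming the SSD is GPT-partitioned with partitions labelled journal-0,
journal-1, ... to match the OSD ids (the labels and the by-partlabel path
are assumptions about the local setup, not something from the thread above):

    [osd]
    ; $id expands to the OSD number, so osd.2 opens
    ; /dev/disk/by-partlabel/journal-2, regardless of whether the SSD
    ; enumerates as /dev/sdb or /dev/sde on a given node
    osd journal = /dev/disk/by-partlabel/journal-$id
    osd journal size = 10240    ; MB, used when the journal is created

Each new batch of OSDs then only needs its SSD partition labelled to match,
rather than another hand-edited journal path in every [osd.N] section.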
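
On Wido's alignment note, a rough sketch of carving such journal partitions
out of the SSD on 1 MiB boundaries (the device name, the 10 GiB per-journal
size and the partition names are illustrative only; this replaces whatever
partition table is already on the disk):

    # WARNING: wipes the existing partition table on /dev/sdb
    parted -s /dev/sdb mklabel gpt
    parted -s -a optimal /dev/sdb mkpart journal-0 1MiB 10GiB
    parted -s -a optimal /dev/sdb mkpart journal-1 10GiB 20GiB
    parted -s -a optimal /dev/sdb mkpart journal-2 20GiB 30GiB
    parted -s -a optimal /dev/sdb mkpart journal-3 30GiB 40GiB
    # sanity check: each partition should report "aligned"
    parted /dev/sdb align-check optimal 1
    parted /dev/sdb align-check optimal 2
    parted /dev/sdb align-check optimal 3
    parted /dev/sdb align-check optimal 4

Starting every partition on a 1 MiB boundary keeps it aligned with any
reasonable SSD page or erase-block size, which is what Wido's warning about
write amplification is getting at.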