The rule of thumb is to match the journal throughput to the OSD throughput. I'm seeing ~180MB/s sequential write on my OSDs, and I'm using one 400GB P3700 per six OSDs. The 400GB P3700 yields around 1200MB/s* and has around 1/10th the latency of any SATA SSD I've tested. I put a pair of them in a 12-drive chassis and get excellent performance. One could probably do the same in an 18-drive chassis without any issues, though the failure domain for a journal starts to get pretty large at that point.

I have dozens of the "Fultondale" SSDs deployed and have had zero failures, and endurance is excellent.
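If you want to sanity-check the match for your own hardware, the back-of-envelope I use goes like this (the throughput numbers are from my setup; the sizing rule is the one from the Ceph docs):

  aggregate OSD writes:  6 OSDs x 180 MB/s = 1080 MB/s
                         (vs ~1200 MB/s for the 400GB P3700, so it fits)

  osd journal size >= 2 * (expected throughput * filestore max sync interval)
                    = 2 * (180 MB/s * 5 s) = ~1.8 GB per OSD

In ceph.conf terms, something like the following is comfortable; 10 GB partitions sit far above that ~2 GB minimum and leave the rest of the drive for overprovisioning:

  [osd]
  # value is in MB; 10 GB per OSD leaves headroom if you
  # ever raise the sync interval
  osd journal size = 10240
  # 5 s is the default; a longer interval needs a bigger journal
  filestore max sync interval = 5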
*the larger units yield much better write throughput, but don't make sense financially for journals.

-H

On Mar 16, 2016, at 09:37, Nick Fisk <nick@xxxxxxxxxx> wrote:

>> -----Original Message-----
>> From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf Of
>> Stephen Harker
>> Sent: 16 March 2016 16:22
>> To: ceph-users@xxxxxxxxxxxxxx
>> Subject: Re: SSDs for journals vs SSDs for a cache tier, which is better?
>>
>> On 2016-02-17 11:07, Christian Balzer wrote:
>>
>>> On Wed, 17 Feb 2016 10:04:11 +0100 Piotr Wachowicz wrote:
>>>
>>>>>> Let's consider both cases:
>>>>>> Journals on SSDs - for writes, the write operation returns right
>>>>>> after data lands on the journal's SSDs, but before it's written
>>>>>> to the backing HDD. So, for writes, the SSD journal approach should
>>>>>> be comparable to having an SSD cache tier.
>>>>>
>>>>> Not quite, see below.
>>>>
>>>> Could you elaborate a bit more?
>>>>
>>>> Are you saying that with a journal on an SSD, writes from clients,
>>>> before they can return from the operation to the client, must end up
>>>> on both the SSD (journal) *and* the HDD (the actual data store behind
>>>> that journal)?
>>>
>>> No, your initial statement is correct.
>>>
>>> However, that burst of speed doesn't last indefinitely.
>>>
>>> Aside from the size of the journal (which is incidentally NOT the most
>>> limiting factor), there are various "filestore" parameters in Ceph, in
>>> particular the sync interval ones.
>>> There was a more in-depth explanation of this by a developer on this
>>> ML; try your google-foo.
>>>
>>> For short bursts of activity, the journal helps a LOT.
>>> If you send a huge number of, for example, 4KB writes to your cluster,
>>> the speed will eventually (after a few seconds) drop to what your
>>> backing storage (HDDs) is capable of sustaining.
>>>
>>>>> (Which SSDs do you plan to use anyway?)
>>>>
>>>> Intel DC S3700
>>>
>>> Good choice; with the 200GB model, prefer the 3700 over the 3710
>>> (higher sequential write speed).
>>
>> Hi All,
>>
>> I am looking at using PCI-E SSDs as journals in our (4) Ceph OSD nodes,
>> each of which has 6 4TB SATA drives within. I had my eye on these:
>>
>> 400GB Intel P3500 DC AIC SSD, HHHL PCIe 3.0
>>
>> but reading through this thread, it might be better to go with the P3700
>> given the improved IOPS. So, a couple of questions:
>>
>> * Are the PCI-E versions of these drives different in any other way than
>> the interface?
>
> Yes and no. Internally they are probably not much different, but the
> NVMe/PCIe interface is a lot faster than SATA/SAS, both in terms of
> minimum latency and bandwidth.
>
>> * Would one of these as a journal for 6 4TB OSDs be overkill
>> (connectivity is 10GbE, or will be shortly anyway), or would the SATA
>> S3700 be sufficient?
>
> Again, it depends on your use case. The S3700 may suffer if you are doing
> large sequential writes; it might not have a high enough sequential write
> speed and would become the bottleneck. Six disks could potentially take
> around 500-700MB/s of writes. A P3700 will have enough, and will give
> slightly lower write latency as well, if that is important. You may even
> be able to run more than six OSDs on it if needed.
>
>> Given they're not hot-swappable, it'd be good if they didn't wear out in
>> 6 months too.
>
> They probably won't, unless you are doing some really extreme write
> workloads, and even then I would imagine they would last 1-2 years.
>
>> I realise I've not given you much to go on and I'm Googling around as
>> well; I'm really just asking in case someone has tried this already and
>> has some feedback or advice.
>
> That's OK. I'm currently running 100GB S3700s on the current cluster, and
> the new cluster that's in the planning stages will be using 400GB P3700s.
>
>> Thanks! :)
>>
>> Stephen
>>
>> --
>> Stephen Harker
>> Chief Technology Officer
>> The Positive Internet Company.
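P.S. For anyone weighing SATA options like the S3700 against NVMe: before committing, it's worth running the widely-circulated single-job O_SYNC 4K write test against the bare device, since journal traffic is synchronous writes and some SSDs collapse under exactly that pattern. A sketch (destructive -- only run it against an empty drive, and /dev/sdX is a placeholder):

  fio --filename=/dev/sdX --direct=1 --sync=1 --rw=write --bs=4k \
      --numjobs=1 --iodepth=1 --runtime=60 --time_based \
      --group_reporting --name=journal-test

A drive that can't sustain decent numbers here will bottleneck every write on the OSDs behind it, whatever its datasheet sequential figures say.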