Hi,
One of the benefits of PCIe NVMe is that it does not take a disk slot, resulting in higher density. For example, a 6048R-E1CR36N with 3x PCIe NVMe yields 36 OSDs per server (12 OSDs per NVMe), whereas it yields 30 OSDs per server with SATA SSDs (6 OSDs per SSD).
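A rough sketch of that density arithmetic, assuming a 36-bay chassis in which SATA journal SSDs occupy drive bays and PCIe NVMe cards do not. The function name and parameters are made up for illustration:

```python
import math

def max_osds(bays, osds_per_journal, journal_uses_bay):
    """Largest number of HDD OSDs that fit in the chassis.

    bays: drive bays in the chassis (assumed 36 here)
    osds_per_journal: how many OSDs share one journal device
    journal_uses_bay: True for SATA SSDs, False for PCIe NVMe cards
    """
    if not journal_uses_bay:
        # Journals live on PCIe cards, so every bay can hold an OSD disk.
        return bays
    best = 0
    for osds in range(bays + 1):
        journals = math.ceil(osds / osds_per_journal)
        if osds + journals <= bays:
            best = osds
    return best

print(max_osds(36, 12, journal_uses_bay=False))  # PCIe NVMe journals -> 36
print(max_osds(36, 6, journal_uses_bay=True))    # SATA SSD journals -> 30
```

With SATA journals, 30 OSDs plus 5 SSDs fill 35 bays; a 31st OSD would need a sixth SSD and no longer fit.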
You say that you used 10% of the P3700's endurance in 1 year (7.3PB endurance, so 0.73PB/year), so a 400GB P3600 would last about 3 years at that rate. Maybe good enough until BlueStore is more stable.
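The endurance estimate can be reproduced with a few lines. This is only a sketch: the 10 DWPD figure for the P3700 matches the 7.3PB quoted above, while the 3 DWPD figure for the P3600 and the 5-year rating period are assumptions that should be checked against the datasheets.

```python
def rated_pb_written(capacity_tb, dwpd, rated_years=5):
    """Rated endurance in PB from drive-writes-per-day over the rating period."""
    return capacity_tb * dwpd * 365 * rated_years / 1000.0

def years_of_life(rated_pbw, pb_written_per_year):
    """Years until the rated endurance is consumed at a constant write rate."""
    return rated_pbw / pb_written_per_year

# P3700 400GB at 10 DWPD gives the 7.3PB quoted in the thread.
p3700_pbw = rated_pb_written(0.4, 10)
# 10% of endurance consumed in one year, as reported by the Intel tools.
write_rate = p3700_pbw * 0.10  # about 0.73 PB/year

# ASSUMPTION: P3600 400GB rated at 3 DWPD (about 2.19PB).
p3600_pbw = rated_pb_written(0.4, 3)

print(round(write_rate, 2))                            # PB written per year
print(round(years_of_life(p3600_pbw, write_rate), 1))  # expected P3600 lifetime in years
```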
Cheers,
Maxime
On Thu, 22 Jun 2017 at 03:59 Christian Balzer <chibi@xxxxxxx> wrote:
Hello,
Hmm, gmail client not grokking quoting these days?
On Wed, 21 Jun 2017 20:40:48 -0500 Brady Deetz wrote:
> On Jun 21, 2017 8:15 PM, "Christian Balzer" <chibi@xxxxxxx> wrote:
>
> On Wed, 21 Jun 2017 19:44:08 -0500 Brady Deetz wrote:
>
> > Hello,
> > I'm expanding my 288 OSD, primarily cephfs, cluster by about 16%. I have 12
> > osd nodes with 24 osds each. Each osd node has 2 P3700 400GB NVMe PCIe
> > drives providing 10GB journals for groups of 12 6TB spinning rust drives
> > and 2x lacp 40gbps ethernet.
> >
> > Our hardware provider is recommending that we start deploying P4600 drives
> > in place of our P3700s due to availability.
> >
> Welcome to the club and make sure to express your displeasure about
> Intel's "strategy" to your vendor.
>
> The P4600s are a poor replacement for P3700s and also still just
> "announced" according to ARK.
>
> Are you happy with your current NVMes?
> Firstly as in, what is their wearout, are you expecting them to easily
> survive 5 years at the current rate?
> Secondly, how about speed? With 12 HDDs and 1GB/s write capacity of the
> NVMe I'd expect them to not be a bottleneck in nearly all real life
> situations.
>
> Keep in mind that 1.6TB P4600 is going to last about as long as your 400GB
> P3700, so if wear-out is a concern, don't put more stress on them.
>
>
> Oddly enough, the Intel tools are telling me that we've only used about 10%
> of each drive's endurance over the past year. This honestly surprises me
> due to our workload, but maybe I'm thinking my researchers are doing more
> science than they actually are.
>
That's pretty impressive still, but also lets you do numbers as to what
kind of additional load you _may_ be able to consider, obviously not more
than twice the current amount to stay within 5 years before wearing
them out.
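As a back-of-the-envelope check of that "twice the current amount" figure, here is a minimal sketch. It assumes wear scales linearly with write load and ignores the endurance already consumed; the function name is illustrative only.

```python
def max_load_multiplier(wear_fraction_per_year, target_years):
    """How much the write load could grow while still lasting target_years.

    Assumes wear is proportional to write load and starts from a fresh drive.
    """
    return 1.0 / (wear_fraction_per_year * target_years)

# 10% wear per year, 5-year target lifetime.
print(max_load_multiplier(0.10, 5))
```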
>
> Also the P4600 is only slightly faster in writes than the P3700, so that's
> where putting more workload onto them is going to be a notable issue.
>
> > I've seen some talk on here regarding this, but wanted to throw an idea
> > around. I was okay throwing away 280GB of fast capacity for the purpose of
> > providing reliable journals. But with as much free capacity as we'd have
> > with a 4600, maybe I could use that extra capacity as a cache tier for
> > writes on an rbd ec pool. If I wanted to go that route, I'd probably
> > replace several existing 3700s with 4600s to get additional cache capacity.
> > But, that sounds risky...
> >
> Risky as in high failure domain concentration and as mentioned above a
> cache-tier with obvious inline journals and thus twice the bandwidth needs
> will likely eat into the write speed capacity of the journals.
>
>
> Agreed. On the topic of journals and double bandwidth, am I correct in
> thinking that btrfs (as insane as it may be) does not require double
> bandwidth like xfs? Furthermore with bluestore being close to stable, will
> my architecture need to change?
>
BTRFS at this point is indeed a bit insane, given the current levels of
support, issues (search the ML archives) and future developments.
And you'll still wind up with double writes most likely, IIRC.
These aspects of Bluestore have been discussed here recently, too.
Your SSD/NVMe space requirements will go down, but if you want to have the
same speeds and more importantly low latencies you'll wind up with all
writes going through them again, so endurance wise you're still in that
"Let's make SSDs great again" hellhole.
>
> If (and seems to be a big IF) you can find them, the Samsung PM1725a 1.6TB
> seems to be a) cheaper and b) at 2GB/s write speed more likely to be
> suitable for double duty.
> Similar (slightly better on paper) endurance to the P4600, so keep that
> in mind, too.
>
>
> My vendor is an HPC vendor so /maybe/ they have access to these elusive
> creatures. In which case, how many do you want? Haha
>
I was just looking at availability with a few google searches, our current
needs are amply satisfied with S37xx SSDs, no need for NVMes really.
But as things are going, maybe I'll be forced to Optane and friends simply
by lack of alternatives.
Christian
--
Christian Balzer Network/Systems Engineer
chibi@xxxxxxx Rakuten Communications
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com