Re: Transitioning to Intel P4600 from P3700 Journals

On Jun 21, 2017 8:15 PM, "Christian Balzer" <chibi@xxxxxxx> wrote:
On Wed, 21 Jun 2017 19:44:08 -0500 Brady Deetz wrote:

> Hello,
> I'm expanding my 288-OSD, primarily CephFS cluster by about 16%. I have 12
> OSD nodes with 24 OSDs each. Each OSD node has two P3700 400GB NVMe PCIe
> drives, each providing 10GB journals for a group of 12 6TB spinning-rust
> drives, plus 2x 40Gbps Ethernet in an LACP bond.
>
> Our hardware provider is recommending that we start deploying P4600 drives
> in place of our P3700s due to availability.
>
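For reference, the expansion described above works out roughly as follows. This is a minimal Python sketch that takes the node and drive counts from the message and assumes (my assumption, not stated above) that new capacity is added as whole nodes of the same 24-OSD, 2-NVMe design; `expansion_plan` is a hypothetical helper name.

```python
import math

# Figures from the message above.
NODES = 12
OSDS_PER_NODE = 24
NVME_PER_NODE = 2


def expansion_plan(expansion_pct: float) -> tuple:
    """Return (current OSDs, whole nodes to add, journal NVMe drives to add).

    Assumes new capacity arrives as whole nodes of the existing design.
    """
    current_osds = NODES * OSDS_PER_NODE  # 12 x 24 = 288
    target_new_osds = current_osds * expansion_pct / 100
    # Round up to whole nodes; a partial node is not an option in this sketch.
    new_nodes = math.ceil(target_new_osds / OSDS_PER_NODE)
    return current_osds, new_nodes, new_nodes * NVME_PER_NODE


current, nodes, nvmes = expansion_plan(16)
print(f"{current} OSDs today; add {nodes} nodes and {nvmes} journal NVMe drives")
```

Two such nodes add 48 OSDs (about 16.7%), so four journal NVMes would cover this round.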
Welcome to the club and make sure to express your displeasure about
Intel's "strategy" to your vendor.

The P4600s are a poor replacement for P3700s and also still just
"announced" according to ARK.

Are you happy with your current NVMes?
First, what is their wear-out? Do you expect them to easily
survive 5 years at the current rate?
Second, how about speed? With 12 HDDs and 1GB/s of write capacity per
NVMe, I'd expect them not to be a bottleneck in nearly all real-life
situations.
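The back-of-the-envelope math behind that expectation: one NVMe journaling for 12 HDD OSDs, split evenly, leaves each OSD a journal write share of roughly 83 MB/s. A minimal sketch; the even split is a simplifying assumption, since real traffic is bursty and uneven.

```python
def per_osd_journal_share_mb_s(nvme_write_gb_s: float, osds_per_nvme: int) -> float:
    """Evenly divide an NVMe's sequential write bandwidth among the HDD OSDs it journals for."""
    return nvme_write_gb_s * 1000 / osds_per_nvme  # decimal units: 1 GB = 1000 MB


share = per_osd_journal_share_mb_s(1.0, 12)
print(f"{share:.1f} MB/s per HDD OSD")
```

As long as a spinning OSD under a real, mostly non-sequential workload sustains less than this, the journal is not what limits it.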

Keep in mind that a 1.6TB P4600 is going to last about as long as your 400GB
P3700, so if wear-out is a concern, don't put more stress on them.
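To put numbers on "about as long": rated lifetime writes are capacity × drive writes per day (DWPD) × days in the warranty period. The DWPD figures below (10 for the P3700, 3 for the P4600, both over 5 years) are my recollection of Intel's published ratings, not something from this thread, so check them against ARK before relying on them.

```python
def rated_pbw(capacity_gb: float, dwpd: float, years: float = 5) -> float:
    """Rated lifetime writes in petabytes (decimal) from capacity and DWPD."""
    return capacity_gb * dwpd * 365 * years / 1_000_000


# ASSUMED ratings (verify on Intel ARK): P3700 = 10 DWPD, P4600 = 3 DWPD, 5-year warranty.
p3700_400 = rated_pbw(400, 10)
p4600_1600 = rated_pbw(1600, 3)
print(f"P3700 400GB: {p3700_400:.2f} PB, P4600 1.6TB: {p4600_1600:.2f} PB")
```

Under those assumptions, four times the capacity buys only about 20% more total writes, so any extra write traffic used to fill the spare space shortens the drive's life accordingly.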

Oddly enough, the Intel tools are telling me that we've only used about 10% of each drive's endurance over the past year. This honestly surprises me given our workload, but maybe my researchers aren't doing as much science as I thought.
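A straight-line projection from that figure, as a minimal sketch; it assumes the write rate stays constant, which the expansion or any extra duty such as a cache tier would change.

```python
def years_to_wearout(pct_used: float, years_elapsed: float) -> float:
    """Linearly project the years until 100% of rated endurance is consumed."""
    if pct_used <= 0:
        return float("inf")  # no measurable wear yet
    return 100.0 * years_elapsed / pct_used


print(years_to_wearout(10, 1))  # current rate
print(years_to_wearout(20, 1))  # if the write load doubled
```

At ~10% per year that is about 10 years, comfortably past the 5-year mark; doubling the write load would bring it down to roughly 5.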


Also, the P4600 is only slightly faster at writes than the P3700, so write
bandwidth is where putting more workload onto them will become a notable issue.

> I've seen some talk on here regarding this, but wanted to throw an idea
> around. I was okay throwing away 280GB of fast capacity for the purpose of
> providing reliable journals. But with as much free capacity as we'd have
> with a 4600, maybe I could use that extra capacity as a cache tier for
> writes on an RBD EC pool. If I wanted to go that route, I'd probably
> replace several existing 3700s with 4600s to get additional cache capacity.
> But, that sounds risky...
>
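The capacity side of the idea, sketched under the layout described in this thread (12 journals of 10GB per NVMe; decimal GB). The 280GB matches the figure in the message; the P4600 number assumes the 1.6TB model discussed here. `spare_gb` is a hypothetical helper.

```python
OSDS_PER_NVME = 12
JOURNAL_GB_PER_OSD = 10


def spare_gb(nvme_capacity_gb: int) -> int:
    """Capacity left on a journal NVMe after the FileStore journals are carved out."""
    return nvme_capacity_gb - OSDS_PER_NVME * JOURNAL_GB_PER_OSD


for name, cap in (("P3700 400GB", 400), ("P4600 1.6TB", 1600)):
    print(f"{name}: {spare_gb(cap)} GB spare")
```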
Risky as in concentrating even more into a single failure domain. Also, as
mentioned above, a cache tier, which would obviously have inline journals and
thus need twice the bandwidth, will likely eat into the write capacity the
journals depend on.
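A rough way to see the bandwidth concern: a FileStore cache-tier OSD with its journal on the same NVMe writes every byte twice, once to the journal and once to its data area, and it shares the drive's write bandwidth with the HDD journals. A minimal sketch; the HDD journal load is a hypothetical input, and replication and cache promotion traffic are ignored.

```python
def cache_tier_headroom_gb_s(nvme_write_gb_s: float,
                             hdd_journal_load_gb_s: float,
                             write_amplification: float = 2.0) -> float:
    """Client write rate a colocated cache-tier OSD could absorb on one NVMe.

    write_amplification=2.0 models an inline FileStore journal: each byte is
    written once to the journal and once to the data area of the same device.
    """
    remaining = max(nvme_write_gb_s - hdd_journal_load_gb_s, 0.0)
    return remaining / write_amplification


# Hypothetical: a 1GB/s NVMe whose HDD journals already use 0.5GB/s.
print(cache_tier_headroom_gb_s(1.0, 0.5))
```

In other words, every GB/s of cache-tier writes costs 2GB/s of journal-device bandwidth that the HDD journals can no longer use.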

Agreed. On the topic of journals and double bandwidth, am I correct in thinking that btrfs (as insane as it may be) does not require double bandwidth the way XFS does? Furthermore, with BlueStore close to being stable, will my architecture need to change?


If (and it seems to be a big IF) you can find them, the Samsung PM1725a 1.6TB
seems to be a) cheaper and b) at 2GB/s write speed, more likely to be
suitable for double duty.
Its endurance is similar to (slightly better on paper than) the P4600's, so
keep that in mind, too.

My vendor is an HPC vendor so /maybe/ they have access to these elusive creatures. In which case, how many do you want? Haha 


Christian
--
Christian Balzer        Network/Systems Engineer
chibi@xxxxxxx           Rakuten Communications

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
