Hello,

On Fri, 20 May 2016 10:57:10 -0700 Anthony D'Atri wrote:

> [ too much to quote ]
>
> Dense nodes often work better for object-focused workloads than
> block-focused, the impact of delayed operations is simply speed vs. a
> tenant VM crashing.
>
Especially if they don't have SSD journals or aren't behind a correctly
designed and sized cache-tier.

> Re RAID5 volumes to decrease the number of OSDs: This sort of
> approach is getting increasing attention in that it brings down the OSD
> count, reducing the resource demands of peering, especially during
> storms. It also makes the OSD fillage bell curve narrower.
>
Indeed, that's what I meant by diminishing returns in my response to this
thread.

One word of advice here: I listed only RAID6 (for space) and RAID1 or 10
(for IOPS) for a reason.
Both will give you OSDs that are more or less invulnerable to disk
failures, allowing you to reduce replication to 2 and of course never
having to suffer recovery/backfilling traffic from OSD (HDD) failures
(a minimal pool example is sketched further down).
With RAID5 the chances of a double disk failure, and thus loss of the
OSD, are so high that I would not in good conscience use a replication
lower than 3.

> But one must also consider that the write speed of a RAID5 group is
> that of a single drive due to the parity recalc, and that if one does
> not adjust osd_op_threads and osd_disk_threads, throughput can suffer
> because fewer ops can run across the cluster at the same time.
>
Correct, however a good RAID controller with a large cache (I like Areca)
will improve on that scenario (illustrative thread settings further down).

> Re Intel P3700 NVMe cards, has anyone out there experienced reset issues
> that may be related to workload, kernel version, driver version,
> firmware version, etc? Or even Firefly vs Hammer?
>
Don't have any of those yet (nor likely in the near future), but given
that there were firmware bugs with the 3610s and 3710s that caused bus
resets I'd definitely look into that if I were you:
https://downloadcenter.intel.com/download/23931/Intel-Solid-State-Drive-Data-Center-Tool
(rough commands for the tool are sketched further down)

Same for the kernel/driver if you're using LSI controllers; 4.5 is close
to what you would get if rolling the driver yourself from their latest SW.

> There was an excellent presentation at the Austin OpenStack Summit re
> optimizing dense nodes -- pinning OSD processes, HBA/NIC interrupts etc.
> to cores/sockets to limit data sent over QPI links on NUMA
> architectures. It's easy to believe that modern inter-die links are
> Fast Enough For You Old Man but there's more to it.
>
Ayup, very much so (a rough pinning sketch is further down).

Christian
--
Christian Balzer        Network/Systems Engineer
chibi@xxxxxxx           Global OnLine Japan/Rakuten Communications
http://www.gol.com/
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
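
To make the "replication of 2 on RAID-backed OSDs" point concrete, a
minimal sketch of the pool settings involved; the pool name "rbd" is
just an example, use your own, and only do this when the OSDs really do
sit on RAID6/RAID1/10:

  # keep 2 copies in total, keep serving I/O with a single copy left
  ceph osd pool set rbd size 2
  ceph osd pool set rbd min_size 1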
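
For the osd_op_threads / osd_disk_threads point Anthony raised, a
hypothetical ceph.conf snippet; the values are purely illustrative and
need benchmarking on your own hardware (option names as in
Firefly/Hammer):

  [osd]
  # more worker threads per OSD, since each OSD now fronts a whole RAID group
  osd op threads = 8
  # threads for background disk-intensive work such as scrubbing
  osd disk threads = 2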
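
As for checking/updating SSD firmware with the Intel SSD Data Center
Tool linked above, roughly like this; verify the exact syntax against
the isdct version you download, the drive index 0 is just an example:

  # list the Intel SSDs in the box and their current firmware
  isdct show -intelssd
  # load newer firmware onto the first listed drive, if the tool ships one
  isdct load -intelssd 0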
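
And for the NUMA pinning mentioned in that Austin presentation, a rough
sketch of the idea with numactl and manual IRQ affinity; the OSD id, IRQ
number and CPU mask are placeholders, look yours up with lscpu, lspci
and /proc/interrupts:

  # run an OSD on the CPUs/memory of NUMA node 0, i.e. the socket its
  # HBA and NIC are attached to
  numactl --cpunodebind=0 --membind=0 /usr/bin/ceph-osd -i 12 --cluster ceph

  # pin the HBA/NIC interrupt (here IRQ 42) to cores on that same socket
  echo 00ff > /proc/irq/42/smp_affinity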