OK - it seems my Android email client (the native Samsung one) messed up
"In-Reply-To", which confuses some MUAs. Apologies for that (& this).

/M

On Tue, Oct 20, 2015 at 09:45:25PM +0200, Martin Millnert wrote:
> The thing that worries me with your next-gen design (actually your current
> design as well) is SSD wear. If you use an Intel SSD at 10 DWPD, that's
> 12TB/day per 64TB total. I guess it's use-case dependent, and perhaps a
> 1:4 write:read ratio is quite high in terms of writes as it is.
>
> You're also limiting your throughput to the PCIe bandwidth of the NVMe
> device (regardless of NVRAM/SSD). Compared to a traditional interface that
> may of course be OK in relative terms. NVRAM vs. SSD here is simply a choice
> between wear (NVRAM as journal, at minimum) and cache-hit probability (size).
>
> Interesting thought experiment for me anyway; thanks for sharing, Wido.
>
> /M
>
>
> -------- Original message --------
> From: Wido den Hollander <wido@xxxxxxxx>
> Date: 20/10/2015 16:00 (GMT+01:00)
> To: ceph-users <ceph-users@xxxxxxxx>
> Subject: Ceph OSDs with bcache experience
>
> Hi,
>
> In the "newstore direction" thread on ceph-devel I wrote that I'm using
> bcache in production, and Mark Nelson asked me to share some details.
>
> Bcache is now running in two clusters that I manage, but I'll keep this
> information to one of them (the one at PCextreme behind CloudStack).
>
> This cluster has been running for over 2 years now:
>
> epoch 284353
> fsid 0d56dd8f-7ae0-4447-b51b-f8b818749307
> created 2013-09-23 11:06:11.819520
> modified 2015-10-20 15:27:48.734213
>
> The system consists of 39 hosts:
>
> 2U SuperMicro chassis:
> * 80GB Intel SSD for OS
> * 240GB Intel S3700 SSD for journaling + bcache
> * 6x 3TB disks
>
> This isn't the newest hardware. The next batch of hardware will have more
> disks per chassis, but this is it for now.
>
> All systems were installed with Ubuntu 12.04, but they are all running
> 14.04 with bcache now.
>
> The Intel S3700 SSD is partitioned with a GPT label:
> - 5GB journal for each OSD
> - 200GB partition for bcache
>
> root@ceph11:~# df -h|grep osd
> /dev/bcache0    2.8T  1.1T  1.8T  38% /var/lib/ceph/osd/ceph-60
> /dev/bcache1    2.8T  1.2T  1.7T  41% /var/lib/ceph/osd/ceph-61
> /dev/bcache2    2.8T  930G  1.9T  34% /var/lib/ceph/osd/ceph-62
> /dev/bcache3    2.8T  970G  1.8T  35% /var/lib/ceph/osd/ceph-63
> /dev/bcache4    2.8T  814G  2.0T  30% /var/lib/ceph/osd/ceph-64
> /dev/bcache5    2.8T  915G  1.9T  33% /var/lib/ceph/osd/ceph-65
> root@ceph11:~#
>
> root@ceph11:~# lsb_release -a
> No LSB modules are available.
> Distributor ID: Ubuntu
> Description:    Ubuntu 14.04.3 LTS
> Release:        14.04
> Codename:       trusty
> root@ceph11:~# uname -r
> 3.19.0-30-generic
> root@ceph11:~#
>
> "apply_latency": {
>     "avgcount": 2985023,
>     "sum": 226219.891559000
> }
>
> What did we notice?
> - Fewer spikes on the disks
> - Lower commit latencies on the OSDs
> - Almost no 'slow requests' during backfills
> - Cache-hit ratio of about 60%
>
> Max backfills and recovery active are both set to 1 on all OSDs.
>
> For the next generation of hardware we are looking into 3U chassis with
> 16x 4TB SATA drives and a 1.2TB NVMe SSD for bcache, but we haven't tested
> those yet, so there is nothing to say about them.
>
> The current setup is 200GB of cache for 18TB of disks. The new setup will
> be 1200GB for 64TB; curious to see what that does.
>
> Our main conclusion, however, is that it smooths the I/O pattern towards
> the disks, and that gives an overall better response from the disks.
>
> Wido
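A few notes on the numbers in the quoted message. The wear figure Martin
mentions follows directly from the drive specs in the thread (the planned
1.2TB NVMe device at 10 DWPD, fronting 64TB of disks); a quick
back-of-the-envelope check:

  # rated write budget of the cache device (figures from the thread)
  $ echo "1.2 * 10" | bc              # TB of writes per day at 10 DWPD
  12.0
  $ echo "scale=3; 12 / 64" | bc      # fraction of the 64TB backing store per day
  .187

Whether that budget is enough depends on the cluster's actual write rate
plus bcache's own writeback traffic on top of it.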
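Wido doesn't show the exact commands behind the S3700 layout, so here is a
minimal sketch of one way to reproduce it, assuming /dev/sdb is the SSD,
/dev/sdc is one of the 3TB spinners and there are six OSDs per host (the
device names, partition numbers and labels below are placeholders, not
taken from the post):

  # GPT label with 6 x 5GB journal partitions plus a ~200GB bcache partition
  sgdisk --zap-all /dev/sdb
  for i in 1 2 3 4 5 6; do
      sgdisk -n ${i}:0:+5G -c ${i}:"osd-journal-${i}" /dev/sdb
  done
  sgdisk -n 7:0:+200G -c 7:"bcache" /dev/sdb

  # create a cache set on the SSD partition and a backing device on the spinner
  # (udev normally registers them; otherwise echo the devices into
  #  /sys/fs/bcache/register)
  make-bcache -C /dev/sdb7
  make-bcache -B /dev/sdc

  # attach the backing device to the cache set; the cset UUID comes from
  # 'bcache-super-show /dev/sdb7'
  echo <cset-uuid> > /sys/block/bcache0/bcache/attach

  # the OSD data filesystem then goes on /dev/bcache0, with its journal
  # pointed at one of the 5GB SSD partitions

Note that bcache defaults to writethrough mode; writeback is what actually
absorbs the write spikes, at the cost of more SSD wear.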
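The apply_latency snippet is a cumulative perf counter, so the long-run
average is simply sum / avgcount; with the numbers above that works out to
roughly 76 ms per op (assuming the sum is in seconds, as these counters are
reported). One way to pull the counter yourself, with osd.60 as an example:

  # query the OSD's admin socket and pick out the counter
  ceph daemon osd.60 perf dump | python -m json.tool | grep -A 3 '"apply_latency"'

  # long-run average apply latency = sum / avgcount
  echo "scale=6; 226219.891559 / 2985023" | bc    # ~0.0758 s, i.e. ~76 ms per op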
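The ~60% cache-hit ratio is something bcache itself exports through sysfs,
per device and over several time windows; for example (bcache0 used as a
placeholder device):

  # cumulative hit ratio since the cache was created
  cat /sys/block/bcache0/bcache/stats_total/cache_hit_ratio
  # hit ratio over the last day / hour
  cat /sys/block/bcache0/bcache/stats_day/cache_hit_ratio
  cat /sys/block/bcache0/bcache/stats_hour/cache_hit_ratio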
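For reference, "max backfills and recovery active set to 1" corresponds to
the osd_max_backfills and osd_recovery_max_active options; a sketch of
setting them at runtime and persisting them (the injectargs invocation is
mine, not quoted from the post):

  # apply to all running OSDs without a restart
  ceph tell osd.* injectargs '--osd-max-backfills 1 --osd-recovery-max-active 1'

  # and persist in ceph.conf under [osd]:
  #   osd max backfills = 1
  #   osd recovery max active = 1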
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com