Re: Ceph OSDs with bcache experience

The thing that worries me with your next-gen design (actually your current design as well) is SSD wear. If you use an Intel SSD at 10 DWPD, that's 12TB/day per 64TB total. I guess it is use-case dependent, and perhaps a 1:4 write:read ratio is quite high in terms of writes as-is.
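
(That figure is just the 1.2TB device capacity times the DWPD rating, spread
over the 64TB of backing disk; simple arithmetic on the numbers above:)

# endurance budget of a 1.2TB device rated at 10 DWPD, in TB of writes per day:
$ echo "scale=1; 1.2 * 10" | bc
12.0
# ...which is about 0.19 TB/day of write budget per TB of backing disk:
$ echo "scale=3; 1.2 * 10 / 64" | bc
.187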

You're also limiting your throughput to the PCIe bandwidth of the NVMe device (regardless of NVRAM/SSD). Compared to a traditional interface that may of course be fine in relative terms. NVRAM vs. SSD here is simply a choice between wear (NVRAM as journal at minimum) and cache-hit probability (size).

An interesting thought experiment for me anyway; thanks for sharing, Wido.

/M


-------- Original message --------
From: Wido den Hollander <wido@xxxxxxxx>
Date: 20/10/2015 16:00 (GMT+01:00)
To: ceph-users <ceph-users@xxxxxxxx>
Subject: [ceph-users] Ceph OSDs with bcache experience

Hi,

In the "newstore direction" thread on ceph-devel I wrote that I'm using
bcache in production and Mark Nelson asked me to share some details.

Bcache is now running in two clusters that I manage, but I'll keep this
information to one of them (the one at PCextreme behind CloudStack).

This cluster has been running for over 2 years now:

epoch 284353
fsid 0d56dd8f-7ae0-4447-b51b-f8b818749307
created 2013-09-23 11:06:11.819520
modified 2015-10-20 15:27:48.734213

The system consists of 39 hosts:

2U SuperMicro chassis:
* 80GB Intel SSD for OS
* 240GB Intel S3700 SSD for Journaling + Bcache
* 6x 3TB disk

This isn't the newest hardware. The next batch of hardware will have more
disks per chassis, but this is it for now.

All systems were installed with Ubuntu 12.04, but they are all running
14.04 now with bcache.

The Intel S3700 SSD is partitioned with a GPT label:
- 5GB Journal for each OSD
- 200GB Partition for bcache
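
For completeness, a rough sketch of how a backing disk gets attached to the
SSD cache partition with bcache-tools (device names here are just examples,
not the actual layout):

# create the cache device on the 200GB SSD partition:
$ make-bcache -C /dev/sdg3
# create the backing device on one of the 3TB disks:
$ make-bcache -B /dev/sda
# attach the backing device to the cache set, using the cache set's UUID:
$ echo <cache-set-uuid> > /sys/block/bcache0/bcache/attach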

root@ceph11:~# df -h|grep osd
/dev/bcache0    2.8T  1.1T  1.8T  38% /var/lib/ceph/osd/ceph-60
/dev/bcache1    2.8T  1.2T  1.7T  41% /var/lib/ceph/osd/ceph-61
/dev/bcache2    2.8T  930G  1.9T  34% /var/lib/ceph/osd/ceph-62
/dev/bcache3    2.8T  970G  1.8T  35% /var/lib/ceph/osd/ceph-63
/dev/bcache4    2.8T  814G  2.0T  30% /var/lib/ceph/osd/ceph-64
/dev/bcache5    2.8T  915G  1.9T  33% /var/lib/ceph/osd/ceph-65
root@ceph11:~#

root@ceph11:~# lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 14.04.3 LTS
Release: 14.04
Codename: trusty
root@ceph11:~# uname -r
3.19.0-30-generic
root@ceph11:~#

"apply_latency": {
    "avgcount": 2985023,
    "sum": 226219.891559000
}
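
That counter is one of the OSD's perf counters (e.g. from "ceph daemon osd.60
perf dump"); sum divided by avgcount gives the average apply latency since the
OSD started:

$ echo "scale=4; 226219.891559 / 2985023" | bc
.0757

So roughly 76 ms on average.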

What did we notice?
- Less spikes on the disk
- Lower commit latencies on the OSDs
- Almost no 'slow requests' during backfills
- Cache-hit ratio of about 60%
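
The cache-hit ratio can be read from bcache's statistics in sysfs, e.g. for
bcache0 (the other devices are the same):

# lifetime and last-day hit ratio, as a percentage:
$ cat /sys/block/bcache0/bcache/stats_total/cache_hit_ratio
$ cat /sys/block/bcache0/bcache/stats_day/cache_hit_ratio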

Max backfills and recovery active are both set to 1 on all OSDs.
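
For reference, those are the osd_max_backfills and osd_recovery_max_active
settings; they live under [osd] in ceph.conf and can also be changed at
runtime:

$ ceph tell 'osd.*' injectargs '--osd-max-backfills 1 --osd-recovery-max-active 1'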

For the next-generation hardware we are looking into 3U chassis with
16x 4TB SATA drives and a 1.2TB NVMe SSD for bcache, but we haven't
tested those yet, so there is nothing to say about them.

The current setup is 200GB of cache for 18TB of disks. The new setup
will be 1200GB for 64TB; curious to see what that does.
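
In terms of cache-to-disk ratio that goes from roughly 1.1% to 1.9%:

# GB of cache per GB of backing disk, current vs. planned:
$ echo "scale=4; 200/18000; 1200/64000" | bc
.0111
.0187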

Our main conclusion, however, is that it does smooth out the I/O pattern
towards the disks, and that gives an overall better response from the disks.

Wido

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com