On 21-10-15 15:30, Mark Nelson wrote:
>
>
> On 10/21/2015 01:59 AM, Wido den Hollander wrote:
>> On 10/20/2015 07:44 PM, Mark Nelson wrote:
>>> On 10/20/2015 09:00 AM, Wido den Hollander wrote:
>>>> Hi,
>>>>
>>>> In the "newstore direction" thread on ceph-devel I wrote that I'm using
>>>> bcache in production and Mark Nelson asked me to share some details.
>>>>
>>>> Bcache is now running in two clusters that I manage, but I'll keep this
>>>> information to one of them (the one at PCextreme behind CloudStack).
>>>>
>>>> This cluster has been running for over 2 years now:
>>>>
>>>> epoch 284353
>>>> fsid 0d56dd8f-7ae0-4447-b51b-f8b818749307
>>>> created 2013-09-23 11:06:11.819520
>>>> modified 2015-10-20 15:27:48.734213
>>>>
>>>> The system consists of 39 hosts:
>>>>
>>>> 2U SuperMicro chassis:
>>>> * 80GB Intel SSD for OS
>>>> * 240GB Intel S3700 SSD for journaling + bcache
>>>> * 6x 3TB disk
>>>>
>>>> This isn't the newest hardware. The next batch of hardware will have
>>>> more disks per chassis, but this is it for now.
>>>>
>>>> All systems were installed with Ubuntu 12.04, but they are all running
>>>> 14.04 now with bcache.
>>>>
>>>> The Intel S3700 SSD is partitioned with a GPT label:
>>>> - 5GB journal for each OSD
>>>> - 200GB partition for bcache
>>>>
>>>> root@ceph11:~# df -h | grep osd
>>>> /dev/bcache0  2.8T  1.1T  1.8T  38%  /var/lib/ceph/osd/ceph-60
>>>> /dev/bcache1  2.8T  1.2T  1.7T  41%  /var/lib/ceph/osd/ceph-61
>>>> /dev/bcache2  2.8T  930G  1.9T  34%  /var/lib/ceph/osd/ceph-62
>>>> /dev/bcache3  2.8T  970G  1.8T  35%  /var/lib/ceph/osd/ceph-63
>>>> /dev/bcache4  2.8T  814G  2.0T  30%  /var/lib/ceph/osd/ceph-64
>>>> /dev/bcache5  2.8T  915G  1.9T  33%  /var/lib/ceph/osd/ceph-65
>>>> root@ceph11:~#
>>>>
>>>> root@ceph11:~# lsb_release -a
>>>> No LSB modules are available.
>>>> Distributor ID: Ubuntu
>>>> Description:    Ubuntu 14.04.3 LTS
>>>> Release:        14.04
>>>> Codename:       trusty
>>>> root@ceph11:~# uname -r
>>>> 3.19.0-30-generic
>>>> root@ceph11:~#
>>>>
>>>> "apply_latency": {
>>>>     "avgcount": 2985023,
>>>>     "sum": 226219.891559000
>>>> }
>>>>
>>>> What did we notice?
>>>> - Fewer spikes on the disks
>>>> - Lower commit latencies on the OSDs
>>>> - Almost no 'slow requests' during backfills
>>>> - Cache-hit ratio of about 60%
>>>>
>>>> Max backfills and recovery active are both set to 1 on all OSDs.
>>>>
>>>> For the next generation of hardware we are looking into 3U chassis
>>>> with 16x 4TB SATA drives and a 1.2TB NVMe SSD for bcache, but we
>>>> haven't tested those yet, so there is nothing to say about them.
>>>>
>>>> The current setup is 200GB of cache for 18TB of disks. The new setup
>>>> will be 1200GB for 64TB; we are curious to see what that does.
>>>>
>>>> Our main conclusion, however, is that it smooths the I/O pattern
>>>> towards the disks, and that gives an overall better response from the
>>>> disks.
>>>
>>> Hi Wido, thanks for the big writeup! Did you guys happen to do any
>>> benchmarking? I think Xiaoxi looked at flashcache a while back but had
>>> mixed results if I remember right. It would be interesting to know how
>>> bcache is affecting performance in different scenarios.
>>>
>>
>> No, we didn't do any benchmarking. Initially this cluster was built for
>> just the RADOS Gateway, so we went for 2Gbit (2x 1Gbit) per machine. 90%
>> is still Gbit networking and we are in the process of upgrading it all
>> to 10Gbit.
>>
>> Since the 1Gbit network latency is about 4 times higher than 10Gbit, we
>> aren't really benchmarking the cluster.
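
For context, the apply_latency counter quoted above works out to
226219.89 s / 2,985,023 ops, or roughly 76 ms per apply averaged over
the lifetime of that counter.

For anyone who wants to reproduce a similar layout, below is a rough
sketch of how one cache partition on the S3700 can front several
spinning disks with bcache. The device names, the single backing disk
shown and the writeback cache mode are illustrative assumptions, not
details taken from the setup above:

    # Format one OSD disk as a backing device and the SSD partition as a cache
    make-bcache -B /dev/sdc
    make-bcache -C /dev/sda3

    # Look up the cache set UUID and attach the backing device to it.
    # Each additional OSD disk is attached to the same cache set the same way.
    bcache-super-show /dev/sda3 | grep cset.uuid
    echo <cset-uuid> > /sys/block/bcache0/bcache/attach

    # Optionally switch from the default writethrough mode to writeback
    echo writeback > /sys/block/bcache0/bcache/cache_mode

The resulting /dev/bcache0 is then used like any other block device for
the OSD data filesystem, which is what the df output above shows.
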
>>
>> What counts for us most is that we can do recovery operations without
>> any slow requests.
>>
>> Before bcache we saw disks spike to 100% busy while a backfill was
>> running. Now bcache smooths this out and we see peaks of maybe 70%,
>> but that's it.
>
> In the testing I was doing to figure out our new lab hardware, I was
> seeing SSDs handle recovery dramatically better than spinning disks as
> well during ceph_test_rados runs. It might be worth digging in to see
> what the IO patterns look like. In the meantime, though, it's very
> interesting that bcache helps so much in this case. Good to know!

To add to this: we still had to enable hashpspool on a few pools, so we
did. The degradation on the cluster went to 39% and it has been
recovering for over 48 hours now. Not a single slow request while we had
the OSD complaint time set to 5 seconds. After setting this to 0.5
seconds we saw some slow requests, but nothing dramatic.

For us bcache works really great with spinning disks.

Wido
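
For reference, the tunables mentioned in this thread correspond to the
following settings. The values mirror what is described above, while the
[osd] section placement and the pool name placeholder are only an
illustrative sketch:

    [osd]
    osd max backfills = 1
    osd recovery max active = 1
    # ops slower than this many seconds are reported as slow requests;
    # the thread above used 5 seconds first, then 0.5
    osd op complaint time = 0.5

    # enabling the hashpspool flag on an existing pool
    ceph osd pool set <poolname> hashpspool true

The OSD options can also be changed at runtime on a live cluster with
ceph tell osd.* injectargs instead of editing ceph.conf and restarting.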