Ok, thanks for sharing. Yes, my journals are Intel S3610 200GB drives, each split into
4 partitions of ~45GB. When I run ceph-deploy I declare these partitions as the journals
of the OSDs. I was trying to understand the blocking behaviour, and how much my SAS OSDs
affect my performance.

I have a total of 9 hosts and 158 OSDs, each 1.8TB. The servers are connected through
copper 10Gbit LACP bonds. My failure domain is by type rack, and the CRUSH rule set
chooses by rack, with 3 hosts in each rack. Pool size is 3. I'm running Hammer on CentOS 7.

I ran a simple fio test from one of my xl instances and got the results below. The 7.21ms
average latency is worrying. Are these expected results, or is there any way I can further
tune my cluster to achieve better results? (Sketches of the fio job, the journal layout
and the CRUSH rule are appended after the quoted mail below.)

thx will

FIO: sync=1, direct=1, bs=4k

write-50: (groupid=11, jobs=50): err= 0: pid=3945: Sun Oct 16 08:41:15 2016
  write: io=832092KB, bw=27721KB/s, iops=6930, runt= 30017msec
    clat (msec): min=2, max=253, avg= 7.21, stdev= 4.97
     lat (msec): min=2, max=253, avg= 7.21, stdev= 4.97
    clat percentiles (msec):
     |  1.00th=[    4],  5.00th=[    4], 10.00th=[    5], 20.00th=[    5],
     | 30.00th=[    5], 40.00th=[    6], 50.00th=[    7], 60.00th=[    8],
     | 70.00th=[    9], 80.00th=[   10], 90.00th=[   12], 95.00th=[   14],
     | 99.00th=[   17], 99.50th=[   19], 99.90th=[   21], 99.95th=[   23],
     | 99.99th=[  253]
    bw (KB /s): min=  341, max=  870, per=2.01%, avg=556.60, stdev=136.98
    lat (msec) : 4=8.24%, 10=74.10%, 20=17.52%, 50=0.12%, 500=0.02%
  cpu          : usr=0.04%, sys=0.23%, ctx=425242, majf=0, minf=1570
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued    : total=r=0/w=208023/d=0, short=r=0/w=0/d=0

On Sun, Oct 16, 2016 at 4:18 PM, Christian Balzer <chibi@xxxxxxx> wrote:
>
> Hello,
>
> On Sun, 16 Oct 2016 15:03:24 +0800 William Josefsson wrote:
>
>> Hi list, while I know that writes in the RADOS backend are sync()'d, can
>> anyone please explain when the cluster will return on a write call for
>> RBD from VMs? Will data be considered synced once written to the
>> journal, or only once written all the way to the OSD drive?
>>
> This has been answered countless times (really) here; the Ceph Architecture
> documentation should really be more detailed about this, as well as about
> how the data is sent to the secondary OSDs in parallel.
>
> It is of course ack'ed to the client once all journals have successfully
> written the data, otherwise journal SSDs would make a LOT less sense.
>
>> Each host in my cluster has 5x Intel S3610, and 18x 1.8TB Hitachi 10krpm SAS.
>>
> The size of your SSDs (which you didn't mention) will determine the speed;
> for journal purposes the sequential write speed is basically all that matters.
>
> A 5:18 ratio implies that some of your SSDs hold more journals than others.
>
> You emphatically do NOT want that, because eventually the busier ones will
> run out of endurance while the other ones still have plenty left.
>
> If possible change this to a 5:20 or 6:18 ratio (depending on your SSDs
> and expected write volume).
>
> Christian
>
>> I have size=3 for my pool. Will Ceph return once the data is written
>> to at least 3 designated journals, or will it in fact wait until the
>> data is written to the OSD drives?
>> thx will
>>
>
> --
> Christian Balzer           Network/Systems Engineer
> chibi@xxxxxxx              Global OnLine Japan/Rakuten Communications
> http://www.gol.com/
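
P.S. For reference, a fio invocation roughly matching the parameters reported above
(4k sync writes, direct I/O, 50 jobs, iodepth=1, ~30s run) would look like the sketch
below; the file name, size and target path are placeholders, not the exact values from
my run:

  # 4k synchronous direct writes, 50 jobs, queue depth 1, ~30 seconds
  fio --name=write-50 --rw=write --bs=4k --sync=1 --direct=1 \
      --numjobs=50 --iodepth=1 --runtime=30 --time_based \
      --size=1G --filename=/mnt/test/fio.dat --group_reporting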
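
The journal layout is nothing special: four roughly equal GPT partitions per S3610,
which I then pass to ceph-deploy as the journal for each OSD. The device and host
names below are illustrative only, not the actual ones in my cluster:

  # carve one 200GB S3610 (/dev/sda here) into 4 journal partitions of ~45GB each;
  # the typecode is the standard Ceph journal GPT GUID
  sgdisk --new=1:0:+45G --typecode=1:45b0969e-9b03-4f30-b4c6-b4b80ceff106 /dev/sda
  sgdisk --new=2:0:+45G --typecode=2:45b0969e-9b03-4f30-b4c6-b4b80ceff106 /dev/sda
  sgdisk --new=3:0:+45G --typecode=3:45b0969e-9b03-4f30-b4c6-b4b80ceff106 /dev/sda
  sgdisk --new=4:0:+45G --typecode=4:45b0969e-9b03-4f30-b4c6-b4b80ceff106 /dev/sda

  # declare the journal partition when preparing each OSD (host:data-disk:journal)
  ceph-deploy osd prepare node1:/dev/sdc:/dev/sda1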
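
The CRUSH rule for the rack failure domain is just the stock chooseleaf-by-rack
pattern; the rule name, ruleset number and root name here are illustrative:

  rule replicated_by_rack {
          ruleset 1
          type replicated
          min_size 1
          max_size 10
          step take default
          step chooseleaf firstn 0 type rack
          step emit
  }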