Ok, thanks for sharing. Yes, my journals are Intel S3610 200GB drives, each split into
4 partitions of ~45GB. When I run ceph-deploy I declare these partitions as the journals
of the OSDs. I was trying to understand the blocking behaviour, and how much my SAS OSDs
affect my performance.

I have a total of 9 hosts and 158 OSDs, each 1.8TB. The servers are connected through
copper 10Gbit LACP bonds. My failure domain is by type rack, and the CRUSH rule set
chooses by rack, with 3 hosts in each rack. Pool size is 3. I'm running Hammer on CentOS 7.

I ran a simple fio test from one of my xl instances and got the results below. The 7.21ms
average latency is worrying. Are these expected results, or is there any way I can further
tune my cluster to achieve better results? (Sketches of the fio job, the journal layout
and the CRUSH rule are appended after the quoted mail below.)

thx will

FIO: sync=1, direct=1, bs=4k

write-50: (groupid=11, jobs=50): err= 0: pid=3945: Sun Oct 16 08:41:15 2016
  write: io=832092KB, bw=27721KB/s, iops=6930, runt= 30017msec
    clat (msec): min=2, max=253, avg= 7.21, stdev= 4.97
     lat (msec): min=2, max=253, avg= 7.21, stdev= 4.97
    clat percentiles (msec):
     |  1.00th=[    4],  5.00th=[    4], 10.00th=[    5], 20.00th=[    5],
     | 30.00th=[    5], 40.00th=[    6], 50.00th=[    7], 60.00th=[    8],
     | 70.00th=[    9], 80.00th=[   10], 90.00th=[   12], 95.00th=[   14],
     | 99.00th=[   17], 99.50th=[   19], 99.90th=[   21], 99.95th=[   23],
     | 99.99th=[  253]
    bw (KB /s): min=  341, max=  870, per=2.01%, avg=556.60, stdev=136.98
    lat (msec) : 4=8.24%, 10=74.10%, 20=17.52%, 50=0.12%, 500=0.02%
  cpu          : usr=0.04%, sys=0.23%, ctx=425242, majf=0, minf=1570
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued    : total=r=0/w=208023/d=0, short=r=0/w=0/d=0

On Sun, Oct 16, 2016 at 4:18 PM, Christian Balzer <chibi@xxxxxxx> wrote:
>
> Hello,
>
> On Sun, 16 Oct 2016 15:03:24 +0800 William Josefsson wrote:
>
>> Hi list, while I know that writes in the RADOS backend are sync()'d, can
>> anyone please explain when the cluster will return on a write call for
>> RBD from VMs? Will data be considered synced once written to the
>> journal, or only once written all the way to the OSD drive?
>>
> This has been answered countless times (really) here; the Ceph Architecture
> documentation should really be more detailed about this, as well as about
> how the data is sent to the secondary OSDs in parallel.
>
> It is of course ack'ed to the client once all journals have successfully
> written the data, otherwise journal SSDs would make a LOT less sense.
>
>> Each host in my cluster has 5x Intel S3610, and 18x 1.8TB Hitachi 10krpm SAS.
>>
> The size of your SSDs (which you didn't mention) will determine the speed;
> for journal purposes the sequential write speed is basically all that matters.
>
> A 5:18 ratio implies that some of your SSDs hold more journals than others.
>
> You emphatically do NOT want that, because eventually the busier ones will
> run out of endurance while the other ones still have plenty left.
>
> If possible change this to a 5:20 or 6:18 ratio (depending on your SSDs
> and expected write volume).
>
> Christian
>
>> I have size=3 for my pool. Will Ceph return once the data is written
>> to at least 3 designated journals, or will it in fact wait until the
>> data is written to the OSD drives?
>> thx will
>>
>
> --
> Christian Balzer           Network/Systems Engineer
> chibi@xxxxxxx              Global OnLine Japan/Rakuten Communications
> http://www.gol.com/
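
P.S. For reference, a fio invocation roughly matching the parameters reported above
(4k sync writes, direct I/O, 50 jobs, iodepth=1, ~30s run) would look like the sketch
below; the file name, size and target path are placeholders, not the exact values from
my run:

  # 4k synchronous direct writes, 50 jobs, queue depth 1, ~30 seconds
  fio --name=write-50 --rw=write --bs=4k --sync=1 --direct=1 \
      --numjobs=50 --iodepth=1 --runtime=30 --time_based \
      --size=1G --filename=/mnt/test/fio.dat --group_reporting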
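
The journal layout is nothing special: four roughly equal GPT partitions per S3610,
which I then pass to ceph-deploy as the journal for each OSD. The device and host
names below are illustrative only, not the actual ones in my cluster:

  # carve one 200GB S3610 (/dev/sda here) into 4 journal partitions of ~45GB each;
  # the typecode is the standard Ceph journal GPT GUID
  sgdisk --new=1:0:+45G --typecode=1:45b0969e-9b03-4f30-b4c6-b4b80ceff106 /dev/sda
  sgdisk --new=2:0:+45G --typecode=2:45b0969e-9b03-4f30-b4c6-b4b80ceff106 /dev/sda
  sgdisk --new=3:0:+45G --typecode=3:45b0969e-9b03-4f30-b4c6-b4b80ceff106 /dev/sda
  sgdisk --new=4:0:+45G --typecode=4:45b0969e-9b03-4f30-b4c6-b4b80ceff106 /dev/sda

  # declare the journal partition when preparing each OSD (host:data-disk:journal)
  ceph-deploy osd prepare node1:/dev/sdc:/dev/sda1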
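
The CRUSH rule for the rack failure domain is just the stock chooseleaf-by-rack
pattern; the rule name, ruleset number and root name here are illustrative:

  rule replicated_by_rack {
          ruleset 1
          type replicated
          min_size 1
          max_size 10
          step take default
          step chooseleaf firstn 0 type rack
          step emit
  }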