Re: RBD performance - tuning hints


 



On Thu, Aug 30, 2012 at 06:12:11PM +0200, Alexandre DERUMIER wrote:
> >>well, you have to compare
> >>- a pure SSD (via PCIe or SAS-6G)        vs.
> >>- Ceph-Journal, which goes 2x over 10GbE with IP
> >>  Client -> primary-copy -> 2nd-copy
> >>  (= redundancy over Ethernet distance)
> 
> Sure, but the first OSD acks to the client before replicating to the other OSDs.
No.

> 
> Client -> primary-copy -> 2nd-copy
>        <-ack
>          primary-copy -> 2nd-copy
>                       -> 3rd-copy
> 
> Or am I wrong?
Yes,
please have a look at the attached file: ceph-replication-acks.png
The client will usually continue on 'ack' and not wait for the 'commit'.
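
To illustrate the ack/commit distinction, here is a minimal librados C sketch
(a sketch only; the pool name "rbd" and the object name are assumptions, not
taken from my setup). It first waits for 'complete' (= ack: the write has been
acknowledged by all replicas) and then for 'safe' (= commit: the write has
reached stable storage on all replicas):

  /* build with: gcc -o ack_vs_commit ack_vs_commit.c -lrados */
  #include <stdio.h>
  #include <rados/librados.h>

  int main(void)
  {
      rados_t cluster;
      rados_ioctx_t io;
      rados_completion_t comp;
      const char buf[] = "hello";

      /* connect with the defaults: ceph.conf + client.admin keyring */
      if (rados_create(&cluster, NULL) < 0) return 1;
      rados_conf_read_file(cluster, NULL);
      if (rados_connect(cluster) < 0) return 1;

      /* pool name "rbd" is an assumption */
      if (rados_ioctx_create(cluster, "rbd", &io) < 0) return 1;

      /* no callbacks registered; we simply wait synchronously below */
      rados_aio_create_completion(NULL, NULL, NULL, &comp);
      rados_aio_write(io, "ack-test-obj", comp, buf, sizeof(buf), 0);

      /* returns as soon as all replicas have acknowledged the write (ack) */
      rados_aio_wait_for_complete(comp);

      /* returns only after all replicas have committed it to stable storage */
      rados_aio_wait_for_safe(comp);

      rados_aio_release(comp);
      rados_ioctx_destroy(io);
      rados_shutdown(cluster);
      return 0;
  }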

BTW, all my journals are in RAM (/dev/ramX):
32x 2 GB = 64 GB of journal space, i.e. 32 GB of data with 2x replication.

If "filestore min/max sync interval" is set to 99999999
data should 'never' be written to OSD
('never' at least during the tests if the written data is < 32GB)

In such a configuration only the Ceph code and the interconnect (10GbE/IP) would be the limiting factor.
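
For reference, a test-only setup along these lines could be sketched in ceph.conf
roughly as follows (only a sketch, not my attached config; the /dev/ram$id device
naming and the section layout are assumptions, and a RAM-disk journal of course
loses all journaled data on power loss or reboot):

  [osd]
      ; one 2 GB RAM disk per OSD, for testing only
      osd journal = /dev/ram$id
      ; push the sync interval out so the filestore is 'never' synced during the test
      filestore min sync interval = 99999999
      filestore max sync interval = 99999999
      filestore flusher = false
      filestore queue max ops = 10000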

Cheers,
-Dieter


> 
> 
> ----- Original message ----- 
> 
> From: "Dieter Kasper" <d.kasper@xxxxxxxxxxxx> 
> To: "Alexandre DERUMIER" <aderumier@xxxxxxxxx> 
> Cc: ceph-devel@xxxxxxxxxxxxxxx, "Andreas Bluemle" <andreas.bluemle@xxxxxxxxxxx> 
> Sent: Thursday, 30 August 2012 18:02:05 
> Subject: Re: RBD performance - tuning hints 
> 
> On Thu, Aug 30, 2012 at 05:46:35PM +0200, Alexandre DERUMIER wrote: 
> > Thanks 
> > 
> > >> 8x SSD, 200GB each 
> > 
> > 20000 IOPS seems pretty low, no?
> well, you have to compare 
> - a pure SSD (via PCIe or SAS-6G) vs.
> - Ceph-Journal, which goes 2x over 10GbE with IP 
> Client -> primary-copy -> 2nd-copy 
> (= redundancy over Ethernet distance) 
> 
> I'm curious about the answer from Inktank.
> 
> -Dieter 
> 
> > 
> > 
> > for @Inktank: 
> > 
> > Is there a bottleneck somewhere in Ceph?
> Maybe "SimpleMessenger dispatching: cause of performance problems?" 
> from Thu, 16 Aug 2012 18:08:39 +0200 
> by <andreas.bluemle@xxxxxxxxxxx> 
> can be an answer, especially when only a small number of OSDs is used.
> 
> > 
> > I said that because I would like to know if it scales when adding new nodes.
> > 
> > Has Inktank already done any random IOPS benchmarks? (I always see sequential throughput benchmarks on the mailing list.)
> > 
> > 
> > ----- Original message ----- 
> > 
> > From: "Dieter Kasper" <d.kasper@xxxxxxxxxxxx> 
> > To: "Alexandre DERUMIER" <aderumier@xxxxxxxxx> 
> > Cc: ceph-devel@xxxxxxxxxxxxxxx 
> > Sent: Thursday, 30 August 2012 17:33:42 
> > Subject: Re: RBD performance - tuning hints 
> > 
> > On Thu, Aug 30, 2012 at 05:28:02PM +0200, Alexandre DERUMIER wrote: 
> > > Thanks for the report ! 
> > > 
> > > vs. your first benchmark, is it with RBD 4M or 64K?
> > with 4MB (see attached config info) 
> > 
> > Cheers, 
> > -Dieter 
> > 
> > > 
> > > (how many SSDs per node?)
> > 8x SSD, 200GB each 
> > 
> > > 
> > > 
> > > 
> > > ----- Original message ----- 
> > > 
> > > From: "Dieter Kasper" <d.kasper@xxxxxxxxxxxx> 
> > > To: "Alexandre DERUMIER" <aderumier@xxxxxxxxx> 
> > > Cc: ceph-devel@xxxxxxxxxxxxxxx 
> > > Sent: Thursday, 30 August 2012 16:56:34 
> > > Subject: Re: RBD performance - tuning hints 
> > > 
> > > Hi Alexandre, 
> > > 
> > > with the 4 filestore parameters below, some fio values could be increased:
> > > filestore max sync interval = 30 
> > > filestore min sync interval = 29 
> > > filestore flusher = false 
> > > filestore queue max ops = 10000 
> > > 
> > > ###### IOPS 
> > > fio_read_4k_64: 9373 
> > > fio_read_4k_128: 9939 
> > > fio_randwrite_8k_16: 12376 
> > > fio_randwrite_4k_16: 13315 
> > > fio_randwrite_512_32: 13660 
> > > fio_randwrite_8k_32: 17318 
> > > fio_randwrite_4k_32: 18057 
> > > fio_randwrite_8k_64: 19693 
> > > fio_randwrite_512_64: 20015 <<< 
> > > fio_randwrite_4k_64: 20024 <<< 
> > > fio_randwrite_8k_128: 20547 <<< 
> > > fio_randwrite_4k_128: 20839 <<< 
> > > fio_randwrite_512_128: 21417 <<< 
> > > fio_randread_8k_128: 48872 
> > > fio_randread_4k_128: 50002 
> > > fio_randread_512_128: 51202 
> > > 
> > > ###### MB/s 
> > > fio_randread_2m_32: 628 
> > > fio_read_4m_64: 630 
> > > fio_randread_8m_32: 633 
> > > fio_read_2m_32: 637 
> > > fio_read_4m_16: 640 
> > > fio_randread_4m_16: 652 
> > > fio_write_2m_32: 660 
> > > fio_randread_4m_32: 677 
> > > fio_read_4m_32: 678 
> > > (...) 
> > > fio_write_4m_64: 771 
> > > fio_randwrite_2m_64: 789 
> > > fio_write_8m_128: 796 
> > > fio_write_4m_32: 802 
> > > fio_randwrite_4m_128: 807 <<< 
> > > fio_randwrite_2m_32: 811 <<< 
> > > fio_write_2m_128: 833 <<< 
> > > fio_write_8m_64: 901 <<< 
> > > 
> > > Best Regards, 
> > > -Dieter 
> > > 
> > > 
> > > On Wed, Aug 29, 2012 at 10:50:12AM +0200, Alexandre DERUMIER wrote: 
> > > > Nice results! 
> > > > (Can you run the same benchmark from a qemu-kvm guest with the virtio driver? 
> > > > I did some benchmarks a few months ago with Stephan Priebe, and we were never able to get more than 20000 IOPS with a full-SSD 3-node cluster.)
> > > > 
> > > > >>How can I set the variables for when the journal data has to go to the OSD? (after X seconds and/or when it is Y% full) 
> > > > I think you can try to tune these values:
> > > > 
> > > > filestore max sync interval = 30 
> > > > filestore min sync interval = 29 
> > > > filestore flusher = false 
> > > > filestore queue max ops = 10000 
> > > > 
> > > > 
> > > > 
> > > > ----- Original message ----- 
> > > > 
> > > > From: "Dieter Kasper" <d.kasper@xxxxxxxxxxxx> 
> > > > To: ceph-devel@xxxxxxxxxxxxxxx 
> > > > Cc: "Dieter Kasper (KD)" <d.kasper@xxxxxxxxxxxx> 
> > > > Sent: Tuesday, 28 August 2012 19:48:42 
> > > > Subject: RBD performance - tuning hints 
> > > > 
> > > > Hi, 
> > > > 
> > > > on my 4-node system (SSD + 10GbE, see bench-config.txt for details) 
> > > > I can observe a pretty nice rados bench performance 
> > > > (see bench-rados.txt for details): 
> > > > 
> > > > Bandwidth (MB/sec): 961.710 
> > > > Max bandwidth (MB/sec): 1040 
> > > > Min bandwidth (MB/sec): 772 
> > > > 
> > > > 
> > > > Also the bandwidth performance generated with 
> > > > fio --filename=/dev/rbd1 --direct=1 --rw=$io --bs=$bs --size=2G --iodepth=$threads --ioengine=libaio --runtime=60 --group_reporting --name=file1 --output=fio_${io}_${bs}_${threads} 
> > > > 
> > > > .... is acceptable, e.g. 
> > > > fio_write_4m_16 795 MB/s 
> > > > fio_randwrite_8m_128 717 MB/s 
> > > > fio_randwrite_8m_16 714 MB/s 
> > > > fio_randwrite_2m_32 692 MB/s 
> > > > 
> > > > 
> > > > But the write IOPS seem to be limited to around 19k ... 
> > > >                        RBD 4M   RBD 64k (= optimal_io_size) 
> > > > fio_randread_512_128 53286 55925 
> > > > fio_randread_4k_128 51110 44382 
> > > > fio_randread_8k_128 30854 29938 
> > > > fio_randwrite_512_128 18888 2386 
> > > > fio_randwrite_512_64 18844 2582 
> > > > fio_randwrite_8k_64 17350 2445 
> > > > (...) 
> > > > fio_read_4k_128 10073 53151 
> > > > fio_read_4k_64 9500 39757 
> > > > fio_read_4k_32 9220 23650 
> > > > (...) 
> > > > fio_read_4k_16 9122 14322 
> > > > fio_write_4k_128 2190 14306 
> > > > fio_read_8k_32 706 13894 
> > > > fio_write_4k_64 2197 12297 
> > > > fio_write_8k_64 3563 11705 
> > > > fio_write_8k_128 3444 11219 
> > > > 
> > > > 
> > > > Any hints for tuning the IOPS (read and/or write) would be appreciated. 
> > > > 
> > > > How can I set the variables for when the journal data has to go to the OSD? (after X seconds and/or when it is Y% full)
> > > > 
> > > > 
> > > > Kind Regards, 
> > > > -Dieter 
> > > > 
> > > > 
> > > > 
> > > 
> > > 
> > > 
> > > 
> > 
> > 
> > 
> 
> 
> 
> 

-- 
Principal Consultant, Data Center Storage Architecture and Technology
FTS CTO
FUJITSU TECHNOLOGY SOLUTIONS GMBH
Mies-van-der-Rohe-Straße 8 / 4F
80807 München
Germany

Telephone:      +49 89 62060     1898
Telefax:	+49 89 62060 329 1898
Mobile: 	+49 170 8563173
Email:  	dieter.kasper@xxxxxxxxxxxxxx
Internet:       http://ts.fujitsu.com
Company Details: http://ts.fujitsu.com/imprint.html

Attachment: ceph-replication-acks.png
Description: PNG image

