Re: RBD performance - tuning hints

On Thu, Aug 30, 2012 at 9:48 AM, Dieter Kasper <d.kasper@xxxxxxxxxxxx> wrote:
> On Thu, Aug 30, 2012 at 06:12:11PM +0200, Alexandre DERUMIER wrote:
>> >>well, you have to compare
>> >>- a pure SSD (via PCIe or SAS-6G)        vs.
>> >>- Ceph-Journal, which goes 2x over 10GbE with IP
>> >>  Client -> primary-copy -> 2nd-copy
>> >>  (= redundancy over Ethernet distance)
>>
>> Sure, but the first OSD acks to the client before replicating to the other OSDs.
> no
>
>>
>> Client -> primary-copy -> 2nd-copy
>>        <-ack
>>          primary-copy -> 2nd-copy
>>                       -> 3rd-copy
>>
>> Or am I wrong?
> yes,
> please have a look at the attached file: ceph-replication-acks.png
> The client usually will continue on 'ACK' and not wait for the 'commit'.
>
> BTW. all my journals are in RAM (/dev/ramX)
> 32x 2GB = 32GB of data with replica 2x
>
> If "filestore min/max sync interval" is set to 99999999
> data should 'never' be written to OSD
> ('never' at least during the tests if the written data is < 32GB)
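> For reference, a minimal sketch of the matching journal lines in the [osd]
> section of ceph.conf (the $id-based device path is only illustrative of the
> /dev/ramX layout described above):
>
>     [osd]
>         # journal on a RAM disk, one device per OSD
>         osd journal = /dev/ram$id
>         # push the sync interval out so the journal is 'never' flushed during the test
>         filestore min sync interval = 99999999
>         filestore max sync interval = 99999999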

I believe it actually will start syncing to disk when the journal is
half full (right, Sam?) — and even if it doesn't sync, there's a
reasonable chance that some of the data will be written out to disk in
the background (though that shouldn't slow anything down, of course).
:)
-Greg


>
> In such a configuration, only the Ceph code and the interconnect (10GbE/IP) would be the bottleneck.
>
> Cheers,
> -Dieter
>
>
>>
>>
>> ----- Original message -----
>>
>> From: "Dieter Kasper" <d.kasper@xxxxxxxxxxxx>
>> To: "Alexandre DERUMIER" <aderumier@xxxxxxxxx>
>> Cc: ceph-devel@xxxxxxxxxxxxxxx, "Andreas Bluemle" <andreas.bluemle@xxxxxxxxxxx>
>> Sent: Thursday, 30 August 2012 18:02:05
>> Subject: Re: RBD performance - tuning hints
>>
>> On Thu, Aug 30, 2012 at 05:46:35PM +0200, Alexandre DERUMIER wrote:
>> > Thanks
>> >
>> > >> 8x SSD, 200GB each
>> >
>> > 20000 IOPS seems pretty low, no?
>> well, you have to compare
>> - a pure SSD (via PCIe or SAS-6G) vs.
>> - Ceph-Journal, which goes 2x over 10GbE with IP
>> Client -> primary-copy -> 2nd-copy
>> (= redundancy over Ethernet distance)
>>
>> I'm curious about the answer from Inktank,
>>
>> -Dieter
>>
>> >
>> >
>> > for @intank:
>> >
>> > Is there a bottleneck somewhere in Ceph?
>> Maybe "SimpleMessenger dispatching: cause of performance problems?"
>> from Thu, 16 Aug 2012 18:08:39 +0200
>> by <andreas.bluemle@xxxxxxxxxxx>
>> can be an answer.
>> Especially if a small number of OSDs is used.
>>
>> >
>> > I ask because I would like to know whether it scales when adding new nodes.
>> >
>> > Has Inktank already done any random IOPS benchmarks? (I only ever see sequential throughput benchmarks on the mailing list.)
>> >
>> >
>> > ----- Original message -----
>> >
>> > From: "Dieter Kasper" <d.kasper@xxxxxxxxxxxx>
>> > To: "Alexandre DERUMIER" <aderumier@xxxxxxxxx>
>> > Cc: ceph-devel@xxxxxxxxxxxxxxx
>> > Sent: Thursday, 30 August 2012 17:33:42
>> > Subject: Re: RBD performance - tuning hints
>> >
>> > On Thu, Aug 30, 2012 at 05:28:02PM +0200, Alexandre DERUMIER wrote:
>> > > Thanks for the report !
>> > >
>> > > Compared to your first benchmark, is this with RBD 4M or 64K?
>> > with 4MB (see attached config info)
>> >
>> > Cheers,
>> > -Dieter
>> >
>> > >
>> > > (how many SSDs per node?)
>> > 8x SSD, 200GB each
>> >
>> > >
>> > >
>> > >
>> > > ----- Original message -----
>> > >
>> > > From: "Dieter Kasper" <d.kasper@xxxxxxxxxxxx>
>> > > To: "Alexandre DERUMIER" <aderumier@xxxxxxxxx>
>> > > Cc: ceph-devel@xxxxxxxxxxxxxxx
>> > > Sent: Thursday, 30 August 2012 16:56:34
>> > > Subject: Re: RBD performance - tuning hints
>> > >
>> > > Hi Alexandre,
>> > >
>> > > with the 4 filestore parameters below, some fio values could be increased:
>> > > filestore max sync interval = 30
>> > > filestore min sync interval = 29
>> > > filestore flusher = false
>> > > filestore queue max ops = 10000
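>> > > (These are OSD options; in ceph.conf they go under the [osd] section on
>> > > every OSD node and take effect after restarting the OSDs, or after
>> > > injecting them at runtime. A minimal sketch of the section:
>> > >
>> > >     [osd]
>> > >         filestore max sync interval = 30
>> > >         filestore min sync interval = 29
>> > >         filestore flusher = false
>> > >         filestore queue max ops = 10000
>> > > )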
>> > >
>> > > ###### IOPS
>> > > fio_read_4k_64: 9373
>> > > fio_read_4k_128: 9939
>> > > fio_randwrite_8k_16: 12376
>> > > fio_randwrite_4k_16: 13315
>> > > fio_randwrite_512_32: 13660
>> > > fio_randwrite_8k_32: 17318
>> > > fio_randwrite_4k_32: 18057
>> > > fio_randwrite_8k_64: 19693
>> > > fio_randwrite_512_64: 20015 <<<
>> > > fio_randwrite_4k_64: 20024 <<<
>> > > fio_randwrite_8k_128: 20547 <<<
>> > > fio_randwrite_4k_128: 20839 <<<
>> > > fio_randwrite_512_128: 21417 <<<
>> > > fio_randread_8k_128: 48872
>> > > fio_randread_4k_128: 50002
>> > > fio_randread_512_128: 51202
>> > >
>> > > ###### MB/s
>> > > fio_randread_2m_32: 628
>> > > fio_read_4m_64: 630
>> > > fio_randread_8m_32: 633
>> > > fio_read_2m_32: 637
>> > > fio_read_4m_16: 640
>> > > fio_randread_4m_16: 652
>> > > fio_write_2m_32: 660
>> > > fio_randread_4m_32: 677
>> > > fio_read_4m_32: 678
>> > > (...)
>> > > fio_write_4m_64: 771
>> > > fio_randwrite_2m_64: 789
>> > > fio_write_8m_128: 796
>> > > fio_write_4m_32: 802
>> > > fio_randwrite_4m_128: 807 <<<
>> > > fio_randwrite_2m_32: 811 <<<
>> > > fio_write_2m_128: 833 <<<
>> > > fio_write_8m_64: 901 <<<
>> > >
>> > > Best Regards,
>> > > -Dieter
>> > >
>> > >
>> > > On Wed, Aug 29, 2012 at 10:50:12AM +0200, Alexandre DERUMIER wrote:
>> > > > Nice results!
>> > > > (Can you run the same benchmark from a qemu-kvm guest with the virtio driver?
>> > > > I did some benchmarks a few months ago with Stephan Priebe, and we were never able to get more than 20000 IOPS with a full-SSD 3-node cluster.)
>> > > >
>> > > > >> How can I set the variables that control when the journal data has to go to the OSD? (after X seconds and/or when Y % full)
>> > > > I think you can try to tune these values
>> > > >
>> > > > filestore max sync interval = 30
>> > > > filestore min sync interval = 29
>> > > > filestore flusher = false
>> > > > filestore queue max ops = 10000
>> > > >
>> > > >
>> > > >
>> > > > ----- Original message -----
>> > > >
>> > > > From: "Dieter Kasper" <d.kasper@xxxxxxxxxxxx>
>> > > > To: ceph-devel@xxxxxxxxxxxxxxx
>> > > > Cc: "Dieter Kasper (KD)" <d.kasper@xxxxxxxxxxxx>
>> > > > Sent: Tuesday, 28 August 2012 19:48:42
>> > > > Subject: RBD performance - tuning hints
>> > > >
>> > > > Hi,
>> > > >
>> > > > on my 4-node system (SSD + 10GbE, see bench-config.txt for details)
>> > > > I can observe a pretty nice rados bench performance
>> > > > (see bench-rados.txt for details):
>> > > >
>> > > > Bandwidth (MB/sec): 961.710
>> > > > Max bandwidth (MB/sec): 1040
>> > > > Min bandwidth (MB/sec): 772
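>> > > > (These numbers come from rados bench; a sketch of the kind of invocation,
>> > > > where pool name, runtime and concurrency are only illustrative and the
>> > > > default write size is 4 MB objects:
>> > > >
>> > > >     rados -p rbd bench 60 write -t 16
>> > > > )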
>> > > >
>> > > >
>> > > > Also the bandwidth performance generated with
>> > > > fio --filename=/dev/rbd1 --direct=1 --rw=$io --bs=$bs --size=2G --iodepth=$threads --ioengine=libaio --runtime=60 --group_reporting --name=file1 --output=fio_${io}_${bs}_${threads}
>> > > >
>> > > > .... is acceptable, e.g.
>> > > > fio_write_4m_16 795 MB/s
>> > > > fio_randwrite_8m_128 717 MB/s
>> > > > fio_randwrite_8m_16 714 MB/s
>> > > > fio_randwrite_2m_32 692 MB/s
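>> > > > (The fio_${io}_${bs}_${threads} output files above come from looping the
>> > > > fio command over its three variables; a sketch, with the parameter lists
>> > > > inferred from the result names:
>> > > >
>> > > >     # one fio run per (pattern, block size, iodepth) combination
>> > > >     for io in read randread write randwrite; do
>> > > >       for bs in 512 4k 8k 2m 4m 8m; do
>> > > >         for threads in 16 32 64 128; do
>> > > >           fio --filename=/dev/rbd1 --direct=1 --rw=$io --bs=$bs --size=2G \
>> > > >               --iodepth=$threads --ioengine=libaio --runtime=60 \
>> > > >               --group_reporting --name=file1 \
>> > > >               --output=fio_${io}_${bs}_${threads}
>> > > >         done
>> > > >       done
>> > > >     done
>> > > > )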
>> > > >
>> > > >
>> > > > But the write IOPS seem to be limited to around 19k ...
>> > > > RBD 4M 64k (= optimal_io_size)
>> > > > fio_randread_512_128 53286 55925
>> > > > fio_randread_4k_128 51110 44382
>> > > > fio_randread_8k_128 30854 29938
>> > > > fio_randwrite_512_128 18888 2386
>> > > > fio_randwrite_512_64 18844 2582
>> > > > fio_randwrite_8k_64 17350 2445
>> > > > (...)
>> > > > fio_read_4k_128 10073 53151
>> > > > fio_read_4k_64 9500 39757
>> > > > fio_read_4k_32 9220 23650
>> > > > (...)
>> > > > fio_read_4k_16 9122 14322
>> > > > fio_write_4k_128 2190 14306
>> > > > fio_read_8k_32 706 13894
>> > > > fio_write_4k_64 2197 12297
>> > > > fio_write_8k_64 3563 11705
>> > > > fio_write_8k_128 3444 11219
>> > > >
>> > > >
>> > > > Any hints for tuning the IOPS (read and/or write) would be appreciated.
>> > > >
>> > > > How can I set the variables that control when the journal data has to go to the OSD? (after X seconds and/or when Y % full)
>> > > >
>> > > >
>> > > > Kind Regards,
>> > > > -Dieter
>> > > >
>> > > >
>> > > >
>
> --
> Principal Consultant, Data Center Storage Architecture and Technology
> FTS CTO
> FUJITSU TECHNOLOGY SOLUTIONS GMBH
> Mies-van-der-Rohe-Straße 8 / 4F
> 80807 München
> Germany
>
> Telephone:      +49 89 62060     1898
> Telefax:        +49 89 62060 329 1898
> Mobile:         +49 170 8563173
> Email:          dieter.kasper@xxxxxxxxxxxxxx
> Internet:       http://ts.fujitsu.com
> Company Details: http://ts.fujitsu.com/imprint.html

