Please do not apply any optimization without benchmarking *before* and *after* in a somewhat realistic scenario. No, iperf is likely not a realistic setup because it will usually be limited by available network bandwidth which is (should) rarely be maxed out on your actual Ceph setup. Paul -- Paul Emmerich Looking for help with your Ceph cluster? Contact us at https://croit.io croit GmbH Freseniusstr. 31h 81247 München www.croit.io Tel: +49 89 1896585 90 On Fri, May 29, 2020 at 2:15 AM Dave Hall <kdhall@xxxxxxxxxxxxxx> wrote: > Hello. > > A few days ago I offered to share the notes I've compiled on network > tuning. Right now it's a Google Doc: > > > https://docs.google.com/document/d/1nB5fzIeSgQF0ti_WN-tXhXAlDh8_f8XF9GhU7J1l00g/edit?usp=sharing > > I've set it up to allow comments and I'd be glad for questions and > feedback. If Google Docs not an acceptable format I'll try to put it up > somewhere as HTML or Wiki. Disclosure: some sections were copied > verbatim from other sources. > > Regarding the current discussion about iperf, the likely bottleneck is > buffering. There is a per-NIC output queue set with 'ip link' and a per > CPU core input queue set with 'sysctl'. Both should be set to some > multiple of the frame size based on calculations related to link speed > and latency. Jumping from 1500 to 9000 could negatively impact > performance because one buffer or the other might be 1500 bytes short of > a low multiple of 9000. > > It would be interesting to see the iperf tests repeated with > corresponding buffer sizing. I will perform this experiment as soon as > I complete some day-job tasks. > > -Dave > > Dave Hall > Binghamton University > kdhall@xxxxxxxxxxxxxx > 607-760-2328 (Cell) > 607-777-4641 (Office) > > On 5/27/2020 6:51 AM, EDH - Manuel Rios wrote: > > Anyone can share their table with other MTU values? > > > > Also interested into Switch CPU load > > > > KR, > > Manuel > > > > -----Mensaje original----- > > De: Marc Roos <M.Roos@xxxxxxxxxxxxxxxxx> > > Enviado el: miércoles, 27 de mayo de 2020 12:01 > > Para: chris.palmer <chris.palmer@xxxxxxxxx>; paul.emmerich < > paul.emmerich@xxxxxxxx> > > CC: amudhan83 <amudhan83@xxxxxxxxx>; anthony.datri < > anthony.datri@xxxxxxxxx>; ceph-users <ceph-users@xxxxxxx>; doustar < > doustar@xxxxxxxxxxxx>; kdhall <kdhall@xxxxxxxxxxxxxx>; sstkadu < > sstkadu@xxxxxxxxx> > > Asunto: Re: [External Email] Re: Ceph Nautius not working > after setting MTU 9000 > > > > > > Interesting table. I have this on a production cluster 10gbit at a > > datacenter (obviously doing not that much). 
> > > > > > [@]# iperf3 -c 10.0.0.13 -P 1 -M 9000 > > Connecting to host 10.0.0.13, port 5201 > > [ 4] local 10.0.0.14 port 52788 connected to 10.0.0.13 port 5201 > > [ ID] Interval Transfer Bandwidth Retr Cwnd > > [ 4] 0.00-1.00 sec 1.14 GBytes 9.77 Gbits/sec 0 690 KBytes > > [ 4] 1.00-2.00 sec 1.15 GBytes 9.90 Gbits/sec 0 1.08 MBytes > > [ 4] 2.00-3.00 sec 1.15 GBytes 9.88 Gbits/sec 0 1.08 MBytes > > [ 4] 3.00-4.00 sec 1.15 GBytes 9.88 Gbits/sec 0 1.08 MBytes > > [ 4] 4.00-5.00 sec 1.15 GBytes 9.88 Gbits/sec 0 1.08 MBytes > > [ 4] 5.00-6.00 sec 1.15 GBytes 9.90 Gbits/sec 0 1.21 MBytes > > [ 4] 6.00-7.00 sec 1.15 GBytes 9.89 Gbits/sec 0 1.21 MBytes > > [ 4] 7.00-8.00 sec 1.15 GBytes 9.88 Gbits/sec 0 1.21 MBytes > > [ 4] 8.00-9.00 sec 1.15 GBytes 9.89 Gbits/sec 0 1.21 MBytes > > [ 4] 9.00-10.00 sec 1.15 GBytes 9.89 Gbits/sec 0 1.21 MBytes > > - - - - - - - - - - - - - - - - - - - - - - - - - > > [ ID] Interval Transfer Bandwidth Retr > > [ 4] 0.00-10.00 sec 11.5 GBytes 9.87 Gbits/sec 0 > > sender > > [ 4] 0.00-10.00 sec 11.5 GBytes 9.87 Gbits/sec > > receiver > > > > > > -----Original Message----- > > Subject: Re: Re: [External Email] Re: Ceph Nautius not > > working after setting MTU 9000 > > > > To elaborate on some aspects that have been mentioned already and add > > some others:: > > > > > > * Test using iperf3. > > > > * Don't try to use jumbos on networks where you don't have complete > > control over every host. This usually includes the main ceph network. > > It's just too much grief. You can consider using it for limited-access > > networks (e.g. ceph cluster network, hypervisor migration network, etc) > > where you know every switch & host is tuned correctly. (This works even > > when those nets share a vlan trunk with non-jumbo vlans - just set the > > max value on the trunk itself, and individual values on each vlan.) > > > > * If you are pinging make sure it doesn't fragment otherwise you > > will get misleading results: e.g. ping -M do -s 9000 x.x.x.x > > * Do not assume that 9000 is the best value. It depends on your > > NICs, your switch, kernel/device parameters, etc. Try different values > > (using iperf3). As an example the results below are using a small cheap > > Mikrotek 10G switch and HPE 10G NICs. It highlights how in this > > configuration 9000 is worse than 1500, but that 5139 is optimal yet 5140 > > is worst. The same pattern (obviously with different values) was > > apparent when multiple tests were run concurrently. Always test your own > > network in a controlled manner. And of course if you introduce anything > > different later on, test again. With enterprise-grade kit this might not > > be so common, but always test if you fiddle. > > > > > > MTU Gbps (actual data transfer values using iperf3) - one particular > > configuration only > > > > 9600 8.91 (max value) > > 9000 8.91 > > 8000 8.91 > > 7000 8.91 > > 6000 8.91 > > 5500 8.17 > > 5200 7.71 > > 5150 7.64 > > 5140 7.62 > > 5139 9.81 (optimal) > > 5138 9.81 > > 5137 9.81 > > 5135 9.81 > > 5130 9.81 > > 5120 9.81 > > 5100 9.81 > > 5000 9.81 > > 4000 9.76 > > 3000 9.68 > > 2000 9.28 > > 1500 9.37 (default) > > > > > > Whether any of this will make a tangible difference for ceph is moot. I > > just spend a little time getting the network stack correct as above, > > then leave it. That way I know I am probably getting some benefit, and > > not doing any harm. If you blindly change things you may well do harm > > that can manifest itself in all sorts of ways outside of Ceph. 
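For reference, a rough sketch of such a controlled sweep, run as root on a test interface. Everything here is a placeholder assumption, not part of the thread: eth0 is the interface under test, 10.0.0.13 is a peer already running "iperf3 -s", and every device in the path already accepts the largest candidate MTU. The awk column numbers can differ between iperf3 versions, so treat the extraction as approximate:

    for mtu in 1500 4000 5000 5139 5140 9000; do
        ip link set dev eth0 mtu "$mtu"
        # payload = MTU minus 28 bytes (20-byte IP + 8-byte ICMP header); -M do forbids fragmentation
        ping -M do -c 3 -s $((mtu - 28)) 10.0.0.13 >/dev/null || echo "MTU $mtu: path blocks unfragmented packets"
        printf 'MTU %s: ' "$mtu"
        iperf3 -c 10.0.0.13 -t 10 | awk '/receiver/ {print $7, $8}'
    done

Run it several times and concurrently, as described above, before trusting any single number.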
Getting > > some test results for this using Ceph will be easy; getting MEANINGFUL > > results that way will be hard. > > > > > > Chris > > > > > > On 27/05/2020 09:25, Marc Roos wrote: > > > > > > > > > > I would not call a ceph page, a random tuning tip. At least I hope > > they > > are not. NVMe-only with 100Gbit is not really a standard setup. I > > assume > > with such setup you have the luxury to not notice many > > optimizations. > > > > What I mostly read is that changing to mtu 9000 will allow you to > > better > > saturate the 10Gbit adapter, and I expect this to show on a low end > > busy > > cluster. Don't you have any test results of such a setup? > > > > > > > > > > -----Original Message----- > > > > Subject: Re: Re: [External Email] Re: Ceph Nautius not > > > > working after setting MTU 9000 > > > > Don't optimize stuff without benchmarking *before and after*, don't > > > > apply random tuning tipps from the Internet without benchmarking > > them. > > > > My experience with Jumbo frames: 3% performance. On a NVMe-only > > setup > > with 100 Gbit/s network. > > > > Paul > > > > > > -- > > Paul Emmerich > > > > Looking for help with your Ceph cluster? Contact us at > > https://croit.io > > > > croit GmbH > > Freseniusstr. 31h > > 81247 München > > www.croit.io > > Tel: +49 89 1896585 90 > > > > On Tue, May 26, 2020 at 7:02 PM Marc Roos > > <M.Roos@xxxxxxxxxxxxxxxxx> <mailto:M.Roos@xxxxxxxxxxxxxxxxx> > > wrote: > > > > > > > > > > Look what I have found!!! :) > > https://ceph.com/geen-categorie/ceph-loves-jumbo-frames/ > > > > > > > > -----Original Message----- > > From: Anthony D'Atri [mailto:anthony.datri@xxxxxxxxx] > > Sent: maandag 25 mei 2020 22:12 > > To: Marc Roos > > Cc: kdhall; martin.verges; sstkadu; amudhan83; ceph-users; > > doustar > > Subject: Re: Re: [External Email] Re: Ceph > > Nautius not > > > > working after setting MTU 9000 > > > > Quick and easy depends on your network infrastructure. > > Sometimes > > it is > > difficult or impossible to retrofit a live cluster without > > disruption. > > > > > > > On May 25, 2020, at 1:03 AM, Marc Roos > > <M.Roos@xxxxxxxxxxxxxxxxx> <mailto:M.Roos@xxxxxxxxxxxxxxxxx> > > > > wrote: > > > > > > > > > I am interested. I am always setting mtu to 9000. To be > > honest I > > > cannot imagine there is no optimization since you have > less > > interrupt > > > requests, and you are able x times as much data. Every > time > > there > > > > > something written about optimizing the first thing > mention > > is > > changing > > > > > to the mtu 9000. Because it is quick and easy win. > > > > > > > > > > > > > > > -----Original Message----- > > > From: Dave Hall [mailto:kdhall@xxxxxxxxxxxxxx] > > > Sent: maandag 25 mei 2020 5:11 > > > To: Martin Verges; Suresh Rama > > > Cc: Amudhan P; Khodayar Doustar; ceph-users > > > Subject: Re: [External Email] Re: Ceph > Nautius > > not > > > working after setting MTU 9000 > > > > > > All, > > > > > > Regarding Martin's observations about Jumbo Frames.... > > > > > > I have recently been gathering some notes from various > > internet > > > sources regarding Linux network performance, and Linux > > performance in > > > general, to be applied to a Ceph cluster I manage but > also > > to the > > rest > > > > > of the Linux server farm I'm responsible for. > > > > > > In short, enabling Jumbo Frames without also tuning a > number > > of > > other > > > kernel and NIC attributes will not provide the > performance > > increases > > > we'd like to see. 
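A sketch of the kind of kernel and NIC queue/buffer settings being referred to here, including the per-NIC output queue ("ip link") and per-CPU input queue ("sysctl") mentioned earlier in the thread. The interface name and every number below are illustrative placeholders, not recommendations; real values should be derived from your own link speed and latency (bandwidth-delay product) and benchmarked before and after:

    # illustrative placeholder values only - size them from your own
    # bandwidth-delay product and verify with before/after benchmarks
    ip link set dev eth0 txqueuelen 10000              # per-NIC output queue
    sysctl -w net.core.netdev_max_backlog=250000       # per-CPU input queue for received packets
    sysctl -w net.core.rmem_max=67108864               # ceiling for socket receive buffers
    sysctl -w net.core.wmem_max=67108864               # ceiling for socket send buffers
    sysctl -w net.ipv4.tcp_rmem="4096 87380 67108864"  # min/default/max TCP receive buffer
    sysctl -w net.ipv4.tcp_wmem="4096 65536 67108864"  # min/default/max TCP send buffer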
I have not yet had a chance to go > through > > the > > rest > > > of the testing I'd like to do, but I can confirm (via > > iperf3) > > that > > > only enabling Jumbo Frames didn't make a significant > > difference. > > > > > > Some of the other attributes I'm referring to are > incoming > > and > > > outgoing buffer sizes at the NIC, IP, and TCP levels, > > interrupt > > > coalescing, NIC offload functions that should or > shouldn't > > be > > turned > > > on, packet queuing disciplines (tc), the best choice of > TCP > > slow-start > > > > > algorithms, and other TCP features and attributes. > > > > > > The most off-beat item I saw was something about adding > > IPTABLES > > rules > > > > > to bypass CONNTRACK table lookups. > > > > > > In order to do anything meaningful to assess the effect > of > > all of > > > > > these settings I'd like to figure out how to set them all > > via > > Ansible > > > - so more to learn before I can give opinions. > > > > > > --> If anybody has added this type of configuration to > Ceph > > > > Ansible, > > > I'd be glad for some pointers. > > > > > > I have started to compile a document containing my notes. > > It's > > rough, > > > > > but I'd be glad to share if anybody is interested. > > > > > > -Dave > > > > > > Dave Hall > > > Binghamton University > > > > > >> On 5/24/2020 12:29 PM, Martin Verges wrote: > > >> > > >> Just save yourself the trouble. You won't have any real > > benefit > > from > > > MTU > > >> 9000. It has some smallish, but it is not worth the > effort, > > > > problems, > > > and > > >> loss of reliability for most environments. > > >> Try it yourself and do some benchmarks, especially with > > your > > regular > > >> workload on the cluster (not the maximum peak > performance), > > then > > drop > > > the > > >> MTU to default ;). > > >> > > >> Please if anyone has other real world benchmarks showing > > huge > > > differences > > >> in regular Ceph clusters, please feel free to post it > here. > > >> > > >> -- > > >> Martin Verges > > >> Managing director > > >> > > >> Mobile: +49 174 9335695 > > >> E-Mail: martin.verges@xxxxxxxx > > >> Chat: https://t.me/MartinVerges > > >> > > >> croit GmbH, Freseniusstr. 31h, 81247 Munich > > >> CEO: Martin Verges - VAT-ID: DE310638492 Com. register: > > Amtsgericht > > >> Munich HRB 231263 > > >> > > >> Web: https://croit.io > > >> YouTube: https://goo.gl/PGE1Bx > > >> > > >> > > >>> Am So., 24. Mai 2020 um 15:54 Uhr schrieb Suresh Rama > > >> <sstkadu@xxxxxxxxx> <mailto:sstkadu@xxxxxxxxx> : > > >> > > >>> Ping with 9000 MTU won't get response as I said and it > > should > > be > > > 8972. Glad > > >>> it is working but you should know what happened to > avoid > > this > > issue > > > later. > > >>> > > >>>> On Sun, May 24, 2020, 3:04 AM Amudhan P > > <amudhan83@xxxxxxxxx> <mailto:amudhan83@xxxxxxxxx> > > wrote: > > >>> > > >>>> No, ping with MTU size 9000 didn't work. > > >>>> > > >>>> On Sun, May 24, 2020 at 12:26 PM Khodayar Doustar > > > <doustar@xxxxxxxxxxxx> <mailto:doustar@xxxxxxxxxxxx> > > >>>> wrote: > > >>>> > > >>>>> Does your ping work or not? > > >>>>> > > >>>>> > > >>>>> On Sun, May 24, 2020 at 6:53 AM Amudhan P > > <amudhan83@xxxxxxxxx> <mailto:amudhan83@xxxxxxxxx> > > > wrote: > > >>>>> > > >>>>>> Yes, I have set setting on the switch side also. > > >>>>>> > > >>>>>> On Sat 23 May, 2020, 6:47 PM Khodayar Doustar, > > > <doustar@xxxxxxxxxxxx> <mailto:doustar@xxxxxxxxxxxx> > > >>>>>> wrote: > > >>>>>> > > >>>>>>> Problem should be with network. 
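To make the NIC-level items from the list further up (interrupt coalescing, offload functions, the conntrack bypass) concrete, a hedged sketch of how they are usually inspected and changed. eth0, the example values, and the 6800:7300 port range are placeholder assumptions to adapt to your own hardware and cluster, and whether adaptive coalescing is available depends on the driver:

    ethtool -c eth0                                # show interrupt-coalescing settings
    ethtool -C eth0 adaptive-rx on adaptive-tx on  # e.g. let the driver adapt coalescing
    ethtool -k eth0                                # show offload features (TSO/GSO/GRO/...)
    ethtool -K eth0 tso on gso on gro on           # e.g. toggle offloads, then re-benchmark
    ethtool -g eth0                                # show RX/TX ring sizes
    ethtool -G eth0 rx 4096 tx 4096                # e.g. enlarge rings if the NIC supports it
    # the conntrack bypass mentioned above; 6800:7300 is the usual Ceph OSD port range
    iptables -t raw -A PREROUTING -p tcp --dport 6800:7300 -j NOTRACK
    iptables -t raw -A OUTPUT     -p tcp --sport 6800:7300 -j NOTRACK

As with everything else in this thread: measure before and after each individual change.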
When you change the MTU, it should be changed all over the network: every single hop on your network should speak and accept 9000-MTU packets. You can check it on your hosts with the "ifconfig" command, and there are equivalent commands for other network/security devices.

If you have even one node that is not correctly configured for MTU 9000, it won't work.

On Sat, May 23, 2020 at 2:30 PM sinan@xxxxxxxx <sinan@xxxxxxxx> wrote:
> Can the servers/nodes ping each other using large packet sizes? I guess not.
>
> Sinan Polat
>
>> On 23 May 2020, at 14:21, Amudhan P <amudhan83@xxxxxxxxx> wrote:
>> In the OSD logs: "heartbeat_check: no reply from OSD"
>>
>>> On Sat, May 23, 2020 at 5:44 PM Amudhan P <amudhan83@xxxxxxxxx> wrote:
>>> Hi,
>>>
>>> I have set the network switch with MTU size 9000 and also in my netplan configuration.
>>>
>>> What else needs to be checked?
>>>
>>>> On Sat, May 23, 2020 at 3:39 PM Wido den Hollander <wido@xxxxxxxx> wrote:
>>>>
>>>>> On 5/23/20 12:02 PM, Amudhan P wrote:
>>>>> Hi,
>>>>>
>>>>> I am using Ceph Nautilus on Ubuntu 18.04, working fine with MTU size 1500 (default); recently I tried to update the MTU size to 9000.
>>>>> After setting jumbo frames, running ceph -s is timing out.
>>>>
>>>> Ceph can run just fine with an MTU of 9000. But there is probably something else wrong on the network which is causing this.
>>>>
>>>> Check the Jumbo Frames settings on all the switches as well to make sure they forward all the packets.
>>>>
>>>> This is definitely not a Ceph issue.
>>>>
>>>> Wido
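A minimal way to check the end-to-end jumbo path being discussed here; the interface name and address are placeholders, not values from the thread:

    ip link show eth0 | grep -o 'mtu [0-9]*'   # confirm the configured MTU on every host
    # 8972 = 9000 minus the 20-byte IP and 8-byte ICMP headers; -M do forbids fragmentation
    ping -M do -s 8972 -c 3 10.0.0.13          # repeat between every pair of Ceph hosts

An error such as "message too long" or 100% packet loss points at a host, switch port, or VLAN that is still using the smaller MTU.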
>>>>> regards
>>>>> Amudhan P
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx