Please do not apply any optimization without benchmarking *before* and
*after* in a somewhat realistic scenario.
No, iperf is likely not a realistic setup because it will usually be
limited by available network bandwidth, which should rarely be
maxed out on your actual Ceph setup.
Paul
--
Paul Emmerich
Looking for help with your Ceph cluster? Contact us at https://croit.io
croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90
On Fri, May 29, 2020 at 2:15 AM Dave Hall <kdhall@xxxxxxxxxxxxxx> wrote:
Hello.
A few days ago I offered to share the notes I've compiled on network
tuning. Right now it's a Google Doc:
https://docs.google.com/document/d/1nB5fzIeSgQF0ti_WN-tXhXAlDh8_f8XF9GhU7J1l00g/edit?usp=sharing
I've set it up to allow comments and I'd be glad for questions and
feedback. If Google Docs is not an acceptable format I'll try to put
it up somewhere as HTML or Wiki. Disclosure: some sections were copied
verbatim from other sources.
Regarding the current discussion about iperf, the likely bottleneck is
buffering. There is a per-NIC output queue set with 'ip link' and a
per-CPU-core input queue set with 'sysctl'. Both should be set to some
multiple of the frame size based on calculations related to link speed
and latency. Jumping from 1500 to 9000 could negatively impact
performance because one buffer or the other might be 1500 bytes short
of a low multiple of 9000.
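
The sizing arithmetic described above can be sketched in Python. This is a minimal illustration, assuming the bandwidth-delay product is the basis for the calculation; the 10 Gbit/s link speed and 100 microsecond round-trip time are assumed values, not measurements from this thread, and the function names are hypothetical.

```python
def bdp_bytes(link_bits_per_s: int, rtt_us: int) -> int:
    """Bandwidth-delay product in bytes, rounded up (integer math only)."""
    bits_times_1e6 = link_bits_per_s * rtt_us
    return -(-bits_times_1e6 // (8 * 1_000_000))


def full_frames(buffer_bytes: int, frame_bytes: int) -> tuple[int, int]:
    """Return (whole frames that fit, bytes left over) for a buffer."""
    return buffer_bytes // frame_bytes, buffer_bytes % frame_bytes


LINK_BITS_PER_S = 10_000_000_000  # assumed: 10 Gbit/s link
RTT_US = 100                      # assumed: 100 microsecond round trip

bdp = bdp_bytes(LINK_BITS_PER_S, RTT_US)
print(f"bandwidth-delay product: {bdp} bytes")
for frame in (1500, 9000):
    frames, leftover = full_frames(bdp, frame)
    print(f"frame {frame}: {frames} whole frames, {leftover} bytes unused")
```

With these assumed numbers a 125000-byte buffer holds 83 whole 1500-byte frames with 500 bytes left over, but only 13 whole 9000-byte frames with 8000 bytes left over, which is the kind of shortfall described above.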
It would be interesting to see the iperf tests repeated with
corresponding buffer sizing. I will perform this experiment as soon as
I complete some day-job tasks.
-Dave
Dave Hall
Binghamton University
kdhall@xxxxxxxxxxxxxx
607-760-2328 (Cell)
607-777-4641 (Office)
On 5/27/2020 6:51 AM, EDH - Manuel Rios wrote:
> Can anyone share their table with other MTU values?
>
> I'm also interested in switch CPU load.
>
> KR,
> Manuel
>
> -----Original Message-----
> From: Marc Roos <M.Roos@xxxxxxxxxxxxxxxxx>
> Sent: Wednesday, May 27, 2020 12:01
> To: chris.palmer <chris.palmer@xxxxxxxxx>; paul.emmerich <paul.emmerich@xxxxxxxx>
> CC: amudhan83 <amudhan83@xxxxxxxxx>; anthony.datri <anthony.datri@xxxxxxxxx>; ceph-users <ceph-users@xxxxxxx>; doustar <doustar@xxxxxxxxxxxx>; kdhall <kdhall@xxxxxxxxxxxxxx>; sstkadu <sstkadu@xxxxxxxxx>
> Subject: Re: [External Email] Re: Ceph Nautius not working after setting MTU 9000
>
>
> Interesting table. I have this on a production 10 Gbit cluster at a
> datacenter (obviously not doing that much).
>
>
> [@]# iperf3 -c 10.0.0.13 -P 1 -M 9000
> Connecting to host 10.0.0.13, port 5201
> [ 4] local 10.0.0.14 port 52788 connected to 10.0.0.13 port 5201
> [ ID] Interval Transfer Bandwidth Retr Cwnd
> [ 4] 0.00-1.00 sec 1.14 GBytes 9.77 Gbits/sec 0 690 KBytes
> [ 4] 1.00-2.00 sec 1.15 GBytes 9.90 Gbits/sec 0 1.08 MBytes
> [ 4] 2.00-3.00 sec 1.15 GBytes 9.88 Gbits/sec 0 1.08 MBytes
> [ 4] 3.00-4.00 sec 1.15 GBytes 9.88 Gbits/sec 0 1.08 MBytes
> [ 4] 4.00-5.00 sec 1.15 GBytes 9.88 Gbits/sec 0 1.08 MBytes
> [ 4] 5.00-6.00 sec 1.15 GBytes 9.90 Gbits/sec 0 1.21 MBytes
> [ 4] 6.00-7.00 sec 1.15 GBytes 9.89 Gbits/sec 0 1.21 MBytes
> [ 4] 7.00-8.00 sec 1.15 GBytes 9.88 Gbits/sec 0 1.21 MBytes
> [ 4] 8.00-9.00 sec 1.15 GBytes 9.89 Gbits/sec 0 1.21 MBytes
> [ 4] 9.00-10.00 sec 1.15 GBytes 9.89 Gbits/sec 0 1.21 MBytes
> - - - - - - - - - - - - - - - - - - - - - - - - -
> [ ID] Interval Transfer Bandwidth Retr
> [ 4] 0.00-10.00 sec 11.5 GBytes 9.87 Gbits/sec 0 sender
> [ 4] 0.00-10.00 sec 11.5 GBytes 9.87 Gbits/sec receiver
>
>
> -----Original Message-----
> Subject: Re: Re: [External Email] Re: Ceph Nautius not working after setting MTU 9000
>
> To elaborate on some aspects that have been mentioned already and add
> some others:
>
>
> * Test using iperf3.
>
> * Don't try to use jumbos on networks where you don't have complete
> control over every host. This usually includes the main ceph network.
> It's just too much grief. You can consider using it for limited-access
> networks (e.g. ceph cluster network, hypervisor migration network, etc)
> where you know every switch & host is tuned correctly. (This works even
> when those nets share a vlan trunk with non-jumbo vlans - just set the
> max value on the trunk itself, and individual values on each vlan.)
>
> * If you are pinging, make sure it doesn't fragment, otherwise you
> will get misleading results: e.g. for a 9000-byte MTU, ping -M do -s 8972 x.x.x.x
> (the -s payload excludes the 28 bytes of IPv4 and ICMP headers).
> * Do not assume that 9000 is the best value. It depends on your
> NICs, your switch, kernel/device parameters, etc. Try different values
> (using iperf3). As an example the results below are using a small cheap
> MikroTik 10G switch and HPE 10G NICs. It highlights how in this
> configuration 9000 is worse than 1500, but that 5139 is optimal yet 5140
> is worst. The same pattern (obviously with different values) was
> apparent when multiple tests were run concurrently. Always test your own
> network in a controlled manner. And of course if you introduce anything
> different later on, test again. With enterprise-grade kit this might not
> be so common, but always test if you fiddle.
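
The payload size to pass to ping for a given MTU follows from header arithmetic, which is why 8972 is mentioned elsewhere in this thread for an MTU of 9000. A minimal Python sketch, assuming IPv4 without options (20-byte header) and an 8-byte ICMP echo header; the function name is hypothetical.

```python
IPV4_HEADER_BYTES = 20  # assumed: IPv4 header without options
IPV6_HEADER_BYTES = 40  # fixed IPv6 header, for comparison
ICMP_HEADER_BYTES = 8   # ICMP echo header


def max_ping_payload(mtu: int, ip_header_bytes: int = IPV4_HEADER_BYTES) -> int:
    """Largest ping payload (-s value) that fits in one unfragmented packet."""
    payload = mtu - ip_header_bytes - ICMP_HEADER_BYTES
    if payload < 0:
        raise ValueError(f"MTU {mtu} is too small to carry the headers")
    return payload


for mtu in (1500, 9000):
    print(f"MTU {mtu}: ping -M do -s {max_ping_payload(mtu)} x.x.x.x")
```

A ping with a larger payload and fragmentation prohibited fails even when every hop is configured correctly, so the failure says nothing about the network.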
>
>
> MTU Gbps (actual data transfer values using iperf3) - one particular
> configuration only
>
> 9600 8.91 (max value)
> 9000 8.91
> 8000 8.91
> 7000 8.91
> 6000 8.91
> 5500 8.17
> 5200 7.71
> 5150 7.64
> 5140 7.62
> 5139 9.81 (optimal)
> 5138 9.81
> 5137 9.81
> 5135 9.81
> 5130 9.81
> 5120 9.81
> 5100 9.81
> 5000 9.81
> 4000 9.76
> 3000 9.68
> 2000 9.28
> 1500 9.37 (default)
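
For comparison with the measured values, the theoretical ceiling on TCP goodput at each MTU can be estimated from per-frame overhead. A minimal Python sketch, assuming untagged Ethernet (38 bytes of preamble, header, FCS, and inter-frame gap per frame), IPv4 and TCP headers without IP options, and the 12-byte TCP timestamp option enabled; the function name is hypothetical and the 10 Gbit/s line rate is taken from the 10G hardware described above.

```python
ETHERNET_OVERHEAD_BYTES = 8 + 14 + 4 + 12  # preamble/SFD, header, FCS, inter-frame gap
IP_TCP_HEADER_BYTES = 20 + 20              # assumed: IPv4 + TCP, no IP options
TCP_TIMESTAMP_BYTES = 12                   # assumed: TCP timestamps enabled


def max_tcp_goodput_gbps(mtu: int, line_rate_gbps: float = 10.0) -> float:
    """Upper bound on TCP payload throughput for full-sized frames."""
    payload = mtu - IP_TCP_HEADER_BYTES - TCP_TIMESTAMP_BYTES
    on_wire = mtu + ETHERNET_OVERHEAD_BYTES
    return line_rate_gbps * payload / on_wire


for mtu in (1500, 5139, 9000):
    print(f"MTU {mtu}: at most {max_tcp_goodput_gbps(mtu):.2f} Gbit/s")
```

Under these assumptions the ceiling is about 9.41 Gbit/s at MTU 1500 and about 9.90 Gbit/s at MTU 9000, so a result well below the ceiling for a given MTU points to something other than framing overhead.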
>
>
> Whether any of this will make a tangible difference for ceph is moot. I
> just spend a little time getting the network stack correct as above,
> then leave it. That way I know I am probably getting some benefit, and
> not doing any harm. If you blindly change things you may well do harm
> that can manifest itself in all sorts of ways outside of Ceph. Getting
> some test results for this using Ceph will be easy; getting MEANINGFUL
> results that way will be hard.
>
>
> Chris
>
>
> On 27/05/2020 09:25, Marc Roos wrote:
>
>
>
>
> I would not call a ceph page a random tuning tip. At least I hope
> they are not. NVMe-only with 100Gbit is not really a standard setup. I
> assume with such a setup you have the luxury of not noticing many
> optimizations.
>
> What I mostly read is that changing to mtu 9000 will allow you to
> better saturate the 10Gbit adapter, and I expect this to show on a
> low-end busy cluster. Don't you have any test results of such a setup?
>
>
>
>
> -----Original Message-----
> Subject: Re: Re: [External Email] Re: Ceph Nautius not working after setting MTU 9000
>
> Don't optimize stuff without benchmarking *before and after*, don't
> apply random tuning tips from the Internet without benchmarking them.
>
> My experience with Jumbo frames: 3% performance. On a NVMe-only setup
> with 100 Gbit/s network.
>
> Paul
>
>
>
> On Tue, May 26, 2020 at 7:02 PM Marc Roos <M.Roos@xxxxxxxxxxxxxxxxx> wrote:
>
>
>
>
> Look what I have found!!! :)
> https://ceph.com/geen-categorie/ceph-loves-jumbo-frames/
>
>
>
> -----Original Message-----
> From: Anthony D'Atri [mailto:anthony.datri@xxxxxxxxx]
> Sent: Monday, May 25, 2020 22:12
> To: Marc Roos
> Cc: kdhall; martin.verges; sstkadu; amudhan83; ceph-users; doustar
> Subject: Re: Re: [External Email] Re: Ceph Nautius not working after setting MTU 9000
>
> Quick and easy depends on your network infrastructure. Sometimes it is
> difficult or impossible to retrofit a live cluster without disruption.
>
>
> > On May 25, 2020, at 1:03 AM, Marc Roos <M.Roos@xxxxxxxxxxxxxxxxx> wrote:
> >
> >
> > I am interested. I am always setting mtu to 9000. To be honest I
> > cannot imagine there is no optimization since you have fewer interrupt
> > requests, and you are able to send x times as much data. Every time
> > there is something written about optimizing, the first thing
> > mentioned is changing to mtu 9000, because it is a quick and easy win.
> >
> >
> >
> >
> > -----Original Message-----
> > From: Dave Hall [mailto:kdhall@xxxxxxxxxxxxxx]
> > Sent: Monday, May 25, 2020 5:11
> > To: Martin Verges; Suresh Rama
> > Cc: Amudhan P; Khodayar Doustar; ceph-users
> > Subject: Re: [External Email] Re: Ceph Nautius not working after setting MTU 9000
> >
> > All,
> >
> > Regarding Martin's observations about Jumbo Frames....
> >
> > I have recently been gathering some notes from various internet
> > sources regarding Linux network performance, and Linux performance in
> > general, to be applied to a Ceph cluster I manage but also to the rest
> > of the Linux server farm I'm responsible for.
> >
> > In short, enabling Jumbo Frames without also tuning a number of other
> > kernel and NIC attributes will not provide the performance increases
> > we'd like to see. I have not yet had a chance to go through the rest
> > of the testing I'd like to do, but I can confirm (via iperf3) that
> > only enabling Jumbo Frames didn't make a significant difference.
> >
> > Some of the other attributes I'm referring to are incoming and
> > outgoing buffer sizes at the NIC, IP, and TCP levels, interrupt
> > coalescing, NIC offload functions that should or shouldn't be turned
> > on, packet queuing disciplines (tc), the best choice of TCP slow-start
> > algorithms, and other TCP features and attributes.
> >
> > The most off-beat item I saw was something about adding IPTABLES rules
> > to bypass CONNTRACK table lookups.
> >
> > In order to do anything meaningful to assess the effect of all of
> > these settings I'd like to figure out how to set them all via Ansible
> > - so more to learn before I can give opinions.
> >
> > --> If anybody has added this type of configuration to Ceph Ansible,
> > I'd be glad for some pointers.
> >
> > I have started to compile a document containing my notes. It's rough,
> > but I'd be glad to share if anybody is interested.
> >
> > -Dave
> >
> > Dave Hall
> > Binghamton University
> >
> >> On 5/24/2020 12:29 PM, Martin Verges wrote:
> >>
> >> Just save yourself the trouble. You won't have any real benefit from
> >> MTU 9000. It has some smallish benefit, but it is not worth the effort,
> >> problems, and loss of reliability for most environments.
> >> Try it yourself and do some benchmarks, especially with your regular
> >> workload on the cluster (not the maximum peak performance), then drop
> >> the MTU to default ;).
> >>
> >> If anyone has other real-world benchmarks showing huge differences
> >> in regular Ceph clusters, please feel free to post them here.
> >>
> >> --
> >> Martin Verges
> >> Managing director
> >>
> >> Mobile: +49 174 9335695
> >> E-Mail: martin.verges@xxxxxxxx
> >> Chat: https://t.me/MartinVerges
> >>
> >> croit GmbH, Freseniusstr. 31h, 81247 Munich
> >> CEO: Martin Verges - VAT-ID: DE310638492 Com. register: Amtsgericht
> >> Munich HRB 231263
> >>
> >> Web: https://croit.io
> >> YouTube: https://goo.gl/PGE1Bx
> >>
> >>
> >>> On Sun, May 24, 2020 at 15:54, Suresh Rama <sstkadu@xxxxxxxxx> wrote:
> >>>
> >>> Ping with 9000 MTU won't get a response, as I said; it should be
> >>> 8972. Glad it is working, but you should know what happened to avoid
> >>> this issue later.
> >>>
> >>>> On Sun, May 24, 2020, 3:04 AM Amudhan P <amudhan83@xxxxxxxxx> wrote:
> >>>
> >>>> No, ping with MTU size 9000 didn't work.
> >>>>
> >>>> On Sun, May 24, 2020 at 12:26 PM Khodayar Doustar <doustar@xxxxxxxxxxxx> wrote:
> >>>>
> >>>>> Does your ping work or not?
> >>>>>
> >>>>>
> >>>>> On Sun, May 24, 2020 at 6:53 AM Amudhan P <amudhan83@xxxxxxxxx> wrote:
> >>>>>
> >>>>>> Yes, I have applied the setting on the switch side also.
> >>>>>>
> >>>>>> On Sat 23 May, 2020, 6:47 PM Khodayar Doustar <doustar@xxxxxxxxxxxx> wrote:
> >>>>>>
> >>>>>>> The problem should be with the network. When you change the MTU it
> >>>>>>> should be changed all over the network; every single hop on your
> >>>>>>> network should speak and accept 9000 MTU packets. You can check it
> >>>>>>> on your hosts with the "ifconfig" command, and there are equivalent
> >>>>>>> commands for other network/security devices.
> >>>>>>>
> >>>>>>> If you have just one node which is not correctly configured for MTU
> >>>>>>> 9000 it wouldn't work.
> >>>>>>>
> >>>>>>> On Sat, May 23, 2020 at 2:30 PM sinan@xxxxxxxx <sinan@xxxxxxxx> wrote:
> >>>>>>>> Can the servers/nodes ping each other using large packet sizes?
> >>>>>>>> I guess not.
> >>>>>>>>
> >>>>>>>> Sinan Polat
> >>>>>>>>
> >>>>>>>>> On May 23, 2020, at 14:21, Amudhan P <amudhan83@xxxxxxxxx> wrote:
> >>>>>>>>> In OSD logs "heartbeat_check: no reply from OSD"
> >>>>>>>>>
> >>>>>>>>>> On Sat, May 23, 2020 at 5:44 PM Amudhan P <amudhan83@xxxxxxxxx> wrote:
> >>>>>>>>>> Hi,
> >>>>>>>>>>
> >>>>>>>>>> I have set the network switch to MTU size 9000, and also my
> >>>>>>>>>> netplan configuration.
> >>>>>>>>>>
> >>>>>>>>>> What else needs to be checked?
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>> On Sat, May 23, 2020 at 3:39 PM Wido den Hollander <wido@xxxxxxxx> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>> On 5/23/20 12:02 PM, Amudhan P wrote:
> >>>>>>>>>>>> Hi,
> >>>>>>>>>>>>
> >>>>>>>>>>>> I am using ceph Nautilus on Ubuntu 18.04, working fine with MTU
> >>>>>>>>>>>> size 1500 (default). Recently I tried to update the MTU size to 9000.
> >>>>>>>>>>>> After setting Jumbo frames, running ceph -s times out.
> >>>>>>>>>>> Ceph can run just fine with an MTU of 9000. But there is probably
> >>>>>>>>>>> something else wrong on the network which is causing this.
> >>>>>>>>>>>
> >>>>>>>>>>> Check the Jumbo Frames settings on all the switches as well to make
> >>>>>>>>>>> sure they forward all the packets.
> >>>>>>>>>>>
> >>>>>>>>>>> This is definitely not a Ceph issue.
> >>>>>>>>>>>
> >>>>>>>>>>> Wido
> >>>>>>>>>>>
> >>>>>>>>>>>> regards
> >>>>>>>>>>>> Amudhan P
> >>>>>>>>>>>> _______________________________________________
> >>>>>>>>>>>> ceph-users mailing list -- ceph-users@xxxxxxx
> >>>>>>>>>>>> To unsubscribe send an email to ceph-users-leave@xxxxxxx