Re: Ceph 0.94 (and lower) performance on >1 hosts ??

Hi again,

So I have tried (roughly as sketched below):
- changing the CPU frequency: either 1.6GHz or 2.4GHz on all cores
- changing the memory configuration from "advanced ECC mode" to "performance mode", boosting the memory bandwidth from 35GB/s to 40GB/s
- plugging in a second 10Gb/s link and setting up a Ceph internal (cluster) network
- various "tuned-adm" profiles such as "throughput-performance"
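For completeness, here is roughly what the software-side changes look like (the memory mode is a BIOS setting; the subnet below is only a placeholder):
---
# tuned profile
tuned-adm profile throughput-performance

# pin the CPU frequency governor from the OS side
cpupower frequency-set -g performance

# second 10Gb/s link used as the Ceph internal (cluster) network,
# in ceph.conf under [global]:
#   cluster network = 192.168.0.0/24
---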

None of this changed anything.

If
- the CPUs are not maxed out, and lowering the frequency doesn't change a thing
- the network is not maxed out
- the memory doesn't seem to have an impact
- network interrupts are spread across all 8 CPU cores and the receive queues look OK
- the disks are not used to their full potential (iostat shows my dd commands produce far more tps than the 4MB Ceph transfers...)

then where can the bottleneck possibly be?
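For reference, this is roughly how I am watching those points (the NIC name is just an example):
---
mpstat -P ALL 2                    # per-core CPU usage, including %irq / %soft
sar -n DEV 2                       # per-interface throughput
grep em1 /proc/interrupts          # IRQ spread across cores (example NIC name)
iostat -x 2                        # per-disk tps and utilization
---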

I'm /(almost) out of ideas/ ... :'(

Regards

-----Original Message-----
From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On behalf of SCHAER Frederic
Sent: Friday, 24 July 2015 16:04
To: Christian Balzer; ceph-users@xxxxxxxxxxxxxx
Subject: [PROVENANCE INTERNET] Re: Ceph 0.94 (and lower) performance on >1 hosts ??

Hi,

Thanks.
I did not know about atop, nice tool... and I don't seem to be IRQ-overloaded: IRQ handling can reach 100% CPU, but that load is spread across all 8 physical cores.
I also discovered "turbostat", which showed me the R510s were not configured for "performance" in the BIOS (but for DBPM - demand-based power management), and were not bumping the CPU frequency to 2.4GHz as they should... they apparently remained at 1.6GHz.
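In case someone wants to check the same thing, this is roughly what I looked at (a sketch):
---
turbostat sleep 10                        # actual core frequencies over a 10s window
cpupower frequency-info                   # governor and frequency limits seen by the OS
grep MHz /proc/cpuinfo | sort | uniq -c   # quick per-core frequency snapshot
---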

But changing that did not improve things, unfortunately. The CPUs now use their Xeon turbo frequency, but there is no throughput improvement.

Looking at RPS/RSS, it looks like our Broadcom cards are configured correctly according to Red Hat, i.e. one receive queue per physical core, spreading the IRQ load everywhere.
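This is roughly how I checked that (the interface name is just an example):
---
ls /sys/class/net/em1/queues/       # expect one rx-N queue per physical core
grep em1 /proc/interrupts           # one IRQ line per receive queue
for irq in $(awk '/em1/ {sub(":","",$1); print $1}' /proc/interrupts); do
    printf 'irq %s -> CPUs ' "$irq"; cat /proc/irq/$irq/smp_affinity_list
done
---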
One thing I noticed though is that the Dell BIOS allows changing IRQs... but once you change the network card IRQ, it also changes the RAID card IRQ as well as many others, since they all share the same BIOS IRQ (which apparently makes that option useless). Weird.

Still attempting to determine the bottleneck ;)

Regards
Frederic

-----Original Message-----
From: Christian Balzer [mailto:chibi@xxxxxxx]
Sent: Thursday, 23 July 2015 14:18
To: ceph-users@xxxxxxxxxxxxxx
Cc: Gregory Farnum; SCHAER Frederic
Subject: Re: Ceph 0.94 (and lower) performance on >1 hosts ??

On Thu, 23 Jul 2015 11:14:22 +0100 Gregory Farnum wrote:

> Your note that dd can do 2GB/s without networking makes me think that
> you should explore that. As you say, network interrupts can be
> problematic in some systems. The only thing I can think of that's been
> really bad in the past is that some systems process all network
> interrupts on cpu 0, and you probably want to make sure that it's
> splitting them across CPUs.
>

An IRQ overload would be very visible with atop.

Splitting the IRQs will help, but it is likely to need some smarts.

As in, irqbalance may spread things across NUMA nodes.

A card with just one IRQ line will need RPS (Receive Packet Steering);
irqbalance can't help there.

For example, I have a compute node with such a single line card and Quad
Opterons (64 cores, 8 NUMA nodes).

The default is that all interrupt handling lands on CPU0, which is fine since
there is very little of it - except for eth2. So eth2 gets special treatment:
---
echo 4 >/proc/irq/106/smp_affinity_list
---
Pinning the IRQ for eth2 to CPU 4 by default

---
echo f0 > /sys/class/net/eth2/queues/rx-0/rps_cpus
---
giving RPS CPUs 4-7 to work with. At peak times it needs more than 2 cores;
otherwise, with this architecture, just using CPUs 4 and 5 (which share an L2
cache) would be better.
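To do the same on another card, finding the IRQ line and building the RPS mask
goes roughly like this (interface name is an example):
---
grep eth2 /proc/interrupts                          # find the IRQ number (106 in my case)
printf '%x\n' $(( (1<<4)|(1<<5)|(1<<6)|(1<<7) ))    # -> f0, i.e. CPUs 4-7
---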

Regards,

Christian
-- 
Christian Balzer        Network/Systems Engineer                
chibi@xxxxxxx   	Global OnLine Japan/Fusion Communications
http://www.gol.com/
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



