Hi,

Thanks. I did not know about atop, nice tool... and I don't seem to be IRQ overloaded: IRQ handling can reach 100% CPU, but that is shared across all 8 physical cores.

I also discovered "turbostat", which showed me the R510s were not configured for "performance" in the BIOS (but for DBPM, Demand-Based Power Management), and were not bumping the CPUs' frequency to 2.4GHz as they should... they apparently remained at 1.6GHz. Unfortunately, changing that did not improve things: I now have CPUs using their Xeon turbo frequency, but no throughput improvement.

Looking at RPS/RSS, our Broadcom cards seem to be configured correctly according to Red Hat, i.e. one receive queue per physical core, spreading the IRQ load everywhere.

One thing I noticed, though, is that the Dell BIOS allows changing IRQs... but once you change the network card IRQ, it also changes the RAID card IRQ and many others, all sharing the same BIOS IRQ (so that option is apparently useless). Weird.

Still attempting to determine the bottleneck ;)

Regards
Frederic

-----Original Message-----
From: Christian Balzer [mailto:chibi@xxxxxxx]
Sent: Thursday, July 23, 2015 14:18
To: ceph-users@xxxxxxxxxxxxxx
Cc: Gregory Farnum; SCHAER Frederic
Subject: Re: Ceph 0.94 (and lower) performance on >1 hosts ??

On Thu, 23 Jul 2015 11:14:22 +0100 Gregory Farnum wrote:

> Your note that dd can do 2GB/s without networking makes me think that
> you should explore that. As you say, network interrupts can be
> problematic in some systems. The only thing I can think of that's been
> really bad in the past is that some systems process all network
> interrupts on cpu 0, and you probably want to make sure that it's
> splitting them across CPUs.
>
An IRQ overload would be very visible with atop.

Splitting the IRQs will help, but it is likely to need some smarts. For example, irqbalance may spread things across NUMA nodes.
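To check whether network interrupts really all land on CPU0, as Gregory suspects, the per-CPU counters in /proc/interrupts can be summed for the NIC's IRQ lines. The following is a minimal sketch, assuming the driver puts the interface name (e.g. eth2, or per-queue names like eth2-rx-0) in the description column; irq_per_cpu is a hypothetical helper name, and the match is a plain substring, so "eth2" would also match "eth20".

```shell
# Hypothetical helper: sum interrupt counts per CPU for every line of
# /proc/interrupts whose text contains the given device name.
# The optional second argument lets it run against a saved copy of the file.
irq_per_cpu() {
    dev=$1
    file=${2:-/proc/interrupts}
    awk -v dev="$dev" '
        NR == 1 { ncpu = NF; next }          # header: one CPUn field per CPU
        index($0, dev) {                      # substring match on the device name
            for (i = 1; i <= ncpu; i++) {
                v = $(i + 1)                  # field 1 is the IRQ number, counts follow
                if (v ~ /^[0-9]+$/) sum[i - 1] += v
            }
        }
        END { for (c = 0; c < ncpu; c++) printf "CPU%d %.0f\n", c, sum[c] }
    ' "$file"
}
```

The counters are cumulative since boot, so running `irq_per_cpu eth2` twice a few seconds apart under load and comparing the two outputs shows where new interrupts are actually being handled.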
A card with just one IRQ line will need RPS (Receive Packet Steering); irqbalance can't help it.

For example, I have a compute node with such a single-line card and quad Opterons (64 cores, 8 NUMA nodes). By default all interrupt handling is on CPU0, and that is very little load, except for eth2. So eth2 gets special treatment:
---
echo 4 >/proc/irq/106/smp_affinity_list
---
Pinning the IRQ for eth2 to CPU 4 by default.

---
echo f0 > /sys/class/net/eth2/queues/rx-0/rps_cpus
---
Giving RPS CPUs 4-7 to work with. At peak times it needs more than 2 cores; otherwise, with this architecture, just using 4 and 5 (same L2 cache) would be better.

Regards,

Christian
-- 
Christian Balzer        Network/Systems Engineer
chibi@xxxxxxx           Global OnLine Japan/Fusion Communications
http://www.gol.com/
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
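Note that the two files in Christian's example take different formats: smp_affinity_list takes a CPU list ("4"), while rps_cpus takes a hexadecimal bitmask in which bit n selects CPU n. For CPUs 4-7 that is binary 1111 0000, i.e. f0. The sketch below computes such a mask from a CPU range; cpu_range_to_mask is a hypothetical helper name, and it deliberately only handles CPUs 0-31, because on larger machines the kernel prints these masks as comma-separated 32-bit words, which this sketch does not produce.

```shell
# Hypothetical helper: print the hex bitmask for CPUs first..last (inclusive),
# in the form written to /sys/class/net/<dev>/queues/rx-<n>/rps_cpus.
cpu_range_to_mask() {
    first=$1
    last=$2
    if [ "$last" -gt 31 ]; then
        echo "cpu_range_to_mask: only CPUs 0-31 are handled in this sketch" >&2
        return 1
    fi
    mask=0
    cpu=$first
    while [ "$cpu" -le "$last" ]; do
        mask=$(( mask | (1 << cpu) ))   # set bit for this CPU
        cpu=$(( cpu + 1 ))
    done
    printf '%x\n' "$mask"
}
```

As root, `cpu_range_to_mask 4 7 > /sys/class/net/eth2/queues/rx-0/rps_cpus` would then write the same f0 mask as in the example, and `cpu_range_to_mask 4 5` gives 30 for the two cores sharing an L2 cache.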