On Tue, Jul 28, 2015 at 11:48 AM, SCHAER Frederic <frederic.schaer@xxxxxx> wrote:
>
> Hi again,
>
> So I have tried
> - changing the CPU frequency: either 1.6 GHz or 2.4 GHz on all cores
> - changing the memory configuration from "advanced ECC mode" to "performance mode", which boosted the memory bandwidth from 35 GB/s to 40 GB/s
> - plugging in a second 10 Gb/s link and setting up a Ceph internal (cluster) network
> - trying various "tuned-adm" profiles, such as "throughput-performance"
>
> This changed practically nothing.
>
> If
> - the CPUs are not maxed out, and lowering the frequency doesn't change a thing
> - the network is not maxed out
> - the memory doesn't seem to have an impact
> - network interrupts are spread across all 8 CPU cores and the receive queues are OK
> - the disks are not used to their full potential (iostat shows my dd commands produce far more TPS than the 4 MB Ceph transfers...)
>
> Where can I possibly find the bottleneck?
>
> I'm /(almost) out of ideas/ ... :'(
>
> Regards
>
>
Frederic,
I was trying to optimize my Ceph cluster as well, and I looked at all of the same things you described; none of them helped my performance noticeably.
The following network kernel tuning settings did help me significantly.
This is my /etc/sysctl.conf on all of my hosts: Ceph MONs, Ceph OSDs, and any client that connects to my Ceph cluster.
# Increase Linux autotuning TCP buffer limits
# Set max to 16MB for 1GE and 32M (33554432) or 54M (56623104) for 10GE
# Don't set tcp_mem itself! Let the kernel scale it based on RAM.
#net.core.rmem_max = 56623104
#net.core.wmem_max = 56623104
# Use 128M buffers
net.core.rmem_max = 134217728
net.core.wmem_max = 134217728
net.core.rmem_default = 67108864
net.core.wmem_default = 67108864
net.core.optmem_max = 134217728
net.ipv4.tcp_rmem = 4096 87380 67108864
net.ipv4.tcp_wmem = 4096 65536 67108864
# Increase the socket listen backlog
net.core.somaxconn = 1024
# Increase the length of the processor input queue and the SYN backlog
net.core.netdev_max_backlog = 250000
net.ipv4.tcp_max_syn_backlog = 30000
# Make room for more TIME_WAIT sockets due to more clients,
# and allow them to be reused if we run out of sockets
net.ipv4.tcp_max_tw_buckets = 2000000
net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_tw_recycle = 1
net.ipv4.tcp_fin_timeout = 10
# Disable TCP slow start on idle connections
net.ipv4.tcp_slow_start_after_idle = 0
# If your servers talk UDP, also up these limits
net.ipv4.udp_rmem_min = 8192
net.ipv4.udp_wmem_min = 8192
# Disable source routing and redirects
net.ipv4.conf.all.send_redirects = 0
net.ipv4.conf.all.accept_redirects = 0
net.ipv4.conf.all.accept_source_route = 0
# Recommended when jumbo frames are enabled
net.ipv4.tcp_mtu_probing = 1
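To load the new values without a reboot, something like this should work (plain sysctl usage; adjust the path if you keep the settings in a file under /etc/sysctl.d/ instead):
# reload /etc/sysctl.conf without rebooting
sysctl -p
# spot-check that the new maxima actually took effect
sysctl net.core.rmem_max net.core.wmem_max net.ipv4.tcp_rmem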
I have 40 Gbps links on my OSD nodes and 10 Gbps links on everything else.
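For what it's worth, those buffer maxima roughly track the TCP bandwidth-delay product. A quick back-of-the-envelope check (the 25 ms RTT here is purely an illustrative wide-area figure, not something I measured; on a low-latency LAN the real BDP is far smaller, so the values above are generous headroom rather than a hard requirement):
# bandwidth-delay product = link speed (bits/s) * RTT (s) / 8 bits per byte
awk 'BEGIN { printf "10 Gbit/s x 25 ms = %.0f bytes\n", 10e9 * 0.025 / 8 }'
awk 'BEGIN { printf "40 Gbit/s x 25 ms = %.0f bytes\n", 40e9 * 0.025 / 8 }'
That lands in the same ballpark as the 32M and 128M figures in the comments above.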
Let me know if that helps.
Jake