Re: Flapping osd / continuously reported as failed

On 01/24/2014 06:29 AM, Maciej Bonin wrote:
> Gregory Farnum <greg@...> writes:
>
>> On Mon, Aug 19, 2013 at 3:09 PM, Mostowiec Dominik
>> <Dominik.Mostowiec@...> wrote:
>>> Hi,
>>>> Yes, it definitely can, as scrubbing takes locks on the PG, which
>>>> will prevent reads or writes while the message is being processed
>>>> (which will involve the rgw index being scanned).
>>> Is it possible to tune the scrubbing config to eliminate slow requests
>>> and OSDs being marked down while a large rgw bucket index is being
>>> scrubbed?
>>
>> Unfortunately not, or we would have mentioned it before. :/ There are
>> some proposals for sharding bucket indexes that would ameliorate this
>> problem, and on Cuttlefish or Dumpling the OSD won't get marked down,
>> but it will still block incoming requests on that object (i.e., requests
>> to access the bucket) while the scrubbing is in place.
>> That said, that improvement might be sufficient, since you haven't
>> actually shown us how long the object scrub takes.
>> -Greg
>> Software Engineer #42 @ http://inktank.com | http://ceph.com



> Hello Guys,
>
> I just wanted to share that we've had a similar problem and we solved it
> by borrowing sensible kernel option defaults from a radosgw patch, IIRC:
>
> net.ipv4.ip_local_port_range = 1024 65535
> net.core.netdev_max_backlog = 30000
> net.core.somaxconn = 4096
> net.ipv4.tcp_max_syn_backlog = 252144
> net.ipv4.tcp_max_tw_buckets = 360000
> net.ipv4.tcp_fin_timeout = 3
> net.ipv4.tcp_max_orphans = 262144
> net.ipv4.tcp_synack_retries = 2
> net.ipv4.tcp_syn_retries = 2
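
If anyone wants those to persist across reboots, a minimal sketch would be a
sysctl.d drop-in (assuming a distro that reads /etc/sysctl.d/ and a sysctl
that supports --system; the file name below is just an example):

# Hypothetical drop-in file; add the remaining settings above the same way.
sudo tee /etc/sysctl.d/99-ceph-net.conf <<'EOF'
net.ipv4.ip_local_port_range = 1024 65535
net.core.netdev_max_backlog = 30000
net.core.somaxconn = 4096
EOF
# Apply everything under /etc/sysctl.d/ without rebooting:
sudo sysctl --system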

FWIW, these may not strictly help with the situation you described, but at least on our test cluster they helped improve RGW performance in general on 10GbE+:

echo 33554432 | sudo tee /proc/sys/net/core/rmem_default
echo 33554432 | sudo tee /proc/sys/net/core/wmem_default
echo 33554432 | sudo tee /proc/sys/net/core/rmem_max
echo 33554432 | sudo tee /proc/sys/net/core/wmem_max
echo "10240 87380 33554432" | sudo tee /proc/sys/net/ipv4/tcp_rmem
echo "10240 87380 33554432" | sudo tee /proc/sys/net/ipv4/tcp_wmem
echo 250000 | sudo tee /proc/sys/net/core/netdev_max_backlog
echo 524288 | sudo tee /proc/sys/net/nf_conntrack_max
echo 1 | sudo tee /proc/sys/net/ipv4/tcp_tw_recycle
echo 1 | sudo tee /proc/sys/net/ipv4/tcp_tw_reuse
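
After setting these it's worth spot-checking that the kernel actually picked
them up; a quick read-back (plain sysctl, nothing Ceph-specific) looks like:

# Read back a few of the values just set:
sysctl net.core.rmem_max net.core.wmem_max net.core.netdev_max_backlog
cat /proc/sys/net/ipv4/tcp_rmem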



> Regards,
> Maciej Bonin
> Systems Engineer
> m247.com
> ISO 27001 Data Protection Classification: A - Public



--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



