Hi Tim,

On Wed, May 15, 2013 at 09:25:35PM +0200, Tim Mohlmann wrote:
> Just for my information / learning process:
>
> This ping checking, is it a process itself, or is it part of the OSD
> process? (Maybe as a sub-process.) In that case one could play with nice
> settings, e.g. marking the process as realtime, or giving it a very low
> nice value. Then, at least, it would not bail on you when the CPU is at
> 100% (preferring the pings over the OSD itself).

I believe it is a thread of the OSD process. So in theory something in the
process, like a lock, or the kernel/scheduler, could be blocking the ping
thread from being activated.

> I am actually new to ceph (exploring possibilities), so I am not familiar
> with its internals. So consider it an "if, then" suggestion.

That is fine, we all have to learn it somehow. :-)

> Regards,
>
> Tim
>
> On Tuesday 14 May 2013 16:22:43 Sage Weil wrote:
> > On Tue, 14 May 2013, Chen, Xiaoxi wrote:
> > > I like the idea of leaving the ping in the cluster network because it
> > > can help us detect switch/NIC failure.
> > >
> > > What confuses me is that I keep pinging every ceph node's cluster IP,
> > > and it is OK during the whole run with less than 1 ms latency. Why
> > > does the heartbeat still suffer? TOP shows my CPU is not 100%
> > > utilized, with ~30% iowait. Enabling jumbo frames **seems** to make
> > > things worse (just a feeling, no data supports this).
> >
> > I say "ping" in the general sense.. it's not using ICMP, but sending
> > small messages over a TCP session, doing some minimal processing on the
> > other end, and sending them back. If the machine is heavily loaded and
> > that thread doesn't get scheduled or somehow blocks, it may be
> > problematic.
> >
> > How responsive generally is the machine under load? Is there available
> > CPU?
> >
> > We can try to enable debugging to see what is going on.. 'debug ms = 1'
> > and 'debug osd = 20' is everything we would need, but it will incur
> > additional load itself and may spoil the experiment...
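Sage's point that the heartbeat is not ICMP but small messages timed over a
TCP session can be illustrated with a stand-alone sketch. This is not Ceph's
actual heartbeat code; the local echo server, 4-byte payload, and message
count are all assumptions made just for the demo:

```python
import socket
import threading
import time

def echo_server(listener):
    """Echo back whatever one client sends, until it disconnects."""
    conn, _ = listener.accept()
    with conn:
        while True:
            data = conn.recv(64)
            if not data:
                break
            conn.sendall(data)

def tcp_ping(host, port, count=5):
    """Send small messages over one persistent TCP session and time the
    round trip of each, in milliseconds (conceptually like an OSD
    heartbeat, which also rides a TCP session rather than ICMP)."""
    rtts = []
    with socket.create_connection((host, port)) as s:
        # disable Nagle so each tiny message goes out immediately
        s.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
        for _ in range(count):
            start = time.monotonic()
            s.sendall(b"ping")
            s.recv(4)  # wait for the echo from the far end
            rtts.append((time.monotonic() - start) * 1000.0)
    return rtts

# Demo against a local echo server on an ephemeral port.
srv = socket.socket()
srv.bind(("127.0.0.1", 0))
srv.listen(1)
threading.Thread(target=echo_server, args=(srv,), daemon=True).start()
latencies = tcp_ping("127.0.0.1", srv.getsockname()[1])
print(["%.3f ms" % r for r in latencies])
```

Run against a loaded OSD host (instead of localhost) during the benchmark,
a script like this would show whether small TCP messages stall even while
plain ICMP ping stays under 1 ms, which is the discrepancy Xiaoxi observed.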
> > sage
> >
> > > Sent from my iPhone
> > >
> > > On 2013-5-14, at 23:36, "Mark Nelson" <mark.nelson@xxxxxxxxxxx> wrote:
> > >
> > > > On 05/14/2013 10:30 AM, Sage Weil wrote:
> > > >> On Tue, 14 May 2013, Chen, Xiaoxi wrote:
> > > >>> Hi
> > > >>>
> > > >>> We are suffering from our OSDs flipping between up and down (OSD X
> > > >>> gets voted down due to 3 missing pings, and after a while it tells
> > > >>> the monitor "map xxx wrongly marked me down"). This is because we
> > > >>> are running a sequential write performance test on top of RBDs,
> > > >>> and the cluster network NICs are really highly utilized (8Gb/s+ on
> > > >>> a 10Gb network).
> > > >>>
> > > >>> Is this expected behavior? Or how can I prevent this from
> > > >>> happening?
> > > >>
> > > >> You can increase the heartbeat grace period. The pings are handled
> > > >> by a separate thread on the backside interface (if there is one).
> > > >> If you are missing pings, then the network or scheduler is
> > > >> preventing those (small) messages from being processed (there is
> > > >> almost no lock contention in that path). Which means it really is
> > > >> taking ~20 seconds or whatever to handle those messages. It's
> > > >> really a question of how unresponsive you want to permit the OSDs
> > > >> to be before you consider it a failure..
> > > >>
> > > >> sage
> > > >
> > > > It might be worth testing how long pings or other network traffic
> > > > take during these tests. There may be some TCP tuning you can do
> > > > here, or even consider using a separate network for the mons.
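Sage's suggestion to raise the grace period corresponds to the
`osd heartbeat grace` option (the ~20 seconds he mentions is its default).
A sketch of a ceph.conf fragment; the value 60 is only an illustrative
choice for a heavily loaded cluster, not a recommendation:

```ini
[osd]
    ; tolerate longer gaps between heartbeat replies before peers
    ; report this OSD down (default is 20 seconds; 60 is an example)
    osd heartbeat grace = 60
```

The trade-off is exactly the one Sage describes: a larger grace avoids
spurious down-marks under load, but a genuinely dead OSD also takes that
much longer to be noticed.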
> > > > Mark

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com