Re: Post-XDR CLD cannot keep session up

On 02/09/2010 05:34 AM, Jeff Garzik wrote:
On 02/07/2010 02:00 AM, Pete Zaitcev wrote:
Hi, Jeff & Colin:

It looks like you broke something in CLD, not sure if server or client.
There are two possibly related bugs. But first, here are the messages
(chunkd is run with -D). Note that I have 2 servers listed in DNS
(both on port 4499), but only one is up.

Feb 6 23:36:10 hitlain cld[1934]: databases up
Feb 6 23:36:10 hitlain cld[1934]: Listening on :: port 4499
Feb 6 23:36:10 hitlain cld[1934]: initialized: verbose 0
Feb 6 23:37:10 hitlain chunkd[1967]: Verbose debug output enabled
Feb 6 23:37:10 hitlain chunkd[1968]: cldc_saveaddr: found CLD host hitlain.zaitcev.lan prio 10 weight 50
Feb 6 23:37:10 hitlain chunkd[1968]: cldc_saveaddr: found CLD host elanor.zaitcev.lan prio 10 weight 50
Feb 6 23:37:10 hitlain chunkd[1968]: Selected CLD host hitlain.zaitcev.lan port 4499
Feb 6 23:37:10 hitlain chunkd[1968]: Listening on host :: port 8082
Feb 6 23:37:10 hitlain chunkd[1968]: initialized
Feb 6 23:37:10 hitlain chunkd[1968]: New CLD session created, sid 05B521BF4071EBA2
Feb 6 23:37:10 hitlain chunkd[1968]: CLD file "/chunk-default/2" created
Feb 6 23:37:10 hitlain chunkd[1968]: CLD file "/chunk-default/2" written
Feb 6 23:39:45 hitlain chunkd[1968]: Session failed, sid 05B521BF4071EBA2
Feb 6 23:39:45 hitlain chunkd[1968]: Selected CLD host elanor.zaitcev.lan port 4499
Feb 6 23:39:45 hitlain chunkd[1968]: cldc_udp_receive_pkt failed: -111
Feb 6 23:39:50 hitlain chunkd[1968]: cldc_udp_receive_pkt failed: -111
Feb 6 23:39:55 hitlain chunkd[1968]: cldc_udp_receive_pkt failed: -111
Feb 6 23:40:00 hitlain chunkd[1968]: cldc_udp_receive_pkt failed: -111
Feb 6 23:40:05 hitlain chunkd[1968]: cldc_udp_receive_pkt failed: -111
Feb 6 23:40:10 hitlain chunkd[1968]: cldc_udp_receive_pkt failed: -111
Feb 6 23:40:15 hitlain chunkd[1968]: cldc_udp_receive_pkt failed: -111
Feb 6 23:40:21 hitlain chunkd[1968]: cldc_udp_receive_pkt failed: -111
Feb 6 23:40:26 hitlain chunkd[1968]: cldc_udp_receive_pkt failed: -111
Feb 6 23:40:31 hitlain chunkd[1968]: cldc_udp_receive_pkt failed: -111
Feb 6 23:40:36 hitlain chunkd[1968]: cldc_udp_receive_pkt failed: -111
Feb 6 23:40:41 hitlain chunkd[1968]: cldc_udp_receive_pkt failed: -111
Feb 6 23:40:46 hitlain chunkd[1968]: cldc_udp_receive_pkt failed: -111
Feb 6 23:40:51 hitlain chunkd[1968]: cldc_udp_receive_pkt failed: -111
Feb 6 23:40:56 hitlain chunkd[1968]: cldc_udp_receive_pkt failed: -111
Feb 6 23:41:01 hitlain chunkd[1968]: cldc_udp_receive_pkt failed: -111
Feb 6 23:41:06 hitlain chunkd[1968]: cldc_udp_receive_pkt failed: -111
Feb 6 23:41:11 hitlain chunkd[1968]: cldc_udp_receive_pkt failed: -111
Feb 6 23:41:16 hitlain chunkd[1968]: cldc_udp_receive_pkt failed: -111
Feb 6 23:41:21 hitlain chunkd[1968]: cldc_udp_receive_pkt failed: -111
Feb 6 23:41:26 hitlain chunkd[1968]: cldc_udp_receive_pkt failed: -111
Feb 6 23:41:31 hitlain chunkd[1968]: cldc_udp_receive_pkt failed: -111
Feb 6 23:41:36 hitlain chunkd[1968]: cldc_udp_receive_pkt failed: -111
Feb 6 23:41:41 hitlain chunkd[1968]: cldc_udp_receive_pkt failed: -111
Feb 6 23:41:46 hitlain chunkd[1968]: New CLD session creation failed: 17
Feb 6 23:41:46 hitlain chunkd[1968]: Session failed, sid 6C5A5E5D4D8F2270
Feb 6 23:41:46 hitlain chunkd[1968]: Selected CLD host hitlain.zaitcev.lan port 4499
Feb 6 23:41:46 hitlain chunkd[1968]: New CLD session created, sid 4E2A8ED73878F038
Feb 6 23:41:46 hitlain chunkd[1968]: CLD file "/chunk-default/2" created
Feb 6 23:41:46 hitlain chunkd[1968]: CLD lock(/chunk-default/2) failed: 11

So, first regression: the session ALWAYS fails, for no reason I can see.
It takes 2 minutes 35 seconds, as you can observe from the "Session
failed" message.


Well, the core timer code is not executing session_timeout() as it
should. This could be memory corruption, a libtimer bug, or something
else entirely. I can observe session_timeout() being updated to a new
timer expiration, and then never being called again.

There is definitely something strange going on in the timer routines
that is causing session_timeout() not to run, even though it re-adds
itself to the timer list using cld_timer_add(). fprintf() debug output
in cld_timer_add() and cld_timers_run() is yielding unexpected results.
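
For illustration, here is a minimal sketch of the self-rearming timer
pattern described above. It is NOT the actual CLD timer code: the names
demo_timer, demo_timer_add(), demo_timers_run() and
demo_session_timeout(), and the 5-second interval, are all hypothetical.
The sketch assumes a list sorted by expiration, where the run loop
unlinks a timer before calling its callback so the callback can safely
queue it again.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical types and names; not the real cld_timer API. */
struct demo_timer;
typedef void (*demo_timer_cb)(struct demo_timer *t, void *userdata);

struct demo_timer {
    time_t expires;            /* absolute expiration time */
    int queued;                /* nonzero while linked into a list */
    demo_timer_cb cb;
    void *userdata;
    struct demo_timer *next;
};

struct demo_timer_list {
    struct demo_timer *head;   /* sorted, earliest expiration first */
    time_t now;                /* time of the current run, for callbacks */
};

/* Unlink a timer if it is currently queued. */
static void demo_timer_del(struct demo_timer_list *tl, struct demo_timer *t)
{
    struct demo_timer **pp = &tl->head;

    while (*pp && *pp != t)
        pp = &(*pp)->next;
    if (*pp)
        *pp = t->next;
    t->next = NULL;
    t->queued = 0;
}

/* Queue (or re-queue) a timer, keeping the list sorted by expiration. */
static void demo_timer_add(struct demo_timer_list *tl, struct demo_timer *t,
                           time_t expires)
{
    struct demo_timer **pp = &tl->head;

    assert(t->cb != NULL);
    if (t->queued)
        demo_timer_del(tl, t);
    t->expires = expires;
    while (*pp && (*pp)->expires <= expires)
        pp = &(*pp)->next;
    t->next = *pp;
    *pp = t;
    t->queued = 1;
}

/*
 * Fire every timer due at or before `now`.  Each timer is unlinked
 * BEFORE its callback runs, so the callback may re-add it.  A callback
 * must re-arm strictly in the future, or this loop never terminates.
 * Returns seconds until the next expiration, or 0 if the list is empty.
 */
static time_t demo_timers_run(struct demo_timer_list *tl, time_t now)
{
    tl->now = now;
    while (tl->head && tl->head->expires <= now) {
        struct demo_timer *t = tl->head;

        tl->head = t->next;
        t->next = NULL;
        t->queued = 0;
        t->cb(t, t->userdata);
    }
    return tl->head ? tl->head->expires - now : 0;
}

#define DEMO_INTERVAL 5        /* hypothetical re-arm interval, seconds */

struct demo_session {
    struct demo_timer_list *timers;
    int fired;                 /* number of times the callback ran */
};

/* Periodic callback: do the work, then re-add itself to the list. */
static void demo_session_timeout(struct demo_timer *t, void *userdata)
{
    struct demo_session *s = userdata;

    s->fired++;
    demo_timer_add(s->timers, t, s->timers->now + DEMO_INTERVAL);
}
```

With a list like this, a callback that re-adds itself should still be
queued after every run. Checking a flag such as the sketch's `queued`
field, and the list head, right after the run loop returns is one way
to see whether the re-add was lost or whether the run loop is simply
not being called again.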

More debugging after sleep.

	Jeff


