Re: [Patch 01/12] CLD: fix crash in retransmissions

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 04/18/2010 12:37 AM, Pete Zaitcev wrote:
For a longest time I was plagued by (very infrequent) crashes like this:

Program received signal SIGSEGV, Segmentation fault.
sess_retry_output (timer=0x92070c0) at session.c:532
532                     if (!next_retry || (op->next_retry<  next_retry))
(gdb) info threads
* 1 Thread 0xb72f96c0 (LWP 22417)  sess_retry_output (timer=0x92070c0) at session.c:532
(gdb) where
#0  sess_retry_output (timer=0x92070c0) at session.c:532
#1  session_retry (timer=0x92070c0) at session.c:565
#2  0x08049aee in cld_timers_run (tlist=0x8056630) at ../lib/libtimer.c:95
#3  0x0804e9cc in main_loop (argc=5, argv=0xbff70bd4) at server.c:983
#4  main (argc=5, argv=0xbff70bd4) at server.c:1138

The crash happens because op is NULL. As it turned out, this happens
if a packet retransmit and a session expiration occur simultaneously
(in the same pass of timers_run). The scenario is:
  - timers_run collects expired timers at exec list
  - timers_run expires session
  - two timer_del are called, but one of them is on exec list already,
    so it's ineffective
  - session is freed, this zeroes ->data in lists (later op)
  - timers_run continues along the exec list, invokes the retransmission
    callback, and that crashes with NULL op.

The proposed solution is to rework the timers_run, again. But this
time, we'll make it simpler by observing that timers are ordered by
expiration time. Therefore, we can pull next timer off the list,
expire it, and loop until expiration time is greater than the current
time. No execution list is kept. The integrity of the main list
is assured by never walking it and always referring to the head
anew at each iteration.

This patch appears to fix the problem and stands up to use that
crashed the old code.

Signed-off-by: Pete Zaitcev<zaitcev@xxxxxxxxxx>

---
  include/cld_common.h |   10 ++++++----
  lib/libtimer.c       |   41 ++++++++++++++++++++---------------------
  2 files changed, 26 insertions(+), 25 deletions(-)

applied 1-4

Note that I change the email subject line prefix (which is normally copied by automated tools directly into git) from "CLD" to "libcldc", when committing to git. I am trying to follow (and encourage others to) the kernel's method of using a prefix to indicate the subsystem or section within the current git repo to which a change applies.

To be fully friendly with automated tools, an ideal Project Hail subject line might read

	[cld patch 1/1] update Makefile.am
	 	or
	[tabled patch 1/1] update Makefile.am
		or
	[cld patch 1/1] libcldc: add nncld API, the new new CLD API

Not a big deal, just noting what is most friendly to the git automated tools, when I'm importing each Hail patch.

	Jeff




--
To unsubscribe from this list: send the line "unsubscribe hail-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[Index of Archives]     [Fedora Clound]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux