On Fri, 13 Jun 2003, James Olin Oden wrote:

> OK, this is the original problem I was trying to figure out with the
> debug version of init.  In house we have an app that was starting a
> process manager from init.  It was starting after the sysinit entry and
> before the runlevel script entries (it needed to be around when the
> runlevel scripts were run to respond to requests from application rc
> scripts to start their processes).  The entry looked something like
> this:
>
>	# System initialization.
>	si::sysinit:/etc/rc.d/rc.sysinit
>
>	pm:4:respawn:/sbin/procmgr
>
>	l0:0:wait:/etc/rc.d/rc 0
>	l1:1:wait:/etc/rc.d/rc 1
>	l2:2:wait:/etc/rc.d/rc 2
>	l3:3:wait:/etc/rc.d/rc 3
>	l4:4:wait:/etc/rc.d/rc 4
>	l5:5:wait:/etc/rc.d/rc 5
>	l6:6:wait:/etc/rc.d/rc 6
>
> This worked fine sometimes, but other times the process manager would
> not get run at all, and init would not continue on and run the
> runlevel 4 rc scripts.  I created the file:
>
>	/etc/initscript
>
> which gets called by init as a proxy for starting processes if it
> exists, and I could see that it would call the rc.sysinit stuff, but it
> would never call the process manager when the problem occurred.  I then
> put a debug version of init on the system (the subject of the previous
> email), and eventually, when I got the debug version booting, I saw the
> following:
>
>	Enabling local filesystem quotas:  [  OK  ]
>	Enabling swap space:  [  OK  ]
>	INIT: chld_handler: unknown child 67 exited.
>	INIT: Checking for children to start"(E*¥_chld_handler: unknown child 375 exited.
>	INIT: SYSINIT -> BOOT
>
> And there she sat.  Note it had run rc.sysinit, and then it caught a
> SIGCHLD from an orphaned child of rc.sysinit or one of its children.
> After that it starts checking to see what it needs to start next.
> Note that in that output we have a little bit of garbage (an internal
> buffer overflow?).  After it prints "SYSINIT -> BOOT", no more output
> occurs, and since it has not run the rc scripts (or started the gettys)
> you have a system that is effectively hung.
>
> Any ideas of where I should look in the init code or how I might debug
> this further?
>

After patching init so that signals are blocked when talking to syslog:

	https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=97534

init no longer hangs when the memory corruption occurs.  Instead it
still prints the funny little characters, pauses for about 30 seconds,
and then continues booting.

I have linked init against Electric Fence, and sure enough, when the
problem occurs (it is intermittent) Electric Fence stops init.  The
difficulty is that at the point the problem occurs no network is up
(and thus neither is sshd) and no gettys are up, so there is no way to
get onto the box.  I tried modifying the rc.sysinit script to bring up
an interface and start sshd, but this seemed to change things enough
that the problem was never seen.  So I instead started a getty, which
took much less time to start.  We are still not seeing the problem
(though we are also getting hangs in the md code of the kernel...sigh).

Again, any clues to help debug this problem would be most appreciated.
Also, we are using the latest kernel and glibc with the rawhide version
of init (2.85-3).  This problem also occurs with the penultimate errata
kernel and the shipping init (2.84-13).

Thanks...james

_______________________________________________
Redhat-devel-list mailing list
Redhat-devel-list@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/redhat-devel-list
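
For anyone following the thread, the idea of blocking signals while
talking to syslog, mentioned in the message above, can be sketched
roughly as follows.  This is only an illustration and not the patch
attached to the bug report: the helper name log_with_signals_blocked,
the "init" ident string, and the 256-byte buffer are all made up here.
The motivation is that syslog() is not async-signal-safe, so a signal
handler (such as init's chld_handler) that logs while the main code is
already inside syslog() may interrupt it mid-call with unpredictable
results.

```c
#define _XOPEN_SOURCE 700   /* expose sigprocmask() and syslog() under strict -std modes */

#include <signal.h>
#include <stdarg.h>
#include <stdio.h>
#include <syslog.h>

/*
 * Hypothetical helper (not taken from the init sources): format a message,
 * then block every blockable signal for the duration of the syslog() call
 * so that no handler can run, and possibly log, in the middle of it.
 * Returns 0 on success, -1 if the signal mask could not be changed.
 */
int log_with_signals_blocked(int priority, const char *fmt, ...)
{
    sigset_t block_all, saved;
    char buf[256];              /* assumed size; longer messages are truncated */
    va_list ap;

    /* Format first, so the time spent with signals blocked stays short. */
    va_start(ap, fmt);
    vsnprintf(buf, sizeof(buf), fmt, ap);
    va_end(ap);

    /* Block all signals, remembering the mask that was in effect before. */
    sigfillset(&block_all);
    if (sigprocmask(SIG_BLOCK, &block_all, &saved) < 0)
        return -1;

    openlog("init", 0, LOG_DAEMON);     /* "init" ident is an assumption */
    syslog(priority, "%s", buf);
    closelog();

    /* Restore the original mask; pending signals are delivered here. */
    if (sigprocmask(SIG_SETMASK, &saved, NULL) < 0)
        return -1;
    return 0;
}
```

A caller would use it in place of a bare syslog() call, for example
log_with_signals_blocked(LOG_INFO, "unknown child %d exited", pid).
Signals that arrive during the call are not lost; they stay pending and
are delivered once the saved mask is restored.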