Re: crash CPU bound waiting for user response

Dave Anderson <anderson@xxxxxxxxxx> · Thu, 05 Jul 2007 09:48:34 -0400

D. Hugh Redelmeier wrote:
| From: Dave Anderson <anderson@xxxxxxxxxx>

| D. Hugh Redelmeier wrote:

| > ==> Worse: while it is awaiting my RETURN, it is burning 100% of the CPU!
| > 
| > Here is what "ps laxgwf" says about the crash process and its child.
| > 
| > F   UID   PID  PPID PRI  NI    VSZ   RSS WCHAN  STAT TTY        TIME COMMAND
| > 4     0  4426  4406  25   0 416812 332764 -     R+   pts/5     80:36
| > |               |           \_ crash --readnow
| > /usr/lib/debug/lib/modules/2.6.21-1.3228.fc7/vmlinux
| > /var/crash/2007-07-02-13:42/vmcore
| > 0     0  4989  4426  18   0  73976   740 -      S+   pts/5      0:00
| > |               |               \_ /usr/bin/less -E -X -Ps -- MORE --
| > forward\: <SPACE>, <ENTER> or j  backward\: b or k  quit\: q
| > 
| > strace of the crash process shows an infinite sequence of:
| >     wait4(4989, 0x7fffcd9cae90, WNOHANG, NULL) = 0
| >     wait4(4989, 0x7fffcd9cae90, WNOHANG, NULL) = 0
| >     wait4(4989, 0x7fffcd9cae90, WNOHANG, NULL) = 0
| >     wait4(4989, 0x7fffcd9cae90, WNOHANG, NULL) = 0
| > 
| > This is very wasteful.
| > 
| > There are other ways to get into this state.  Other places less is
| > being used and is waiting.  Probably wherever less is used even if it
| > isn't waiting.
| > 
| > I just tested: this problem exists when using a normal xterm.
| 

Again, what exactly do you do to reproduce it?  I just cannot get crash to
use 100% of the CPU while it waits on the "less" sub-shell.

| Yeah, I have seen this on occasion, but I have never been able
| to reproduce it on demand.  There was a patch suggestion a while ago,
| but I deferred it until I could reliably reproduce it for testing
| before taking it in.

I've put gdb on the case.  The CPU burning that I'm currently experiencing is
in cmdline.c:restore_sanity.  The actual code in question is:
    while (!waitpid(pc->stdpipe_pid, &waitstatus, WNOHANG))
        ;
That sure looks like a busy-wait.

If you execute this code, you should get a busy-wait too.

If you replaced WNOHANG with 0, I think that the wait would have the
same result but would block in the kernel instead of spinning.  You would
then want to loop in the case where waitpid returns -1 with errno == EINTR.

Here's what I'd try (UNTESTED!):
    do ; while (waitpid(pc->stdpipe_pid, &waitstatus, 0) == -1 && errno == EINTR);
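
For anyone who wants to try the blocking form outside of crash, here is a
minimal standalone sketch of the same idea.  The names wait_for_child and
demo_blocking_wait are made up for illustration and are not part of crash;
the child simply sleeps for a second to stand in for the pager.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of crash): block until child `pid`
 * terminates, retrying when a signal interrupts the call.
 * Returns the reaped pid, or -1 with errno set on a real error.
 */
pid_t wait_for_child(pid_t pid, int *status)
{
    pid_t ret;

    assert(pid > 0);
    do {
        ret = waitpid(pid, status, 0);   /* 0 = block; no WNOHANG */
    } while (ret == -1 && errno == EINTR);

    return ret;
}

/*
 * Demo: fork a child that sleeps for one second and then exits with
 * `code`, and reap it with the blocking helper.  Returns the child's
 * exit code, or -1 on failure.
 */
int demo_blocking_wait(int code)
{
    int status = 0;
    pid_t pid = fork();

    if (pid == -1)
        return -1;
    if (pid == 0) {
        sleep(1);        /* stand-in for the pager waiting on the user */
        _exit(code);
    }
    if (wait_for_child(pid, &status) != pid)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

While the child sleeps, the parent should sit in waitpid using essentially
no CPU, which is the behaviour the WNOHANG loop lacks.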

All the uses of WNOHANG in that function look suspicious.

I understand.  I also remember that the WNOHANG's were originally added
there on purpose because of hangs I was seeing.  But that's not to say
it's the best way of doing things.
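
One possible middle ground, sketched here purely as an assumption and not
taken from any posted patch, is to keep WNOHANG but sleep between polls and
give up after a fixed budget.  That avoids the busy-wait without blocking
forever if the child never exits.  The names poll_for_child and
demo_bounded_wait, the polling interval, and the SIGKILL fallback are all
hypothetical choices for illustration.

```c
#include <poll.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of crash): poll child `pid` with
 * WNOHANG, sleeping `interval_ms` between polls, for at most
 * `max_polls` polls.  Returns pid once the child has been reaped,
 * 0 if it is still running when the budget runs out, -1 on error.
 */
pid_t poll_for_child(pid_t pid, int *status, int interval_ms, int max_polls)
{
    for (int i = 0; i < max_polls; i++) {
        pid_t ret = waitpid(pid, status, WNOHANG);

        if (ret == pid)
            return ret;              /* child has been reaped */
        if (ret == -1)
            return -1;               /* real error, e.g. ECHILD */
        poll(NULL, 0, interval_ms);  /* sleep without spinning */
    }
    return 0;                        /* still running: budget exhausted */
}

/*
 * Demo: the child sleeps `child_secs` seconds and exits with code 3.
 * Returns the exit code if the child finished within the budget,
 * -2 if the budget ran out (the child is then killed and reaped),
 * or -1 on error.
 */
int demo_bounded_wait(unsigned child_secs, int interval_ms, int max_polls)
{
    int status = 0;
    pid_t pid = fork();

    if (pid == -1)
        return -1;
    if (pid == 0) {
        sleep(child_secs);
        _exit(3);
    }

    pid_t ret = poll_for_child(pid, &status, interval_ms, max_polls);
    if (ret == 0) {
        kill(pid, SIGKILL);          /* give up on the child */
        waitpid(pid, &status, 0);    /* reap it so no zombie is left */
        return -2;
    }
    if (ret != pid)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Whether a timeout is appropriate in restore_sanity depends on what caused
the original hangs, which this sketch does not attempt to model.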

As I mentioned before, there was a patch posted by someone (who, as I
recall, preferred using gdb and gdb scripts with kdump vmcores), but even
going back a year and a half into the archives, I can't find it.

Anyway, I'm going to have to be able to reproduce it and test any
changes thoroughly before potentially re-introducing the hangs I
used to see.

Thanks,
  Dave

--
Crash-utility mailing list
Crash-utility@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/crash-utility