Re: Qube2 slowly dies

Glyn Astill <glynastill@xxxxxxxxxxx> · Thu, 11 Jun 2009 08:54:43 +0000 (GMT)

Hi Kevin,

It's nice to see a scientific suggestion as to the nature of the problem.

> From: Kevin D. Kissell <kevink@xxxxxxxxxxxxx>

> Your description sounds an awful lot like failures I've seen when
> interrupts get lost or blocked for some reason (could be hardware, the
> kernel, or some interaction between them).  Have you looked at
>  to see if "Spurious" interrupts are occurring, or if the rate of
> serviced timer and I/O interrupts decreases or increases as the system
> degrades?

No, I haven't checked - but I will. What would I be looking for that would stick out as "spurious"? The type of interrupt, the quantity, or random interrupts appearing and disappearing?
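
One way to make that check concrete is to sample the interrupt counters at intervals and compare them. The following is only a minimal sketch. It assumes the file Kevin means is /proc/interrupts (the name is missing from the quoted text above, so that is a guess), and the function names, the 10 second interval, and the row labels in the comments are all made up for illustration. It prints how much each counter grew between samples, so a timer rate that falls off, or an error/spurious row that starts climbing as the box degrades, should stand out.

```python
import time


def parse_interrupts(text):
    """Return {row_label: total count summed over all CPUs}."""
    counts = {}
    for line in text.splitlines():
        label, sep, rest = line.partition(":")
        if not sep:
            continue  # the CPU header line has no colon
        total = 0
        for field in rest.split():
            if not field.isdigit():
                break  # counts end where the controller/device name starts
            total += int(field)
        counts[label.strip()] = total
    return counts


def interrupt_deltas(before, after):
    """Return how much each counter grew between two samples."""
    return {irq: after[irq] - before.get(irq, 0) for irq in after}


def read_interrupts(path="/proc/interrupts"):
    # Path is an assumption; see the note above.
    with open(path) as f:
        return parse_interrupts(f.read())


def watch(interval=10, rounds=6):
    """Print per-row interrupt deltas every `interval` seconds."""
    prev = read_interrupts()
    for _ in range(rounds):
        time.sleep(interval)
        cur = read_interrupts()
        deltas = interrupt_deltas(prev, cur)
        stamp = time.strftime("%H:%M:%S")
        print(stamp, " ".join(f"{irq}=+{n}" for irq, n in deltas.items()))
        prev = cur
```

Calling watch() in one ssh session while the configure script runs in the other, and leaving it going until things get sluggish, would give a rough record of whether the timer and I/O interrupt rates change over time.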

> When the system becomes unresponsive, by any chance does it "wake up"
> after 10-20 minutes (the time for the Count register to wrap)?
> 

Not that I've noticed. I just see it degrade further and further until it dies over the course of an hour or so.

> If other Qube2s don't exhibit this behavior with a given Linux kernel,
> but yours does, and yet yours runs NetBSD OK, it suggests that there's
> a difference in interrupt setup/handling between the two systems that
> just happens to work around a hardware problem on your board.

I'm sure that's a valid possibility, however I do have two of these machines and I have tried both with the same results.

I also had a problem back when I tried etch with the 2.6.18 kernel. In that case I saw no degraded performance at all, but after some hours of activity (anywhere between 2 and 24+) it'd just fall on its ass.

> 
>           Regards,
> 
>           Kevin K.
> 
> Glyn Astill wrote:
> > Hi people,
> >
> > I've been directed here from the Debian lists by Martin Michlmayr.
> > I'm running lenny on a qube2 128mb ram / 40gb disk.
> >
> > I've tried kernels 2.6.26 and 2.6.30~rc8 and the issue I'm about to
> > describe is present in both, I haven't tried any other kernels - but
> > I will try 2.6.22 when I can.
> >
> > Essentially the machine gets more and more sluggish until it finally
> > dies. I've had a quick look in meminfo and I can't see that it's
> > running out of memory, and I'm not sure what else to check?
> >
> > I find it hard to describe what's going off, but here's a scenario I
> > hope illustrates the problem. The configure script is just an example
> > of doing something - I could easily have extracted an archive with
> > tar or something for the same results;
> >
> > - I start 2 ssh sessions and in one start configure for the postgres
> >   source, in the other I just started top.
> >
> > - And for a while all seems fine; configure ticks away and top
> >   refreshes every second.
> >
> > - Then top stops ticking over - but it'll refresh with a keypress.
> >   Anyway I exit top and try to run it again... nothing. I hit ctrl-c
> >   which brings me back to the prompt and I try again... nothing.
> >
> > - The configure script is still ticking over slowly.
> >
> > - I try "ps ax" - it works; so I try it again... nothing.
> >
> > - I try "ipcs" and "lsof" they both work and seem to keep working.
> >
> > - I try "ps ax" again... nothing. I hit ctrl-c and now it doesn't
> >   come back to the command prompt for a while.. say 5 minutes and
> >   eventually it's back.
> >
> > - It's still going. Some commands still work, some just do nothing.
> >   proc/meminfo shows it's not eaten all the memory.
> >
> > - If I try to start another ssh session I can log in, I get the
> >   motd, but I don't get to the shell.
> >
> > - Eventually the configure script ends, and all shells come back to
> >   the prompt. But it now seems totally braindamaged, I can run "ps ax"
> >   but "top" and other commands still do nothing. Here's strace
> >   attached to the top process:
> >
> > deb:~# strace -p 7228
> > Process 7228 attached - interrupt to quit
> > _newselect(0, NULL, NULL, NULL, {0, 500013}
> >
> > - Then after a little while the whole thing becomes unresponsive.
> >
> >
> > Can anyone confirm they've seen the same behaviour or direct me what
> > to look into?
> >
> > Thanks
> > Glyn