Hi Kevin,

It's nice to see a scientific suggestion as to the nature of the problem.

> From: Kevin D. Kissell <kevink@xxxxxxxxxxxxx>

> Your description sounds an awful lot like failures I've seen when
> interrupts get lost or blocked for some reason (could be hardware, the
> kernel, or some interaction between them). Have you looked at to see
> if "Spurious" interrupts are occurring, or if the rate of serviced
> timer and I/O interrupts decreases or increases as the system degrades?

No, I haven't checked, but I will. What would I be looking for that
would stick out as "spurious"? The type of interrupt, the quantity, or
random interrupts appearing and disappearing?

> When the system becomes unresponsive, by any chance does it "wake up"
> after 10-20 minutes (the time for the Count register to wrap)?

Not that I've noticed; I just see it degrade further and further until
it dies, over the course of an hour or so.

> If other Qube2s don't exhibit this behavior with a given Linux kernel,
> but yours does, and yet yours runs NetBSD OK, it suggests that there's
> a difference in interrupt setup/handling between the two systems that
> just happens to work around a hardware problem on your board.

I'm sure that's a valid possibility; however, I do have two of these
machines and I have tried both with the same results. I also had a
problem back when I tried etch with the 2.6.18 kernel, although in that
case I saw no degraded performance at all; after some hours of activity
(anywhere between 2 and 24+) it'd just fall on its ass.

>
> Regards,
>
> Kevin K.
>
> Glyn Astill wrote:
> > Hi people,
> >
> > I've been directed here from the Debian lists by Martin Michlmayr.
> > I'm running lenny on a qube2 128mb ram / 40gb disk.
> >
> > I've tried kernels 2.6.26 and 2.6.30~rc8 and the issue I'm about to
> > describe is present in both, I haven't tried any other kernels - but
> > I will try 2.6.22 when I can.
> >
> > Essentially the machine gets more and more sluggish until it finally
> > dies. I've had a quick look in meminfo and I can't see that it's
> > running out of memory, and I'm not sure what else to check?
> >
> > I find it hard to describe what's going off, but here's a scenario I
> > hope illustrates the problem. The configure script is just an example
> > of doing something - I could easily have extracted an archive with
> > tar or something for the same results;
> >
> > - I start 2 ssh sessions and in one start configure for the postgres
> > source, in the other I just started top.
> >
> > - And for a while all seems fine; configure ticks away and top
> > refreshes every second.
> >
> > - Then top stops ticking over - but it'll refresh with a keypress.
> > Anyway I exit top and try to run it again... nothing. I hit ctrl-c
> > which brings me back to the prompt and I try again... nothing.
> >
> > - The configure script is still ticking over slowly.
> >
> > - I try "ps ax" - it works; so I try it again... nothing.
> >
> > - I try "ipcs" and "lsof" they both work and seem to keep working.
> >
> > - I try "ps ax" again... nothing. I hit ctrl-c and now it doesn't
> > come back to the command prompt for a while.. say 5 minutes and
> > eventually it's back.
> >
> > - It's still going. Some commands still work, some just do nothing.
> > /proc/meminfo shows it's not eaten all the memory.
> >
> > - If I try to start another ssh session I can log in, I get the motd,
> > but I don't get to the shell.
> >
> > - Eventually the configure script ends, and all shells come back to
> > the prompt. But it now seems totally braindamaged, I can run "ps ax"
> > but "top" and other commands still do nothing. Here's strace attached
> > to the top process:
> >
> > deb:~# strace -p 7228
> > Process 7228 attached - interrupt to quit
> > _newselect(0, NULL, NULL, NULL, {0, 500013}
> >
> > - Then after a little while the whole thing becomes unresponsive.
> >
> >
> > Can anyone confirm they've seen the same behaviour or direct me what
> > to look into?
> >
> > Thanks
> > Glyn
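
For the check Kevin suggests (whether the rate of serviced timer and I/O interrupts rises or falls as the system degrades), something like the rough Python sketch below might do. It assumes the counters are being read from /proc/interrupts, which is my guess at the file meant above, and the function names (parse_interrupts, interrupt_deltas, sample) are made up for illustration. It simply takes two snapshots and reports how far each counter advanced in between.

```python
import time


def parse_interrupts(text):
    """Parse /proc/interrupts-style text into {row_label: total_count}."""
    counts = {}
    lines = text.splitlines()
    for line in lines[1:]:  # the first line is the CPU header row
        label, sep, rest = line.partition(":")
        if not sep:
            continue
        total = 0
        for field in rest.split():
            if not field.isdigit():
                break  # counts end where the controller/device names begin
            total += int(field)  # sum the per-CPU columns
        counts[label.strip()] = total
    return counts


def interrupt_deltas(before, after):
    """Return how far each counter advanced between two snapshots."""
    return {irq: after[irq] - before.get(irq, 0) for irq in after}


def sample(path="/proc/interrupts", interval=10.0):
    """Take two snapshots `interval` seconds apart (path is an assumption)."""
    with open(path) as f:
        before = parse_interrupts(f.read())
    time.sleep(interval)
    with open(path) as f:
        after = parse_interrupts(f.read())
    return interrupt_deltas(before, after)
```

Calling sample() every few minutes while configure runs, and logging the result, would show whether the timer row's delta stays roughly constant or falls away as the box gets sluggish, and whether any error or spurious-interrupt row the kernel reports (for example an ERR row, where present) starts climbing.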