Re: Disk thrashing, Please HELP!

Rosina Bignall <rbignall@xxxxxxxxxxxxx> · Mon, 17 May 2004 08:36:23 -0500

Thanks for the great suggestions.

I rebooted before trying them, to get the problem going again, 
and ... no problem!  I've tried to remember what I changed, but 
other than installing some of the recent updates from Red Hat 
(not all of them yet; that takes a while on a 56K modem), I don't 
think I did anything that would change it.  So I'm probably going 
to be left with a mystery.

I know I needed to follow these suggestions while the problem 
process(es) were running, but of course, I cannot get it to 
happen now, so here's what I found without them running and 
perhaps you'll see something that I missed.  I admit that while 
not being a newbie to Linux, neither am I a power user, so please 
forgive my ignorance.  My computer serves as the gateway to 
several others in our home network - the other computers run 
various versions of Windows.

Jason Dixon wrote:
Without more information on what services you're running, it's going to 

In run level 5, I'm running the following: apmd, autofs, crond, 
cups, dhcpd, gpm, hpoj, ip6tables, iptables, irqbalance, isdn, 
kudzu, mdmonitor, mdmpd, microcode_ctl, named, netfs, network, 
nfslock, pcmcia, portmap, random, rawdevices, rhnsd, sendmail, 
sgi_fam, sshd, syslog, vmware, wine, xinetd.  Anything suspicious 
here I should investigate further?

be tough.  Use "ps afx" while the python process(es) is running to see 
what's actually calling python.

This showed that up2date called python, but up2date was not 
running before, so that's not it.
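
The "what's actually calling python" check can also be done per process by walking the parent chain by hand.  A minimal sketch, assuming a Linux /proc filesystem; the function name ancestry is made up for illustration and is not a standard tool:

```shell
#!/bin/sh
# Print a process and each of its ancestors, one per line, as "PID command".
# Assumes Linux: reads the parent PID from the PPid: line of /proc/PID/status.
ancestry() {
    pid=$1
    # The loop ends when the parent PID is 0 or the process no longer exists.
    while [ -r "/proc/$pid/status" ]; do
        # cmdline is NUL-separated; turn the NULs into spaces for display.
        cmd=$(tr '\0' ' ' < "/proc/$pid/cmdline")
        echo "$pid $cmd"
        pid=$(awk '/^PPid:/ { print $2 }' "/proc/$pid/status")
    done
}
```

With the problem running, something like "for p in $(pgrep python); do ancestry "$p"; echo; done" would list each python process followed by whatever started it.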

Ed Wilts wrote:
It sounds like you need to look at your scheduled tasks to see what is
starting python.  One of the ways to do this is to use lsof.  For
example:
# lsof / | grep python

The second column is the pid of the process that's running python.  Now
see if you can track down the guilty culprit from there.  

Again, only up2date which was not running when I experienced the 
problem.

You can also check which cron jobs are running python with:
[root@p6000 ewilts]# grep python /etc/cron.*/*
[root@p6000 ewilts]# grep python /var/spool/cron/*
That will help find some, but obviously not all since the cron entry
could simply be to a script that in turn runs python.

No dice.  Both commands showed nothing.
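
Since a cron entry may point at a script that is itself written in python, grepping the cron files alone can miss it.  A rough sketch of checking the scripts themselves, assuming a python script can be recognised by its "#!" line; the function name python_scripts is hypothetical:

```shell
#!/bin/sh
# List regular files under the given directories whose first line is a
# "#!" line that mentions python.
python_scripts() {
    find "$@" -type f -exec sh -c '
        head -n 1 "$1" | grep -q "^#!.*python" && printf "%s\n" "$1"
    ' sh {} \; 2>/dev/null
}
```

For example, "python_scripts /etc/cron.hourly /etc/cron.daily /etc/cron.weekly /etc/cron.monthly /etc/rc.d/init.d" covers the cron directories and the startup scripts in one pass.  It will not catch a shell script that calls python further down, so it narrows the search rather than settling it.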

My gut tells me you're running mailman since it does have the rare habit
of thrashing a system like you're describing.  Are you running mailman, and if
so, are you current?

No, I'm using sendmail.

Larry Brown wrote:
If after rebooting you get the same problem, I'd grep the contents of your
startup scripts looking for the bang for python.  I have not written
anything in python, however, you may do ...

fgrep python *

from within /etc/rc.d/init.d and see which ones were written in it.  Then
temporarily disable them by changing SXXscriptName to sXXscriptName where XX
represents the given number of the script.  This will prevent it from
starting up.  If upon reboot after that, the thrashing has stopped, you can
one by one change the s to S and reboot until you find it.  I don't know of

Nope, nothing starts python directly.

any faster way off hand.  Did this start after loading some package or is
this some existing package that recently started giving you the problem?

It's some existing package.  I had not installed anything in 
several days when this started.  I had installed several (many) 
fonts right before this happened (not packages, just individual 
fonts) which took a long time in and of itself and didn't get 
finished before I stopped it and rebooted and ended up with the 
thrashing problem, which is why I suspected that might have 
something to do with it, but I can't find any evidence to support 
that.  I've re-added some of the fonts, more slowly and not so 
many at once, without the same thing happening.

Looking further into the system log (once things had calmed down 
enough that I could actually look at such a huge file), I found 
the following messages repeated many, many times:

May 16 14:04:54 rosina xinetd[20798]: warning: can't get client address: Transport endpoint is not connected

May 16 14:04:54 rosina xinetd[20798]: libwrap refused connection to sgi_fam (libwrap=fam) from <no address>

The process ID changed, but other than that, these repeated for a 
long time.  So I'm guessing that syslogd writing these messages 
is what was causing the disk to thrash.  I've never done anything 
with xinetd, so I don't even know how it works, let alone what it 
was doing that might have caused it.
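
One way to test that guess is to count which programs account for most of the lines in the log.  A minimal sketch, assuming the traditional syslog layout seen in the messages above ("Mon DD HH:MM:SS host prog[pid]: text"); the function name top_loggers is made up for illustration:

```shell
#!/bin/sh
# Count log lines per program in a syslog-style file, busiest first.
# $1: path to the log file.
top_loggers() {
    awk '{
        # Field 5 looks like "xinetd[20798]:" or "kernel:"; strip the
        # optional "[pid]" and the trailing colon to get the program name.
        sub(/(\[[0-9]+\])?:$/, "", $5)
        count[$5]++
    }
    END { for (p in count) print count[p], p }' "$1" | sort -rn | head -n 10
}
```

Running "top_loggers /var/log/messages" (the usual location on Red Hat, but that path is an assumption here) would show whether xinetd really dominates the file.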

Perhaps it was a cron job that finally finished, but I would have 
thought that rebooting would have stopped the cron job and it 
would not have started again until the next time that job came 
up, not after a reboot, right?  This lasted through several 
reboots.  Also, I had allowed it to run and thrash my hard drive 
for over a day, hoping that whatever was running would finish, 
but it did not.

Thanks for your help,
Rosina

--

Rosina Bignall
rbignall@xxxxxxxxxxxxx
