Re: RAID level and killing a job




On Tue, 2009-01-13 at 10:04 -0800, nate wrote:
> <snip>

> > Second question - A newly installed server consisting of CentOS 5.2,
> > straight
> > off the DVD, I invoke a command by hand, realize I want to kill it soon
> > after
> > (logged in as root).  I issue ps auwx|grep name_of_command, get the PID, and
> > issue kill -9 PID.  ps auwx|grep name_of_command is still running.
> >
> > The command is NOT part of any scheduled job.    Why won't the process die?
> 
> Is the process state "D" or "Z"? Either of these states frequently
> means an unkillable process. Sometimes "Z" (zombies) can be
> killed, but oftentimes they can't be killed directly. And if the
> process is in "D", then it is stuck waiting (most often for I/O) and
> you have to wait for it to complete, or reboot. Sometimes going to
> single-user mode and back again works as well, and sometimes killing
> other processes that the stuck one depends on can free it
> up so it can die.

It's been a long time, so please forgive any FUD here.

IIRC, zombies are processes that have ended but cannot be "cleaned up".
This can happen when a parent has died before the child ends, or when a
parent exists but is "sleeping" (for whatever reason: it may be waiting
on another event, waiting for I/O that never completes, ...).
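
To make the "ended but not cleaned up" state concrete, here is a minimal
Python sketch, assuming a Linux box with /proc mounted. The helper names
(proc_state, zombie_demo) are made up for illustration. A child exits at
once, the parent deliberately delays its wait() call, and the child shows
up in state "Z" until the parent collects its exit status.

```python
import os
import time


def proc_state(pid):
    """Return the one-letter state from /proc/<pid>/stat (Linux only)."""
    with open(f"/proc/{pid}/stat") as f:
        data = f.read()
    # The command name sits in parentheses and may itself contain spaces,
    # so take everything after the last ')'.
    return data[data.rindex(")") + 2:].split()[0]


def zombie_demo(timeout=5.0):
    """Fork a child that exits at once; report its state before reaping
    and whether it is still listed in /proc afterwards."""
    pid = os.fork()
    if pid == 0:
        os._exit(0)  # child: terminate immediately

    # Parent: do NOT wait yet, just watch the child turn into a zombie.
    deadline = time.monotonic() + timeout
    state = proc_state(pid)
    while state != "Z" and time.monotonic() < deadline:
        time.sleep(0.01)
        state = proc_state(pid)

    os.waitpid(pid, 0)  # parent collects the exit status: zombie is reaped
    still_listed = os.path.exists(f"/proc/{pid}")
    return state, still_listed


if __name__ == "__main__":
    print(zombie_demo())
```

A kill -9 aimed at the child during the "Z" window would change nothing,
since the process has already ended; only the parent's wait() removes it
from the process table.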

IIRC, when the parent has died, the PPID you'll see is "1". But the
zombie will still be un-killable because it cannot complete the
termination process (the parent it has to notify no longer exists). I
can't recall any way to eliminate these without a re-boot. I can't recall
if I ever tried a telinit to see if run level changes would kill it. I
suspect not.

If the parent exists and signals are not disabled or otherwise handled,
killing the parent may cause the zombie to flee. This is most
common, IIRC, when the parent is awaiting an event notification.

If a "clean" termination is desired, the parent must support some signal
processing, e.g. SIGHUP, SIGUSR1, ... (man 7 signal). If it does, then
things like removal of temporary files and telling the children to "STOP
THAT" can be done.

That's all I can recall without some actual work.

> 
> If the process is zombied you can try to find the parent process (if
> there is one) with ps -efx, and kill that. Sometimes that can cause
> the child to die as well, though it doesn't always work.
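
For what it's worth, the parent lookup can also be done without ps. This
is only a sketch, assuming Linux with /proc mounted; the function name
parent_pid is made up for the example. It reads the PPID field from
/proc/<pid>/stat, which is the process to look at (or signal) when the
child is a zombie.

```python
import os


def parent_pid(pid):
    """Return the parent PID of `pid` as recorded in /proc/<pid>/stat."""
    with open(f"/proc/{pid}/stat") as f:
        data = f.read()
    # Fields after the parenthesised command name: state, ppid, ...
    fields = data[data.rindex(")") + 2:].split()
    return int(fields[1])


if __name__ == "__main__":
    me = os.getpid()
    print(f"PID {me} has parent {parent_pid(me)}")
```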

> nate
> <snip sig stuff>

HTH
-- 
Bill
