On 08May2012 16:02, Patrick O'Callaghan <pocallaghan@xxxxxxxxx> wrote:
| On Tue, 2012-05-08 at 13:16 -0700, Joe Zeff wrote:
| > On 05/08/2012 12:39 PM, Patrick O'Callaghan wrote:
| > > On Tue, 2012-05-08 at 19:42 +0100, Andrew Gray wrote:
| > >> > Hi
| > >> >
| > >> > Either give use a way to kill a hung cp or rsync when the VPN goes down
| > >> > and they end up is state D uninterrupted sleep or stop apps being able
| > >> > to go into uninterrupted sleep !!
| > > It is *not possible* to kill a process in D state. D state can be
| > > defined as "the state which cannot be interrupted".
| >
| > I think it's fairly clear that Mr. O'Callaghan knows that.
|
| I think you mean Mr. Gray.
|
| > He's
| > complaining about the consequences of there being an uninterruptable
| > sleep. If I read him right, he's saying that it should always be
| > possible for the user to force a hung app to die when it's clear to the
| > user that something has happened that makes it impossible for the app to
| > continue, such as rsync completing when the remote server's known to
| > have crashed.

Frankly, I think a SIGKILL should have this semantic: cancel the program
_now_, and queue whatever is needed in the OS to clean up.

| > At this point, probably the best way to proceed is to
| > request that whoever maintains the programs in question modify them so
| > that they don't enter this state when accessing a remote file system or
| > that there's some way to get the app's attention and force it to abort.

No, very wrong. This state is out of the user's control (i.e. out of the
program's control).

| As I tried to explain, rewriting a couple of apps is not going to hack
| it. The apps don't *know* they're using a networked filesystem, they're
| just accessing files. They could find out and try to take measures, but
| then what about all the other apps that also write files? Rewrite tar,
| cpio, dd, cat, ...?
|
| The price of treating a networked fs as equivalent to a local one is
| that you get screwed when it doesn't behave like a local one. Dealing
| with this in a coherent and consistent way is hard. See the literature
| on distributed filesystems. The semantics of an NFS system are *not* the
| same as a local system. We brush this under the carpet most of the time
| because it usually works, but sometimes the differences bite.

It's not that hard to save userspace in the kernel. Make SIGKILL abort
the OS call and terminate the process. Have the kernel mark the I/O as
cancelled in whatever form is necessary for the subsystem in use.

This would:

  - allow process cleanup, which allows higher level things like shell
    scripts to quit when the things they call abort in a timely fashion

  - allow the kernel flexibility to cancel filesystem mounts more freely,
    because no processes are lying around claiming use of the FS

  - _then_ you can give umount some kind of "force" mode to tell the
    kernel that we no longer care about any outstanding I/Os on this
    network filesystem; the existing umount "-f" can be more effective

"D" state is all very well for a stalled process, but there should be
a way to say "enough" and abort the process and all its entanglements.

Cheers,
-- 
Cameron Simpson <cs@xxxxxxxxxx> DoD#743
http://www.cskk.ezoshosting.com/cs/

[...] every time you touch something, if your security systems rely on
biometric ID, then you're essentially leaving your pin number on a post-it
note. - Ben Goldacre, http://www.badscience.net//?p=585
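
For anyone following along who wants to see which processes are actually
stuck in "D" state, here is a minimal Python sketch. It assumes a Linux
/proc in which the state letter is the first field after the parenthesised
command name in /proc/<pid>/stat. The function names are invented for
illustration and do not come from any existing tool.

```python
import os


def parse_stat_line(line):
    """Return (pid, comm, state) from the text of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or
    # parentheses, so split on the first "(" and the last ")".
    open_paren = line.index("(")
    close_paren = line.rindex(")")
    pid = int(line[:open_paren])
    comm = line[open_paren + 1:close_paren]
    state = line[close_paren + 1:].split()[0]
    return pid, comm, state


def find_uninterruptible(proc_root="/proc"):
    """List (pid, comm) for every process currently in state D."""
    try:
        entries = os.listdir(proc_root)
    except OSError:
        # No /proc here (not Linux, or not mounted): nothing to report.
        return []
    stuck = []
    for entry in entries:
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                pid, comm, state = parse_stat_line(f.read())
        except (OSError, ValueError, IndexError):
            # The process exited while we were looking, or the line was odd.
            continue
        if state == "D":
            stuck.append((pid, comm))
    return stuck


if __name__ == "__main__":
    for pid, comm in find_uninterruptible():
        print(pid, comm)
```

Note that this only reports D-state processes. It cannot do anything about
them, which is exactly the complaint in this thread.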