I am sorry that this is not a proper reply email. For some reason I did
not get this email although I subscribe to linux-scsi; I found it by
accident in an archive.

On Wed, 24 Aug 2005 akpm@xxxxxxxx wrote:

> Hans-Joachim Baader <hjb@xxxxxxxxxxxx> wrote:
> >
> > Hi,
> >
> > > >
> > > > I do nightly backups on tape. Every 3 to 4 weeks a process is stuck in
> > > > D state while accessing the drive:
> > > >
> > > > 12398 ? D 0:00 /usr/sbin/amcheck -ms daily
> > > >
> > > > There are no messages in the log. Only a reboot can remove this process.
> > >
> > > Next time it happens, do
> > >
> > > dmesg -c
> > > echo t > /proc/sysrq-trigger
> > > dmesg -s 1000000 > foo
> >
> > thanks for looking into this. Since I haven't rebooted yet, I have the
> > output for you. Hope it helps.
> >
>
> OK, thanks.
>
> It would appear that st_do_scsi() is stuck in the wait_for_completion().
>
>
> Just looking at that function: I wonder if there's a problem with incoming
> arg `do_wait'. If it's false and some other thread is waiting on STp->wait
> then this thread will go and scribble on the completion structure. Maybe
> there's additional synchronisation which can prevent that?
>

This should never happen. The st driver is designed so that only one
SCSI command is active for one device at any time. The device can be
opened by only a single user at a time. More than one thread can access
the device if the fd is dup'ed.

The trace shows that the process is hanging in st_open(). The SCSI
commands there use 'do_wait' true, and no other thread should be able to
access this device through st. 'do_wait' is set to false only for the
so-called "asynchronous writes", when a write returns. One of the first
things read(), write(), and ioctl() do is check that a potential
previous write has finished.

Well, this is the theory. Bugs may exist.

This kind of problem, a process using a tape hanging in D state,
has occurred occasionally for many years.
When the user has waited long enough (the default timeouts are loooong),
the process has continued. When st has reported that a command timed
out, it has usually been a legitimate read or write: st sent the command
and waited for it to finish, but it never completed. These problems may
or may not be related to the current one. I have never been able to
reproduce the problem on my systems.

--
Kai