Re: How to cancel job

Elliott Balsley <elliott@xxxxxxxxxxxxxx> · Sat, 5 Oct 2019 12:51:32 -0700

Thanks, I had not noticed the D state.  I learn something every day.
I am indeed using an NFS hard mount, but when this issue happens there is
no problem with the mount itself.  Other apps can read and write just
fine.  So maybe this could be improved by making fio sleep
interruptibly?  Usually this happens when I accidentally start a job with a
size much higher than I intended.
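
For anyone following along, here is a rough sketch of how to spot
processes stuck in the D state without reading the full "ps aux"
output.  The helper name d_state_procs is made up for illustration, and
it assumes a procps-style ps where the STAT column starts with the
state letter.

```shell
# Hypothetical helper: print "PID COMMAND" for every process whose state
# starts with D (uninterruptible sleep).
# Expects "PID STAT COMMAND" lines on stdin.
d_state_procs() {
    awk '$2 ~ /^D/ { print $1, $3 }'
}

# Feed it the live process table. Separate -o options with empty headers
# are used so that ps prints no header line.
ps -e -o pid= -o stat= -o comm= | d_state_procs
```

If fio shows up in that list after Ctrl-C, the signal has been
delivered but the process cannot act on it until the kernel returns
from the blocked I/O call.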

On Sat, Oct 5, 2019 at 3:15 AM Sitsofe Wheeler <sitsofe@xxxxxxxxx> wrote:
>
> On Fri, 4 Oct 2019 at 22:34, Elliott Balsley <elliott@xxxxxxxxxxxxxx> wrote:
> >
> > > This sounds unexpected. Are there easy reproduction steps for this? Is
> > > I/O definitely being processed by the lower levels of the kernel?
> >
> > I'm not sure how to reproduce it reliably.  It seems to happen more
> > often with NFS, not local filesystems.  Here is an example where
> > Ctrl-C does nothing.  Then I ran "killall -9 fio" in another terminal
> > and it took about 20 seconds before the disk activity actually stopped
> > in iostat.
> >
> > $ fio --name=write --rw=write --bs=1M --size=100G --end_fsync=1 \
> >     --filename_format=/mnt/rivendell/fio.\$jobnum.\$filenum
> > write: (g=0): rw=write, bs=(R) 1024KiB-1024KiB, (W) 1024KiB-1024KiB,
> > (T) 1024KiB-1024KiB, ioengine=psync, iodepth=1
> > fio-3.1
> > Starting 1 process
> > write: Laying out IO file (1 file / 102400MiB)
> > fio: native_fallocate call failed: Operation not supported
> > ^Cbs: 1 (f=1): [W(1)][14.3%][r=0KiB/s,w=1327MiB/s][r=0,w=1327 IOPS][eta 00m:42s]
> > fio: terminating on signal 2
> > Killed1 (f=1): [F(1)][100.0%][r=0KiB/s,w=0KiB/s][r=0,w=0 IOPS][eta 00m:00s]
> >
> > $ ps aux | grep fio
> > root     243211 36.0  0.0 931316  3068 ?        Ds   14:25   0:08 fio
> > --name=write --rw=write --bs=1M --size=100G --end_fsync=1
> > --filename_format=/mnt/rivendell/fio.$jobnum.$filenum
> > root     243238  0.0  0.0 112708   988 pts/0    S+   14:26   0:00 grep
> > --color=auto fio
>
> That process state (D) means that it is in uninterruptible sleep. Doing
> I/O through NFS can result in the process being unkillable until it
> gets out of sending/receiving some batch of data to/from the NFS
> server. Some part of fio saw the Ctrl-C (hence the terminating
> message) but the part actually processing I/O presumably didn't
> respond - maybe your NFS share is mounted with the "hard" option and
> is retrying I/O indefinitely?
>
> --
> Sitsofe | http://sucs.org/~sits/