Re: [PATCH 2/5] scsi: improved eh timeout handler

Douglas Gilbert <dgilbert@xxxxxxxxxxxx> · Thu, 07 Nov 2013 13:33:31 -0500

On 13-11-07 01:45 AM, Hannes Reinecke wrote:
On 11/06/2013 06:23 PM, Mike Christie wrote:
On 11/05/2013 10:48 PM, Hannes Reinecke wrote:
On 11/05/2013 08:19 PM, Mike Christie wrote:
On 11/04/2013 11:05 PM, Hannes Reinecke wrote:
+
+	scmd->eh_eflags |= SCSI_EH_ABORT_SCHEDULED;
+	SCSI_LOG_ERROR_RECOVERY(3,
+		scmd_printk(KERN_INFO, scmd,
+			    "scmd %p abort scheduled\n", scmd));
+	schedule_delayed_work(&scmd->abort_work, HZ / 100);
+	return SUCCESS;
+}

Do we want to use our own workqueue_struct with WQ_MEM_RECLAIM set?

Errm. Yes, why?

I must admit I'm not _that_ familiar with workqueues ...
Care to explain?

We all share the above workqueue_struct's pool of threads, so if we get
stuck behind code doing GFP_KERNEL allocs that end up needing to write
data to the disk we are now trying to abort commands on, then we could get
stuck. With WQ_MEM_RECLAIM, we have our own backup thread that gets
created at workqueue_struct create time which can get used in cases like
that so we can always make forward progress.
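
To make the failure mode above concrete, here is a small userspace toy
model of it. This is a sketch under stated assumptions, not kernel code:
every model_* name is hypothetical, the "pool" is just two fields, and
work runs synchronously. It only illustrates why a rescuer reserved at
creation time lets the abort run when the shared pool has no idle worker
and cannot allocate a new one.

```c
/* Userspace toy model of workqueue forward progress; NOT kernel code.
 * All model_* names are hypothetical and exist only for this sketch. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for the kernel's WQ_MEM_RECLAIM flag (value is arbitrary here). */
#define MODEL_WQ_MEM_RECLAIM 0x1u

typedef void (*model_work_fn)(void *arg);

/* The worker pool that every workqueue in the model shares. */
struct model_pool {
	int idle_workers;   /* workers free to pick up new work */
	bool can_allocate;  /* false while allocations wait on the stuck disk */
};

/* One workqueue; the rescuer is reserved at creation time, before trouble. */
struct model_wq {
	bool has_rescuer;
};

struct model_wq model_alloc_wq(unsigned int flags)
{
	struct model_wq wq = {
		.has_rescuer = (flags & MODEL_WQ_MEM_RECLAIM) != 0,
	};
	return wq;
}

/* Try to run fn(arg). Returns false if nothing can execute it right now
 * (a real workqueue would leave the item queued rather than fail). */
bool model_queue_work(struct model_pool *pool, const struct model_wq *wq,
		      model_work_fn fn, void *arg)
{
	assert(pool != NULL && wq != NULL && fn != NULL);

	if (pool->idle_workers > 0) {   /* normal case: a shared worker is free */
		fn(arg);
		return true;
	}
	if (pool->can_allocate) {       /* pool can still grow a new worker */
		fn(arg);
		return true;
	}
	if (wq->has_rescuer) {          /* fall back to the pre-created thread */
		fn(arg);
		return true;
	}
	return false;                   /* stuck behind the blocked allocation */
}

/* Example work item: counts how many aborts actually ran. */
void model_abort_handler(void *arg)
{
	int *aborts_run = arg;
	(*aborts_run)++;
}
```

With idle_workers = 0 and can_allocate = false, a queue created with
flags 0 cannot run model_abort_handler(), while one created with
MODEL_WQ_MEM_RECLAIM can. In the kernel itself the analogous change
would presumably be a dedicated queue from alloc_workqueue() with
WQ_MEM_RECLAIM, fed by queue_delayed_work() instead of
schedule_delayed_work(); where that queue would live is not decided in
this thread.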

Ah. Right. Yes, that makes sense.

I guess I'll have to redo the patches _yet again_.

I wonder if it might be useful to flag a LU (disk)
with "try really hard to recover me, perhaps at the
expense of other LUs". Seems like a LU containing the
rootfs or swap might qualify for setting such a flag.
And LUs that have this flag cleared could be assumed
to not get wedged in the fashion that Mike pointed out.

Doug Gilbert
