On Fri, Sep 27, 2013 at 08:34:51AM -0500, scameron@xxxxxxxxxxxxxxxxxx wrote: > On Fri, Sep 27, 2013 at 03:22:19PM +0200, Tomas Henzl wrote: > > On 09/23/2013 08:34 PM, Stephen M. Cameron wrote: > > > From: Stephen M. Cameron <scameron@xxxxxxxxxxxxxxxxxx> > > > > > > SCSI mid layer doesn't seem to handle logical drives undergoing format > > > very well. scsi_add_device on such devices seems to result in hitting > > > those devices with a TUR at a rate of 3Hz for awhile, transitioning > > > to hitting them with a READ(10) at a much higher rate indefinitely, > > > and at boot time, this prevents the system from coming up. If we > > > do not expose such devices to the kernel, it isn't bothered by them. > > > > Is the result of this patch that the drive is no more visible for the user > > and he can't follow the formatting progress? > > Yes (subsequent patch monitors the progress and brings the drive > online when it's ready). > > > I think a better option is to fix the kernel to handle formatting devices better > > Yeah, you're probably right. (This is what comes of writing code for all > the distros then forward porting to kernel.org code. Grumble-grumble-management > grumble-grumble real-world problems.) [...] So I took a stab at modifying sd (below) which I am now thinking may not be the right approach. Empirically, it seems to work, but there are a couple of things not quite right. The gist of the patch is, when sd_spinup_device encounters a scsi disk which is NOT READY/FORMAT IN PROGRESS, it adds it to a list, and starts up a thread to monitor the list (if one isn't already running). The thread periodically polls the disks until they become ready, and revalidates them and removes them from the list when they become ready. When the list is empty the thread exits. There are a lot of complicated locking and get_device/put_device reference counting to make sure various rugs don't get ripped out from under the thread. One problematic area is on rmmod of sd. It's using kthread_stop() to stop the thread, but what if the thread's already gone? I didn't figure out how to get around that, and came to the conclusion that having the thread sometimes decide when to exit on it's own, and having it sometimes be told to exit on rmmod of sd probably can't work (although that is exactly what I would like to do.) So, hmm, not satisfactory. But then I started thinking that maybe I'm approaching it all wrong. Maybe this should be done (mostly) from userspace. Maybe some small change to sd just to keep it from waiting around too long for disks that are in NOT READY/FORMAT IN PROGRESS state, and some change to udevd(?) or udev rules... or something udev-ey? so that it can notice when a device that is in this state has been added to the system and launch a userspace daemon to poll the device with TURs via SG_IO and when it becomes ready, trigger the revalidation from userspace. That would probably be better than what I've done below. But, I'm not really sure where to start to get something like that rolling, presuming doing something like that would be a better approach. -- steve commit 6b4265426c8aff03e4c28372409a3f1c2c760c28 Author: Stephen M. Cameron <scameron@xxxxxxxxxxxxxxxxxx> Date: Wed Oct 9 19:03:46 2013 -0500 scsi: improve probing of disks that are temporarily offline The current handling of disks which are NOT READY/FORMAT IN PROGRESS is not ideal. Instead of waiting for a period of time for the disk to become ready and then giving up in sd_spinup_disk(), we can create a thread to wait for a list of disks which are expected to eventually become ready which periodically polls such disks and revalidates them when they become ready. When the list of not-ready disks becomes empty, the thread will exit. This allows other things to proceed, for example, allowing the system to boot without a the usual delay that is mostly futile since the odds are good that a format in progress won't finish during the time that sd_spinup_disk would have waited. diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c index e62d17d..3158780 100644 --- a/drivers/scsi/sd.c +++ b/drivers/scsi/sd.c @@ -51,6 +51,7 @@ #include <linux/async.h> #include <linux/slab.h> #include <linux/pm_runtime.h> +#include <linux/kthread.h> #include <asm/uaccess.h> #include <asm/unaligned.h> @@ -126,6 +127,19 @@ static DEFINE_MUTEX(sd_ref_mutex); static struct kmem_cache *sd_cdb_cache; static mempool_t *sd_cdb_pool; +static DEFINE_SPINLOCK(sd_revalidation_lock); +struct sd_deferred_disk_list_entry { + struct scsi_disk *sdkp; + struct list_head deferred_list; +}; +static struct list_head sd_deferred_disk_list = + LIST_HEAD_INIT(sd_deferred_disk_list); +#define REVALIDATION_THREAD_STOPPED 0 +#define REVALIDATION_THREAD_RUNNING 1 +#define REVALIDATION_THREAD_STOPPING 2 +static int sd_revalidation_thread_state; +static struct task_struct *sd_revalidation_thread; + static const char *sd_cache_types[] = { "write through", "none", "write back", "write back, no read (daft)" @@ -1733,8 +1747,11 @@ static int sd_done(struct scsi_cmnd *SCpnt) /* * spinup disk - called only in sd_revalidate_disk() + * Returns 1 if disk is not ready and revalidation + * should be deferred due to e.g., format in progress, + * 0 otherwise. */ -static void +static int sd_spinup_disk(struct scsi_disk *sdkp) { unsigned char cmd[10]; @@ -1743,6 +1760,7 @@ sd_spinup_disk(struct scsi_disk *sdkp) unsigned int the_result; struct scsi_sense_hdr sshdr; int sense_valid = 0; + int defer_revalidation = 0; spintime = 0; @@ -1766,7 +1784,7 @@ sd_spinup_disk(struct scsi_disk *sdkp) * with any more polling. */ if (media_not_present(sdkp, &sshdr)) - return; + return 0; if (the_result) sense_valid = scsi_sense_valid(&sshdr); @@ -1799,6 +1817,13 @@ sd_spinup_disk(struct scsi_disk *sdkp) break; /* standby */ if (sshdr.asc == 4 && sshdr.ascq == 0xc) break; /* unavailable */ + if (sshdr.asc == 4 && sshdr.ascq == 4) { + /* Format in progress. This may take a long + * time. Defer revalidation until later. + */ + defer_revalidation = 1; + break; + } /* * Issue command to spin up drive when not ready */ @@ -1853,6 +1878,7 @@ sd_spinup_disk(struct scsi_disk *sdkp) else printk("not responding...\n"); } + return defer_revalidation; } @@ -2668,6 +2694,144 @@ static int sd_try_extended_inquiry(struct scsi_device *sdp) return 0; } +static int sd_device_temporarily_offline(struct scsi_disk *sdkp) +{ + unsigned char cmd[10]; + unsigned int the_result; + struct scsi_sense_hdr sshdr; + + cmd[0] = TEST_UNIT_READY; + memset((void *) &cmd[1], 0, 9); + + the_result = scsi_execute_req(sdkp->device, cmd, + DMA_NONE, NULL, 0, + &sshdr, SD_TIMEOUT, + SD_MAX_RETRIES, NULL); + if (media_not_present(sdkp, &sshdr)) + return 0; + return (the_result && scsi_sense_valid(&sshdr) && + sshdr.sense_key == NOT_READY && + sshdr.asc == 4 && sshdr.ascq == 4); +} + +#define SD_OFFLINE_DEVICE_POLL_TIME (100 * HZ) +static int sd_deferred_revalidation_thread(void *v) +{ + struct sd_deferred_disk_list_entry *d; + unsigned long flags; + struct list_head revalidate_list, *this, *tmp; + int temp_offline; + + INIT_LIST_HEAD(&revalidate_list); + spin_lock_irqsave(&sd_revalidation_lock, flags); + while (!list_empty(&sd_deferred_disk_list) && !kthread_should_stop()) { + /* Check if any of the offline devices have become ready */ + list_for_each_safe(this, tmp, &sd_deferred_disk_list) { + d = list_entry(this, struct sd_deferred_disk_list_entry, + deferred_list); + get_device(&d->sdkp->dev); + spin_unlock_irqrestore(&sd_revalidation_lock, flags); + temp_offline = sd_device_temporarily_offline(d->sdkp); + spin_lock_irqsave(&sd_revalidation_lock, flags); + if (!temp_offline) { + list_del(this); + list_add(&d->deferred_list, &revalidate_list); + } else { + put_device(&d->sdkp->dev); + } + } + spin_unlock_irqrestore(&sd_revalidation_lock, flags); + list_for_each_safe(this, tmp, &revalidate_list) { + d = list_entry(this, struct sd_deferred_disk_list_entry, + deferred_list); + list_del(this); + revalidate_disk(d->sdkp->disk); + put_device(&d->sdkp->dev); + kfree(d); + } + schedule_timeout_uninterruptible(SD_OFFLINE_DEVICE_POLL_TIME); + spin_lock_irqsave(&sd_revalidation_lock, flags); + } + sd_revalidation_thread_state = REVALIDATION_THREAD_STOPPED; + spin_unlock_irqrestore(&sd_revalidation_lock, flags); + return 0; +} + +static void sd_stop_deferred_revalidation_thread(void) +{ + unsigned long flags; + int stop_thread; + + spin_lock_irqsave(&sd_revalidation_lock, flags); + stop_thread = (sd_revalidation_thread_state == + REVALIDATION_THREAD_RUNNING); + if (stop_thread) + /* STOPPING state prevents new thread from starting. */ + sd_revalidation_thread_state = REVALIDATION_THREAD_STOPPING; + spin_unlock_irqrestore(&sd_revalidation_lock, flags); + if (stop_thread) + kthread_stop(sd_revalidation_thread); +} + +static void sd_remove_deferred_revalidation(struct scsi_disk *sdkp) +{ + struct list_head *this, *tmp; + struct sd_deferred_disk_list_entry *d = NULL; + unsigned long flags; + + spin_lock_irqsave(&sd_revalidation_lock, flags); + list_for_each_safe(this, tmp, &sd_deferred_disk_list) { + d = list_entry(this, struct sd_deferred_disk_list_entry, + deferred_list); + if (d->sdkp == sdkp) { + list_del(this); + break; + } + } + spin_unlock_irqrestore(&sd_revalidation_lock, flags); + kfree(d); +} + +static void sd_schedule_revalidation(struct scsi_disk *sdkp) +{ + struct sd_deferred_disk_list_entry *d, *entry; + unsigned long flags; + + d = kzalloc(sizeof(*d), GFP_KERNEL); + if (!d) { + sd_printk(KERN_WARNING, sdkp, + "sd_schedule_revalidation: Memory allocation failure.\n"); + return; + } + d->sdkp = sdkp; + spin_lock_irqsave(&sd_revalidation_lock, flags); + + /* Only add the device if it is not already in the list */ + list_for_each_entry(entry, &sd_deferred_disk_list, deferred_list) { + if (entry->sdkp == sdkp) { + spin_unlock_irqrestore(&sd_revalidation_lock, flags); + kfree(d); + return; + } + } + + list_add(&d->deferred_list, &sd_deferred_disk_list); + if (sd_revalidation_thread_state == REVALIDATION_THREAD_STOPPED) { + sd_revalidation_thread_state = REVALIDATION_THREAD_RUNNING; + spin_unlock_irqrestore(&sd_revalidation_lock, flags); + sd_revalidation_thread = + kthread_run(sd_deferred_revalidation_thread, NULL, + "sd-deferred-revalidation"); + spin_lock_irqsave(&sd_revalidation_lock, flags); + } + if (!sd_revalidation_thread) { + sd_revalidation_thread_state = REVALIDATION_THREAD_STOPPED; + sd_printk(KERN_WARNING, sdkp, + "sd_schedule_revalidation: Failed to start deferred revalidation thread\n"); + } + spin_unlock_irqrestore(&sd_revalidation_lock, flags); +} + /** * sd_revalidate_disk - called the first time a new disk is seen, * performs disk spin up, read_capacity, etc. @@ -2679,6 +2843,7 @@ static int sd_revalidate_disk(struct gendisk *disk) struct scsi_device *sdp = sdkp->device; unsigned char *buffer; unsigned flush = 0; + int defer_revalidation; SCSI_LOG_HLQUEUE(3, sd_printk(KERN_INFO, sdkp, "sd_revalidate_disk\n")); @@ -2697,13 +2862,13 @@ static int sd_revalidate_disk(struct gendisk *disk) goto out; } - sd_spinup_disk(sdkp); + defer_revalidation = sd_spinup_disk(sdkp); /* * Without media there is no reason to ask; moreover, some devices * react badly if we do. */ - if (sdkp->media_present) { + if (sdkp->media_present && !defer_revalidation) { sd_read_capacity(sdkp, buffer); if (sd_try_extended_inquiry(sdp)) { @@ -2718,7 +2883,8 @@ static int sd_revalidate_disk(struct gendisk *disk) sd_read_write_same(sdkp, buffer); } - sdkp->first_scan = 0; + if (!defer_revalidation) + sdkp->first_scan = 0; /* * We now have all cache related info, determine how we deal @@ -2736,6 +2902,9 @@ static int sd_revalidate_disk(struct gendisk *disk) sd_config_write_same(sdkp); kfree(buffer); + if (defer_revalidation) + sd_schedule_revalidation(sdkp); + out: return 0; } @@ -2990,6 +3159,7 @@ static int sd_remove(struct device *dev) sdkp = dev_get_drvdata(dev); devt = disk_devt(sdkp->disk); + sd_remove_deferred_revalidation(sdkp); scsi_autopm_get_device(sdkp->device); async_synchronize_full_domain(&scsi_sd_probe_domain); @@ -3204,6 +3374,7 @@ static void __exit exit_sd(void) SCSI_LOG_HLQUEUE(3, printk("exit_sd: exiting sd driver\n")); + sd_stop_deferred_revalidation_thread(); scsi_unregister_driver(&sd_template.gendrv); mempool_destroy(sd_cdb_pool); kmem_cache_destroy(sd_cdb_cache); -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html