Re: [PATCH 2/4] libata: Implement disk shock protection support

Elias Oltmanns <eo@xxxxxxxxxxxxxx> · Wed, 10 Sep 2008 21:28:54 +0200

Tejun Heo <htejun@xxxxxxxxx> wrote:
> Hello, Elias.
>
> Elias Oltmanns wrote:
[...]
>> +static unsigned long ata_eh_park_devs(struct ata_port *ap)
>> +{
>> +	struct ata_link *link;
>> +	struct ata_device *dev;
>> +	struct ata_taskfile tf;
>> +	unsigned int err_mask;
>> +	unsigned long deadline = jiffies;
>> +
>> +	ata_port_for_each_link(link, ap) {
>> +		ata_link_for_each_dev(dev, link) {
>> +			struct ata_eh_context *ehc = &link->eh_context;
>> +			struct ata_eh_info *ehi = &link->eh_info;
>> +
>> +			if (dev->class != ATA_DEV_ATA ||
>> +			    dev->flags & ATA_DFLAG_NO_UNLOAD)
>> +				continue;
>> +
>> +			if (ehc->i.dev_action[dev->devno] & ATA_EH_PARK ||
>> +			    ehi->dev_action[dev->devno] & ATA_EH_PARK) {
>> +				unsigned long tmp = dev->unpark_deadline;
>
> The correct way to do this is ata_eh_about_to_do().  After that, you
> can just look at ehc->i.dev_action[].  Also, you'll need to call
> ata_eh_done() later.

We have a problem here, I'm afraid, because we may keep looping in EH
context and still want to pick up ATA_EH_PARK requests. Imagine that
ATA_EH_PARK has been scheduled for device A and the EH thread has
reached the call to schedule_timeout_uninterruptible(). Now, ATA_EH_PARK
is scheduled for device B on the same port. This will wake up the EH
thread, but ATA_EH_PARK is only recorded in link->eh_info, not in
link->eh_context.i. ata_eh_about_to_do() will unconditionally clear the
flag in eh_info, but checking ehc->i.dev_action afterwards will only
tell us whether this flag was set when we entered EH, not whether it has
been set since.

Should I change ata_eh_about_to_do() so that it will record the action
in link->eh_context before clearing it in link->eh_info?
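
To make the question concrete, here is a minimal user-space sketch of
the two behaviours. It is only a model: the struct names, the
ATA_EH_PARK bit value and both helper functions are made up for
illustration, the real ata_eh_about_to_do() takes the port lock and does
more than this, and nothing here is meant as the actual patch.

```c
#define MODEL_MAX_DEVICES 2
#define ATA_EH_PARK (1u << 5)	/* bit value assumed for illustration only */

/* Simplified stand-in for struct ata_eh_info; not the real layout. */
struct eh_info_model {
	unsigned int dev_action[MODEL_MAX_DEVICES];
};

/* Simplified stand-in for struct ata_link. */
struct link_model {
	struct eh_info_model eh_info;		/* written when EH is scheduled */
	struct eh_info_model eh_context_i;	/* models link->eh_context.i */
};

/*
 * Behaviour as described above: the action is cleared in eh_info and
 * eh_context is left alone, so a request that arrived after EH was
 * entered is lost.
 */
static void about_to_do_current(struct link_model *link, int devno,
				unsigned int action)
{
	link->eh_info.dev_action[devno] &= ~action;
}

/*
 * Possible change: copy any pending bits of the action into the
 * eh_context copy first, then clear them in eh_info.
 */
static void about_to_do_proposed(struct link_model *link, int devno,
				 unsigned int action)
{
	link->eh_context_i.dev_action[devno] |=
		link->eh_info.dev_action[devno] & action;
	link->eh_info.dev_action[devno] &= ~action;
}
```

With the second variant, a park request for device B that arrives while
EH is sleeping for device A would still be visible in the eh_context
copy after the call, which is the only place the caller looks.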

>
>> +				if (time_before(deadline, tmp))
>> +					deadline = tmp;
>> +				else if (time_before_eq(tmp, jiffies))
>> +					continue;
>> +			}
>> +
>> +			if (ehc->did_unload_mask & (1 << dev->devno))
>> +				continue;
>> +
>> +			ata_tf_init(dev, &tf);
>> +			tf.command = ATA_CMD_IDLEIMMEDIATE;
>> +			tf.feature = 0x44;
>> +			tf.lbal = 0x4c;
>> +			tf.lbam = 0x4e;
>> +			tf.lbah = 0x55;
>> +			tf.flags |= ATA_TFLAG_DEVICE | ATA_TFLAG_ISADDR;
>> +			tf.protocol |= ATA_PROT_NODATA;
>> +			err_mask = ata_exec_internal(dev, &tf, NULL, DMA_NONE,
>> +						     NULL, 0, 0);
>> +			if (err_mask || tf.lbal != 0xc4)
>> +				ata_dev_printk(dev, KERN_ERR,
>> +					       "head unload failed!\n");
>> +			else
>> +				ehc->did_unload_mask |= 1 << dev->devno;
> ...
>> +static void ata_eh_unpark_devs(struct ata_port *ap)
>> +{
>> +	struct ata_link *link;
>> +	struct ata_device *dev;
>> +	struct ata_taskfile tf;
>> +
>> +	ata_port_for_each_link(link, ap) {
>> +		ata_link_for_each_dev(dev, link) {
>> +			struct ata_eh_context *ehc = &link->eh_context;
>> +
>> +			if (!(ehc->did_unload_mask & (1 << dev->devno)))
>> +				continue;
>> +
>> +			ata_tf_init(dev, &tf);
>> +			tf.command = ATA_CMD_CHK_POWER;
>> +			tf.flags |= ATA_TFLAG_DEVICE | ATA_TFLAG_ISADDR;
>> +			tf.protocol |= ATA_PROT_NODATA;
>> +			ata_exec_internal(dev, &tf, NULL, DMA_NONE, NULL, 0, 0);
>
> And it's probably better to have ehc->unloaded_mask instead of
> ehc->did_unload_mask and clear it here so that if unload is scheduled
> after this point but before EH completes, it does unloading again.
> ie. Something like the following.
>
> 	ata_eh_done(ATA_EH_UNLOAD);
> 	ehc->i.unloaded_mask &= ~(1 << dev->devno);

No need for that because link->eh_context is cleared in
ata_scsi_error().

>
>> @@ -2830,6 +2904,19 @@ int ata_eh_recover(struct ata_port *ap, ata_prereset_fn_t prereset,
>>  		}
>>  	}
>>  
>> +	do {
>> +		unsigned long now;
>> +
>> +		deadline = ata_eh_park_devs(ap);
>> +		now = jiffies;
>> +		if (time_before_eq(deadline, now))
>> +			break;
>> +		prepare_to_wait(&ata_scsi_park_wq, &wait, TASK_UNINTERRUPTIBLE);
>> +		deadline = schedule_timeout_uninterruptible(deadline - now);
>> +	} while (deadline);
>> +	finish_wait(&ata_scsi_park_wq, &wait);
>> +	ata_eh_unpark_devs(ap);
>
> I think it would be better to put timeout computation and handling out
> here instead of inside ata_eh_park_devs().  ata_eh_park_devs() just
> parks the heads if ATA_DEV_UNLOAD and the outer loop controls when it
> can continue.

Right.

>
>> +static ssize_t ata_scsi_park_store(struct device *device,
>> +				   struct device_attribute *attr,
>> +				   const char *buf, size_t len)
>> +{
> ...
>
>> +		switch (input) {
>> +		case -1:
>> +			dev->flags &= ~ATA_DFLAG_NO_UNLOAD;
>> +			break;
>> +		case -2:
>> +			dev->flags |= ATA_DFLAG_NO_UNLOAD;
>> +			break;
>
> Can't we just drop ATA_DFLAG_NO_UNLOAD?  It doesn't provide any real
> functionality anymore.

I was afraid you'd say something like that in the end ;-). Well, we
can't. We really should only issue the unload command if we know that
it's safe, i.e., the device supports that feature. We assume it to be
safe if ata_id_has_unload() returns true or if the user told us that the
device does support the command. ATA_DFLAG_NO_UNLOAD is initialised
during device setup by ata_id_has_unload(). For pre-ATA-7 devices (like
mine), the user can manually clear that flag afterwards.
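
The rule described above could be modelled roughly as follows. This is a
user-space sketch, not kernel code: struct dev_model, the flag's bit
value and the three helpers are hypothetical, and the id_has_unload
parameter merely stands in for the result of ata_id_has_unload(). Only
the -1 / -2 inputs from the quoted hunk are modelled.

```c
#define ATA_DFLAG_NO_UNLOAD (1ul << 0)	/* bit value assumed for illustration */

/* Simplified stand-in for struct ata_device. */
struct dev_model {
	unsigned long flags;
};

/* Device setup: the flag starts out set unless IDENTIFY data reports unload. */
static void dev_setup_model(struct dev_model *dev, int id_has_unload)
{
	if (id_has_unload)
		dev->flags &= ~ATA_DFLAG_NO_UNLOAD;
	else
		dev->flags |= ATA_DFLAG_NO_UNLOAD;
}

/*
 * User override through the sysfs attribute.
 * Returns 0 if the input changed the flag, 1 for any other value
 * (those are handled elsewhere in the patch and not modelled here).
 */
static int park_store_flag_model(struct dev_model *dev, long input)
{
	switch (input) {
	case -1:	/* user asserts that the device supports unload */
		dev->flags &= ~ATA_DFLAG_NO_UNLOAD;
		return 0;
	case -2:	/* user forbids issuing the unload command */
		dev->flags |= ATA_DFLAG_NO_UNLOAD;
		return 0;
	default:
		return 1;
	}
}

/* The unload command is only issued when the flag is clear. */
static int may_unload_model(const struct dev_model *dev)
{
	return !(dev->flags & ATA_DFLAG_NO_UNLOAD);
}
```

For a pre-ATA-7 device, setup leaves the flag set, so no unload command
is sent until the user writes -1 to the attribute.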

Regards,

Elias