Tejun Heo <htejun@xxxxxxxxx> wrote:
> Hello,
>
> Elias Oltmanns wrote:
[...]
>>> Also, the suspend operation unloads the head and spins down the
>>> drive, which sounds like a good thing to do before crashing. Maybe
>>> we can modify the suspend sequence such that it always unloads the
>>> head first and then issues the spindown. That will ensure the head
>>> is in a safe position as soon as possible. If it's done this way,
>>> it'll probably be a good idea to split the unloading and loading
>>> operations and do the loading only when EH is finishing and the
>>> disk is not spun down.
>>
>> Well, the scsi midlayer will also issue a flush cache command.
>> Besides, with previous implementations I have observed occasional
>> lock ups when suspending while the unload timer was running. Once we
>> have settled the timer vs deadline issue, I'm willing to do some
>> more investigation in this area if you really insist that
>> pm_notifiers should be avoided. But then I am still not too sure
>> about your reasoning and do feel happier with these notifiers
>> anyway.
>
> I'm not particularly against pm_notifiers. I just can't see what
> advantages they have given the added complexity. The only race
> window they close is the one between suspend start and userland task
> freeze, which is a relatively short one, and there are other, much
> larger gaping holes there, so I don't really see much benefit in
> this particular pm_notifiers usage. Maybe we need to keep the task
> running till the power is pulled? If we can do that in a sane
> manner, which is no easy feat I agree, that can also solve the
> problem with hibernation.

I doubt that is feasible without substantial work, especially as long
as part of it all is implemented in user space. With regard to the
pm_notifiers, I'll try to figure out exactly what the problem was
without them (I thought it was related to timers not firing anymore
after process freezing, but Pavel doubts that and I'm not too sure
anymore).

[...]
>>>> In particular, I don't want to reschedule EH in response to the
>>>> second write to the unload_heads file. Also, we have to consider
>>>> the case where the daemon signals to resume I/O prematurely by
>>>> writing a timeout of 0. In this case, the EH thread should be
>>>> woken up immediately.
>>>
>>> Whether EH is scheduled multiple times or not doesn't matter at
>>> all. EH can be happily scheduled without any actual action to do,
>>> and that does happen from time to time due to the asynchronous
>>> nature of events. libata EH doesn't have any problem with that.
>>> The only thing that's required is that there is at least one
>>> ata_schedule_eh() after the latest EH-worthy event. So, the
>>> simpler code might enter EH one more time once in a blue moon, but
>>> it won't do any harm. EH will just look around, realize that
>>> there's nothing much to do and exit.
>>
>> The whole EH machinery is a very complex beast.
>
> The logic is quite complex due to the wonderful ATA, but for users
> requesting actions, it really is quite simple. Well, at least I
> think so. (But I would say that, wouldn't I? :-)

Yes, you would ;-). Seriously though, I do agree that it is easy for
users to get the right message across to EH but ...

>> Any user of the emergency head park facility has a particular
>> interest that the system spends as little time as possible in the
>> EH code, even if it's real error recovery that matters most.
>> Perhaps we could agree on the following compromise:
>>
>> 	spin_lock_irq(ap->lock);
>> 	old_deadline = ap->deadline;
>> 	ap->deadline = jiffies + timeout;
>> 	if (old_deadline < jiffies) {
>> 		ap->link.eh_info.action |= ATA_EH_PARK;
>> 		ata_port_schedule_eh(ap);
>> 	}
>> 	spin_unlock_irq(ap->lock);
>
> Really, it doesn't matter at all. That's just an over-optimization.
> The whole EH machinery pretty much expects spurious EH events and
> schedules, and deals with them quite well. No need to add extra
> protection.

...
... because of its complexity, it is hard for me to estimate timing
impacts and constraints. The questions I'm concerned about are: what
is the average and worst case time it takes to get the heads parked,
and what can I do, if not to improve either of them, then at least not
to make things worse? In particular, I don't really have a clue how
much time it takes to go through EH if no action is requested, in
comparison to, say, the average read / write command. Obviously, I
don't want to schedule EH unnecessarily if that would mean that I
won't be able to issue another head unload for considerably longer
than during normal I/O or, indeed, on an idle system. Arguably, I
don't even want to do anything that causes more logging than
absolutely necessary, because this will ultimately result in the disk
spinning up from standby. But then, I believe I only came across this
logging issue when I was still playing around with eh_revalidate and
the like. So, can you set my mind at rest that timing is no issue with
spurious EH sequences? Now that I come to think of it, I suppose it
would harm performance anyway, so everybody would care about such a
delay, right?

>> There is still a race but it is very unlikely to trigger.
>>
>> Still, you have dismissed my point about the equivalent of stopping
>> a running timer by specifying a 0 timeout. In fact, whenever the
>> new deadline is *before* the old deadline, we have to signal the
>> sleeping EH thread to wake up in time. This way we end up with
>> something like wait_event().
>
> Yes, right, reducing the timeout. How about doing the following?
>
> 	wait_event(ata_scsi_park_wq,
> 		   time_before(jiffies, ap->unload_deadline));
>
> Heh... then again, it's not much different from your original code.
> I won't object strongly to the original code but I still prefer just
> setting the deadline and kicking EH from the user side instead of
> directly manipulating the timer.
> That way, the implementation is more separate from the interface,
> and to me it seems easier to follow the code.

That's fine with me. All I want is that my code doesn't end up leaving
the system in an unresponsive state (to a head unload request, that
is) more often than before by spuriously scheduling EH. If that is
not a problem, I'm content.

Regards,

Elias
--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html