Re: [PATCH 13/14] ahci: convert to new EH

Jeff Garzik wrote:
Tejun Heo wrote:
On Thu, Apr 20, 2006 at 02:01:12PM +0800, zhao, forrest wrote:
Hi, Tejun

When testing hotplug and reading your patches, I noticed that an
interrupt might be lost on AHCI in the following case:

1 the system boots with SATA disk A attached to port 1 and disk B
attached to port 2
2 disk B at port 2 is hot-unplugged
3 ata_eh_revive() executes several rounds of soft-reset/hard-reset, as
we observed in dmesg
4 now imagine ata_eh_revive() starts the last round of hard-reset, so
the code path enters ata_do_reset() and then ahci_hardreset()
5 disk B is hot-plugged to port 2, and an interrupt is triggered
6 the CPU responds to this interrupt while the code path is between
ahci_start_engine(); in ahci_hardreset() and
ap->flags &= ~ATA_FLAG_FROZEN; in ata_do_reset();
7 this interrupt is then lost, since no EH is scheduled to handle it.

I think invoking ata_eh_schedule_port() in ahci_postreset() would fix
the problem. Is that right?
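
To make the window concrete, here is a minimal sketch of the scenario above. It is a toy model, not libata code: struct toy_port and every toy_* function are hypothetical names invented for illustration, and the model assumes that an event arriving while the port is frozen is latched by the hardware but never handed to EH.

```c
#include <stdbool.h>

/* Hypothetical, simplified model of a port; not the libata structures. */
struct toy_port {
    bool frozen;        /* set while EH is resetting the port */
    bool eh_scheduled;  /* another EH pass has been requested */
    bool hotplug_seen;  /* hardware latched a plug event */
};

/* Interrupt handler: while frozen, the event is not acted upon. */
static void toy_irq(struct toy_port *p)
{
    p->hotplug_seen = true;         /* hardware latches the event */
    if (!p->frozen)
        p->eh_scheduled = true;     /* normal path: hand over to EH */
    /* frozen: nobody schedules EH, which is the window in question */
}

/* End of reset. With reschedule set, a latched event triggers another pass. */
static void toy_postreset(struct toy_port *p, bool reschedule)
{
    if (reschedule && p->hotplug_seen)
        p->eh_scheduled = true;
    p->hotplug_seen = false;
    p->frozen = false;              /* port is thawed again */
}

/* Returns true if an event that arrived mid-reset ends up being handled. */
static bool toy_event_survives(bool reschedule)
{
    struct toy_port p = { .frozen = true };

    toy_irq(&p);                    /* step 5/6: plug event during reset */
    toy_postreset(&p, reschedule);
    return p.eh_scheduled;
}
```

In this model toy_event_survives(false) corresponds to step 7 above, and toy_event_survives(true) corresponds to rescheduling EH from the postreset step.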

Hello, Forrest.

Yes, you're right.  The problem is that we cannot tell whether such
interrupts are due to the reset or some other events.  The goal was to
make sure existing devices are okay on EH completion.  If new devices
get connected during EH, we might lose the event, which IMHO is okay.

Maybe this can be solved by merging EH and probe into one.  Probing
and EH revive are pretty similar in the first place.  I'll think about

Speaking to hotplug specifically, on hardware with plug irqs, it needs to do something like

    * receive hotplug interrupt
    * hang out for a while, eating hotplug interrupt events
      (debounce)
    * revalidate device
    * issue unplug and/or plug to SCSI layer
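
The debounce step in the list above could be sketched roughly as follows. This is only an illustration: toy_debounce is a hypothetical helper, the samples array stands in for repeated reads of the link status, and a real driver would sleep between reads and bound the total wait.

```c
#include <stddef.h>

/*
 * Scan successive link-state samples (0 = no device, 1 = device present)
 * and return the first state that repeats for `need` consecutive samples.
 * Returns -1 if the link never settles within the n samples given.
 */
static int toy_debounce(const int *samples, size_t n, size_t need)
{
    size_t run = 0;   /* length of the current run of identical samples */
    int last = -1;    /* state of the current run; -1 before any sample */

    for (size_t i = 0; i < n; i++) {
        if (samples[i] == last) {
            run++;
        } else {
            last = samples[i];
            run = 1;
        }
        if (run >= need)
            return last;
    }
    return -1;
}
```

Only once a settled state comes back would the device be revalidated and the plug or unplug reported upward.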

I see.  I'll pay more attention to the debouncing.

that.  But I still think it's okay to lose hotplug interrupt during
EH.  All the user has to do is simply replug the device or issue
manual scan.

If losing the hotplug interrupt requires the user to do that, no, that's definitely not OK... for a hotplug interrupt during EH, you want to stop what you're doing at the nearest opportunity, and start all over again revalidating the device. If it's a different device, all the accumulated state is flushed.


Hmmm... Well, I initially thought that's a tradeoff libata can take. It's quite a small window. Such events are lost iff the user plugs a new device in between autopsy completion and reset completion, i.e. while EH is actively spitting out messages.

I've been thinking about this since yesterday (except for the time I spent playing the HOMM5 demo), and it seems that completely reliable device detection can be achieved relatively easily by combining EH revive and probing. And with an SError.X bit check at the end, PM should be able to do reliable detection, too.
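
A rough sketch of what "check SError.X at the end and start over" might look like. The bit position comes from the SATA/AHCI SError register layout (DIAG.X, "Exchanged", is bit 26); the toy_* names and the array of per-pass SError readings are hypothetical stand-ins, not the libata implementation.

```c
#include <stdbool.h>
#include <stdint.h>

/* SError DIAG.X (Exchanged): device presence changed since last cleared. */
#define TOY_SERR_DEV_XCHG (1u << 26)

/* True if the SError value says the device was exchanged. */
static bool toy_dev_exchanged(uint32_t serror)
{
    return (serror & TOY_SERR_DEV_XCHG) != 0;
}

/*
 * serr[i] is the SError value read at the end of reset/probe pass i.
 * Each pass that ends with X set is thrown away and redone, since the
 * accumulated state may belong to a different device. Returns the index
 * of the first pass that ended cleanly, or -1 if none of the n passes did.
 */
static int toy_first_clean_pass(const uint32_t *serr, int n)
{
    for (int i = 0; i < n; i++) {
        if (!toy_dev_exchanged(serr[i]))
            return i;   /* nothing changed during this pass: keep result */
        /* X set: flush state and go around again */
    }
    return -1;
}
```

The retry bound n is an assumption of the sketch; how many passes to allow before giving up is a policy decision not settled in this thread.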

PM requires more changes than I initially thought, and merging probing and EH reviving will take some time too. And, of course, the HOMM5 demo is out. So, I don't think I can make it this week. But on the bright side, the SCSI part of EH seems to be agreed on and, although EH and hotplug are a little bit flaky, libata generic PM support really works on my working tree!

Thanks.

--
tejun