Re: hot plug on ICH9 with AHCI on

Владимир Дашевский <vladimir.dashevsky@xxxxxxxxx> · Sun, 22 Mar 2009 21:26:21 +0300

Tejun!
Владимир Дашевский wrote:

Not all EMIs are one-shot events.  Some can span seconds.  Links don't
always come up right after failures.  Sometimes they require more than
one hardreset to get back to working order.  Link status report is
not reliable.  Sometimes they report offline for a while after certain
events.  If you know how to work around the above problems under a
second, I'm all ears but I doubt it unless it involves an additional
mechanical switch.

Well, for example, USB devices have a pull-up resistor on their D+ line.
DC bias can be used for detection of device presence without mechanical
switch.

SATA is not USB and onlineness detection isn't that simple.  Also,
have you tried to run a system on a USB device over flaky connection?

Well, I cannot argue with you here. All that I wanted to say is that I 
would prefer more optimistic software behavior if the hardware really 
supports device connection status.

The echo to delete node is synchronous.  It will return after the
device is completely removed but please note that "removing" in this
sense only covers the device itself.  It will flush the request queue
and spin the drive down but won't do anything about filesystems.  You
need to unmount first.  hal and desktop stuff already do the right
thing for devices marked removable.
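
As a rough sketch of that order of operations (check for mounted
filesystems first, then write to the delete node), something like the
following could be used. It assumes the usual sysfs path
/sys/block/<disk>/device/delete and that mounts show up in /proc/mounts
under plain /dev/<disk>N names; the function names and the path
parameters are made up for illustration, and filesystems mounted through
dm or md on top of the disk would not be noticed.

```python
import os
import re


def mounted_partitions(disk, mounts_path="/proc/mounts"):
    """Return (device, mountpoint) pairs for filesystems mounted from `disk`.

    `disk` is a bare name such as "sdb". Only /dev/<disk> and /dev/<disk>N
    entries are matched; this is an assumption of the sketch.
    """
    pattern = re.compile(r"^/dev/" + re.escape(disk) + r"\d*$")
    found = []
    with open(mounts_path) as f:
        for line in f:
            fields = line.split()
            if len(fields) >= 2 and pattern.match(fields[0]):
                found.append((fields[0], fields[1]))
    return found


def delete_disk(disk, sysfs_root="/sys", mounts_path="/proc/mounts"):
    """Refuse to delete while something is mounted, then write "1" to the
    device's sysfs delete node (needs root on a real system)."""
    busy = mounted_partitions(disk, mounts_path)
    if busy:
        raise RuntimeError("unmount first: %r" % (busy,))
    node = os.path.join(sysfs_root, "block", disk, "device", "delete")
    with open(node, "w") as f:
        f.write("1")
```

Calling delete_disk("sdb") as root would then either raise with the list
of mounts still in the way or perform the same write as the echo.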

Ok, but two more questions:
1. Is there any generic mechanism for notifying processes that had
previously opened the device being deleted? What will happen to such
processes? Is it possible to check who is using the drive at the
moment?

-EIO will happen, fuser, but if you want something intelligent, hal +
dbus.
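
To illustrate the "who is using the drive" part, here is a minimal
sketch of roughly what fuser does for a device node: walk the
/proc/<pid>/fd symlinks and compare them with the device path. The
function name and the proc_root parameter are hypothetical. It only
finds processes that opened the block device node itself, not processes
with files open on a filesystem mounted from it, and without root it
only sees the caller's own processes.

```python
import os


def holders(dev_path, proc_root="/proc"):
    """Map pid -> list of fd numbers that point at `dev_path`."""
    target = os.path.realpath(dev_path)
    result = {}
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a process directory
        fd_dir = os.path.join(proc_root, entry, "fd")
        try:
            fds = os.listdir(fd_dir)
        except OSError:
            continue  # process exited, or no permission to look
        for fd in fds:
            try:
                link = os.readlink(os.path.join(fd_dir, fd))
            except OSError:
                continue  # fd was closed while we were scanning
            if link == target:
                result.setdefault(int(entry), []).append(int(fd))
    return result
```

A process found this way keeps its file descriptor after the device is
deleted; per the reply above, its further I/O on that descriptor is what
fails with -EIO.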

Sorry, I did not understand this sentence. I tried the deletion while
fdisk had the device open, and fdisk does not even complain about the
device failure. It just prints an empty partition table and so on. So
the question is how to properly stop all activity on the device being
deleted if I do not know exactly what that activity is. Are the typical
programs that are allowed to use raw block devices aware of unexpected
block device loss?

2. If the drive was deleted, is it possible to bring it back without
physical re-connection? Can I simulate a status change of that port to
force the driver to auto-detect the block device?

I don't really follow what you're trying to achieve but if you want
some fancy snapshotting + remapping trick, the best place would be dm.

Well, I didn't think of any tricks. I just deleted the drive as you
taught me and tried to get it back without having to walk over to the
server. :-)
However, I think that some call to rescan SCSI devices would be useful.
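
For reference, the SCSI layer already exposes a rescan trigger through
sysfs: writing the wildcard string "- - -" (channel, target, LUN) to
/sys/class/scsi_host/hostN/scan. Whether that is enough to bring back a
drive deleted through the delete node on this ahci setup is an
assumption here, not something confirmed in this thread. A minimal
sketch, with a hypothetical function name and a sysfs_root parameter
added only so it can be tried against a scratch directory:

```python
import os


def rescan_hosts(sysfs_root="/sys", hosts=None):
    """Write the wildcard scan request to each SCSI host's scan node.

    `hosts` may name specific hosts (e.g. ["host2"]); by default every
    entry under class/scsi_host is rescanned. Needs root on a real system.
    """
    base = os.path.join(sysfs_root, "class", "scsi_host")
    names = list(hosts) if hosts is not None else sorted(os.listdir(base))
    for name in names:
        with open(os.path.join(base, name, "scan"), "w") as f:
            f.write("- - -")  # any channel, any target, any LUN
    return names
```

Restricting the call to the one host behind the affected port, for
example rescan_hosts(hosts=["host2"]), avoids probing every controller
in the machine.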

PS: as for this:

I'll be happy to improve EH behavior but you need to come up with
better reasons.  

I can tell that for me enclosure management support is quite a good
reason.

How is that in any way exclusive against longer detach delay?

I just answered with better reasons to make you happy, not with more
advice about the detach delay.

Unfortunately, the official kernel has no such support. I have
seen only limited support for the activity LED in kernel 2.6.28.
However, I am using Debian, where the latest kernel is only
2.6.26. As a result I had to write a simple ahci_em module which
registers a simple proc interface to send LED states to all ICH9
ports. The final goal, however, is to integrate this module with mdadm
to have proper indication of RAID state.

The biggest obstacle is that there aren't too many enclosure devices
floating around.  What kind of device are you using?

I don't know exactly which device you are talking about. I was talking
about the LED message types that are supported in ICH9.
As for my server, ICH9 provides an SGPIO interface that is routed to a
4-drive hot-swap backplane based on the AMI MG9071 chip. However, this
information isn't needed to program ICH9, since the LED message
mechanism is supported in it. Other message types are not supported. It
is very strange that the Linux ahci driver still does not support this
functionality, since it was first introduced in ICH8 (datasheet first
released in June 2006).

PS: My code has about 11 KB of text and supports all useful RAID states:
NORMAL, LOCATE, REBUILD, FAILURE, HOTSPARE, PREDICTED FAILURE SOON. I
have tested it on my server and it works. I think it can be useful for
other implementations of soft RAID systems with hot-swap support.

Best regards, Vladimir Dashevsky
