Tejun!
Hello,
Владимир Дашевский wrote:
Well, for example, USB devices have a pull-up resistor on their D+ line.
This DC bias can be used to detect device presence without a mechanical
switch.
SATA is not USB and onlineness detection isn't that simple. Also,
have you tried to run a system on a USB device over a flaky connection?
Well, I cannot argue with you here. All that I wanted to say is that I
would prefer more optimistic software behavior if the hardware really
supports device connection status.
I really don't follow your train of thought here. Are you saying
that the driver should be optimistic about the reliability of the
status reported by the hardware even when it is inherently imprecise
(please read the spec) and real world experiments prove that?
No. I meant that the driver should perform better if the hardware
supports some features for that. Consider two different cases.
1. The hardware derives port population status by sensing the carrier on
the data link. In this case EMI noise can damage link integrity so badly
that not only data bits but also the carrier is lost for a short time.
This leads to a 'port is not present' status even though no one has
actually removed the drive.
2. The hardware implements some feature like the pull-up resistor in USB,
or a special shorter 'present' contact as in PCI or CPCI connectors, or
it simply senses some DC current through the power lines, etc. In this
case port status is robust against EMI noise and can be used to inform
the driver of the actual connection.
My thought was to improve driver behavior in case 2, either autodetected
by PCI IDs or manually overridden by some configuration script.
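
To make the two cases concrete, here is a rough user-space sketch. It is
not driver code and not a description of how libata behaves. It assumes
a hypothetical reader command that prints 1 when the port reports a
device and 0 otherwise; the name port_present, the mode names "carrier"
and "sense", and the sample count are all made up for illustration. In
the carrier case the port is declared empty only after several
consecutive 'absent' readings; in the sense case one reading is trusted.

```shell
#!/bin/sh
# port_present MODE READER [SAMPLES] [DELAY]   (hypothetical helper)
#   MODE    "carrier" (case 1, EMI-sensitive) or "sense" (case 2, robust)
#   READER  command that prints 1 if the port reports a device, else 0
#   SAMPLES readings taken in carrier mode before giving up (default 5)
#   DELAY   seconds between readings in carrier mode (default 1)
# Returns 0 if the port is considered populated, 1 otherwise.
port_present() {
    mode=$1
    reader=$2
    samples=${3:-5}
    delay=${4:-1}

    if [ "$mode" = "sense" ]; then
        # Case 2: the presence signal is robust, trust a single reading.
        [ "$($reader)" = "1" ]
        return
    fi

    # Case 1: a short carrier dropout must not count as removal, so the
    # port is empty only if every one of the samples says "absent".
    i=0
    while [ "$i" -lt "$samples" ]; do
        if [ "$($reader)" = "1" ]; then
            return 0
        fi
        i=$((i + 1))
        # Wait between readings, but not after the last one.
        [ "$i" -lt "$samples" ] && sleep "$delay"
    done
    return 1
}
```

With a reader that glitches to 0 for a couple of readings, the carrier
mode still reports the port as populated, while the sense mode would
report removal on the first 0.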
-EIO will happen, fuser, but if you want something intelligent, hal +
dbus.
Sorry, I missed the sense of this sentence.
-EIO will happen to any processes trying to do IO on the removed
device. fuser will find out who's using the block device but if you
want something more intelligent, look at hal + dbus.
Hm, I ran fuser /dev/sda and got empty output. It seems that the
file system does not open sda. How does it work?
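
A likely reason for the empty output: plain fuser only lists processes
that hold the device node itself open, while a mounted file system is
held by the kernel and usually sits on a partition (sda1, not sda).
fuser -m asks about the mounted file system instead. The sketch below
finds the mount points backed by a disk; mounts_on_disk is a made-up
name and the optional second argument exists only so it can be tried on
a copy of /proc/mounts. It does not cover swap, md or dm users.

```shell
#!/bin/sh
# mounts_on_disk DISK [MOUNTS_FILE]   (hypothetical helper)
# Print the mount point of every file system that lives on DISK itself
# or on one of its partitions (sda matches /dev/sda, /dev/sda1, ...).
# MOUNTS_FILE defaults to /proc/mounts.
mounts_on_disk() {
    disk=$1
    mounts=${2:-/proc/mounts}
    awk -v dev="/dev/$disk" '
        # Field 1 is the device, field 2 is the mount point.
        $1 == dev || $1 ~ ("^" dev "p?[0-9]+$") { print $2 }
    ' "$mounts"
}
```

Usage, as root:

    for m in $(mounts_on_disk sda); do fuser -vm "$m"; done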
I tried this deletion with fdisk and saw that fdisk does not even
complain about the device failure. It just starts to print an empty
partition table and so on. So the question is how to properly stop any
activity involving the device being deleted if I do not know exactly
what that activity is. Are the most typical programs that are allowed
to use raw block devices aware of unexpected block device loss?
Please take a look at how desktop guys are handling the issue. It's
not something which can be handled in kernel proper.
Ok.
I don't really follow what you're trying to achieve but if you want
some fancy snapshotting + remapping trick, the best place would be dm.
Well, I didn't think of any tricks. I just deleted the drive as you
taught me and tried to get it back without moving myself in front of the
server. :-)
However, I think that some call to rescan SCSI devices would be useful.
Ah.. in that case, you can do
# echo - - - > /sys/class/scsi_host/hostN/scan
Well, it works, but it takes about 10 seconds to finish the scan for
the deleted drive. Is this OK?
That is probably because the drive spins down after deletion and starts
to spin up again during this scan.
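
When the host number is not known, the same request can be sent to every
host. A minimal sketch: rescan_all_hosts and its optional root argument
are made-up names, the sysfs path is the one from the command above, and
"- - -" is the wildcard for channel, target and LUN.

```shell
#!/bin/sh
# rescan_all_hosts [SYSFS_ROOT]   (hypothetical helper)
# Write "- - -" to the scan file of every SCSI host found under
# SYSFS_ROOT/class/scsi_host. SYSFS_ROOT defaults to /sys.
rescan_all_hosts() {
    root=${1:-/sys}
    for scan in "$root"/class/scsi_host/host*/scan; do
        # If the glob matched nothing, the pattern is left as-is: skip it.
        [ -e "$scan" ] || continue
        echo "- - -" > "$scan"
        echo "rescanned ${scan%/scan}"
    done
}
```

Run it as root with no arguments to rescan everything under /sys.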
The biggest obstacle is that there aren't too many enclosure devices
floating around. What kind of device are you using?
I don't know exactly what device you are talking about. I was talking
about the LED message types that are supported in ICH9.
As for my server, ICH9 provides an SGPIO interface that is routed to a
4-drive hot-swap backplane based on the AMI MG9071 chip. However, this
information isn't needed to program ICH9 since the LED message mechanism
is supported in it. Other message types are not supported. And it is
very strange that the Linux ahci driver still does not support this
functionality, since it was first introduced in ICH8 (datasheet first
released in June 2006).
Yeah, I know it has been in the spec but without hardware to play with
it's difficult to add driver features and lack of general availability
also means lower demand.
Well, I just cannot imagine how software RAID can work without a clearly
visible state. One drive mixed up in RAID5 and the whole array can get
damaged. And it is not so difficult to mix them up, because drive names
may differ from physical slot numbers.
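
As an illustration of the naming problem, the SCSI host behind a drive
name can be read from sysfs. This is only a sketch: disk_host is a
made-up name, and it assumes /sys/block/sdX is a symlink into the device
tree, as on current kernels. Which bay a given host corresponds to is
still board-specific and is not something this can tell you.

```shell
#!/bin/sh
# disk_host DISK [SYSFS_ROOT]   (hypothetical helper)
# Print the SCSI host (for example "host2") that DISK hangs off, taken
# from the resolved sysfs path of the block device.
# SYSFS_ROOT defaults to /sys.
disk_host() {
    disk=$1
    root=${2:-/sys}
    # Resolve the symlink, split the path into components and keep the
    # one that looks like hostN.
    readlink -f "$root/block/$disk" | tr '/' '\n' | grep -E '^host[0-9]+$'
}
```

Usage:

    disk_host sda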
PS: My code has about 11Kb of text and supports all useful RAID states:
NORMAL, LOCATE, REBUILD, FAILURE, HOTSPARE, PREDICTED FAILURE SOON. I
have tested it on my server and it works. I think it can be useful for
other implementations of soft RAID systems with hot swap support.
I think it should be independent from RAID but having general
enclosure support will be nice. Care to post the patches?
Well, I can provide you with code that works on my ICH9 Supermicro
platform. I believe it will also work with both ICH8 and ICH10.
However, since I could not install this module as a traditional PCI
driver (the kernel would not let it claim my AHCI device because the
main driver is already present in the system), I had to rewrite it as a
general Linux kernel module. It just scans PCI devices for AHCI-capable
ones and remaps their ABAR to probe for enclosure management support.
For now, only my ICH9 PCI IDs are in my try list. All AHCI EM-capable
devices get their associated proc interface, /proc/ahci_emX/leds*. This
module actually works in parallel with the kernel ahci driver, but I
think it will conflict with it once the kernel driver starts to support
EM by itself. I guess the best way would be to document some API for
controlling the EM and then declare some kernel ahci flag that indicates
full EM presence in the kernel. Then I can improve my ahci_em module to
skip its installation when similar functions are built into the kernel.
My interface is quite simple. You just write a character to the
LED-controlling proc file to set the state of the LEDs. For example,
# echo r > /proc/ahci_em0/leds0
means you asked for the REBUILD state to be indicated in the bay of
port 0.
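
A small wrapper around that write might look like the sketch below. The
function name em_set_led and the optional root argument are made up for
illustration; the /proc/ahci_emX/ledsY path and the 'r' code for REBUILD
are the only parts taken from the description above, so codes for the
other states are deliberately not guessed here.

```shell
#!/bin/sh
# em_set_led EM PORT CODE [PROC_ROOT]   (hypothetical helper)
# Write a one-character state code to PROC_ROOT/ahci_em<EM>/leds<PORT>.
# PROC_ROOT defaults to /proc. Returns 2 on a bad code and 1 if the
# proc file is missing or not writable.
em_set_led() {
    em=$1
    port=$2
    code=$3
    root=${4:-/proc}
    file="$root/ahci_em$em/leds$port"

    if [ "${#code}" -ne 1 ]; then
        echo "em_set_led: state code must be a single character" >&2
        return 2
    fi
    if [ ! -w "$file" ]; then
        echo "em_set_led: $file not found or not writable" >&2
        return 1
    fi
    printf '%s\n' "$code" > "$file"
}
```

Usage, equivalent to the echo above:

    em_set_led 0 0 r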
I think that most users would prefer an additional module rather than a
kernel upgrade, at least at first. Also, I am not close enough to the
Linux kernel to provide a kernel patch.
Thanks.
Best regards, Vladimir Dashevsky