Thank you for your reply and comments, Douglas.

The user-land product is waiting for release, and the time frame does not
allow many changes to it at this point.

I read sg.c again. It seems that a reattached SAS device would take the
same sg slot if the following conditions are met:

1. Wait 2+ minutes for a pending SG_IO write request to come back before
   pushing the cable back in. The 2+ minutes gives the SCSI mid level time
   to time out the pending IO request and do error recovery if needed.
2. Close the user space fd properly (sg_release will then try to do
   sg_dev_arr[k] = NULL).

Do you see any other conditions?

Thanks,
Yanling

-----Original Message-----
From: Douglas Gilbert [mailto:dougg@xxxxxxxxxx]
Sent: Tuesday, October 17, 2006 2:37 PM
To: Qi, Yanling
Cc: linux-scsi@xxxxxxxxxxxxxxx
Subject: Re: sg_remove and pending write request

Qi, Yanling wrote:
> Hi All,
>
> We are running a test case of SAS cable pull/push on a SAS RAID system.
> After the SAS cable is pulled from a SAS RAID, scsi devices are deleted.
> And then when the cable is pushed back, the scsi device with the same
> H:C:T:L sometimes will be assigned to a different sgX.

There is no guarantee of the naming stability of sg nodes
(e.g. /dev/sg3) when devices disappear and re-appear.
Actually the design of lk 2.6 seems to actively discourage
user space programs from the assumption. Same applies for
all SCSI device nodes (and host numbers).

In the case of SAS, you really should be looking at the
target port SAS address in the device identification
VPD page (page 0x83). If the device in question is a
SATA disk then you have more work to do.

> Reading through the sg.c, it seems that if the sg device has a pending
> write request, the sg slot (sg_dev_arr[k] = NULL) will not be freed
> during sg_remove time. Can someone confirm this?

Yes, I can confirm that. The sg driver waits for the mid level
to callback with the outstanding IO completions (or timeouts).
If the user kills the process, the sg driver still waits for IO
completion. [A problem arises if the user tries to 'rmmod sg'.]
The device could well re-appear during that "wait" time and the
sg driver will assign a different device node (i.e. the first
unused slot in sg_dev_arr[]).

> If this is the case, what can the user space process do to prevent this
> from happening?

Develop a user space program that applies fast acting super glue
to the SAS connectors when IOs are in flight and hands approach.

As I said above, you cannot assume device node names will be stable
across disconnect, reconnect cycles.

> I see that the sg.c sends SIGPOLL to the user space process
> (kill_fasync(&sfp->async_qp, SIGPOLL, POLL_HUP);), what will this signal
> be translated to in the user space return code from the read/write call?

You would need to be running asynchronous IO with the sg driver
(i.e. write(),poll(),read() rather than SG_IO) and POLLHUP should
appear in struct pollfd::revents . You should also be able
to run poll() from a signal handler that catches SIGPOLL.
[My knowledge is a bit rusty in this area.]

Doug Gilbert
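
For reference, the identification approach suggested above (use the target
port SAS address from VPD page 0x83 rather than the sgX name) might look
roughly like the following. This is only a sketch under assumptions:
fetch_vpd83() and find_target_port_naa() are hypothetical helper names, not
anything in sg.c or a library; the descriptor layout is the SPC-3 one as
understood here (code set in byte 0, association and designator type in
byte 1, length in byte 3); and the 10 second timeout is arbitrary.

```c
#include <string.h>
#include <sys/ioctl.h>
#include <scsi/sg.h>

/* Hypothetical helper: issue INQUIRY with EVPD=1, page 0x83, through SG_IO.
 * Returns the number of valid bytes placed in buf, or -1 on failure. */
int fetch_vpd83(int sg_fd, unsigned char *buf, unsigned int buf_len)
{
    unsigned char cdb[6];
    unsigned char sense[32];
    unsigned int total, got;
    sg_io_hdr_t hdr;

    if (buf_len < 4 || buf_len > 0xffff)
        return -1;

    memset(cdb, 0, sizeof(cdb));
    cdb[0] = 0x12;                              /* INQUIRY */
    cdb[1] = 0x01;                              /* EVPD bit */
    cdb[2] = 0x83;                              /* Device Identification page */
    cdb[3] = (unsigned char)(buf_len >> 8);     /* allocation length, MSB */
    cdb[4] = (unsigned char)(buf_len & 0xff);   /* allocation length, LSB */

    memset(buf, 0, buf_len);
    memset(&hdr, 0, sizeof(hdr));
    hdr.interface_id = 'S';
    hdr.dxfer_direction = SG_DXFER_FROM_DEV;
    hdr.cmd_len = sizeof(cdb);
    hdr.cmdp = cdb;
    hdr.dxfer_len = buf_len;
    hdr.dxferp = buf;
    hdr.mx_sb_len = sizeof(sense);
    hdr.sbp = sense;
    hdr.timeout = 10000;                        /* milliseconds, arbitrary */

    if (ioctl(sg_fd, SG_IO, &hdr) < 0)
        return -1;
    if ((hdr.info & SG_INFO_OK_MASK) != SG_INFO_OK)
        return -1;
    if (buf[1] != 0x83)
        return -1;

    /* Page length field excludes the 4-byte header. */
    total = 4u + (((unsigned int)buf[2] << 8) | buf[3]);
    got = (hdr.resid > 0 && (unsigned int)hdr.resid <= buf_len)
              ? buf_len - (unsigned int)hdr.resid : buf_len;
    return (int)(total < got ? total : got);
}

/* Hypothetical helper: walk the designation descriptors and copy out the
 * first binary, 8-byte NAA designator whose association is "target port".
 * Returns 0 if one was found, -1 otherwise. */
int find_target_port_naa(const unsigned char *page, unsigned int len,
                         unsigned char naa[8])
{
    unsigned int off = 4;                       /* skip the page header */

    while (off + 4 <= len) {
        const unsigned char *d = page + off;
        unsigned int code_set = d[0] & 0x0f;        /* 1 = binary */
        unsigned int assoc = (d[1] >> 4) & 0x03;    /* 1 = target port */
        unsigned int type = d[1] & 0x0f;            /* 3 = NAA */
        unsigned int dlen = d[3];

        if (off + 4 + dlen > len)
            break;                              /* truncated descriptor */
        if (code_set == 1 && assoc == 1 && type == 3 && dlen == 8) {
            memcpy(naa, d + 4, 8);
            return 0;
        }
        off += 4 + dlen;
    }
    return -1;
}
```

A caller would open the sg node, call fetch_vpd83(), pass the returned length
to find_target_port_naa(), and compare the 8 bytes with the value recorded
before the cable pull. As noted above, a SATA disk behind the SAS domain needs
more work than this.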
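
The write()/poll()/read() pattern mentioned above could be wrapped roughly as
follows, so that a hangup shows up as a distinct result rather than an
ambiguous read/write error. This is a sketch only: wait_for_event(),
read_response() and the EV_* flags are hypothetical names, and whether the sg
driver reports POLLHUP for a removed device in exactly this way should be
checked against the sg.c in use.

```c
#include <errno.h>
#include <poll.h>
#include <unistd.h>

/* Hypothetical result flags; they may be ORed together. */
#define EV_READABLE 1   /* a response can be read() without blocking */
#define EV_HANGUP   2   /* POLLHUP seen: the device has gone away */

/* Wait up to timeout_ms for activity on fd.
 * Returns a mask of EV_* flags, 0 on timeout, or -1 on error. */
int wait_for_event(int fd, int timeout_ms)
{
    struct pollfd pfd;
    int rc, result = 0;

    pfd.fd = fd;
    pfd.events = POLLIN;
    pfd.revents = 0;

    do {
        rc = poll(&pfd, 1, timeout_ms);
    } while (rc < 0 && errno == EINTR);         /* retry after a signal */

    if (rc < 0)
        return -1;
    if (rc == 0)
        return 0;                               /* timed out */
    if (pfd.revents & (POLLERR | POLLNVAL))
        return -1;
    if (pfd.revents & POLLIN)
        result |= EV_READABLE;
    if (pfd.revents & POLLHUP)
        result |= EV_HANGUP;
    return result;
}

/* Read one pending response, retrying if interrupted by a signal. */
ssize_t read_response(int fd, void *buf, size_t len)
{
    ssize_t n;

    do {
        n = read(fd, buf, len);
    } while (n < 0 && errno == EINTR);
    return n;
}
```

Both flags can be set at once, so a caller can still collect a completed
response with read_response() before treating the file descriptor as dead and
re-discovering the device.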