Seems the mailing list dropped my mail. Resend via a different mail server. On Wed, 25 May 2011 00:15:48 +0800 Kiefer Chang <zapchang@xxxxxxxxx> wrote: > After testing for few day the tgtd won't exit in the iscsi_tx_handler. > But some problem appers: > > (1) Some FDs are not released (Maybe commands on those FDs are not > completed eventually) You mean that the number of used FDs are increasing? > (2) I/O fail on the initiator side. (15+ volumes, vdbench I/O). Some > filesystem on the iscsi disks become read-only. From system logs we > can see the initiator try to do target reset. I'm not sure what OS you use on the initiator side but I guess that if the reset fails, the initiator would give up. If you tested with my patch, the reset always fails. Even without my patch, if tgtd is too busy to send the response of a reset quickly. > For some reason, our backing stores are files on Lustre, and we > wrapped the files with loop device then create LVMs on top of those > loop devices. (Finally exports to tgtd) > We found this is really slow and may change the design soon. Yeah, sounds too many mid layers. It would be better if tgtd directly uses files on Lustre. > We simply close the fd in iscsi_tcp_release. > I am not sure if the workaround is correct since there are still > problems. Please don't use them directly. > > --- > usr/iscsi/iscsi_tcp.c | 16 +++++++++++++++- > 1 files changed, 15 insertions(+), 1 deletions(-) > > diff --git a/usr/iscsi/iscsi_tcp.c b/usr/iscsi/iscsi_tcp.c > index e87bbf1..21346ed 100644 > --- a/usr/iscsi/iscsi_tcp.c > +++ b/usr/iscsi/iscsi_tcp.c > @@ -370,7 +370,7 @@ static size_t iscsi_tcp_close(struct iscsi_connection *conn) > struct iscsi_tcp_connection *tcp_conn = TCP_CONN(conn); > > tgt_event_del(tcp_conn->fd); > - return close(tcp_conn->fd); > + return 0; > } > > static void iscsi_tcp_release(struct iscsi_connection *conn) > @@ -378,6 +378,20 @@ static void iscsi_tcp_release(struct > iscsi_connection *conn) > struct iscsi_tcp_connection *tcp_conn = TCP_CONN(conn); > > conn_exit(conn); > + > + /** > + * delay close fd from ep_close to here. > + * Prevent new accept conn use the > + * same fd and caused iscsi_tx_handler abnormal exit bug. > + */ > + if(conn->closed) { > + close(tcp_conn->fd); > + } else { Have you ever hit this bug? > + eprintf("Bug: conn=%p fd=%d not closed\n", conn, tcp_conn->fd); > + free(tcp_conn); > + exit(1); > + } > + > free(tcp_conn); > } > > > 2011/5/24 FUJITA Tomonori <fujita.tomonori@xxxxxxxxxxxxx>: >> On Mon, 23 May 2011 22:39:45 +0800 >> Kiefer Chang <zapchang@xxxxxxxxx> wrote: >> >>> If the previous connection's uses iscsi_event_modify to change the >>> waiting epoll events of the fd, >>> is this possible that the second connection go into an unexpected state? >>> in conn_close() might uses mgmt_end_notify() to do this. >> >> Ah, I think that you are right. >> >> When conn_close is called, tgtd has some in-progress commands. tgtd closes the >> connection and the fd is reused for a new connection. >> >> Then these commands finish, they might call mgmt_end_notify and change the >> state of the new connection wrongly. >> >> So we should move close() from iscsi_tcp_close to iscsi_tcp_release to prevent >> the reuse of fds. >> >> Can you send a patch to do that? >> >> Thanks! >> > -- > To unsubscribe from this list: send the line "unsubscribe stgt" in > the body of a message to majordomo@xxxxxxxxxxxxxxx > More majordomo info at http://vger.kernel.org/majordomo-info.html > > -- To unsubscribe from this list: send the line "unsubscribe stgt" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html