2011/5/25 FUJITA Tomonori <fujita.tomonori@xxxxxxxxxxxxx>:
> On Wed, 25 May 2011 00:15:48 +0800
> Kiefer Chang <zapchang@xxxxxxxxx> wrote:
>
>> After testing for a few days, tgtd no longer exits in iscsi_tx_handler.
>> But some problems appear:
>>
>> (1) Some FDs are not released (maybe commands on those FDs never
>> complete).
>
> You mean that the number of used FDs is increasing?

Yes!

>
>> (2) I/O fails on the initiator side (15+ volumes, vdbench I/O). Some
>> filesystems on the iSCSI disks become read-only. From the system logs
>> we can see the initiator trying to do a target reset.
>
> I'm not sure what OS you use on the initiator side, but I guess that if
> the reset fails, the initiator would give up.

CentOS, using its built-in open-iscsi initiator.

> If you tested with my patch, the reset always fails. Even without my
> patch, it can fail if tgtd is too busy to send the response to a reset
> quickly.
>
>
>> For some reason, our backing stores are files on Lustre, and we
>> wrapped the files with loop devices, then created LVMs on top of those
>> loop devices (finally exported to tgtd).
>> We found this is really slow and may change the design soon.
>
> Yeah, that sounds like too many middle layers. It would be better if
> tgtd directly used files on Lustre.
>
>
>> We simply close the fd in iscsi_tcp_release.
>> I am not sure if the workaround is correct since there are still
>> problems. Please don't use it directly.
>>
>> ---
>>  usr/iscsi/iscsi_tcp.c |   16 +++++++++++++++-
>>  1 files changed, 15 insertions(+), 1 deletions(-)
>>
>> diff --git a/usr/iscsi/iscsi_tcp.c b/usr/iscsi/iscsi_tcp.c
>> index e87bbf1..21346ed 100644
>> --- a/usr/iscsi/iscsi_tcp.c
>> +++ b/usr/iscsi/iscsi_tcp.c
>> @@ -370,7 +370,7 @@ static size_t iscsi_tcp_close(struct iscsi_connection *conn)
>>  	struct iscsi_tcp_connection *tcp_conn = TCP_CONN(conn);
>>
>>  	tgt_event_del(tcp_conn->fd);
>> -	return close(tcp_conn->fd);
>> +	return 0;
>>  }
>>
>>  static void iscsi_tcp_release(struct iscsi_connection *conn)
>> @@ -378,6 +378,20 @@ static void iscsi_tcp_release(struct iscsi_connection *conn)
>>  	struct iscsi_tcp_connection *tcp_conn = TCP_CONN(conn);
>>
>>  	conn_exit(conn);
>> +
>> +	/*
>> +	 * Delay closing the fd from ep_close to here. This prevents a
>> +	 * newly accepted connection from getting the same fd, which
>> +	 * caused the abnormal exit bug in iscsi_tx_handler.
>> +	 */
>> +	if (conn->closed) {
>> +		close(tcp_conn->fd);
>> +	} else {
>
> Have you ever hit this bug?

No, my colleague left this in for debugging only.

>> +		eprintf("Bug: conn=%p fd=%d not closed\n", conn, tcp_conn->fd);
>> +		free(tcp_conn);
>> +		exit(1);
>> +	}
>> +
>>  	free(tcp_conn);
>>  }
>>
>>
>> 2011/5/24 FUJITA Tomonori <fujita.tomonori@xxxxxxxxxxxxx>:
>>> On Mon, 23 May 2011 22:39:45 +0800
>>> Kiefer Chang <zapchang@xxxxxxxxx> wrote:
>>>
>>>> If the previous connection uses iscsi_event_modify to change the
>>>> epoll events the fd is waiting for,
>>>> is it possible that the second connection goes into an unexpected state?
>>>> conn_close() might use mgmt_end_notify() to do this.
>>>
>>> Ah, I think that you are right.
>>>
>>> When conn_close is called, tgtd has some in-progress commands. tgtd closes the
>>> connection and the fd is reused for a new connection.
>>>
>>> When these commands finish, they might call mgmt_end_notify and wrongly change
>>> the state of the new connection.
>>>
>>> So we should move close() from iscsi_tcp_close to iscsi_tcp_release to prevent
>>> the reuse of fds.
>>>
>>> Can you send a patch to do that?
>>>
>>> Thanks!
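
For reference, a minimal sketch of the suggested change, closing the fd in
iscsi_tcp_release() instead of iscsi_tcp_close(). It is based only on the
functions and fields shown in the patch above, drops the conn->closed
debugging branch, and is untested:

/*
 * Sketch (untested): defer close() from iscsi_tcp_close() to
 * iscsi_tcp_release(), so the fd cannot be handed out to a newly
 * accepted connection while commands from the old connection are
 * still in flight.
 */
static size_t iscsi_tcp_close(struct iscsi_connection *conn)
{
	struct iscsi_tcp_connection *tcp_conn = TCP_CONN(conn);

	/* Stop polling the socket, but keep the fd until release. */
	tgt_event_del(tcp_conn->fd);
	return 0;
}

static void iscsi_tcp_release(struct iscsi_connection *conn)
{
	struct iscsi_tcp_connection *tcp_conn = TCP_CONN(conn);

	conn_exit(conn);
	/*
	 * All commands on this connection have finished, so returning
	 * the fd to the kernel here cannot race with a new connection
	 * reusing the same number.
	 */
	close(tcp_conn->fd);
	free(tcp_conn);
}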