Re: iSCSI login failure, possible race condition

"Maged Mokhtar" <mmokhtar@xxxxxxxxxxxxxxxxxx> · Mon, 7 Mar 2016 14:00:13 +0200

Hi Nick,

Apologies I will try your patch soon and will let you know

/Maged


--------------------------------------------------
From: "Nicholas A. Bellinger" <nab@xxxxxxxxxxxxxxx>
Sent: Sunday, February 28, 2016 4:15 AM
To: "Maged Mokhtar" <mmokhtar@xxxxxxxxxxxxxxxxxx>
Cc: <target-devel@xxxxxxxxxxxxxxx>
Subject: Re: iSCSI login failure, possible race condition

Hi Maged,

On Mon, 2016-02-08 at 17:43 -0700, Maged Mokhtar wrote:
Hi Nick

Please find patch below:

Apologies for the delayed follow-up.  Comments below.


Solves an issue with frequent "iSCSI Login negotiation failed."  observed
during target discovery from Microsoft client initiator running inside a
virtual machine to a target running in another virtual machine inside the
same host under VMWare.
iscsi_target_sk_data_ready() callback is received too soon before
LOGIN_FLAGS_READY flag is set.
The patch fixes the problem by moving the section that checks the various
LOGIN_FLAGS_* from the iscsi_target_sk_data_ready() callback to the 
worker
function iscsi_target_do_login_rx() and in case  LOGIN_FLAGS_READY is not
set, sleeps and checks it a second time before exiting.

Maged Mokhtar mmokhtar@xxxxxxxxxxxxxxxxxx


diff -up a/drivers/target/iscsi/iscsi_target_nego.c 
b/drivers/target/iscsi/iscsi_target_nego.c
--- a/drivers/target/iscsi/iscsi_target_nego.c  2016-02-08 
18:07:32.282055751 -0500
+++ b/drivers/target/iscsi/iscsi_target_nego.c  2016-02-08 
18:28:59.655937903 -0500
@@ -409,34 +409,11 @@ static void iscsi_target_sk_data_ready(s
        bool rc;

        pr_debug("Entering iscsi_target_sk_data_ready: conn: %p\n", 
conn);
-
-       write_lock_bh(&sk->sk_callback_lock);
-       if (!sk->sk_user_data) {
-               write_unlock_bh(&sk->sk_callback_lock);
-               return;
-       }
-       if (!test_bit(LOGIN_FLAGS_READY, &conn->login_flags)) {
-               write_unlock_bh(&sk->sk_callback_lock);
-               pr_debug("Got LOGIN_FLAGS_READY=0, conn: %p >>>>\n", 
conn);
-               return;
-       }
-       if (test_bit(LOGIN_FLAGS_CLOSED, &conn->login_flags)) {
-               write_unlock_bh(&sk->sk_callback_lock);
-               pr_debug("Got LOGIN_FLAGS_CLOSED=1, conn: %p >>>>\n", 
conn);
-               return;
-       }
-       if (test_and_set_bit(LOGIN_FLAGS_READ_ACTIVE, 
&conn->login_flags)) {
-               write_unlock_bh(&sk->sk_callback_lock);
-               pr_debug("Got LOGIN_FLAGS_READ_ACTIVE=1, conn: %p 
 >>>>\n", conn);
-               return;
-       }
-
        rc = schedule_delayed_work(&conn->login_work, 0);
        if (!rc) {
                pr_debug("iscsi_target_sk_data_ready, 
schedule_delayed_work"
                         " got false\n");
        }
-       write_unlock_bh(&sk->sk_callback_lock);
 }

 static void iscsi_target_sk_state_change(struct sock *);
@@ -551,6 +528,35 @@ static void iscsi_target_do_login_rx(str

        if (conn->sock) {
                struct sock *sk = conn->sock->sk;
+               write_lock_bh(&sk->sk_callback_lock);
+               if (!sk->sk_user_data) {
+                       write_unlock_bh(&sk->sk_callback_lock);
+                       return;
+               }
+               if (!test_bit(LOGIN_FLAGS_READY, &conn->login_flags)) {
+                       write_unlock_bh(&sk->sk_callback_lock);
+                       pr_debug("Got LOGIN_FLAGS_READY=0, conn: %p 
 >>>>\n", conn);
+                       /* retry a second time in case we are called too 
soon before LOGIN_FLAGS_READY is set */
+                       msleep(100);
+                       write_lock_bh(&sk->sk_callback_lock);
+                       if (!test_bit(LOGIN_FLAGS_READY, 
&conn->login_flags)) {
+                               write_unlock_bh(&sk->sk_callback_lock);
+                               pr_debug("Second retry. Got 
LOGIN_FLAGS_READY=0, conn: %p >>>>\n", conn);
+                               return;
+                       }
+               }
+               if (test_bit(LOGIN_FLAGS_CLOSED, &conn->login_flags)) {
+                       write_unlock_bh(&sk->sk_callback_lock);
+                       pr_debug("Got LOGIN_FLAGS_CLOSED=1, conn: %p 
 >>>>\n", conn);
+                       return;
+               }
+               if (test_and_set_bit(LOGIN_FLAGS_READ_ACTIVE, 
&conn->login_flags)) {
+                       write_unlock_bh(&sk->sk_callback_lock);
+                       pr_debug("Got LOGIN_FLAGS_READ_ACTIVE=1, conn: %p 
 >>>>\n", conn);
+                       return;
+               }
+
+               write_unlock_bh(&sk->sk_callback_lock);

                read_lock_bh(&sk->sk_callback_lock);
                state = iscsi_target_sk_state_check(sk);



So pondering this further, an earlier initial LOGIN_FLAGS_READY bit
assignment might work as a simpler patch:

diff --git a/drivers/target/iscsi/iscsi_target_nego.c 
b/drivers/target/iscsi/iscsi_target_nego.c
index 9fc9117..8720daf 100644
--- a/drivers/target/iscsi/iscsi_target_nego.c
+++ b/drivers/target/iscsi/iscsi_target_nego.c
@@ -1246,16 +1246,16 @@ int iscsi_target_start_negotiation(
{
       int ret;

-       ret = iscsi_target_do_login(conn, login);
-       if (!ret) {
-               if (conn->sock) {
-                       struct sock *sk = conn->sock->sk;
+       if (conn->sock) {
+               struct sock *sk = conn->sock->sk;

-                       write_lock_bh(&sk->sk_callback_lock);
-                       set_bit(LOGIN_FLAGS_READY, &conn->login_flags);
-                       write_unlock_bh(&sk->sk_callback_lock);
-               }
-       } else if (ret < 0) {
+               write_lock_bh(&sk->sk_callback_lock);
+               set_bit(LOGIN_FLAGS_READY, &conn->login_flags);
+               write_unlock_bh(&sk->sk_callback_lock);
+       }
+
+       ret = iscsi_target_do_login(conn, login);
+       if (ret < 0) {
               cancel_delayed_work_sync(&conn->login_work);
               cancel_delayed_work_sync(&conn->login_cleanup_work);
               iscsi_target_restore_sock_callbacks(conn);


Note each iscsi_target_do_login_rx() work_struct callback is still
clearing LOGIN_FLAGS_READ_ACTIVE after iscsi_target_do_login() is
invoked each time to handle subsequent login request PDUs.

So it's unclear if you'll hit a similar problem with clear_bit for
LOGIN_FLAGS_READ_ACTIVE in subsequent login PDUs, or if the above to
kick-off ->sk_data_ready() work_struct processing after initial login
request PDU is received does the trick.

Either way, please give the above a shot on your setup, and let's see
where we end up.

Thanks again,

--nab

--
To unsubscribe from this list: send the line "unsubscribe target-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html 

--
To unsubscribe from this list: send the line "unsubscribe target-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html