Re: iSCSI login failure, possible race condition

"Nicholas A. Bellinger" <nab@xxxxxxxxxxxxxxx> · Sat, 27 Feb 2016 18:15:46 -0800

Hi Maged,

On Mon, 2016-02-08 at 17:43 -0700, Maged Mokhtar wrote:
> Hi Nick
> 
> Please find patch below:

Apologies for the delayed follow-up.  Comments below.

> 
> Solves an issue with frequent "iSCSI Login negotiation failed."  observed
> during target discovery from Microsoft client initiator running inside a
> virtual machine to a target running in another virtual machine inside the
> same host under VMWare.
> iscsi_target_sk_data_ready() callback is received too soon before
> LOGIN_FLAGS_READY flag is set.
> The patch fixes the problem by moving the section that checks the various
> LOGIN_FLAGS_* from the iscsi_target_sk_data_ready() callback to the worker
> function iscsi_target_do_login_rx() and in case  LOGIN_FLAGS_READY is not
> set, sleeps and checks it a second time before exiting.
> 
> Maged Mokhtar mmokhtar@xxxxxxxxxxxxxxxxxx
> 
> 
> diff -up a/drivers/target/iscsi/iscsi_target_nego.c b/drivers/target/iscsi/iscsi_target_nego.c
> --- a/drivers/target/iscsi/iscsi_target_nego.c  2016-02-08 18:07:32.282055751 -0500
> +++ b/drivers/target/iscsi/iscsi_target_nego.c  2016-02-08 18:28:59.655937903 -0500
> @@ -409,34 +409,11 @@ static void iscsi_target_sk_data_ready(s
>         bool rc;
> 
>         pr_debug("Entering iscsi_target_sk_data_ready: conn: %p\n", conn);
> -
> -       write_lock_bh(&sk->sk_callback_lock);
> -       if (!sk->sk_user_data) {
> -               write_unlock_bh(&sk->sk_callback_lock);
> -               return;
> -       }
> -       if (!test_bit(LOGIN_FLAGS_READY, &conn->login_flags)) {
> -               write_unlock_bh(&sk->sk_callback_lock);
> -               pr_debug("Got LOGIN_FLAGS_READY=0, conn: %p >>>>\n", conn);
> -               return;
> -       }
> -       if (test_bit(LOGIN_FLAGS_CLOSED, &conn->login_flags)) {
> -               write_unlock_bh(&sk->sk_callback_lock);
> -               pr_debug("Got LOGIN_FLAGS_CLOSED=1, conn: %p >>>>\n", conn);
> -               return;
> -       }
> -       if (test_and_set_bit(LOGIN_FLAGS_READ_ACTIVE, &conn->login_flags)) {
> -               write_unlock_bh(&sk->sk_callback_lock);
> -               pr_debug("Got LOGIN_FLAGS_READ_ACTIVE=1, conn: %p >>>>\n", conn);
> -               return;
> -       }
> -
>         rc = schedule_delayed_work(&conn->login_work, 0);
>         if (!rc) {
>                 pr_debug("iscsi_target_sk_data_ready, schedule_delayed_work"
>                          " got false\n");
>         }
> -       write_unlock_bh(&sk->sk_callback_lock);
>  }
> 
>  static void iscsi_target_sk_state_change(struct sock *);
> @@ -551,6 +528,35 @@ static void iscsi_target_do_login_rx(str
> 
>         if (conn->sock) {
>                 struct sock *sk = conn->sock->sk;
> +               write_lock_bh(&sk->sk_callback_lock);
> +               if (!sk->sk_user_data) {
> +                       write_unlock_bh(&sk->sk_callback_lock);
> +                       return;
> +               }
> +               if (!test_bit(LOGIN_FLAGS_READY, &conn->login_flags)) {
> +                       write_unlock_bh(&sk->sk_callback_lock);
> +                       pr_debug("Got LOGIN_FLAGS_READY=0, conn: %p >>>>\n", conn);
> +                       /* retry a second time in case we are called too soon before LOGIN_FLAGS_READY is set */
> +                       msleep(100);
> +                       write_lock_bh(&sk->sk_callback_lock);
> +                       if (!test_bit(LOGIN_FLAGS_READY, &conn->login_flags)) {
> +                               write_unlock_bh(&sk->sk_callback_lock);
> +                               pr_debug("Second retry. Got LOGIN_FLAGS_READY=0, conn: %p >>>>\n", conn);
> +                               return;
> +                       }
> +               }
> +               if (test_bit(LOGIN_FLAGS_CLOSED, &conn->login_flags)) {
> +                       write_unlock_bh(&sk->sk_callback_lock);
> +                       pr_debug("Got LOGIN_FLAGS_CLOSED=1, conn: %p >>>>\n", conn);
> +                       return;
> +               }
> +               if (test_and_set_bit(LOGIN_FLAGS_READ_ACTIVE, &conn->login_flags)) {
> +                       write_unlock_bh(&sk->sk_callback_lock);
> +                       pr_debug("Got LOGIN_FLAGS_READ_ACTIVE=1, conn: %p >>>>\n", conn);
> +                       return;
> +               }
> +
> +               write_unlock_bh(&sk->sk_callback_lock);
> 
>                 read_lock_bh(&sk->sk_callback_lock);
>                 state = iscsi_target_sk_state_check(sk);
> 
> 

So pondering this further, an earlier initial LOGIN_FLAGS_READY bit
assignment might work as a simpler patch:

diff --git a/drivers/target/iscsi/iscsi_target_nego.c b/drivers/target/iscsi/iscsi_target_nego.c
index 9fc9117..8720daf 100644
--- a/drivers/target/iscsi/iscsi_target_nego.c
+++ b/drivers/target/iscsi/iscsi_target_nego.c
@@ -1246,16 +1246,16 @@ int iscsi_target_start_negotiation(
 {
        int ret;
 
-       ret = iscsi_target_do_login(conn, login);
-       if (!ret) {
-               if (conn->sock) {
-                       struct sock *sk = conn->sock->sk;
+       if (conn->sock) {
+               struct sock *sk = conn->sock->sk;
 
-                       write_lock_bh(&sk->sk_callback_lock);
-                       set_bit(LOGIN_FLAGS_READY, &conn->login_flags);
-                       write_unlock_bh(&sk->sk_callback_lock);
-               }
-       } else if (ret < 0) {
+               write_lock_bh(&sk->sk_callback_lock);
+               set_bit(LOGIN_FLAGS_READY, &conn->login_flags);
+               write_unlock_bh(&sk->sk_callback_lock);
+       }
+
+       ret = iscsi_target_do_login(conn, login);
+       if (ret < 0) {
                cancel_delayed_work_sync(&conn->login_work);
                cancel_delayed_work_sync(&conn->login_cleanup_work);
                iscsi_target_restore_sock_callbacks(conn);


Note each iscsi_target_do_login_rx() work_struct callback is still
clearing LOGIN_FLAGS_READ_ACTIVE after iscsi_target_do_login() is
invoked each time to handle subsequent login request PDUs.

So it's unclear if you'll hit a similar problem with clear_bit for
LOGIN_FLAGS_READ_ACTIVE in subsequent login PDUs, or if the above to
kick-off ->sk_data_ready() work_struct processing after initial login
request PDU is received does the trick.

Either way, please give the above a shot on your setup, and let's see
where we end up.

Thanks again,

--nab

--
To unsubscribe from this list: send the line "unsubscribe target-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html