Re: [Open-FCoE] System crashes with increased drive count

Is there a design limit on the number of target drives that we should
not exceed? Is 10 a reasonable number? In our testing, we noticed that
fewer targets produced fewer problems.

Are there any additional tests that we can run to narrow down the
problem? For example, we could try different I/O types: random vs.
sequential, read vs. write. Would that help?

Nab,
We cannot change the connection between the servers. They are bare
metal cloud servers that we don't have direct access to.

Thanks,

Jun



On Wed, Jun 4, 2014 at 3:01 PM, Vasu Dev <vasu.dev@xxxxxxxxxxxxxxx> wrote:
> On Wed, 2014-06-04 at 15:21 -0700, Nicholas A. Bellinger wrote:
>> On Wed, 2014-06-04 at 11:45 -0700, Jun Wu wrote:
>> > The test setup includes one host and one target. The target exposes 10
>> > hard drives (or 10 LUNs) on one fcoe port. The single initiator runs
>> > 10 fio processes simultaneously to the 10 target drives through fcoe
>> > vn2vn. This is a simple configuration that other people may also want
>> > to try.
>> >
>> > >Exchange 0x6e4 is aborted and then the target is still sending frames.
>> > >While the latter should not occur, first setting up the abort with a
>> > >0 msec timeout does not look correct either, and it is different from
>> > >the 8000 ms on the initiator side.
>> >
>> > Should the target stop sending frames after the abort? I still see a
>> > lot of 0 msec messages on the target side. Is this something that
>> > should be addressed?
>> >
>> > >Reducing retries could narrow down whether early aborts are the cause
>> > >here. Can you try with REC disabled on the initiator side, using this
>> > >change?
>> >
>> > By disabling REC, have you confirmed that the early aborts are the
>> > cause? Is the abort caused by the 0 msec timeout?
>> >
>>
>> The 0 msec timeout still looks really suspicious..
>>
>> IIRC, these timeout values are exchanged in the FLOGI request packet,
>> and/or in a separate Request Timeout Value (RTV) packet..
>>
>> It might be worthwhile to track down where these zero-length settings
>> are coming from, as it might be an indication of what's wrong.
>>
>> How about the following patch to dump these values..?
>>
>> Also just curious, have you tried running these two hosts in
>> point-to-point mode without the switch to see if the same types of
>> issues occur..? It might be useful to help isolate the problem space a
>> bit.
>>
>> Vasu, any other ideas here..?
>>
>
> Your patch is good for debugging the 0 msec value. However, this may not
> be the issue, since these messages come from incoming abort processing.
> By then the IO is already aborted, which would cause seq_send failures,
> as I explained in my other response.
>
> Nab, should tcm_fc take some action toward the target core on seq_send
> failures, which could help slow down the host request rate above the
> fcoe transport?
>
> Thanks,
> Vasu
>> --nab
>>
>> diff --git a/drivers/scsi/libfc/fc_lport.c b/drivers/scsi/libfc/fc_lport.c
>> index e01a298..72b8676 100644
>> --- a/drivers/scsi/libfc/fc_lport.c
>> +++ b/drivers/scsi/libfc/fc_lport.c
>> @@ -379,6 +379,7 @@ static void fc_lport_flogi_fill(struct fc_lport *lport,
>>               sp->sp_tot_seq = htons(255);    /* seq. we accept */
>>               sp->sp_rel_off = htons(0x1f);
>>               sp->sp_e_d_tov = htonl(lport->e_d_tov);
>> +             printk("fc_lport_flogi_fill sp->sp_e_d_tov: %u\n", ntohl(sp->sp_e_d_tov));
>>
>>               cp->cp_rdfs = htons((u16) lport->mfs);
>>               cp->cp_con_seq = htons(255);
>> @@ -1766,7 +1767,9 @@ void fc_lport_flogi_resp(struct fc_seq *sp, struct fc_frame *fp,
>>
>>       csp_flags = ntohs(flp->fl_csp.sp_features);
>>       r_a_tov = ntohl(flp->fl_csp.sp_r_a_tov);
>> +     printk("fc_lport_flogi_resp: r_a_tov: %u\n", r_a_tov);
>>       e_d_tov = ntohl(flp->fl_csp.sp_e_d_tov);
>> +     printk("fc_lport_flogi_resp: e_d_tov %u\n", e_d_tov);
>>       if (csp_flags & FC_SP_FT_EDTR)
>>               e_d_tov /= 1000000;
>>
>> @@ -1795,6 +1798,9 @@ void fc_lport_flogi_resp(struct fc_seq *sp, struct fc_frame *fp,
>>               fc_lport_enter_dns(lport);
>>       }
>>
>> +     printk("fc_lport_flogi_resp: lport->e_d_tov: %u\n", lport->e_d_tov);
>> +     printk("fc_lport_flogi_resp: lport->r_a_tov: %u\n", lport->r_a_tov);
>> +
>>  out:
>>       fc_frame_free(fp);
>>  err:
>> diff --git a/drivers/scsi/libfc/fc_rport.c b/drivers/scsi/libfc/fc_rport.c
>> index 589ff9a..4cdb055 100644
>> --- a/drivers/scsi/libfc/fc_rport.c
>> +++ b/drivers/scsi/libfc/fc_rport.c
>> @@ -142,7 +142,9 @@ static struct fc_rport_priv *fc_rport_create(struct fc_lport *lport,
>>       rdata->event = RPORT_EV_NONE;
>>       rdata->flags = FC_RP_FLAGS_REC_SUPPORTED;
>>       rdata->e_d_tov = lport->e_d_tov;
>> +     printk("fc_rport_create: rdata->e_d_tov: %u\n", rdata->e_d_tov);
>>       rdata->r_a_tov = lport->r_a_tov;
>> +     printk("fc_rport_create: rdata->r_a_tov: %u\n", rdata->r_a_tov);
>>       rdata->maxframe_size = FC_MIN_MAX_PAYLOAD;
>>       INIT_DELAYED_WORK(&rdata->retry_work, fc_rport_timeout);
>>       INIT_WORK(&rdata->event_work, fc_rport_work);
>> @@ -286,7 +288,9 @@ static void fc_rport_work(struct work_struct *work)
>>               rpriv->rp_state = rdata->rp_state;
>>               rpriv->flags = rdata->flags;
>>               rpriv->e_d_tov = rdata->e_d_tov;
>> +             printk("fc_rport_work: rpriv->e_d_tov: %u\n", rpriv->e_d_tov);
>>               rpriv->r_a_tov = rdata->r_a_tov;
>> +             printk("fc_rport_work: rpriv->r_a_tov: %u\n", rpriv->r_a_tov);
>>               mutex_unlock(&rdata->rp_mutex);
>>
>>               if (rport_ops && rport_ops->event_callback) {
>> @@ -638,10 +642,14 @@ static int fc_rport_login_complete(struct fc_rport_priv *rdata,
>>                * E_D_TOV is not valid on an incoming FLOGI request.
>>                */
>>               e_d_tov = ntohl(flogi->fl_csp.sp_e_d_tov);
>> +             printk("fc_rport_login_complete e_d_tov: %u\n", e_d_tov);
>>               if (csp_flags & FC_SP_FT_EDTR)
>>                       e_d_tov /= 1000000;
>> -             if (e_d_tov > rdata->e_d_tov)
>> +             if (e_d_tov > rdata->e_d_tov) {
>> +                     printk("fc_rport_login_complete rdata->e_d_tov %u\n",
>> +                             rdata->e_d_tov);
>>                       rdata->e_d_tov = e_d_tov;
>> +             }
>>       }
>>       rdata->maxframe_size = fc_plogi_get_maxframe(flogi, lport->mfs);
>>       return 0;
>> @@ -690,8 +698,11 @@ static void fc_rport_flogi_resp(struct fc_seq *sp, struct fc_frame *fp,
>>       if (!flogi)
>>               goto bad;
>>       r_a_tov = ntohl(flogi->fl_csp.sp_r_a_tov);
>> -     if (r_a_tov > rdata->r_a_tov)
>> +     printk("fc_rport_flogi_resp r_a_tov: %u\n", r_a_tov);
>> +     if (r_a_tov > rdata->r_a_tov) {
>> +             printk("fc_rport_flogi_resp rdata->r_a_tov: %u\n", rdata->r_a_tov);
>>               rdata->r_a_tov = r_a_tov;
>> +     }
>>
>>       if (rdata->ids.port_name < lport->wwpn)
>>               fc_rport_enter_plogi(rdata);
>> @@ -971,6 +982,7 @@ static void fc_rport_enter_plogi(struct fc_rport_priv *rdata)
>>               return;
>>       }
>>       rdata->e_d_tov = lport->e_d_tov;
>> +     printk("fc_rport_enter_plogi: rdata->e_d_tov: %u\n", rdata->e_d_tov);
>>
>>       if (!lport->tt.elsct_send(lport, rdata->ids.port_id, fp, ELS_PLOGI,
>>                                 fc_rport_plogi_resp, rdata,
>> @@ -1183,12 +1195,16 @@ static void fc_rport_rtv_resp(struct fc_seq *sp, struct fc_frame *fp,
>>                       if (tov == 0)
>>                               tov = 1;
>>                       rdata->r_a_tov = tov;
>> +                     printk("fc_rport_rtv_resp rdata->r_a_tov: %u\n",
>> +                             rdata->r_a_tov);
>>                       tov = ntohl(rtv->rtv_e_d_tov);
>>                       if (toq & FC_ELS_RTV_EDRES)
>>                               tov /= 1000000;
>>                       if (tov == 0)
>>                               tov = 1;
>>                       rdata->e_d_tov = tov;
>> +                     printk("fc_rport_rtv_resp rdata->e_d_tov: %u\n",
>> +                             rdata->e_d_tov);
>>               }
>>       }
>>
>>
>
>
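For reference, the timeout conversion that the debug hunks above
instrument can be sketched in user space. This is only a minimal sketch,
not the libfc code: the function names are hypothetical and the value of
SKETCH_SP_FT_EDTR is an assumption made for illustration. It shows that
when the resolution flag marks E_D_TOV as nanoseconds, the integer
division by 1000000 truncates anything below 1 ms to 0, and that a clamp
like the one visible in the fc_rport_rtv_resp() context avoids storing
that 0. Whether this is related to the 0 msec messages in this thread is
not established.

```c
#include <assert.h>
#include <stdint.h>
#include <arpa/inet.h>	/* htonl(), ntohl() */

/* Assumed flag value, for illustration only (hypothetical name). */
#define SKETCH_SP_FT_EDTR 0x0400

/*
 * Convert a wire-format (big-endian) E_D_TOV to milliseconds, mirroring
 * the "ntohl() then divide if the resolution flag is set" pattern in the
 * patch context. The division truncates toward zero.
 */
uint32_t sketch_tov_ms(uint32_t wire_tov, uint16_t csp_flags)
{
	uint32_t tov = ntohl(wire_tov);

	if (csp_flags & SKETCH_SP_FT_EDTR)
		tov /= 1000000;	/* nanoseconds -> milliseconds */
	return tov;
}

/* Same conversion, but never returns 0 (clamped to 1 ms). */
uint32_t sketch_tov_ms_clamped(uint32_t wire_tov, uint16_t csp_flags)
{
	uint32_t tov = sketch_tov_ms(wire_tov, csp_flags);

	return tov ? tov : 1;
}

/* Small self-check covering the cases discussed above. */
void sketch_self_check(void)
{
	/* 2 s expressed in ns becomes 2000 ms. */
	assert(sketch_tov_ms(htonl(2000000000u), SKETCH_SP_FT_EDTR) == 2000);
	/* 0.5 ms expressed in ns truncates to 0 ms without a clamp. */
	assert(sketch_tov_ms(htonl(500000u), SKETCH_SP_FT_EDTR) == 0);
	/* The clamped variant reports 1 ms instead. */
	assert(sketch_tov_ms_clamped(htonl(500000u), SKETCH_SP_FT_EDTR) == 1);
	/* Without the flag the value is already in ms and is left alone. */
	assert(sketch_tov_ms(htonl(2000u), 0) == 2000);
}
```

Calling sketch_self_check() from a small main() is enough to exercise
it. Comparing the raw and converted values printed by the patch against
this arithmetic may help tell a truncated value apart from one that was
never set.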
> --
> To unsubscribe from this list: send the line "unsubscribe target-devel" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at  http://vger.kernel.org/majordomo-info.html