On Tue, 2014-01-28 at 13:01 +0530, Arshad Hussain wrote: > Hi Nicholas, > > On 1/24/2014 1:25 AM, Nicholas A. Bellinger wrote: > > <SNIP> > >>> Looking at the code in iscsi_target_erl1.c:iscsit_handle_r2t_snack(), it > >>> is checking the validity of received BegRun + RunLength values, in that > >>> they don't exceed what R2TSN has already been put on the wire: > >>> > >>> if (runlength) { > >>> if ((begrun + runlength) > cmd->r2t_sn) { > >>> pr_err("Command ITT: 0x%08x received R2T SNACK" > >>> " with BegRun: 0x%08x, RunLength: 0x%08x, exceeds" > >>> " current R2TSN: 0x%08x, protocol error.\n", > >>> cmd->init_task_tag, begrun, runlength, cmd->r2t_sn); > >>> return iscsit_reject_cmd(cmd, > >>> ISCSI_REASON_BOOKMARK_INVALID, buf); > >>> } > >>> last_r2tsn = (begrun + runlength); > >>> } else > >>> last_r2tsn = cmd->r2t_sn; > >>> > >>> Can you confirm your bug is with R2T SNACKs, and not a different type of > >>> SNACK..? > >> Yes I confirm it is with R2T SNACK. My two cents. Without htonl() > >> conversion the > >> value 0x1 (intel) is getting converted into large value 0x10000000 on > >> assignment. > >> Which in turn is getting assigned to last_r2tsn. And further down your > >> code... > >> > >> while (begrun < last_r2tsn) { > >> > >> ... // it may be looping in uncontrolled manner and failing. > >> begrun++ > >> } > >> > >> > > So the above check in iscsit_handle_r2t_snack(): > > > > if (runlength) { > > if ((begrun + runlength) > cmd->r2t_sn) { > > <SEND REJECT> > > } > > } > > > > is what's supposed to be protecting against the reception of a bogus > > runlength exceeding the last R2TSN sent out by the command. > > > > How about testing with the following quick debug patch to see why > > runlength=0x10000000 is not hitting the above sanity check..? > > > > Thanks, > > > > --nab > > <SNIP> > I tried out you patch. I ran 3 cases with it. The case with runlength>0 > and without > htonl() conversion still crashes the machine. > > 1) Case 1 : begrun=0,runlength=0. It works. > Jan 28 01:35:10 wfs kernel: [ 136.416906] Debug #1 begrun: 0 runlength: > 0 cmd->r2t_sn: 1 > Jan 28 01:35:10 wfs kernel: [ 136.416907] Debug #2 last_r2tsn: 1 > > 2) Case 2: begrun=0 and (RunLength >0 ) However, it is converted via > htonl(1). It works. > Jan 28 01:49:02 wfs kernel: [ 968.100236] Debug #1 begrun: 0 runlength: > 1 cmd->r2t_sn: 1 > Jan 28 01:49:02 wfs kernel: [ 968.100237] Debug #2 last_r2tsn: 1 > > 3) Case 3: begrun=0 and Runlength=1. No conversion done with htonl. It > dump_stack(). > Attached it the screen shot of the dump_stack(). It looks like is > hitting the new case (last_r2tsn>0x0000FFFF) > you have put in. > So without full dmesg output it's kinda hard to tell. ;) In any event, I'm pretty sure it's hitting the same original sanity check in iscsit_handle_r2t_snack() mentioned above for bogus RunLengths, and it's the generation of the REJECT that is triggering the bug which is likely a >= v3.10 regression. I'd really like to get this resolved, and the quickest way to do that would be to send a copy of your test-case code (offlist). Otherwise, I'll have to put together something similar. Thanks, --nab -- To unsubscribe from this list: send the line "unsubscribe target-devel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html