Re: Data Digest Errors - Good News

jrepac@xxxxxxxxx · Sat, 21 Apr 2012 11:46:51 -0700 (PDT)

The original build I used to trace the digest issue was from December. I traced failing transactions on the wire through the CRC32C calculation and tried to dump the data during the CRC calculation to find where the corruption was occurring. The VM crashed every time the dump ran, and with other work backing up, I set the problem aside for a while.
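For anyone repeating the trace, one quick way to rule out the CRC32C calculation itself is to check it against the standard CRC-32C known-answer value before looking elsewhere in the data path. Below is a minimal bitwise sketch in Python. It is not the kernel's implementation, and the function name crc32c is hypothetical. It assumes the usual CRC-32C parameters: reflected polynomial 0x82F63B78, and an initial value and final XOR of 0xFFFFFFFF. With those parameters, the ASCII string "123456789" should give 0xE3069283.

```python
def crc32c(data: bytes, crc: int = 0) -> int:
    """Bitwise CRC-32C (Castagnoli) sketch, reflected polynomial 0x82F63B78.

    Passing a previous result as `crc` continues the checksum, so a buffer
    checksummed in pieces yields the same value as the whole buffer.
    """
    crc ^= 0xFFFFFFFF  # undo the final XOR (or apply the initial value 0xFFFFFFFF)
    for byte in data:
        crc ^= byte
        for _ in range(8):
            # Shift right one bit; fold in the polynomial if the LSB was set.
            if crc & 1:
                crc = (crc >> 1) ^ 0x82F63B78
            else:
                crc >>= 1
    return crc ^ 0xFFFFFFFF  # final XOR


if __name__ == "__main__":
    print(hex(crc32c(b"123456789")))  # prints 0xe3069283
```

The chaining behaviour is the useful part when hunting corruption. If the digest computed over split buffers disagrees with the digest over the same bytes in one piece, the problem lies in how the data is gathered, not in the CRC arithmetic.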

It seemed that a pipeline-full condition might be related to the failure. There was a delay between the start of the format and the first failure, and after that all writes would fail. Restarting the format without touching the target got the writes working again... at least for a few seconds. The test was run on a pure 10 GbE network.

There have been many changes in the target between when I reported the failure and now. My most recent test on kernel 3.4-rc3 was done on a clone of the original VM on a different ESX box. The last failing test was performed on 3.2-rc4+ from your repository. It is also possible that the original ESX box has a memory management problem at higher data rates. I need more run time on 3.4-rc3 or 3.4 stable before porting it to the rest of my test automation network; that will give me more data to confirm whether the problem is really gone.

-joe  

----- Original Message -----
From: Nicholas A. Bellinger <nab@xxxxxxxxxxxxxxx>
To: jrepac@xxxxxxxxx
Cc: "target-devel@xxxxxxxxxxxxxxx" <target-devel@xxxxxxxxxxxxxxx>
Sent: Saturday, April 21, 2012 1:59 AM
Subject: Re: Data Digest Errors - Good News

On Fri, 2012-04-20 at 08:52 -0700, jrepac@xxxxxxxxx wrote:
> Hi Nicholas,
> I pulled kernel 3.4-rc3 yesterday to see if the digest errors with
> Windows format were still there.  They are gone.  I think something
> got fixed in the data path since there did not appear to be a problem
> with the CRC32C logic.  The original problem occurred about 2-3
> seconds into the sequence of writes for the format and after the first
> failing write, all subsequent writes would fail.  Writes would work
> again if the format was restarted until the first digest error.
> 

Mmmm, very strange.. I can't think of any change currently in mainline
v3.4-rc3 that would be affecting iscsi_target_mod in this fashion..?

Could it be possible that the culprit is something outside of
drivers/target/ code..?  Are you still able to reproduce with the same
hardware and current lio-core.git HEAD @ 3.4-rc2..?

> Is fixing SendTargets still on your list?
> 

I'll be AFK most of next week, but promise to get back to this soon..

--nab