Dr. Volker Jaenisch wrote:
every combination that I've tried when there are multiple
simultaneous readers
Reproduced that. On a single core, more than one simultaneous thread
accessing the LUN over iSER also gives read errors.
OK, thanks a lot for doing all this testing / bug hunting work.
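For anyone else who wants to reproduce the multiple-readers case, here is a minimal sketch of such a check. It is only an illustration, not the test that was actually run: the function names are made up, and it reads through the page cache, so against a real LUN you would probably want direct I/O or a region that is not cached.

```python
import hashlib
import threading


def read_digest(path, offset, length, chunk=1 << 20):
    """Read `length` bytes starting at `offset` and return their SHA-1 hex digest."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        f.seek(offset)
        remaining = length
        while remaining > 0:
            buf = f.read(min(chunk, remaining))
            if not buf:  # hit end of file/device early
                break
            h.update(buf)
            remaining -= len(buf)
    return h.hexdigest()


def concurrent_read_check(path, offset, length, readers=4):
    """Compare `readers` simultaneous reads of one region against a single read.

    Returns (reference_digest, indexes_of_readers_that_differed).
    """
    reference = read_digest(path, offset, length)  # single-reader baseline
    results = [None] * readers

    def worker(i):
        results[i] = read_digest(path, offset, length)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(readers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    mismatches = [i for i, digest in enumerate(results) if digest != reference]
    return reference, mismatches
```

A non-empty mismatch list for a device nobody is writing to would correspond to the read errors described above.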
I read the Feb 2008 "iser multiple readers" thread and wasn't sure what
the conclusion was. On one hand, Robin reported that the patch that slows
down tgt so that it does not send the SCSI response before the RDMA write
has completed eliminated the error. On the other hand, Pete, who was
analyzing the errors, said at
http://lists.berlios.de/pipermail/stgt-devel/2008-February/001379.html
"The offsets are always positive, which fits in with the theory that
future RDMAs are overwriting earlier ones. This goes against the
theory in your (my) patch, which guesses that the SCSI response message
is sneaking ahead of RDMA operations."
That is where the discussion of a possible relation between this error
and FMRs started: Pete suggested disabling FMRs to see if the problem
persists. I wasn't sure if you did that.
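To illustrate what "the offsets are always positive" means, here is a small sketch of one way to measure such offsets. It is not Pete's actual tooling; the sector size, the stamp layout and the function names are all assumptions made for the example. Each sector of a test pattern is stamped with its own sector number, and after reading it back the stamp tells where misplaced data really belongs.

```python
import struct

SECTOR = 512  # assumed sector size for the illustration


def make_pattern(num_sectors):
    """Build a buffer in which every sector starts with its own sector number."""
    out = bytearray()
    for lba in range(num_sectors):
        # 8-byte little-endian stamp followed by zero padding
        out += struct.pack("<Q", lba) + bytes(SECTOR - 8)
    return bytes(out)


def misplaced_sectors(data):
    """Return (position, found_stamp, delta) for every sector with a wrong stamp.

    delta = found_stamp - position; a positive delta means the data found at
    this position belongs to a later offset of the pattern.
    """
    bad = []
    for pos in range(len(data) // SECTOR):
        (found,) = struct.unpack_from("<Q", data, pos * SECTOR)
        if found != pos:
            bad.append((pos, found, found - pos))
    return bad
```

With a pattern like this written to the LUN beforehand, deltas that are always positive in the read-back data would fit the "later RDMAs overwrite earlier ones" theory.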
My guess is that the AMD hyper-transport may interfere with the FMR.
But I am no Linux memory management specialist, so please correct me
if I am wrong. Maybe the following happens: booted with one CPU, all
FMR requests go to the 16GB RAM this single CPU directly addresses
via its memory controller. With more than one active CPU, the
memory is fetched from both CPUs' memory controllers, with preference
for local memory. In seldom cases the memory manager fetches memory for
the FMR process running on CPU0 from the CPU1 via the hyper-transport
channel and something weird happens.
To make sure we are on the same page (...) here: FMR (Fast Memory
Registration) is a means to register with the HCA a (say) arbitrary list
of pages to be used for an I/O. This page SG (scatter-gather) list is
allocated and provided by the SCSI midlayer to the iSER SCSI LLD
(low-level driver) through the queuecommand interface. So I read your
comment as saying that when using one CPU and/or a system with one
memory controller, all I/Os are served with pages from the "same memory",
whereas when this is not the case, something gets broken.
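As a toy model of the above (plain Python, not the verbs or iSER API): the class name, the 4 KB page size and the node_of callback are all hypothetical and exist only for illustration. It shows a list of scattered pages presented as one contiguous I/O range, plus a helper that reports which memory nodes those pages come from.

```python
PAGE_SIZE = 4096  # assumed page size for the illustration


class ToyFmrMapping:
    """Toy model: a list of scattered pages seen as one contiguous I/O range."""

    def __init__(self, page_addrs, first_byte_offset=0):
        self.page_addrs = list(page_addrs)          # page-aligned physical addresses
        self.first_byte_offset = first_byte_offset  # where the I/O starts in page 0

    def translate(self, io_offset):
        """Map an offset within the I/O to (page address, offset inside that page)."""
        if io_offset < 0:
            raise IndexError("negative I/O offset")
        index, in_page = divmod(self.first_byte_offset + io_offset, PAGE_SIZE)
        return self.page_addrs[index], in_page  # IndexError if past the last page


def nodes_touched(page_addrs, node_of):
    """Return the set of memory nodes the pages come from.

    node_of is a hypothetical callback mapping a physical address to a node id.
    """
    return {node_of(addr) for addr in page_addrs}
```

In this picture, the one-CPU case would correspond to nodes_touched() always returning a single node, while with both CPUs active a single I/O could span two.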
I am not sure I follow the sentence "In seldom cases the memory
manager fetches memory for the FMR process running on CPU0 from the CPU1
via the hyper-transport channel and something weird happens". Can you
explain a bit what you were referring to?
Or.