FUJITA Tomonori wrote:
Here is a lengthy log covering 25 minutes, with several device
suspends/resumes.
I'll give it 2 more hours of testing...
Not 2 hours, but some observations: is it possible that tgtd serves
wrong data to the initiator when access to the media is slow?
tgtd doesn't return wrong (bogus) data.
The initiator has to give up on I/O requests at some point if the target
doesn't send responses for a long time. Then you see the I/O errors.
I have the timeout set to several days, so it doesn't give up within
a few minutes. I can shut the target down for days, and the initiator
will recover nicely when the target is back.
node.session.timeo.replacement_timeout = 1000000
And here, we had no network disconnections/malfunctions.
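The replacement_timeout value is given in seconds, so 1000000 works out to roughly 11.6 days. A small Python sketch of that arithmetic, parsing the line as it would appear in an iscsid.conf-style file; the helper names parse_setting and timeout_in_days are hypothetical, for illustration only.

```python
# Sketch: parse an iscsid.conf-style "key = value" line and express the
# timeout in days. Helper names are hypothetical, not part of open-iscsi.
SECONDS_PER_DAY = 24 * 60 * 60


def parse_setting(line: str) -> tuple[str, str]:
    """Split a 'key = value' line into (key, value), stripping whitespace."""
    key, sep, value = line.partition("=")
    if not sep:
        raise ValueError(f"not a 'key = value' line: {line!r}")
    return key.strip(), value.strip()


def timeout_in_days(line: str) -> float:
    """Return the timeout from a config line, converted from seconds to days."""
    _key, value = parse_setting(line)
    return int(value) / SECONDS_PER_DAY


line = "node.session.timeo.replacement_timeout = 1000000"
print(parse_setting(line))             # ('node.session.timeo.replacement_timeout', '1000000')
print(round(timeout_in_days(line), 2))  # 11.57
```

As we understand the open-iscsi documentation, this setting only bounds how long the initiator waits for a lost session to be re-established; a target that stays connected but answers slowly is instead subject to the SCSI layer's per-command timeout (/sys/block/<dev>/device/timeout).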
Or on the target side, the kernel gives tgtd I/O errors if the backing
store doesn't return responses for a long time. Then tgtd sends the
errors to the initiator.
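The target-side path described here (the kernel fails a read with EIO, and the target forwards an error instead of inventing data) can be sketched in Python. This is a simplified model, not tgtd's code (tgtd is written in C): ScsiResult, serve_read and failing_reader are hypothetical names, and reporting the failure as CHECK CONDITION with sense key MEDIUM ERROR is an assumption chosen for illustration. On a real system, read_backing could wrap os.pread(fd, length, offset).

```python
import errno
from dataclasses import dataclass
from typing import Callable

# SCSI status byte and sense-key values (SCSI standard codes).
GOOD = 0x00
CHECK_CONDITION = 0x02
MEDIUM_ERROR = 0x03  # sense key; using it for EIO is an assumption here


@dataclass
class ScsiResult:
    """Hypothetical response a target sends back for one READ command."""
    status: int
    sense_key: int = 0
    data: bytes = b""


def serve_read(read_backing: Callable[[int, int], bytes],
               offset: int, length: int) -> ScsiResult:
    """Read from the backing store; forward EIO as an error, never fake data."""
    try:
        data = read_backing(offset, length)
    except OSError as exc:
        if exc.errno == errno.EIO:
            # The kernel reported an I/O error: pass the failure on to the
            # initiator instead of returning bytes we do not have.
            return ScsiResult(status=CHECK_CONDITION, sense_key=MEDIUM_ERROR)
        raise  # other errors are not modelled in this sketch
    return ScsiResult(status=GOOD, data=data)


def failing_reader(offset: int, length: int) -> bytes:
    """Stand-in for a backing store whose reads fail with EIO."""
    raise OSError(errno.EIO, "Input/output error")


print(serve_read(lambda off, n: b"\x00" * n, 0, 4))
print(serve_read(failing_reader, 0, 4))
```

The point of the model is the branch in serve_read: a failed read produces an error status with no payload, so the initiator sees an I/O error rather than wrong data.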
No, there were no I/O errors on the target machine.
Any I/O error would be reported in dmesg; I didn't see any such errors.
(...)
And you see something like this in the kernel log:
sd 1:0:0:0: Device offlined - not ready after error recovery
sd 1:0:0:0: [sdd] Unhandled error code
sd 1:0:0:0: [sdd] Result: hostbyte=DID_OK driverbyte=DRIVER_TIMEOUT,SUGGEST_OK
end_request: I/O error, dev sdd, sector 0
Buffer I/O error on device sdd, logical block 0
Buffer I/O error on device sdd, logical block 1
Buffer I/O error on device sdd, logical block 2
Buffer I/O error on device sdd, logical block 3
sd 1:0:0:0: rejecting I/O to offline device
Buffer I/O error on device sdd, logical block 0
You can replace dd with tgtd in the above example.
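Checking dmesg for this kind of failure is easy to script. A minimal Python sketch that counts the error messages shown in the log above per device and lists offlined SCSI addresses; the regular expressions only cover the message formats quoted in this thread (other kernel versions may word them differently), and summarize_io_errors is a hypothetical name.

```python
import re
from collections import Counter

# Patterns based on the kernel messages quoted above; not exhaustive.
BUFFER_IO_ERROR = re.compile(r"Buffer I/O error on device (\w+)")
END_REQUEST_ERROR = re.compile(r"end_request: I/O error, dev (\w+)")
OFFLINED = re.compile(r"^sd (\S+): Device offlined")

SAMPLE_LOG = """\
sd 1:0:0:0: Device offlined - not ready after error recovery
sd 1:0:0:0: [sdd] Unhandled error code
sd 1:0:0:0: [sdd] Result: hostbyte=DID_OK driverbyte=DRIVER_TIMEOUT,SUGGEST_OK
end_request: I/O error, dev sdd, sector 0
Buffer I/O error on device sdd, logical block 0
Buffer I/O error on device sdd, logical block 1
Buffer I/O error on device sdd, logical block 2
Buffer I/O error on device sdd, logical block 3
sd 1:0:0:0: rejecting I/O to offline device
Buffer I/O error on device sdd, logical block 0
"""


def summarize_io_errors(log_text: str) -> dict:
    """Count I/O error messages per device and list offlined SCSI addresses."""
    errors = Counter()
    offlined = []
    for line in log_text.splitlines():
        for pattern in (BUFFER_IO_ERROR, END_REQUEST_ERROR):
            match = pattern.search(line)
            if match:
                errors[match.group(1)] += 1
        match = OFFLINED.search(line)
        if match:
            offlined.append(match.group(1))
    return {"io_errors": dict(errors), "offlined": offlined}


print(summarize_io_errors(SAMPLE_LOG))
```

Run against the initiator's log this reports errors on sdd; run against a clean target log it returns empty results, which matches the situation described below. The input could be a saved copy of dmesg output (reading the live buffer may require root).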
Exactly - you had errors in the kernel log. I didn't have anything like
that on the machine running tgtd.
The initiator machine did have these errors, though.
--
Tomasz Chmielewski
http://wpkg.org
--
To unsubscribe from this list: send the line "unsubscribe stgt" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html