On Thu, 12 Feb 2009 11:29:45 +0100
"Dr. Volker Jaenisch" <volker.jaenisch@xxxxxxxxx> wrote:

> Dear Mr. Tomonori!
>
> We got read errors using iser (over infiniband) transport with stgtd (0.9.3).
> I discussed this on the open-iscsi mailing list firstly.
>
> After review of our tests I found that restarting stgt
> cures the read-errors for the next access to the target.
>
> Here is what we have done:
>
> On Initiator writing:
> ares:~# lmdd if=internal of=/dev/sdc opat=1 bs=1M count=1000 mismatch=1
> 1000.0000 MB in 6.3606 secs, 157.2190 MB/sec
>
> Check on Target is fine:
> athene:~# lmdd of=internal if=/dev/vg0/test ipat=1 bs=1M count=1000
> mismatch=1
> 1000.0000 MB in 0.8849 secs, 1130.0176 MB/sec
>
> On initiator reading:
> ares:~# lmdd of=internal if=/dev/sdc ipat=1 bs=1M count=1000 mismatch=10
> off=1000000 want=1a0000 got=1b3000
> off=1000000 want=1a0004 got=1b3004
> off=1000000 want=1a0008 got=1b3008
> off=1000000 want=1a000c got=1b300c
> off=1000000 want=1a0010 got=1b3010
> off=1000000 want=1a0014 got=1b3014
> off=1000000 want=1a0018 got=1b3018
> off=1000000 want=1a001c got=1b301c
> off=1000000 want=1a0020 got=1b3020
> off=1000000 want=1a0024 got=1b3024
> 1.0000 MB in 0.0064 secs, 157.2822 MB/sec
>
> But if I restart the TGT-Daemon on the target side: Everything is ok.
> ares:~# lmdd of=internal if=/dev/sdc ipat=1 bs=1M count=1000 mismatch=10
> 1000.0000 MB in 22.2695 secs, 44.9045 MB/sec
> But only for the first run of lmdd! Then the error strikes reproducibly
> every time.
>
> ares:~# lmdd of=internal if=/dev/sdc ipat=1 bs=1M count=1000 mismatch=10
> off=0 want=8ae00 got=a9e00
> off=0 want=8ae04 got=a9e04
> off=0 want=8ae08 got=a9e08
> off=0 want=8ae0c got=a9e0c
> off=0 want=8ae10 got=a9e10
> off=0 want=8ae14 got=a9e14
> off=0 want=8ae18 got=a9e18
> off=0 want=8ae1c got=a9e1c
> off=0 want=8ae20 got=a9e20
> off=0 want=8ae24 got=a9e24
> 0.0000 MB in 0.0029 secs, 0.0000 MB/sec
> ares:~# lmdd of=internal if=/dev/sdc ipat=1 bs=1M count=1000 mismatch=10
> off=51000000 want=3129e00 got=3147e00
> off=51000000 want=3129e04 got=3147e04
> off=51000000 want=3129e08 got=3147e08
> off=51000000 want=3129e0c got=3147e0c
> off=51000000 want=3129e10 got=3147e10
> off=51000000 want=3129e14 got=3147e14
> off=51000000 want=3129e18 got=3147e18
> off=51000000 want=3129e1c got=3147e1c
> off=51000000 want=3129e20 got=3147e20
> off=51000000 want=3129e24 got=3147e24
> 51.0000 MB in 0.1463 secs, 348.5702 MB/sec
>
> How to debug further?

In short,

- the target box got the data from the initiator box and wrote it to
  disk properly.

- the target box reads the data and sends it to the initiator properly
  on the first run after restarting tgtd.

- then the target box sends the wrong data after that.

Right?

How about writing twice? Can the target still store the data (written
on the second run) on disk?
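As a possible aid for the "how to debug further" question, here is a
minimal sketch that pulls the want/got pairs out of the lmdd mismatch
output and reports how far each received value is from the expected one.
It assumes that both values are hexadecimal (the digits a-e in the log
suggest so) and that a 4096-byte page is a useful unit to express the
difference in; the names MISMATCH, PAGE, shifts and summarize are
hypothetical helpers, not part of lmdd or tgt.

```python
import re

# Matches the "want=... got=..." part of an lmdd mismatch line.
# Assumption: both values are hexadecimal.
MISMATCH = re.compile(r"want=([0-9a-fA-F]+)\s+got=([0-9a-fA-F]+)")

PAGE = 4096  # assumed page size, used only to express the shift in pages


def shifts(log_text):
    """Return the list of (got - want) differences found in log_text."""
    return [int(g, 16) - int(w, 16) for w, g in MISMATCH.findall(log_text)]


def summarize(log_text):
    """Return the sorted distinct shifts as (bytes, pages, remainder) tuples."""
    return [(d, d // PAGE, d % PAGE) for d in sorted(set(shifts(log_text)))]


if __name__ == "__main__":
    # A few lines copied from the three failing runs quoted above.
    sample = """\
off=1000000 want=1a0000 got=1b3000
off=1000000 want=1a0004 got=1b3004
off=0 want=8ae00 got=a9e00
off=51000000 want=3129e00 got=3147e00
"""
    for nbytes, pages, rest in summarize(sample):
        print(f"shift {nbytes:#x} bytes = {pages} pages + {rest} bytes")
```

On the lines quoted above, the difference is constant within each run
(0x13000, 0x1f000 and 0x1e000 bytes), and each is a whole number of
4096-byte units (19, 31 and 30). If the lmdd pattern encodes the
position of each word, which is an assumption here, that would mean the
initiator received intact data belonging to a different position in the
stream rather than random corruption.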