On Mon, 15 Dec 2008 10:48:05 +0100
Tomasz Chmielewski <mangoo@xxxxxxxx> wrote:

> FUJITA Tomonori wrote:
> > On Thu, 11 Dec 2008 07:58:30 -0800
> > "Jesse Nelson" <spheromak@xxxxxxxxx> wrote:
> >
> >> we're running vanilla 2.6.27.4 kern with tgt 0.9.2 with about 30
> >> targets and about 10mb/s throughput
> >> i am constantly (daily) seeing tgtd segfault. no real deep info just
> >> this error in the logs:
> >> segfault at 8 ip 000000000040ebed sp 00007fffb259cb30 error 6 in
> >> tgtd[400000+23000]
> >> any ideas or suggestions how i can dig deeper here ?
> >
> > Can you run gdb with tgtd?
> >
> > If you can't, can you give the very detailed information about what
> > you are doing, which enable me to do the same thing you do to
> > reproduce the problem.
>
> I'm seeing those occasionally too (one tgtd process dies), but rather *very* rarely.
>
> It doesn't seem to depend on load type, number of connected/working initiators,
> configured targets etc. and I'm not sure how to reproduce it.
>
> One thing that comes to my mind is that one tgtd process dies when initiator wants
> to read data and tgtd can't "deliver" it immediately (i.e., I/O "frozen" because of
> SATA resets/exceptions/timeouts). It doesn't happen always on such SATA timeouts and
> is therefore hard to reproduce.

TMF (an initiator tries to abort a request due to timeout) might be
related to your problem. I'll dig into it this weekend.

> Look at this log - tgtd segfaulted just after SATA timeouts (after ~50 days of working properly).
> This happened with tgtd version fetched on 2008-Oct-24, running on x86,
> with just two initiators connected, load to one target was perhaps about 5 MB/s,
> to the second target was close to 0 MB/s.
>
> Dec 11 21:57:37 megathecus kernel: ata4.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
> Dec 11 21:57:37 megathecus kernel: ata4.00: cmd 25/00:00:bf:78:1f/00:02:14:00:00/e0 tag 0 dma 262144 in
> Dec 11 21:57:37 megathecus kernel: res 40/00:01:01:4f:c2/40:00:15:00:00/00 Emask 0x4 (timeout)
> Dec 11 21:57:37 megathecus kernel: ata4.00: status: { DRDY }
> Dec 11 21:57:37 megathecus kernel: ata4: soft resetting link
> Dec 11 21:57:37 megathecus kernel: ata4: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
> Dec 11 21:57:37 megathecus kernel: ata4.00: configured for UDMA/133
> Dec 11 21:57:37 megathecus kernel: ata4: EH complete
> Dec 11 21:58:07 megathecus kernel: ata4.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
> Dec 11 21:58:07 megathecus kernel: ata4.00: cmd 25/00:00:bf:78:1f/00:02:14:00:00/e0 tag 0 dma 262144 in
> Dec 11 21:58:07 megathecus kernel: res 40/00:01:01:4f:c2/40:00:15:00:00/00 Emask 0x4 (timeout)
> Dec 11 21:58:07 megathecus kernel: ata4.00: status: { DRDY }
> Dec 11 21:58:07 megathecus kernel: ata4: soft resetting link
> Dec 11 21:58:07 megathecus kernel: ata4: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
> Dec 11 21:58:07 megathecus kernel: ata4.00: configured for UDMA/133
> Dec 11 21:58:07 megathecus kernel: ata4: EH complete
> Dec 11 21:58:08 megathecus kernel: tgtd[2567]: segfault at 00000220 eip 0804f0b5 esp 77abdac0 error 4
> Dec 11 21:58:08 megathecus kernel: sd 4:0:0:0: [sdd] 781422768 512-byte hardware sectors (400088 MB)
> Dec 11 21:58:08 megathecus kernel: sd 4:0:0:0: [sdd] Write Protect is off
> Dec 11 21:58:08 megathecus kernel: sd 4:0:0:0: [sdd] Mode Sense: 00 3a 00 00
> Dec 11 21:58:08 megathecus kernel: sd 4:0:0:0: [sdd] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
> Dec 11 21:58:08 megathecus kernel: sd 4:0:0:0: [sdd] 781422768 512-byte hardware sectors (400088 MB)
> Dec 11 21:58:08 megathecus kernel: sd 4:0:0:0: [sdd] Write Protect is off
> Dec 11 21:58:08 megathecus kernel: sd 4:0:0:0: [sdd] Mode Sense: 00 3a 00 00
> Dec 11 21:58:08 megathecus kernel: sd 4:0:0:0: [sdd] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
>
>
> I reported a similar issue in June 2008 - see the thread titled
> "disk kicked out of RAID -> tgtd segmentation fault":
>
> http://lists.wpkg.org/pipermail/stgt/2008-June/thread.html#1702
> http://lists.wpkg.org/pipermail/stgt/2008-July/thread.html#1746
>
> Can it be related somehow?

I thought that I fixed the bug in the above thread.
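
For anyone trying to dig deeper from the kernel log alone, here is a minimal sketch (not part of tgt) that decodes the two segfault lines quoted above. It assumes the usual x86 page-fault error-code bits (bit 0: page was present, bit 1: write access, bit 2: fault came from user mode) and that the numbers in the log line are hexadecimal. The names `SEGFAULT_RE` and `decode_segfault` are made up for this example.

```python
import re

# Matches both log styles quoted in this thread:
#   "segfault at 8 ip ... sp ... error 6 in tgtd[400000+23000]"   (x86_64)
#   "segfault at 00000220 eip ... esp ... error 4"                (x86, no mapping info)
SEGFAULT_RE = re.compile(
    r"segfault at (?P<addr>[0-9a-f]+) "
    r"e?ip (?P<ip>[0-9a-f]+) "
    r"e?sp (?P<sp>[0-9a-f]+) "
    r"error (?P<err>[0-9a-f]+)"
    r"(?: in (?P<obj>[^\[\s]+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\])?"
)


def decode_segfault(line):
    """Return a dict describing a kernel segfault line, or None if it does not match."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        return None
    err = int(m.group("err"), 16)
    info = {
        "fault_addr": int(m.group("addr"), 16),   # address that was accessed
        "ip": int(m.group("ip"), 16),             # instruction that faulted
        "access": "write" if err & 2 else "read",
        "page_present": bool(err & 1),
        "user_mode": bool(err & 4),
        "object": m.group("obj"),                 # None when the log has no "in ..." part
        "offset_in_mapping": None,
    }
    if m.group("base") is not None:
        # ip relative to the start of the mapping named in the log line
        info["offset_in_mapping"] = info["ip"] - int(m.group("base"), 16)
    return info


if __name__ == "__main__":
    for line in (
        "segfault at 8 ip 000000000040ebed sp 00007fffb259cb30 error 6 in tgtd[400000+23000]",
        "tgtd[2567]: segfault at 00000220 eip 0804f0b5 esp 77abdac0 error 4",
    ):
        print(decode_segfault(line))
```

Under those assumptions the first report is a user-mode write to a non-present page at address 0x8 and the second a user-mode read at 0x220. Such small addresses are often a NULL pointer plus a structure member offset, though that is only a guess without a backtrace. If the tgtd binary was built with debug info, the `ip` value can be passed to `addr2line -e <path to tgtd>` to find the source line; for a library or a position-independent binary the `offset_in_mapping` value would be the one to try instead.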