Re: help tgt segfault

Tomasz Chmielewski <mangoo@xxxxxxxx> · Mon, 15 Dec 2008 10:48:05 +0100

FUJITA Tomonori wrote:
On Thu, 11 Dec 2008 07:58:30 -0800
"Jesse Nelson" <spheromak@xxxxxxxxx> wrote:

We're running a vanilla 2.6.27.4 kernel with tgt 0.9.2, with about 30
targets and about 10mb/s throughput.
I am constantly (daily) seeing tgtd segfault. No real deep info, just
this error in the logs:
    segfault at 8 ip 000000000040ebed sp 00007fffb259cb30 error 6 in tgtd[400000+23000]
Any ideas or suggestions on how I can dig deeper here?

Can you run gdb with tgtd?

If you can't, can you give very detailed information about what
you are doing, so that I can do the same thing and reproduce the
problem?
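
Even without gdb, the kernel's segfault line already carries some information. Below is a minimal sketch (the function name and the returned field names are mine, not anything from tgt) that pulls the fields out of such a line and decodes the x86 page-fault error bits: bit 0 set means a protection violation rather than a missing page, bit 1 set means a write, bit 2 set means the fault came from user mode. It assumes the log line has been joined back into a single line if the mailer wrapped it.

```python
import re

# Matches both kernel message layouts seen in this thread:
#   "segfault at 8 ip 000000000040ebed sp 00007fffb259cb30 error 6 in tgtd[400000+23000]"
#   "segfault at 00000220 eip 0804f0b5 esp 77abdac0 error 4"
SEGFAULT_RE = re.compile(
    r"segfault at (?P<addr>[0-9a-f]+) "
    r"[er]?ip (?P<ip>[0-9a-f]+) "
    r"[er]?sp (?P<sp>[0-9a-f]+) "
    r"error (?P<err>[0-9a-f]+)"
    r"(?: in (?P<obj>\S+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\])?"
)


def decode_segfault(line):
    """Return a dict describing a kernel segfault line, or None if no match."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        return None
    err = int(m.group("err"), 16)  # the kernel prints the error code in hex
    info = {
        "fault_addr": int(m.group("addr"), 16),
        "ip": int(m.group("ip"), 16),
        "cause": "protection violation" if err & 0x1 else "page not present",
        "access": "write" if err & 0x2 else "read",
        "mode": "user" if err & 0x4 else "kernel",
    }
    # Newer kernels also report the mapped object as name[base+size].
    if m.group("base"):
        info["object"] = m.group("obj")
        info["ip_offset"] = info["ip"] - int(m.group("base"), 16)
    return info


if __name__ == "__main__":
    sample = ("segfault at 8 ip 000000000040ebed sp 00007fffb259cb30 "
              "error 6 in tgtd[400000+23000]")
    for key, value in decode_segfault(sample).items():
        print(key, hex(value) if isinstance(value, int) else value)
```

Fault addresses as small as 0x8 or 0x220 would be consistent with accessing a struct member through a NULL pointer, though that is only a guess from the numbers. If the tgtd binary was built with debug info, the ip value can be passed to addr2line with the -e option pointing at that binary to get a source location.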

I'm seeing those occasionally too (one tgtd process dies), though only *very* rarely.

It doesn't seem to depend on load type, number of connected/working initiators,
configured targets, etc., and I'm not sure how to reproduce it.

One thing that comes to mind is that a tgtd process dies when an initiator wants
to read data and tgtd can't "deliver" it immediately (i.e., I/O is "frozen" because of
SATA resets/exceptions/timeouts). It doesn't always happen on such SATA timeouts and
is therefore hard to reproduce.

Look at this log - tgtd segfaulted just after SATA timeouts (after ~50 days of working properly).
This happened with the tgtd version fetched on 2008-Oct-24, running on x86,
with just two initiators connected; the load on one target was perhaps about 5 MB/s,
and on the second target close to 0 MB/s.

Dec 11 21:57:37 megathecus kernel: ata4.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
Dec 11 21:57:37 megathecus kernel: ata4.00: cmd 25/00:00:bf:78:1f/00:02:14:00:00/e0 tag 0 dma 262144 in
Dec 11 21:57:37 megathecus kernel:          res 40/00:01:01:4f:c2/40:00:15:00:00/00 Emask 0x4 (timeout)
Dec 11 21:57:37 megathecus kernel: ata4.00: status: { DRDY }
Dec 11 21:57:37 megathecus kernel: ata4: soft resetting link
Dec 11 21:57:37 megathecus kernel: ata4: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
Dec 11 21:57:37 megathecus kernel: ata4.00: configured for UDMA/133
Dec 11 21:57:37 megathecus kernel: ata4: EH complete
Dec 11 21:58:07 megathecus kernel: ata4.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
Dec 11 21:58:07 megathecus kernel: ata4.00: cmd 25/00:00:bf:78:1f/00:02:14:00:00/e0 tag 0 dma 262144 in
Dec 11 21:58:07 megathecus kernel:          res 40/00:01:01:4f:c2/40:00:15:00:00/00 Emask 0x4 (timeout)
Dec 11 21:58:07 megathecus kernel: ata4.00: status: { DRDY }
Dec 11 21:58:07 megathecus kernel: ata4: soft resetting link
Dec 11 21:58:07 megathecus kernel: ata4: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
Dec 11 21:58:07 megathecus kernel: ata4.00: configured for UDMA/133
Dec 11 21:58:07 megathecus kernel: ata4: EH complete
Dec 11 21:58:08 megathecus kernel: tgtd[2567]: segfault at 00000220 eip 0804f0b5 esp 77abdac0 error 4
Dec 11 21:58:08 megathecus kernel: sd 4:0:0:0: [sdd] 781422768 512-byte hardware sectors (400088 MB)
Dec 11 21:58:08 megathecus kernel: sd 4:0:0:0: [sdd] Write Protect is off
Dec 11 21:58:08 megathecus kernel: sd 4:0:0:0: [sdd] Mode Sense: 00 3a 00 00
Dec 11 21:58:08 megathecus kernel: sd 4:0:0:0: [sdd] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Dec 11 21:58:08 megathecus kernel: sd 4:0:0:0: [sdd] 781422768 512-byte hardware sectors (400088 MB)
Dec 11 21:58:08 megathecus kernel: sd 4:0:0:0: [sdd] Write Protect is off
Dec 11 21:58:08 megathecus kernel: sd 4:0:0:0: [sdd] Mode Sense: 00 3a 00 00
Dec 11 21:58:08 megathecus kernel: sd 4:0:0:0: [sdd] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
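
To check whether the same pattern (a tgtd segfault shortly after ATA timeouts) shows up elsewhere in the logs, something like the following could be run over a syslog file. It is only a sketch: the function names, the 60-second default window and the simple substring matching are my assumptions, based on the log format above.

```python
from datetime import datetime, timedelta


def parse_ts(line):
    # Syslog timestamps carry no year; strptime defaults to 1900, which is
    # fine for computing time differences within a single log file.
    return datetime.strptime(line[:15], "%b %d %H:%M:%S")


def segfaults_after_timeouts(lines, window_seconds=60):
    """For each tgtd segfault line, count ATA timeouts in the preceding window.

    Returns a list of (segfault_timestamp, timeout_count) tuples.
    """
    window = timedelta(seconds=window_seconds)
    timeouts = []  # timestamps of ATA timeout lines seen so far
    hits = []
    for line in lines:
        if "Emask" in line and "(timeout)" in line:
            timeouts.append(parse_ts(line))
        elif "tgtd[" in line and "segfault" in line:
            ts = parse_ts(line)
            recent = [t for t in timeouts if timedelta(0) <= ts - t <= window]
            hits.append((ts, len(recent)))
    return hits


if __name__ == "__main__":
    import sys

    # Usage: python this_script.py /path/to/syslog  (the path is an example)
    with open(sys.argv[1], errors="replace") as f:
        for ts, count in segfaults_after_timeouts(f):
            print(ts.strftime("%b %d %H:%M:%S"), "timeouts in window:", count)
```

A segfault with a count of zero would argue against the SATA-timeout theory, while a non-zero count only shows that the two events were close in time, not that one caused the other.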

I reported a similar issue in June 2008 - see the thread titled
"disk kicked out of RAID -> tgtd segmentation fault":

http://lists.wpkg.org/pipermail/stgt/2008-June/thread.html#1702
http://lists.wpkg.org/pipermail/stgt/2008-July/thread.html#1746

Can it be related somehow?

--
Tomasz Chmielewski