Re: help tgt segfault

Tomasz Chmielewski wrote:
FUJITA Tomonori wrote:
On Thu, 11 Dec 2008 07:58:30 -0800
"Jesse Nelson" <spheromak@xxxxxxxxx> wrote:

We're running a vanilla 2.6.27.4 kernel with tgt 0.9.2, with about 30
targets and about 10mb/s throughput.
I am constantly (daily) seeing tgtd segfault. No real deep info, just
this error in the logs:
    segfault at 8 ip 000000000040ebed sp 00007fffb259cb30 error 6 in tgtd[400000+23000]
Any ideas or suggestions on how I can dig deeper here?

Can you run gdb with tgtd?

If you can't, can you give very detailed information about what
you are doing, so that I can do the same thing and reproduce
the problem?

I'm seeing those occasionally too (one tgtd process dies), but *very* rarely.

It doesn't seem to depend on load type, number of connected/working initiators,
configured targets etc., and I'm not sure how to reproduce it.

One thing that comes to mind is that a tgtd process dies when an initiator wants to read data and tgtd can't "deliver" it immediately (i.e., I/O is "frozen" because of SATA resets/exceptions/timeouts). It doesn't always happen on such SATA timeouts and is therefore hard to reproduce.

I can reproduce it reliably on a software RAID-5 array with a broken disk (with badblocks).

Just start badblocks -v /dev/broken/disk, wait for a broken area of the disk and tgtd will segfault.

I guess it will also segfault on prolonged I/O access.


ata4.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
ata4.00: cmd 25/00:08:38:cd:78/00:00:1d:00:00/e0 tag 0 dma 4096 in
        res 40/00:00:3d:cd:78/40:00:1d:00:00/e0 Emask 0x4 (timeout)
ata4.00: status: { DRDY }
ata4: hard resetting link
ata4: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata4.00: configured for UDMA/133
sd 4:0:0:0: [sdd] Result: hostbyte=0x00 driverbyte=0x08
sd 4:0:0:0: [sdd] Sense Key : 0xb [current] [descriptor]
Descriptor sense data with sense descriptors (in hex):
       72 0b 00 00 00 00 00 0c 00 0a 80 00 00 00 00 00
       1d 78 cd 3d
sd 4:0:0:0: [sdd] ASC=0x0 ASCQ=0x0
end_request: I/O error, dev sdd, sector 494456120
printk: 16 messages suppressed.
Buffer I/O error on device sdd, logical block 61807015
ata4: EH complete
sd 4:0:0:0: [sdd] 781422768 512-byte hardware sectors (400088 MB)
sd 4:0:0:0: [sdd] Write Protect is off
sd 4:0:0:0: [sdd] Mode Sense: 00 3a 00 00
tgtd[3138]: segfault at 00000220 eip 0804f0b5 esp 77b87730 error 4
sd 4:0:0:0: [sdd] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
ata4.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
ata4.00: edma_err 0x00000084, EDMA self-disable
ata4.00: cmd 25/00:08:38:cd:78/00:00:1d:00:00/e0 tag 0 dma 4096 in
        res 51/40:00:3d:cd:78/40:00:1d:00:00/e0 Emask 0x9 (media error)
ata4.00: status: { DRDY ERR }
ata4.00: error: { UNC }
ata4: hard resetting link


During the scan which revealed ~300 badblocks (but there were lots of SATA timeouts/resets),
tgtd segfaulted three times so far (with a check script started via cron every minute).

Dec 16 12:03:49 megathecus kernel: tgtd[3138]: segfault at 00000220 eip 0804f0b5 esp 77b87730 error 4
Dec 16 12:08:18 megathecus kernel: tgtd[3558]: segfault at 000001e4 eip 0804c2fa esp 77f8caf0 error 4
Dec 16 12:44:57 megathecus kernel: tgtd[3649]: segfault at 000001e4 eip 0804c2fa esp 77bc3f30 error 4
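
While waiting for a gdb backtrace, the kernel segfault lines themselves can be decoded a little. The following is only a sketch, not part of tgt: the names SEGV_RE and decode are made up for illustration, and it assumes the usual x86 page-fault error code bits (bit 0: protection violation vs. page not present, bit 1: write vs. read, bit 2: user vs. kernel mode). It handles both message styles quoted in this thread.

```python
import re

# Matches both kernel message styles seen in this thread:
#   "segfault at 8 ip 000000000040ebed sp 00007fffb259cb30 error 6 in tgtd[400000+23000]"
#   "tgtd[3138]: segfault at 00000220 eip 0804f0b5 esp 77b87730 error 4"
SEGV_RE = re.compile(
    r"segfault at (?P<addr>[0-9a-fA-F]+) "
    r"[er]?ip (?P<ip>[0-9a-fA-F]+) "
    r"[er]?sp (?P<sp>[0-9a-fA-F]+) "
    r"error (?P<err>[0-9a-fA-F]+)"
    r"(?: in (?P<obj>\S+?)\[(?P<base>[0-9a-fA-F]+)\+(?P<size>[0-9a-fA-F]+)\])?"
)


def decode(line):
    """Return a dict describing a kernel segfault line, or None if it is not one."""
    m = SEGV_RE.search(line)
    if m is None:
        return None
    err = int(m.group("err"), 16)
    info = {
        "addr": int(m.group("addr"), 16),   # faulting memory address
        "ip": int(m.group("ip"), 16),       # instruction that faulted
        # Assumed x86 page-fault error code layout:
        "cause": "protection violation" if err & 1 else "page not present",
        "access": "write" if err & 2 else "read",
        "mode": "user" if err & 4 else "kernel",
    }
    if m.group("base"):
        # Offset of the faulting instruction inside the mapped object.
        info["offset"] = info["ip"] - int(m.group("base"), 16)
    return info


lines = [
    "segfault at 8 ip 000000000040ebed sp 00007fffb259cb30 error 6 in tgtd[400000+23000]",
    "tgtd[3138]: segfault at 00000220 eip 0804f0b5 esp 77b87730 error 4",
    "tgtd[3558]: segfault at 000001e4 eip 0804c2fa esp 77f8caf0 error 4",
]
for line in lines:
    info = decode(line)
    print(hex(info["addr"]), hex(info["ip"]), info["mode"], info["access"], info["cause"])
```

Under those assumptions, all the faults in this thread are user-mode accesses to an unmapped page at a very small address (0x8, 0x220, 0x1e4). That would be consistent with a NULL structure pointer being dereferenced at a field offset, but it is only a guess until there is a backtrace. Two of the three crashes here share the same eip (0804c2fa), so that address would be the first one to look up in a tgtd binary built with debug symbols.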


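For anyone who wants a similar check from cron: the script used above is not shown in this mail, so the following is only a rough sketch of what such a check could look like. The names process_name and is_running are hypothetical. It reads /proc/<pid>/stat, where the process name appears in parentheses after the PID, and what to do when tgtd is missing (restart, reload targets, send mail) is left out on purpose.

```python
import os


def process_name(stat_text):
    """Extract the command name from the contents of /proc/<pid>/stat."""
    # The file looks like "3138 (tgtd) S ..."; the name sits between the
    # first "(" and the last ")".
    return stat_text[stat_text.index("(") + 1:stat_text.rindex(")")]


def is_running(name, proc_root="/proc"):
    """Return True if some process under proc_root has the given command name."""
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a PID directory
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                if process_name(f.read()) == name:
                    return True
        except (OSError, ValueError):
            # The process exited in the meantime, or the file is not in the
            # expected format; skip it.
            continue
    return False
```

A cron job could call is_running("tgtd") once a minute and log or react when it returns False. Note that tgtd may run as more than one process, so this only tells you whether at least one of them is still alive.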
--
Tomasz Chmielewski
http://wpkg.org