Re: COSS causing squid Segment Violation on FreeBSD 6.2S

Adrian Chadd <adrian@xxxxxxxxxxxxxxx> · Thu, 26 Apr 2007 19:26:23 +0800

On Thu, Apr 26, 2007, Mark Powell wrote:
> Hi,
>   Just in the process of putting a small percentage of our web requests 
> through 3 new caches to test them. However, I'm encountering SEGV 
> seemingly due to COSS. Two of the caches ran for about a day and then 
> failed e.g.
> 
> 2007/04/26 10:39:26| storeCossCompletePendingReloc: got failure (-1)
> FATAL: Received Segment Violation...dying.

Hm! Well, that's a sign that the I/O for that pending object relocation
failed. Not a good sign.

> When the caches restart, read the COSS dir and then when they finish 
> reading it they die with the same error again. They are both seemingly in 
> a loop doing this forever now. However, the other cache is still running 
> happily (perhaps just luck?).

I'd try to figure out why the relocation is failing. Find the line in
store_io_coss.c which logs that message, set a breakpoint on it in gdb
and run squid inside gdb (squid -ND). Then when the thing fails you'll
want to grab some information about the operation:

print *op
print *pr

This assumes, of course, that you can live with a cache sitting stopped
in gdb and not restarting.
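
As a rough sketch, the whole session would look something like the
following. The binary path and NNN are placeholders: NNN stands for
whatever line number the "got failure" debug statement is on in your
copy of store_io_coss.c. The "bt" is an extra step of mine to capture
the call stack; it isn't strictly needed for the two prints.

  $ gdb /path/to/squid
  (gdb) break store_io_coss.c:NNN
  (gdb) run -ND
  ... wait for the breakpoint to be hit ...
  (gdb) bt
  (gdb) print *op
  (gdb) print *pr

If gdb says there is no symbol "op" or "pr" in the current context, you
are stopped in a different frame than I'm assuming here; use "info
locals" to see what is actually in scope at that point.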

Adrian