Hey Dr,
(notes inside)
On 19/11/13 00:54, Amos Jeffries wrote:
Either event may have corrupted it slightly. Squid is supposed to
contain sufficient checksum protection in rock to cope with most forms
of corruption, but nobody's perfect.
So, please try to get a core dump, or stack trace of the problem before
going any further. This will help us to isolate where the problem is
occurring. If it is corruption related we will need to try to add
better protection for that case.
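A rough sketch of one way to collect that backtrace, assuming gdb is installed and that Squid was started from a shell where core files are allowed. The binary path and the function names below are assumptions for illustration, not something from this thread. Squid writes core files to the directory set by coredump_dir in squid.conf.

```shell
#!/bin/sh
# Sketch only: the binary path is an assumption, adjust it for your install.
SQUID_BIN="${SQUID_BIN:-/usr/sbin/squid}"

# Start Squid from a shell that allows core files of any size.
start_squid_with_cores() {
    ulimit -c unlimited && "$SQUID_BIN"
}

# Print a full backtrace from an existing core file.
print_backtrace() {
    core_file="$1"
    if [ ! -r "$core_file" ]; then
        echo "cannot read core file: $core_file" >&2
        return 1
    fi
    # --batch makes gdb exit after running the commands given with -ex.
    gdb --batch -ex "bt full" "$SQUID_BIN" "$core_file"
}
```

For example, print_backtrace /path/to/core > backtrace.txt gives a text file that can be attached to a mail. If Squid is started by an init script, the ulimit line would have to go into that script instead.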
*after* that, please try:
* shutting down your Squid by any means necessary to ensure there are 0
processes running.
"pgrep squid"
* *move* the caches to somewhere they can be analysed later if necessary.
* rebuild the configured cache_dir with squid -z
* wait until the -z process has completed *AND* there are 0 processes still
running in the background, with no traffic at all on the server.
* restart the main Squid
This entire process should not take more than a minute.
If the problem remains after doing that you will have successfully
eliminated cache corruption as a cause and we go back to needing a
backtrace to figure it out.
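The steps quoted above could be scripted roughly like this. It is only a sketch: the cache location, the backup location and the function names are assumptions, and "squid -k shutdown" is just one of the possible ways to stop the service.

```shell
#!/bin/sh
# Sketch only: both directories are assumptions, check cache_dir in squid.conf.
CACHE_DIR="${CACHE_DIR:-/var/spool/squid}"
BACKUP_DIR="${BACKUP_DIR:-/var/spool/squid.$(date +%Y%m%d%H%M%S)}"

# Return 0 once "pgrep squid" finds nothing, 1 if processes remain
# after the given number of one-second tries (default 30).
wait_for_no_squid() {
    tries="${1:-30}"
    while [ "$tries" -gt 0 ]; do
        pgrep squid >/dev/null 2>&1 || return 0
        sleep 1
        tries=$((tries - 1))
    done
    return 1
}

rebuild_cache() {
    squid -k shutdown                  # ask the running Squid to stop
    wait_for_no_squid 60 || return 1   # give up if processes are still there
    mv "$CACHE_DIR" "$BACKUP_DIR"      # move, do not delete, for later analysis
    mkdir -p "$CACHE_DIR"              # ownership must match the Squid user
    squid -z                           # rebuild the configured cache_dir
    wait_for_no_squid 60 || return 1   # -z may keep working in the background
    squid                              # restart the main Squid
}
```

Nothing runs until rebuild_cache is called. Note that pgrep matches on process names, so a script whose own file name contains "squid" would also be counted.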
The same test can be done by running the service in "RAM only" cache
mode.
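As far as I know, on Squid 3.1 and later a configuration with no active cache_dir lines keeps objects in memory only. A small sketch to check that the config really is in that state after commenting the lines out. The config path and the function names are assumptions, and files pulled in with "include" are not followed.

```shell
#!/bin/sh
# Sketch only: the config path is an assumption.
SQUID_CONF="${SQUID_CONF:-/etc/squid/squid.conf}"

# Print the cache_dir lines that are not commented out.
active_cache_dirs() {
    grep -E '^[[:space:]]*cache_dir[[:space:]]' "${1:-$SQUID_CONF}"
}

# Return 0 if the file has no active cache_dir line (RAM only),
# 1 if it has at least one, 2 if the file cannot be read.
is_ram_only() {
    conf="${1:-$SQUID_CONF}"
    [ -r "$conf" ] || return 2
    if active_cache_dirs "$conf" >/dev/null; then
        return 1
    fi
    return 0
}
```

Running is_ram_only && echo "RAM only" before restarting the service is a quick sanity check. The amount of memory used for the cache is controlled by cache_mem.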
If Squid is dying this often, you will probably need only a short run
time to reproduce the problem.
I assume that letting the service run in "RAM only" mode will allow it
to keep serving clients smoothly for more than 24 to 48 hours, with 0
problems in cache.log.
If the problems stop in RAM only mode, we can be more than 90% sure that
the cause was related to the DISK cache; if they continue, the DISK cache
is most likely not the cause.
I would not rush to say that rock itself was at fault.
If you can attach the details of the related DISKs/partitions from
fstab, it might help more.
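A small sketch for collecting those details, assuming the cache lives under the directory given below (an assumption, as is the function name). It prints the mount point that holds the cache directory and the matching /etc/fstab entry, if there is one.

```shell
#!/bin/sh
# Sketch only: the cache directory is an assumption.
CACHE_DIR="${CACHE_DIR:-/var/spool/squid}"

cache_disk_details() {
    dir="${1:-$CACHE_DIR}"
    # df -P prints one POSIX-format line per filesystem; field 6 is the mount point.
    mount_point=$(df -P "$dir" | awk 'NR == 2 { print $6 }')
    echo "mount point: $mount_point"
    # Print the fstab entry whose second field is that mount point.
    if [ -r /etc/fstab ]; then
        awk -v mp="$mount_point" '$1 !~ /^#/ && $2 == mp' /etc/fstab
    fi
}
```

For example, cache_disk_details /var/spool/squid > disk-details.txt produces a file that can be attached together with the rest of the data.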
(Feel free to send all the technical data in a PM, since it can take
time to process, especially core dumps.)
Regards,
Eliezer