Re: 3.5.27 to 4.4: a worker is dead

Amos Jeffries <squid3@xxxxxxxxxxxxx> · Thu, 8 Nov 2018 16:32:03 +1300

On 8/11/18 12:24 AM, Heiler Bemerguy wrote:
> After some hours the worker 4 died unexpectely and didn't come back. I
> have cleaned the rock cache store before upgrading just to be safe..
> COLD start........
> 
> proxy    15478  0.6  1.8 9518972 1207212 ?     S    Nov06   6:17
> (squid-5) --kid squid-5 -s
> proxy    15480  0.4  1.4 9391808 954248 ?      S    Nov06   3:56
> (squid-3) --kid squid-3 -s
> proxy    15481  1.1  2.5 9734420 1679360 ?     S    Nov06  10:45
> (squid-2) --kid squid-2 -s
> proxy    26933  0.0  0.0 6501296 26768 ?       S    00:25   0:01
> (squid-coord-9) --kid squid-coord-9 -s
> proxy    26936  0.0  0.6 7424208 401788 ?      S    00:25   0:07
> (squid-disk-6) --kid squid-disk-6 -s
> proxy    27011  0.0  0.2 7424208 185580 ?      S    00:28   0:03
> (squid-disk-7) --kid squid-disk-7 -s
> proxy    27227  0.0  0.0 7424208 54980 ?       S    00:37   0:01
> (squid-disk-8) --kid squid-disk-8 -s
> 
> AMD64 12-core, 64gb ram
> 
> cache_mem 5200 MB
> maximum_object_size_in_memory 2 MB
> maximum_object_size 4 GB
> workers 5
> cache_dir rock /cache2 131000 min-size=1 max-size=196608
> cache_dir rock /cache3 131000 min-size=196609 max-size=624288
> cache_dir rock /cache4 131000 min-size=624289 max-swap-rate=500 swap-timeout=500
> 

The worker "kid4" died with an exception while doing something with shared memory.

> 2018/11/07 07:06:19 kid4| clientProcessHit: URL mismatch,
> '[unknown_URI]' !=
> 'http://ocsp.godaddy.com//MEQwQjBAMD4wPDAJBgUrDgMCGgUABBTkIInKBAzXkF0Qh0pel3lfHJ9GPAQU0sSw0pHUTBFxs2HLPaH%2B3ahq1OMCAxvnFQ%3D%3D'
> 

Your rock cache apparently contains several objects whose URL / store-ID
key is the literal string "[unknown_URI]". Squid tried to deliver those
objects to the client when it requested that OCSP URL.

Squid has detected the problem and will fetch a new object from the
network instead of the cache. It also begins the process of purging that
corrupt object from the cache.

... but then ...

> 2018/11/07 07:07:11 kid4| WARNING: communication with /cache2/rock may
> be too slow or disrupted for about 7.00s; rescued 1 out of 1 I/Os

The work the worker is asking the Disker to do is taking a long time, and
some of those actions have started to time out. If they were fetching
objects, those client transactions will be treated as a MISS.
I'm not sure what happens if they were deletions.

> 2018/11/07 07:07:11 kid4| clientProcessHit: URL mismatch,
> '[unknown_URI]' != 'http://www.orm.com.br/templates/noticiaDev.php'
> 2018/11/07 07:13:26 kid4| clientProcessHit: URL mismatch,
> '[unknown_URI]' !=
> 'http://storage.googleapis.com/update-delta/hfnkpimlhhgieaddgfemjhofmfblmnib/4803/4802/b767e40b59c48c2ec52977502ac10e35b84c00600197162f42dd941b5095cafd.crxd'
> 
> 2018/11/07 07:17:03 kid4| FATAL: Dying from an exception handling
> failure; exception: check failed: false
>     exception location: mem/PageStack.cc(106) push
> 

The SMP shared-memory space has been asked to allocate more memory than
its total capacity.

...
> 2018/11/07 07:17:03 kid4| Starting Squid Cache version 4.4 for
> x86_64-pc-linux-gnu...
...
> 2018/11/07 07:17:03 kid4| WARNING: disk-cache maximum object size is too
> large for mem-cache: 4194304.00 KB > 2048.00 KB

The above might have something to do with the shared-memory problem.

It is part of the rock design that the cache contents must be able to be
stored in memory.

...
> 2018/11/07 07:17:09 kid4| storeDirWriteCleanLogs: Operation aborted.
> 2018/11/07 07:17:09 kid4| FATAL: kid4 registration timed out

The auto-recovery process keeps dying while trying to restart kid4,
because startup takes a very long time when the proxy is already under
traffic load and memory pressure. These types of delays are usually
caused by Squid being configured with very large rock caches.

So, to solve this, try reducing the cache size. Start small, then
increase it until problems start to appear again.

Keep in mind that rock was designed and optimized for caches of several
hundred *MB* holding millions of small objects in the 0-32 KB range. For
large objects, the UFS caches are optimal.

Squid's caching is intended to have the two types work together for the
best performance across all object sizes.
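The split described above can be sketched in squid.conf, adapted from the
configuration quoted earlier. The 32 KB boundary, the 1000 MB rock size,
the 20000 MB per-worker aufs size and the /cache3/${process_number} paths
are assumptions for illustration, not tested values. For scale, the three
rock cache_dirs quoted above total 393000 MB (roughly 384 GB). Unlike
rock, the UFS-family stores (ufs, aufs, diskd) are not SMP-aware, so with
workers each worker needs its own directory; the ${process_number} macro
is one way to give them separate paths.

```
# Sketch only: sizes, boundary and paths are illustrative assumptions.
workers 5
cache_mem 5200 MB
maximum_object_size_in_memory 2 MB
maximum_object_size 4 GB

# rock (SMP-aware, shared by all workers): small objects up to 32 KB.
# Start small and raise the size gradually while watching for problems.
cache_dir rock /cache2 1000 max-size=32768

# aufs (not SMP-aware): one directory per worker for objects above 32 KB.
# ${process_number} expands per kid, giving /cache3/1 .. /cache3/5 here.
cache_dir aufs /cache3/${process_number} 20000 16 256 min-size=32769
```

Before starting Squid, make sure /cache3/1 through /cache3/5 exist and are
writable by the proxy user, then initialize them with squid -z. One
trade-off of per-worker aufs directories is that each worker keeps its own
copy, so a popular large object may end up stored once per worker.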

Amos
_______________________________________________
squid-users mailing list
squid-users@xxxxxxxxxxxxxxxxxxxxx
http://lists.squid-cache.org/listinfo/squid-users