Re: Fwd: rock store: a bug or ...?

Nikolai Gorchilov <niki@xxxxxxxx> · Wed, 29 Jan 2014 12:26:17 +0200

I just made a new discovery - the problem seemingly disappears when
running Squid with -N, thus removing the disker process out of the
equation.

Could it be a worker-disker IPC related issue, instead of rock store
specific problem?

Best,
Niki

On Wed, Jan 29, 2014 at 12:00 PM, Nikolai Gorchilov <niki@xxxxxxxx> wrote:
> Dear Amos
>
> On Wed, Jan 29, 2014 at 5:18 AM, Amos Jeffries <squid3@xxxxxxxxxxxxx> wrote:
>>>
>>> The simple steps to reproduce:
>>>
>>> 1. Empty store dir and recreate swap with -z
>>
>> Did you leave sufficient time after doing this to ensure that the -z
>> operation was completed successfully and all the "squid -z" spawned
>> processes stopped?
>
> Yes.
>
>>> 2. Start Squid with the above config
>>> 3. Open a new private browser session (or clear browser cache) and
>>> request www.w3.org. Most of the 35+ requested items will get cached.
>>> 4. Once the page loads fully, request full reload Ctrl-Shift-R/Cmd-Shift-R
>>
>> Step 4 operation is blocked by the "reload_into_ims on" config directive
>> which turns these force-reload (Cache-Control:no-cache) requests into
>> If-Modified-Since (IMS) requests.
>>
>> Under correct protocol operation they should be TCP_MISS responses.
>
> Exactly. This problem exists only when delivering TCP_HITs.
> TCP_REFRESH_UNMODIFIED for instance works fine.
>
> As mentioned in conf comments reload_into_ims is used only to make the
> problem reproduction easier - having multiple TCP_HITs in short period
> of time somehow trigger it.
>
> Same scenario works perfectly fine when using aufs.
>
>>>> During this (and every next) reload few (1-3-5) random requests out of
>>> all 35+ objects will enter into this strange state, until the user
>>> aborts the connection.
>>>
>>> Access.log excerpts:
>>> rock: http://pastebin.com/QWTgqRTD
>>> aufs: http://pastebin.com/0WGdHPP9
>>>
>>> Pay attention to '===[]===' marks in the logs, marking user's actions.
>>>
>>> Please help! Can you reproduce the problem or it's somehow related to
>>> my environment?
>>
>>
>> I note that you are only waiting 5-10 seconds between teh refresh and
>> escape. While this may be long enough, does the trace show any
>> difference if you leave it for a much longer time such as 30 or 60 seconds?
>
> I have seen connections in that state waiting for hours (say 1 night).
> Just did 30 minutes test:
>
> 1390989136.532 1910610 172.16.101.252 TCP_HIT_ABORTED/000 0 GET
> http://www.w3.org/2008/site/css/minimum - HIER_NONE/- -
>
> Checked active_connections for this same connection just before
> hitting ESC. Here's what Squid reports:
>
> Connection: 0x28e8be8
> FD 22, read 322, wrote 0
> FD desc: Reading next request
> in: buf 0x28eb9b0, offset 0, size 4096
> remote: 172.16.101.252:58477
> local: 192.168.1.10:3128
> nrequests: 1
> uri http://www.w3.org/2008/site/css/minimum
> logType TCP_HIT
> out.offset 0, out.size 0
> req_sz 322
> entry 0x2935d50/DC7BF6E1F61DA763BBCBD0043D4281A4
> start 1390987225.923229 (1886.649296 seconds ago)
> username -
>
> Hope this helps!
>
> Best,
> Niki