Re: Fwd: rock store: a bug or ...?

Alex Rousskov <rousskov@xxxxxxxxxxxxxxxxxxxxxxx> · Wed, 12 Mar 2014 22:46:38 -0600

On 01/29/2014 03:26 AM, Nikolai Gorchilov wrote:
> I just made a new discovery - the problem seemingly disappears when
> running Squid with -N, thus removing the disker process out of the
> equation.
> 
> Could it be a worker-disker IPC related issue, instead of rock store
> specific problem?

All worker-disker IPC related issues are rock-specific today because
other storage modules (i.e., cache_dir types) cannot use diskers (yet?).

Cheers,

Alex.

> On Wed, Jan 29, 2014 at 12:00 PM, Nikolai Gorchilov <niki@xxxxxxxx> wrote:
>> Dear Amos
>>
>> On Wed, Jan 29, 2014 at 5:18 AM, Amos Jeffries <squid3@xxxxxxxxxxxxx> wrote:
>>>>
>>>> The simple steps to reproduce:
>>>>
>>>> 1. Empty store dir and recreate swap with -z
>>>
>>> Did you leave sufficient time after doing this to ensure that the -z
>>> operation was completed successfully and all the "squid -z" spawned
>>> processes stopped?
>>
>> Yes.
>>
>>>> 2. Start Squid with the above config
>>>> 3. Open a new private browser session (or clear browser cache) and
>>>> request www.w3.org. Most of the 35+ requested items will get cached.
>>>> 4. Once the page loads fully, request full reload Ctrl-Shift-R/Cmd-Shift-R
>>>
>>> Step 4 operation is blocked by the "reload_into_ims on" config directive
>>> which turns these force-reload (Cache-Control:no-cache) requests into
>>> If-Modified-Since (IMS) requests.
>>>
>>> Under correct protocol operation they should be TCP_MISS responses.
>>
>> Exactly. This problem exists only when delivering TCP_HITs.
>> TCP_REFRESH_UNMODIFIED for instance works fine.
>>
>> As mentioned in conf comments reload_into_ims is used only to make the
>> problem reproduction easier - having multiple TCP_HITs in short period
>> of time somehow trigger it.
>>
>> Same scenario works perfectly fine when using aufs.
>>
>>>>> During this (and every next) reload few (1-3-5) random requests out of
>>>> all 35+ objects will enter into this strange state, until the user
>>>> aborts the connection.
>>>>
>>>> Access.log excerpts:
>>>> rock: http://pastebin.com/QWTgqRTD
>>>> aufs: http://pastebin.com/0WGdHPP9
>>>>
>>>> Pay attention to '===[]===' marks in the logs, marking user's actions.
>>>>
>>>> Please help! Can you reproduce the problem or it's somehow related to
>>>> my environment?
>>>
>>>
>>> I note that you are only waiting 5-10 seconds between teh refresh and
>>> escape. While this may be long enough, does the trace show any
>>> difference if you leave it for a much longer time such as 30 or 60 seconds?
>>
>> I have seen connections in that state waiting for hours (say 1 night).
>> Just did 30 minutes test:
>>
>> 1390989136.532 1910610 172.16.101.252 TCP_HIT_ABORTED/000 0 GET
>> http://www.w3.org/2008/site/css/minimum - HIER_NONE/- -
>>
>> Checked active_connections for this same connection just before
>> hitting ESC. Here's what Squid reports:
>>
>> Connection: 0x28e8be8
>> FD 22, read 322, wrote 0
>> FD desc: Reading next request
>> in: buf 0x28eb9b0, offset 0, size 4096
>> remote: 172.16.101.252:58477
>> local: 192.168.1.10:3128
>> nrequests: 1
>> uri http://www.w3.org/2008/site/css/minimum
>> logType TCP_HIT
>> out.offset 0, out.size 0
>> req_sz 322
>> entry 0x2935d50/DC7BF6E1F61DA763BBCBD0043D4281A4
>> start 1390987225.923229 (1886.649296 seconds ago)
>> username -
>>
>> Hope this helps!
>>
>> Best,
>> Niki