I just made a new discovery: the problem seemingly disappears when running Squid with -N, which takes the disker process out of the equation. Could it be a worker-disker IPC issue rather than a rock-store-specific problem?

Best,
Niki

On Wed, Jan 29, 2014 at 12:00 PM, Nikolai Gorchilov <niki@xxxxxxxx> wrote:
> Dear Amos,
>
> On Wed, Jan 29, 2014 at 5:18 AM, Amos Jeffries <squid3@xxxxxxxxxxxxx> wrote:
>>>
>>> The simple steps to reproduce:
>>>
>>> 1. Empty the store dir and recreate the swap with -z
>>
>> Did you leave sufficient time after doing this to ensure that the -z
>> operation completed successfully and all the "squid -z" spawned
>> processes stopped?
>
> Yes.
>
>>> 2. Start Squid with the above config
>>> 3. Open a new private browser session (or clear the browser cache) and
>>> request www.w3.org. Most of the 35+ requested items will get cached.
>>> 4. Once the page loads fully, request a full reload (Ctrl-Shift-R/Cmd-Shift-R)
>>
>> The step 4 operation is blocked by the "reload_into_ims on" config directive,
>> which turns these force-reload (Cache-Control: no-cache) requests into
>> If-Modified-Since (IMS) requests.
>>
>> Under correct protocol operation they should be TCP_MISS responses.
>
> Exactly. This problem exists only when delivering TCP_HITs.
> TCP_REFRESH_UNMODIFIED, for instance, works fine.
>
> As mentioned in the config comments, reload_into_ims is used only to make
> the problem easier to reproduce: having multiple TCP_HITs in a short period
> of time somehow triggers it.
>
> The same scenario works perfectly fine when using aufs.
>
>>> During this (and every subsequent) reload, a few (1 to 5) random requests
>>> out of all 35+ objects will enter this strange state until the user
>>> aborts the connection.
>>>
>>> Access.log excerpts:
>>> rock: http://pastebin.com/QWTgqRTD
>>> aufs: http://pastebin.com/0WGdHPP9
>>>
>>> Pay attention to the '===[]===' marks in the logs, which mark the user's actions.
>>>
>>> Please help!
>>> Can you reproduce the problem, or is it somehow related to my environment?
>>
>> I note that you are only waiting 5-10 seconds between the refresh and
>> escape. While this may be long enough, does the trace show any
>> difference if you leave it for a much longer time, such as 30 or 60 seconds?
>
> I have seen connections in that state waiting for hours (e.g. overnight).
> I just did a 30-minute test:
>
> 1390989136.532 1910610 172.16.101.252 TCP_HIT_ABORTED/000 0 GET http://www.w3.org/2008/site/css/minimum - HIER_NONE/- -
>
> I checked active_connections for this same connection just before
> hitting ESC. Here's what Squid reports:
>
> Connection: 0x28e8be8
> FD 22, read 322, wrote 0
> FD desc: Reading next request
> in: buf 0x28eb9b0, offset 0, size 4096
> remote: 172.16.101.252:58477
> local: 192.168.1.10:3128
> nrequests: 1
> uri http://www.w3.org/2008/site/css/minimum
> logType TCP_HIT
> out.offset 0, out.size 0
> req_sz 322
> entry 0x2935d50/DC7BF6E1F61DA763BBCBD0043D4281A4
> start 1390987225.923229 (1886.649296 seconds ago)
> username -
>
> Hope this helps!
>
> Best,
> Niki
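To spot hung hits like the one in the 30-minute test without watching the browser, the access.log can be scanned for TCP_HIT_ABORTED entries that sent 0 bytes after a long wait. The Python sketch that follows is a minimal, hypothetical helper, not part of Squid: it assumes the default native log format (time, elapsed ms, client, result/status, bytes, method, URL, ident, hierarchy/peer, type) and assumes the first field is the completion time. The names parse_native_line and is_stuck_hit and the 60-second threshold are illustrative choices. Subtracting the duration from the completion time gives roughly 1390987225.922 for the logged line, which is consistent with the "start 1390987225.923229" reported by active_connections.

```python
from dataclasses import dataclass


@dataclass
class AccessLogEntry:
    timestamp: float   # assumed: completion time, seconds since the epoch
    elapsed_ms: int    # transaction duration in milliseconds
    client: str
    result_code: str   # e.g. "TCP_HIT_ABORTED"
    http_status: int   # "000" means no status was ever sent
    size: int          # bytes delivered to the client
    method: str
    url: str
    hierarchy: str     # e.g. "HIER_NONE/-"

    @property
    def start_time(self) -> float:
        """Estimated request start: completion time minus duration."""
        return self.timestamp - self.elapsed_ms / 1000.0


def parse_native_line(line: str) -> AccessLogEntry:
    """Parse one line of Squid's native access.log format (hypothetical helper)."""
    fields = line.split()
    if len(fields) < 10:
        raise ValueError(f"expected 10 fields, got {len(fields)}: {line!r}")
    result_code, status = fields[3].split("/", 1)
    return AccessLogEntry(
        timestamp=float(fields[0]),
        elapsed_ms=int(fields[1]),
        client=fields[2],
        result_code=result_code,
        http_status=int(status),
        size=int(fields[4]),
        method=fields[5],
        url=fields[6],
        hierarchy=fields[8],
    )


def is_stuck_hit(entry: AccessLogEntry, min_seconds: float = 60.0) -> bool:
    """Heuristic: a cache hit the client aborted after waiting at least
    min_seconds (hypothetical threshold) without receiving a single byte."""
    return (
        entry.result_code == "TCP_HIT_ABORTED"
        and entry.size == 0
        and entry.elapsed_ms / 1000.0 >= min_seconds
    )


if __name__ == "__main__":
    line = ("1390989136.532 1910610 172.16.101.252 TCP_HIT_ABORTED/000 0 GET "
            "http://www.w3.org/2008/site/css/minimum - HIER_NONE/- -")
    entry = parse_native_line(line)
    print(f"{entry.start_time:.3f}")  # 1390987225.922
    print(is_stuck_hit(entry))        # True
```

Feeding every line of access.log through is_stuck_hit would list the stuck requests for both the rock and aufs runs, which makes the comparison between the two stores easier to repeat.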