OK, Amos. Completely agree with your points. I didn't want to enter into
such a lengthy discussion about a small optional feature that brings
little CPU optimisation. As I said earlier, I don't mind rewriting the
same URL twice (once on the HTCP query, then on the HTTP request).

Peace! :-)

Let's discuss working solutions for:
a/ No StoreID is used outside Squid
b/ StoreID normalization on incoming ICP/HTCP requests
c/ false-negative HTTP revalidation

Best,
Niki

On Fri, Feb 14, 2014 at 5:20 AM, Amos Jeffries <squid3@xxxxxxxxxxxxx> wrote:
> On 14/02/2014 2:20 p.m., Nikolai Gorchilov wrote:
>> On Fri, Feb 14, 2014 at 2:04 AM, Amos Jeffries <squid3@xxxxxxxxxxxxx> wrote:
>>> On 2014-02-14 09:04, Alex Rousskov wrote:
>>>>
>>>> On 02/13/2014 05:11 AM, Nikolai Gorchilov wrote:
>>>>
>>>>> I'd suggest first to review all possible StoreID use cases involving
>>>>> cache peers before proceeding further.
>>>>>
>>>>> Let's define A as the originating proxy and B as the next-hop proxy in
>>>>> the request forwarding chain. UDP is an alias for either an ICP or an
>>>>> HTCP query, while TCP is a synonym for the HTTP request that follows.
>>>>>
>>>>> Here are all valid usage scenarios I could think of:
>>>>> 1. A & B use the same StoreID rewriting logic
>>>>>    - No StoreID processing for incoming UDP on B is necessary
>>>>>    - UDP request uses StoreID
>>>>>    - TCP request uses URL
>>>>> 2. A & B use different StoreID rewriting logic
>>>>>    - StoreID processing on incoming UDP on B
>>>>>    - UDP request uses URL
>>>>>    - TCP request uses URL
>>>>> 3. A with StoreID enabled, B - disabled
>>>>>    - UDP request uses URL
>>>>>    - TCP request uses URL
>>>>> 4. A with StoreID disabled, B - enabled
>>>>>    - StoreID processing on incoming UDP on B
>>>>>    - UDP request uses URL
>>>>>    - TCP request uses URL
>>>>>
>>>>> In order to support all of the above we need the following two config
>>>>> options:
>>>>> - configuration switch to enable or disable StoreID processing on
>>>>>   incoming UDP
>>>>> - cache_peer option to enable/disable querying the respective peer
>>>>>   using StoreID instead of URL
>>>>
>>>>
>>>>
>>>>> If you see any rifts in the above logic, please say.
>>>>
>>>>
>>>> I question the value of supporting the implied "no StoreID processing"
>>>> optimization above. AFAICT, if Squid always uses URLs for anything
>>>> outside internal storage, everything would work correctly and all use
>>>> cases will be supported well, without any additional options.
>>>>
>>>> If somebody wants to extend ICP/HTCP to include StoreId in the request
>>>> (as an optional additional field), they may do so, but that optional
>>>> optimization does not change the overall design principle: StoreId for
>>>> the internal storage; URL for everything else.
>>>
>>>
>>> Exactly.
>>>
>>>
>>> Keeping two distinct cache_peer internal index representations in sync with
>>> regard to how some helper service is producing the IDs is not as trivial a
>>> job as implied by the proposal.
>>> Consider the process of upgrading either Squid or the helper on server A
>>> simply *10 seconds* earlier than server B. For that period one of the
>>> services may be pushing garbage cache IDs into the other. In that same time
>>> the latest Squid could process several thousand requests - not exactly a
>>> trivial amount of cache churn.
>>
>> UDP requests don't push anything. They just check if the peer has an
>> object. If a wrong (out-of-sync) cache ID is used, it's not a big deal.
>> A UDP_MISS response will be generated, and the originating peer will
>> decide what to do next.
>
> But during this period there will be a huge number of false-negative
> results, causing a desync in the frontend proxy: it will either believe
> that the object is not cached (adding it to its own cache and bumping out
> other existing content), or fetch it via some other route (possibly
> causing the cache on the alternative path to churn).
>
> Either way it's a waste of resources and work just so a small
> optimization can take place in ICP/HTCP packet handling, especially since
> chances are high that the expensive StoreID lookup in the peer will be
> short-circuited by the helper response cache anyway.
>
>
>>
>>> Also, the connection between those peers is not necessarily a direct 1-hop
>>> connection. It may involve any kind of HTTP interception software
>>> (firewalls, deep packet inspectors, etc.) overlooked by even the most
>>> well-intentioned administrator.
>>
>> We're talking ICP/HTCP here. HTTP request shall always go with URL....
>
> You just made the mistake of assuming "HTTP interception software" means
> TCP. It does not.
>
> HTTP is transported over both TCP and UDP. HTCP for example has full
> headers and is used at times for cache invalidation. Then there is the
> CoAP protocol.
>
>>
>> I really don't understand your logic. Both you and Alex seem to be OK
>> with the fact that Squid is using StoreID during HTTP with cache peers
>> (let's call it "known limitation"), but using StoreID for ICP/HTCP
>> queries is considered a bug that needs a fix.
>
> No. We are both *not okay* with using StoreID for the HTTP requests
> between peers.
>
> Alex stated the overall design principle:
> "StoreId for the internal storage; URL for everything else."
>
>
> "internal storage" != HTTP.
>
>
>>
>> For me it's quite the opposite - StoreID over HTTP shall be fixed
>> ASAP, StoreID over ICP/HTCP shall be considered "known limitation".
>
> There is no "StoreID over X", never was.
>
> StoreID leaving the Squid instance in traffic is a bug.
>
> The known limitation for the StoreID model is that it leads to a high
> false-negative rate for HTTP revalidation.
> It causes a disconnect between the original request used to cache the
> object and the current request. So the ETag header for the cached object
> does not always match the currently requested URL, which causes a refresh
> update with new content.
>
> Amos
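
To make points a/ and b/ concrete, here is a minimal Python sketch of the principle quoted above ("StoreId for the internal storage; URL for everything else"). It is not Squid code. The `Proxy` class, the `rewrite_a` / `rewrite_b` rules and the example.com hosts are all hypothetical, chosen only to model two peers whose StoreID helpers disagree (scenario 2 in the list).

```python
import re

# Hypothetical pattern: numbered CDN mirrors that all serve the same objects.
MIRROR = re.compile(r"^http://cdn\d+\.example\.com/")


def rewrite_a(url: str) -> str:
    """Proxy A's illustrative StoreID rule (an assumption, not a real helper)."""
    return MIRROR.sub("http://cdn.example.com.squid.internal/", url)


def rewrite_b(url: str) -> str:
    """Proxy B's rule: same mirrors, different internal key naming."""
    return MIRROR.sub("http://mirror.squid.internal/", url)


class Proxy:
    def __init__(self, rewrite):
        self.rewrite = rewrite   # local StoreID logic
        self.store = {}          # internal storage, keyed by StoreID only

    def cache(self, url: str, body: bytes) -> None:
        self.store[self.rewrite(url)] = body

    def peer_query_key(self, url: str) -> str:
        """What goes on the wire in an ICP/HTCP-style query: the URL itself."""
        return url

    def answer_peer_query(self, wire_key: str) -> str:
        """Normalise the incoming key with the local rule, then look it up."""
        return "HIT" if self.rewrite(wire_key) in self.store else "MISS"


a = Proxy(rewrite_a)
b = Proxy(rewrite_b)
b.cache("http://cdn1.example.com/v.mp4", b"payload")

url = "http://cdn2.example.com/v.mp4"
print(b.answer_peer_query(a.peer_query_key(url)))   # URL on the wire
print(b.answer_peer_query(a.rewrite(url)))          # A's StoreID leaked onto the wire
```

With the URL on the wire, B normalises it with its own rule and reports a hit even though A and B rewrite differently. When A's StoreID is sent instead, B cannot map it back to its own key and reports a miss, which is the false-negative case described in the thread.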