airplays55@xxxxxxxxx wrote:
> I checked out the squid log analyzer programs, But
> haven't found one that can provide a sample output
> like what I need to see on the report.
> Say for example I go to microsoft.com, click on
> "products", then click on "visual studio .NET"
> I'd like to see this in the logfile:
> http://www.microsoft.com
> http://www.microsoft.com/products
> http://www.microsoft.com/products/visual_studio
> This is a theoretical example as if those are the
> actual URL locations typed into the address bar, or
> clicked via hyperlink.
> I don't see how the access.log can be used to provide
> this kind of report.

In this case the initial request seen by Squid (and logged in access.log) will be the URL typed into the address bar. Any additional content or redirects will be shown after.

> For example, if I simply type microsoft.com in my
> address bar and click on "office" in the left pane,
> then check my access.log, I see 35 entries have been
> added just by clicking the "office" link once.

The first one will be for the page the hyperlink points to, and the rest will be for any redirects and/or additional content needed for the page.

> the access.log doesn't seem to differentiate between what
> the user clicked, and what the webpage requested to
> display the whole page correctly.

Because Squid doesn't see what the user clicked (in this case, "Office") - Squid sees the URL the hyperlink points to (which is what the browser actually requests).

> More specifically, the first 3 entries say:
>
> 127.0.0.1 - - [22/Jan/2005:15:56:31 -0500] "GET
> http://g.microsoft.com/mh_mshp/2 HTTP/1.1" 301 538
> TCP_MISS:DIRECT

If you check in the browser, this is the URL the "Office" hyperlink points to. Again, Squid sees requested URLs, not how the hyperlink was displayed to the user by the browser. In this case, the HTTP status is 301, which means this is a redirect.
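
As a rough illustration of reading the status code out of an entry like the one quoted above, here is a minimal Python sketch. The regular expression and the helper names (parse_line, is_redirect) are my own assumptions for this example, not part of Squid or any log analyzer, and the pattern only covers lines shaped like the quoted one (Common Logfile format with the Squid result code appended).

```python
import re

# Assumed layout, taken from the quoted entry:
# client ident user [time] "METHOD URL PROTO" status size RESULT:HIERARCHY
LINE_RE = re.compile(
    r'(?P<client>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) (?P<proto>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+) (?P<result>\S+)'
)

def parse_line(line):
    """Return a dict of fields, or None if the line does not match."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    entry = m.groupdict()
    entry["status"] = int(entry["status"])
    entry["size"] = int(entry["size"])
    return entry

def is_redirect(entry):
    """True for the HTTP status codes that send the browser to another URL."""
    return entry["status"] in (301, 302, 303, 307, 308)

# The first entry from the log excerpt, joined onto one line.
line = ('127.0.0.1 - - [22/Jan/2005:15:56:31 -0500] '
        '"GET http://g.microsoft.com/mh_mshp/2 HTTP/1.1" 301 538 TCP_MISS:DIRECT')
entry = parse_line(line)
print(entry["url"], entry["status"], is_redirect(entry))
```

This only tells you that the request was answered with a redirect. It still says nothing about what the user clicked.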
> 127.0.0.1 - - [22/Jan/2005:15:56:32 -0500] "GET
> http://office.microsoft.com/home/default.aspx
> HTTP/1.1" 301 467 TCP_MISS:DIRECT

This is another redirect.

> 127.0.0.1 - - [22/Jan/2005:15:56:32 -0500] "GET
> http://office.microsoft.com/en-us/default.aspx
> HTTP/1.1" 200 52134 TCP_MISS:DIRECT

The HTTP status code of 200 indicates that this is the page that was ultimately shown to the user.

> I don't see how the access.log can be used to provide
> this kind of report.

It can't. All Squid sees (and logs) is a series of HTTP requests from the browser. It doesn't know how those requests were rendered by the browser.

Also, I see you are using the Common Logfile format. I would really recommend you use the Squid native log format - most log analyzers can use both, and the Squid native log format provides a great deal more detail.

> How is ANY logfile analyzer going to tell the
> difference between the first entry (which the user
> clicked on) and the second/third entries (which were
> requested by the html from the first entry)?

Perhaps by content-type and timing (look at the first text/html request in a series of requests within a small window of time from the same client). But there's no way to know with 100% certainty. If you need that level of certainty, you should be looking at the browser history and not your proxy logs.

> Is there is a squid configuration parameter that will
> allow the logs to be filtered appropriately?

No - because what the browser sends to Squid and what the browser shows the client are two entirely different things. Again, for the information you want, the browser's history is the best place to look.

Adam
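
P.S. A rough sketch of the content-type and timing heuristic mentioned above, in Python. It assumes the log has already been parsed into (timestamp, client, url, content_type) tuples; the Squid native format does log a content type for each request, which the Common Logfile format does not. The function name likely_page_views, the 2-second window, and the sample requests are all made up for illustration, and the result is a guess, not a certainty.

```python
from collections import defaultdict

def likely_page_views(requests, window=2.0):
    """Pick the first text/html request in each burst of requests per client.

    requests: iterable of (timestamp_seconds, client, url, content_type).
    A new burst starts when a client has been quiet for more than `window`
    seconds. The window size is an arbitrary assumption.
    """
    by_client = defaultdict(list)
    for ts, client, url, ctype in requests:
        by_client[client].append((ts, url, ctype))

    views = []
    for client, reqs in by_client.items():
        reqs.sort()
        last_ts = None
        found = False
        for ts, url, ctype in reqs:
            if last_ts is None or ts - last_ts > window:
                found = False  # quiet period: treat this as a new burst
            if not found and ctype.startswith("text/html"):
                views.append((client, ts, url))
                found = True
            last_ts = ts
    return views

# Made-up sample data (not taken from any real log).
requests = [
    (100.0, "10.0.0.1", "http://example.com/", "text/html"),
    (100.3, "10.0.0.1", "http://example.com/style.css", "text/css"),
    (100.5, "10.0.0.1", "http://example.com/logo.gif", "image/gif"),
    (100.9, "10.0.0.1", "http://example.com/frame.html", "text/html"),
    (130.0, "10.0.0.1", "http://example.com/products", "text/html"),
    (130.4, "10.0.0.1", "http://example.com/banner.gif", "image/gif"),
]
views = likely_page_views(requests)
for client, ts, url in views:
    print(client, ts, url)
```

Note that a redirect response can itself carry a text/html content type, so with this approach the first URL in a redirect chain may be the one reported, and a user who clicks quickly within the window will be undercounted.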