Hi again...
After some more research, it looks like Squid can only see the destination host and port of an HTTPS request (the target of the CONNECT request). The only way to get the URL path and query string is to use ssl_bump to decrypt the HTTPS traffic.
To use ssl_bump, I have to compile Squid from source with --enable-ssl, create a certificate, and add it to the trusted certificates on every other VM that proxies through Squid. Squid can then decrypt the HTTPS requests, see the paths and query arguments, and apply the regexes to those URLs so that only explicitly whitelisted URLs are allowed.
Is this correct?
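
To illustrate what I mean, here is a minimal sketch in Python of the mismatch seen in the debug log quoted below. It assumes Python's re module behaves like Squid's regex engine for these simple patterns (Squid itself is not involved), and acl_matches is a hypothetical helper, not a Squid function. The patterns and the two target strings are copied from the log and the curl command in this thread.

```python
import re

# Patterns as they appear in the "looking for" line of the debug log.
# The log shows the entries of one ACL joined into a single alternation.
PATTERNS = [
    r"^https://wiki.squid-cache.org/SquidFaq/SquidAcl.*",
    r"squid-cache.org/SquidFaq/SquidAcl.*",
]
COMBINED = re.compile("|".join(f"({p})" for p in PATTERNS))


def acl_matches(target: str, regex: "re.Pattern[str]" = COMBINED) -> bool:
    """Hypothetical helper: True if the regex is found anywhere in target."""
    return regex.search(target) is not None


# What the log says is checked for an un-bumped HTTPS request (CONNECT).
connect_target = "wiki.squid-cache.org:443"
# What would be checked if the full URL were visible.
full_url = "https://wiki.squid-cache.org/SquidFaq/SquidAcl"

print(acl_matches(connect_target))  # False: no scheme and no path to match
print(acl_matches(full_url))        # True

# A host-only pattern matches the CONNECT target, as in the second log excerpt.
host_only = re.compile(r"squid-cache.org.*")
print(acl_matches(connect_target, host_only))  # True
```

So a pattern that depends on the scheme or the path can never match while only host:port is available, which is consistent with the "result for 'whitelist' is 0" line in the log.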
On Mon, Oct 15, 2018 at 11:56 AM RB <ronthecon@xxxxxxxxx> wrote:
I think I know what the issue is which can give us a clue to what is going on.

2018/10/15 15:05:45.083 kid1| RegexData.cc(71) match: aclRegexData::match: checking 'wiki.squid-cache.org:443'
2018/10/15 15:05:45.084 kid1| RegexData.cc(82) match: aclRegexData::match: looking for '(^https://wiki.squid-cache.org/SquidFaq/SquidAcl.*)|(squid-cache.org/SquidFaq/SquidAcl.*)'
2018/10/15 15:05:45.084 kid1| Acl.cc(321) checklistMatches: ACL::ChecklistMatches: result for 'whitelist' is 0

The above seems to be applying the regex to "wiki.squid-cache.org:443" instead of to "https://wiki.squid-cache.org/SquidFaq/SquidAcl". I added the regex ".*squid-cache.org.*" to my list of regular expressions and now I see this.

2018/10/15 15:16:03.641 kid1| RegexData.cc(71) match: aclRegexData::match: checking 'wiki.squid-cache.org:443'
2018/10/15 15:16:03.641 kid1| RegexData.cc(82) match: aclRegexData::match: looking for '(^https?://[^/]+/wiki.squid-cache.org/SquidFaq/SquidAcl.*)|(squid-cache.org.*)'
2018/10/15 15:16:03.641 kid1| RegexData.cc(93) match: aclRegexData::match: match '(^https?://[^/]+/wiki.squid-cache.org/SquidFaq/SquidAcl.*)|(squid-cache.org.*)' found in 'wiki.squid-cache.org:443'
2018/10/15 15:16:03.641 kid1| Acl.cc(321) checklistMatches: ACL::ChecklistMatches: result for 'whitelist' is 1

Any idea why url_regex wouldn't try to match the full url and instead only matches on the subdomain, host domain, and port?

The Squid FAQ says the following:

url_regex: URL regular expression pattern matching
urlpath_regex: URL-path regular expression pattern matching, leaves out the protocol and hostname

with this example given

acl special_url url_regex ^http://www.squid-cache.org/Doc/FAQ/$

This seems to be the case between 3.3.8 (default on ubuntu 14.04) and 3.5.12 (default on ubuntu 16.04).

Is there another configuration that forces url_regex to match the entire url?
or should I use a different acl type?

Best,

On Mon, Oct 15, 2018 at 11:11 AM RB <ronthecon@xxxxxxxxx> wrote:

Hi Matus,

Thanks for responding so quickly. I uploaded my configurations here if that is more helpful: https://bit.ly/2NF4zNb

The config that I previously shared is called squid_corp.conf. I also noticed that if I don't use regular expressions and instead use domains, it works correctly:

# acl whitelist url_regex "/vagrant/squid_sites.txt"
acl whitelist url_regex .squid-cache.org

Every time my squid.conf or my squid_sites.txt is modified, I restart the squid service

sudo service squid3 restart

Then I use curl to test and now the url works.

$ curl -sSL --proxy localhost:3128 -D - https://wiki.squid-cache.org/SquidFaq/SquidAcl -o /dev/null 2>&1
HTTP/1.1 200 Connection established

HTTP/1.1 200 OK
Date: Mon, 15 Oct 2018 14:47:33 GMT
Server: Apache/2.4.7 (Ubuntu)
Vary: Cookie,User-Agent,Accept-Encoding
Content-Length: 101912
Cache-Control: max-age=3600
Expires: Mon, 15 Oct 2018 15:47:33 GMT
Content-Type: text/html; charset=utf-8

But this does not allow me to get more granular. I can only allow all subdomains and paths for the domain squid-cache.org but I'm unable to only allow the regular expressions if I put them inline or put them in squid_sites.txt.

# acl whitelist url_regex "/vagrant/squid_sites.txt"
acl whitelist url_regex ^https://wiki.squid-cache.org/SquidFaq/SquidAcl.*
acl whitelist url_regex .*squid-cache.org/SquidFaq/SquidAcl.*

If I put them inline like I have above, when I restarted squid it says the following

2018/10/15 14:54:48 kid1| strtokFile: .*squid-cache.org/SquidFaq/SquidAcl.* not found

If I put the expressions in the squid_sites.txt the above "not found" message isn't shown and this is the debug output in /var/log/squid3/cache.log (full output https://pastebin.com/NVwRxVmQ).

2018/10/15 15:05:45.083 kid1| Checklist.cc(275) matchNode: 0x7fb0068da2b8 matched=1 async=0 finished=0
2018/10/15 15:05:45.083 kid1| Acl.cc(336) matches: ACLList::matches: checking whitelist
2018/10/15 15:05:45.083 kid1| Acl.cc(319) checklistMatches: ACL::checklistMatches: checking 'whitelist'
2018/10/15 15:05:45.083 kid1| RegexData.cc(71) match: aclRegexData::match: checking 'wiki.squid-cache.org:443'
2018/10/15 15:05:45.084 kid1| RegexData.cc(82) match: aclRegexData::match: looking for '(^https://wiki.squid-cache.org/SquidFaq/SquidAcl.*)|(squid-cache.org/SquidFaq/SquidAcl.*)'
2018/10/15 15:05:45.084 kid1| Acl.cc(321) checklistMatches: ACL::ChecklistMatches: result for 'whitelist' is 0
2018/10/15 15:05:45.084 kid1| Acl.cc(349) matches: whitelist mismatched.
2018/10/15 15:05:45.084 kid1| Acl.cc(354) matches: whitelist result is false

So it's failing the regular expression check. If I use grep to verify if the regex works, it does.

> are you aware that you can only see CONNECT in https requests, unless using
> ssl_bump?

Ah interesting. Are you saying that my https connections will always fail unless I use ssl_bump to decrypt https to http connections? How would this work correctly in production? Does squid proxy only block urls if it detects http? How do you configure ssl_bump to work in this case? and is that viable in production?

> of course it matches all, everything should match "all".
> I more wonder why doesn't it match "http_access allow localhost"

> have you reloaded squid config after changing it?
> Did squid confirm it?

Would you have an example of one entire config file that would work to whitelist an http/https url using a regular expression?

Best,

On Mon, Oct 15, 2018 at 4:49 AM Matus UHLAR - fantomas <uhlar@xxxxxxxxxxx> wrote:

On 15.10.18 01:04, RB wrote:
>I'm trying to deny all urls except for only whitelisted regular
>expressions. I have only this regular expression in my file
>"squid_sites.txt"
>
>^https://wiki.squid-cache.org/SquidFaq/SquidAcl.*
are you aware that you can only see CONNECT in https requests, unless using
ssl_bump?
>acl bastion src 10.5.0.0/1
>acl whitelist url_regex "/vagrant/squid_sites.txt"
[...]
>http_access allow manager localhost
>http_access deny manager
>http_access deny !Safe_ports
>http_access allow localhost
>http_access allow purge localhost
>http_access deny purge
>http_access deny CONNECT !SSL_ports
>
>http_access allow bastion whitelist
>http_access deny bastion all
>I tried enabling debugging and tailing /var/log/squid3/cache.log but my
>curl statement keeps matching "all".
of course it matches all, everything should match "all".
I more wonder why doesn't it match "http_access allow localhost"
>$ curl -sSL --proxy localhost:3128 -D - "
>https://wiki.squid-cache.org/SquidFaq/SquidAcl" -o /dev/null 2>&1 | grep
>Squid
>X-Squid-Error: ERR_ACCESS_DENIED 0
>Any ideas what I'm doing wrong?
have you reloaded squid config after changing it?
Did squid confirm it?
--
Matus UHLAR - fantomas, uhlar@xxxxxxxxxxx ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
It's now safe to throw off your computer.
_______________________________________________
squid-users mailing list
squid-users@xxxxxxxxxxxxxxxxxxxxx
http://lists.squid-cache.org/listinfo/squid-users