On 30/05/11 00:22, Ghassan Gharabli wrote:
Hello,
I was trying to cache this website :
http://down2.nogomi.com.xn55571528exgem0o65xymsgtmjiy75924mjqqybp.nogomi.com/M15/Alaa_Zalzaly/Atrak/Nogomi.com_Alaa_Zalzaly-3ali_Tar.mp3
How do you cache it, or rewrite its URL to a static domain? The host name is:
down2.nogomi.com.xn55571528exgem0o65xymsgtmjiy75924mjqqybp.nogomi.com
Does that URL match this regex example? Or can anyone help me match this
Nogomi.com CDN?
#generic http://variable.domain.com/path/filename."ex", "ext" or "exte"
The line above describes what the 'm/' pattern produces for the $y array.
Well, kind of...
$1 is anything. Utter garbage. It could be a whole URL's worth of loose bits:
"http://evil.example.com/cache-poison?url=http://"
$2 appears to be a two-part domain name (i.e. "example.com" as opposed to
a three-part "www.example.com")
$3 is the file or script name.
$4 is the file extension type.
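If you want to see exactly what lands in each group, print them for a real URL. A minimal sketch, assuming the pattern is used exactly as quoted; split_url is a name made up for illustration, not part of your script. By my reading the lazy (.*?) stops at the first dot, so for the URL from your mail $1 is only "down2" and the rest of the host name ends up in $2, but run it yourself to be sure:

```perl
use strict;
use warnings;

# The pattern quoted from the script, unchanged.
my $re = qr/^http:\/\/(.*?)(\.[^\.\-]*?\..*?)\/([^\?\&\=]*)\.([\w\d]{2,4})\??.*$/;

# Hypothetical helper: returns the four captures for a URL,
# or an empty list when the URL does not match.
sub split_url {
    my ($url) = @_;
    my @parts = $url =~ $re;
    return @parts;
}

my @y = split_url('http://down2.nogomi.com.xn55571528exgem0o65xymsgtmjiy75924mjqqybp.nogomi.com/M15/Alaa_Zalzaly/Atrak/Nogomi.com_Alaa_Zalzaly-3ali_Tar.mp3');

# Print each capture on its own line as "$1 = ...", "$2 = ..." and so on.
for my $i (0 .. $#y) {
    printf "\$%d = %s\n", $i + 1, $y[$i];
}
```

If $2 still contains the long random-looking token, the store URL will differ per request and nothing gets shared in the cache.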
#http://cdn1-28.projectplaylist.com
#http://s1sdlod041.bcst.cdn.s1s.yimg.com
} elsif (m/^http:\/\/(.*?)(\.[^\.\-]*?\..*?)\/([^\?\&\=]*)\.([\w\d]{2,4})\??.*$/)
{
@y = ($1,$2,$3,$4);
$y[0] =~
s/([a-z][0-9][a-z]dlod[\d]{3})|((cache|cdn)[-\d]*)|([a-zA-Z]+-?[0-9]+(-[a-zA-Z]*)?)/cdn/;
I assume you are trying to compress
"down2.nogomi.com.xn55571528exgem0o65xymsgtmjiy75924mjqqybp" down to
"cdn" without allowing any non-FQDN garbage to compress?
I would use: s/[a-z0-9A-Z\.\-]+/cdn/
and add a fixed portion to ensure that $y[1] is one of the base domains
in the CDN. Just in case some other site uses the same host naming scheme.
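Something like this is what I mean. A rough sketch only: store_url and @cdn_base_domains are hypothetical names, the list of base domains is my assumption, and it also assumes the path alone identifies the file, which you would need to confirm for this site.

```perl
use strict;
use warnings;

# Assumption: the base domains that belong to this CDN. Extend as needed.
my @cdn_base_domains = ('nogomi.com');
my $base = join '|', map { quotemeta($_) } @cdn_base_domains;

# Hypothetical helper: returns a normalised store URL, or nothing when the
# URL does not belong to the CDN and should be left alone.
sub store_url {
    my ($url) = @_;
    # The host prefix may only contain FQDN characters, so a "/" or "?"
    # cannot sneak in ahead of the fixed base domain.
    if ($url =~ m{^http://[a-zA-Z0-9.\-]+\.($base)/([^?&=]*)\.(\w{2,4})(?:\?.*)?\z}) {
        return "storeurl://cdn.$1/$2.$3";
    }
    return;
}

my $out = store_url('http://down2.nogomi.com.xn55571528exgem0o65xymsgtmjiy75924mjqqybp.nogomi.com/M15/Alaa_Zalzaly/Atrak/Nogomi.com_Alaa_Zalzaly-3ali_Tar.mp3');
print defined $out ? "$out\n" : "no rewrite\n";
```

Any URL whose host does not end in one of the listed base domains returns nothing, so it is stored under its original URL.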
print $x . "storeurl://" . $y[0] . $y[1] . "/" . $y[2] . "." .
$y[3] . "\n";
I also tried to study regular expressions, but the examples I found
only cover simple URLs. I really need to study more about
complex URLs.
Relax. You do not have to combine them all into one regex.
You can make it simple and efficient to start with and improve as your
knowledge does. If in doubt, play it safe: storeurl_rewriting has at its
core the risk of an XSS attack on your own clients (in the example above
$y[0] comes very close).
The hardest part is knowing for certain what all the parts of the URL
mean to the designers of that website. So that you only erase the
useless trackers and routing tags, while keeping everything important.
Amos
--
Please be using
Current Stable Squid 2.7.STABLE9 or 3.1.12
Beta testers wanted for 3.2.0.7 and 3.1.12.1