On Tue, 31 May 2011 20:47:13 +0300, Ghassan Gharabli wrote:
I'm sorry again for the last email, but I also have something to ask
about.
(m/^http:\/\/(.*?)(\.[^\.\-]*?\..*?)\/([^\?\&\=]*)\.([\w\d]{2,4})\??.*$/)
Now I'm talking about this element, ([\w\d]{2,4}), which seems to match
.ex, .ext or .exte, for example .mp3.
I understand that \w matches an alphanumeric character, including "_",
same as [A-Za-z0-9_] in ASCII.
So I know it finds numbers and letters, including underscore,
which is correct here. But the thing that is confusing to me is that
we have also used \d, which matches a digit, same as [0-9] in
ASCII. So we have used 0-9 twice! Any comment about it?
No idea. As you say, it seems to be redundant.
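The redundancy is easy to check. The sketch below uses Python's re module rather than Perl, which is an assumption made only so the check is simple to run; for ASCII input \w and \d mean the same thing in both. It compares which printable ASCII characters each class accepts.

```python
import re
import string

# Every printable ASCII character, tested one at a time against each class.
candidates = string.printable

with_d = {c for c in candidates if re.fullmatch(r"[\w\d]", c)}
without_d = {c for c in candidates if re.fullmatch(r"\w", c)}
digits = {c for c in candidates if re.fullmatch(r"\d", c)}

# \d adds nothing to the class: every digit is already covered by \w.
print(with_d == without_d)
print(digits <= without_d)
```

So ([\w\d]{2,4}) could be written as (\w{2,4}) without changing what it matches.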
I'm also seeing these URLs again:
#generic http://variable.domain.com/path/filename."ex", "ext" or "exte"
#http://cdn1-28.projectplaylist.com
#http://s1sdlod041.bcst.cdn.s1s.yimg.com
^ means that we match the beginning of a line or string.
m/^http:\/\/ ... then at the start we used (.*?), which seems to be there
to find anything!
Yes.
If we want to look at this URL:
#http://s1sdlod041.bcst.cdn.s1s.yimg.com
If I'm correct, then (.*?) matches "s1sdlod041", and then with the
second element (\.[^\.\-]*?\..*?) we move to the "." after
"s1sdlod041", so now we have "http://s1sdlod041.". But I want to know
about "[^\.\-]*?\..*?": inside the [] we used ^ for \. and \-,
because we are also finding dashes or dots? After that we used "*" for
anything, and then a question mark "?". Something else confusing to
me is "\.." or "\..*?".
(.*?) is meant to match the whole of "s1sdlod041.bcst.cdn.s1s", but can
just as well match "evil.com/?url=http://blah". Then...
Maybe a bug: this should probably be: ([\w\-\.]*?) to rule out that
second case.
(\.[^\.\-]*?\..*?) is meant to match ".yimg.com", but can also match
".yimg.com/blah/blah". Then...
Maybe a bug: this should probably be: (\.[^\.\-]*?\.[\w]*?) to rule
that out and make the next bit match the whole path instead of just
the filename.
\/ matches a "/". Then...
([^\?\&\=]*) matches "filename" or nothing. Then...
\. matches a ".". Then...
([\w\d]{2,4}) matches 2-4 alphanumeric characters (or "_"). Then...
\?? matches a '?' or nothing. Then...
.*$ matches anything else.
Maybe a bug: these last two should probably be: (\?.*)?$ to avoid a
lot more evilness.
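To see what the three suggested changes buy, here is a rough sketch in Python's re module (not the Perl helper itself; these patterns behave the same in both). It compares the original pattern with one that has all three changes applied. Two things are assumptions: the first group is taken as ([\w\-\.]*?), with a * so it can cover more than one character, and both test URLs are made up, including the path on the yimg one.

```python
import re

# The pattern as quoted in the mail.
ORIGINAL = re.compile(
    r"^http:\/\/(.*?)(\.[^\.\-]*?\..*?)\/([^\?\&\=]*)\.([\w\d]{2,4})\??.*$"
)

# The same pattern with the three "maybe a bug" changes applied.
# Assumption: the first group uses *? so it can span several host labels.
TIGHTENED = re.compile(
    r"^http:\/\/([\w\-\.]*?)(\.[^\.\-]*?\.[\w]*?)\/"
    r"([^\?\&\=]*)\.([\w\d]{2,4})(\?.*)?$"
)

# Hypothetical test URLs.
good = "http://s1sdlod041.bcst.cdn.s1s.yimg.com/path/file.mp3"
evil = "http://evil.com/?url=http://blah.example.com/file.mp3"

good_groups = TIGHTENED.match(good).groups()
evil_original = ORIGINAL.match(evil)
evil_tightened = TIGHTENED.match(evil)

print(good_groups)
# The original accepts the URL-in-a-URL, the tightened pattern does not.
print(evil_original is not None, evil_tightened is None)
```

With the tightened pattern the hostname splits into "s1sdlod041.bcst.cdn.s1s" and ".yimg.com", and the third group carries "path/file".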
Another question, about ([^\?\&\=]*): umm, I think this one is for
folders, or what?
I saw the slash \/ before it, so it seems to catch
/?url=blah&C=blah2, with the "*" matching "blah" and "blah2".
But please, if you don't mind, can you explain or illustrate more
about (\.[^\.\-]*?\..*?), or maybe you can explain it well
see above.
using your way, as I'm sure you are a good teacher, hehehe.
Please explain the whole match to me:
(m/^http:\/\/(.*?)(\.[^\.\-]*?\..*?)\/([^\?\&\=]*)\.([\w\d]{2,4})\??.*$/)
above.
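The individual pieces can also be tried in isolation. The following is a small sketch in Python's re module (an assumption for convenience; these constructs behave the same in Perl). It shows what "*?" and "[^\.\-]" do, and where the groups of the full pattern land on a test URL whose path is made up.

```python
import re

# Lazy vs greedy: *? stops at the first place the rest of the pattern can match.
lazy = re.match(r"(.*?)\.", "s1sdlod041.bcst.cdn").group(1)
greedy = re.match(r"(.*)\.", "s1sdlod041.bcst.cdn").group(1)

# [^\.\-] is a negated class: any single character except "." and "-".
label = re.match(r"\.[^\.\-]*?\.", ".bcst.cdn").group(0)
dashed = re.match(r"\.[^\.\-]*?\.", ".cdn1-28.example")  # "-" blocks the match

# The full pattern from the mail on a hypothetical URL.
pattern = r"^http:\/\/(.*?)(\.[^\.\-]*?\..*?)\/([^\?\&\=]*)\.([\w\d]{2,4})\??.*$"
url = "http://s1sdlod041.bcst.cdn.s1s.yimg.com/path/file.mp3"
groups = re.match(pattern, url).groups()

print(lazy, greedy, label, dashed)
print(groups)
```

In this run the non-greedy first group stops at "s1sdlod041", and the trailing "\..*?" of the second group carries the rest of the hostname up to the first "/". The split between the two groups depends on the input rather than on where the domain name starts.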
I was eager to ask you all these questions from the start, but I was
afraid you would not help anyway.
What I have been working on so far is the FileHippo domain:
http://fs34.filehippo.com/6574/058e5771e07c467cb38d70ab6fbed3c0/Opera_1150b1_int_Setup.exe
In this case we have to change the URL into
"cdn.filehippo.com/6574/Opera_1150b1_int_Setup.exe", because we
removed the hashed folder!
It's okay, I have the script for it:
#cdn, variable 1st path
} elsif (($u =~ /filehippo/) &&
(m/^http:\/\/(.*?)\.(.*?)\/(.*?)\/(.*)\.([a-z0-9]{3,4})(\?.*)?/)) {
@y = ($1,$2,$4,$5);
$y[0] =~ s/[a-z0-9]{2,5}/cdn./;
print $x . "http://" . $y[0] . $y[1] . "/" . $y[2] . "." . $y[3] . "\n";
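For anyone wanting to trace what this branch produces, here is a rough transcription into Python's re module. It is a sketch, not the helper itself: the $x prefix is left out because its value is not shown in the mail, and the same URL is assumed to be in both $u and $_. The function name store_url is hypothetical.

```python
import re

# Same pattern as the Perl branch; like the Perl, it is not anchored at the end.
FILEHIPPO = re.compile(
    r"^http:\/\/(.*?)\.(.*?)\/(.*?)\/(.*)\.([a-z0-9]{3,4})(\?.*)?"
)

def store_url(url):
    """Return the rewritten URL, or None when the branch does not apply."""
    if "filehippo" not in url:
        return None
    m = FILEHIPPO.match(url)
    if m is None:
        return None
    # Group 3 (the first path component) is captured but never used,
    # exactly as in @y = ($1,$2,$4,$5).
    host, domain, _first_dir, rest, ext = m.group(1, 2, 3, 4, 5)
    # Mirrors s/[a-z0-9]{2,5}/cdn./ : only the first run is replaced.
    host = re.sub(r"[a-z0-9]{2,5}", "cdn.", host, count=1)
    return "http://" + host + domain + "/" + rest + "." + ext

url = ("http://fs34.filehippo.com/6574/"
       "058e5771e07c467cb38d70ab6fbed3c0/Opera_1150b1_int_Setup.exe")
result = store_url(url)
print(result)
```

At least in this transcription, the component that gets dropped for the URL above is the first folder ("6574"), so the hashed folder stays in the result. That still gives one stable key per file as long as the hash does not change between requests.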
And it's working 100%. I can get it from cache too. What if I want
to add wlxrs.com, as in ($u =~ /filehippo|wlxrs/)?
Does that match this URL?
http://css.wlxrs.com/HGjlAVvMlW6-1!iEEpuBkgo2TZKpU8RH!W4mH-UPgteZ8OD6Oxte!sCQWfQ1OB7A6B-NZoBS1jrItq7zq!v10A/OOB_30_IllustratedKai/15.40.1211/img/Kai_Sunny_thumbnail.jpg
I don't think so, as it has "!". Where should I add this one to match a
folder like
"/HGjlAVvMlW6-1!iEEpuBkgo2TZKpU8RH!W4mH-UPgteZ8OD6Oxte!sCQWfQ1OB7A6B-NZoBS1jrItq7zq!v10A/"
It will. The "([^\?\&\=]*)" pattern does not prevent '!' or any other
valid weird characters.
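This can be confirmed by running the generic pattern against the wlxrs URL. The sketch below uses Python's re module (an assumption for convenience; the pattern is copied unchanged from the mail).

```python
import re

pattern = r"^http:\/\/(.*?)(\.[^\.\-]*?\..*?)\/([^\?\&\=]*)\.([\w\d]{2,4})\??.*$"
url = ("http://css.wlxrs.com/"
       "HGjlAVvMlW6-1!iEEpuBkgo2TZKpU8RH!W4mH-UPgteZ8OD6Oxte"
       "!sCQWfQ1OB7A6B-NZoBS1jrItq7zq!v10A/"
       "OOB_30_IllustratedKai/15.40.1211/img/Kai_Sunny_thumbnail.jpg")

m = re.match(pattern, url)
path, ext = m.group(3), m.group(4)

# "!" is not one of the three excluded characters (? & =), so it passes.
print(m is not None, "!" in path, ext)
```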
Sometimes the CDN folder comes as the 1st folder, or the 2nd or 3rd.
It depends on the website.
Yes. This comes back to knowing the fine details of what the individual
website or CDN does. The changes have to be customised to individual
sites. If they change anything you have to alter the patterns.
Can you point me to where I should edit this script to handle
WLXRS.COM?
The second maybe-bug I pointed out before, when fixed should make $3
have the whole file path for you to play with.
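As an illustration of "playing with" $3, here is a sketch in Python's re module, not the Perl helper. It assumes the first group is written as ([\w\-\.]*?) and that, for this site, the first folder is the variable part to drop. That site rule is hypothetical and would need checking against real wlxrs.com traffic.

```python
import re

# The generic pattern with the suggested changes applied.
# Assumption: the first group uses *? so it can cover more than one character.
pattern = (r"^http:\/\/([\w\-\.]*?)(\.[^\.\-]*?\.[\w]*?)\/"
           r"([^\?\&\=]*)\.([\w\d]{2,4})(\?.*)?$")
url = ("http://css.wlxrs.com/"
       "HGjlAVvMlW6-1!iEEpuBkgo2TZKpU8RH!W4mH-UPgteZ8OD6Oxte"
       "!sCQWfQ1OB7A6B-NZoBS1jrItq7zq!v10A/"
       "OOB_30_IllustratedKai/15.40.1211/img/Kai_Sunny_thumbnail.jpg")

m = re.match(pattern, url)
host, domain, path, ext = m.group(1, 2, 3, 4)

# Hypothetical site rule: the first folder is the variable part, so drop it.
first_dir, _, remainder = path.partition("/")
store = "http://" + host + domain + "/" + remainder + "." + ext
print(store)
```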
Amos