On 07/13/2016 11:02 PM, cs@xxxxxxxxxx wrote:
On 13Jul2016 22:03, Mike Wright <nobody@xxxxxxxxxxxxxxxxxxxx> wrote:
OK, thanks everybody.
Had to use egrep. This works:
PATTERN='https?://[^/]*\.in(/.*)*'
egrep $PATTERN file.of.links > links.in
You need quotes around $PATTERN when you use it, thus:
egrep "$PATTERN" file.of.links > links.in
Arrrgh! I'd sloppily lost the double quotes during a cut and paste.
They've been restored.
You may be getting away with it here, but another pattern may well be
broken up by the shell on whitespace. Not to mention globbing (unquoted
asterisks and question marks, etc.).
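A minimal sketch of the splitting problem, assuming a POSIX-style shell such as sh or bash. The pattern 'foo bar' and the helper count_args are made up for illustration and are not from the thread:

```shell
#!/bin/sh
# Hypothetical pattern containing a space, purely for illustration.
pattern='foo bar'

# count_args (a hypothetical helper) reports how many arguments it received.
count_args() { echo "$#"; }

# Unquoted: the shell splits the value on whitespace before the command runs.
unquoted=$(count_args $pattern)

# Quoted: the value arrives as a single argument, exactly as written.
quoted=$(count_args "$pattern")

echo "unquoted: $unquoted argument(s)"
echo "quoted:   $quoted argument(s)"
```

With the unquoted form, egrep would treat the first word as the pattern and the second as a file name, which is rarely what was meant.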
Covers cases with https and where nothing follows the .in
Your:
(/.*)*
is better written:
(/.*)?
i.e. it is there or it is not. As it happens the "*" form you used will
be matched as efficiently in this case, but there are plenty of patterns
where using "*" instead of something more constrained can lead to
exponential cost as the regexp engine tries many many more combinations
as it attempts to match. Always write these things as
pickily/conservatively as possible.
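To see that the two forms select the same lines for this particular pattern, here is a small sketch. The sample links are invented stand-ins for file.of.links, and grep -E is used as the POSIX spelling of egrep:

```shell
#!/bin/sh
# Hypothetical sample links, standing in for file.of.links.
sample_links='http://example.in
https://example.in/some/page
http://example.com/index.html'

star_form='https?://[^/]*\.in(/.*)*'      # the form originally posted
optional_form='https?://[^/]*\.in(/.*)?'  # the more constrained form

# -c prints the number of matching lines instead of the lines themselves.
star_count=$(printf '%s\n' "$sample_links" | grep -Ec "$star_form")
optional_count=$(printf '%s\n' "$sample_links" | grep -Ec "$optional_form")

echo "star form:     $star_count matching line(s)"
echo "optional form: $optional_count matching line(s)"
```

Both counts agree on this input; the argument for "?" is about saying only what you mean, not about a different result here.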
Makes sense. Exponential earnings = good, costs = bad ;)
The other nit is that you should use $lowercase variable names in the
shell instead of $UPPERCASE names for script local variables which you
do not intend to export. This is a good practice thing, but quite
important for reasons I can explain at length if requested.
Duly noted. I got bit by that yesterday when I stepped on a system
level variable.
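A short sketch of the naming convention, assuming a POSIX-style shell and that no variable called pattern is already exported in the calling environment. LINKS_OUT is a hypothetical name chosen only for this illustration:

```shell
#!/bin/sh
# Lowercase: local to this script, never exported, and unlikely to collide
# with names such as PATH or HOME.
pattern='https?://[^/]*\.in(/.*)?'

# Uppercase kept for something deliberately exported.
# LINKS_OUT is a hypothetical name, not one used in the thread.
LINKS_OUT='links.in'
export LINKS_OUT

# A child process sees the exported name but not the script-local one.
child_view=$(sh -c 'echo "pattern=[${pattern}] LINKS_OUT=[${LINKS_OUT}]"')
echo "$child_view"
```

Keeping script-local names lowercase means an assignment cannot silently replace a variable that the shell or other programs depend on.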
--
users mailing list
users@xxxxxxxxxxxxxxxxxxxxxxx