On 13Jul2016 22:03, Mike Wright <nobody@xxxxxxxxxxxxxxxxxxxx> wrote:
OK, thanks everybody.
Had to use egrep. This works:
PATTERN='https?://[^/]*\.in(/.*)*'
egrep $PATTERN file.of.links > links.in
You need quotes around $PATTERN when you use it, thus:
egrep "$PATTERN" file.of.links > links.in
You may be getting away with it here, but another pattern may well be broken up
by the shell on whitespace. Not to mention globbing (unquoted asterisks,
question marks, etc).
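
A minimal sketch of both failure modes, assuming a POSIX-style shell (sh, bash, dash) with the default IFS. The count_args helper and the a.in/b.in file names are made up purely for illustration:

```shell
#!/bin/sh
# count_args is a hypothetical helper: it prints how many arguments it got.
count_args() { echo "$#"; }

# Word splitting: a pattern containing a space.
pattern='foo bar'
unquoted=$(count_args $pattern)      # the shell splits it into two words
quoted=$(count_args "$pattern")      # one argument, as intended

# Globbing: a pattern containing an asterisk, run in a scratch directory
# that holds two matching files.
tmpdir=$(mktemp -d)
glob='*.in'
glob_unquoted=$(cd "$tmpdir" && touch a.in b.in && count_args $glob)
glob_quoted=$(cd "$tmpdir" && count_args "$glob")
rm -r "$tmpdir"

echo "space, unquoted: $unquoted argument(s)"
echo "space, quoted:   $quoted argument(s)"
echo "glob,  unquoted: $glob_unquoted argument(s)"
echo "glob,  quoted:   $glob_quoted argument(s)"
```

In each case the quoted form hands the command exactly one argument, which is what egrep needs to receive as its pattern.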
Covers cases with https and where nothing follows the .in
Your:
(/.*)*
is better written:
(/.*)?
i.e. it is either there or it is not. As it happens, the "*" form you used will be
matched just as efficiently in this case, but there are plenty of patterns where
using "*" instead of something more constrained can lead to exponential cost as
the regexp engine tries many, many more combinations while attempting to match.
Always write these things as pickily/conservatively as possible.
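
A quick way to check that the two forms select the same lines is to run both over a few sample URLs. This is only a sketch: the example.* URLs are invented test data, and it uses "grep -E", which accepts the same extended syntax as egrep:

```shell
#!/bin/sh
pattern_star='https?://[^/]*\.in(/.*)*'   # the original form
pattern_opt='https?://[^/]*\.in(/.*)?'    # the more constrained form

# Made-up sample input: two .in links (one with a path) and one other link.
urls='http://example.in
https://example.in/path/page
http://example.org/page'

# grep -c prints the number of matching lines.
star_hits=$(printf '%s\n' "$urls" | grep -E -c "$pattern_star")
opt_hits=$(printf '%s\n' "$urls" | grep -E -c "$pattern_opt")

echo "star form matched $star_hits line(s)"
echo "optional form matched $opt_hits line(s)"
```

Both counts should agree; the difference between the two forms is in how much work the engine may do, not in what gets matched here.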
The other nit is that you should use $lowercase variable names in the shell
instead of $UPPERCASE names for script-local variables which you do not intend
to export. This is a matter of good practice, but it is quite important, for
reasons I can explain at length if requested.
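
One commonly cited reason, sketched below, is that uppercase names are by convention the ones already exported in the environment, so reusing one changes what later commands see. This is an illustration and not necessarily the full argument alluded to above. It assumes a typical Linux sh where ls is an external command, and /nonexistent/links is a made-up path:

```shell
#!/bin/sh
# Each test runs in a subshell (command substitution), so the damage
# done to PATH does not leak into the rest of the script.

# Uppercase: PATH is already exported, so this "local" assignment
# replaces the command search path.
result_upper=$(
    PATH='/nonexistent/links'
    command -v ls >/dev/null 2>&1 && echo found || echo missing
)

# Lowercase: a private name that collides with nothing in the environment.
result_lower=$(
    path='/nonexistent/links'
    command -v ls >/dev/null 2>&1 && echo found || echo missing
)

echo "after PATH=...: ls is $result_upper"
echo "after path=...: ls is $result_lower"
```

PATH is the most dramatic case, but the same collision can happen silently with any name that some program you run later reads from its environment.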
Cheers,
Cameron Simpson <cs@xxxxxxxxxx>