Re: Sed, awk? [solved]

Martin Stricker <shugal@xxxxxx> · Fri, 19 Dec 2003 23:33:31 +0100

Jake McHenry wrote:

> As far as the 1024 arg limit: I'm running the stock RH9 kernel, so
> this may be a problem. Could you explain further how I could use
> xargs to do the same?

command1 | xargs command2
Whatever command1 writes to standard output is piped to xargs's standard
input and then converted to an argument list for command2. xargs knows
about the argument list length restriction, thus the argument list will
only grow as long as allowed, then xargs starts another process of
command2 with more of command1's output (and so on until all output of
command1 has been passed to command2). Basically, xargs reads from the
standard input and converts that to an argument list for another
command.
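
A small sketch of that batching, for illustration only. The real limit
is far too large to hit with ten numbers, so I assume here that -n 3 is
an acceptable stand-in for it: it caps each argument list at three
items, which makes xargs start echo several times just as it would when
the system limit is reached.

```shell
# Force small batches with -n so the splitting is visible;
# without -n, xargs only splits when the system limit gets close.
batches=$(seq 1 10 | xargs -n 3 echo)
printf '%s\n' "$batches"

# Each echo invocation produces one output line, so counting lines
# counts how many times xargs started the command.
batch_count=$(printf '%s\n' "$batches" | wc -l | tr -d ' ')
printf 'echo was started %s times\n' "$batch_count"
```

Every line of the output comes from a separate echo process, each with
at most three arguments.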

> I'm not writing directly to a file, inside the spam.spam and
> spam.ham files is the command sa-learn which reads all of the
> arguments passed into it as filenames. So I'm still working with
> the limitation unless I rewrite sa-learn, correct?

Correct. But usually (well-written) programs can take input from several
sources, like a file (as you mention in the other mail below) and
standard input. Many programs read from standard input when the very
last argument they are passed is a single hyphen (-). So try something
like
sa-learn --spam --(other args) -
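
To see the hyphen convention in action without touching sa-learn, here
is a sketch using cat, which does follow it. Whether sa-learn accepts
"-" the same way is something to check in its documentation; the
temporary file below is only there for the demonstration.

```shell
# cat treats a lone "-" operand as "read standard input at this point".
tmp=$(mktemp)
printf 'from file\n' > "$tmp"

# First the file is printed, then whatever arrives on standard input.
combined=$(printf 'from stdin\n' | cat "$tmp" -)
printf '%s\n' "$combined"

rm -f "$tmp"
```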

If sa-learn does write to a file, you will get into trouble using xargs
(and it doesn't matter that you call sa-learn within a script): if your
argument list gets too long, xargs will start another sa-learn process
(be it directly or through a script). If more than one run writes to
the same file, a later run may overwrite what an earlier one wrote, and
if the runs ever overlap in time you cannot predict what will happen
(unless you use thorough file locking, but that's hard with scripts).
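
A sketch of what can go wrong, using a made-up stand-in for a script
that writes its result to a fixed file (the sh -c one-liner and out.txt
are hypothetical, not anything sa-learn does). -n 2 again simulates the
argument limit. xargs normally runs its batches one after another, so
here the outcome is at least repeatable: each run truncates the file
and only the last batch survives. With batches running in parallel (for
example GNU xargs -P) the result would depend on timing.

```shell
workdir=$(mktemp -d)
out="$workdir/out.txt"

# Five arguments, two per batch: the command is started three times.
# Inside sh -c, "$0" is the output path and "$@" is the current batch.
printf '%s\n' a b c d e | xargs -n 2 sh -c 'echo "$@" > "$0"' "$out"

# Only the last batch is left; "a b" and "c d" were overwritten.
result=$(cat "$out")
printf '%s\n' "$result"

rm -rf "$workdir"
```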

> I was just looking through the sa-learn docs and it says I can read
> filenames from a file. This was said to be a better way of handling
> this?
>
> -f file, --folders=file  Read list of files/directories from file

To write to a file instead of directly to your program, redirect the
output to a file with > (this is called output redirection; input
redirection works similarly with < instead of >, so the file becomes
the program's standard input):
command > output.txt
This is the original command, writing to standard output:
grep -R 'email' * | awk -F: '{print $1}' | uniq | xargs echo -n
This will write one line to the file named spam in the current
directory:
grep -R 'email' * | awk -F: '{print $1}' | uniq | xargs echo -n > spam
Normally a program that reads a list from a file expects one entry per
line, so echo -n is not needed (it only puts everything on one line),
and neither is xargs (it only keeps the argument list for echo -n from
overflowing). So you just use
grep -R 'email' * | awk -F: '{print $1}' | uniq > spam
and change spam.spam so sa-learn will be used as
sa-learn -f spam
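
For a quick check of what ends up in the list file, here is a sketch
that runs the same pipeline on three throwaway messages (the directory
and file names are made up for the example). The list is written
outside the searched directory so grep does not pick it up itself, and
sa-learn is left out so the example runs anywhere.

```shell
workdir=$(mktemp -d)
mkdir "$workdir/mail"
printf 'From: a\nemail one\nemail two\n' > "$workdir/mail/msg1"
printf 'nothing here\n'                  > "$workdir/mail/msg2"
printf 'email three\n'                   > "$workdir/mail/msg3"

# Same pipeline as above: grep prints "file:matching line", awk keeps
# the file name, uniq collapses the repeats from files matching twice.
(cd "$workdir/mail" && grep -R 'email' * | awk -F: '{print $1}' | uniq) \
    > "$workdir/spam"

# One file name per line, ready to be read as a list.
list=$(cat "$workdir/spam")
printf '%s\n' "$list"

rm -rf "$workdir"
```

msg1 matches twice but is listed once, and msg2 does not appear at all.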

Best regards,
Martin Stricker
-- 
Homepage: http://www.martin-stricker.de/
Linux Migration Project: http://www.linux-migration.org/
Red Hat Linux 9 for low memory: http://www.rule-project.org/
Registered Linux user #210635: http://counter.li.org/