RE: Sed, awk? [solved]

"Jake McHenry" <linux@xxxxxxxxxxxxxxxxx> · Mon, 22 Dec 2003 10:58:24 -0500

> -----Original Message-----
> From: shrike-list-admin@xxxxxxxxxx 
> [mailto:shrike-list-admin@xxxxxxxxxx] On Behalf Of Martin Stricker
> Sent: Friday, December 19, 2003 5:34 PM
> To: shrike-list@xxxxxxxxxx
> Subject: Re: Sed, awk? [solved]
> 
> 
> Jake McHenry wrote:
> 
> > As far as the 1024 arg limit: I'm running the stock RH9 kernel, so
> > this may be a problem. Could you explain further how I could use
> > xargs to do the same?
> 
> command1 | xargs command2
> Whatever command1 writes to standard output is piped to 
> xargs' standard input and then converted to an argument list 
> for command2. xargs knows about the argument list length 
> restriction, thus the argument list will only grow as long as 
> allowed, then xargs starts another process of command2 with 
> more of command1's output (and so on until all output of 
> command1 has been passed to command2). Basically, xargs reads 
> from the standard input and converts that to an argument list 
> for another command.
> 
> > I'm not writing directly to a file, inside the spam.spam and
> > spam.ham files is the command sa-learn which reads all of the
> > arguments passed into it as filenames. So I'm still working with
> > the limitation unless I rewrite sa-learn, correct?
> 
> Correct. But usually (well-written) programs can take input
> from several sources, like a file (as you mention in the
> other mail below) and standard input. Many programs know that
> they should read from standard input when the very last
> option they are passed is just a hyphen (-). So try something
> like:
> sa-learn --spam --(other args) -
> 
> If sa-learn does write to a file, you will get into trouble 
> using xargs (and it doesn't matter that you call sa-learn 
> within a script): If your argument list gets too long, xargs 
> will spawn another sa-learn process (be it directly or 
> through a script). If you have more than one process writing 
> to the same file, you cannot predict what will happen! 
> (except if you use thorough file locking, but that's hard 
> with scripts)
> 
> > I was just looking through the sa-learn docs and it says I can
> > read filenames from a file. This was said to be a better way of
> > handling this?
> >
> > -f file, --folders=file  Read list of files/directories from file
> 
> To write to a file instead of directly to your program,
> "pipe" the output to a file:
> command > output.txt
> (This is called output redirection. Input redirection works
> similarly with < instead of >, so the file is used as input
> instead of standard input.)
> This is the original command, writing to standard output:
> grep -R 'email' * | awk -F: '{print $1}' | uniq | xargs echo -n
> This will write one line to the file named spam in the current
> directory:
> grep -R 'email' * | awk -F: '{print $1}' | uniq | xargs echo -n > spam
> Normally if a program is reading a file, it expects one argument
> per line in the file, so echo -n is not needed (it just removes
> newlines), and neither is xargs (it just prevents the argument
> list for echo -n from overflowing). So you just use
> grep -R 'email' * | awk -F: '{print $1}' | uniq > spam
> and change spam.spam so sa-learn will be used as
> sa-learn -f spam
> 
> Best regards,
> Martin Stricker
> -- 
> Homepage: http://www.martin-stricker.de/
> Linux Migration Project: http://www.linux-migration.org/
> Red Hat Linux 9 for low memory: http://www.rule-project.org/ 
> Registered Linux user #210635: http://counter.li.org/
> 
> 
> -- 
> Shrike-list mailing list
> Shrike-list@xxxxxxxxxx 
> https://www.redhat.com/mailman/listinfo/shrike-list
> 

Just to let everyone know, I ran into problems using:

  grep -R -i "$email" * | awk -F: '{print $1}' | uniq | xargs echo -n

After around 200 args it would run some of them together. So I
switched to:

  grep -R -i "$email" * | cut -f 1 -d : | tr \\012 \\40

This works without any problems so far; I just ran a query with 2354
args. So I'm not sure where the 1024 arg limit idea came from.
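
A likely explanation for the names running together, offered as a
sketch and not confirmed anywhere in this thread: once xargs fills a
batch it starts a second echo -n, and because echo -n prints no
trailing newline or space, the last name of one batch joins the first
name of the next. The three msg names below are made up, and -n 2 is
only there to force the split early. The limit itself is a byte count
(ARG_MAX), not a count of arguments, and xargs may apply a smaller
default of its own.

```shell
#!/bin/sh
# Sketch: why "xargs echo -n" can fuse names, and what the real limit is.

# Three made-up names, one per line, standing in for the grep output.
names='msg1
msg2
msg3'

# -n 2 forces xargs to start a second echo after two arguments, which is
# what happens on its own once a batch fills up. echo -n prints no
# trailing newline or space, so the two outputs join directly.
fused=$(printf '%s\n' "$names" | xargs -n 2 echo -n)
echo "xargs echo -n : $fused"

# tr turns every newline into a space, so each name keeps a separator.
spaced=$(printf '%s\n' "$names" | tr '\012' ' ')
echo "tr            : $spaced"

# The kernel limit is a byte count for arguments plus environment,
# not a number of arguments.
limit=$(getconf ARG_MAX)
echo "ARG_MAX       : $limit bytes"
```

The tr version never splits its output, so the separators survive, but
the resulting list still has to fit within ARG_MAX when it is finally
passed to a command.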

Learned from 52 message(s) (2354 message(s) examined).
0.000          0          2          0  non-token data: bayes db version
0.000          0       9632          0  non-token data: nspam
0.000          0      17887          0  non-token data: nham
0.000          0     146602          0  non-token data: ntokens
0.000          0 1070679408          0  non-token data: oldest atime
0.000          0 1072108196          0  non-token data: newest atime
0.000          0 1072108315          0  non-token data: last journal sync atime
0.000          0 1072061149          0  non-token data: last expiry atime
0.000          0    1382400          0  non-token data: last expire atime delta
0.000          0       8307          0  non-token data: last expire reduction count

I was using xargs for a while but ran into problems with it: just
as someone had said, it spawned off a new process, and sa-learn didn't
like that very much. I haven't tried reading the list from a file yet.
If the cut and tr method works for now, I will use that until I run
into more problems.
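
For reference, the file-based route quoted above could look roughly
like the sketch below. It is untested against a real Bayes database:
the mail directory, the sample messages and the address are invented
for the demo, and the sa-learn line is only printed, not run. The -f
option is the one from the sa-learn docs quoted earlier; grep -l is
swapped in for the awk/cut step because it prints each matching file
name once.

```shell
#!/bin/sh
# Sketch only: build a one-name-per-line list file for sa-learn's -f option.
# The directory, sample messages and address are made up for the demo.
email='spammer@example.com'
work=$(mktemp -d)
mkdir "$work/mail"
printf 'From: %s\nSubject: one\nCc: %s\n' "$email" "$email" > "$work/mail/msg1"
printf 'From: friend@example.org\n' > "$work/mail/msg2"
printf 'From: %s\n' 'SPAMMER@EXAMPLE.COM' > "$work/mail/msg3"

# -l prints each matching file name once; sort gives a stable order.
grep -R -i -l "$email" "$work/mail" | sort > "$work/spam.list"

listed=$(wc -l < "$work/spam.list" | tr -d ' ')
names=$(sed 's|.*/||' "$work/spam.list" | tr '\012' ' ')
echo "$listed file(s) listed: $names"

# A single sa-learn process would read the whole list from the file, so
# the names never go through the argument list. Printed here, not run.
echo "would run: sa-learn --spam -f $work/spam.list"

rm -rf "$work"
```

Since the names travel through a file, no second process is spawned
and the argument-list limit does not come into play.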

Thanks for all the help everyone gave.

Jake
