> -----Original Message-----
> From: shrike-list-admin@xxxxxxxxxx
> [mailto:shrike-list-admin@xxxxxxxxxx] On Behalf Of Martin Stricker
> Sent: Friday, December 19, 2003 5:34 PM
> To: shrike-list@xxxxxxxxxx
> Subject: Re: Sed, awk? [solved]
>
>
> Jake McHenry wrote:
>
> > As far as the 1024 arg limit: I'm running the stock RH9 kernel, so
> > this may be a problem. Could you explain further how I could use
> > xargs to do the same?
>
> command1 | xargs command2
>
> Whatever command1 writes to standard output is piped to xargs'
> standard input and then converted to an argument list for command2.
> xargs knows about the argument list length restriction, thus the
> argument list will only grow as long as allowed, then xargs starts
> another process of command2 with more of command1's output (and so
> on until all output of command1 has been passed to command2).
> Basically, xargs reads from the standard input and converts that to
> an argument list for another command.
>
> > I'm not writing directly to a file, inside the spam.spam and
> > spam.ham files is the command sa-learn which reads all of the
> > arguments passed into it as filenames. So I'm still working with
> > the limitation unless I rewrite sa-learn, correct?
>
> Correct. But usually (well-written) programs can take input from
> several sources, like a file (as you mention in the other mail
> below) and standard input. Many programs know that they should read
> from standard input when the very last option they are passed is
> just a hyphen - . So try something like
>
> sa-learn --spam --(other args) -
>
> If sa-learn does write to a file, you will get into trouble using
> xargs (and it doesn't matter that you call sa-learn within a
> script): If your argument list gets too long, xargs will spawn
> another sa-learn process (be it directly or through a script). If
> you have more than one process writing to the same file, you cannot
> predict what will happen!
> (except if you use thorough file locking, but that's hard with
> scripts)
>
> > I was just looking through the sa-learn docs and it says I can read
> > filenames from a file. This was said to be a better way of handling
> > this?
> >
> > -f file, --folders=file Read list of files/directories from file
>
> To write to a file instead of directly to your program, "pipe" the
> output to a file (this is called output redirection; input
> redirection works similarly with < instead of >, so the file is
> used as input instead of standard input):
>
> command > output.txt
>
> This is the original command, writing to standard output:
>
> grep -R 'email' * | awk -F: '{print $1}' | uniq | xargs echo -n
>
> This will write one line to the file named spam in the current
> directory:
>
> grep -R 'email' * | awk -F: '{print $1}' | uniq | xargs echo -n > spam
>
> Normally if a program is reading a file, it expects one argument per
> line in the file, so echo -n is not needed (it just removes
> newlines), and therefore neither is xargs (it just prevents the
> argument list for echo -n from overflowing). So you just use
>
> grep -R 'email' * | awk -F: '{print $1}' | uniq > spam
>
> and change spam.spam so sa-learn will be used as
>
> sa-learn -f spam
>
> Best regards,
> Martin Stricker
> --
> Homepage: http://www.martin-stricker.de/
> Linux Migration Project: http://www.linux-migration.org/
> Red Hat Linux 9 for low memory: http://www.rule-project.org/
> Registered Linux user #210635: http://counter.li.org/
>
>
> --
> Shrike-list mailing list
> Shrike-list@xxxxxxxxxx
> https://www.redhat.com/mailman/listinfo/shrike-list
>

Just to let everyone know: I ran into problems using

grep -R -i "$email" * | awk -F: '{print $1}' | uniq | xargs echo -n

After around 200 args it would run some of them together. So I
switched to

grep -R -i "$email" * | cut -f 1 -d : | tr \\012 \\40

This works without any problems so far; I just ran a query with 2354
args, so I'm not sure where the 1024 arg limit idea came from.
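
One plausible explanation for the names running together (not confirmed anywhere in this thread) is the batching Martin describes: once a batch is full, xargs starts a second "echo -n", and since -n suppresses the trailing newline, the last name of one batch is glued to the first name of the next. The sketch below reproduces that with made-up input; -n 2 is only there to force tiny batches so the effect is visible, and the variable names are illustrative.

```shell
# Force xargs to split its input into batches of two arguments.
# Each batch becomes a separate "echo -n" process, and because -n
# suppresses the trailing newline, nothing separates one batch from the next.
joined=$(printf '%s\n' a b c d | xargs -n 2 echo -n)
printf '%s\n' "$joined"    # prints: a bc d

# The tr approach is a single process that turns every newline into a
# space, so names stay separated however long the list gets.
spaced=$(printf '%s\n' a b c d | tr '\012' '\040')
printf '[%s]\n' "$spaced"  # prints: [a b c d ]
```

Note that the tr version only avoids the gluing; if its output is later substituted into a command line, that command line is still subject to the system's argument size limit.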
Learned from 52 message(s) (2354 message(s) examined).

0.000          0          2          0  non-token data: bayes db version
0.000          0       9632          0  non-token data: nspam
0.000          0      17887          0  non-token data: nham
0.000          0     146602          0  non-token data: ntokens
0.000          0 1070679408          0  non-token data: oldest atime
0.000          0 1072108196          0  non-token data: newest atime
0.000          0 1072108315          0  non-token data: last journal sync atime
0.000          0 1072061149          0  non-token data: last expiry atime
0.000          0    1382400          0  non-token data: last expire atime delta
0.000          0       8307          0  non-token data: last expire reduction count

I was using xargs for a while but ran into problems with it: just as
someone had said, it spawned off a new process, and sa-learn didn't
like that very much. I haven't tried reading the list from a file yet.
If the cut and tr method works for now, I will use that until I run
into more problems.

Thanks for all the help everyone gave.

Jake
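
For anyone who wants to try the file-based route suggested earlier in the thread, here is a minimal sketch under stated assumptions. The scratch directory, the sample messages and the address are all made up for the demonstration. It uses grep -l (print each matching file name once) in place of the awk/cut and uniq steps, which is a swap from the pipelines above, not what was actually run. The sa-learn line is left commented out and relies only on the -f option as quoted from the docs; it has not been tested here.

```shell
# Hypothetical demo data: three small "messages" in a scratch directory.
workdir=$(mktemp -d)
mkdir "$workdir/mail"
printf 'From: spammer@example.com\nbuy now\n' > "$workdir/mail/msg1"
printf 'From: friend@example.org\nhello\n'    > "$workdir/mail/msg2"
printf 'From: spammer@example.com\nspammer@example.com again\n' > "$workdir/mail/msg3"

email='spammer@example.com'   # assumed search string

# -l lists each matching file once, so no field splitting or uniq is needed;
# -F treats the address as a literal string rather than a regular expression.
# The result is one file name per line, which is what a list file needs.
grep -R -i -l -F -- "$email" "$workdir/mail" | sort > "$workdir/spam"

cat "$workdir/spam"

# Then hand the list to sa-learn in a single process (commented out here;
# based on the "-f file" option quoted earlier in the thread):
# sa-learn --spam -f "$workdir/spam"
```

Because the names go through a file and never through a command line, neither the argument limit nor a second sa-learn process comes into play.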