e-mail serving

R P Herrold <herrold@xxxxxxxxxxxx> · Wed, 3 Aug 2011 14:55:18 -0400 (EDT)

On Wed, 3 Aug 2011, Always Learning wrote:

> On Wed, 2011-08-03 at 11:03 -0700, Todd wrote:
>
>> indeed no, but I want to work on some pattern matching, analysis for a
>> piece of software I have wanted to write for years..
>
> Lots of success and good luck. Do let us know how it goes.

umm -- high speed, automated harvesting of email and running 
regexes against the corpus to yield, say, a list of currently 
live addresses seems to fit the problem description.  Why 
would you wish the creation of yet another such spammer 
tool good luck? ;)

That said, procmail can do this trivially, and single-pass 
filtering of a million pieces a day is easy.  The bandwidth 
needed to get it all to a single machine is rather high for a 
residential link, but trivial in a colo

let's do some science:

From my mailspool, I have 6124 pieces taking up 139,083,522 
bytes just now

[herrold@centos-5 ~]$ echo "( 139083522 / 6124 ) " | bc
22711

so 22k bytes per piece x 1 million ~= 22 GB per day
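
For anyone wanting to repeat the measurement on their own spool, 
here is a minimal sketch.  It assumes the spool is a single 
mbox-format file, where each message starts with a "From " line; 
the function name avg_piece_size is made up for illustration, and 
this is not necessarily how the figures above were gathered.

```shell
#!/bin/sh
# Hypothetical helper: average message size in an mbox-format spool.
# Assumption: one mbox file, each message introduced by a "From " line.
avg_piece_size() {
    spool=$1
    # grep -c exits non-zero when nothing matches, so tolerate that here
    pieces=$(grep -c '^From ' "$spool" || true)   # number of messages
    bytes=$(wc -c < "$spool")                     # total bytes in the spool
    [ "$pieces" -gt 0 ] || { echo "no messages in $spool" >&2; return 1; }
    echo $(( $bytes / $pieces ))                  # integer bytes per piece
}
```

Call it with the path to the spool file, which on many systems is 
held in $MAIL.  A Maildir layout would need a different count (one 
file per message).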

There are 86400 seconds in a day.  Take the simplifying 
assumption of a level, steady-state load (which could be 
arranged by setting up a peripheral MX unit to handle the 
inbound load).  I was handling 750k / day with a central unit 
and two MX satellites on RHL 7 with 200 MHz Pentiums and 
perhaps 64M of RAM in them

[herrold@centos-5 ~]$ echo "22000000000 / 86400" | bc
254629 bytes per second

so roughly 2 Mbit/s, a bit more than a T-1 (1.544 Mbit/s)
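
The same back-of-envelope sum can be redone in plain shell 
arithmetic, with the conversion to bits per second added so the 
result can be set against a T-1's nominal 1.544 Mbit/s.  This is 
only a sketch: the function names are made up, and it assumes the 
shell's $(( )) arithmetic is 64-bit, since the daily byte total 
overflows 32 bits.

```shell
#!/bin/sh
# Sketch only: integer shell arithmetic, truncating like bc's default scale.
# Assumption: $(( )) is 64-bit (22,000,000,000 does not fit in 32 bits).

T1_BITS_PER_SEC=1544000   # nominal T-1 line rate

# bytes_per_sec BYTES_PER_PIECE PIECES_PER_DAY
# Steady-state inbound rate, spreading the load evenly over 86400 seconds.
bytes_per_sec() {
    echo $(( $1 * $2 / 86400 ))
}

# pct_of_t1 BYTES_PER_SEC
# Load as a whole-number percentage of one T-1.
pct_of_t1() {
    echo $(( $1 * 8 * 100 / $T1_BITS_PER_SEC ))
}

rate=$(bytes_per_sec 22000 1000000)   # the rounded 22k-per-piece figure
echo "$rate bytes/s, $(pct_of_t1 "$rate")% of a T-1"
```

With the rounded 22k figure this prints "254629 bytes/s, 131% of a 
T-1", which ignores SMTP and TCP overhead.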

A single Linux box on a 386 with 16M of RAM running RHL 4.0 a 
decade ago had no problem with such loads.  Getting 
an efficient regex algorithm would be the choke point

-- Russ herrold
_______________________________________________
CentOS mailing list
CentOS@xxxxxxxxxx
http://lists.centos.org/mailman/listinfo/centos