content filters!

Cachers,

I've almost finished the content filtering code. At the moment it is
working merrily on all proxied/cached XOVER and XHDR responses. I haven't
finished the ARTICLE/HEAD/BODY integration yet (it's quite tricky to do
efficiently). Before I do, and before the whole scheme is set in concrete,
do any of you have any requests as far as content filtering is concerned?
Is this something people will use? Apart from the $$ MAKE MONEY FAST $$
spam example, you can do things like this in the access file:

# host pattern		group	permissions		filters userfiles
young_user@*.my.domain	*	read,post,filter	under18.filter
young_user@*.my.domain	sci.*	read,post
young_user@*.my.domain	*cancer*	read,post

Note that the group specifier is quite useful here, because it adds
context to the filtering. For instance, "testicle" in sci.* is probably
good reading material for an 8-year-old, but maybe not elsewhere.
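To make the "group adds context" point concrete, here is a rough Python sketch of how a lookup against those three lines might work. It is a guess, not nntpcache's code: it assumes shell-style wildcards for both the host and group patterns, and that the last matching line wins (which is what the ordering of the example suggests). parse_access and lookup are made-up names.

```python
from fnmatch import fnmatchcase

# The three example access lines, with the tab separators written as \t.
ACCESS = (
    "young_user@*.my.domain\t*\tread,post,filter\tunder18.filter\n"
    "young_user@*.my.domain\tsci.*\tread,post\n"
    "young_user@*.my.domain\t*cancer*\tread,post\n"
)


def parse_access(text):
    """Split each non-comment line into (host, group, permissions, filter files)."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        perms = fields[2].split(",")
        filters = fields[3].split(",") if len(fields) > 3 else []
        entries.append((fields[0], fields[1], perms, filters))
    return entries


def lookup(entries, client, group):
    """Return (permissions, filter files) for a user@host asking for a group.

    ASSUMPTION: the last matching line wins, so the more specific sci.*
    and *cancer* lines override the catch-all '*' line that adds filtering.
    """
    result = ([], [])
    for host_pat, group_pat, perms, filters in entries:
        if fnmatchcase(client, host_pat) and fnmatchcase(group, group_pat):
            result = (perms, filters)
    return result


entries = parse_access(ACCESS)
print(lookup(entries, "young_user@pc1.my.domain", "alt.test"))
# (['read', 'post', 'filter'], ['under18.filter'])
print(lookup(entries, "young_user@pc1.my.domain", "sci.med"))
# (['read', 'post'], [])
```

Under that assumption, the same user reading sci.med gets no filter files at all, while anything outside sci.* and *cancer* goes through under18.filter.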

(from nntpcache.access)
[...]
# if "censor" or "filter" is in the permissions, then the fourth
# field contains a list of filter files which are used for pattern
# matching on content/headers. The difference between "censor"
# and "filter" is that when an nntpcache client asks for a censored
# piece of information, nntpcache returns information of the type
# requested, but with the content replaced by a message stating
# that the message was censored. The "filter" permission, on the other
# hand, attempts to remove information matching the filter transparently.
#
[...]
# host pattern		group	permissions		filters userfiles
#
[...]
localhost		*	read,post,filter	spam.filter,pedo.filter
[...]
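The censor/filter distinction can be sketched for XOVER output roughly as follows. This is illustrative Python, not nntpcache's implementation: apply_policy and CENSOR_NOTICE are hypothetical, the filter files are reduced to a single predicate, and putting the notice in the subject field of a censored overview line is an assumption.

```python
# Hypothetical notice text; the wording nntpcache really uses isn't given here.
CENSOR_NOTICE = "[censored by nntpcache]"


def apply_policy(xover_lines, matches, mode):
    """Apply a "filter" or "censor" permission to XOVER response lines.

    xover_lines: tab-separated overview lines (number, subject, from, ...).
    matches:     predicate saying whether a line trips the filter files.
    mode:        "filter" drops matching lines; "censor" keeps them but
                 replaces the subject with a notice (ASSUMPTION: which
                 field gets replaced is a guess).
    """
    if mode not in ("filter", "censor"):
        raise ValueError("mode must be 'filter' or 'censor'")
    out = []
    for line in xover_lines:
        if not matches(line):
            out.append(line)
        elif mode == "censor":
            fields = line.split("\t")
            fields[1] = CENSOR_NOTICE
            out.append("\t".join(fields))
        # in "filter" mode a matching line is silently dropped
    return out


def is_spam(line):
    """Toy stand-in for the filter files: look for "$$" in the subject."""
    return "$$" in line.split("\t")[1]


overview = [
    "101\tRe: XHDR caching\tjoe@a.org\tdate\t<1@a.org>\t\t1200\t20",
    "102\t$$ MAKE MONEY FAST $$\tspam@b.com\tdate\t<2@b.com>\t\t900\t15",
]
print(len(apply_policy(overview, is_spam, "filter")))  # 1
print(apply_policy(overview, is_spam, "censor")[1].split("\t")[:2])
# ['102', '[censored by nntpcache]']
```

The point of the sketch is that "filter" changes how many entries the client sees, while "censor" keeps the response the same shape and only swaps the content.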

(from spam.filter)
# fuzzy filters (spam filter)
#
#
#
# scope	weight	options	pattern
#
# "scope" defines the "search scope" and can have one of the
# following values:
#
# 	head		-	matches against article header
#	body		-	matches against article body
#	article		-	matches against the lot
#	header:		-	matches against a particular header
#
# nb. "header:" is an actual header name, e.g. "Subject:". The
#     header name is case insensitive and is used for matching
#     against XOVER and XHDR requests (i.e. when the client
#     fetches a list of articles, not the articles themselves)
#
# "weight" is a positive or negative floating point value, normally
# on or between -10000 and +10000, and optionally prefixed with
# a '*' or '/'. Without a prefix the weight is added to the current
# score; with a '*' or '/' prefix the current score is instead
# multiplied or divided by the weight. Weights are floating point,
# so numbers like 0.25 are allowed.
#
# The score starts at zero. If the filter exits with a score
# greater than +100, then the filter is presumed to be true.
# If at any stage the score is above or equal to 10,000 or below or
# equal to -10,000, then the filter stops scanning with that score.
#
# "options" are currently one or more of the following (separated
# by commas, with the obvious semantic limitation of not having
# two options which imply opposite things):
#
#	regex		-	pattern is a regular expression
#				(this is the default)
#	simple		-	pattern is a simple pattern
#				(i.e. "[]?*" operators only)
#	case		-	pattern is case sensitive
#	nocase		-	pattern is case insensitive
#
# "pattern" is the regex or simple pattern to match. It
# may be split over successive lines iff the first character of
# the next line is white space. White space on either side
# of the pattern is always ignored.

Subject: 10001	regex,nocase	(make|earn|get|find|\*\*|\$\$).*(money|cash|dollar|\$\$)
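For anyone wanting to play with the scoring rules before the ARTICLE/HEAD/BODY side is done, here is a rough Python model of them. It is not nntpcache's code, and several details are assumptions: continuation lines are glued on with no separator, patterns are case sensitive unless "nocase" is given, a "simple" pattern has to match the whole text, and an unprefixed weight is added while '*' and '/' multiply or divide the running score. parse_rules, scope_text and score are made-up names.

```python
import fnmatch
import re
from dataclasses import dataclass
from typing import Callable, Optional

THRESHOLD = 100.0  # the filter is "true" if the final score exceeds this
LIMIT = 10000.0    # scanning stops once the score reaches +/- this

# The rule from spam.filter above.
SPAM_FILTER = r"""
# scope  weight  options  pattern
Subject: 10001 regex,nocase (make|earn|get|find|\*\*|\$\$).*(money|cash|dollar|\$\$)
"""


@dataclass
class Rule:
    scope: str                      # "head", "body", "article" or e.g. "Subject:"
    op: str                         # "+" (add), "*" (multiply) or "/" (divide)
    weight: float
    match: Callable[[str], object]  # returns a match object or None


def logical_lines(text):
    """Drop comments/blank lines and glue continuation lines onto the previous line.

    ASSUMPTION: continuation text is appended with no separator.
    """
    lines = []
    for raw in text.splitlines():
        if not raw.strip() or raw.startswith("#"):
            continue
        if raw[0] in " \t" and lines:
            lines[-1] += raw.strip()
        else:
            lines.append(raw.strip())
    return lines


def parse_rules(text):
    rules = []
    for line in logical_lines(text):
        scope, weight, options, pattern = line.split(None, 3)
        op = weight[0] if weight[0] in "*/" else "+"
        value = float(weight[1:] if op != "+" else weight)
        opts = set(options.split(","))
        # ASSUMPTION: case sensitive unless "nocase" is given
        flags = re.IGNORECASE if "nocase" in opts else 0
        if "simple" in opts:
            # "[]?*" glob operators only; the glob has to cover the whole text
            match = re.compile(fnmatch.translate(pattern), flags).match
        else:
            # default: a regular expression that may match anywhere
            match = re.compile(pattern, flags).search
        rules.append(Rule(scope, op, value, match))
    return rules


def scope_text(scope, head, body) -> Optional[str]:
    """Pick the text a rule's scope refers to; None if a named header is absent."""
    if scope == "head":
        return head
    if scope == "body":
        return body
    if scope == "article":
        return head + "\n" + body
    name = scope.lower()  # a header name such as "Subject:", case insensitive
    for line in head.splitlines():
        if line.lower().startswith(name):
            return line[len(name):].strip()
    return None


def score(rules, head, body=""):
    """Run the rules in order; return (final score, whether the filter is true)."""
    s = 0.0
    for rule in rules:
        text = scope_text(rule.scope, head, body)
        if text is None or not rule.match(text):
            continue
        if rule.op == "*":
            s *= rule.weight
        elif rule.op == "/":
            s /= rule.weight
        else:
            s += rule.weight
        if s >= LIMIT or s <= -LIMIT:
            break  # score has saturated: stop scanning with this score
    return s, s > THRESHOLD


rules = parse_rules(SPAM_FILTER)
print(score(rules, "Subject: $$ MAKE MONEY FAST $$\n"))         # (10001.0, True)
print(score(rules, "Subject: Question about XOVER caching\n"))  # (0.0, False)
```

With the weight of 10001, a single match pushes the score past the 10,000 cut-off, so scanning stops at once and the result is well over +100. Since the score starts at zero, a '*' or '/' rule in this model only does anything once some additive rule has already matched.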

-- 
"Of all tyrannies a tyranny sincerely  exercised for the good of its victims  
 may be the most  oppressive.  It may be better to live under  robber barons  
 than  under  omnipotent  moral busybodies,  The robber baron's  cruelty may  
 sometimes sleep,  his cupidity may at some point be satiated; but those who  
 torment us for our own good  will torment us  without end,  for they do so with 
 the approval of their own conscience."    -   C.S. Lewis, _God in the Dock_ 
+---------------------+--------------------+----------------------------------+
|Julian Assange RSO   | PO Box 2031 BARKER | Secret Analytic Guy Union        |
|proff@suburbia.net   | VIC 3122 AUSTRALIA | finger for PGP key hash ID =     |
|proff@gnu.ai.mit.edu | FAX +61-3-98199066 | 0619737CCC143F6DEA73E27378933690 |
+---------------------+--------------------+----------------------------------+

