Hans Fugal wrote:
> There have been some differing opinions on whether a wiki will attract
> spam and what to do about it. Here's a message about what the
> RubyGarden wiki has experienced and done. Some of you may be familiar
> with Ruby, and know that it is an extremely cool language but not
> (yet) as popular as other languages like perl, python, Java, etc. If
> you haven't heard of it, well that just attests to its not being a
> major player in the language market (yet). Yet they struggle with wiki
> spam.

Hans,

Thanks! Very interesting approach! I'd like to find out how much time Jim
spends dealing with the tarpit. I may write to him someday, unless you find
it convenient to do so.

regards,
Randy Kramer

PS: Unless his efforts take zero time, I'd rather wait till a spam problem
exists on WikiLearn before implementing such an approach. In the meantime,
WikiLearn has, for example, the registration requirement.

>
> ---------- Forwarded message ----------
> From: "Jim Weirich" <jim@xxxxxxxxxxxxxxxx>
> To: comp.lang.ruby
> Date: Tue, 14 Dec 2004 03:21:02 +0900
> Subject: Wiki Spam Report
> Wiki Spam Report
> ----------------
>
> I thought I would take some time and report on the wiki spam situation
> on RubyGarden. As I hope you have noticed, the wiki has been
> remarkably spam free. This email will tell you what measures we have
> taken to get to this point.
>
> But first ...
>
> Some Numbers
> ------------
>
> Over the past 10 days, we have had:
>
>   93 updates to the wiki page, all (AFAICT) spam free.
>      (although I might have missed spotting some).
>
>   46 updates to the wiki tarpit. Of those, we had ...
>       3 innocent updates
>       2 questionable updates
>       1 update by me
>      40 spams
>
> The Mechanism
> -------------
>
> Spammers are automatically routed to a wiki tarpit. The tarpit is an
> (almost) exact copy of the real RubyGarden wiki. Making changes to
> the tarpit looks as if you are making changes to the real wiki.
> And since spammers get their pages from the wiki, it looks (to them)
> as if they have successfully spammed our site.
>
> However, no one else ever gets to see the spam.
>
> Because the spammers are tricked into thinking they are successful,
> they don't put any additional effort into bypassing our spam detection
> criteria. This is important! When we explicitly denied them access to
> the wiki, they went to great lengths to figure out how to get around
> the restrictions. I haven't seen any of that kind of probing with the
> tarpit.
>
> Detecting Spammers
> ------------------
>
> The current spammer detection logic is based on two observations:
>
> (1) Spammers almost never use an IP address that has reverse lookup
>     enabled. This effectively means that, to the wiki software, your
>     host name looks like a numeric IP address.
>
> (2) Spammers almost never set user preferences on the wiki.
>
> So if both of these conditions are true, we treat the access as coming
> from a spammer and send it to the tarpit.
>
> Now this isn't perfect, but that's OK. We also have an explicit ban
> list for spammers who slip past the automatic check by passing one of
> (1) or (2) above. And we have an explicit allow list that overrides
> the automatic spammer detection.
>
> Innocent Users
> --------------
>
> Can innocent users get trapped by the tarpit? The short answer is yes.
> However, we are monitoring the tarpit and will attempt to rescue such
> users.
>
> In the past 10 days, there were at least 3 page updates that were from
> innocent users. One guy (bless his heart) even removed some spam from
> the tarpit for us.
>
> When I see innocents trapped in the tarpit, I add their IP address to
> the allow list and manually update the wiki with their changes (if
> they are significant).
>
> Detecting the Tarpit?
> ---------------------
>
> The tarpit is deliberately designed to look like the original wiki, so
> it is sometimes difficult to tell when you are trapped. Here are some
> suggestions.
>
> You are probably in the tarpit when:
>
> * there are a lot of recent updates made with numeric IP addresses
>   rather than host names.
>
> * a lot of the pages have spam.
>
> Neither of these suggestions is foolproof, though. I refresh the tarpit
> from the real wiki occasionally (to keep it looking realistic).
> Immediately after a refresh it is /very/ difficult to tell the difference.
>
> If you think you are trapped by the tarpit, send me
> (jim@xxxxxxxxxxxxxxxx) an email with your IP address and I will check
> the logs. If you are trapped, we can add your IP address to the allow
> list.
>
> If you are worried about getting caught in the tarpit, just make sure you
> have your user preferences set when accessing the wiki (click on the
> preferences link from any wiki page).
>
> Summary
> -------
>
> I am pretty happy with the current wiki situation. In fact, the
> tarpit has been so successful that I am considering lifting the ban
> on lower case http. The ban currently isn't buying us any benefits
> and is rather annoying (I'll make it so both upper and lower case
> work).
>
> Thanks for your time.
>
> --
> -- Jim Weirich jim@xxxxxxxxxxxxxxxx http://onestepback.org
> -----------------------------------------------------------------
> "Beware of bugs in the above code; I have only proved it correct,
> not tried it." -- Donald Knuth (in a memo to Peter van Emde Boas)
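
For anyone who wants to see the rule from "Detecting Spammers" spelled
out, here is a minimal Python sketch. It is not the RubyGarden code, which
the message does not show. The function and argument names are
hypothetical, and checking the allow list before the ban list is an
assumption: the message only says that the allow list overrides the
automatic detection.

```python
import ipaddress


def looks_like_numeric_ip(host_name):
    """Return True when the reported host name is just an IP address,
    which is what the wiki sees when reverse lookup yields no name."""
    try:
        ipaddress.ip_address(host_name)
        return True
    except ValueError:
        return False


def route_request(ip, host_name, has_preferences,
                  allow_list=frozenset(), ban_list=frozenset()):
    """Return "wiki" or "tarpit" for one request (hypothetical helper)."""
    # Assumption: the explicit allow list is consulted first.
    if ip in allow_list:
        return "wiki"
    # The explicit ban list catches spammers the automatic rule misses.
    if ip in ban_list:
        return "tarpit"
    # Automatic rule: observations (1) and (2) must both hold.
    if looks_like_numeric_ip(host_name) and not has_preferences:
        return "tarpit"
    return "wiki"


# Example addresses come from the reserved documentation ranges.
print(route_request("192.0.2.10", "192.0.2.10", has_preferences=False))
print(route_request("192.0.2.10", "192.0.2.10", has_preferences=True))
```

With this rule, setting user preferences is enough to stay out of the
tarpit even without reverse lookup, which matches the advice near the end
of Jim's message.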