
comp.lang.ruby

Another Wiki/Spam Update

Jim Weirich

11/12/2004 4:44:00 AM

During a question/answer session at the NoFluff/JustStuff conference in
Cincinnati this summer, someone asked: since there are so many things in the
IT world to learn, how does one tell which technologies to investigate and
which to put on the back burner? The general answer from the panel was to
wait until you hear about something six times. At that point it is probably
worth investigating.

So, I'm jumping the gun here because I have only heard of the following twice,
but it was twice in a two-day period and it does have bearing on the wiki
spam issue.

I first heard about this from Austin Ziegler in an IM message about Ruwiki.
Austin told me that Ruwiki will not link to external sites directly, but will
go through a PageRank-stripping redirect service run by Google.
Hmmm ... interesting, I thought.

Then today on the PreventingWikiSpam wiki page, LeoO writes:

> Maybe I'm missing something, but why not pass all outgoing links through the
> Google redirect, thereby denying the spammers their all-important
> PageRank?
> http://www.google.com/url?sa=D...

OK, that's two references. LeoO also provided a link to
http://simon.incutio.com/archive/2004/05/1... where you can read more
details.

So, I went ahead and enabled the Google redirect for external links on the
RubyGarden wiki. I'll leave it there for a few days and see how it works.
If anyone has problems, feel free to drop me a line at jim@weirichhouse.org.

Just a couple of observations:

(1) Although it denies spammers the benefits of their activities, I'm not
convinced that it will prevent spamming in anything but the most indirect
ways. However, denying them those benefits still makes me feel all tingly
inside.

(2) As currently implemented, URLs with CGI parameters in them might have
problems. For example, in the link:

http://rubygarden.org/ruby?action=browse&id=RubyD...

everything from "&id=" to the end will be ignored when translated to

http://www.google.com/url?sa=D&q=http://rubygarden.org/ruby?action=browse&id=RubyD...

A workaround is to use something like http://t... (e.g. the above link
is equivalent to http://t.../5jmyb).

(3) As I mentioned, if there is pushback on this change, it can be easily
backed out.
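The breakage in observation (2), and the fix, are easy to see from Ruby's standard cgi library. This is an illustrative sketch, not the wiki's actual (Perl) code:

```ruby
require 'cgi'

link = "http://rubygarden.org/ruby?action=browse&id=RubyDiscussions"

# Naive wrapping: the redirector parses its own query string, so the
# link's "&id=RubyDiscussions" is read as a separate parameter of the
# Google URL and everything after the "&" is lost from "q".
naive = "http://www.google.com/url?sa=D&q=#{link}"

# Escaping the target URL first keeps its whole query string inside "q".
safe = "http://www.google.com/url?sa=D&q=#{CGI.escape(link)}"

puts safe
# => http://www.google.com/url?sa=D&q=http%3A%2F%2Frubygarden.org%2Fruby%3Faction%3Dbrowse%26id%3DRubyDiscussions
```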

Thanks for listening.

--
-- Jim Weirich jim@weirichhouse.org http://onest...
-----------------------------------------------------------------
"Beware of bugs in the above code; I have only proved it correct,
not tried it." -- Donald Knuth (in a memo to Peter van Emde Boas)


8 Answers

James Britt

11/12/2004 5:06:00 AM


Jim Weirich wrote:

...

> (2) As currently implemented, URL with CGI parameters in them might have
> problems. For example, in the link:
>
> http://rubygarden.org/ruby?action=browse&id=RubyD...
>
> everything from "&id=" to the end will be ignored when translated to
>
> http://www.google.com/url?sa=D&q=http://rubygarden.org/ruby?action=browse&id=RubyD...
>
> A workaround is to use something like http://t... (e.g. the above link
> is equivalent to http://t.../5jmyb).

As practical as they may be, I'm less than enthused with passing my
links through tinyurl. I have much more faith in Google, and expect
that redirection through tinyurl will ultimately lead to some business
plan I may not care for.

Implementing the same behavior in Ruby should be trivial, and I would be
far more comfortable seeing links go through a Ruby-oriented site run by
a known member of the Ruby community (e.g., www.rubyurl.com, which
appears to be free).

Interesting idea, though, passing through Google.

James


gabriele renzi

11/12/2004 10:37:00 AM


James Britt ha scritto:


>>
>> A workaround is to use something like http://t... (e.g. the
>> above link is equivalent to http://t.../5jmyb).
>
>
> As practical as they may be, I'm less than enthused with passing my
> links through tinyurl. I have much more faith in Google, and expect
> that redirection through tinyurl will ultimately lead to some business
> plan I may not care for.
>
> Implementing the same behavior in Ruby should be trivial, and I would be
> far more comfortable seeing links go through a Ruby-oriented site run by
> a known member of the Ruby community (e.g., www.rubyurl.com, which
> appears to be free)

qurl.net runs on Ruby, FWIW.

Eric Hodel

11/12/2004 7:47:00 PM


On Nov 11, 2004, at 8:43 PM, Jim Weirich wrote:

> (2) As currently implemented, URL with CGI parameters in them might
> have
> problems. For example, in the link:
>
> http://rubygarden.org/ruby?action=browse&id=RubyD...
>
> everything from "&id=" to the end will be ignored when translated to
>
>
> http://www.googl...sa=D&q=http://rubygarden...
> action=browse&id=RubyDiscussions

You just need to escape all [^a-zA-Z]:

http://www.googl...sa=D&q=http%3a%2f%2frubygarden.org%2fruby%3faction%3dbrowse%26id%3dRubyDiscussions

pull the code out of cgi.rb and you're done!
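For reference, a hand-rolled version of the escaping Eric describes (percent-encode every byte outside the unreserved set, the same idea cgi.rb implements; the helper name here is my own) might look like:

```ruby
require 'cgi'

# Percent-encode every character outside the unreserved set, roughly
# what cgi.rb's escape does internally (CGI.escape also turns spaces
# into "+", which doesn't matter for URLs without spaces).
def percent_escape(str)
  str.gsub(/[^a-zA-Z0-9_.\-]/) { |c| format('%%%02X', c.ord) }
end

url = "http://rubygarden.org/ruby?action=browse&id=RubyDiscussions"
percent_escape(url)  # same result as CGI.escape(url) for this URL
```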



Florian Gross

11/12/2004 8:23:00 PM


Jim Weirich wrote:

> (2) As currently implemented, URL with CGI parameters in them might have
> problems. For example, in the link:
>
> http://rubygarden.org/ruby?action=browse&id=RubyD...
>
> everything from "&id=" to the end will be ignored when translated to
>
> http://www.google.com/url?sa=D&q=http://rubygarden.org/ruby?action=browse&id=RubyD...

Can't you just use CGI.escape for this?

Jim Weirich

11/12/2004 11:42:00 PM


On Friday 12 November 2004 02:46 pm, Eric Hodel wrote:
> On Nov 11, 2004, at 8:43 PM, Jim Weirich wrote:
> > (2) As currently implemented, URL with CGI parameters in them might
> > have problems. [...]
>
> You just need to escape all [^a-zA-Z]:

Actually, I tried this, but then Google barfed on the resulting URL. Perhaps
I encoded it incorrectly. I'll give it another try when I get a chance.

Thanks.

--
-- Jim Weirich jim@weirichhouse.org http://onest...
-----------------------------------------------------------------
"Beware of bugs in the above code; I have only proved it correct,
not tried it." -- Donald Knuth (in a memo to Peter van Emde Boas)


Jim Weirich

11/13/2004 3:26:00 AM


On Friday 12 November 2004 06:41 pm, Jim Weirich wrote:
> On Friday 12 November 2004 02:46 pm, Eric Hodel wrote:
> > On Nov 11, 2004, at 8:43 PM, Jim Weirich wrote:
> > > (2) As currently implemented, URL with CGI parameters in them might
> > > have problems. [...]
> >
> > You just need to escape all [^a-zA-Z]:
>
> Actually, I tried this, but then google barfed on the resulting URL.
> Perhaps I encoded incorrectly. I'll give it another try when I get a
> chance.

Got it working now. I must have fat fingered it earlier. Thanks.

--
-- Jim Weirich jim@weirichhouse.org http://onest...
-----------------------------------------------------------------
"Beware of bugs in the above code; I have only proved it correct,
not tried it." -- Donald Knuth (in a memo to Peter van Emde Boas)


Jim Weirich

11/13/2004 3:44:00 AM


On Friday 12 November 2004 03:28 pm, Florian Gross wrote:
> Can't you just use CGI.escape for this?

You know, it's funny how the brain works. I saw this comment and thought to
myself, "Of course! It would be much nicer just to use the CGI module
directly. That's what I will do."

So I bring up the editor and actually enter the code "CGI.escape($url)" into
the program, save it, and run a quick test.

But now I get the error:
Bareword "CGI" not allowed while "strict subs" in use at [...]

Now I'm sure most everybody who has been following this thread probably
realizes what is going on, but I still didn't see it. Half of my brain is
processing the problem that Perl doesn't like a bare CGI stuck into its code,
and the other half of the brain is trying to figure out why perfectly legal
Ruby code is causing an error. All of a sudden, the two halves of my brain
decided to talk to each other: "Duh! You're writing Ruby code in a Perl
program! Of course it doesn't work. Sheesh!"

After my brain got done rsyncing itself, I tried the code "$q->escape($url);"
and that works great.

Austin... it's become imperative that you get Ruwiki released soon [1]. I'm
afraid if I spend much more time in this Perl code I will become permanently
brain damaged.

--
-- Jim Weirich jim@weirichhouse.org http://onest...
-----------------------------------------------------------------
"Beware of bugs in the above code; I have only proved it correct,
not tried it." -- Donald Knuth (in a memo to Peter van Emde Boas)

[1] But hey, no pressure. Right!


Belorion

2/2/2005 9:37:00 PM


I don't mean to dredge up an old thread unnecessarily, but I
encountered this today:
http://www.google.com/googleblog/2005/01/preventing-comment....
Basically, it looks like Google is trying to do something to help
stop comment/wiki spam. Implementing something like this won't *stop*
spammers (unless they know the site uses it), but if enough people
start doing it, maybe this sort of spam will decrease in the long run.
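What Google announced there is the rel="nofollow" link attribute. A minimal sketch of how a wiki's link renderer might apply it (external_link is a made-up helper for illustration, not from any particular engine):

```ruby
require 'cgi'

# Hypothetical link renderer: mark external links rel="nofollow" so
# search engines do not count them toward the target's ranking.
def external_link(url, text)
  %(<a href="#{CGI.escapeHTML(url)}" rel="nofollow">#{CGI.escapeHTML(text)}</a>)
end

puts external_link("http://example.com/", "example")
# => <a href="http://example.com/" rel="nofollow">example</a>
```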


On Fri, 12 Nov 2004 13:43:40 +0900, Jim Weirich <jim@weirichhouse.org> wrote:
> So, I went ahead and enabled the Google redirect for external links on the
> RubyGarden wiki. [...]