
comp.lang.ruby

HTML filtering in weblog/BBS software

Alexey Verkhovsky

10/14/2004 12:23:00 PM

Hi all,

I am writing some sort of BBS in Ruby (on Rails). I downloaded and
included RedCloth for template rendering (in 5 lines of code and 15
lines of test - wow!). It's cool, but it allows users to include any HTML.

Now, I don't want to let some kiddie include some <javascript/> that
would make an innocent BBS thread pop 50 new browsers - no matter how
cool it might seem.

I wonder if there is any existing code to sanitize user inputs by
replacing dangerous HTML tags (like the aforementioned <javascript/>),
that I could use with RedCloth to alleviate this risk.

Ditto for plain text inputs (user names, subjects and other such).

Alex



3 Answers

Austin Ziegler

10/14/2004 1:42:00 PM


On Thu, 14 Oct 2004 21:22:43 +0900, Alexey Verkhovsky <alex@verk.info> wrote:
> Hi all,
>
> I am writing some sort of BBS in Ruby (on Rails). I downloaded and
> included RedCloth for template rendering (in 5 lines of code and 15
> lines of test - wow!). It's cool, but allows to include any HTML.
>
> Now, I don't want to let some kiddie include some <javascript/> that
> would make an innocent BBS thread pop 50 new browsers - no matter how
> cool it might seem.
>
> I wonder if there is any existing code to sanitize user inputs by
> replacing dangerous HTML tags (like the aforementioned <javascript/>),
> that I could use with RedCloth to alleviate this risk.
>
> Ditto for plain text inputs (user names, subjects and other such).

There is some work I'm doing on Ruwiki, currently in CVS, that covers
this. Until recently it covered it too well (it was too aggressive),
but I just fixed that.

# Find HTML tags
SIMPLE_TAG_RE = %r{<[^<>]+?>}   # Ensure that only the tag is grabbed.
HTML_TAG_RE   = %r{\A<          # Tag must be at start of match.
                   (/)?         # Closing tag?
                   ([\w:]+)     # Tag name
                   (?:\s+       # Space
                     ([^>]+)    # Attributes
                     (/)?       # Singleton tag?
                   )?           # The above three are optional
                   >}x
ATTRIBUTES_RE = %r{([\w:]+)(=(?:\w+|"[^"]+?"|'[^']+?'))?}x
ALLOWED_ATTR  = %w(style title type lang dir class id cite datetime abbr) +
                %w(colspan rowspan compact start media)
ALLOWED_HTML  = %w(abbr acronym address b big blockquote br caption cite) +
                %w(code col colgroup dd del dfn dir div dl dt em h1 h2 h3) +
                %w(h4 h5 h6 hr i ins kbd li menu ol p pre q s samp) +
                %w(small span strike strong style sub sup table tbody) +
                %w(td tfoot th thead tr tt u ul var)

# Clean the content of unsupported HTML and attributes. This includes
# XML namespaced HTML. Sorry, but there's too much possibility for
# abuse.
def clean(content)
  content.gsub(SIMPLE_TAG_RE) do |tag|
    tagset = HTML_TAG_RE.match(tag)

    if tagset.nil?
      # Not something we recognise as a tag: escape it.
      tag = Ruwiki.clean_entities(tag)
    else
      closer, name, attributes, single = tagset.captures

      if ALLOWED_HTML.include?(name.downcase)
        unless closer or attributes.nil?
          # Keep only the allowed attributes. The block yields nil for
          # the others so that compact really drops them.
          attributes = attributes.scan(ATTRIBUTES_RE).map do |set|
            set.join if ALLOWED_ATTR.include?(set[0].downcase)
          end.compact.join(" ")
          tag = "<#{closer}#{name} #{attributes}#{single}>"
        else
          # Closing tag, or a tag without attributes.
          tag = "<#{closer}#{name}>"
        end
      else
        # Tag is not on the allow list: escape it.
        tag = Ruwiki.clean_entities(tag)
      end
    end

    tag
  end
end
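
To make the flow above easier to follow, here is a small sketch of what the two regular expressions hand back for a single tag. The patterns are copied from the code above (without the layout comments) so the snippet runs on its own; DEMO_ALLOWED_ATTR and the sample tag are made up for the illustration and are not part of Ruwiki.

```ruby
# Copies of two patterns from the post, repeated so this runs standalone.
HTML_TAG_RE   = %r{\A<(/)?([\w:]+)(?:\s+([^>]+)(/)?)?>}
ATTRIBUTES_RE = %r{([\w:]+)(=(?:\w+|"[^"]+?"|'[^']+?'))?}

# Hypothetical, shortened allow list used only for this illustration.
DEMO_ALLOWED_ATTR = %w(class title)

tag = %q{<span class="note" onclick="alert(1)">}
closer, name, attributes, single = HTML_TAG_RE.match(tag).captures
# closer and single are nil here, name is "span", and attributes holds
# the raw text between the tag name and the closing ">".

# scan returns one [attribute_name, "=value" or nil] pair per attribute.
pairs = attributes.scan(ATTRIBUTES_RE)

# Keep the allowed pairs and glue each back into name=value form.
kept = pairs.select { |attr, _| DEMO_ALLOWED_ATTR.include?(attr.downcase) }
            .map(&:join)
            .join(" ")

rebuilt = "<#{closer}#{name} #{kept}#{single}>"
puts rebuilt   # prints <span class="note">
```

The onclick handler is dropped because its name is not on the allow list; the tag itself survives because the filtering happens per attribute.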

Ruwiki.clean_entities converts all instances of & => &amp;, < => &lt;,
and > => &gt;.
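
A minimal sketch of that conversion, assuming nothing beyond the three replacements described. This is a hypothetical stand-in, not Ruwiki's actual source. The same escaping would also be a reasonable starting point for the plain text fields (user names, subjects) asked about in the original question.

```ruby
# Hypothetical stand-in for Ruwiki.clean_entities; not Ruwiki's actual source.
def clean_entities(text)
  # "&" is handled first so that the "&" in "&lt;" and "&gt;" produced by
  # the later steps is not escaped a second time.
  text.gsub("&", "&amp;").gsub("<", "&lt;").gsub(">", "&gt;")
end

puts clean_entities(%q{<script>alert("x & y")</script>})
# prints &lt;script&gt;alert("x &amp; y")&lt;/script&gt;
```

Note that quotes are left alone, so this is only suitable for text placed between tags, not for text placed inside an attribute value.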

-austin
--
Austin Ziegler * halostatue@gmail.com
* Alternate: austin@halostatue.ca
: as of this email, I have [ 5 ] Gmail invitations


Florian Gross

10/14/2004 2:16:00 PM


Alexey Verkhovsky wrote:

> Hi all,

Moin!

> I am writing some sort of BBS in Ruby (on Rails). I downloaded and
> included RedCloth for template rendering (in 5 lines of code and 15
> lines of test - wow!). It's cool, but allows to include any HTML.

There are two options for disallowing user-specified HTML and style
sheets. (Even style sheets can contain JavaScript.) Just use RedCloth
like this:

RedCloth.new("h1. A <b>bold</b> man", [:filter_html, :filter_styles]).to_html
# => "<h1>A &lt;b&gt;bold&lt;/b&gt; man</h1>"

BlueCloth and RDoc have similar options AFAIK.

Regards,
Florian Gross

Mauricio Fernández

10/14/2004 2:54:00 PM


On Thu, Oct 14, 2004 at 11:19:47PM +0900, Florian Gross wrote:
> >I am writing some sort of BBS in Ruby (on Rails). I downloaded and
> >included RedCloth for template rendering (in 5 lines of code and 15
> >lines of test - wow!). It's cool, but allows to include any HTML.
>
> There's two options for not allowing user-specified HTML and style
> sheets. (Even style sheets can contain JavaScript.) Just use RedCloth
> like this:
>
> RedCloth.new("h1. A <b>bold</b> man", [:filter_html, :filter_styles])
> # => "<h1>A &lt;b&gt;bold&lt;/b&gt; man</h1>"
>
> BlueCloth and RDoc have similar options AFAIK.

IIRC RDoc doesn't allow raw HTML by design.

--
Running Debian GNU/Linux Sid (unstable)
batsman dot geo at yahoo dot com