Alex Gutteridge
6/20/2007 8:42:00 AM
On 20 Jun 2007, at 17:15, merrittr wrote:
> Hi, I am trying to strip out the text between body tags, but when I run it I
> get:
>
> rob@rob-laptop:~/ruby$ ./html2.rb
> ./html2.rb:14: unknown regexp options - bdy
> ./html2.rb:14: unterminated string meets end of file
> ./html2.rb:14: parse error, unexpected tSTRING_END, expecting
> tSTRING_CONTENT or tREGEXP_END or tSTRING_DBEG or tSTRING_DVAR
>
> #! /usr/bin/ruby
>
> @h = File.open "test.html"
> @response = @h.gets
>
> text = @response.scan(/<body[^>]*>(.+?)</body>/)[0]
> puts text
You need to escape the '/' in your regexp, and unless your HTML file
is a single line you may also need to add the multiline option:
text = @response.scan(/<body[^>]*>(.+?)<\/body>/m)[0]
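For illustration, here is a minimal self-contained sketch of the same fix. The helper name extract_body and the sample HTML string are made up for this example and are not part of the original script. Note also that a single gets call returns only the first line of the file, so the multiline option only helps if the whole document is in the string (for example via File.read).

```ruby
# Hypothetical helper (not from the original script): returns the text
# between <body ...> and </body>, or nil if no match is found.
def extract_body(html)
  # The '/' in </body> is escaped so it does not end the regexp literal.
  # The m option makes '.' match newlines, so the body may span lines.
  html[/<body[^>]*>(.+?)<\/body>/m, 1]
end

# Made-up sample input standing in for the contents of test.html.
# With a real file, File.read("test.html") returns the whole file,
# whereas a single gets call returns only its first line.
html = "<html>\n<body class=\"main\">\nHello,\nworld\n</body>\n</html>\n"

puts extract_body(html).strip
```

Writing the pattern as %r{<body[^>]*>(.+?)</body>}m should be equivalent and avoids the escaping altogether.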
Alex Gutteridge
Bioinformatics Center
Kyoto University