Asp Forum - Using gsub to remove embedded newlines in HTML file

Wes Gamble

8/2/2006 10:59:00 PM

I have an HTML file that is in a string.

I want to use gsub! to recursively remove any embedded newlines and
whitespace within two known delimiters.

Given a string that includes this kind of string:

~^LNK:http://slashdot.org/login.pl?op=ne...
Create a new account
^~

I want to replace the above with:

~^LNK:http://slashdot.org/login.pl?op=ne...Create a new account^~

(stripping out the newlines and whitespace)

Having trouble writing the regex for this.

I think I want something like:

/~\^LNK:.*?([\s\r\n])+.*?\^~/

that I could use in:

str.gsub!(/~\^LNK:.*?([\s\r\n])+.*?\^~/, '')

to replace all of the whitespace, or potential newline characters with
null strings.

But I don't think this will work because I really need to loop _within_
each substring of my large HTML string. The thing about gsub is that it
will substitute the entire matched string.

Do I need to scan out the ~^LNK.*?^~, operate on those and then put them
back into the larger string?
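
To make the concern concrete, here is a minimal sketch. The sample string and its URL are made up for illustration (not the truncated URL above), and the /m flag is added so that "." can cross newlines. With a plain string as the replacement, gsub swaps out the whole delimited match, not just the whitespace inside it:

```ruby
# Hypothetical sample; the URL is a placeholder, not the one from the post.
str = "before ~^LNK:http://example.com/login\nCreate a new account\n^~ after"

# A string replacement applies to the ENTIRE match, so the whole
# ~^LNK:...^~ span is deleted rather than only its newlines.
result = str.gsub(/~\^LNK:.*?([\s\r\n])+.*?\^~/m, '')

puts result.inspect
```

Only the text outside the delimiters is left, which is why the replacement has to be computed per match.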

I'm not sure I'm asking this very well, so I apologize if that's the
case.

Thanks,
Wes

--
Posted via http://www.ruby-....

4 Answers

Wes Gamble

8/2/2006 11:05:00 PM

Something like:

@html.scan(/~\^LNK:.*?\^~/mi).each do |link_line|
  new_link_line = link_line.gsub(/[\s\r\n]/, '')
  @html.gsub!(/#{link_line}/mi, new_link_line)
end

--
Posted via http://www.ruby-....

Wes Gamble

8/2/2006 11:40:00 PM

Wes Gamble wrote:
> Something like:
>
> @html.scan(/~\^LNK:.*?\^~/mi).each do |link_line|
> new_link_line = link_line.gsub(/[\s\r\n]/, '')
> @html.gsub!(/#{link_line}/mi, new_link_line)
> end

This seems to work well:

@html.scan(/~\^LNK:.*?\^~/mi).each do |link_line|
  new_link_line = link_line.gsub(/[\t\r\n]/, '')
  @html.gsub!(/#{Regexp.escape(link_line)}/mi, new_link_line) if link_line != new_link_line
end

I wonder if I could have done this with one @html.gsub!() command, but
this is much more understandable to me anyway so I'll stick with this.
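
For anyone wondering why Regexp.escape matters here: the matched text contains characters such as "." and "?" that are regex metacharacters, so interpolating it raw builds a different pattern. A small sketch, using a made-up fragment (the name link is hypothetical):

```ruby
# Hypothetical fragment containing regex metacharacters ("." and "?").
link = "page.pl?op=x"

# Interpolated raw, "l?" means "an optional l" and "." means "any character".
unescaped = /#{link}/

# Regexp.escape backslash-escapes the metacharacters so they match literally.
escaped = /#{Regexp.escape(link)}/

puts unescaped.match?(link)  # false: the literal "?" in the text is never matched
puts escaped.match?(link)    # true
```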

Thanks,
Wes

--
Posted via http://www.ruby-....

Carlos

8/3/2006 2:52:00 AM

Wes Gamble wrote:

> Wes Gamble wrote:
>
>>Something like:
>>
>> @html.scan(/~\^LNK:.*?\^~/mi).each do |link_line|
>> new_link_line = link_line.gsub(/[\s\r\n]/, '')
>> @html.gsub!(/#{link_line}/mi, new_link_line)
>> end
>
>
> This seems to work well:
>
> @html.scan(/~\^LNK:.*?\^~/mi).each do |link_line|
> new_link_line = link_line.gsub(/[\t\r\n]/, '')
> @html.gsub!(/#{Regexp.escape(link_line)}/mi, new_link_line) if
> link_line != new_link_line
> end

You can use a block with gsub:
@html.gsub!(/~\^LNK:.*?\^~/mi) { |s| s.gsub(/\s/, '') }

or something like that.
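
A runnable sketch of the block form, under some assumptions: the sample HTML and URL are made up, and the inner pattern is [\t\r\n] rather than \s so that the spaces in the link text survive, as in the desired output from the original question.

```ruby
# Hypothetical sample; the URL is a placeholder.
html = "<p>~^LNK:http://example.com/login\n\tCreate a new account\n^~</p>\n<p>unrelated\ntext</p>"

# The block is called once per delimited match, and its return value
# replaces that match. Text outside the delimiters is left alone.
cleaned = html.gsub(/~\^LNK:.*?\^~/m) { |link| link.gsub(/[\t\r\n]/, '') }

puts cleaned
```

The non-bang gsub is used here so the result can be assigned directly; gsub! modifies the string in place and returns nil when nothing was substituted.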

Good luck.
Wes Gamble

8/3/2006 8:49:00 PM

Thanks. That is the _Ruby_ way to do it, and that's what I wanted to
know :).

I've used blocks with gsub but I keep forgetting that I can put anything
in there - so far I've only used backrefs to pull out pieces of the
matching regex to rearrange things.
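
For comparison, a sketch of the backref style mentioned above, where the block uses captures from the current match to rearrange it. The input, the pattern, and the output layout are all assumptions made for illustration:

```ruby
# Hypothetical input; the delimiters follow the thread, the URL is a placeholder.
text = "~^LNK:http://example.com/a\nFirst link\n^~"

# Capture 1: the URL (up to the first whitespace). Capture 2: the link text.
pattern = /~\^LNK:(\S+)\s+(.*?)\s*\^~/m

# Inside the block, $1 and $2 refer to the captures of the current match.
rearranged = text.gsub(pattern) { "#{$2} <#{$1}>" }

puts rearranged  # First link <http://example.com/a>
```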

Wes

--
Posted via http://www.ruby-....

comp.lang.ruby
