Asp Forum - gsub stretching over several lines

Jan Ask

8/5/2007 3:23:00 AM

Hi,

I am trying to replace a long (multi-line) string:

string = string.gsub(/<h3
class="field-label>audience<\/h3>
<div class="field-items>
<div class="field-item>/, '</audience>')

It somehow doesn't seem to work.

I would like to know how to use a wildcard like '*', '%' or '...' like
below:

string = string.gsub(/<h ... tem>/, '</audience>')

Thanks!
Ask
--
Posted via http://www.ruby-....

6 Answers

Shai Rosenfeld

8/5/2007 7:41:00 AM

> string = string.gsub(/<h.*tem>/, '</audience>')

.*

# the . regexp wildcard means 'any' character (space, symbol, letter)
# the * regexp wildcard is an operator saying: any regexp match before
# me can appear 0 or more times

i.e., you get whatever character you want, however many times you want
it, between the '<h' string and the 'tem>' string.
hth

happy sunday btw

Alex Gutteridge

8/5/2007 8:53:00 AM

On 5 Aug 2007, at 16:41, Shai Rosenfeld wrote:

>> string = string.gsub(/<h.*tem>/, '</audience>')
>
> .*
>
> # the . regexp wildcard means, 'any' character (space, symbol, letter)
> # the * regexp wildcard is an operator saying: any regexp match before
> me can appear 0 or more times:
>
> i.e, you get whatever character you want, however many times you want
> it, between the '<h' string and the 'tem>' string.

.* won't match over multiple lines without the m modifier on the
Regexp, which I think is the OP's problem:

irb(main):021:0> string = "Hi\nJan\nAsk"
=> "Hi\nJan\nAsk"
irb(main):022:0> string.gsub(/Hi.*Ask/,'Hi Jane Ask')
=> "Hi\nJan\nAsk"
irb(main):023:0> string.gsub(/Hi.*Ask/m,'Hi Jane Ask')
=> "Hi Jane Ask"

Alex Gutteridge

Bioinformatics Center
Kyoto University
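
A side note on the m modifier shown in the irb session above: once . is allowed to cross newlines, a greedy .* runs to the last possible end marker, which can swallow more than intended when the text contains several blocks. The sketch below uses a made-up sample string (not the OP's data) to compare greedy .* with non-greedy .*? under the m modifier.

```ruby
# Hypothetical sample text with two multi-line blocks (not from the thread).
text = "<h3>\naudience</h3>\n<h3>\ncreator</h3>"

# Greedy: with /m, .* runs to the LAST "</h3>", so a single
# replacement consumes both blocks at once.
greedy = text.gsub(/<h3>.*<\/h3>/m, 'X')

# Non-greedy: .*? stops at the FIRST "</h3>" it can reach, so each
# block is replaced separately and the newline between them survives.
lazy = text.gsub(/<h3>.*?<\/h3>/m, 'X')

puts greedy.inspect
puts lazy.inspect
```

Which of the two is wanted depends on the input; for a file with many similar blocks the non-greedy form is usually the safer starting point.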

Sebastian Hungerecker

8/5/2007 9:08:00 AM

Alex Gutteridge wrote:

> .* won't match over multiple lines without the m modifier on the
> RegExp, which I think is the OP's problem:

That can't be the OP's problem since the OP doesn't actually use .* in his
regexp (or any other kind of wildcard). He was asking how to use wildcards
so he could simplify his regexp (and make it work).
To know why his original regexp didn't work, we'd have to see the string it's
supposed to match, I suppose.

--
NP: Explosions in the Sky - Day 1
Jabber: sepp2k@jabber.org
ICQ: 205544826
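
Without the real input this cannot be diagnosed, but one common reason a literal multi-line regexp fails is that the whitespace between tags in the text differs from the whitespace typed into the pattern. The sketch below uses an invented input string to show the effect and how \s* makes the pattern tolerant of it. It is an illustration of that one possibility, not an explanation of the OP's actual failure.

```ruby
# Hypothetical input: two tags separated by a blank line.
html = "<h3 class=\"field-label\">audience</h3>\n\n<div class=\"field-items\">"

# A pattern with exactly one newline between the tags matches only
# input that has exactly one newline there.
strict = /<h3 class="field-label">audience<\/h3>\n<div class="field-items">/

# \s* accepts any run of spaces, tabs and newlines (including none).
loose = /<h3 class="field-label">audience<\/h3>\s*<div class="field-items">/

strict_hit = !html.match(strict).nil?
loose_hit  = !html.match(loose).nil?

puts "strict pattern matches: #{strict_hit}"
puts "loose pattern matches:  #{loose_hit}"
```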

Jan Ask

8/6/2007 1:12:00 AM

Alex & Sebastian,

Thanks for taking the time to reply. The string.gsub(/start.*end/m,
'some_value') did indeed help, but I am afraid my problem is a bit more
complicated.

I am basically trying to clean up a long XML file. A typical part of the
string looks like this:

<div class="field field-type-text field-field-audience">

<h3 class="field-label">audience</h3>

<div class="field-items">

<div class="field-item">Public</div>

</div>

</div>

<div class="field field-type-text field-field-creator">
<h3 class="field-label">creator</h3>
<div class="field-items">
<div class="field-item">Tom Jones</div>
</div>
</div>

I am trying to format it like this:
<audience>Public</audience>
<creator>Tom Jones</creator>

The problem is that the values in the XML change throughout the
string, so I cannot pattern-match them directly. Any ideas
would be hugely appreciated!

Jan

Alex Gutteridge

8/6/2007 2:08:00 AM

Without knowing the whole problem it is difficult to say what the
best solution is, but for the string you posted above, I would clean it
up and parse it with something like Hpricot:

require 'rubygems'
require 'hpricot'

string = DATA.read #read in string

string.gsub!(/&lt;/,'<') #Convert lt and gt symbols to real <>
string.gsub!(/&gt;/,'>')
string.gsub!(/&quot;/,'"') #Put in quotes

doc = Hpricot(string) #Parse with Hpricot

fields = ['audience','creator'] #Create array of 'fields' to extract

fields.each do |f| #For each field...
  #...find appropriate divs
  el = doc.search("//div[@class='field field-type-text field-field-#{f}']")
  el.each do |e| # for each field div...
    #print data
    puts "<#{f}>" + e.at("//div[@class='field-item']").inner_html + "</#{f}>"
  end
end

__END__
<div class="field field-type-text field-field-audience">

<h3 class="field-label">audience</h3>

<div class="field-items">

<div class="field-item">Public</div>

</div>

</div>

<div class="field field-type-text field-field-creator">
<h3 class="field-label">creator</h3>
<div class="field-items">
<div class="field-item">Tom Jones</div>
</div>
</div>

Alex Gutteridge

Bioinformatics Center
Kyoto University
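
For readers who want to stay with plain regexps, as in the earlier replies, the same extraction can be sketched with capture groups and the m modifier. This is not what Alex posted. It is a minimal alternative that assumes every block looks exactly like the sample above (same tag order, same class names), and a real parser will be more robust on messier markup. The variable names are made up for illustration.

```ruby
# Sample input copied from the thread (two field blocks).
html = <<HTML
<div class="field field-type-text field-field-audience">

<h3 class="field-label">audience</h3>

<div class="field-items">

<div class="field-item">Public</div>

</div>

</div>

<div class="field field-type-text field-field-creator">
<h3 class="field-label">creator</h3>
<div class="field-items">
<div class="field-item">Tom Jones</div>
</div>
</div>
HTML

# Capture 1 is the label text, capture 2 is the field-item text.
# The non-greedy .*? with /m skips the markup in between (newlines
# included) and stops at the nearest field-item div.
pattern = /<h3 class="field-label">(.*?)<\/h3>.*?<div class="field-item">(.*?)<\/div>/m

# scan returns one [label, value] pair per match.
lines = html.scan(pattern).map { |label, value| "<#{label}>#{value}</#{label}>" }

puts lines
```

Because the label is captured rather than listed in advance, this sketch does not need a fixed array of field names, at the cost of trusting whatever appears in the h3 element.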

Jan Ask

8/6/2007 2:20:00 AM

Alex Gutteridge wrote:
> Without knowing the whole problem it is difficult to say what the
> best solution is, but for the string you post above, I would clean it
> up and parse with something like Hpricot:

Thanks, I will have a try.

By the way, I see you are in Kyoto. I am studying at Tsukuba University
(about an hour from Tokyo), so if you come to the big city, I owe you a
beer!

comp.lang.ruby
