comp.lang.ruby

Cutting a piece of text

Zdebel

2/12/2006 4:18:00 PM

Hello!
I've started to learn Ruby and I'm amazed by it. Now I have a problem
that I can't solve. If I have a string like this:
"<lyrics artist=XXX album=XXX title=XXX> Lalalalala </lyrics>" how can I
cut the " artist=XXX album=XXX title=XXX" part, so it would look like:
"<lyrics> Lalalalala </lyrics>"? Could you please help me?

--
Posted via http://www.ruby-....


10 Answers

James Gray

2/12/2006 4:57:00 PM

On Feb 12, 2006, at 10:18 AM, Zdebel wrote:

> Hello!
> I've started to learn Ruby and I'm amazed by it. Now I have a problem
> that I can't solve. If I have a string like this:
> "<lyrics artist=XXX album=XXX title=XXX> Lalalalala </lyrics>" how can I
> cut the " artist=XXX album=XXX title=XXX" part, so it would look like:
> "<lyrics> Lalalalala </lyrics>"? Could you please help me?

You can do it with a regular expression like the following, but I
must stress that this isn't very robust:

>> "<lyrics artist=XXX album=XXX title=XXX> Lalalalala </lyrics>".sub(/<(\w+)[^>]+>/, "<\\1>")
=> "<lyrics> Lalalalala </lyrics>"
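
To illustrate the robustness caveat, here is a minimal sketch. The second input is hypothetical (it is not from the original question) and assumes a quoted attribute value that itself contains a ">" character, which the pattern mistakes for the end of the tag:

```ruby
# Pattern from the post above.
pattern = /<(\w+)[^>]+>/

plain  = "<lyrics artist=XXX album=XXX title=XXX> Lalalalala </lyrics>"
# Hypothetical input: the attribute value contains a ">".
tricky = '<lyrics title="a > b"> Lalalalala </lyrics>'

plain_result  = plain.sub(pattern, "<\\1>")
tricky_result = tricky.sub(pattern, "<\\1>")

puts plain_result   # <lyrics> Lalalalala </lyrics>
puts tricky_result  # <lyrics> b"> Lalalalala </lyrics>
```

The match stops at the first ">" it sees, so part of the attribute value is left behind in the second case. That is the kind of input where a real parser is the safer choice.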

Hope that helps.

James Edward Gray II


David Vallner

2/12/2006 5:05:00 PM

On Sunday, 12 February 2006 at 17:18, Zdebel wrote:
> Hello!
> I've started to learn Ruby and I'm amazed by it. Now I have a problem
> that I can't solve. If I have a string like this:
> "<lyrics artist=XXX album=XXX title=XXX> Lalalalala </lyrics>" how can I
> cut the " artist=XXX album=XXX title=XXX" part, so it would look like:
> "<lyrics> Lalalalala </lyrics>"? Could you please help me?

The very geeky, and most probably least error-prone way would be whacking the
string with a DOM parser, clearing the attributes, and then printing it out
again. Unfortunately, I haven't been doing any DOM manipulation in Ruby, so I
can't provide code.

David Vallner


James Gray

2/12/2006 5:14:00 PM

On Feb 12, 2006, at 11:05 AM, David Vallner wrote:

> On Sunday, 12 February 2006 at 17:18, Zdebel wrote:
>> Hello!
>> I've started to learn Ruby and I'm amazed by it. Now I have a problem
>> that I can't solve. If I have a string like this:
>> "<lyrics artist=XXX album=XXX title=XXX> Lalalalala </lyrics>" how can I
>> cut the " artist=XXX album=XXX title=XXX" part, so it would look like:
>> "<lyrics> Lalalalala </lyrics>"? Could you please help me?
>
> The very geeky, and most probably least error-prone way would be whacking the
> string with a DOM parser, clearing the attributes, and then printing it out
> again. Unfortunately, I haven't been doing any DOM manipulation in Ruby, so I
> can't provide code.

The following is how you do it for valid XML, but the posted example
wasn't quite:

#!/usr/local/bin/ruby -w

require "rexml/document"

doc = "<lyrics artist='XXX' album='XXX' title='XXX'> Lalalalala </lyrics>"
xml = REXML::Document.new(doc)
xml.root.attributes.clear
xml.write
puts

__END__

James Edward Gray II



samuel.murphy

2/12/2006 5:17:00 PM

Learn regular expressions. Here's a not great example:

a = "<lyrics artist=XXX album=XXX title=XXX> Lalalalala </lyrics>"
b = a.gsub(/\w*=\w*/ , "")
c = b.gsub(/\s/, "")
print c, "\n"

<lyrics>Lalalalala</lyrics>


A slightly (yes very slightly) more realistic example:

a = '<lyrics artist="Prince" album="purplerain" title="computerblue">
Lalalalala </lyrics>'
b = a.gsub(/\w*="\w*"/ , "")
c = b.gsub(/\s/, "")
print c, "\n"


<lyrics>Lalalalala</lyrics>


And what if there are spaces in an attribute value:

a = '<lyrics artist="Prince" album="purplerain" title="Computer Blue">
Lalalalala </lyrics>'
b = a.gsub(/\w*=".*"/ , "")
c = b.gsub(/\s/, "")
print c, "\n"

James Gray

2/12/2006 5:20:00 PM

On Feb 12, 2006, at 11:05 AM, Zdebel wrote:

> I wish I knew how this (/<(\w+)[^>]+>/, "<\\1>")
> regular expression works :).

It reads:

/ <        # find a < character
  (        # capture this next part into $1 (\\1 in the replacement string)
    \w+    # followed by one or more word characters
  )        # end capture
  [^>]+    # followed by one or more non > characters
  >        # and finally a > character
/x


The replacement just restores the <\w+> and leaves out the [^>]+ part
(the space and attributes).
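
As a rough illustration of the breakdown above, the following sketch inspects what the pattern matches and what the capture group holds. The variable names are only for illustration:

```ruby
data    = "<lyrics artist=XXX album=XXX title=XXX> Lalalalala </lyrics>"
pattern = /<(\w+)[^>]+>/

match = pattern.match(data)
whole = match[0]  # everything the pattern consumed, attributes included
tag   = match[1]  # only the captured \w+ part, i.e. the tag name

# "<\\1>" re-inserts the captured tag name between angle brackets.
replaced = data.sub(pattern, "<\\1>")

# Equivalent block form, using the $1 global instead of \\1.
block_form = data.sub(pattern) { "<#{$1}>" }

puts whole     # <lyrics artist=XXX album=XXX title=XXX>
puts tag       # lyrics
puts replaced  # <lyrics> Lalalalala </lyrics>
```

The closing </lyrics> tag is left alone because the character after its "<" is "/", which is not a word character.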

Hope that helps.

James Edward Gray II



Zdebel

2/12/2006 5:33:00 PM

A big thank you to all of you guys for such a response. This helped me
a lot and my script is working, but I will practice more using your
advice :)

--
Posted via http://www.ruby-....


Marcin Mielzynski

2/12/2006 6:05:00 PM

James Edward Gray II wrote:

> >> "<lyrics artist=XXX album=XXX title=XXX> Lalalalala </lyrics>".sub(/<(\w+)[^>]+>/, "<\\1>")
> => "<lyrics> Lalalalala </lyrics>"

A reluctant quantifier would be a bit faster:

p "<lyrics artist=XXX album=XXX title=XXX> Lalalalala </lyrics>".gsub(/<(\w+).*?>/, "<\\1>")


lopex

David Vallner

2/12/2006 8:21:00 PM

On Sunday, 12 February 2006 at 19:30, James Edward Gray II wrote:
> On Feb 12, 2006, at 12:08 PM, Marcin Mielzynski wrote:
> > James Edward Gray II wrote:
> >> >> "<lyrics artist=XXX album=XXX title=XXX> Lalalalala </lyrics>".sub(/<(\w+)[^>]+>/, "<\\1>")
> >> => "<lyrics> Lalalalala </lyrics>"
> >
> > A reluctant quantifier would be a bit faster:
> >
> > p "<lyrics artist=XXX album=XXX title=XXX> Lalalalala </lyrics>".gsub(/<(\w+).*?>/, "<\\1>")
>
> Are you sure?
>
> $ ruby regexp_time.rb
> Rehearsal -------------------------------------------------
> /<(w+)[^>]+>/   7.210000   0.030000   7.240000 (  7.266166)
> /<(w+).*?>/     7.710000   0.020000   7.730000 (  7.757304)
> --------------------------------------- total: 14.970000sec
>
>                     user     system      total        real
> /<(w+)[^>]+>/   7.170000   0.030000   7.200000 (  7.227075)
> /<(w+).*?>/     7.730000   0.020000   7.750000 (  7.777196)
> $ cat regexp_time.rb
> #!/usr/local/bin/ruby -w
>
> require "benchmark"
>
> tests = 1000000
> data = "<lyrics artist=XXX album=XXX title=XXX> Lalalalala </lyrics>"
>
> Benchmark.bmbm do |x|
>   x.report("/<(\w+)[^>]+>/") do
>     tests.times { data.sub(/<(\w+)[^>]+>/, "<\\1>") }
>   end
>   x.report("/<(\w+).*?>/") do
>     tests.times { data.sub(/<(\w+).*?>/, "<\\1>") }
>   end
> end
>
> __END__
>
> ;)
>
> James Edward Gray II

The nongreedy match has to "back up" and retry on every character after the
tag name, whereas James's [^>] never has to back up. In fact, even a
greedy .* would probably be faster than a nongreedy one in this case.
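
One caveat worth checking before swapping quantifiers for speed: on this input the three patterns do not all produce the same result. A small sketch comparing them (it compares results only, not timing):

```ruby
data = "<lyrics artist=XXX album=XXX title=XXX> Lalalalala </lyrics>"

char_class = data.sub(/<(\w+)[^>]+>/, "<\\1>")  # negated character class
reluctant  = data.sub(/<(\w+).*?>/,   "<\\1>")  # nongreedy .*?
greedy     = data.sub(/<(\w+).*>/,    "<\\1>")  # greedy .*

puts char_class  # <lyrics> Lalalalala </lyrics>
puts reluctant   # <lyrics> Lalalalala </lyrics>
puts greedy      # <lyrics>
```

The greedy .* runs to the end of the string and backs up only as far as the last ">", so it swallows the text and the closing tag as well. Whatever its speed, it is not a drop-in replacement here.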

Gotta love the black art that is optimizing regexps.

David Vallner


Marcin Mielzynski

2/12/2006 8:37:00 PM

David Vallner wrote:

> The nongreedy match has to "back up" and retry on every character after the
> tag name, whereas James's [^>] never has to back up. In fact, even a
> greedy .* would probably be faster than a nongreedy one in this case.
>
> Gotta love the black art that is optimizing regexps.
>

Oops, you are right!

But as I read, greedy quantifiers do backtrack as well (though not in the
case above).

/a+aa/ =~ "aaaaa"
will backtrack two characters.

Only a possessive quantifier (in Oniguruma, for example) consumes in the
truly greedy way, without giving anything back.

So
/a++aa/ =~ "aaaaa"
won't match.
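
A quick way to check this, assuming a Ruby whose regexp engine supports possessive quantifiers (Ruby 1.9 and later, which use Oniguruma or its successor). The atomic group in the last line is an addition for comparison, not something from the post above:

```ruby
# Greedy a+ first grabs all five a's, then gives back two so "aa" can match.
greedy_pos = /a+aa/ =~ "aaaaa"

# Possessive a++ grabs all five a's and never gives any back.
possessive_pos = /a++aa/ =~ "aaaaa"

# An atomic group (?>...) behaves the same way as the possessive form here.
atomic_pos = /(?>a+)aa/ =~ "aaaaa"

p greedy_pos      # 0
p possessive_pos  # nil
p atomic_pos      # nil
```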

lopex

William James

2/13/2006 7:53:00 AM

Zdebel wrote:
> Hello!
> I've started to learn Ruby and I'm amazed by it. Now I have a problem
> that I can't solve. If I have a string like this:
> "<lyrics artist=XXX album=XXX title=XXX> Lalalalala </lyrics>" how can I
> cut the " artist=XXX album=XXX title=XXX" part, so it would look like:
> "<lyrics> Lalalalala </lyrics>"? Could you please help me?
>
> --
> Posted via http://www.ruby-....

p " <lyrics artist=XXX album=XXX title=XXX> Lalalalala </lyrics>".
sub(/\s+[^<>]*(?=>)/, '' )

p " <lyrics artist=XXX album=XXX title=XXX> Lalalalala </lyrics>".
scan( /\G ( [^<]+ ) | \G ( < \S* ) [^>]* ( > ) /x ).
flatten.compact.join