comp.lang.ruby

Regular expressions and long text

Guillermo.Acilu

6/20/2008 5:56:00 PM

[Note: parts of this message were removed to make it a legal post.]

Hello guys,

I started with Ruby a month ago and am doing some work with strings
and regular expressions. I am trying to take a long text and store the
individual sentences in an array. I can split a sentence into words and
store them in an array, but I cannot manage to do it with sentences.

I have used the following assignment to work with the words:

str = "Ruby is great"
words = []
words = str.scan(/\w+/)

The result is words[0]="Ruby", words[1]="is" and words[2]="great".

I would like to do the following:

str = "Ruby is great. We all know that."

and get words[0]="Ruby is great" and words[1]="We all know that".

Any ideas on how to do it with a regular expression instead of looping
through the string looking for the "."?

Thanks,

Guillermo

15 Answers

Sandro Paganotti

6/20/2008 7:01:00 PM

I'm not sure whether you want to split the string on the full stop:
str.split(".")
or divide the string into words and split them into two groups:


str = "Ruby is great. We all know that."
([(v=str.split(" "))[0...k=((l=(v.size))/2)]]+[v[k..l]]).map{|e|e.join(" ")}
=> ["Ruby is great.", "We all know that."]


--
Go outside! The graphics are amazing!

Bryan JJ Buckley

6/21/2008 9:24:00 AM

You can split on a regex for a full-stop followed by (optional) whitespace.

>> str.split(/\.\s?/)
=> ["Ruby is great", "We all know that"]

--
JJ

Clint

6/24/2008 10:13:00 AM

Hi,

Maybe you should try this: words = str.split(/\.\s*/)

It works for me:

irb(main):008:0> str = "Ruby is great. We all know that."
=> "Ruby is great. We all know that."
irb(main):009:0> words = str.split(/\.\s*/)
=> ["Ruby is great", "We all know that"]
irb(main):010:0> words[0]
=> "Ruby is great"
irb(main):011:0> words[1]
=> "We all know that"

greetings
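
The difference between `\s?` and `\s*` only shows up on longer text. A minimal sketch, assuming a sample string (not from the thread) with two spaces after one full stop and a newline after another:

```ruby
# Hypothetical sample text: two spaces after the first full stop,
# a newline after the second.
text = "Ruby is great.  We all know that.\nIt is fun."

# \s? consumes at most one whitespace character, so any extra
# whitespace leaks into the start of the next sentence.
optional_one = text.split(/\.\s?/)

# \s* consumes the whole run of whitespace (newlines included).
any_run = text.split(/\.\s*/)

p optional_one  # => ["Ruby is great", " We all know that", "It is fun"]
p any_run       # => ["Ruby is great", "We all know that", "It is fun"]
```

Both patterns drop the full stops, and split discards the empty string left after the final one.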

Clint

6/24/2008 10:15:00 AM

Guillermo.Acilu@koiaka.com wrote:
>
> Hello guys,
>
[cut]
> I would like to do the following:
> str = "Ruby is great. We all know that."
> and get words[0]="Ruby is great" and ruby[1]="We all know that"
>
> Any ideas on how to do it with a regular expression instead of looping
> through the string looking for the "."?
>
Hi,
Maybe you should try this: words = str.split(/\.\s*/)

It works for me:

irb(main):008:0> str = "Ruby is great. We all know that."
=> "Ruby is great. We all know that."
irb(main):009:0> words = str.split(/\.\s*/)
=> ["Ruby is great", "We all know that"]
irb(main):010:0> words[0]
=> "Ruby is great"
irb(main):011:0> words[1]
=> "We all know that"

greetings

Zhukov Pavel

6/24/2008 10:52:00 AM

Even simpler:

irb(main):001:0> "Ruby is great. We all know that.".split(".")
=> ["Ruby is great", " We all know that"]
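
Note the leading space in the second element of that result. One possible way to tidy it, sketched here rather than taken from the thread, is to strip each piece after splitting:

```ruby
str = "Ruby is great. We all know that."

# split(".") keeps the space that followed each full stop;
# strip removes leading and trailing whitespace from each piece.
sentences = str.split(".").map(&:strip)

p sentences  # => ["Ruby is great", "We all know that"]
```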

Raveendran Jazzez

6/25/2008 5:25:00 AM

Hi,

I think you expect this output, so please try it:

str = "Ruby is great. We all know that."
a = str.split('.').join(' ')
words = a.scan(/\w+/)


=> ["Ruby", "is", "great", "We", "all", "know", "that"]

Regards,
P.Raveendran
http://raveendran.wor...



--
Posted via http://www.ruby-....

Caike

7/4/2008 12:12:00 AM

If you want to stick to a regex-based solution:

>> str = "one one one. two. three."
=> "one one one. two. three."
>> str.scan(/\w[\s|\w]*./)
=> ["one one one.", "two.", "three."]

And it keeps working as you add more sentences in the same pattern:

>> str = "one one one. two. three. four. five."
=> "one one one. two. three. four. five."
>> str.scan(/\w[\s|\w]*./)
=> ["one one one.", "two.", "three.", "four.", "five."]


It may not be the best solution to this problem, but it is always good
to keep your regexp skills up to date ;)
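
Two details of the pattern `/\w[\s|\w]*./` are worth a closer look: inside a character class the `|` is a literal pipe rather than alternation, and the unescaped `.` matches any character, not just a full stop. A tighter variant, offered as a sketch and not as the poster's own code, might look like this:

```ruby
str = "one one one. two. three."

# [\s\w] is enough: | has no special meaning inside [...].
# \. matches only a literal full stop.
sentences = str.scan(/\w[\s\w]*\./)

p sentences  # => ["one one one.", "two.", "three."]

# With the unescaped dot, text without any full stop still matches,
# because the final . can consume the last letter instead.
p "no full stop".scan(/\w[\s|\w]*./)  # => ["no full stop"]
p "no full stop".scan(/\w[\s\w]*\./)  # => []
```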

Hassan Schroeder

7/4/2008 2:24:00 AM

Very late to this thread, but...

On Sat, Jun 21, 2008 at 2:23 AM, Bryan JJ Buckley <jjbuckley@gmail.com> wrote:
> You can split on a regex for a full-stop followed by (optional) whitespace.
>
> >> str.split(/\.\s?/)
> => ["Ruby is great", "We all know that"]

str="Dr. Feelgood will meet you at the corner of Foo St. and Bar Dr.
tonight at 8:00; bring $2.98 -- exact change -- to resolve the 5.5%
interest you owe."

:-)
--
Hassan Schroeder ------------------------ hassan.schroeder@gmail.com
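
To make the point of that counterexample concrete, here is a rough sketch. It assumes the example string is written on one line, and it uses a hypothetical helper `split_sentences` with a deliberately tiny, made-up abbreviation list. It is a heuristic for illustration only, not a general sentence splitter.

```ruby
# The counterexample from the post, joined onto one line, plus one extra
# sentence (an assumption added here so there is a real boundary to find).
str = "Dr. Feelgood will meet you at the corner of Foo St. and Bar Dr. " \
      "tonight at 8:00; bring $2.98 -- exact change -- to resolve the " \
      "5.5% interest you owe. See you there."

# The naive split cuts at every full stop, including abbreviations and decimals.
naive = str.split(/\.\s?/)
p naive.length  # => 7

# Hypothetical, incomplete list of abbreviations that end with a full stop.
ABBREVIATIONS = %w[Dr St Mr Mrs].freeze

def split_sentences(text)
  # Candidate boundary: whitespace after a full stop and before a capital letter.
  pieces = text.split(/(?<=\.)\s+(?=[A-Z])/)
  pieces.each_with_object([]) do |piece, sentences|
    # If the previous piece ends in a known abbreviation, glue this piece back on.
    last_word = sentences.last.to_s[/(\w+)\.\z/, 1]
    if last_word && ABBREVIATIONS.include?(last_word)
      sentences[-1] = "#{sentences.last} #{piece}"
    else
      sentences << piece
    end
  end
end

result = split_sentences(str)
p result.length  # => 2
p result.last    # => "See you there."
```

This still fails when a sentence genuinely ends with an abbreviation and the next one starts with a capital letter, which is why real sentence segmentation usually needs more than a single regular expression.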

Jun Young Kim

12/11/2008 8:07:00 AM

Hi,

I have a program that replaces a text's contents.

def replace(aPatten, aReplace)

  # I need some logic to translate the string to a pattern

  contents = File.read("data")
  contents.gsub!(aPatten, aReplace)
  File.open("result", "w") do |file|
    file << contents
  end
end


Another class passes the aPatten argument as the string "/[aeiou]/" and
aReplace as "*". Both of them are Strings.

I know I get the expected result when I pass /[aeiou]/ instead
of "/[aeiou]/".

Any ideas on how to convert a string to a pattern?

Robert Klemme

12/11/2008 8:17:00 AM

2008/12/11 Jun Young Kim <jykim@altibase.com>:
> Any ideas on how to convert a string to a pattern?

How about looking at the documentation?

http://www.ruby-doc.org/core/classes/R...

Btw, I rather tend to make it a requirement that the argument has the
appropriate type. Since #gsub is capable of working with String and
Regexp as pattern, I would not change your method's implementation but
the code invoking it.
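
If the calling code really can only supply a String such as "/[aeiou]/", the conversion could be done on that side with Regexp.new. A minimal sketch, assuming a hypothetical helper named `to_pattern` (not part of Ruby or of the thread) and ignoring trailing flags such as `i`:

```ruby
# Hypothetical helper: accepts a Regexp, or a String that may be written
# like a regexp literal, e.g. "/[aeiou]/".
def to_pattern(source)
  return source if source.is_a?(Regexp)

  # Drop one leading and one trailing slash when both are present;
  # otherwise use the string as the pattern source unchanged.
  body = source[%r{\A/(.*)/\z}m, 1] || source
  Regexp.new(body)
end

p to_pattern("/[aeiou]/")
# => /[aeiou]/
p "regular expression".gsub(to_pattern("/[aeiou]/"), "*")
# => "r*g*l*r *xpr*ss**n"
```

Regexp.new treats the string as pattern source, so characters like `.` keep their special meaning. When the string should be matched literally, Regexp.escape is the usual tool.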

Taking this one step further: I would choose a different abstraction:

def transform from_file, to_file
  repl = yield(File.read(from_file)) and
    File.open(to_file, "w") do |io|
      io.write(repl)
    end
end

Then you can do

transform "data", "result" do |content|
  content.gsub!(/[aeiou]/, "*")
  content
end
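
For anyone who wants to try this without touching real files, here is a self-contained sketch of the same idea. It is a variant, not the code above verbatim: it assumes a temporary directory from the standard `tmpdir` library and uses File.write in place of File.open.

```ruby
require "tmpdir"

# Same shape as the transform method in the post: the block receives the
# file's contents and returns the text to write (nothing is written if it
# returns nil or false).
def transform(from_file, to_file)
  replacement = yield(File.read(from_file))
  File.write(to_file, replacement) if replacement
end

result_text = Dir.mktmpdir do |dir|
  source = File.join(dir, "data")    # file names taken from the thread
  target = File.join(dir, "result")
  File.write(source, "hello world\n")

  # Non-destructive gsub returns the new string directly, so the block
  # needs no extra line (gsub! would return nil if nothing was replaced).
  transform(source, target) { |content| content.gsub(/[aeiou]/, "*") }

  File.read(target)
end

puts result_text  # prints "h*ll* w*rld"
```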

Cheers

robert

--
remember.guy do |as, often| as.you_can - without end