comp.lang.ruby

newbieQ: new array from old array w regex

Charles L. Snyder

6/10/2005 4:11:00 AM

Hi,

As an exercise to learn to use Ruby, I am trying to search large XML files
to extract just the zip codes from the author address nodes in the file:

require "rexml/document"
include REXML
doc = Document.new File.new("JPS.xml")
mylist2=mylist=[]
doc.elements.each("RECORDS/RECORD/AUTHOR_ADDRESS") {|element| mylist.push element.text}
puts mylist.each {|m| /b\d{5}-\d{4}\b|\b\d{5}\b/.match(m)}

This only gives me the entire contents of the address node - how do I
extract only the zip codes (preferably putting them in an array). My next
step is to end up with a hash table of unique zipcodes and their frequency
of occurrence...

Thanks

CLS




2 Answers

Nakada, Nobuyoshi

6/10/2005 5:22:00 AM

Hi,

At Fri, 10 Jun 2005 13:15:28 +0900,
Charles L. Snyder wrote in [ruby-talk:145032]:
> This only gives me the entire contents of the address node - how do I
> extract only the zip codes (preferably putting them in an array). My next
> step is to end up with a hash table of unique zipcodes and their frequency
> of occurrence...

Enumerable#each just returns the receiver itself.

puts mylist.grep(/b\d{5}-\d{4}\b|\b\d{5}\b/)
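
To make the difference concrete, here is a minimal sketch. The sample addresses are made up, the constant and method names are only for illustration, and the regex is written with a leading \b on the assumption that a word boundary was intended. It shows that each on an Array hands back the array itself and discards the block's value, while grep returns only the elements that match.

```ruby
# Hypothetical sample data standing in for the address strings in mylist.
ADDRESSES = [
  "Dept. of Surgery, Houston, TX 77030",
  "Univ. of Somewhere, no postal code given",
  "Box 12, Dallas, TX 75201-1234"
].freeze

# Assumption: the leading \b was intended in the first alternative.
ZIP_RX = /\b\d{5}-\d{4}\b|\b\d{5}\b/

# The value of the block is thrown away; each returns its receiver.
def each_result(list)
  list.each { |m| ZIP_RX.match(m) }
end

# grep keeps every element for which the pattern matches.
def grep_result(list)
  list.grep(ZIP_RX)
end

p each_result(ADDRESSES).equal?(ADDRESSES)  # => true (the very same array)
p grep_result(ADDRESSES)                    # the two strings that contain a zip code
```

Note that grep selects whole elements, so the result holds the full address strings that contain a zip code, not the zip codes themselves.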

--
Nobu Nakada


Robert Klemme

6/10/2005 6:55:00 AM

nobuyoshi nakada wrote:
> Hi,
>
> At Fri, 10 Jun 2005 13:15:28 +0900,
> Charles L. Snyder wrote in [ruby-talk:145032]:
>> This only gives me the entire contents of the address node - how do I
>> extract only the zip codes (preferably putting them in an array). My
>> next step is to end up with a hash table of unique zipcodes and
>> their frequency of occurrence...
>
> Enumerable#each just returns the receiver itself.
>
> puts mylist.grep(/b\d{5}-\d{4}\b|\b\d{5}\b/)

You still get the complete string. This one might work better:

puts mylist.inject([]){|res,e| res << $& if /\b\d{5}(?:-\d{4})?\b/ =~ e;
res}
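
The inject version above keeps the first match from each string. As a possible alternative, and as a sketch toward the stated next step (a hash of unique zip codes and their frequency), String#scan can collect every match and a Hash with a default of 0 can do the counting. The method names and the sample strings below are made up for illustration; the pattern is the one from the line above.

```ruby
ZIP_PATTERN = /\b\d{5}(?:-\d{4})?\b/

# Collect every zip-like substring from every string.
# to_s guards against nil entries, which REXML's element.text
# returns for elements without text.
def extract_zips(strings)
  strings.flat_map { |s| s.to_s.scan(ZIP_PATTERN) }
end

# Count how often each zip code occurs.
def zip_frequencies(zips)
  counts = Hash.new(0)          # missing keys start at 0
  zips.each { |z| counts[z] += 1 }
  counts
end

# Hypothetical address strings, standing in for mylist.
sample = [
  "Houston, TX 77030",
  "Dallas, TX 75201-1234 and Houston, TX 77030",
  "no zip here",
  nil
]

zips = extract_zips(sample)
p zips                   # => ["77030", "75201-1234", "77030"]
p zip_frequencies(zips)  # => {"77030"=>2, "75201-1234"=>1}
```

The group in the pattern is non-capturing on purpose: with a capturing group, scan would return the captured parts instead of the whole match.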

Also, could it be that the leading backslash for the word boundary was
missing in the original rx?
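
If the leading backslash was indeed intended, the effect of leaving it out can be seen in a small sketch like this. The constant names, the helper method and the sample address are made up for illustration. Without the backslash the first alternative demands a literal "b" before the digits, so a nine-digit zip code falls through to the second alternative and is cut down to five digits.

```ruby
# The pattern as posted: the first alternative starts with a literal "b".
ORIGINAL_RX = /b\d{5}-\d{4}\b|\b\d{5}\b/

# Assumed intent: a word boundary (\b) at the start of both alternatives.
FIXED_RX = /\b\d{5}-\d{4}\b|\b\d{5}\b/

# Return the matched text, or nil if the pattern does not match.
def first_match(rx, text)
  m = rx.match(text)
  m && m[0]
end

address = "Dallas, TX 75201-1234"  # hypothetical sample address

p first_match(ORIGINAL_RX, address)  # => "75201"
p first_match(FIXED_RX, address)     # => "75201-1234"
```

Plain five-digit zip codes match the same way under both patterns, which may be why the problem is easy to overlook.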

Kind regards

robert