comp.lang.ruby

Capitalization

Jason Vogel

12/8/2006 6:35:00 PM

Disclaimer : Ruby Nuby and I don't know RegEx basically at all. I know
RegEx is the answer, just don't know where to start.

Current Source:
str.split(' ').each {|w| w.capitalize!}.join(' ')

Text:
ADDITIONAL SPA (ONLY AVAILABLE W/PURCHASE OF POOL OR SPA)
SELLER HEAT/AC/DUCTWORK

Result:
Additional Spa (only Available W/purchase Of Pool Or Spa)
Seller Heat/ac/ductwork

Desired:
Additional Spa (Only Available w/Purchase of Pool or Spa)
Seller Heat/AC/Ductwork

Issues:
- Need to capitalize after a "/"
- Need specific word case handling (e.g. "Ac" => "AC","or" => "or",
"w/[a]" => "w/[A]")

Thanks,
Jason
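
For reference, the Result shown in the post follows from how String#capitalize behaves: it uppercases only the first character of the string and lowercases everything after it, so a token that starts with "(" or contains "/" loses its other capitals. A small sketch (the variable names are only for illustration):

```ruby
# String#capitalize changes the first character only; the rest is downcased.
tokens = ["(ONLY", "W/PURCHASE", "HEAT/AC/DUCTWORK"]

tokens.each do |t|
  puts "#{t} -> #{t.capitalize}"
end

# The original approach applied to the second sample line.
line = "SELLER HEAT/AC/DUCTWORK"
result = line.split(' ').map { |w| w.capitalize }.join(' ')
puts result
```

Because split(' ') cuts only on whitespace, "HEAT/AC/DUCTWORK" is treated as a single word, which is why nothing after the slashes gets a capital.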

11 Answers

Martin DeMello

12/8/2006 6:39:00 PM

On 12/9/06, Jason Vogel <jasonvogel@gmail.com> wrote:
>
> Issues:
> - Need to capitalize after a "/"
> - Need specific word case handling (e.g. "Ac" => "AC","or" => "or",
> "w/[a]" => "w/[A]")

Take a look at http://zem.novylen.net/ruby/ti... (especially
the icap method).

martin

Paul Lutus

12/8/2006 7:35:00 PM

Jason Vogel wrote:

> Disclaimer : Ruby Nuby and I don't know RegEx basically at all. I know
> RegEx is the answer, just don't know where to start.
>
> Current Source:
> str.split(' ').each {|w| w.capitalize!}.join(' ')
>
> Text:
> ADDITIONAL SPA (ONLY AVAILABLE W/PURCHASE OF POOL OR SPA)
> SELLER HEAT/AC/DUCTWORK
>
> Result:
> Additional Spa (only Available W/purchase Of Pool Or Spa)
> Seller Heat/ac/ductwork
>
> Desired:
> Additional Spa (Only Available w/Purchase of Pool or Spa)
> Seller Heat/AC/Ductwork
>
> Issues:
> - Need to capitalize after a "/"
> - Need specific word case handling (e.g. "Ac" => "AC","or" => "or",
> "w/[a]" => "w/[A]")

How many special cases? In the worst case, you would have to use a
dictionary to avoid treating acronyms as a word. You already have two
rather difficult rules, one having to do with acronyms, another having to
do with special treatment of the sequence "w/".

What I am saying is this is likely to be more difficult than it seems,
especially because we only have one example of what might end up being
thousands of examples of free-form text.

--
Paul Lutus
http://www.ara...

Daniel Finnie

12/8/2006 10:08:00 PM

Try this:
str.gsub(/[A-Za-z]+/) {|x| x.capitalize}

If you want the W of W/ uncapitalized:
str.downcase.gsub(/[A-Za-z]+(?!\/)/) {|x| x.capitalize}
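
A quick way to see what these two calls do is to run them on the sample text from the original post. This is only a sketch, and the variable names are illustrative:

```ruby
str = "ADDITIONAL SPA (ONLY AVAILABLE W/PURCHASE OF POOL OR SPA)\n" \
      "SELLER HEAT/AC/DUCTWORK"

# Capitalize every run of letters, including the ones after "(" and "/".
plain = str.gsub(/[A-Za-z]+/) { |x| x.capitalize }

# Lowercase first, then refuse any match that would end right before a "/".
no_w = str.downcase.gsub(/[A-Za-z]+(?!\/)/) { |x| x.capitalize }

puts plain
puts no_w
```

The second call leaves the "w" of "w/" lowercase, but neither call handles "of", "or" or "AC" by itself.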


Daniel Finnie

12/8/2006 10:12:00 PM

Oops, forgot to paste this one in:
To keep short words like "of" and "is" lowercase (basically anything
under 3 letters):
text.downcase.gsub(/[A-Za-z]{3,}(?!\/)/) {|x| x.capitalize}



Jacob Fugal

12/9/2006 12:02:00 AM

On 12/8/06, Daniel Finnie <danfinnie@optonline.net> wrote:
> Oops, forgot to paste this one in:
> To keep short words like "of" and "is" lowercase (basically anything
> under 3 letters):
> text.downcase.gsub(/[A-Za-z]{3,}(?!\/)/) {|x| x.capitalize}

I agree with Paul Lutus, there are too many special cases. And
Daniel's regex here is a good example. I can spot at least three (to
me) obvious errors:

1) Anything with a '/' trailing will not get capitalized, so in the
OP's example, neither "heat" nor "ac" would be capitalized at all.

2) There are plenty of words with fewer than three letters that should
be capitalized. The first person pronoun "I", for instance. Or even
"of" or "is", if they're the first word in the sentence.

3) In the absence of 1 and 2, "ac" would still get turned into "Ac"
rather than "AC".
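
Point 2 is easy to reproduce. A minimal sketch, using a made-up sentence that is not from the thread:

```ruby
# The {3,} regexp from the quoted post, applied to a made-up sentence.
sample = "OF COURSE I AGREE"
out = sample.downcase.gsub(/[A-Za-z]{3,}(?!\/)/) { |x| x.capitalize }
puts out
```

Both the sentence-initial "of" and the pronoun "i" stay lowercase, because the length rule knows nothing about position or meaning.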

Jacob Fugal

Daniel Finnie

12/9/2006 12:35:00 AM

Jacob Fugal wrote:
> On 12/8/06, Daniel Finnie <danfinnie@optonline.net> wrote:
>> Oops, forgot to paste this one in:
>> To keep short words like "of" and "is" lowercase (basically anything
>> under 3 letters):
>> text.downcase.gsub(/[A-Za-z]{3,}(?!\/)/) {|x| x.capitalize}
>
> I agree with Paul Lutus, there are too many special cases. And
> Daniel's regex here is a good example. I can spot at least three (to
> me) obvious errors:
>
> 1) Anything with a '/' trailing will not get capitalized, so in the
> OP's example, neither "heat" nor "ac" would be capitalized at all.
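Trailing /'s do work as long as the word before the slash is at least 4
letters long: the regexp backtracks by one letter, so "heat" is matched as
"hea" and the H still gets capitalized.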
irb(main):004:0> src.downcase.gsub(/[A-Za-z]{3,}(?!\/)/) {|x| x.capitalize}
=> "Additional Spa (Only Available w/Purchase of Pool or Spa) Seller
Heat/ac/Ductwork "
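
The slash case working at all comes down to backtracking: when the lookahead sees the "/", the regexp gives back one letter and matches the shorter prefix instead. A sketch with scan makes the actual matches visible (the sample strings are chosen for illustration):

```ruby
re = /[A-Za-z]{3,}(?!\/)/

# "heat" is matched as "hea", which is still enough to capitalize the H.
matches = "heat/ac/ductwork".scan(re)
p matches

# A three-letter word directly before a slash has no letter to give back,
# so it is skipped entirely.
short = "spa/pool".gsub(re) { |x| x.capitalize }
puts short
```

So the rule that falls out of this regexp is "four or more letters before a slash", which is a side effect rather than something the pattern states.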

> 2) There are plenty of words with fewer than three letters that should
> be capitalized. The first person pronoun "I", for instance. Or even
> "of" or "is", if they're the first word in the sentence.
>
> 3) In the absence of 1 and 2, "ac" would still get turned into "Ac"
> rather than "AC".

These are valid points that I feel shouldn't be incorporated into the
original regexp.


William James

12/9/2006 2:24:00 AM

Jason Vogel wrote:
> Disclaimer : Ruby Nuby and I don't know RegEx basically at all. I know
> RegEx is the answer, just don't know where to start.
>
> Current Source:
> str.split(' ').each {|w| w.capitalize!}.join(' ')
>
> Text:
> ADDITIONAL SPA (ONLY AVAILABLE W/PURCHASE OF POOL OR SPA)
> SELLER HEAT/AC/DUCTWORK
>
> Result:
> Additional Spa (only Available W/purchase Of Pool Or Spa)
> Seller Heat/ac/ductwork
>
> Desired:
> Additional Spa (Only Available w/Purchase of Pool or Spa)
> Seller Heat/AC/Ductwork
>
> Issues:
> - Need to capitalize after a "/"
> - Need specific word case handling (e.g. "Ac" => "AC","or" => "or",
> "w/[a]" => "w/[A]")
>
> Thanks,
> Jason

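# Map each special word's lowercase form to the spelling it must keep.
specials = %w( of or w AC ).
  inject({}){|h,s| h.update(s.downcase => s) }

# The capture group in the split pattern keeps the separators in the array.
puts DATA.read.downcase.split( /([^a-z]+)/ ).map{|s|
  specials[s] or s.capitalize }.join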

__END__
ADDITIONAL SPA (ONLY AVAILABLE W/PURCHASE OF POOL OR SPA)
SELLER HEAT/AC/DUCTWORK


--- output -----
Additional Spa (Only Available w/Purchase of Pool or Spa)
Seller Heat/AC/Ductwork
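
The same idea can be wrapped in a method so it works on any string rather than on DATA. This is a sketch only: the names SPECIALS and smart_capitalize are hypothetical, and the word list is just the one from the post.

```ruby
# Hypothetical names; the word list is the one used in the post above.
# Keys are lowercase forms, values are the spellings to keep.
SPECIALS = %w(of or w AC).map { |s| [s.downcase, s] }.to_h

def smart_capitalize(text, specials = SPECIALS)
  # Split into words and separators, fix each word, glue everything back.
  text.downcase.split(/([^a-z]+)/).map { |s| specials[s] || s.capitalize }.join
end

puts smart_capitalize("ADDITIONAL SPA (ONLY AVAILABLE W/PURCHASE OF POOL OR SPA)")
puts smart_capitalize("SELLER HEAT/AC/DUCTWORK")
```

Adding another exception then only means adding a word to the list. Note that a listed word is treated the same everywhere, so a sentence that starts with "of" would keep it lowercase.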

Jason Vogel

12/10/2006 7:37:00 AM

William,

This is exactly what I'm looking for. I don't understand it, but it's
what I'm looking for.

Would you mind explaining what your code does?

Thanks,
Jason




Paul Lutus

12/10/2006 8:22:00 AM

Jason Vogel wrote:

> William,
>
> This is exactly what I'm looking for. I don't understand it, but it's
> what I'm looking for.
>
> Would you mind explaining what your code does?

Here is the code the prior poster offered (and please do not top-post -- it
makes it hard to reconstruct the thread):

puts DATA.read.downcase.split( /([^a-z]+)/ ).map{ |s| specials[s] or
s.capitalize }.join

Here is the breakdown:

DATA.read.downcase

Means: "read the data, convert entirely to lowercase"

.split( /([^a-z]+)/ )

Means: "split the data on runs of non-alphabetic characters; because the
pattern sits inside a capture group, the separators (spaces, slashes,
parentheses, newlines) are kept in the array alongside the words"

.map{ |s| specials[s] or s.capitalize }

Means: "submit each element to a block of code that either finds it in the
predefined hash of special cases or, failing that, capitalizes it
(uppercases the first character); separator elements come out unchanged"

.join

Means: "join the array back into one string with nothing added between the
elements; the spaces and slashes are already in the array"

Finally, print it all using the very first command on the line -- "puts".
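
The role of the parentheses in the split pattern is easiest to see side by side. A small sketch with a made-up string:

```ruby
text = "heat/ac (only)"

# Without a capture group the separators are thrown away.
words = text.split(/[^a-z]+/)
p words

# With a capture group they stay in the array, between the words.
parts = text.split(/([^a-z]+)/)
p parts

# join with no argument simply concatenates, so the original spacing survives.
puts parts.join
```

Here words is ["heat", "ac", "only"], while parts is ["heat", "/", "ac", " (", "only", ")"], and joining parts gives back the original string.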

--
Paul Lutus
http://www.ara...

William James

12/10/2006 4:54:00 PM

Jason Vogel wrote:
> William,
>
> This is exactly what I'm looking for. I don't understand it, but it's
> what I'm looking for.
>
> Would you mind explaining what your code does?
>
> Thanks,
> Jason

It helps to inspect the data structures.

Try:

specials = %w( of or w AC ).
  inject({}){|h,s| h.update(s.downcase => s) }

p specials   # the lookup table of special spellings

text = DATA.read.downcase
p text.split( /([^a-z]+)/ )   # words plus the separators between them
puts text.split( /([^a-z]+)/ ).map{|s|
  specials[s] or s.capitalize }.join

__END__
ADDITIONAL SPA (ONLY AVAILABLE W/PURCHASE OF POOL OR SPA)
SELLER HEAT/AC/DUCTWORK