comp.lang.python

Best Way to extract Numbers from String

Jimbo

3/20/2010 3:04:00 AM

Hello

I am trying to grab some numbers from a string containing HTML text.
Can you suggest any good functions that I could use to do this? What
would be the easiest way to extract the following numbers from this
string...

My String has this layout & I have commented what I want to grab:
[CODE] """</th>
<td class="last">43.200 </td>
<td class="change indicator" nowrap>0.040 </td>

<td>43.150 </td> # I need to grab this number only
<td>43.200 </td>
<td>43.130 </td> # I need to grab this number only
<td>43.290 </td>
<td>43.100 </td> # I need to grab this number only
<td>7,450,447 </td>
<td class="middle"><a
href="/asx/markets/optionPrices.do?
by=underlyingCode&underlyingCode=BHP&expiryDate=&optionType=">Options</
a></td>
<td class="middle"><a
href="/asx/markets/warrantPrices.do?
by=underlyingAsxCode&underlyingCode=BHP">Warrants &amp; Structured
Products</a></td>
<td class="middle"><a
href="/asx/markets/cfdPrices.do?
by=underlyingAsxCode&underlyingCode=BHP">CFDs</a></td>
<td class="middle"><a href="http://hfgapps.hubb.com...
Charts.aspx?
TimeFrame=D6&amp;compare=comp_index&amp;indicies=XJO&amp;pma1=20&amp;pma2=20&amp;asxCode=BHP"><img
src="/images/chart.gif" border="0" height="15" width="15"></a>
</td>
<td><a href="/research/announcements/status_notes.htm#XD">XD</a>
</td>
<td><a href="/asx/statistics/announcements.do?
by=asxCode&asxCode=BHP&timeframe=D&period=W">Recent</a>
</td>
</tr>"""[/CODE]
4 Answers

Gabriel Genellina

3/20/2010 3:22:00 AM

On Sat, 20 Mar 2010 00:04:08 -0300, Jimbo <nilly16@yahoo.com> wrote:

> I am trying to grab some numbers from a string containing HTML text.
> Can you suggest any good functions that I could use to do this? What
> would be the easiest way to extract the following numbers from this
> string...
>
> My String has this layout & I have commented what I want to grab:
> [CODE] """</th>
> <td class="last">43.200 </td>
> <td class="change indicator" nowrap>0.040 </td>
>
> <td>43.150 </td> #
> I need to grab this number only
> <td>43.200 </td>
> <td>43.130 </td> #
> I need to grab this number only

I'd use BeautifulSoup [1] to handle badly formed HTML like that.

[1] http://www.crummy.com/software/Beau...
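
A minimal sketch of the parser-based route, offered as an illustration rather than as the poster's own code. To keep it free of third-party dependencies it uses the standard library's html.parser in place of BeautifulSoup. The class name PlainCellCollector, the shortened sample string, and the assumption that the wanted values are the 1st, 3rd and 5th attribute-less <td> cells are all hypothetical choices for this example.

```python
from html.parser import HTMLParser


class PlainCellCollector(HTMLParser):
    """Collect the text of every <td> that carries no attributes."""

    def __init__(self):
        super().__init__()
        self.cells = []
        self._in_plain_td = False

    def handle_starttag(self, tag, attrs):
        # <td class="last"> has attributes and is skipped; a bare <td> is kept.
        if tag == "td" and not attrs:
            self._in_plain_td = True
            self.cells.append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_plain_td = False

    def handle_data(self, data):
        if self._in_plain_td:
            self.cells[-1] += data


# Shortened stand-in for the HTML in the original post.
sample = """</th>
<td class="last">43.200 </td>
<td class="change indicator" nowrap>0.040 </td>
<td>43.150 </td>
<td>43.200 </td>
<td>43.130 </td>
<td>43.290 </td>
<td>43.100 </td>
<td>7,450,447 </td>
</tr>"""

parser = PlainCellCollector()
parser.feed(sample)
parser.close()

cells = [text.strip() for text in parser.cells]
# Assumption: the commented cells are the 1st, 3rd and 5th plain ones.
wanted = [cells[i] for i in (0, 2, 4)]
print(wanted)
```

Selecting by position is fragile if the page layout changes, so it is worth checking the cell count before indexing in real use.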

--
Gabriel Genellina

Luis M. González

3/20/2010 12:51:00 PM

You should use BeautifulSoup or perhaps regular expressions.
Or if you are not very smart, like me, just try a brute force approach:

>>> for i in s.split('>'):
...     for e in i.split():
...         if '.' in e and e[0].isdigit():
...             print(e)
...
43.200
0.040
43.150
43.200
43.130
43.290
43.100
>>>
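
The same brute-force idea can be wrapped in a function so the results can be reused instead of only printed. This is a rough sketch; the function name dotted_numbers and the short sample string are made up for illustration.

```python
def dotted_numbers(html):
    """Return whitespace-separated tokens that start with a digit and contain a dot."""
    found = []
    for chunk in html.split('>'):
        for token in chunk.split():
            if '.' in token and token[0].isdigit():
                found.append(token)
    return found


# Shortened stand-in for the HTML in the original post.
sample = '<td class="last">43.200 </td>\n<td>43.150 </td>\n<td>7,450,447 </td>'
print(dotted_numbers(sample))
```

Note that this keeps numbers from cells with a class attribute and skips 7,450,447, because that token has no dot in it.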

Jimbo

3/20/2010 10:40:00 PM


Thanks very much. I'm going to look at regular expressions, but thanks
for your code; it shows me how I can do it with standard Python :)

Novocastrian_Nomad

3/21/2010 3:38:00 AM

Regular expressions are very powerful, and I use them a lot in my
paying job (unfortunately not with Python). You are, however,
basically using a second programming language, which can be difficult
to master.

Does this give you the desired result?

import re

# `code` is the HTML string from the original post
matches = re.findall(r'<td>([\d.,]+)\s*</td>', code)
for match in matches:
    print(match)

resulting in this output:
43.150
43.200
43.130
43.290
43.100
7,450,447
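
To get from that list to just the three commented values as numbers, something like the following might do. It is only a sketch: the shortened sample string, and the assumption that the wanted cells are always the 1st, 3rd and 5th plain <td> matches, are mine and not from the original post.

```python
import re

# Shortened stand-in for the HTML in the original post.
sample = """<td class="last">43.200 </td>
<td>43.150 </td>
<td>43.200 </td>
<td>43.130 </td>
<td>43.290 </td>
<td>43.100 </td>
<td>7,450,447 </td>"""

# Same pattern as above: only attribute-less <td> cells holding digits, dots, commas.
cells = re.findall(r'<td>([\d.,]+)\s*</td>', sample)

# Assumption: the commented cells are the 1st, 3rd and 5th matches.
# Commas are stripped so values like 7,450,447 would also convert cleanly.
wanted = [float(cells[i].replace(',', '')) for i in (0, 2, 4)]
print(wanted)
```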