comp.lang.ruby
hpricot and regexp?
Feng Tien
5/14/2008 5:43:00 AM
I'm trying to grab the "cache date" from a Google search result,
using Mechanize (and its built-in Hpricot):
agent = WWW::Mechanize.new
agent.user_agent_alias = 'Mac Safari'
page = agent.get("http://www.google....")
search_form = page.forms.with.name("f").first
search_form.q = "Hello"
search_results = agent.submit(search_form)
cache_date = agent.click search_results.links.text('Cached')
date = cache_date.search('table table > td').inner_html
How do I grab the date, like on this page:
http://209.85.173.104/search?q=cache%3Ashacknews.com&ie=utf-8&oe=utf-8&aq=t&rls=org.mozilla:en-US:official&client...
I want the part that's right after "as retrieved on" (the date).
Is there a built-in Hpricot method that can search by regexp?
Or will I have to use something like gsub?
--
Posted via http://www.ruby-...
3 Answers
Feng Tien
5/14/2008 5:47:00 AM
Feng Tien wrote:
> Is there a built-in Hpricot method that can search by regexp?
> Or will I have to use something like gsub?
Oops, I mean grep.
Oh, I got it down to this:
date = cache_date.search('table table > td').inner_text.grep(/retrieved on (.+)./)
which outputs:
["This is G o o g l e's cache of http://www.... as retrieved on May 11, 2008 01:09:29 GMT.\n"]
How do I get rid of everything before the date?
Feng Tien
5/14/2008 6:02:00 AM
Feng Tien wrote:
> Oh, I got it down to this:
>
> date = cache_date.search('table table > td').inner_text.grep(/retrieved on (.+)./)
>
> which outputs:
> ["This is G o o g l e's cache of http://www.... as retrieved on May 11, 2008 01:09:29 GMT.\n"]
>
> How do I get rid of everything before the date?
Now I have this:
date = cache_date.search('table table > td').inner_text.grep(/retrieved on (.+)./).to_s.gsub(/.+as retrieved on /,"").gsub(/.\n/,"")
which gives me exactly what I need. Is there a better way of doing this?
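One possibly shorter route, offered as a sketch rather than the definitive answer: Ruby's String#[] accepts a regexp plus a capture-group index, so the grep/to_s/gsub chain can collapse into a single lookup. The sample string is copied from the output above; the helper name extract_cache_date is hypothetical, and the trailing period is escaped here so it only matches a literal ".".

```ruby
require 'time'

# Sample text copied from the output shown earlier in the thread.
SAMPLE_CACHE_TEXT = "This is G o o g l e's cache of http://www.... as retrieved on May 11, 2008 01:09:29 GMT.\n"

# Hypothetical helper: returns whatever follows "retrieved on" up to the
# last period on that line, or nil when the phrase is not present.
def extract_cache_date(text)
  # String#[] with (regexp, 1) returns the first capture group.
  text[/retrieved on (.+)\./, 1]
end

date_string = extract_cache_date(SAMPLE_CACHE_TEXT)
puts date_string   # prints: May 11, 2008 01:09:29 GMT

# Optional: turn the string into a Time object for comparisons or formatting.
cached_at = Time.parse(date_string)
puts cached_at.utc.year
```

With the Mechanize code above, the argument would presumably be cache_date.search('table table > td').inner_text instead of the sample constant.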
Phlip
5/14/2008 9:01:00 AM
Supplementing other answers...
> date = cache_date.search('table table > td').inner_text.grep(/retrieved on (.+)./)
p $1
Did the () capture it?
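To make the $1 idea concrete, here is a minimal sketch that matches with =~ directly, which is known to set $1 on success. Whether grep without a block leaves $1 populated is not confirmed in this thread, so the sketch avoids relying on it. The sample line is copied from the earlier output, and captured_date is a hypothetical helper name.

```ruby
# Sample line copied from the output shown earlier in the thread.
CACHE_LINE = "This is G o o g l e's cache of http://www.... as retrieved on May 11, 2008 01:09:29 GMT.\n"

# Hypothetical helper: returns the captured date text, or nil if no match.
def captured_date(line)
  # =~ returns the match position (truthy) and fills $1 with the first group.
  $1 if line =~ /retrieved on (.+)\./
end

p captured_date(CACHE_LINE)   # prints "May 11, 2008 01:09:29 GMT"
```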