comp.lang.python

Saving a page loaded using the webbrowser library?

Dr. Benjamin David Clarke

3/25/2010 7:41:00 AM

Does anyone know of a way to save a loaded web page to a file after
opening it with a webbrowser.open() call?

Specifically, what I want to do is get the raw HTML from a web page.
This web page uses Javascript. I need the resulting HTML after the
Javascript has been run. I've seen a lot about trying to get Python to
run Javascript but there doesn't seem to be any promising solution. I
can get the raw HTML that I want by saving the page after it has been
loaded via the webbrowser.open() call. Is there any way to automate
this? Does anyone have any ideas for better approaches to this
problem? I don't need it to be pretty or anything.

1 Answer

Irmen de Jong

3/25/2010 8:13:00 AM

On 3/25/10 8:41 AM, Dr. Benjamin David Clarke wrote:
> Does anyone know of a way to save a loaded web page to a file after
> opening it with a webbrowser.open() call?
>
> Specifically, what I want to do is get the raw HTML from a web page.
> This web page uses Javascript. I need the resulting HTML after the
> Javascript has been run. I've seen a lot about trying to get Python to
> run Javascript but there doesn't seem to be any promising solution. I
> can get the raw HTML that I want by saving the page after it has been
> loaded via the webbrowser.open() call. Is there any way to automate
> this? Does anyone have any ideas for better approaches to this
> problem? I don't need it to be pretty or anything.

I think I would use an appropriate GUI automation library to simulate
user interaction with the web browser that you just started, and e.g.
select the File > Save page as > HTML only menu option from the browser...

If the Javascript heavily modifies the DOM, however, that might not work:
"HTML only" may save the original source rather than the modified page.
In that case you might need additional tooling such as the Web Developer
Toolbar for Firefox, which offers View Source > View Generated Source.

irmen
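
As a possible alternative to driving the browser's menus, here is a minimal sketch that asks a headless Chrome/Chromium to print the DOM after the page's scripts have run, then writes it to a file. This is a different technique from the GUI automation suggested above, not something from the thread. It assumes a Chrome or Chromium build that supports the --headless and --dump-dom switches is installed and on PATH; the executable names in CANDIDATE_BROWSERS and the helper names (find_browser, build_dump_command, save_rendered_html) are made up for illustration. Content that scripts add well after the initial load may still be missing from the dump.

```python
import shutil
import subprocess
from pathlib import Path

# Assumption: one of these executable names exists on PATH; adjust as needed.
CANDIDATE_BROWSERS = ("google-chrome", "chromium", "chromium-browser", "chrome")


def find_browser(candidates=CANDIDATE_BROWSERS):
    """Return the full path of the first candidate found on PATH, else None."""
    for name in candidates:
        path = shutil.which(name)
        if path:
            return path
    return None


def build_dump_command(browser, url):
    """Build the argv list that makes headless Chrome print the DOM to stdout."""
    return [browser, "--headless", "--disable-gpu", "--dump-dom", url]


def save_rendered_html(url, out_file, timeout=60):
    """Write the DOM of `url`, as serialized by the browser, to `out_file`."""
    browser = find_browser()
    if browser is None:
        raise RuntimeError("no Chrome/Chromium executable found on PATH")
    result = subprocess.run(
        build_dump_command(browser, url),
        stdout=subprocess.PIPE,   # the serialized DOM arrives on stdout
        stderr=subprocess.PIPE,   # keep browser log noise out of the console
        timeout=timeout,
        check=True,               # raise CalledProcessError on a non-zero exit
    )
    # Write the raw bytes so the browser's own encoding is preserved.
    Path(out_file).write_bytes(result.stdout)
    return out_file
```

Usage would look like save_rendered_html("http://example.com/", "page.html"). Unlike webbrowser.open(), which only hands the URL to the default browser and returns nothing about the page, this runs the browser as a child process so its output can be captured.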