comp.lang.python

MediaWiki to RTF/Word/PDF

Josh English

2/17/2010 9:00:00 PM

I have several pages exported from a private MediaWiki that I need to
convert to a PDF document, or an RTF document, or even a Word
document.

So far, the only Python module I have found that parses MediaWiki
files is mwlib, which only runs on Unix, as far as I can tell. I'm
working on Windows here.

Has anyone heard of a module that parses wiki markup and transforms
it? Or am I looking at XSLT?

Josh
4 Answers

Diez B. Roggisch

2/17/2010 9:36:00 PM

Am 17.02.10 22:00, schrieb Josh English:
> I have several pages exported from a private MediaWiki that I need to
> convert to a PDF document, or an RTF document, or even a Word
> document.
>
> So far, the only Python module I have found that parses MediaWiki
> files is mwlib, which only runs on Unix, as far as I can tell. I'm
> working on Windows here.

I think you stand a chance of making it run under Windows using MinGW.
It might be a bit daunting, though.

Other than that, yep, XSLT is your friend.

Diez

John Bokma

2/17/2010 10:10:00 PM

Josh English <joshua.r.english@gmail.com> writes:

> I have several pages exported from a private MediaWiki that I need to
> convert to a PDF document, or an RTF document, or even a Word
> document.
>
> So far, the only Python module I have found that parses MediaWiki
> files is mwlib, which only runs on Unix, as far as I can tell. I'm
> working on Windows here.
>
> Has anyone heard of a module that parses wiki markup and transforms
> it? Or am I looking at XSLT?

One option might be to install a printer driver that prints to PDF and
just print the web pages.

Using Saxon or AltovaXML with a suitable stylesheet might give you the
nicest-looking result, though it would be quite some work.

--
John Bokma j3b

Hacking & Hiking in Mexico - http://john...
http://castle... - Perl & Python Development

Paul Rubin

2/18/2010 4:14:00 AM

Josh English <joshua.r.english@gmail.com> writes:
> Has anyone heard of a module that parses wiki markup and transforms
> it? Or am I looking at XSLT?

MediaWiki markup is quite messy, and unless MediaWiki has an XML export
feature that I don't know about, I don't see what good XSLT would do you.
(The regular MediaWiki API generates XML results with wiki markup
embedded in them.) It looks like MediaWiki itself can create PDFs (see
any page on en.wikibooks.org for example), but the rendered PDF is not
that attractive.
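
To illustrate the point about wiki markup being embedded in the XML, here
is a minimal sketch that pulls page titles and raw wikitext out of an
export file using only the standard library. The sample document, its
namespace version, and the helper names local_name and extract_pages are
assumptions made for the illustration; a real export may differ in
detail. Note that what comes out is still unparsed wiki markup, so this
step alone does not get you to RTF or PDF.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down export document. The namespace version is an
# assumption; the code below ignores namespaces so any version works.
SAMPLE_EXPORT = """<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.4/">
  <page>
    <title>Example page</title>
    <revision>
      <text>== Heading ==
Some '''bold''' text.</text>
    </revision>
  </page>
</mediawiki>"""


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def extract_pages(xml_text):
    """Return a list of (title, wikitext) pairs found in an export string."""
    root = ET.fromstring(xml_text)
    pages = []
    for page in root.iter():
        if local_name(page.tag) != "page":
            continue
        title = None
        text = None
        for node in page.iter():
            name = local_name(node.tag)
            if name == "title" and title is None:
                title = node.text
            elif name == "text":
                # If several revisions are present, the last one listed wins.
                text = node.text or ""
        pages.append((title, text))
    return pages


if __name__ == "__main__":
    for title, wikitext in extract_pages(SAMPLE_EXPORT):
        print(title)
        print(wikitext)
```

The wikitext strings returned here would still need a real wiki markup
parser (or a lot of hand-written rules) before they could be rendered.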

I remember something about a Haskell module to parse MediaWiki markup,
if that is of any use.

Snaky Love

2/18/2010 11:30:00 AM

Hi,

I looked into several ways of doing this, and starting over with
something new will give you a lot of headaches. All XSLT processors
have one problem or another, and success depends very much on how you
were using MediaWiki (plugins?). Expect a lot of poking around with
details, and you may still not be happy with the solutions available.
There are many really crazy approaches out there for generating PDF
from MediaWiki "markup": many tried, not many succeeded, and most of
them stop at the "good enough for me" level. So it might be a good
idea not to stray too far from what they are doing at
http://code.pedi... - you will spend much less time installing Ubuntu
in a VirtualBox.

However, there is one quite impressive tool that does PDF conversion
via CSS and is good for getting the job done quickly and not too
dirtily: http://www.princexml.co... - scroll down to the MediaWiki
examples. They offer a free license for non-commercial projects.

Good luck!

Have a nice day,
Snaky