How can I convert the Wikipedia?
eliotropo


Joined: 12 Jul 2007
Posts: 3
Hi, I've seen that an offline Wikipedia is available for Mobipocket Reader in German and English. It would be great if I could convert any Wikipedia (I mean, in any language) to Mobipocket Reader format.

You can download the "original" Wikipedia in two formats: HTML and XML.

I've tried to convert it with the Mobipocket converter, but it doesn't work. When I try the HTML version, it only converts the index.html file and ignores all the other indexed articles. With the XML format, the software simply can't open it :(

So please, I really want to have an encyclopedia on my device. Could somebody help me? I know it is possible, because it exists in other languages.

Thanks in advance.
robert_marquardt


Joined: 09 Feb 2006
Posts: 339
Please search the forum. The creator of the German Wikipedia in Mobipocket format asked for help here. It will give you an impression of how hard it is.
mobi_fabien


Joined: 08 Feb 2006
Posts: 2682
Location: Mobipocket
Hello,

Converting Wikipedia is indeed very complex, for a number of reasons. The HTML from Wikipedia has to be adapted, and if you want to make a searchable dictionary, you have to add Mobipocket-specific tags. Mobipocket Creator alone will not be enough; you have to write scripts, for instance to process the content.

http://www.mobipocket.com/forum/viewtopic.php?p=3737#3737
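To illustrate the "Mobipocket-specific tags" part, here is a minimal Python sketch that wraps one already-cleaned article as a searchable dictionary entry. The tag layout (`mbp:pagebreak`, `idx:entry`, `idx:orth`, the `external="yes"` anchor) is copied from the data file frank1212 posts later in this thread; the helper name `make_entry`, the default index name `wpfb` and the sample title are assumptions for illustration, not part of any Mobipocket tool.

```python
from html import escape


def make_entry(title: str, anchor: str, body_html: str, index_name: str = "wpfb") -> str:
    """Wrap one article as a Mobipocket dictionary entry (hypothetical helper).

    The markup mirrors the sample data file posted in this thread. body_html is
    assumed to be well-formed XHTML that has already been cleaned up, and anchor
    is assumed to contain only ASCII letters, digits and underscores.
    """
    headword = escape(title, quote=False)  # escape &, < and > in the title text
    return (
        "<mbp:pagebreak/>\n"
        f'<idx:entry name="{index_name}"><idx:orth>{headword}</idx:orth>\n'
        f'<a name="{anchor}" external="yes" />\n'
        f"<h1>{headword}</h1>\n"
        f"{body_html}\n"
        "</idx:entry>\n"
    )


if __name__ == "__main__":
    print(make_entry("Q & A", "wp_Q_A", "<p>Example text.</p>"))
```

Escaping the headword matters because article titles such as "AT&T" would otherwise produce invalid markup inside `idx:orth`.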

Best regards,

Fabien

_________________
Symbian eBook Reader
eBook publishing tools
robert_marquardt


Joined: 09 Feb 2006
Posts: 339
Not to mention that it takes several computers running for several days just to create the 500 MB German Wikipedia.
frank1212


Joined: 16 Jul 2007
Posts: 21
Hi,

I've written scripts that take static Wikipedia HTML dumps, remove everything unneeded (menus, footers, you name it...) and create multiple data files, which are then fed to mobigen.exe.
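The stripping step can be sketched by cutting each dump page down to the region between two content markers and dropping any `<script>` blocks. This Python sketch assumes the pages were rendered with MediaWiki's MonoBook skin, which brackets the article text with `<!-- start content -->` and `<!-- end content -->`; check a page of your own dump before relying on that, and note that `extract_article` is a hypothetical name.

```python
import re
from typing import Optional

# Assumption: the static dump uses MediaWiki's MonoBook skin, which marks the
# article text with these two HTML comments. Check a page of your dump first.
START_MARK = "<!-- start content -->"
END_MARK = "<!-- end content -->"

# Non-greedy match so each <script> block is removed separately.
SCRIPT_RE = re.compile(r"<script\b.*?</script>", re.IGNORECASE | re.DOTALL)


def extract_article(page_html: str) -> Optional[str]:
    """Return only the article body (no menus, footers or scripts).

    Returns None when a marker is missing, so such pages can be logged and
    skipped instead of silently producing junk entries.
    """
    start = page_html.find(START_MARK)
    if start == -1:
        return None
    start += len(START_MARK)
    end = page_html.find(END_MARK, start)
    if end == -1:
        return None
    return SCRIPT_RE.sub("", page_html[start:end]).strip()
```

Everything outside the markers (navigation, language links, footer) is discarded in one step; finer cleanup such as removing edit links would happen on the returned string.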

A single datafile looks like this:

Code:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="de" lang="de" dir="ltr">
<head>
        <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
</head>
<body>
<mbp:pagebreak/>
<idx:entry name="wpfb"><idx:orth>Some article title</idx:orth>
<a name="abc" external="yes" />
<h1>Some article title</h1>

<!-- lots of HTML... -->
<a onclick="window.open('subdoc.mobi#xyz')">some intra-wikipedia-but-inter-file-link</a>
<a onclick="window.open('#foobar')">some intra-file-link</a>

</idx:entry>

<mbp:pagebreak/>
<idx:entry name="wpfb"><idx:orth>Some other article title</idx:orth>
<a name="foobar" external="yes" />
<!-- [...] -->
</idx:entry>

</body>
</html>
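The wrapper around the entries is identical for every data file, so a script only needs to put its entries between a fixed header and footer. A minimal Python sketch, assuming each entry is already a ready-made string in the `<mbp:pagebreak/><idx:entry>...</idx:entry>` form of the sample; `build_datafile` is a hypothetical name and the header text is copied from the sample.

```python
from typing import Iterable

# Header copied from the sample data file; lang is filled in per Wikipedia.
HEADER = (
    '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" '
    '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">\n'
    '<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="{lang}" lang="{lang}" dir="ltr">\n'
    "<head>\n"
    '<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />\n'
    "</head>\n"
    "<body>\n"
)
FOOTER = "</body>\n</html>\n"


def build_datafile(entries: Iterable[str], lang: str = "de") -> str:
    """Join ready-made <idx:entry> blocks into one complete data file."""
    return HEADER.format(lang=lang) + "".join(entries) + FOOTER
```

Typical use would be `Path("wpfb_0.html").write_text(build_datafile(entries, lang="es"), encoding="utf-8")`; writing explicitly as UTF-8 keeps the file consistent with the `charset=UTF-8` declaration in the header.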


A single data file's OPF file looks like this:

Code:

<?xml version="1.0" encoding="utf-8"?>
<package unique-identifier="wpfb_0">
        <metadata>
                <dc-metadata xmlns:dc="http://purl.org/metadata/dublin_core"
                        xmlns:oebpackage="http://openebook.org/namespaces/oeb-package/1.0/">
                        <dc:Identifier id="uid">WPFB_0</dc:Identifier>
                        <dc:Title>WPFB_0</dc:Title>
                        <dc:Creator>FB</dc:Creator>
                        <dc:Date>18/7/2007</dc:Date>
                        <dc:Copyrights>GNU FDL</dc:Copyrights>
                        <dc:Publisher></dc:Publisher>
                        <dc:Subject></dc:Subject>
                <dc:Language>de</dc:Language></dc-metadata>
                <x-metadata/>
        </metadata>

        <manifest>
                <item id="wpfb_0" href="wpfb_0.html" media-type="text/x-oeb1-document" />
        </manifest>
        <tours>
        </tours>
        <spine>
        </spine>
        <guide>
                <reference type="start" title="Start" href="index_search('wpfb')" />
                <reference type="toc" title="Find article" href="index_search('wpfb')" />
        </guide>
</package>
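Since the OPF files differ only in the file number, they can be generated instead of edited by hand. A Python sketch using the standard library's `xml.etree.ElementTree`; it reproduces the fields of the sample above, and the parameter defaults (prefix `wpfb`, creator `FB`, the date) are simply the values from that sample, not requirements of mobigen.exe. `build_opf` is a hypothetical name.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/metadata/dublin_core"
OEB_NS = "http://openebook.org/namespaces/oeb-package/1.0/"


def build_opf(n: int, lang: str = "de", date: str = "18/7/2007",
              prefix: str = "wpfb", index_name: str = "wpfb") -> str:
    """Return the OPF for data file <prefix>_<n>.html, mirroring the sample."""
    uid = f"{prefix}_{n}"
    package = ET.Element("package", {"unique-identifier": uid})
    metadata = ET.SubElement(package, "metadata")
    # Namespace declarations are written as plain attributes, as in the sample.
    dc_meta = ET.SubElement(metadata, "dc-metadata",
                            {"xmlns:dc": DC_NS, "xmlns:oebpackage": OEB_NS})
    fields = [("Identifier", uid.upper()), ("Title", uid.upper()), ("Creator", "FB"),
              ("Date", date), ("Copyrights", "GNU FDL"), ("Publisher", ""),
              ("Subject", ""), ("Language", lang)]
    for name, text in fields:
        element = ET.SubElement(dc_meta, f"dc:{name}")
        element.text = text
        if name == "Identifier":
            element.set("id", "uid")
    ET.SubElement(metadata, "x-metadata")

    manifest = ET.SubElement(package, "manifest")
    ET.SubElement(manifest, "item", {"id": uid, "href": f"{uid}.html",
                                     "media-type": "text/x-oeb1-document"})
    ET.SubElement(package, "tours")
    ET.SubElement(package, "spine")

    guide = ET.SubElement(package, "guide")
    search = f"index_search('{index_name}')"
    ET.SubElement(guide, "reference", {"type": "start", "title": "Start", "href": search})
    ET.SubElement(guide, "reference", {"type": "toc", "title": "Find article", "href": search})

    return ('<?xml version="1.0" encoding="utf-8"?>\n'
            + ET.tostring(package, encoding="unicode"))
```

ElementTree writes empty elements in the short form (`<dc:Publisher />`), which is equivalent XML to the `<dc:Publisher></dc:Publisher>` of the sample, and it takes care of escaping any `&` or `<` in the metadata values.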


Using mobigen.exe under Linux works like a charm, thanks to Wine :)


However, there are some problems:

When I click on a link (Mobipocket Reader on Symbian Series 80, Nokia 9300i), it says: Could not open D%3A%5Cebooks%5Csubdocs#wp_XRCO=20Award
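Decoding the link target in that message helps narrow the problem down. This Python sketch uses `urllib.parse.unquote` and `quopri` from the standard library: the path part becomes `D:\ebooks\subdocs` (with no `.mobi` extension), and the anchor reads `wp_XRCO Award` once `=20` is treated as an encoded space. That `=20` comes from quoted-printable-style encoding is an assumption; either way, it suggests keeping file and anchor names to plain ASCII letters, digits and underscores. `decode_reader_error` is a hypothetical helper name.

```python
import quopri
from typing import Tuple
from urllib.parse import unquote


def decode_reader_error(target: str) -> Tuple[str, str]:
    """Split the link target from the error message into (file path, anchor)."""
    # %3A is ':' and %5C is '\', so the first part is a Windows-style path.
    path, _, anchor = unquote(target).partition("#")
    # Assumption: "=20" is quoted-printable for a space; quopri undoes that.
    anchor = quopri.decodestring(anchor.encode("utf-8")).decode("utf-8")
    return path, anchor


print(decode_reader_error("D%3A%5Cebooks%5Csubdocs#wp_XRCO=20Award"))
# ('D:\\ebooks\\subdocs', 'wp_XRCO Award')
```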

I don't know how to access the index in the Reader, although I added guide/reference tags.

The guy who built the previous Wikipedia mentioned his idea of making a data file just for the index: put one inter-file link per Wikipedia article into this data file. Unfortunately, I have no idea how to define such an automatic redirect. Could someone post a short example?
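An index-only data file of that kind could be generated with one headword per article whose only content is a link into the subdoc holding the article. This Python sketch copies the `window.open('file#anchor')` link form from the sample data file above; whether the Reader actually follows such links between books is exactly the open question in this thread, so treat it as a starting point. The helper name, the `wpmp_3.mobi` file name and the anchor value are illustrative assumptions.

```python
import re
from html import escape

SAFE_NAME = re.compile(r"[A-Za-z0-9_.]+")


def make_index_entry(title: str, subdoc: str, anchor: str, index_name: str = "wpfb") -> str:
    """Build one headword for the index-only book that links into a subdoc.

    The onclick/window.open() form is copied from the data file in this thread;
    whether the Reader follows such links between books is not established.
    """
    # Keep file and anchor names plain: an apostrophe would end the JavaScript
    # string early, and spaces turned into "=20" in the Reader's error message.
    if not SAFE_NAME.fullmatch(subdoc) or not SAFE_NAME.fullmatch(anchor):
        raise ValueError(f"unsafe link target: {subdoc}#{anchor}")
    headword = escape(title, quote=False)
    return (
        "<mbp:pagebreak/>\n"
        f'<idx:entry name="{index_name}"><idx:orth>{headword}</idx:orth>\n'
        f"<a onclick=\"window.open('{subdoc}#{anchor}')\">{headword}</a>\n"
        "</idx:entry>\n"
    )
```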

What's the difference between *.mobi (the ones generated by mobigen.exe) and *.prc files?

Thank you for any help,

Frank
mobi_fabien


Joined: 08 Feb 2006
Posts: 2682
Location: Mobipocket
Hello,

Do you mean that you have created one subdoc per Wikipedia article?

Quote:
When I click on a link (MP reader on Symbian Series80, Nokia 9300i), it says Could not open D%3A%5Cebooks%5Csubdocs#wp_XRCO=20Award.

Does it work in the Reader Desktop?
I don't understand that part: subdocs#wp_XRCO=20Award
How do you name subdocs?

Quote:
What's the difference between *.mobi (the ones generated by mobigen.exe) and *.prc files?

None, just the extension.
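This can be checked on the files themselves: Mobipocket books are Palm database (PDB) files, and what identifies them is the type/creator pair `BOOK`/`MOBI` stored at byte offset 60 of the header (after the 32-byte database name and 28 bytes of other header fields), not the file extension. A short Python sketch; the function names are made up for illustration, and older PalmDOC files (`TEXtREAd`) that the Reader may also open would return False here.

```python
from pathlib import Path

TYPE_CREATOR_OFFSET = 60  # PalmDB header: 32-byte name + 28 bytes of fields


def has_mobi_signature(header: bytes) -> bool:
    """True if a PalmDB header declares type 'BOOK' and creator 'MOBI'."""
    return header[TYPE_CREATOR_OFFSET:TYPE_CREATOR_OFFSET + 8] == b"BOOKMOBI"


def is_mobipocket_file(path: Path) -> bool:
    """Check the header only; the extension (.mobi or .prc) is ignored."""
    with path.open("rb") as f:
        return has_mobi_signature(f.read(TYPE_CREATOR_OFFSET + 8))
```

Renaming `wpfb_0.mobi` to `wpfb_0.prc` leaves these bytes untouched, so `is_mobipocket_file` gives the same answer for both names.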

Best regards,

Fabien

frank1212


Joined: 16 Jul 2007
Posts: 21
mobi_fabien wrote:
Hello,

do you mean that you have create one subdoc per Wikipedia article?

Quote:
When I click on a link (MP reader on Symbian Series80, Nokia 9300i), it says Could not open D%3A%5Cebooks%5Csubdocs#wp_XRCO=20Award.

Does it work in the Reader Desktop?


I have no idea. The desktop reader I got working under Linux (4.9) is unable to open .mobi files. Mobipocket Reader 6.0 is provided as an MSI package only, which I'm unable to install using Wine. Is there a chance to get an .exe installer or an archive containing a preinstalled version?

Quote:

I don't understand that part: subdocs#wp_XRCO=20Award
How do you name subdocs?


wpmp_0.mobi, wpmp_1.mobi, ...

The question is: how should I name them, and what does an inter-file link have to look like?
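Whatever naming scheme is chosen, every link has to use the same (file, anchor) pair that the target article was written with, so it helps to compute a lookup table once before writing any data file. A Python sketch under these assumptions: subdocs are named `wpmp_0.mobi`, `wpmp_1.mobi`, ... as above, links use the `window.open('file#anchor')` form from the sample data file, and anchor names are restricted to ASCII letters, digits and underscores (a precaution suggested by the `=20` in the error message, not a documented Reader requirement). All function names are hypothetical.

```python
from html import escape
from typing import Dict, List, Tuple


def anchor_for(title: str, prefix: str = "wp_") -> str:
    """ASCII-only anchor name: letters/digits kept, every other byte as _XX.

    '_' itself is encoded too, so two different titles never share an anchor.
    """
    parts = []
    for ch in title:
        if ch.isascii() and ch.isalnum():
            parts.append(ch)
        else:
            parts.extend(f"_{b:02X}" for b in ch.encode("utf-8"))
    return prefix + "".join(parts)


def build_link_table(files: Dict[str, List[str]]) -> Dict[str, Tuple[str, str]]:
    """Map each article title to (subdoc file name, anchor name)."""
    return {title: (name, anchor_for(title))
            for name, titles in files.items() for title in titles}


def link_html(title: str, current_file: str, table: Dict[str, Tuple[str, str]]) -> str:
    """Intra-file links use '#anchor'; links to other subdocs prepend the file."""
    name, anchor = table[title]
    target = f"#{anchor}" if name == current_file else f"{name}#{anchor}"
    return f"<a onclick=\"window.open('{target}')\">{escape(title, quote=False)}</a>"
```

Because the anchor is derived from the title alone, the data file that defines `<a name="..." external="yes" />` and every file linking to it agree without sharing any other state.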

Regards,

Frank
Merging the indices
frank1212


Joined: 16 Jul 2007
Posts: 21
I successfully distributed all the Wikipedia articles across multiple "small" (~60 MB) HTML files, created one .opf file per HTML file and compiled them. The result was a set of .mobi files.
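Distributing articles into files of roughly 60 MB can be done greedily: keep appending entries until the next one would push the file past the limit, then start a new file. A Python sketch; it counts only the UTF-8 bytes of the entries (not the few hundred bytes of header and footer), and the 60 MB default is simply the size used here, not a known mobigen.exe limit. `chunk_by_size` is a hypothetical name.

```python
from typing import Iterable, Iterator, List


def chunk_by_size(entries: Iterable[str], max_bytes: int = 60 * 1024 * 1024) -> Iterator[List[str]]:
    """Group entries into lists whose UTF-8 size stays within max_bytes.

    An entry that is larger than max_bytes on its own still gets its own chunk.
    """
    chunk: List[str] = []
    size = 0
    for entry in entries:
        n = len(entry.encode("utf-8"))
        if chunk and size + n > max_bytes:
            yield chunk
            chunk, size = [], 0
        chunk.append(entry)
        size += n
    if chunk:
        yield chunk
```

Combined with `enumerate`, each chunk can be written as `wpfb_0.html`, `wpfb_1.html`, ... so the file numbers line up with the generated OPF files. Counting bytes rather than characters matters for non-Latin Wikipedias, where one character often takes two or three bytes.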

I created a master .opf file, added the .opf files of the "small" HTML files as "mbp-special-child" manifest items, and expected the indexes of the already compiled .mobi files to be extracted and written to a new .mobi file.

This is how the previous Wikipedia was compiled. However, it doesn't work for me :(

Does anyone have an idea what the problem might be?

I put a "small" Wikipedia (just the articles beginning with X and Q) on my FTP server:

ftp://instantafs.cbs.mpg.de/wp

It has to be compiled this way:

Code:

mobigen.exe wpfb_0.opf
mobigen.exe wpfb_li.opf
mobigen.exe wpfb.opf


The last opf should generate the master index.
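Driving these runs from a script keeps the order fixed (the listed part OPFs first, the master OPF last) and stops at the first failure. A Python sketch, assuming Wine and `mobigen.exe` are set up as mentioned earlier in the thread and that `mobigen.exe` sits in the working directory; whether mobigen.exe signals errors through a non-zero exit status is an assumption, so its console output is worth checking too. Both function names are hypothetical.

```python
import subprocess
from pathlib import Path
from typing import List, Sequence


def mobigen_commands(part_opfs: Sequence[str], master_opf: str,
                     use_wine: bool = True) -> List[List[str]]:
    """Build the command lines in the order given above: parts first, master last."""
    prefix = ["wine"] if use_wine else []
    return [prefix + ["mobigen.exe", opf] for opf in [*part_opfs, master_opf]]


def run_all(commands: List[List[str]], workdir: Path) -> None:
    """Run each command in workdir; check=True raises on a non-zero exit status."""
    for cmd in commands:
        subprocess.run(cmd, cwd=workdir, check=True)


if __name__ == "__main__":
    cmds = mobigen_commands(["wpfb_0.opf", "wpfb_li.opf"], "wpfb.opf")
    for c in cmds:
        print(" ".join(c))
    # run_all(cmds, Path("."))  # uncomment to actually compile
```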

Regards,

Frank
Content © www.mobipocket.com
