Updated: 2008-02-15

How to make dictionaries and indexes


Contents
Introduction
Indexes and dictionaries
  Reference list of <idx> tags
  Inflections for dictionaries
  What is the format string? requires Reader 5
How to open the Index search in the Reader
  Examples
  Index search functions
  Common parameters for index functions
Custom OPF metadata for dictionaries
Samples
Testing
  With the Emulator
  Testing on a PDA device

1. Introduction

The Mobipocket Index publishing tools let you produce eBooks that include alphabetical index search capabilities, as well as dictionaries that can be used in lookup functions. A dictionary is an eBook .prc file (see the documentation of the Mobipocket Publisher software for a definition of the .prc file format). Like any other eBook file, the dictionary eBook:

  1. Contains all the rich formatting of an OEB publication: HTML 3.2 formatting, images, tables, hyperlinks, style sheets, a table of contents, etc.
  2. Is a cross-platform file: the same dictionary .prc file can be used on any PalmOS, Windows CE/PocketPC, Franklin eBookman, Epoc32/Nokia9210 device, as well as on PC Windows.
  3. Can be distributed through the secure Mobipocket DRM (Digital Rights Management) system, which protects copyright against illegal duplication and modification of the content.

In addition, index lookup functions enable a quick search for any word in the dictionary.

2. Indexes and dictionaries

The publishing tool builds indexes into an eBook .prc file based on the entries that are marked up in the OEB source with a set of <idx> XML tags. One or more indexes can be built into the eBook. Production of the OEB source is outside the scope of the Mobipocket publishing tools: the data is generally exported from a database (Access, SQL, XML, ...) and written into the OEB/HTML file by a separate program.

N.B.: After the <idx> markup is added, the source is still a valid XHTML publication.

2.1 Reference list of <idx> tags

<idx:entry>..</idx:entry>

Marks the scope of an entry in the index

<idx:entry name="xxx"> :

Use the name attribute to identify an index when there is more than one index in the eBook.

<idx:orth>Label of entry in Index</idx:orth>

Marks the text that will appear in the index search box for that entry.

Note: the label of the entry is limited to 127 characters in the index search view. If longer than 127 characters, the full text will be visible in the flow of the book but only the first 127 characters will be used in the index search.

<idx:orth value="Label of entry in Index"/> :

Use the value attribute to include text for the label of the entry that you do not want to display in the OEB flow.

<idx:orth format="some format string"/> :

Use the format attribute to specify the format that should be applied to the label of the entry. The formatted text will appear in the index search box for this entry. Requires Reader 5. See section 2.3 for more details about the format string.

<idx:key name="xx">..</idx:key>

Lets you search for an entry in the index by an alternative key. You can specify one or more alternative keys. Use the name attribute to distinguish between key searches.

Example in an address book: you can search for an entry by the name of the person and, as an alternative, by company or by city. In the index search box you first enter the company name; selecting a company then opens a second window listing the people who belong to that company.

<idx:entry>
<idx:orth>John Martin</idx:orth>
Company : <idx:key name="company">Mobipocket</idx:key>
City : <idx:key name="city">Seattle</idx:key>
Phone number : 01010101
</idx:entry>

<idx:key key="xxx"> :

Use the key attribute to include text for the alternative key that you do not want to display in the OEB flow.

<idx:key each-word="true"> :

Use the each-word attribute if you want to include every single word of the labeled text as an entry in the alternative key search.

<idx:key scriptable="yes"> :

The only possible value for the scriptable attribute is "yes". Set this attribute to specify that this key should be accessible from the index. Requires Reader 5.

  • <idx:short> : used in dictionaries to mark the scope of the text that will be displayed in the popup window when selecting the word entry within any ebook
  • <idx:gramgrp infl="xxx"> : used in dictionaries to indicate the list of inflections attached to a grammatical group

See the samples included in the SDK for examples on the use of these tags.

Note that the TEI tags used in Microsoft Reader dictionary publications are also supported : <tei-ms:entry> , <tei-ms:orth>, etc...

<idx:key> and <idx:orth> tags also support the style and the indent attributes:

  • style attribute: possible values are bold, italic, underlined and inactive. Values can be combined in a comma-separated list. This attribute specifies how a keyword will be displayed in the index search mode of the reader.
  • indent attribute: the only possible value is "1". If set, the entry will be displayed indented in the index search. Also, the part of text of the entry that is the same as the text of the first non-indented preceding (in alphabetical order) entry will not be displayed.

Example: Here is a sample definition:

<idx:entry>
   <idx:orth style="bold, inactive">Cleopatra</idx:orth>
   <idx:orth style="italic" indent="1">Cleopatra, the life of the queen of queens</idx:orth>
   <idx:orth style="bold" indent="1">Cleopatra, everything about her nose</idx:orth>
   <idx:orth indent="1">Cleopatra, greatest achievements</idx:orth>
</idx:entry>

And here is the way it will be displayed in the index search mode of the reader:

Cleopatra
     the life of the queen of queens
     everything about her nose
     greatest achievements
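
The display rule for indented entries can be approximated in a few lines of standalone JavaScript (this is not Reader code). The function name displayIndentedLabel is hypothetical, and dropping the separator characters left after the shared text is an assumption drawn from the Cleopatra example, not a documented rule.

```javascript
// Hypothetical helper: approximates how an indented label is shown
// relative to the preceding non-indented label.
function displayIndentedLabel(parentLabel, label) {
  // Length of the leading text shared with the non-indented label.
  let i = 0;
  while (i < parentLabel.length && i < label.length && parentLabel[i] === label[i]) {
    i++;
  }
  // Assumption: separators left at the cut (comma, spaces, ...) are dropped too.
  const rest = label.slice(i).replace(/^[\s,;:]+/, "");
  // Five spaces stand in for the visual indentation.
  return "     " + rest;
}

console.log(displayIndentedLabel("Cleopatra", "Cleopatra, greatest achievements"));
```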

<idx:string name= "xxx" value="xxx" />

Defines a non-searchable field which contains string data in this index. Use the name attribute to specify the name of the field and the value attribute to specify its content. Requires Reader 4.8.

<idx:string name="email" value="John@mail.com" />
<idx:string name="email" value="MJohn@mail.com" />

Defines a multi-valued field "email" which contains two values for the current entry

The main purpose of <idx:string/> is to make the content of this field accessible in javascript functions.

Example :

...
<idx:entry name="contact">
<idx:orth>John Martin</idx:orth>
<idx:string name="email" value="John@mail.com" />
<idx:string name="email" value="MJohn@mail.com" />
</idx:entry>
...

In some JavaScript function:
var WordEntry = current_index_entry('contact');
var Email = WordEntry.email;
var Emails = WordEntry.Fields('email');
var count = Emails.Count;

After a call to this JavaScript function:
Email = John@mail.com (i.e. the first value of the field "email" in the current entry)
Emails[0] = John@mail.com
Emails[1] = MJohn@mail.com
count = 2 (i.e. the number of values in the field "email")

<idx:subentry name= ""/>

Defines a part of the entry. The main purpose of this tag is to make a part of the OEB flow easily accessible through a link on the displayed page: the name of the subentry acts like an anchor. Requires Reader 4.8.

Example:

...
<idx:entry name="contact">
<idx:orth>John Martin</idx:orth>
<idx:subentry name= "more_details"/>
...
In some JavaScript function:
var WordEntry = current_index_entry('contact');
window.open(WordEntry.more_details.anchor);

This javascript function will jump to the position of <idx:subentry name = "more_details"/> tag in the current entry and the corresponding page will be displayed.

<idx:entry id="id1">
   <idx:orth ></idx:orth>
</idx:entry>
<idx:ext-subentry name= "residence" id="id1"/>

Defines a subentry that is linked to the main entry but whose OEB flow is not part of the main entry's flow. The id attribute of the <idx:ext-subentry> tag must have the same value as the id attribute of the <idx:entry> tag.
In JavaScript functions, external subentries can be accessed exactly the same way as subentries.
Requires Reader 4.8.

See the sample dictionary included in the SDK for examples on the use of these tags.

2.2 Inflections for dictionaries

Inflections are handled by the inflection index, which the Creator software builds into the dictionary from the inflected forms tagged in the content with the <idx:infl> tag. Inflections are attached to the orthography of the entry: they must be specified inside an <idx:orth> tag. If an entry has multiple orthographies, each must have its own inflections.

Example:

<idx:orth>record
  <idx:infl inflgrp="noun">
    <idx:iform name="plural" value="records" />
  </idx:infl>
  <idx:infl inflgrp="verb">
    <idx:iform name="present participle" value="recording" />
    <idx:iform name="past participle" value="recorded" />
    <idx:iform name="present 3ps" value="records" />
  </idx:infl>
</idx:orth>

The "inflgrp" and "name" attributes are optional. The "idx:infl" and "idx:iform" tags and the "value" attribute are mandatory.

The Creator software builds the inflection index with an algorithm that greatly reduces the size required for the index: inflections are not stored as entries in the index but are deduced from a set of rules, which are generated automatically from the inflected forms contained in the publication. This applies to any language.

When reading an eBook with the Mobipocket Reader (version 4.3 onwards), selecting any word in the text brings up a popup menu that lets you search for the definition of the selected word in any dictionary available on the PDA. Selecting an inflected form brings up the base form.
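
To illustrate what "selecting an inflected form brings up the base form" means, here is a minimal standalone JavaScript sketch (not Reader or Creator code). It uses a plain lookup table rather than the rule-based index described above, which is a deliberate simplification. The entries structure, the function names and the lowercase normalization are all assumptions made for the illustration.

```javascript
// Hypothetical in-memory mirror of the <idx:orth>/<idx:infl>/<idx:iform> markup.
const entries = [
  {
    orth: "record",
    infl: [
      { inflgrp: "noun", iforms: [{ name: "plural", value: "records" }] },
      {
        inflgrp: "verb",
        iforms: [
          { name: "present participle", value: "recording" },
          { name: "past participle", value: "recorded" },
          { name: "present 3ps", value: "records" }
        ]
      }
    ]
  }
];

// Builds a table: inflected form -> set of base forms (headwords).
function buildInflectionMap(entryList) {
  const map = new Map();
  for (const entry of entryList) {
    for (const group of entry.infl) {
      for (const iform of group.iforms) {
        const key = iform.value.toLowerCase(); // assumption: case-insensitive lookup
        if (!map.has(key)) map.set(key, new Set());
        map.get(key).add(entry.orth);
      }
    }
  }
  return map;
}

// Returns the base forms for a selected word, or an empty array if none is known.
function baseForms(map, word) {
  const hit = map.get(word.toLowerCase());
  return hit ? Array.from(hit) : [];
}

const inflectionMap = buildInflectionMap(entries);
console.log(baseForms(inflectionMap, "recorded"));
```

Because "records" appears in both the noun and the verb group, the table stores it once per headword, so the lookup still returns a single base form.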

Previous versions of the file format supported another way of specifying inflected forms. You could use the "infl" attribute in either the <idx:orth> or the <idx:gramgrp> tag and specify a comma-separated list of inflected forms. This syntax is now deprecated.

2.3 What is the format string? requires Reader 5

NB : Formatting can be applied only on named indexes.

During an alphabetical search in Mobipocket Reader, entries of an index are sorted according to data in the field defined by "<idx:orth>" tags. However the text that will appear in the results list of the index search is defined by the attribute "format".

  • If no format has been specified, the text that will be displayed is the content of the "<idx:orth>" field.
  • If a format string has been specified, the text that will be displayed follows the rules of the format string.

The rules of the format string are as follows:

  1. Any substring that contains exclusively alphanumeric characters ( [a-z][A-Z] ) and the "_" character is an identifier. In the results list of the index search, all identifiers will be replaced by their value or their first value if they refer to a multi-valued field in the index.
    Identifiers can be "orth", which refers to the content of the "<idx:orth>" field, or the name of any other field defined with an "<idx:string>" tag or an "<idx:key>" tag, provided the "scriptable" attribute of "<idx:key>" is set to "yes".

    A sample of tagged HTML

    <idx:entry name="furniture" scriptable="yes">
       <idx:orth format="orth - color - location">chair</idx:orth>
       <idx:string value="brown" name="color" />
       <idx:string value="black" name="color" />
       <idx:key value="office" name="location" scriptable="yes"/>
    </idx:entry>

    The text that will be displayed in the index search page for this entry :

    chair - brown - office


  2. Any string placed between single quotes is treated as a raw string and is displayed as is in the index search results list.

    A sample of tagged HTML

    <idx:entry name="furniture" scriptable="yes">
     <idx:orth format="orth - 'exists in' color">chair</idx:orth>
       <idx:string value="brown" name="color" />
    </idx:entry>

    The text that will be displayed in the index search page for this entry :

    chair - exists in brown


  3. Two single quotes will be displayed as a single quote.

NB: Rules no. 2 and no. 3 imply that the format string must contain an even number of single quotes.

  4. The character "|" placed just after an identifier marks the beginning of a separator string, which ends at the next "|". In the index search results list, the identifier is replaced by all the values of the field it refers to, separated by the separator string.
    NB: Only the first "<idx:orth>" of an entry is accessible for the moment.
    NB: The separator string follows rules no. 2 and no. 3.
    NB: The separator string must be terminated, otherwise prcgen reports an error.

    A sample of tagged HTML

    <idx:entry name="furniture" scriptable="yes">
       <idx:orth format="orth - 'exists in' color|, |">chair</idx:orth>
       <idx:string value="brown" name="color" />
       <idx:string value="black" name="color" />
    </idx:entry>

    The text that will be displayed in the index search page for this entry :

    chair - exists in brown, black

The format attribute is compatible with the style, indent and numbered attributes of the <idx:orth> tag.

See the sample furniture catalog included in the SDK for examples on the use of the format string.
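
The format string rules can be made concrete with a small standalone JavaScript sketch (it is not prcgen or Reader code, and the real implementation may differ). The names renderFormat and readQuoted are hypothetical. The sketch assumes that identifiers consist of letters and "_" only, that an unknown field renders as an empty string, and that any other character (spaces, hyphens) is copied as is, which is what the samples suggest.

```javascript
// Reads a quoted part starting at fmt[i] (a single quote).
// Returns [text, index just after the quoted part].
function readQuoted(fmt, i) {
  if (fmt[i + 1] === "'") return ["'", i + 2]; // two quotes -> one quote
  let text = "";
  i += 1;
  while (i < fmt.length) {
    if (fmt[i] === "'" && fmt[i + 1] === "'") { text += "'"; i += 2; }
    else if (fmt[i] === "'") return [text, i + 1]; // closing quote
    else { text += fmt[i]; i += 1; }
  }
  throw new Error("unterminated raw string");
}

// Renders a format string; `fields` maps a field name to an array of values.
function renderFormat(fmt, fields) {
  const isIdChar = (c) => /[A-Za-z_]/.test(c);
  let out = "";
  let i = 0;
  while (i < fmt.length) {
    if (fmt[i] === "'") {
      const [text, next] = readQuoted(fmt, i);
      out += text;
      i = next;
    } else if (isIdChar(fmt[i])) {
      let j = i;
      while (j < fmt.length && isIdChar(fmt[j])) j++;
      const name = fmt.slice(i, j);
      // Assumption: an unknown field renders as nothing.
      const values = Object.prototype.hasOwnProperty.call(fields, name) ? fields[name] : [];
      i = j;
      if (fmt[i] === "|") {
        // Separator string: raw strings allowed, no identifiers replaced.
        let sep = "";
        i += 1;
        while (i < fmt.length && fmt[i] !== "|") {
          if (fmt[i] === "'") { const [t, n] = readQuoted(fmt, i); sep += t; i = n; }
          else { sep += fmt[i]; i += 1; }
        }
        if (i >= fmt.length) throw new Error("unterminated separator string");
        i += 1; // skip the closing "|"
        out += values.join(sep);
      } else {
        out += values.length > 0 ? values[0] : ""; // first value only
      }
    } else {
      out += fmt[i]; // any other character is copied as is
      i += 1;
    }
  }
  return out;
}

const chair = { orth: ["chair"], color: ["brown", "black"], location: ["office"] };
console.log(renderFormat("orth - 'exists in' color|, |", chair));
```

Run against the three samples of this section, the sketch reproduces the displayed texts, and an unterminated separator raises an error in the same spirit as the prcgen behaviour noted above.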

3. How to open the index search in the Reader

The following sample of tagged HTML will be used in all examples below:

<idx:entry name="myfriends">
Name : <idx:orth>John Martin</idx:orth>
Company : <idx:key name="company">Mobipocket</idx:key>
City : <idx:key name="city">Seattle</idx:key>
Phone number : 01010101
</idx:entry>

3.1 Examples

You can add script functions within the HTML source of the eBook to open the index search screen. Below are a few syntax examples. For a full reference of index search functions, see section 3.2.

<a onclick="index_search()" >Search in the Address book</a>

Opens the index search screen with a list of names in alphabetical order. The first index in the book is used if you do not specify the name of a given index.

<a onclick= "index_search('myfriends', '', 'John')" >Open the Address book at name John</a>

Opens the index search screen with a list of names in alphabetical order, starting with the entry named 'John'. 'John' is automatically pasted into the input box of the search screen and the list is scrolled to the correct alphabetical position.

<a onclick= "filtered_index_search('myfriends', 'company')">Search by Company</a>

Opens the index search screen with a list of Companies in alphabetical order; after selecting a company, the index will display a second window with the list of names under that company

<a onclick= "filtered_index_search('myfriends', 'city')">Search by City</a>

Opens the index search screen with a list of Cities in alphabetical order; after selecting a city, the index will display a second window with the list of names listed under that city

<a onclick= "cond_index_search('myfriends', 'company', 'Microsoft')"></a>

Opens the index search screen with a list of names in alphabetical order listed under the company name Microsoft

You can also add items in the Guide of the .opf publication file. They will appear in the top right page menu of the Reader.

<guide>
  <reference type="names" title="Search by name" onclick="index_search()"/>
  <reference type="company" title="Search by company" onclick="filtered_index_search('myfriends', 'company')"/>
</guide>

3.2 Index search functions

The index search functions used on this page are index_search, filtered_index_search, cond_index_search and sql_search. See their reference pages for details.

3.3 Common parameters for index functions

In index search functions, initial parameters can differ, but they all share the same last 3 parameters.

Frameset parameter

Frameset is the name of the frameset to be used around the index search control. If empty, it defaults to the current one. If you want to display the index search screen without a frameset, specify the name of a frameset without frames, or more easily the name of a frameset that does not exist like "nothing" for example.

Callback parameter requires Reader 4.8

JSCallback is the name (string) of a global JavaScript function that will be called when the user clicks on an entry in the index search screen. The default behaviour when clicking on an entry is to jump to the HTML part of that entry. This parameter allows you to override that behaviour.

The callback function must have a single parameter of type RecordSet. When the callback is called, it is passed the RecordSet corresponding to the search set of the index search (simple index or SQL request results), positioned on the item the user clicked. You can then use various RecordSet properties to interact with the selected entry. See the RecordSet object reference for the full list.

The following sample is an implementation, using the JSCallback parameter, of the default index search behaviour, i.e. a simple jump to the destination entry:

function f_jscallback_default_jump(input_recordset)
{
   window.open(input_recordset.anchor);
}

And here is how you call it:

sql_search('SELECT * FROM myfriends', 'Caption string', 'nothing', 'f_jscallback_default_jump')

Configuration flags requires Reader 5.0

The last common parameter is a bit field with various configuration flags. They control the appearance of the index search screen as well as the list of algorithms used in searches. This parameter is a numeric value. Its value should be the sum of all desired flag values.

A first set of flags enables extra search algorithms, which are executed if the user types something in the input box and, instead of clicking on an entry in the list below the box, validates the string (for example by pressing the ENTER key).

Value Description
1 Disinflection on ENTER. For this option to be available, inflection data must have been added to the document using <idx:orth infl="..."> or <idx:gramgrp infl="..."> tags. See section 2.2, "Inflections for dictionaries", for details.
2 Compound word search on ENTER. Compound word search can be useful for "sticky" languages like German. The compound word search algorithm can break a word like "Umweltverschmutzung" into its components "Umwelt" and "Verschmutzung" before performing the search.
4 Spell correction on ENTER. Data must be compiled using the <idx:entry spell="yes"> syntax for this feature to be available.
8 Wildcard search on ENTER. Data must be compiled using the <idx:entry wild="yes"> (or <idx:entry spell="yes">) syntax for this feature to be available. Two wildcards can be used: * matches any number of characters, ? matches exactly one character.
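
The wildcard rules of flag 8 can be approximated outside the Reader by translating a pattern into a regular expression. The following standalone JavaScript sketch is only an illustration: the function name wildcardToRegExp is hypothetical, and it assumes that * may also match zero characters and that matching ignores case, neither of which is stated in the table.

```javascript
// Translates a wildcard pattern (* and ?) into an anchored regular expression.
function wildcardToRegExp(pattern) {
  // Escape every regex metacharacter except the two wildcards.
  const escaped = pattern.replace(/[.+^${}()|[\]\\]/g, "\\$&");
  const source = escaped
    .replace(/\*/g, ".*") // * -> any number of characters (assumed: zero or more)
    .replace(/\?/g, ".");  // ? -> exactly one character
  return new RegExp("^" + source + "$", "i"); // assumption: case-insensitive
}

const words = ["record", "recording", "recorded", "accord"];
console.log(words.filter((w) => wildcardToRegExp("rec*").test(w)));
```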

A second set of flags controls the appearance and behaviour of the index search screen.

16 Highlight first exact match in the alpha-list (if not set, first canonized match is highlighted)
256 If this flag is set, the input box does not appear in the index search screen. The alpha-list is still active though. When two consecutive search screens are used (filtered_index_search) this flag only affects the first one.
512 No alpha-list if nothing is typed in the input box. The list only appears when you type something. When two consecutive search screens are used (filtered_index_search) this flag only affects the first one.
1024 No underline for links in alpha-search (recommended for Asian scripts only).
2048 No input box in last screen. Same as flag 256 for the last screen when using filtered_index_search or sql_search.
4096 No alpha-list if nothing typed in last screen. Same as flag 512 for the last screen when using filtered_index_search or sql_search.

Example: filtered index search with the alpha-list initially hidden in the first screen and no input box in the second screen:

filtered_index_search('myfriends', 'company', 'Please enter company', 'Please select a person', '', '', '', 512 + 2048)

A third set of flags controls the IME (Input Method Editor) in the input box of the index search screen. For example, you can use it to force hiragana input with no kanji conversion in a Japanese dictionary. These flags are referenced here for future use but might not yet be available on all platforms. Please be aware that only one IME flag can be used at a time.

0x10000 Force input method to Japanese Katakana (without kanji conversion).
0x20000 Force input method to Japanese Hiragana (without kanji conversion).
0x30000 Force input method to Latin (native keyboard input).
0x40000 Force input method to Chinese PinYin (without ideogram conversion).
0x50000 Force input method to Korean Hangul.
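
Since the configuration value is a sum of flag values plus at most one IME value, it can help to name the numbers. The standalone JavaScript sketch below is one possible way to do it; the constant names and the buildFlags helper are hypothetical, only the numeric values come from the tables above.

```javascript
// Hypothetical names for the documented flag values.
const SearchFlags = {
  DISINFLECT_ON_ENTER: 1,
  COMPOUND_ON_ENTER: 2,
  SPELL_ON_ENTER: 4,
  WILDCARD_ON_ENTER: 8,
  HIGHLIGHT_EXACT_MATCH: 16,
  NO_INPUT_BOX: 256,
  NO_LIST_UNTIL_TYPED: 512,
  NO_LINK_UNDERLINE: 1024,
  NO_INPUT_BOX_LAST: 2048,
  NO_LIST_UNTIL_TYPED_LAST: 4096
};

// IME values are mutually exclusive, so they are modelled as a single choice.
const ImeMode = {
  NONE: 0,
  KATAKANA: 0x10000,
  HIRAGANA: 0x20000,
  LATIN: 0x30000,
  PINYIN: 0x40000,
  HANGUL: 0x50000
};

// Combines any number of flags with at most one IME mode.
function buildFlags(flags, ime) {
  let value = 0;
  for (const flag of flags) {
    value |= flag; // same as summing, but a flag listed twice counts once
  }
  return value + (ime || ImeMode.NONE);
}

// Same value as the 512 + 2048 example above.
console.log(buildFlags([SearchFlags.NO_LIST_UNTIL_TYPED, SearchFlags.NO_INPUT_BOX_LAST]));
```

Taking the IME mode as a separate argument makes it impossible to pass two IME values by mistake, which matters because 0x30000 is numerically the sum of 0x10000 and 0x20000.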

4. Custom OPF metadata for dictionaries

You also need to set the source language and target language for dictionaries. If a dictionary has multiple indexes, you also have to specify the name of the primary lookup index.

<x-metadata>
<DictionaryInLanguage>en-us</DictionaryInLanguage>
<DictionaryOutLanguage>en-us</DictionaryOutLanguage>
<DefaultLookupIndex>Index Name goes here</DefaultLookupIndex>
...
</x-metadata>

5. Samples

The samples for this section (download link at the top of this page) include:

dictionary.opf : sample dictionary.

furniture.opf : example of a furniture catalog with formatted text in index search.

6. Testing

6.1 With the Emulator

The Mobipocket Reader Emulator for PC enables you to test the rendering of an eBook on a PC with customizable skins for all the PDA platforms: PalmOS, Windows CE, Pocket PC, Franklin eBookman, Epoc32.

After you have installed the Mobipocket Emulator on your PC, to open a dictionary, right-click on the dictionary ".prc" file in your Windows Explorer, and select "Open with Mobipocket Reader Emulator".

- Look-up functions: selecting any word in the text of an eBook brings up a popup menu that lets you search for the selected word in all the dictionaries available on the device. Inflections are handled by the inflection index, which can be built into a dictionary.

6.2 Testing on a PDA device

Dictionary files can be loaded onto any PDA with the Mobipocket Reader.

Copyright 2000-2007 Mobipocket.com