odt2txt

odt2txt.py is a Python script that converts Open Document Text (ODT) files to plain text. The output text is marked up using Markdown syntax, which preserves some of the most important formatting. In other words, you get the best of both worlds. It's text, so you can use your favorite text-processing tools, e.g.

odt2txt.py myDoc.odt | less

On the other hand, enough formatting is preserved that the resulting text can be converted into HTML using markdown.py:

odt2txt.py myDoc.odt > tmp.txt
markdown.py tmp.txt > myDoc.html

You might want to have a look at a sample ODT document and the corresponding text and HTML files.

Status

The following ODT formatting is converted to corresponding Markdown syntax:

  • italics (becomes _italics_)
  • bold (becomes **bold**)
  • bold italics (becomes ***bold italics***)
  • simple ordered and unordered lists
  • block quotes (indented paragraphs become Markdown blockquotes)
  • code blocks (monospace paragraphs become Markdown code-blocks)
  • hyperlinks
  • footnotes
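To make the emphasis mapping in this list concrete, here is a minimal sketch of how bold and italic spans could be turned into Markdown markers. It is not the code in odt2txt.py, whose internals are not shown here. It relies on two facts about the format: an ODT file is a ZIP archive containing a content.xml, and character formatting is stored in named styles that text spans refer to. The function names (`span_markers`, `paragraph_to_markdown`, `odt_to_markdown`) are hypothetical, and lists, block quotes, code blocks, hyperlinks and footnotes are deliberately left out.

```python
import zipfile
import xml.etree.ElementTree as ET

# OpenDocument XML namespaces used in content.xml.
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
STYLE_NS = "urn:oasis:names:tc:opendocument:xmlns:style:1.0"
FO_NS = "urn:oasis:names:tc:opendocument:xmlns:xsl-fo-compatible:1.0"


def q(ns, tag):
    """Build a namespace-qualified name in ElementTree's {ns}tag form."""
    return "{%s}%s" % (ns, tag)


def span_markers(root):
    """Map each style name to a Markdown emphasis marker (hypothetical helper)."""
    table = {(True, True): "***", (True, False): "**", (False, True): "_"}
    markers = {}
    for style in root.iter(q(STYLE_NS, "style")):
        props = style.find(q(STYLE_NS, "text-properties"))
        if props is None:
            continue
        bold = props.get(q(FO_NS, "font-weight")) == "bold"
        italic = props.get(q(FO_NS, "font-style")) == "italic"
        markers[style.get(q(STYLE_NS, "name"))] = table.get((bold, italic), "")
    return markers


def paragraph_to_markdown(par, markers):
    """Flatten one text:p element, wrapping styled spans in their markers."""
    parts = [par.text or ""]
    for child in par:
        inner = "".join(child.itertext())
        if child.tag == q(TEXT_NS, "span"):
            marker = markers.get(child.get(q(TEXT_NS, "style-name")), "")
            inner = marker + inner + marker
        parts.append(inner)
        parts.append(child.tail or "")
    return "".join(parts)


def odt_to_markdown(path_or_file):
    """Read content.xml from the ODT archive and join paragraphs with blank lines."""
    with zipfile.ZipFile(path_or_file) as archive:
        root = ET.fromstring(archive.read("content.xml"))
    markers = span_markers(root)
    paragraphs = root.iter(q(TEXT_NS, "p"))
    return "\n\n".join(paragraph_to_markdown(p, markers) for p in paragraphs)
```

Calling `odt_to_markdown("myDoc.odt")` returns the document's paragraphs as one string. Looking up each span's style name in a table built once per document keeps the paragraph walk simple, but it only handles styles that set bold or italic directly.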

The following ODT features are not supported but hopefully will be soon:

  • simple tables
  • images

Installation and Usage

Download odt2txt.py then run it from the command line:

python odt2txt.py myDoc.odt > myDoc.txt

To convert the file to HTML, use markdown.py:

python markdown.py -footnotes myDoc.txt > myDoc.html
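The two commands above can be chained from a small Python wrapper. This is only a sketch: it assumes odt2txt.py and markdown.py sit in the current directory, and the helper names `build_commands` and `convert` are hypothetical. The command-line arguments are copied unchanged from the commands shown above.

```python
import subprocess
import sys
from pathlib import Path


def build_commands(odt_path):
    """Return (argv, output_path) pairs for the two conversion steps."""
    odt = Path(odt_path)
    txt = odt.with_suffix(".txt")
    html = odt.with_suffix(".html")
    return [
        # Step 1: ODT to Markdown-flavoured text.
        ([sys.executable, "odt2txt.py", str(odt)], txt),
        # Step 2: text to HTML, with the same option as the command above.
        ([sys.executable, "markdown.py", "-footnotes", str(txt)], html),
    ]


def convert(odt_path):
    """Run both steps, redirecting each script's stdout into its output file."""
    for argv, out_path in build_commands(odt_path):
        with open(out_path, "w") as out:
            subprocess.run(argv, stdout=out, check=True)
```

Calling `convert("myDoc.odt")` would write myDoc.txt and myDoc.html next to the input file. With `check=True`, the wrapper stops with an exception if either script exits with an error.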

License

The code is dual-licensed under the GPL and the BSD License. Other licensing arrangements can be discussed.

Change Log

April 7, 2006: First version.
