There are some tools available to convert RTF files to text on Linux.
OpenSource Unoconv is a frontend written in Python for OpenOffice (“it needs a recent OpenOffice with UNO bindings”) to convert between many file formats.
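A plain-text conversion with Unoconv might look like the sketch below. It assumes the `unoconv` command is on the PATH and a matching OpenOffice or LibreOffice installation is present; the function name `to_txt_unoconv` is made up for illustration.

```shell
# Hypothetical helper: convert one document to plain text with unoconv.
# Usage: to_txt_unoconv input.rtf output.txt
to_txt_unoconv() {
    # -f picks the target format, -o names the output file.
    unoconv -f txt -o "$2" "$1"
}
```

Calling `to_txt_unoconv letter.rtf letter.txt` would then hand the conversion to the office backend.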
Note: LibreOffice (a fork of OpenOffice), and probably OpenOffice itself, can also be invoked from the command line.
libreoffice --invisible --convert-to pdf file1.doc
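Since the goal here is plain text rather than PDF, the same kind of call can target a text export filter. This is a minimal sketch, assuming the `libreoffice` launcher is on the PATH (some installations name it `soffice` instead); the function name `to_txt_libreoffice` is hypothetical.

```shell
# Hypothetical helper: convert a document to plain text with LibreOffice.
# Usage: to_txt_libreoffice input.rtf [output-dir]
to_txt_libreoffice() {
    # --headless runs without any user interface;
    # txt:Text selects the plain-text export filter;
    # --outdir sets where the converted file is written (default: current dir).
    libreoffice --headless --convert-to txt:Text --outdir "${2:-.}" "$1"
}
```

The result keeps the input's base name with a `.txt` extension, so `to_txt_libreoffice letter.rtf out` should produce `out/letter.txt`.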
OpenSource AbiWord can be used from the command line to export to HTML and text.
abiword --to=txt --to-name=output.txt myfile.doc
The OpenSource DocToText by Silvercoders is available for Windows (where it runs out of the box) and Linux, where some adjustments are necessary.
- error: ./doctotext: error while loading shared libraries: libxlsreader.so.0: cannot open shared object file: No such file or directory
You have to add the dynamically linked libraries that ship with this software to your system.
- error: ./doctotext: error while loading shared libraries: libgsf-1.so.114: cannot open shared object file: No such file or directory
apt-get install libgsf-bin
This applies to Debian / Ubuntu. Unfortunately, it also installs the X Window System.
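For the first error, an alternative to copying the bundled libraries into system directories is to point the dynamic loader at them through `LD_LIBRARY_PATH`. The sketch below is an assumption-laden illustration: it assumes the `.so` files sit in the same directory as the `doctotext` binary and that the program takes the input file as its argument. `DOCTOTEXT_DIR` and `run_doctotext` are hypothetical names.

```shell
# Hypothetical location of the unpacked DocToText release (assumption).
DOCTOTEXT_DIR="${DOCTOTEXT_DIR:-$HOME/doctotext}"

# Usage: run_doctotext myfile.doc
run_doctotext() {
    # Prepend the bundled library directory for this one invocation only,
    # keeping any existing LD_LIBRARY_PATH entries after it.
    LD_LIBRARY_PATH="$DOCTOTEXT_DIR${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}" \
        "$DOCTOTEXT_DIR/doctotext" "$@"
}
```

This leaves the system library directories untouched, at the cost of having to start the tool through the wrapper.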
DocSplit is an OpenSource project by the folks at DocumentCloud. It offers a wide array of conversion facilities, including OCR to UTF-8.
It will OCR the text for each page for which it fails to extract the text (using Tesseract as a backend for that).
It uses JODConverter, which in turn uses OpenOffice.
DocSplit is both a Ruby gem and a command-line tool.
“Because documents need to be in PDF format before any metadata, text, or images are extracted, it's faster to use docsplit pdf to convert it up front, if you're planning to run more than one extraction. Otherwise Docsplit will write out the PDF version to a temporary file before proceeding with each command.”
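The advice in the quote can be sketched as a two-step call: convert to PDF once, then run the text extraction on that PDF. This assumes the `docsplit` command is installed; the function name `docsplit_text` and the way the PDF name is derived from the input name are illustrative assumptions.

```shell
# Hypothetical helper: extract text from an office document with Docsplit.
# Usage: docsplit_text input.doc output-dir
docsplit_text() {
    src=$1
    outdir=$2
    # Convert to PDF once up front, so later extractions can reuse it.
    docsplit pdf "$src" --output "$outdir" || return 1
    # Derive the PDF name from the input name: report.doc -> report.pdf
    base=$(basename "$src")
    pdf="$outdir/${base%.*}.pdf"
    # Extract the text from the PDF produced above.
    docsplit text "$pdf" --output "$outdir"
}
```

Further extractions (images, metadata) could then be run against the same PDF without repeating the conversion.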
CatDoc reads Microsoft Word files and outputs text to the standard output.
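Because the text goes to standard output, saving it is a matter of shell redirection. A minimal sketch, assuming `catdoc` is installed; the function name `doc_to_txt_catdoc` is hypothetical.

```shell
# Hypothetical helper: save the text of a Word file using catdoc.
# Usage: doc_to_txt_catdoc input.doc output.txt
doc_to_txt_catdoc() {
    # catdoc prints the extracted text to stdout; redirect it into a file.
    catdoc "$1" > "$2"
}
```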
PyODConverter is a Python script frontend to OpenOffice conversion. According to the author, it is meant as an easier command-line option than JODConverter.
Java OpenDocument Converter uses OpenOffice as its backend. It also includes command-line tools and is from the same author as PyODConverter. It is no longer maintained; the author would be happy for someone to fork it on GitHub.
AntiWord exists for a huge number of platforms; unfortunately, it opens .doc documents only.
OpenSource wvWare reads Word formats. There are some tools for command-line usage, but the author recommends using AbiWord for conversion tasks. AbiWord uses the wvWare libraries internally to handle Word files.
GNU’s UnRTF converts RTF to HTML, which in turn can be converted into other formats.
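Besides HTML, UnRTF also offers a plain-text output mode, which avoids the intermediate HTML step when text is all that is needed. A minimal sketch, assuming `unrtf` is installed; the function name `rtf_to_txt_unrtf` is hypothetical.

```shell
# Hypothetical helper: convert an RTF file to plain text with UnRTF.
# Usage: rtf_to_txt_unrtf input.rtf output.txt
rtf_to_txt_unrtf() {
    # --text selects plain-text output; UnRTF writes to stdout,
    # so the result is redirected into the target file.
    unrtf --text "$1" > "$2"
}
```

The text output may begin with a few `###` comment lines describing the conversion, which can be filtered out if they are unwanted.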
RTF to HTML converter
This platform-independent tool converts RTF to an HTML file (in ISO-8859-2 encoding).
Sources: SuperUser question