Html2text

html2text is a Perl script which I use to convert HTML documents to plain text. You can fetch it from CVS via chiark's cvsweb interface; visit this URL:

http://www.chiark.greenend.org.uk/ucgi/~richardk/cvsweb/webtools/html2text

When you have downloaded the script, put it somewhere appropriate in your path and make it executable. Because of a limitation in the Perl HTML::Parser class, the script requires its input to be "normalised" first, using a program such as sgmlnorm. For example:

sgmlnorm faq.html | html2text > faq.txt

sgmlnorm is part of the SP package; see http://www.jclark.com for more information about it.

Bugs

I've only bothered to get this program working as far as is necessary for the documents I currently want to convert, so it's entirely possible that it won't work very well for yours. If you find you have to modify it to get it to work for you, please consider sending me the diff.

Please report bugs to me at richard+html2text@sfere.greenend.org.uk. You should check that your bug is not fixed in a later version before doing so.

Copyright

Html2text is copyright © 2000 Richard Kettlewell. You may redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA.