Spellchecker for Xaraya or other UTF-8 XML files

Posted by: Ferenc Veres on February 20, 2005 05:42:51 PM +00:00

This Perl script combines two other scripts and lets you spellcheck UTF-8 encoded XML files. It is designed for spellchecking Xaraya CMS translations, but you can use it for other UTF-8 XML files too. The source code is preconfigured for checking Xaraya files (e.g. the default XML node names).

Spellchecking Xaraya:

If you use your national Xaraya site for translating the system, spellchecking is done on a downloaded local copy of the language pack, while the errors are fixed online. Run this program on those local files, then fix the errors in the Translations module on the NLS site.

This may sound a bit odd, but it keeps the advantages you already have on the NLS site, such as cooperation, BitKeeper pushes and so on. Believe me, the work is simple and quick this way.

To help with translation, each temporary TXT file gets a name that refers to the real template file, so you can identify which page to load in the Translations module. The file name is always displayed at the top of the spellchecker window (assuming you use ispell).

To spellcheck a whole module, run the "find" command on Linux, because this "lazy" script can spellcheck only one file at a time (the Unix philosophy...).

find modules/articles -name \*.xml -exec xml_utf8_spellcheck {} \;

In theory, you could also save the changes directly back to the file if you want to fix a local copy of the language pack.
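
If you do write fixes back to a local copy, it is worth keeping the original. The man page below says the original file is saved with a ".bak" extension by default; the following Python sketch illustrates that write-back step under that assumption. It is not the Perl script's code, and the function name `save_with_backup` is hypothetical.

```python
import shutil
from pathlib import Path


def save_with_backup(path: str, new_content: str, backup_ext: str = ".bak") -> Path:
    """Write corrected XML to `path`, first copying the original to path + backup_ext."""
    src = Path(path)
    backup = src.with_name(src.name + backup_ext)  # e.g. strings.xml -> strings.xml.bak
    shutil.copy2(src, backup)  # keep the untouched original (and its timestamps)
    src.write_text(new_content, encoding="utf-8")  # corrected file stays UTF-8
    return backup
```

Copying before writing means a crash during the write still leaves the untouched original in the backup file.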


Original man page

(99% of it was written by the author of xml_spellcheck):

SYNOPSIS

xml_utf8_spellcheck [options] <files>

DESCRIPTION

xml_utf8_spellcheck lets you spellcheck the content of an XML file. It extracts the text (the content of elements and optionally of attributes), converts it from UTF-8 to Latin-1/Latin-2, calls a spellchecker on it and then recreates the XML document.
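
The extract, check and recreate cycle can be sketched in Python as follows. This is not the Perl script (which uses XML::Twig); it is a minimal illustration with the standard library's xml.etree.ElementTree, assuming a hypothetical `check` callable that stands in for the external spellchecker such as ispell. Only element text is handled, matching the default of not checking attributes, and the `include`/`exclude` parameters loosely mirror `--include_elements` and `--exclude_elements`. The element names in the usage note are made up.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Iterable, Optional


def spellcheck_xml(
    xml_text: str,
    check: Callable[[str], str],
    include: Optional[Iterable[str]] = None,
    exclude: Iterable[str] = (),
) -> str:
    """Extract element text, pass it through `check`, and rebuild the XML.

    `check` is a hypothetical stand-in for the external spellchecker
    (e.g. ispell): it takes the extracted text and returns the fixed text.
    """
    include_set = set(include) if include is not None else None
    exclude_set = set(exclude)
    if include_set is not None and exclude_set:
        # Same rule as --include_elements / --exclude_elements.
        raise ValueError("include and exclude are mutually exclusive")

    root = ET.fromstring(xml_text)
    for elem in root.iter():
        if include_set is not None and elem.tag not in include_set:
            continue
        if elem.tag in exclude_set:
            continue
        # Only the element's leading text is checked; attributes are left
        # alone, and text after child elements (.tail) is ignored here.
        if elem.text and elem.text.strip():
            elem.text = check(elem.text)
    # Recreate the document as a str; ElementTree re-escapes &, < and >.
    return ET.tostring(root, encoding="unicode")
```

For example, with a toy checker `lambda s: s.replace("teh", "the")`, calling `spellcheck_xml("<a><b>teh</b></a>", ...)` returns `<a><b>the</b></a>`.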

OPTIONS

Note that all options can be abbreviated to their first letter. These are the original options of
--conf <configuration_file>
Gets the options from a configuration file. NOT IMPLEMENTED YET.
--spellchecker <spellchecker>
The command to use for spellchecking, including any options. By default, "ispell -d magyar" is used.
--backup-extension <extension>
By default, the original file is saved with a ".bak" extension. This option changes the extension.
Spell check attribute content. By default attribute values are NOT spell checked. NOT YET IMPLEMENTED.
--exclude_elements <list_of_excluded_elements>
A list of elements that should not be spell checked.
--include_elements <list_of_included_elements>
A list of elements that should be spell checked (by default all elements are spell checked).
"--exclude_elements" and "--include_elements" are mutually exclusive.
--pretty_print <optional_pretty_print_style>
A pretty print style for the document, as defined in XML::Twig. If the option is provided without a value, the "indented" style is used.
--spell_charset <character_encoding_name>
The encoding of the temporary file which is passed to the spellchecker.
Default is iso-8859-2.

Display the tool version and exit
Display help message and exit
Display longer help message and exit
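
The `--spell_charset` default matters because the text handed to ispell is re-encoded from UTF-8 into a single-byte charset. A short Python illustration, using the classic Hungarian accent test phrase as sample text: ő and ű exist in ISO-8859-2 (Latin-2) but not in ISO-8859-1 (Latin-1), which fits the iso-8859-2 default paired with "ispell -d magyar". The helper names are hypothetical.

```python
def to_spell_charset(text: str, charset: str = "iso-8859-2") -> bytes:
    """Re-encode decoded UTF-8 text into the spellchecker's single-byte charset.

    Raises UnicodeEncodeError if the charset cannot represent a character,
    rather than silently dropping it.
    """
    return text.encode(charset)


def from_spell_charset(data: bytes, charset: str = "iso-8859-2") -> str:
    """Decode the spellchecker's output so it can be written back as UTF-8."""
    return data.decode(charset)


if __name__ == "__main__":
    sample = "árvíztűrő tükörfúrógép"  # classic Hungarian accent test phrase
    latin2 = to_spell_charset(sample)
    # One byte per character in Latin-2, versus two for each accented letter in UTF-8.
    print(len(sample.encode("utf-8")), len(latin2))
    assert from_spell_charset(latin2) == sample
    try:
        to_spell_charset(sample, "iso-8859-1")
    except UnicodeEncodeError:
        print("iso-8859-1 has no ő or ű")
```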

EXAMPLES

To spellcheck a single file:

xml_utf8_spellcheck my-utf8-file.xml

To spellcheck a complete directory, use:

find modules/articles -type f -name \*.xml -exec xml_utf8_spellcheck {} \;

BUGS

The "<" to "&lt;" replacement also happens inside CDATA sections, which is a serious bug.
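
One way to avoid this bug is to remember, per text node, whether it came from a CDATA section, and to escape markup characters only for ordinary text. A minimal Python sketch of that write-back step; the `is_cdata` flag and the function name are assumptions, since tracking CDATA depends on the XML parser (Python's ElementTree, for instance, does not preserve CDATA sections at all). Splitting "]]>" across two sections is the standard way to carry that sequence inside CDATA.

```python
from xml.sax.saxutils import escape


def serialize_text(text: str, is_cdata: bool) -> str:
    """Write a corrected text node back as XML.

    Ordinary character data must have &, < and > escaped; CDATA content is
    written verbatim, because inside CDATA "&lt;" would be taken literally.
    """
    if not is_cdata:
        return escape(text)  # escapes &, < and >
    # "]]>" would end the section early, so split it across two sections.
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"
```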

TODO

--conf option
--attribute option

PREREQUISITES

XML::Twig, Getopt::Long, Pod::Usage and File::Temp. XML::Twig requires XML::Parser.

LICENSE

This program is Copyright 2005 by Ferenc Veres
Original xml_spellcheck is Copyright 2003 by Michel Rodriguez

This program is free software; you can redistribute it and/or modify it under the terms of the Perl Artistic License or the GNU General Public License as published by the Free Software Foundation either version 2 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

If you do not have a copy of the GNU General Public License write to the Free Software Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.

AUTHORS

Integrated the two scripts: Ferenc Veres <lionNO@SPAMnetngine.NOhu>

Original Michel Rodriguez <mirodNO@SPAMxmltwig.NOcom>
Original Has no author name marked, sorry. (License: PD)

xml_utf8_spellcheck is available at

