In the KDE 3.X series, KSpell2 was used to check for misspellings. Several plugins were provided to allow various spellchecking engines to be used. Checking was done in one of two ways: in the background or via an interactive dialog. Background checking was implemented by checking one word on each iteration of the event loop. Words were chosen for checking by a simple algorithm that worked for common European languages but was of limited use for languages and scripts with more complex word boundaries. For the future, something else was needed...
The Future (KDE 4.X)
For KDE 4.X we've created Sonnet. Sonnet will include the functions provided by KSpell2, but will expand its scope. This includes grammar and style checking, and providing application developers with the linguistic tools that underlie them.
Standards
The algorithm to segment text into suitable chunks for checking is now based on the recommendation from the Unicode Consortium. The segmentation class will be extensible enough to support special rules for specific orthographic conventions. It has yet to be determined to what extent end users will be allowed to customize their environment and which rules will be hidden in the implementation.
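To give a feel for what segmenting text into checkable chunks involves, here is a rough sketch in Python. It is not Sonnet's code and it is far simpler than the full Unicode rules: it treats letters, combining marks and digits as word characters, and keeps an apostrophe inside a word only when it sits between two word characters. The names `segment_words`, `is_word_char` and `MID_LETTER` are made up for this illustration.

```python
import unicodedata

# Characters that may join two letters inside one word (e.g. "don't").
# Assumption: a tiny subset of what the Unicode rules allow mid-word.
MID_LETTER = {"'", "\u2019"}


def is_word_char(ch):
    """True for letters (L*), combining marks (M*) and digits/numbers (N*)."""
    return unicodedata.category(ch)[0] in ("L", "M", "N")


def segment_words(text):
    """Return (start_offset, word) pairs for each word-like run in text."""
    words = []
    start = None  # offset where the current word began, or None
    for i, ch in enumerate(text):
        if is_word_char(ch):
            if start is None:
                start = i
        elif (ch in MID_LETTER and start is not None
              and i + 1 < len(text) and is_word_char(text[i + 1])):
            continue  # apostrophe between two word characters stays in the word
        elif start is not None:
            words.append((start, text[start:i]))
            start = None
    if start is not None:
        words.append((start, text[start:]))
    return words
```

Returning the offset alongside each word is a deliberate choice: a checker needs to know where in the original text to place a highlight. Orthography-specific rules, such as scripts that do not separate words with spaces, would need to replace or extend `is_word_char` and the mid-word test.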
The language checking engines will be accessed from cross-platform libraries. Spelling engines will be provided by Enchant and grammar checking by Elixir. Enchant has been in use for some time by AbiWord. Elixir is currently being developed with the input of the developers of An Gramadóir, LanguageTool, and the maintainers of the AbiWord port of Link Grammar. The standard interfaces of both Enchant and Elixir will become part of a new Freedesktop.org spec that is being developed concurrently with Elixir. Once the spec is available, OpenOffice.org will consider allowing spec-conforming plugins to be used.
Which Language?
Sonnet will provide a new set of heuristics to determine which language a particular segment of text is written in. Global settings will provide a default list of languages most likely to be used in KDE. Application-specific settings will refine that list. Furthermore, applications can opt for language detection, which will attempt to guess the languages in use. A language will be selected on a per-paragraph basis. The language will be determined based on a statistical model of the language and its proximity to other languages using the same script. The likelihood of each match will be weighted by both user settings and the languages determined for the previous and next paragraphs.
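The weighting described above can be illustrated with a small sketch in Python. It is only one possible approach, not Sonnet's actual heuristics: the statistical model is assumed to be a character trigram profile per language, the user settings are assumed to be simple multiplicative weights, and agreement with a neighbouring paragraph is assumed to multiply the score by a fixed bonus. All function and parameter names are hypothetical.

```python
from collections import Counter


def trigrams(text):
    """Count overlapping character trigrams in lowercased, space-padded text."""
    padded = " " + " ".join(text.lower().split()) + " "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def similarity(sample, profile):
    """Fraction of the sample's trigram occurrences that also appear in the profile."""
    total = sum(sample.values())
    if total == 0:
        return 0.0
    shared = sum(n for gram, n in sample.items() if gram in profile)
    return shared / total


def guess_language(paragraph, profiles, user_weights,
                   neighbours=(), neighbour_bonus=1.2):
    """Pick the best-scoring language for one paragraph, or None if nothing matches.

    profiles:      language code -> trigram Counter built from sample text
    user_weights:  language code -> weight from settings (missing means 1.0)
    neighbours:    languages chosen for the previous and next paragraphs
    """
    sample = trigrams(paragraph)
    best, best_score = None, 0.0
    for lang, profile in profiles.items():
        score = similarity(sample, profile) * user_weights.get(lang, 1.0)
        for neighbour in neighbours:
            if neighbour == lang:
                score *= neighbour_bonus  # favour agreement with adjacent text
        if score > best_score:
            best, best_score = lang, score
    return best
```

A caller would build `profiles` once from reference text in each candidate language, restrict them to the languages listed in the global and application settings, and then call `guess_language` for each paragraph, passing in the results already obtained for its neighbours.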
User Interface
The GUI is still under discussion. Work to be done includes crafting the Standard Checking Dialogs & Widgets (dialogs that appear when checking text and which allow you to iterate through errors) and Highlighting (the automatic highlighting of misspelled words, etc. within applications). Suggestions, especially from usability experts, are encouraged.
An early screenshot:
This shows a test app for An Gramadóir. The error is shown in bold. The tooltip has an explanation of the error. The small green box below is part of a WeaverThreadGrid, and shows the thread activity for background language checking.
Philosophy
KDE should support all languages for which users exist. This includes supporting diglossic languages in countries where this is discouraged. Sonnet will provide additional facilities to assist application developers where such support is not provided by Qt.
Read more about the recent technical progress of Sonnet at http://jrideout.blogspot.com/2006/12/how-is-sonnet-stacking-up.html