I’ve been working with Fran Tyers and the Apertium people over the past few months, and one issue for any MT system is how to handle the source-language text that is fed into it. Out of interest, I decided to look at how an agglutinative language like Quechua might be dealt with, and the result is a very basic Quechua segmenter – there’s more info on the page. The code needs much more work (e.g. the ability to take connected, punctuated text as input), and the dictionary needs to be much bigger, but it actually works quite well.
About me
- I'm Kevin Donnelly, and I live in Llanfairpwllgwyngyllgogerychwyrndrobwllllantysiliogogogoch. Most of my projects relate to linguistics in some form or other (largely Welsh in the past), or to things like audio, electronics and typesetting that I find interesting and can work with on GNU/Linux. You can contact me directly on my first name, plus dotmon, and then add a com at the end ...