Archive for August, 2007

Quechua segmenter

August 24th, 2007

I’ve been working with Fran Tyers and the Apertium people over the past few months, and one of the issues for any machine translation (MT) system is how to deal with the source-language text that is fed into it. Out of interest, I decided to look at how an agglutinative language like Quechua might be handled, and the result is a very basic Quechua segmenter (there’s more info on the page). The code needs much more work (e.g. the ability to accept connected, punctuated text as input) and the dictionary needs to be much bigger, but it already works quite well.
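To give a flavour of what segmenting an agglutinative word involves, here is a minimal sketch in Python. It is not the code behind the segmenter, just one simple way to do it: look for a known root at the start of the word, then try to split what is left into known suffixes, backtracking if a split fails. The names `ROOTS`, `SUFFIXES`, `split_suffixes` and `segment` are made up for this illustration, and the tiny word lists are an assumption; a real dictionary would be far bigger.

```python
# Toy dictionary, assumed for illustration only.
ROOTS = {"wasi", "runa"}             # "house", "person"
SUFFIXES = {"kuna", "pi", "ta", "y"}  # plural, locative, accusative, 1sg possessive


def split_suffixes(rest):
    """Split `rest` into known suffixes, or return None if that is impossible."""
    if not rest:
        return []
    # Try the longest candidate first, and backtrack if the remainder fails.
    for end in range(len(rest), 0, -1):
        head = rest[:end]
        if head in SUFFIXES:
            tail = split_suffixes(rest[end:])
            if tail is not None:
                return [head] + tail
    return None


def segment(word):
    """Return [root, suffix, ...] for a single word, or None if it cannot be analysed."""
    word = word.lower()
    for end in range(len(word), 0, -1):
        root = word[:end]
        if root in ROOTS:
            suffixes = split_suffixes(word[end:])
            if suffixes is not None:
                return [root] + suffixes
    return None


if __name__ == "__main__":
    for w in ["wasikunapi", "wasiykunapi", "runata"]:
        print(w, "->", segment(w))
```

Like the segmenter itself, this sketch only handles one word at a time and is only as good as its dictionary: anything with an unknown root or suffix comes back as `None`.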