Archive for May, 2013

Eurfa v3.0

May 23rd, 2013

In 2003 I started putting together a Welsh wordlist to help with KDE translation, since we were barred from using output from any of the publicly funded lexical projects (!). In 2005 I put together a verb conjugator (Konjugator) to generate the inflected forms of around 4,000 verbs, and combined those with the wordlist to produce the first version of Eurfa in 2006, with a second edition following in 2007.

At the time it was published, Eurfa was the first Celtic dictionary to list mutated words and verb inflections (though others have copied that idea since). It is still the largest free (GPL) dictionary in Welsh (over 10,000 lemmas at the minute), and was used for the Apertium Welsh-English gist translator and for tagging 900k words of multilingual spoken conversations (BangorTalk).

The original 2007 website was still up until around 3 weeks ago, when server changes meant it stopped working. So I’ve given it a complete makeover using Joshua Gatcke’s very attractive HTML Kickstart. This included streamlining the contents. The old website had a lot of space devoted to proselytising openness, but that battle is pretty much won (except where Welsh-language resources are concerned!): the new Government Service Design Manual mandates a preference for open-source software, the UK Research Councils now have a policy of open access for the outputs of funded research, the Government has set up an open data website giving access to 9,000 public sector datasets, an open operating system (Android) is whuppin’ the ass of the proprietary operating systems, and so on. So the only extraneous bit of the old site I kept was the poem on Pangur Bán, which I think is as fresh and relevant now as it was when it was written by an Irish monk some 1,100 years ago!

I did take the opportunity to fold the conjugator into the new version of Eurfa – the Konjugator site went down some years ago, and I never bothered setting it up again. The current incarnation is much better, and perhaps I have learnt a little bit in the meantime, because the code for printing out the inflected tenses is only 15% of the length of the previous code, yet it also handles periphrastic tenses (with auxiliary verbs)!

Rhymer is still there, allowing you to get lists of rhyming words – another feature that has been copied (but not bettered!) since.

I’ve continued work on Eurfa over the years, though much of that hasn’t made it into the wild. But I hope to add some nice features to the new site over the next 6-8 months.