Free software and languages, not necessarily in that order…
Fings wot I have wrote
Stuff I've had a hand in writing or publishing in one form or another.
Free (GPL) Welsh dictionary
The largest Welsh dictionary under a free license, including verbal inflections and mutated forms.
Write Swahili in Arabic script
Tools to make Swahili in Arabic script as easy to use as Swahili in Roman script, with provision for traditional manuscript poetry.
Free (GPL) Korean dictionary
16,000 words searchable in hangeul, English, or romanisation.
Traditional Swahili ballad
Annotated edition of a previously unpublished ballad, using Andika! to produce the Arabic-script text
Corpus of Welsh plays
Showcases plays from Wales, no matter their period or language.
Bilingual conversational corpora
Welsh-English, Welsh-Spanish and Spanish-English corpora for linguistic research on code-switching.
Jan Deloof's Breton-Dutch dictionary
Detailed dictionary with 40,000 entries
Automated glossing for Welsh
New, faster version of the Bangor Autoglosser, aimed at POS-tagging written Welsh text rather than conversational multilingual text.
Terry Duval's Māori gainword corpus
120,000 words (around 6,000 tokens) drawn from citations of gainwords (loanwords or borrowings) in Māori-language publications printed between 1815 and 1899.
Welsh/English corpus of Assembly Proceedings
360,000 aligned sentences in Welsh and English
Corpus of conversational Welsh
Searchable version of the BangorTalk Siarad corpus
Corpus of Patagonian Welsh
Searchable version of the BangorTalk Patagonia corpus
Spanish-English conversational corpus
Searchable version of the BangorTalk Miami corpus
Proof-of-concept Gàidhlig autoglosser
Two small POS-tagged corpora, and a small GPLed dictionary
Welsh Wikipedia corpus
4m-word corpus drawn from the Welsh Wikipedia as it was on 30 December 2013
Welsh/English corpus of software translations
43,000 aligned items drawn from projects to translate free or open software into Welsh.
Bob Morris Jones's language acquisition corpora
Web interface to the CIG1 and CIG2 corpora, which focus on child language acquisition in Welsh
Tagger for Welsh, Spanish and English
Collection of tools used to POS-tag the BangorTalk corpora.
Proof-of-concept Māori autoglosser
A small POS-tagged corpus, and a small GPLed dictionary
Swahili Wikipedia corpus
A 2.8m-word corpus drawn from the Swahili Wikipedia as of December 2015.
Swahili verb segmenter
Allows Swahili verb forms to be segmented for use in parsers or taggers.
Pitch contours in LaTeX
Gives a visual indication of pitch patterns.
Welsh-English translator
Experimental translator aiming to give at least the gist of a Welsh text in English.
Switch languages on webpages
Provides a dropdown the reader can use to select the language of the webpage.
Welsh rhyming dictionary
Uses Eurfa to produce lists of rhyming words in order of length, with shorter words at the top of the list.