Words

Fings wot I have wrote

Stuff I've had a hand in writing or publishing in one form or another.

Eurfa v3

Free (GPL) Welsh dictionary

The largest Welsh dictionary under a free license, including verbal inflections and mutated forms.

Andika!

Write Swahili in Arabic script

Tools to make Swahili in Arabic script as easy to use as Swahili in Roman script, with provision for traditional manuscript poetry.

KoSajeon

Free (GPL) Korean dictionary

19,000 words searchable in hangeul, English, or romanisation.

KoSeg

Free (GPL) Korean word segmenter

Assists learning by breaking down words into components.

Utenzi wa Jaafari

Traditional Swahili ballad

Annotated edition of a previously unpublished ballad, with the Arabic-script text produced using Andika!

Dramâu Cymru

Corpus of Welsh plays

Showcases plays from Wales, no matter their period or language.

BangorTalk

Bilingual conversational corpora

Welsh-English, Welsh-Spanish and Spanish-English corpora for linguistic research on code-switching.

Deloof

Jan Deloof's Breton-Dutch dictionary

Detailed dictionary with 40,000 entries

Autoglosser v2

Automated glossing for Welsh

New, faster version of the Bangor Autoglosser, aimed at POS-tagging written Welsh text rather than conversational multilingual text.

Duval

Terry Duval's Māori gainword corpus

120,000 words (around 6,000 tokens) drawn from citations of gainwords (loanwords or borrowings) in Māori-language publications printed between 1815 and 1899.

Kynulliad3

Welsh/English corpus of Assembly Proceedings

360,000 aligned sentences in Welsh and English

SiarCorp

Corpus of conversational Welsh

Searchable version of the BangorTalk Siarad corpus

PatCorp

Corpus of Patagonian Welsh

Searchable version of the BangorTalk Patagonia corpus

MiCorp

Spanish-English conversational corpus

Searchable version of the BangorTalk Miami corpus

Gàidhlig

Proof-of-concept Gàidhlig autoglosser

Two small POS-tagged corpora, and a small GPLed dictionary

Kwici

Welsh Wikipedia corpus

4m-word corpus drawn from the Welsh Wikipedia as it was on 30 December 2013

Korrect

Welsh/English corpus of software translations

43,000 aligned items drawn from projects to translate free or open software into Welsh.

Kig

Bob Morris Jones's language acquisition corpora

Web interface to the CIG1 and CIG2 corpora, which focus on child language acquisition in Welsh

Autoglosser v1

Tagger for Welsh, Spanish and English

Collection of tools used to POS-tag the BangorTalk corpora.

Māori

Proof-of-concept Māori autoglosser

A small POS-tagged corpus, and a small GPLed dictionary

Swwiki

Swahili Wikipedia corpus

A 2.8m-word corpus drawn from the Swahili Wikipedia as of December 2015.

Swaseg

Swahili verb segmenter

Segments Swahili verb forms for use in parsers or taggers.

tikz-pitch-contour

Pitch contours in LaTeX

Gives a visual indication of pitch patterns.

apertium-cy

Welsh-English translator

Experimental translator aiming to give at least the gist of a Welsh text in English.

langswitcher

Switch languages on webpages

Provides a dropdown the reader can use to select the language of the webpage.

Rhymer

Welsh rhyming dictionary

Uses Eurfa to produce lists of rhyming words in order of length, with shorter words at the top of the list.
