Gloss parts of speech in Welsh

You can test Autoglosser2 using the web interface below, which handles up to 300 characters of text (around 50 words on average); anything longer will not be processed. Type your Welsh text into the box and press "Gloss it!". If you can't think of any Welsh to use, try some of the samples below the box. Glossing works best on "written" Welsh, i.e. with correct spelling and punctuation. Words that are not yet in Eurfa, which may also reduce glossing accuracy, are marked in red. If you need to tag large amounts of text, it's best to install Autoglosser2 on your local machine: Appendix A of the manual has full installation instructions.
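If your text is only slightly over the 300-character limit, you can paste it into the box in pieces. The Python sketch below is a hypothetical helper, not part of Autoglosser2: it groups whole sentences into chunks of at most 300 characters. It assumes that sentences end in `.`, `?` or `!` followed by whitespace, and it counts characters with Python's `len()`, which may not match how the web form counts them (a decomposed ŵ, for instance, counts as two).

```python
import re

MAX_CHARS = 300  # limit stated for the Autoglosser2 web interface


def split_for_web_form(text: str, limit: int = MAX_CHARS) -> list[str]:
    """Group whole sentences into chunks of at most `limit` characters.

    Hypothetical helper. Assumes sentences end in . ? or ! followed by
    whitespace. A single sentence longer than `limit` is returned as its
    own chunk and will still be too long for the form.
    """
    sentences = re.split(r"(?<=[.?!])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= limit:
            current = candidate  # sentence still fits in this chunk
        else:
            if current:
                chunks.append(current)  # close the full chunk
            current = sentence  # start a new chunk with this sentence
    if current:
        chunks.append(current)
    return chunks


sample = ("Mae Lois yn gwneud cacen. Mae Steffan wedi mynd i'r siop. "
          "Mae Owain yn darllen llyfr. Lle mae Rhian? Ydy hi yn yr ardd?")
for chunk in split_for_web_form(sample, limit=60):
    print(chunk)
```

With the default limit of 300 the first sample below stays as a single chunk; with `limit=60` it is split into three. For anything much longer than a few chunks, a local installation is the better route.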

Samples for testing

Mae Lois yn gwneud cacen. Mae Steffan wedi mynd i'r siop. Mae Owain yn darllen llyfr. Lle mae Rhian? Ydy hi yn yr ardd?

Dim ond lleuad borffor
Ar fin y mynydd llwm,
A sŵn hen afon Prysor
Yn canu yn y cwm.

Tra'n astudio yno, mynychodd sioe gan yr hypnotydd Martin Taylor a chafodd ei ysbrydoli i ddilyn gyrfa mewn hypnosis a lledrith.

Cadwyd at ffiniau siroedd Lloegr ond yng Nghymru cymhlethwyd y sefyllfa drwy ychwanegu cynghorau dosbarth at rai siroedd a rhannu eraill.

Mae'r polisi wedi cael ei gysylltu â chynnydd yn y nifer o erthyliadau gorfodol, babanladdiad benywaidd, a than-adrodd genedigaethau benywaidd, ac awgryma rhai taw dyma yw'r rheswm tu ôl anghydbwysedd rhyw Tsieina.

Fe'i lleolir mewn ardal ffrwythlon, diolch i ddyfroedd yr Ewffrates; tyfu ffrwythau, grawnfwyd a chynhyrchu brethyn yw'r prif ddiwydiannau.

Y tro hwn, bydd Dai Jones, Llanilar yn cael cwmni'r gantores Linda Griffiths, wrth iddi droedio bro ei mebyd yn Sir Drefaldwyn. Byddant hefyd yn mwynhau'r golygfeydd ac awyr iach o gwmpas ei chartref presennol yn ardal Penybont.

Background

Autoglosser2 is a glosser/tagger for Welsh, using the Eurfa Welsh dictionary. The Autoglosser2 code (GPL) is in a Bitbucket repository, and a detailed manual is available. Autoglosser2 is a heavily revised version of the Bangor Autoglosser, which was developed to gloss the Bangor corpora of multilingual conversational text. Autoglosser2, on the other hand, is aimed at written rather than spoken Welsh text, and has been refactored to tidy the code and make it far faster (over 22,000 glosses per minute).
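The basic idea of dictionary-based glossing, and of flagging words the dictionary lacks, can be illustrated with a toy Python sketch. This is not Autoglosser2's code: the lexicon is a hypothetical five-word stand-in for Eurfa with plain English glosses rather than Autoglosser2's tagset, and lookup is exact, with no handling of mutation or ambiguity. The sample sentence shows why that matters: ardd (the soft-mutated form of gardd after yr) is flagged as unknown even though gardd is in the lexicon.

```python
import re

# Hypothetical toy lexicon standing in for Eurfa: plain English glosses,
# not Autoglosser2's real tagset.
LEXICON: dict[str, str] = {
    "ydy": "is",
    "hi": "she",
    "yn": "in",
    "yr": "the",
    "gardd": "garden",
}

UNKNOWN = "??"  # marker for words missing from the lexicon (the web page uses red)


def gloss(text: str, lexicon: dict[str, str] = LEXICON) -> list[tuple[str, str]]:
    """Pair each word with its gloss, or UNKNOWN if it is not in the lexicon.

    Lookup is case-insensitive and exact: no mutation handling and no
    disambiguation, both of which a real Welsh glosser needs.
    """
    words = re.findall(r"[\w']+", text)  # \w is Unicode-aware, so ŵ, ô etc. match
    return [(w, lexicon.get(w.lower(), UNKNOWN)) for w in words]


print(gloss("Ydy hi yn yr ardd?"))
# [('Ydy', 'is'), ('hi', 'she'), ('yn', 'in'), ('yr', 'the'), ('ardd', '??')]
```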
