Kig is a web interface to the CIG1 and CIG2 corpora, which focus on child language acquisition in Welsh. They were assembled by Bob Morris Jones and colleagues. Detailed information about CIG1 and CIG2 is available at the Child Language Databases website, and the transcriptions are available from the CHILDES website.
The search boxes above let you search for a word across all files in the CIG1 and CIG2 corpora: when you enter a word, 20 utterances containing that word are shown. For readability, most of the transcription markup is removed.
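The transcriptions on CHILDES use the CHAT format, which marks retracings, pauses, fillers, special word forms and so on inline. As a rough illustration of what "removing most of the transcription markup" involves, here is a minimal sketch. The rules and the function name `clean_utterance` are hypothetical and cover only a handful of common CHAT codes; they are not Kig's actual cleaning rules.

```python
import re

# Hypothetical, simplified cleaning rules. Real CHAT transcripts use many more
# codes than these, and Kig's own rules are not documented here.
_RULES = [
    (re.compile(r"\[[^\]]*\]"), " "),    # bracketed codes: [/], [//], [: target], [=! laughs]
    (re.compile(r"[<>]"), " "),          # angle brackets marking the scope of a code
    (re.compile(r"\(\.+\)"), " "),       # pauses: (.), (..), (...)
    (re.compile(r"(?<!\S)\+\S*"), " "),  # token-initial codes such as +... or +/. (compounds like a+b survive)
    (re.compile(r"(?<!\S)&\S*"), " "),   # fillers, fragments and events: &-uh, &=laughs
    (re.compile(r"@\S*"), ""),           # special-form suffixes: doli@c -> doli
    (re.compile(r"(?<!\S)0\S*"), " "),   # omitted words: 0is
]


def clean_utterance(text: str) -> str:
    """Strip a subset of CHAT markup, then collapse runs of whitespace."""
    for pattern, replacement in _RULES:
        text = pattern.sub(replacement, text)
    return " ".join(text.split())
```

For example, `clean_utterance("&-uh dyma doli@c (.) +...")` returns `"dyma doli"`. Retraced words are kept rather than deleted, so `"mae <y ci> [/] y ci yna ."` becomes `"mae y ci y ci yna ."`.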
You can search either for words used by a child or for words used by an adult ("non-child"), defined here as any speaker who is not identified as a child, target child, or playmate.
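The child / non-child split and the 20-result search can be sketched as follows. This assumes each utterance carries a CHAT participant role (`Target_Child`, `Child`, `Playmate`, `Mother`, `Investigator`, ...) and has already had its markup removed. The `Utterance` class and `search` function are hypothetical, and matching on whole lowercase tokens is a simplifying assumption. The sketch returns the first matches it finds; how Kig actually selects the 20 utterances it displays is not stated here.

```python
from dataclasses import dataclass

# Roles treated as child speakers. The names follow CHAT's participant roles;
# the exact mapping Kig uses is an assumption.
CHILD_ROLES = {"Target_Child", "Child", "Playmate"}


@dataclass
class Utterance:
    role: str  # CHAT participant role, e.g. "Target_Child", "Mother", "Investigator"
    text: str  # utterance text after markup removal


def is_child(role: str) -> bool:
    """Any role outside CHILD_ROLES counts as a non-child speaker."""
    return role in CHILD_ROLES


def search(utterances, word, child=True, limit=20):
    """Return up to `limit` utterances containing `word` as a whole token,
    spoken by a child (child=True) or a non-child speaker (child=False)."""
    target = word.lower()
    hits = []
    for utt in utterances:
        if is_child(utt.role) != child:
            continue  # wrong speaker group
        if target in utt.text.lower().split():
            hits.append(utt)
            if len(hits) == limit:
                break  # stop once enough examples have been collected
    return hits
```

With `corpus = [Utterance("Target_Child", "ci mawr"), Utterance("Mother", "ble mae'r ci ?"), Utterance("Playmate", "ci bach")]`, `search(corpus, "ci")` returns the two child utterances and `search(corpus, "ci", child=False)` returns the mother's. Note that naive whitespace tokenisation treats clitic forms such as `mae'r` as a single token.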
CIG1, created in 1996, consists of 84 hours of transcribed recordings of seven children aged 18-30 months: four from North Wales (Alaw, Dewi, Elin and Rhys) and three from Mid Wales (Bethan, Melisa and Rhian).
CIG2 consists of 120 hours of transcribed recordings of 469 children aged 3-7 years from across Wales. The recordings were made in 1974-77 and transcribed in 1999-2000.
Other key parameters of the corpora are set out in the following table:
| Measure | CIG1 | CIG2 |
|---|---|---|
| Files | 168 | 239 |
| Total utterances | 78766 | 151422 |
| Total tokens | 304846 | 566140 |
| Total types | 5498 | 12206 |
| Non-child utterances | 25286 | 40237 |
| Non-child utterances (% of total) | 32% | 27% |
| Non-child tokens | 222390 | 103755 |
| Non-child tokens (% of total) | 73% | 18% |
| Non-child types | 4869 | 4043 |
| Non-child types (% of total) | 89% | 33% |
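Each percentage row is the non-child count divided by the corresponding corpus total. The short sketch below recomputes them from the counts in the table and reproduces the published figures. For types, the percentage is the share of the whole corpus vocabulary that occurs in non-child speech. Because many word types are used by both children and adults, the two groups' type counts overlap, so the number of child types cannot be found by subtracting non-child types from the total.

```python
# Counts copied from the table: (non-child count, corpus total).
COUNTS = {
    "CIG1": {"utterances": (25286, 78766), "tokens": (222390, 304846), "types": (4869, 5498)},
    "CIG2": {"utterances": (40237, 151422), "tokens": (103755, 566140), "types": (4043, 12206)},
}


def percent(part: int, total: int) -> int:
    """Share of `total` accounted for by `part`, rounded to a whole percent."""
    return round(100 * part / total)


for corpus, rows in COUNTS.items():
    for measure, (part, total) in rows.items():
        print(f"{corpus} non-child {measure}: {percent(part, total)}%")
```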