Papers of BAS. Humanities and Social Sciences

Vol. 5, 2018, No. 1

THE CORPUS OF SPOKEN BULGARIAN

Yovka Tisheva, Marina Dzhonova, Kjetil Rå Hauge

Abstract. The paper presents the features of a corpus compiled exclusively from spoken Bulgarian data. The strategy for building the corpus is motivated by the features of spoken communication, and the aim is to preserve the characteristics of spoken language when it is transcribed into a text file. The transcription and annotation systems provide a clear and accessible representation of the language varieties used in formal and informal contexts. The pragmatic and socio-cultural information provided about the speakers and the settings also makes the corpus data applicable in the wider field of the humanities.

Keywords: corpus, spoken communication, spoken Bulgarian, transcription