Frequency lists of word parts from the GOS 1.0 corpus

Creator: Dobrovoljc, Kaja
Creator: Krek, Simon
Creator: Čibej, Jaka
Creator: Arhar Holdt, Špela
Date: 2019-11-13T08:52:43Z
dc.date.accessioned: 2021-07-24T21:27:33Z
dc.date.available: 2021-07-24T21:27:33Z
Identifier: http://hdl.handle.net/11356/1270
dc.identifier.uri: https://linghub.org/handle/123456789/924939
Description: Frequency lists of words split into word parts were extracted from the GOS 1.0 Corpus of Spoken Slovene (http://hdl.handle.net/11356/1040) using the LIST corpus extraction tool (http://hdl.handle.net/11356/1227). The lists contain all lemmas, lower-case word forms or normalized word forms occurring in the corpus, each split into its initial or final part (i.e. the initial or final string of 1, 2, 3, 4 or 5 characters of the word) and the rest. The lists also contain absolute and relative frequencies, percentages, and the distribution across the text types included in the corpus taxonomy. The lists were extracted separately for each part-of-speech category. For each part of speech, a total of 30 lists were extracted, 10 per word representation (five part lengths for initial parts and five for final parts): 1) 10 lists of initial or final word parts extracted from lemmas, 2) 10 lists of initial or final word parts extracted from lower-case word forms, and 3) 10 lists of initial or final word parts extracted from normalized word forms. In addition, 30 lists were extracted from all words, regardless of their part-of-speech category.
Publisher: Jožef Stefan Institute
Publisher: Centre for Language Resources and Technologies, University of Ljubljana
Rights: https://creativecommons.org/licenses/by-sa/4.0/
Rights: Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)
Subject: final part of the word
Subject: spoken corpus
Subject: word parts
Subject: morphology
Subject: initial part of the word
Title: Frequency lists of word parts from the GOS 1.0 corpus
Type: lexicalConceptualResource
Type: Text
dcterms.available: 2019-11-13T08:52:43Z
dcterms.bibliographicCitation: http://hdl.handle.net/11356/1270
dcterms.creator: Dobrovoljc, Kaja
dcterms.creator: Krek, Simon
dcterms.creator: Čibej, Jaka
dcterms.creator: Arhar Holdt, Špela
dcterms.date: 2019-11-13T08:52:43Z
dcterms.description: Frequency lists of words split into word parts were extracted from the GOS 1.0 Corpus of Spoken Slovene (http://hdl.handle.net/11356/1040) using the LIST corpus extraction tool (http://hdl.handle.net/11356/1227). The lists contain all lemmas, lower-case word forms or normalized word forms occurring in the corpus, each split into its initial or final part (i.e. the initial or final string of 1, 2, 3, 4 or 5 characters of the word) and the rest. The lists also contain absolute and relative frequencies, percentages, and the distribution across the text types included in the corpus taxonomy. The lists were extracted separately for each part-of-speech category. For each part of speech, a total of 30 lists were extracted, 10 per word representation (five part lengths for initial parts and five for final parts): 1) 10 lists of initial or final word parts extracted from lemmas, 2) 10 lists of initial or final word parts extracted from lower-case word forms, and 3) 10 lists of initial or final word parts extracted from normalized word forms. In addition, 30 lists were extracted from all words, regardless of their part-of-speech category.
dcterms.identifier: http://hdl.handle.net/11356/1270
dcterms.isReplacedBy: http://hdl.handle.net/11356/1366
dcterms.publisher: Jožef Stefan Institute
dcterms.publisher: Centre for Language Resources and Technologies, University of Ljubljana
dcterms.rights: https://creativecommons.org/licenses/by-sa/4.0/
dcterms.rights: Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)
dcterms.subject: final part of the word
dcterms.subject: spoken corpus
dcterms.subject: word parts
dcterms.subject: morphology
dcterms.subject: initial part of the word
dcterms.title: Frequency lists of word parts from the GOS 1.0 corpus
dcterms.type: lexicalConceptualResource
dcterms.type: Text
odrl.Policy: http://purl.org/net/rdflicense/cc-by-sa4.0
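
The description combines three word representations, two split directions (initial or final) and five part lengths (1 to 5 characters), which gives the 3 × 2 × 5 = 30 lists per part of speech. The following is a minimal Python sketch of that split and of a per-word frequency list. It is not the LIST tool's implementation or API: the names `SOURCES`, `split_word` and `word_part_list` are hypothetical, and it assumes that words not longer than the part length are skipped and that percentages are computed over all input tokens. The actual lists may handle both cases differently and also carry relative frequencies and text-type distributions, which are omitted here.

```python
from collections import Counter
from itertools import product
from typing import Iterable, List, Optional, Tuple

# Hypothetical labels for the three word representations, the two split
# directions and the five part lengths named in the description.
SOURCES = ("lemma", "lowercase_form", "normalized_form")
DIRECTIONS = ("initial", "final")
LENGTHS = (1, 2, 3, 4, 5)

# 3 sources x 2 directions x 5 lengths = 30 list configurations.
CONFIGS = list(product(SOURCES, DIRECTIONS, LENGTHS))


def split_word(word: str, n: int, direction: str) -> Optional[Tuple[str, str]]:
    """Split `word` into its initial or final n characters and the rest.

    Assumption: words that are not longer than n characters are skipped
    (None is returned), so the remaining part is never empty.
    """
    if len(word) <= n:
        return None
    if direction == "initial":
        return word[:n], word[n:]
    if direction == "final":
        return word[-n:], word[:-n]
    raise ValueError(f"unknown direction: {direction!r}")


def word_part_list(
    words: Iterable[str], n: int, direction: str
) -> List[Tuple[str, str, str, int, float]]:
    """Build rows of (word, part, rest, absolute frequency, percentage).

    Rows are sorted by descending frequency; the percentage is computed
    over all input tokens (an assumption, not the documented definition).
    """
    counts = Counter(words)
    total = sum(counts.values())
    rows = []
    for word, freq in counts.most_common():
        parts = split_word(word, n, direction)
        if parts is None:
            continue  # too short to split at this length
        part, rest = parts
        rows.append((word, part, rest, freq, 100 * freq / total))
    return rows
```

For example, `word_part_list(["miza", "miza", "mize", "je"], 2, "final")` yields one row for `miza` (part `za`, rest `mi`, frequency 2, 50.0%) and one for `mize` (part `ze`, rest `mi`, frequency 1, 25.0%), while `je` is skipped because it has no characters left after a two-character final part. `Counter.most_common` keeps words with equal counts in first-seen order, so the row order is deterministic.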

Files in this item

There are no files associated with this item.

This item appears in the following Collection(s)

  • OLAC
    Main data from the OLAC dataset

Copyright © 2020 Prêt-à-LLOD Project. All rights reserved.

Horizon 2020

This project has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement No 825182.