Creator | Dobrovoljc, Kaja | |
Creator | Krek, Simon | |
Creator | Čibej, Jaka | |
Creator | Arhar Holdt, Špela | |
Date | 2019-11-13T08:52:43Z | |
dc.date.accessioned | 2021-07-24T21:27:33Z | |
dc.date.available | 2021-07-24T21:27:33Z | |
Identifier | http://hdl.handle.net/11356/1270 | |
dc.identifier.uri | https://linghub.org/handle/123456789/924939 | |
Description | Frequency lists of words split into word parts were extracted from the GOS 1.0 Corpus of Spoken Slovene (http://hdl.handle.net/11356/1040) using the LIST corpus extraction tool (http://hdl.handle.net/11356/1227). The lists contain all lemmas, lower-case word forms or normalized word forms occurring in the corpus, split into their initial or final part (i.e., the initial or final string of 1, 2, 3, 4 or 5 characters in the word) and the rest. The lists also contain absolute and relative frequencies, percentages, and distribution across the text types included in the corpus taxonomy.
The lists were extracted for each part-of-speech category. For each part-of-speech, a total of 30 lists were extracted:
1) 10 lists for initial or final word parts extracted from lemmas,
2) 10 lists for initial or final word parts extracted from lower-case word forms,
3) 10 lists for initial or final word parts extracted from normalized word forms.
In addition, 30 lists were extracted from all words (regardless of their part-of-speech category). | |
Publisher | Jožef Stefan Institute | |
Publisher | Centre for Language Resources and Technologies, University of Ljubljana | |
Rights | https://creativecommons.org/licenses/by-sa/4.0/ | |
Rights | Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) | |
Subject | final part of the word | |
Subject | spoken corpus | |
Subject | word parts | |
Subject | morphology | |
Subject | initial part of the word | |
Title | Frequency lists of word parts from the GOS 1.0 corpus | |
Type | lexicalConceptualResource | |
Type | Text | |
dcterms.available | 2019-11-13T08:52:43Z | |
dcterms.bibliographicCitation | http://hdl.handle.net/11356/1270 | |
dcterms.creator | Dobrovoljc, Kaja | |
dcterms.creator | Krek, Simon | |
dcterms.creator | Čibej, Jaka | |
dcterms.creator | Arhar Holdt, Špela | |
dcterms.date | 2019-11-13T08:52:43Z | |
dcterms.description | Frequency lists of words split into word parts were extracted from the GOS 1.0 Corpus of Spoken Slovene (http://hdl.handle.net/11356/1040) using the LIST corpus extraction tool (http://hdl.handle.net/11356/1227). The lists contain all lemmas, lower-case word forms or normalized word forms occurring in the corpus, split into their initial or final part (i.e., the initial or final string of 1, 2, 3, 4 or 5 characters in the word) and the rest. The lists also contain absolute and relative frequencies, percentages, and distribution across the text types included in the corpus taxonomy.
The lists were extracted for each part-of-speech category. For each part-of-speech, a total of 30 lists were extracted:
1) 10 lists for initial or final word parts extracted from lemmas,
2) 10 lists for initial or final word parts extracted from lower-case word forms,
3) 10 lists for initial or final word parts extracted from normalized word forms.
In addition, 30 lists were extracted from all words (regardless of their part-of-speech category). | |
dcterms.identifier | http://hdl.handle.net/11356/1270 | |
dcterms.isReplacedBy | http://hdl.handle.net/11356/1366 | |
dcterms.publisher | Jožef Stefan Institute | |
dcterms.publisher | Centre for Language Resources and Technologies, University of Ljubljana | |
dcterms.rights | https://creativecommons.org/licenses/by-sa/4.0/ | |
dcterms.rights | Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) | |
dcterms.subject | final part of the word | |
dcterms.subject | spoken corpus | |
dcterms.subject | word parts | |
dcterms.subject | morphology | |
dcterms.subject | initial part of the word | |
dcterms.title | Frequency lists of word parts from the GOS 1.0 corpus | |
dcterms.type | lexicalConceptualResource | |
dcterms.type | Text | |
odrl.Policy | http://purl.org/net/rdflicense/cc-by-sa4.0 | |
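
The splitting scheme in the description (an initial or final string of 1 to 5 characters, plus the rest of the word) can be illustrated with a minimal Python sketch. This is not the LIST tool's implementation: the function names, the sample words, and the handling of words shorter than the requested part length are assumptions made only for illustration. The count of 10 lists per token type is read here as 2 positions times 5 lengths, which is an interpretation of the description rather than a documented fact.

```python
from collections import Counter
from itertools import product

def split_word(word, n, position="initial"):
    """Split a word into a part of n characters and the rest.

    Assumption: a word shorter than n is returned whole, with an empty rest.
    """
    if not 1 <= n <= 5:
        raise ValueError("part length must be between 1 and 5")
    if position == "initial":
        return word[:n], word[n:]
    if position == "final":
        return word[-n:], word[:-n]
    raise ValueError("position must be 'initial' or 'final'")

def part_frequencies(words, n, position="initial"):
    """Return {part: (absolute frequency, relative frequency)} for a token list."""
    counts = Counter(split_word(w, n, position)[0] for w in words)
    total = sum(counts.values())
    return {part: (c, c / total) for part, c in counts.items()}

# Hypothetical list configurations: 3 token types x 2 positions x 5 lengths.
TOKEN_TYPES = ("lemma", "lower-case word form", "normalized word form")
POSITIONS = ("initial", "final")
LENGTHS = (1, 2, 3, 4, 5)
LIST_CONFIGS = list(product(TOKEN_TYPES, POSITIONS, LENGTHS))

if __name__ == "__main__":
    sample = ["delati", "delam", "pisati"]  # made-up sample, not corpus data
    print(split_word("delati", 3, "initial"))
    print(part_frequencies(sample, 2, "final"))
    print(len(LIST_CONFIGS))
```

Under this reading, each part-of-speech category yields 30 configurations, matching the total stated in the description.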