Creator | Dobrovoljc, Kaja | |
Creator | Krek, Simon | |
Creator | Čibej, Jaka | |
Creator | Arhar Holdt, Špela | |
Date | 2019-11-13T11:00:28Z | |
dc.date.accessioned | 2021-07-24T21:27:36Z | |
dc.date.available | 2021-07-24T21:27:36Z | |
Identifier | http://hdl.handle.net/11356/1275 | |
dc.identifier.uri | https://linghub.org/handle/123456789/924944 | |
Description | Frequency lists of words split into word parts were extracted from the Gigafida 2.0 Corpus of Written Standard Slovene (https://viri.cjvt.si/gigafida/) using the LIST corpus extraction tool (http://hdl.handle.net/11356/1227). The lists contain all lemmas or lower-case word forms occurring in the corpus, each split into its initial or final part (i.e. the initial or final string of 1, 2, 3, 4 or 5 characters in the word) and the rest of the word. The lists also contain absolute and relative frequencies, percentages, and the distribution across the text types included in the corpus taxonomy.
The lists were extracted for each part-of-speech category. For each part of speech, a total of 20 lists were extracted:
1) 10 lists for initial or final word parts extracted from lemmas,
2) 10 lists for initial or final word parts extracted from lower-case word forms.
In addition, 20 lists were extracted from all words (regardless of their part-of-speech category). For easier processing in statistical analysis software, shortened versions of the longer lists were made, containing only the first 150,000 lines. | |
Publisher | Jožef Stefan Institute | |
Publisher | Centre for Language Resources and Technologies, University of Ljubljana | |
Rights | https://creativecommons.org/licenses/by-sa/4.0/ | |
Rights | Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) | |
Subject | standard language | |
Subject | final part of the word | |
Subject | word parts | |
Subject | morphology | |
Subject | initial part of the word | |
Subject | frequency list | |
Title | Frequency lists of word parts from the Gigafida 2.0 corpus | |
Type | lexicalConceptualResource | |
Type | Text | |
dcterms.available | 2019-11-13T11:00:28Z | |
dcterms.bibliographicCitation | http://hdl.handle.net/11356/1275 | |
dcterms.creator | Dobrovoljc, Kaja | |
dcterms.creator | Krek, Simon | |
dcterms.creator | Čibej, Jaka | |
dcterms.creator | Arhar Holdt, Špela | |
dcterms.date | 2019-11-13T11:00:28Z | |
dcterms.description | Frequency lists of words split into word parts were extracted from the Gigafida 2.0 Corpus of Written Standard Slovene (https://viri.cjvt.si/gigafida/) using the LIST corpus extraction tool (http://hdl.handle.net/11356/1227). The lists contain all lemmas or lower-case word forms occurring in the corpus, each split into its initial or final part (i.e. the initial or final string of 1, 2, 3, 4 or 5 characters in the word) and the rest of the word. The lists also contain absolute and relative frequencies, percentages, and the distribution across the text types included in the corpus taxonomy.
The lists were extracted for each part-of-speech category. For each part of speech, a total of 20 lists were extracted:
1) 10 lists for initial or final word parts extracted from lemmas,
2) 10 lists for initial or final word parts extracted from lower-case word forms.
In addition, 20 lists were extracted from all words (regardless of their part-of-speech category). For easier processing in statistical analysis software, shortened versions of the longer lists were made, containing only the first 150,000 lines. | |
dcterms.identifier | http://hdl.handle.net/11356/1275 | |
dcterms.publisher | Jožef Stefan Institute | |
dcterms.publisher | Centre for Language Resources and Technologies, University of Ljubljana | |
dcterms.rights | https://creativecommons.org/licenses/by-sa/4.0/ | |
dcterms.rights | Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) | |
dcterms.subject | standard language | |
dcterms.subject | final part of the word | |
dcterms.subject | word parts | |
dcterms.subject | morphology | |
dcterms.subject | initial part of the word | |
dcterms.subject | frequency list | |
dcterms.title | Frequency lists of word parts from the Gigafida 2.0 corpus | |
dcterms.type | lexicalConceptualResource | |
dcterms.type | Text | |
odrl.Policy | http://purl.org/net/rdflicense/cc-by-sa4.0 | |
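
The splitting scheme described in the Description field can be illustrated with a minimal Python sketch. It is not the LIST tool's implementation: the function names `split_word` and `part_frequencies` are hypothetical, the toy token list is invented for illustration, and the handling of words shorter than the requested part length (the whole word becomes the part and the rest is empty) is an assumption, since the record does not say how such words are treated. The sketch also assumes that the 10 lists per unit correspond to 5 part lengths for each of the 2 positions.

```python
from collections import Counter

# Assumption: 2 positions x 5 part lengths = 10 lists per unit
# (lemma or lower-case word form), as the record's counts suggest.
CONFIGS = [(position, n) for position in ("initial", "final") for n in range(1, 6)]


def split_word(word, n, position):
    """Split a word into (part, rest), where part is its initial or final n characters.

    Assumption: a word shorter than n becomes the part in full, with an empty rest.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    if position == "initial":
        return word[:n], word[n:]
    if position == "final":
        return word[-n:], word[:-n]
    raise ValueError("position must be 'initial' or 'final'")


def part_frequencies(tokens, n, position):
    """Map each word part to (absolute frequency, relative frequency)."""
    counts = Counter()
    for token in tokens:
        part, _rest = split_word(token.lower(), n, position)
        counts[part] += 1
    total = sum(counts.values())
    return {part: (count, count / total) for part, count in counts.items()}


if __name__ == "__main__":
    tokens = ["Miza", "mize", "hiša"]  # toy data, not taken from the corpus
    for position, n in CONFIGS:
        print(position, n, part_frequencies(tokens, n, position))
```

Lower-casing inside `part_frequencies` mirrors the record's "lower-case word forms"; for lemma-based lists the input would be lemmas instead of tokens.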