Creator | Dobrovoljc, Kaja | |
Creator | Krek, Simon | |
Creator | Čibej, Jaka | |
Creator | Arhar Holdt, Špela | |
Date | 2020-11-02T12:35:03Z | |
dc.date.accessioned | 2021-07-24T21:29:57Z | |
dc.date.available | 2021-07-24T21:29:57Z | |
Identifier | http://hdl.handle.net/11356/1363 | |
dc.identifier.uri | https://linghub.org/handle/123456789/925004 | |
Description | Frequency lists of character-level n-grams were extracted from the GOS 1.0 Corpus of Spoken Slovene (http://hdl.handle.net/11356/1040) using the LIST corpus extraction tool (http://hdl.handle.net/11356/1227). The lists contain character n-grams of length 1 to 5 occurring in the corpus, along with their absolute and relative frequencies, percentages, and distribution across the text types included in the corpus taxonomy.
Character-level n-grams were extracted from lemmas (5 files), lower-case word forms (5 files), and standardized word forms (5 files).
Compared to the previous version (http://hdl.handle.net/11356/1268), this version fixes several typos and replaces all instances of "normalized forms" with the more appropriate term "standardized forms" (as used in the SSJ project). | |
Publisher | Jožef Stefan Institute | |
Publisher | Centre for Language Resources and Technologies, University of Ljubljana | |
Rights | https://creativecommons.org/licenses/by-sa/4.0/ | |
Rights | Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) | |
Subject | characters | |
Subject | n-grams | |
Subject | spoken corpus | |
Subject | frequency list | |
Title | Frequency lists of character-level n-grams from the GOS 1.0 corpus 1.1 | |
Type | lexicalConceptualResource | |
Type | Text | |
dcterms.available | 2020-11-02T12:35:03Z | |
dcterms.bibliographicCitation | http://hdl.handle.net/11356/1363 | |
dcterms.creator | Dobrovoljc, Kaja | |
dcterms.creator | Krek, Simon | |
dcterms.creator | Čibej, Jaka | |
dcterms.creator | Arhar Holdt, Špela | |
dcterms.date | 2020-11-02T12:35:03Z | |
dcterms.description | Frequency lists of character-level n-grams were extracted from the GOS 1.0 Corpus of Spoken Slovene (http://hdl.handle.net/11356/1040) using the LIST corpus extraction tool (http://hdl.handle.net/11356/1227). The lists contain character n-grams of length 1 to 5 occurring in the corpus, along with their absolute and relative frequencies, percentages, and distribution across the text types included in the corpus taxonomy.
Character-level n-grams were extracted from lemmas (5 files), lower-case word forms (5 files), and standardized word forms (5 files).
Compared to the previous version (http://hdl.handle.net/11356/1268), this version fixes several typos and replaces all instances of "normalized forms" with the more appropriate term "standardized forms" (as used in the SSJ project). | |
dcterms.identifier | http://hdl.handle.net/11356/1363 | |
dcterms.publisher | Jožef Stefan Institute | |
dcterms.publisher | Centre for Language Resources and Technologies, University of Ljubljana | |
dcterms.replaces | http://hdl.handle.net/11356/1268 | |
dcterms.rights | https://creativecommons.org/licenses/by-sa/4.0/ | |
dcterms.rights | Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) | |
dcterms.subject | characters | |
dcterms.subject | n-grams | |
dcterms.subject | spoken corpus | |
dcterms.subject | frequency list | |
dcterms.title | Frequency lists of character-level n-grams from the GOS 1.0 corpus 1.1 | |
dcterms.type | lexicalConceptualResource | |
dcterms.type | Text | |
odrl.Policy | http://purl.org/net/rdflicense/cc-by-sa4.0 | |