Frequency lists of character-level n-grams from the Gigafida 2.0 corpus

Creator: Dobrovoljc, Kaja
Creator: Krek, Simon
Creator: Čibej, Jaka
Creator: Arhar Holdt, Špela
Date: 2019-11-13T08:54:49Z
dc.date.accessioned: 2021-07-24T21:27:34Z
dc.date.available: 2021-07-24T21:27:34Z
Identifier: http://hdl.handle.net/11356/1272
dc.identifier.uri: https://linghub.org/handle/123456789/924941
Description: Frequency lists of character-level n-grams were extracted from the Gigafida 2.0 Corpus of Written Standard Slovene (https://viri.cjvt.si/gigafida/) using the LIST corpus extraction tool (http://hdl.handle.net/11356/1227). The lists contain 1-5-gram combinations of characters occurring in the corpus along with their absolute and relative frequencies, percentages, and distribution across the text-types included in the corpus taxonomy. Character-level n-grams were extracted from lemmas (5 files) and lower-case word forms (5 files).
Publisher: Jožef Stefan Institute
Publisher: Centre for Language Resources and Technologies, University of Ljubljana
Rights: https://creativecommons.org/licenses/by-sa/4.0/
Rights: Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)
Subject: characters
Subject: standard language
Subject: n-grams
Subject: frequency list
Title: Frequency lists of character-level n-grams from the Gigafida 2.0 corpus
Type: lexicalConceptualResource
Type: Text
dcterms.available: 2019-11-13T08:54:49Z
dcterms.bibliographicCitation: http://hdl.handle.net/11356/1272
dcterms.creator: Dobrovoljc, Kaja
dcterms.creator: Krek, Simon
dcterms.creator: Čibej, Jaka
dcterms.creator: Arhar Holdt, Špela
dcterms.date: 2019-11-13T08:54:49Z
dcterms.description: Frequency lists of character-level n-grams were extracted from the Gigafida 2.0 Corpus of Written Standard Slovene (https://viri.cjvt.si/gigafida/) using the LIST corpus extraction tool (http://hdl.handle.net/11356/1227). The lists contain 1-5-gram combinations of characters occurring in the corpus along with their absolute and relative frequencies, percentages, and distribution across the text-types included in the corpus taxonomy. Character-level n-grams were extracted from lemmas (5 files) and lower-case word forms (5 files).
dcterms.identifier: http://hdl.handle.net/11356/1272
dcterms.publisher: Jožef Stefan Institute
dcterms.publisher: Centre for Language Resources and Technologies, University of Ljubljana
dcterms.rights: https://creativecommons.org/licenses/by-sa/4.0/
dcterms.rights: Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)
dcterms.subject: characters
dcterms.subject: standard language
dcterms.subject: n-grams
dcterms.subject: frequency list
dcterms.title: Frequency lists of character-level n-grams from the Gigafida 2.0 corpus
dcterms.type: lexicalConceptualResource
dcterms.type: Text
odrl.Policy: http://purl.org/net/rdflicense/cc-by-sa4.0
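
The description in this record mentions character-level n-grams of length 1 to 5 with absolute and relative frequencies. As a rough illustration of what such a list contains, the following minimal Python sketch counts character n-grams inside lower-cased tokens. It is not the LIST tool's implementation. The function names, the sample tokens, the choice to keep n-grams within token boundaries, and the definition of relative frequency as a share of all n-grams of the same length are all assumptions made for this example; the published lists may define these differently.

```python
from collections import Counter


def char_ngram_counts(tokens, n):
    """Count character n-grams of length n inside each lower-cased token.

    Assumption: n-grams never span two tokens.
    """
    counts = Counter()
    for token in tokens:
        token = token.lower()
        for i in range(len(token) - n + 1):
            counts[token[i:i + n]] += 1
    return counts


def frequency_list(tokens, n):
    """Return (ngram, absolute, relative) rows, most frequent first.

    Assumption: relative frequency is the n-gram's share of all
    n-grams of length n found in the input.
    """
    counts = char_ngram_counts(tokens, n)
    total = sum(counts.values())
    return [
        (ngram, absolute, absolute / total)
        for ngram, absolute in counts.most_common()
    ]


# Made-up sample input, not taken from Gigafida 2.0.
sample_tokens = ["banana", "ana"]

for n in range(1, 6):  # lengths 1 to 5, as in the description
    for ngram, absolute, relative in frequency_list(sample_tokens, n):
        print(n, ngram, absolute, f"{relative:.3f}")
```

Running the same functions over lemmas instead of word forms would give the second family of lists mentioned in the description; the distribution across text-types would need an extra grouping key that this sketch leaves out.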


Files in this item

There are no files associated with this item.

This item appears in the following Collection(s)

  • OLAC
    Main data from the OLAC dataset


Copyright  © 2020 All Rights Reserved by Prêt-à-LLOD Project.

Horizon 2020

This project has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement No 825182.