Frequency lists of word parts from the Gigafida 2.0 corpus

Creator: Dobrovoljc, Kaja
Creator: Krek, Simon
Creator: Čibej, Jaka
Creator: Arhar Holdt, Špela
Date: 2019-11-13T11:00:28Z
dc.date.accessioned: 2021-07-24T21:27:36Z
dc.date.available: 2021-07-24T21:27:36Z
Identifier: http://hdl.handle.net/11356/1275
dc.identifier.uri: https://linghub.org/handle/123456789/924944
Description: Frequency lists of words split into word parts were extracted from the Gigafida 2.0 Corpus of Written Standard Slovene (https://viri.cjvt.si/gigafida/) using the LIST corpus extraction tool (http://hdl.handle.net/11356/1227). The lists contain all lemmas or lower-case word forms occurring in the corpus, split into their initial or final part (i.e. the initial or final string of 1, 2, 3, 4 or 5 characters in the word) and the rest of the word. In addition, the lists also contain absolute and relative frequencies, percentages, and distribution across the text-types included in the corpus taxonomy. The lists were extracted for each part-of-speech category. For each part-of-speech, a total of 20 lists were extracted: 1) 10 lists for initial or final word parts extracted from lemmas, 2) 10 lists for initial or final word parts extracted from lower-case word forms. In addition, 20 lists were extracted from all words (regardless of their part-of-speech category). For easier processing in statistical analysis software, shortened versions of longer lists were made containing the first 150,000 lines.
Publisher: Jožef Stefan Institute
Publisher: Centre for Language Resources and Technologies, University of Ljubljana
Rights: https://creativecommons.org/licenses/by-sa/4.0/
Rights: Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)
Subject: standard language
Subject: final part of the word
Subject: word parts
Subject: morphology
Subject: initial part of the word
Subject: frequency list
Title: Frequency lists of word parts from the Gigafida 2.0 corpus
Type: lexicalConceptualResource
Type: Text
dcterms.available: 2019-11-13T11:00:28Z
dcterms.bibliographicCitation: http://hdl.handle.net/11356/1275
dcterms.creator: Dobrovoljc, Kaja
dcterms.creator: Krek, Simon
dcterms.creator: Čibej, Jaka
dcterms.creator: Arhar Holdt, Špela
dcterms.date: 2019-11-13T11:00:28Z
dcterms.description: Frequency lists of words split into word parts were extracted from the Gigafida 2.0 Corpus of Written Standard Slovene (https://viri.cjvt.si/gigafida/) using the LIST corpus extraction tool (http://hdl.handle.net/11356/1227). The lists contain all lemmas or lower-case word forms occurring in the corpus, split into their initial or final part (i.e. the initial or final string of 1, 2, 3, 4 or 5 characters in the word) and the rest of the word. In addition, the lists also contain absolute and relative frequencies, percentages, and distribution across the text-types included in the corpus taxonomy. The lists were extracted for each part-of-speech category. For each part-of-speech, a total of 20 lists were extracted: 1) 10 lists for initial or final word parts extracted from lemmas, 2) 10 lists for initial or final word parts extracted from lower-case word forms. In addition, 20 lists were extracted from all words (regardless of their part-of-speech category). For easier processing in statistical analysis software, shortened versions of longer lists were made containing the first 150,000 lines.
dcterms.identifier: http://hdl.handle.net/11356/1275
dcterms.publisher: Jožef Stefan Institute
dcterms.publisher: Centre for Language Resources and Technologies, University of Ljubljana
dcterms.rights: https://creativecommons.org/licenses/by-sa/4.0/
dcterms.rights: Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)
dcterms.subject: standard language
dcterms.subject: final part of the word
dcterms.subject: word parts
dcterms.subject: morphology
dcterms.subject: initial part of the word
dcterms.subject: frequency list
dcterms.title: Frequency lists of word parts from the Gigafida 2.0 corpus
dcterms.type: lexicalConceptualResource
dcterms.type: Text
odrl.Policy: http://purl.org/net/rdflicense/cc-by-sa4.0
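
The splitting scheme in the description (an initial or final string of 1 to 5 characters, plus the rest of the word) can be illustrated with a minimal Python sketch. This is not the LIST tool's implementation: the function names, the handling of words shorter than the requested part, and the sample words are all assumptions made for illustration only.

```python
from collections import Counter
from itertools import product

# Part lengths and positions named in the record's description.
PART_LENGTHS = (1, 2, 3, 4, 5)
POSITIONS = ("initial", "final")


def split_word(word, length, position):
    """Return (part, rest) for the initial or final `length` characters.

    Assumption: a word shorter than `length` becomes the whole part with
    an empty rest. How the LIST tool treats such words is not stated.
    """
    if length < 1:
        raise ValueError("length must be at least 1")
    if position == "initial":
        return word[:length], word[length:]
    if position == "final":
        return word[-length:], word[:-length]
    raise ValueError("position must be 'initial' or 'final'")


def part_counts(words, length, position):
    """Count absolute frequencies of word parts over lower-cased words."""
    return Counter(split_word(w.lower(), length, position)[0] for w in words)


# 2 positions x 5 lengths = the 10 lists per word type (lemma or word form).
CONFIGURATIONS = list(product(POSITIONS, PART_LENGTHS))

if __name__ == "__main__":
    sample = ["hiša", "hiter", "miza"]  # illustrative words, not corpus data
    print(split_word("beseda", 3, "initial"))
    print(part_counts(sample, 2, "initial"))
```

Running the ten configurations once over lemmas and once over lower-case word forms gives the 20 lists per part-of-speech category mentioned in the description.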


Files in this item

There are no files associated with this item.

This item appears in the following Collection(s)

  • OLAC
    Main data from the OLAC dataset


Copyright © 2020 All Rights Reserved by Prêt-à-LLOD Project.

Horizon 2020

This project has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement No 825182.