Multiword Expressions lexicon extracted from the Gigafida 2.1 corpus

Creator: Robnik-Šikonja, Marko
Creator: Krek, Simon
Creator: Dobrovoljc, Kaja
Creator: Gantar, Apolonija
Creator: Kosem, Iztok
Creator: Krsnik, Luka
Creator: Laskowski, Cyprian
Creator: Gorjanc, Vojko
Creator: Čibej, Jaka
Creator: Klemenc, Bojan
Creator: Brank, Janez
Creator: Arhar Holdt, Špela
Date: 2021-03-26T09:24:28Z
dc.date.accessioned: 2021-07-24T21:30:06Z
dc.date.available: 2021-07-24T21:30:06Z
Identifier: http://hdl.handle.net/11356/1421
dc.identifier.uri: https://linghub.org/handle/123456789/925048
Description: The MWE lexicon was extracted from the Gigafida 2.1 Corpus of Written Standard Slovene (https://www.clarin.si/noske/run.cgi/corp_info?corpname=gfida21) using specialized scripts for extracting data from corpora that contain syntactic dependency annotations. The lexicon contains 5,242 multiword expressions (MWEs) with 12,358 examples from Gigafida 2.1. Each MWE entry (or sense) contains at least one and up to three extracted examples. MWEs were analysed using the JOS dependency parser system (http://nl.ijs.si/jos/bib/jos-skladnja-navodila.pdf) and were assigned matching syntactic structure IDs. Sentences containing the MWE components and matching syntactic structure features were identified in the corpus and assigned to the corresponding headword or sense. MWE variants (or variant senses) are linked by their "senseKey" attribute values, forming an MWE cluster of related variants or variant senses. A sample of MWE headwords also contains a manually created sense division with a description of meaning for each sense.
Publisher: Centre for Language Resources and Technologies, University of Ljubljana
Rights: https://creativecommons.org/licenses/by-sa/4.0/
Rights: Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)
Subject: computational lexicography
Subject: syntactic structures
Subject: lexicon
Subject: multiword expressions
Title: Multiword Expressions lexicon extracted from the Gigafida 2.1 corpus
Type: lexicalConceptualResource
Type: Text
dcterms.available: 2021-03-26T09:24:28Z
dcterms.bibliographicCitation: http://hdl.handle.net/11356/1421
dcterms.creator: Robnik-Šikonja, Marko
dcterms.creator: Krek, Simon
dcterms.creator: Dobrovoljc, Kaja
dcterms.creator: Gantar, Apolonija
dcterms.creator: Kosem, Iztok
dcterms.creator: Krsnik, Luka
dcterms.creator: Laskowski, Cyprian
dcterms.creator: Gorjanc, Vojko
dcterms.creator: Čibej, Jaka
dcterms.creator: Klemenc, Bojan
dcterms.creator: Brank, Janez
dcterms.creator: Arhar Holdt, Špela
dcterms.date: 2021-03-26T09:24:28Z
dcterms.description: The MWE lexicon was extracted from the Gigafida 2.1 Corpus of Written Standard Slovene (https://www.clarin.si/noske/run.cgi/corp_info?corpname=gfida21) using specialized scripts for extracting data from corpora that contain syntactic dependency annotations. The lexicon contains 5,242 multiword expressions (MWEs) with 12,358 examples from Gigafida 2.1. Each MWE entry (or sense) contains at least one and up to three extracted examples. MWEs were analysed using the JOS dependency parser system (http://nl.ijs.si/jos/bib/jos-skladnja-navodila.pdf) and were assigned matching syntactic structure IDs. Sentences containing the MWE components and matching syntactic structure features were identified in the corpus and assigned to the corresponding headword or sense. MWE variants (or variant senses) are linked by their "senseKey" attribute values, forming an MWE cluster of related variants or variant senses. A sample of MWE headwords also contains a manually created sense division with a description of meaning for each sense.
dcterms.identifier: http://hdl.handle.net/11356/1421
dcterms.publisher: Centre for Language Resources and Technologies, University of Ljubljana
dcterms.rights: https://creativecommons.org/licenses/by-sa/4.0/
dcterms.rights: Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)
dcterms.subject: computational lexicography
dcterms.subject: syntactic structures
dcterms.subject: lexicon
dcterms.subject: multiword expressions
dcterms.title: Multiword Expressions lexicon extracted from the Gigafida 2.1 corpus
dcterms.type: lexicalConceptualResource
dcterms.type: Text
odrl.Policy: http://purl.org/net/rdflicense/cc-by-sa4.0
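
The description in this record states that MWE variants (or variant senses) are linked by their "senseKey" attribute values, forming clusters. As a minimal sketch of that idea only, the Python snippet below groups entries that share a senseKey. The record does not document the lexicon's file format, so the dictionary layout, the "headword" field name, the function name, and the placeholder values are all assumptions made for illustration, not the lexicon's actual schema or data.

```python
from collections import defaultdict


def cluster_by_sense_key(entries):
    """Group entries that share a senseKey value into clusters.

    `entries` is an iterable of dicts with "headword" and "senseKey"
    keys. This layout is hypothetical; the real lexicon format is not
    described in this record.
    """
    clusters = defaultdict(list)
    for entry in entries:
        clusters[entry["senseKey"]].append(entry["headword"])
    return dict(clusters)


# Placeholder entries (not taken from the lexicon).
sample_entries = [
    {"headword": "variant-a", "senseKey": "key-1"},
    {"headword": "variant-b", "senseKey": "key-1"},
    {"headword": "variant-c", "senseKey": "key-2"},
]

for sense_key, headwords in cluster_by_sense_key(sample_entries).items():
    print(sense_key, headwords)
```

Keeping the grouping keyed on the senseKey value alone mirrors the record's statement that this attribute is what ties related variants together; any further fields would need to be checked against the actual lexicon files.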


Files in this item

There are no files associated with this item.

This item appears in the following Collection(s)

  • OLAC
    Main data from the OLAC dataset

Copyright  © 2020 All Rights Reserved by Prêt-à-LLOD Project.

Horizon 2020

This project has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement No 825182.