The CLASSLA-StanfordNLP model for lemmatisation of non-standard Slovenian 1.0

Creator: Ljubešić, Nikola
Date: 2020-08-12T16:31:54Z
dc.date.accessioned: 2021-07-24T21:27:57Z
dc.date.available: 2021-07-24T21:27:57Z
Identifier: http://hdl.handle.net/11356/1338
dc.identifier.uri: https://linghub.org/handle/123456789/924984
Description: The model for lemmatisation of non-standard Slovenian was built with the CLASSLA-StanfordNLP tool (https://github.com/clarinsi/classla-stanfordnlp) by training on the ssj500k training corpus (http://hdl.handle.net/11356/1210) and the Janes-Tag corpus (http://hdl.handle.net/11356/1238), using the Sloleks inflectional lexicon (http://hdl.handle.net/11356/1230). These corpora were additionally augmented for handling missing diacritics by repeating parts of the corpora with diacritics removed. The estimated F1 of the lemma annotations is ~98.86.
Publisher: Jožef Stefan Institute
Rights: https://creativecommons.org/licenses/by-sa/4.0/
Rights: Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)
Subject: lemmatisation
Subject: computer-mediated communication
Subject: language model
Title: The CLASSLA-StanfordNLP model for lemmatisation of non-standard Slovenian 1.0
Type: toolService
Type: Software
dcterms.available: 2020-08-12T16:31:54Z
dcterms.bibliographicCitation: http://hdl.handle.net/11356/1338
dcterms.creator: Ljubešić, Nikola
dcterms.date: 2020-08-12T16:31:54Z
dcterms.description: The model for lemmatisation of non-standard Slovenian was built with the CLASSLA-StanfordNLP tool (https://github.com/clarinsi/classla-stanfordnlp) by training on the ssj500k training corpus (http://hdl.handle.net/11356/1210) and the Janes-Tag corpus (http://hdl.handle.net/11356/1238), using the Sloleks inflectional lexicon (http://hdl.handle.net/11356/1230). These corpora were additionally augmented for handling missing diacritics by repeating parts of the corpora with diacritics removed. The estimated F1 of the lemma annotations is ~98.86.
dcterms.identifier: http://hdl.handle.net/11356/1338
dcterms.isReplacedBy: http://hdl.handle.net/11356/1350
dcterms.publisher: Jožef Stefan Institute
dcterms.rights: https://creativecommons.org/licenses/by-sa/4.0/
dcterms.rights: Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)
dcterms.subject: lemmatisation
dcterms.subject: computer-mediated communication
dcterms.subject: language model
dcterms.title: The CLASSLA-StanfordNLP model for lemmatisation of non-standard Slovenian 1.0
dcterms.type: toolService
dcterms.type: Software
odrl.Policy: http://purl.org/net/rdflicense/cc-by-sa4.0
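The Description field above says the training corpora were augmented by repeating parts of them with diacritics removed, so that the lemmatiser also copes with text typed without č, š and ž. The following is a minimal Python sketch of that kind of augmentation, not the procedure actually used for this model. It assumes a simplified corpus format (each sentence is a list of (word form, lemma) pairs), and the `fraction` and `seed` parameters are hypothetical, since the record does not say how large a part of the corpora was repeated. It also assumes that only word forms are stripped while lemmas keep their diacritics, so the model learns to restore the standard lemma.

```python
import random
import unicodedata
from typing import List, Tuple

# Assumed, simplified representation: a sentence is a list of
# (word form, lemma) pairs. This is NOT the real ssj500k/Janes-Tag format.
Sentence = List[Tuple[str, str]]


def strip_diacritics(text: str) -> str:
    """Remove combining diacritical marks, e.g. 'č' -> 'c', 'š' -> 's', 'ž' -> 'z'."""
    # NFD splits 'č' into 'c' + combining caron; drop the combining marks (category Mn).
    decomposed = unicodedata.normalize("NFD", text)
    kept = "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")
    return unicodedata.normalize("NFC", kept)


def augment(corpus: List[Sentence], fraction: float = 0.5, seed: int = 0) -> List[Sentence]:
    """Return the corpus plus diacritic-free copies of a random subset of its sentences.

    `fraction` and `seed` are hypothetical parameters for illustration only.
    Word forms are stripped; lemmas are kept unchanged (an assumption).
    """
    rng = random.Random(seed)
    k = round(len(corpus) * fraction)
    sample = rng.sample(corpus, k)
    stripped = [[(strip_diacritics(form), lemma) for form, lemma in sent] for sent in sample]
    return corpus + stripped


if __name__ == "__main__":
    corpus = [[("Že", "že"), ("čakam", "čakati")], [("Hvala", "hvala")]]
    for sentence in augment(corpus, fraction=0.5):
        print(sentence)
```

Note that Unicode normalisation only removes marks that decompose into combining characters; a letter such as "đ", which has no canonical decomposition, would pass through unchanged in this sketch.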


Files in this item

There are no files associated with this item.

This item appears in the following Collection(s)

  • OLAC
    Main data from the OLAC dataset


Copyright © 2020 All Rights Reserved by Prêt-à-LLOD Project.

This project has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement No 825182.