The CLASSLA-StanfordNLP model for morphosyntactic annotation of standard Slovenian 1.1

Creator: Ljubešić, Nikola
Date: 2020-04-29T10:03:09Z
dc.date.accessioned: 2021-07-24T21:27:47Z
dc.date.available: 2021-07-24T21:27:47Z
Identifier: http://hdl.handle.net/11356/1312
dc.identifier.uri: https://linghub.org/handle/123456789/924965
Description: This model for morphosyntactic annotation of standard Slovenian was built with the CLASSLA-StanfordNLP tool (https://github.com/clarinsi/classla-stanfordnlp) by training on the ssj500k training corpus (http://hdl.handle.net/11356/1210) and using the CLARIN.SI-embed.sl word embeddings (http://hdl.handle.net/11356/1204). The model simultaneously produces UPOS, FEATS and XPOS (MULTEXT-East) labels. The estimated F1 of the XPOS annotations is ~97.06. Unlike the previous version of the model, this version predicts the whole XPOS tag rather than its individual characters. The character-by-character prediction, as was the case in stanfordnlp, resulted in illegal XPOS tags (and slightly decreased performance).
Publisher: Jožef Stefan Institute
Rights: https://creativecommons.org/licenses/by-sa/4.0/
Rights: Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)
Subject: language model
Subject: part-of-speech tagging
Title: The CLASSLA-StanfordNLP model for morphosyntactic annotation of standard Slovenian 1.1
Type: toolService
Type: Software
dcterms.available: 2020-04-29T10:03:09Z
dcterms.bibliographicCitation: http://hdl.handle.net/11356/1312
dcterms.creator: Ljubešić, Nikola
dcterms.date: 2020-04-29T10:03:09Z
dcterms.description: This model for morphosyntactic annotation of standard Slovenian was built with the CLASSLA-StanfordNLP tool (https://github.com/clarinsi/classla-stanfordnlp) by training on the ssj500k training corpus (http://hdl.handle.net/11356/1210) and using the CLARIN.SI-embed.sl word embeddings (http://hdl.handle.net/11356/1204). The model simultaneously produces UPOS, FEATS and XPOS (MULTEXT-East) labels. The estimated F1 of the XPOS annotations is ~97.06. Unlike the previous version of the model, this version predicts the whole XPOS tag rather than its individual characters. The character-by-character prediction, as was the case in stanfordnlp, resulted in illegal XPOS tags (and slightly decreased performance).
dcterms.identifier: http://hdl.handle.net/11356/1312
dcterms.isReplacedBy: http://hdl.handle.net/11356/1391
dcterms.publisher: Jožef Stefan Institute
dcterms.replaces: http://hdl.handle.net/11356/1251
dcterms.rights: https://creativecommons.org/licenses/by-sa/4.0/
dcterms.rights: Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)
dcterms.subject: language model
dcterms.subject: part-of-speech tagging
dcterms.title: The CLASSLA-StanfordNLP model for morphosyntactic annotation of standard Slovenian 1.1
dcterms.type: toolService
dcterms.type: Software
odrl.Policy: http://purl.org/net/rdflicense/cc-by-sa4.0
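
The description contrasts predicting a whole XPOS tag with predicting its characters one at a time. The following is a minimal sketch of why the latter can yield illegal tags. It is not the CLASSLA-StanfordNLP implementation: the tiny tag inventory, the scores, and the function names are all hypothetical and chosen only for illustration.

```python
# Toy inventory of legal XPOS tags (assumption: a small illustrative subset;
# the real MULTEXT-East tagset for Slovenian is far larger).
LEGAL_TAGS = {"Ncmsn", "Ncfsn", "Vmpr3s", "Agpmsnn"}


def predict_whole_tag(tag_scores):
    """Choose the best-scoring tag from the closed set of legal tags."""
    return max(LEGAL_TAGS, key=lambda tag: tag_scores.get(tag, float("-inf")))


def predict_per_character(position_scores):
    """Choose the best character at each position independently."""
    return "".join(max(chars, key=chars.get) for chars in position_scores)


# Hypothetical scores for one token (made up for this sketch).
tag_scores = {"Ncmsn": 0.6, "Vmpr3s": 0.4}
position_scores = [
    {"N": 0.6, "V": 0.4},
    {"c": 0.6, "m": 0.4},
    {"m": 0.45, "p": 0.55},  # the third position prefers a character from the verb tag
    {"s": 0.7, "r": 0.3},
    {"n": 0.8, "3": 0.2},
]

whole = predict_whole_tag(tag_scores)
per_char = predict_per_character(position_scores)

print(whole, whole in LEGAL_TAGS)
print(per_char, per_char in LEGAL_TAGS)
```

In this sketch the whole-tag predictor can only return a member of the inventory, while the per-character predictor mixes characters from different tags and returns a string that is not in the inventory.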


Files in this item

There are no files associated with this item.

This item appears in the following Collection(s)

  • OLAC
    Main data from the OLAC dataset


Copyright © 2020 All Rights Reserved by Prêt-à-LLOD Project.

This project has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement No 825182.