Tweet comma corpus Janes-Vejica 1.0

Creator: Zupan, Katja
Creator: Kavčič, Teja
Creator: Logar, Polona
Creator: Popič, Damjan
Creator: Fišer, Darja
Creator: Erjavec, Tomaž
Date: 2017-02-16T12:28:26Z
dc.date.accessioned: 2021-07-24T21:26:29Z
dc.date.available: 2021-07-24T21:26:29Z
Identifier: http://hdl.handle.net/11356/1088
dc.identifier.uri: https://linghub.org/handle/123456789/924812
Description: Janes-Vejica is a corpus of Slovene tweets where commas are annotated with the reason for their (in)correct use, according to the supplied typology. The corpus was sampled from the Janes-Norm corpus (http://hdl.handle.net/11356/1084), which was manually annotated for tokenisation, sentence segmentation, and word normalisation, and automatically for morphosyntactic descriptions and lemmas. The corpus is further described in: POPIČ, Damjan, FIŠER, Darja, ZUPAN, Katja, LOGAR, Polona. Raba vejice v uporabniških spletnih vsebinah. Proceedings of the Conference on Language Technologies & Digital Humanities, Ljubljana, Slovenia. 2016, pp. 149-153. http://www.sdjt.si/wp/dogodki/konference/jtdh-2016/zbornik/
Publisher: Jožef Stefan Institute
Rights: https://creativecommons.org/licenses/by-sa/4.0/
Rights: Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)
Subject: computer-mediated communication
Subject: manual annotation
Subject: Twitter
Subject: TEI
Subject: comma placement
Title: Tweet comma corpus Janes-Vejica 1.0
Type: corpus
Type: Text
dcterms.available: 2017-02-16T12:28:26Z
dcterms.bibliographicCitation: http://hdl.handle.net/11356/1088
dcterms.creator: Zupan, Katja
dcterms.creator: Kavčič, Teja
dcterms.creator: Logar, Polona
dcterms.creator: Popič, Damjan
dcterms.creator: Fišer, Darja
dcterms.creator: Erjavec, Tomaž
dcterms.date: 2017-02-16T12:28:26Z
dcterms.description: Janes-Vejica is a corpus of Slovene tweets where commas are annotated with the reason for their (in)correct use, according to the supplied typology. The corpus was sampled from the Janes-Norm corpus (http://hdl.handle.net/11356/1084), which was manually annotated for tokenisation, sentence segmentation, and word normalisation, and automatically for morphosyntactic descriptions and lemmas. The corpus is further described in: POPIČ, Damjan, FIŠER, Darja, ZUPAN, Katja, LOGAR, Polona. Raba vejice v uporabniških spletnih vsebinah. Proceedings of the Conference on Language Technologies & Digital Humanities, Ljubljana, Slovenia. 2016, pp. 149-153. http://www.sdjt.si/wp/dogodki/konference/jtdh-2016/zbornik/
dcterms.identifier: http://hdl.handle.net/11356/1088
dcterms.publisher: Jožef Stefan Institute
dcterms.rights: https://creativecommons.org/licenses/by-sa/4.0/
dcterms.rights: Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)
dcterms.subject: computer-mediated communication
dcterms.subject: manual annotation
dcterms.subject: Twitter
dcterms.subject: TEI
dcterms.subject: comma placement
dcterms.title: Tweet comma corpus Janes-Vejica 1.0
dcterms.type: corpus
dcterms.type: Text
odrl.Policy: http://purl.org/net/rdflicense/cc-by-sa4.0


Files in this item

There are no files associated with this item.

This item appears in the following Collection(s)

  • OLAC
    Main data from the OLAC dataset

