C4Corpus (CC BY-SA part)

Creator: Gurevych, Iryna
Creator: Zayed, Omnia
Creator: Habernal, Ivan
Date: 2017-06-07T13:09:38Z
dc.date.accessioned: 2021-07-25T12:07:37Z
dc.date.available: 2021-07-25T12:07:37Z
Identifier: http://hdl.handle.net/11372/LRT-2208
dc.identifier.uri: https://linghub.org/handle/123456789/1042864
Description: A large web corpus (over 10 billion tokens) in 50+ languages, licensed under the Creative Commons license family and extracted from CommonCrawl, the largest publicly available general Web crawl to date, with about 2 billion crawled URLs.
Publisher: Technische Universität Darmstadt
Rights: http://creativecommons.org/licenses/by-sa/4.0/
Rights: Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)
Subject: Web corpus
Subject: Creative Commons
Subject: CommonCrawl
Subject: Amazon Web Services
Title: C4Corpus (CC BY-SA part)
Type: corpus
Type: Text
dcterms.available: 2017-06-07T13:09:38Z
dcterms.bibliographicCitation: http://hdl.handle.net/11372/LRT-2208
dcterms.creator: Gurevych, Iryna
dcterms.creator: Zayed, Omnia
dcterms.creator: Habernal, Ivan
dcterms.date: 2017-06-07T13:09:38Z
dcterms.description: A large web corpus (over 10 billion tokens) in 50+ languages, licensed under the Creative Commons license family and extracted from CommonCrawl, the largest publicly available general Web crawl to date, with about 2 billion crawled URLs.
dcterms.identifier: http://hdl.handle.net/11372/LRT-2208
dcterms.publisher: Technische Universität Darmstadt
dcterms.rights: http://creativecommons.org/licenses/by-sa/4.0/
dcterms.rights: Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)
dcterms.subject: Web corpus
dcterms.subject: Creative Commons
dcterms.subject: CommonCrawl
dcterms.subject: Amazon Web Services
dcterms.title: C4Corpus (CC BY-SA part)
dcterms.type: corpus
dcterms.type: Text
odrl.Policy: http://purl.org/net/rdflicense/cc-by-sa4.0


Files in this item

There are no files associated with this item.

This item appears in the following Collection(s)

  • OLAC
    Main data from the OLAC dataset

Copyright © 2020 All Rights Reserved by Prêt-à-LLOD Project.

Horizon 2020

This project has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement No 825182.