Collocation and Term Extractor

See Also http://metashare.elda.org/repository/browse/a89c02f4663d11e28a985ef2e4e6c59e76428bf02e394229a70428f25a839f75/
Title Collocation and Term Extractor
Type Tool Service

Contact Point

Affiliation Metashare/a89c02f4663d11e28a985ef2e4e6c59e76428bf02e394229a70428f25a839f75#affiliation
Communication Info Metashare/a89c02f4663d11e28a985ef2e4e6c59e76428bf02e394229a70428f25a839f75#communication Info
Given Name Nikola
Position Assistant Professor
Surname Ljubešić
Type Person

Distribution Info

Availability Available-unrestricted Use
Ipr Holder
Organization Info University of Zagreb, Faculty of Humanities and Social Sciences FFZG Department/Institute of Linguistics, Department of Information Sciences zzl@ffzg.hr http://hnk.ffzg.hr/ Ivana Lučića 3 10000 Zagreb Croatia +385 1 6002 323 +385 1 6156 879
Person Info Metashare/a89c02f4663d11e28a985ef2e4e6c59e76428bf02e394229a70428f25a839f75#person Info
License
Delivery Channel Downloadable
Distribution Rights Holder
Organization Info University of Zagreb, Faculty of Humanities and Social Sciences zzl@ffzg.hr http://hnk.ffzg.hr/ Ivana Lučića 3 10000 Zagreb Croatia +385 1 6120 066 +385 1 6156 879
Permission
Duty
Action http://creativecommons.org/nsNotify
Type Distribution

Identification Info

Description CollTerm is a language-independent tool for collocation and term extraction. It collects collocation and term candidates in two ways: for multiword units (MWUs, i.e. collocations) it uses five different co-occurrence measures, and for single-word units it applies the TF-IDF measure to capture distributional differences from a large representative corpus. The language-dependent part consists of a stop-word list and a list of MWU MSD patterns, which can also be coded as regular expressions. The application is described in the paper presented at TKE 2012: Pinnis, M., Ljubešić, N., Ştefănescu, D., Skadiņa, I., Tadić, M., Gornostay, T. Term Extraction, Tagging, and Mapping Tools for Under-Resourced Languages. The first version of the application is an integral part of the ACCURAT Toolkit, which is available under the Apache 2.0 license (http://www.accurat-project.eu/index.php?p=accurat-toolkit). This version of the tool adds a calibration of MWU MSD patterns for Croatian, which enhances the usability of the tool. Calibration for other CESAR languages is planned.
Distribution
Access URL http://www.accurat-project.eu
http://www.cesar-project.net
http://www.nljubesic.net/
http://hnk.ffzg.hr
http://www.nljubesic.net/resources/tools/collterm/
Identifier 312
Meta Share Id NOT_DEFINED_FOR_V2
Resource Short Name CollTerm
Title Collocation and Term Extractor
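
The single-word branch mentioned in the Description (TF-IDF scored against a large representative corpus) can be illustrated with a minimal sketch. This is not CollTerm's own code: the function name `tfidf_candidates`, the smoothed IDF formula, and the toy data are assumptions made purely for illustration, and the record does not specify how the tool computes its scores.

```python
import math
from collections import Counter


def tfidf_candidates(target_tokens, reference_docs, stop_words=frozenset()):
    """Rank single-word term candidates by TF-IDF (hypothetical helper).

    target_tokens:  list of tokens from the text being analysed
    reference_docs: list of token lists standing in for a reference corpus
    stop_words:     language-dependent words to exclude from the candidates
    """
    n_docs = len(reference_docs)

    # Document frequency: number of reference documents containing each word.
    doc_freq = Counter()
    for doc in reference_docs:
        doc_freq.update(set(doc))

    # Term frequency in the target text, ignoring stop words.
    tf = Counter(t for t in target_tokens if t not in stop_words)
    total = sum(tf.values())

    scores = {}
    for word, count in tf.items():
        # Smoothed IDF (an assumption) keeps unseen words finite.
        idf = math.log((1 + n_docs) / (1 + doc_freq[word])) + 1.0
        scores[word] = (count / total) * idf

    # Highest score first; ties broken alphabetically for stable output.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    target = "the corpus tagger tags the corpus".split()
    reference = [["the", "cat"], ["the", "dog"], ["the", "corpus"]]
    for word, score in tfidf_candidates(target, reference, stop_words={"the"}):
        print(f"{word}\t{score:.3f}")
```

Words that are frequent in the target text but rare in the reference documents rise to the top of the list, which is the distributional difference the Description refers to.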

Resource Creation

Creation Start Date 2011-04-01 Date
Creator
Organization Info Univ. of Zagreb, Faculty of Humanities and Social Sciences, Depts. of Linguistics & Information Sci. zzl@ffzg.hr http://hnk.ffzg.hr Ivana Lučića 3 10000 Zagreb Croatia +385 1 6120 066 +385 1 6156 879
Funding Project
Distribution Metashare/a89c02f4663d11e28a985ef2e4e6c59e76428bf02e394229a70428f25a839f75#Dist URL
Funder University of Zagreb, Faculty of Humanities and Social Sciences (25%)
European Commission (50%)
University of Zagreb, Faculty of Humanities and Social Sciences (50%)
European Commission (75%)
Funding Type National Funds
Eu Funds
Project End Date 2013-01-31 Date
2012-06-30 Date
Project Name Analysis and Evaluation of Comparable Corpora for Under Resourced Areas of Machine Translation
Central and South-East European Resources
Project Short Name ACCURAT
CESAR
Project Start Date 2010-01-01 Date
2011-02-01 Date

Tool Service Info

Input Info
Media Type Text
Modality Type Written Language
Resource Type languageDescription
Language Dependent false Boolean
Output Info
Media Type Text
Modality Type Written Language
Resource Type lexicalConceptualResource
Resource Type toolService
Tool Service Creation Info
Implementation Language Python
Tool Service Evaluation Info
Evaluated true Boolean
Evaluation Criteria Intrinsic
Evaluation Level Diagnostic
Evaluation Measure Human
Evaluation Type Black Box
Evaluator
Person Info
Affiliation
Communication Info
Address Ivana Lučića 3
City Zagreb
Country Croatia
Distribution Metashare/a89c02f4663d11e28a985ef2e4e6c59e76428bf02e394229a70428f25a839f75#Dist URL
Email marko.tadic@ffzg.hr
nljubesi@ffzg.hr
Fax Number +385 1 6156 879
Telephone Number +385 1 6002 323
+385 1 6120 066
Zip Code 10000
Communication Info Metashare/a89c02f4663d11e28a985ef2e4e6c59e76428bf02e394229a70428f25a839f75#communication Info
Given Name Nikola
Marko
Surname Ljubešić
Tadić
Type Person
Tool Service Operation Info
Operating System Linux
Running Environment Info
Required Software Python (version 2.6 or higher)
Tool Service Type Tool

Version Info

Has Version 1.0
Modified 2012-07-30 Date

Metashare/a89c02f4663d11e28a985ef2e4e6c59e76428bf02e394229a70428f25a839f75#Header

Instance of: Catalog Record
Issued 2014-09-23T00:19:21Z Date
Primary Topic Collocation and Term Extractor
Set Spec toolService:tool
toolService

Metashare/a89c02f4663d11e28a985ef2e4e6c59e76428bf02e394229a70428f25a839f75#metadata Info

Instance of: Catalog Record
Created 2012-07-30 Date
Creator Metashare/a89c02f4663d11e28a985ef2e4e6c59e76428bf02e394229a70428f25a839f75#metadata Creator
Language en
Modified 2013-02-04 Date
Primary Topic Collocation and Term Extractor