Hungarian Language Processing Tools in NooJ

Instance of: Resource Info
Description The Hungarian NooJ module contains a morphological dictionary based on the more than 60,000 lemmata of the Concise Dictionary of the Hungarian Language, with morphological information based on the work of Laszlo Elekfi. Complex inflected forms of nouns and verbs are generated by NooJ's dictionary compilation function. The function uses the base forms and morphological information in the .DIC files and the inflectional rules described in the .FLX files. The result of the compilation is stored in the .NOD files. With the .NOD files, complex inflected forms can be recognised in running text. This includes derived and further inflected words as well as, naturally, non-inflected forms. Words that cannot be inflected are kept in separate dictionaries. As a result, complex suffixed words and compounds can also be recognised when a text is analysed. Using the compiled dictionaries and the language-specific syntactic graphs, the tool performs sentence and clause segmentation, POS tagging, NP recognition, predicate identification and the identification of other sentence constituents (e.g. adverbials). The input may be any raw Hungarian text or any XML text compatible with NooJ, and the output can also be exported in XML format. NooJ is widely used in Hungarian linguistics and language technology: its applications cover a broad range of morphological, syntactic, lexical, semantic and psychological content analyses. The Hungarian NooJ tools consist of a range of specific files: .dic files for the basic dictionaries, .nod files for the compiled dictionaries, and .flx files for the morphological rules. Each of them was created for specific analyses.
Below is a short description of each file:
noun.dic: Hungarian nouns with morphological information (55,000 units)
verb_00.dic: Hungarian verbs with morphological information (10,000 units)
topabbr.dic: the most frequent Hungarian abbreviations (11 tokens)
noaffix-nins.dic: Hungarian words that cannot be inflected (1,870 units)
topprop.dic: the most frequent proper names (28 units)
noun.nod: compiled NooJ dictionary of Hungarian nouns (96,777,513 words)
verb_00.nod: compiled NooJ dictionary of Hungarian verbs (19,059,644 words)
topabbr.nod: compiled NooJ dictionary of the most frequent Hungarian abbreviations (11 words)
noaffix.nod: compiled NooJ dictionary of Hungarian words that cannot be inflected (1,870 words)
topprop.nod: compiled NooJ dictionary of the most frequent Hungarian proper names (28 words)
noun.flx: inflectional rules for Hungarian nouns by morphological category (33,000 rules)
verb.flx: inflectional rules for Hungarian verbs by morphological category (27,900 rules)
Language hu
Rights GPL
See Also http://metashare.elda.org/repository/browse/1fcf74c863d711e2aa7c68b599c26a06aa081596f9f4458e8ffee18bf8a44780/
Source META-SHARE
Title Hungarian Language Processing Tools in NooJ
Type Dataset
Type Lexical Conceptual Resource
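The following toy Python sketch roughly illustrates the compile-then-look-up workflow described above. It expands lemmas into a table of inflected forms using paradigm-based suffix rules, then looks up tokens in that table. It is not NooJ's .dic/.flx/.nod format or its compiler. The lexicon, paradigm names, feature tags and function names are all hypothetical. The sketch hard-codes two Hungarian nouns and models vowel harmony only by assigning each lemma a back-vowel or front-vowel paradigm.

```python
# Toy illustration only: NOT NooJ's dictionary syntax or compilation function.
# Hypothetical lemma entries, each tagged with a hypothetical paradigm name.
LEXICON = {
    "ház": "N_BACK",    # 'house', takes back-vowel suffixes
    "kert": "N_FRONT",  # 'garden', takes front-vowel suffixes
}

# Hypothetical paradigms: feature tag -> suffix.
PARADIGMS = {
    "N_BACK": {"PL": "ak", "INE": "ban"},   # plural, inessive ('in')
    "N_FRONT": {"PL": "ek", "INE": "ben"},
}


def compile_forms(lexicon, paradigms):
    """Expand every lemma into a table mapping form -> list of analyses.

    Suffixes are concatenated so that plural + inessive yields a
    further-inflected form (e.g. 'házakban', 'in the houses').
    """
    table = {}
    for lemma, paradigm in lexicon.items():
        rules = paradigms[paradigm]
        forms = {
            lemma: "N",
            lemma + rules["PL"]: "N+PL",
            lemma + rules["INE"]: "N+INE",
            lemma + rules["PL"] + rules["INE"]: "N+PL+INE",
        }
        for form, tags in forms.items():
            table.setdefault(form, []).append(f"{lemma},{tags}")
    return table


def analyse(text, table):
    """Look up each whitespace-separated token in the compiled table."""
    return [(token, table.get(token, ["UNKNOWN"])) for token in text.split()]


if __name__ == "__main__":
    compiled = compile_forms(LEXICON, PARADIGMS)
    for token, analyses in analyse("kertekben házak", compiled):
        print(token, analyses)
```

Here every form is precomputed once, so recognition becomes a plain table lookup. This trades memory for speed, which is consistent with the large form counts listed for noun.nod and verb_00.nod.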

Contact Point

Affiliation
Communication Info
Address Benczur utca 33.
City Budapest
Distribution
Access URL http://www.nytud.hu/depts/corpus/index.html
Type Distribution
URL
Email pinter.tibor@nytud.mta.hu
Fax Number +36 1 322 9297
Telephone Number +36 1 321 4830 ext. 119
Type Communication Info
Zip Code 1068
Department Name Department of Language Technology
Organization Name Research Institute for Linguistics, Hungarian Academy of Sciences
Organization Short Name RIL HAS
Type Organization Info Type
Communication Info
Address Benczur utca 33.
City Budapest
Distribution
Access URL http://www.nytud.hu/oszt/korpusz/Pajzs_Julia.html
Type Distribution
URL
Email pajzs.julia@nytud.mta.hu
Fax Number +36 1 322 9297
Telephone Number +36 1 321 4830 ext. 202
Type Communication Info
Zip Code 1068
Given Name Júlia
Position senior research fellow
Surname Pajzs
Type Contact Person
Person
Person Info Type

Distribution Info

Availability Available-unrestricted Use
License
Delivery Channel Downloadable
Permission
Action http://creativecommons.org/ns/ShareALike
Duty Metashare/1fcf74c863d711e2aa7c68b599c26a06aa081596f9f4458e8ffee18bf8a44780#permission
Type Duty
Permission
Restrictions Of Use
Same As http://www.gnu.org/copyleft/gpl.html
Type Licence Info
Type Distribution
Distribution Info

Identification Info

Distribution
Access URL http://corpus.nytud.hu/nooj
Type Distribution
URL
Identifier 118
Meta Share Id NOT_DEFINED_FOR_V2
Resource Short Name NooJ
Title Hungarian Language Processing Tools in NooJ
Type Identification Info

Lexical Conceptual Resource Info

Lexical Conceptual Resource Media Type
Lexical Conceptual Resource Text Info
Character Encoding Info
Character Encoding UTF-8
Size Per Character Encoding
Size see: description
Size Unit Tokens
Type Size Info Type
Type Character Encoding Info
Language Info
Language hu
Language Name Hungarian
Language Script Latin
Type Language Info
Linguality Info
Linguality Type Monolingual
Type Linguality Info
Media Type Text
Modality Info
Modality Type Written Language
Type Modality Info
Size Info
Size see: description
Size Unit Tokens
Type Size Info Type
Text Format Info
Mime Type text
Size Per Text Format
Size see: description
Size Unit Tokens
Type Size Info Type
Type Text Format Info
Type Lexical Conceptual Resource Text Info
Type Lexical Conceptual Resource Media Type
Lexical Conceptual Resource Type Computational Lexicon
Resource Type Lexical Conceptual Resource
Type Lexical Conceptual Resource Info

Resource Creation Info

Funding Project
Distribution
Access URL http://www.cesar-project.net
Type Distribution
URL
Funder European Commission (50%)
Research Institute for Linguistics, Hungarian Academy of Sciences (50%)
Funding Country Hungary
Funding Type Own Funds
EU Funds
Project End Date 2013-01-31 Date
Project Name Central and South-East European Resources
Project Short Name CESAR
Project Start Date 2011-02-01 Date
Type Project Info Type
Type Resource Creation Info

Metashare/1fcf74c863d711e2aa7c68b599c26a06aa081596f9f4458e8ffee18bf8a44780#Header

Instance of: Catalog Record
Issued 2014-09-23T00:16:08Z Date
Primary Topic Hungarian Language Processing Tools in NooJ
Set Spec lexicalConceptualResource:computationalLexicon
lexicalConceptualResource

Metashare/1fcf74c863d711e2aa7c68b599c26a06aa081596f9f4458e8ffee18bf8a44780#metadata Info

Instance of: Catalog Record
Created 2012-06-25 Date
Language en
Language English
Modified 2012-06-25 Date
Primary Topic Hungarian Language Processing Tools in NooJ
Source CESAR
Type Metadata Info

Creator

Type Actor