Linguistic Research

Somali-Italian Bilingual Computational Lexicon Research Project.

Rut Foundation has launched a research project in collaboration with the Italian Geographical Society for the construction of a bilingual Somali-Italian computational lexicon dedicated to the era of Italy’s colonialist expansion in Africa.

The first phase of the project focuses on extracting and systematizing terms from the notebooks written by the explorer Ugo Ferrandi. The notebooks are a veritable mine of linguistic information: to describe objects, concepts and phenomena typical of Somali culture, Ferrandi records a wealth of indigenous terms, thus portraying an early phase of the Somali language as it was spoken by shepherds and nomadic farmers in the pre-colonial period, before the advent of the European powers.

When possible, the description of words is enriched with information taken from the Somali lexicon incorporated into the Somali Corpus created by Jama Musse Jama (2006). The corpus, which is annotated and balanced (it includes poetry as well as literary and scientific prose), incorporates a Somali lexicon with linguistic information on the words present in the corpus: frequency, collocations, etymology, synonyms and antonyms, spelling variants, definitions taken from a list of reference dictionaries, and translations into English, Italian, French and Swedish.

However, since the corpus lexicon is encoded in a proprietary format, it cannot be linked to our termino-ontological resource until it has been converted to the OntoLex-Lemon model. The conversion includes an intermediate step in which the proprietary format is transformed into CoNLL-U, the tabular annotation format named after the Conference on Computational Natural Language Learning (CoNLL). A software application was therefore developed to automatically convert linguistic annotations in CoNLL format into linked data (OntoLex-Lemon model). This program will be applied to the annotated Somali corpus to structure the terms of interest into a computational terminology.
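The second step of this pipeline, from CoNLL-U annotations to OntoLex-Lemon linked data, can be illustrated with a minimal sketch. It is not the project's converter: the base namespace, the function names, the partial part-of-speech mapping and the placeholder tokens (wordA, wordAs, wordB) are all assumptions made for illustration. The sketch groups tokens by lemma and part of speech and writes one ontolex:LexicalEntry per group in Turtle syntax.

```python
from urllib.parse import quote

# Hypothetical base namespace for the generated entries (an assumption).
BASE = "http://example.org/lexicon/"

PREFIXES = (
    "@prefix ontolex: <http://www.w3.org/ns/lemon/ontolex#> .\n"
    "@prefix lexinfo: <http://www.lexinfo.net/ontology/2.0/lexinfo#> .\n"
)

# Partial, illustrative mapping from Universal POS tags to LexInfo values.
UPOS_TO_LEXINFO = {"NOUN": "noun", "VERB": "verb", "ADJ": "adjective", "ADV": "adverb"}


def parse_conllu(text):
    """Yield (form, lemma, upos) for every ordinary token line."""
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank lines separate sentences, '#' lines are comments
        cols = line.split("\t")
        if len(cols) != 10:
            continue  # CoNLL-U token lines have exactly ten columns
        if "-" in cols[0] or "." in cols[0]:
            continue  # skip multiword-token ranges and empty nodes
        form, lemma, upos = cols[1], cols[2], cols[3]
        yield form, (form if lemma == "_" else lemma), upos


def literal(value, lang):
    """Format a language-tagged Turtle string literal."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"@{lang}'


def conllu_to_ontolex(text, lang="so"):
    """Return a Turtle document with one LexicalEntry per (lemma, POS) pair."""
    entries = {}
    for form, lemma, upos in parse_conllu(text):
        entries.setdefault((lemma, upos), set()).add(form)

    blocks = [PREFIXES]
    for (lemma, upos), forms in sorted(entries.items()):
        iri = f"<{BASE}{quote(lemma, safe='')}_{upos.lower()}>"
        lines = [f"{iri} a ontolex:LexicalEntry"]
        pos = UPOS_TO_LEXINFO.get(upos)
        if pos:
            lines.append(f"    lexinfo:partOfSpeech lexinfo:{pos}")
        lines.append(
            f"    ontolex:canonicalForm [ ontolex:writtenRep {literal(lemma, lang)} ]"
        )
        # Inflected forms that differ from the lemma become ontolex:otherForm.
        for form in sorted(forms - {lemma}):
            lines.append(
                f"    ontolex:otherForm [ ontolex:writtenRep {literal(form, lang)} ]"
            )
        blocks.append(" ;\n".join(lines) + " .\n")
    return "\n".join(blocks)


# Placeholder tokens only; these are not real Somali data.
SAMPLE = "\n".join([
    "# sent_id = 1",
    "\t".join(["1", "wordA", "wordA", "NOUN", "_", "_", "0", "root", "_", "_"]),
    "\t".join(["2", "wordAs", "wordA", "NOUN", "_", "_", "1", "dep", "_", "_"]),
    "\t".join(["3", "wordB", "wordB", "VERB", "_", "_", "1", "dep", "_", "_"]),
])

print(conllu_to_ontolex(SAMPLE))
```

A production converter would more likely rely on an RDF library and would also carry over morphological features, senses and translations; the sketch only shows how lemma, part of speech and word forms map onto OntoLex-Lemon entries.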

Furthermore, to enable an initial annotation phase of the project’s reference corpora, a temporary user interface has been developed. It also supports the definition of the user requirements that will guide the development of the final annotation tool. The technologies in use are consistent with those planned for the subsequent development phases. The tool allows scholars to annotate a first group of Jewish texts with lexical information, both in their original language and in their Italian translation.

Analysis of the Hebrew language:

Construction of a bilingual Hebrew-Italian terminology resource drawn from the Book of Rut.

Rut Foundation has begun building a digital terminological resource based on the computational analysis of the Book of Rut, a text contained in both the Hebrew and the Christian Bible. It was written in Hebrew by unknown authors, and various scholars date its composition between the 5th and 2nd centuries BC.

The semantic fields in the Book of Rut are analyzed first (Agriculture and nutrition, Society and politics, Family, Unit of measurement, Body and its parts, Geullà/Redemption, Yibbùm/Levirate, Time, Space, God). The words related to each field are then identified, and every verse in which they appear is methodically recorded.

The terms belonging to the fields Family, Unit of measurement, and Body and its parts were then selected for an initial analysis. Having defined the main semantic fields, we proceeded with the formal description of the semantics of some Hebrew terms according to the theory of the Explanatory and Combinatorial Dictionary (DEC) developed by Igor Mel’chuk within the Meaning-Text model.

According to this theory, the entries of a lexicon can be conceived as trilateral entities that include:
  • A sense;
  • A phonetic or graphic form;
  • Combinatorial features (e.g., syntactic distribution).

Each entry in our lexicon was therefore associated with:
  • Semantic information: a definition generally accompanied by a propositional form, in which the actants introduced by the lexical unit are made explicit;
  • Syntactic information (regime, or government pattern): for each semantic actant identified in the definition of the term, all the syntactic constructions in which it can appear are specified;
  • Combinatorial lexical information (lexical functions): through lexical functions the semantic relationships that exist between a topic lexeme X (the keyword) and other lexemes Y of the lexicon are specified.
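
The three kinds of information listed above can be pictured as a simple record. The sketch below is an assumption made for illustration, not the project's data model: the class and field names are hypothetical, and the example entry is an English illustration that is not taken from the project's lexicon. It also shows a small consistency check: every actant described in the government pattern should appear in the propositional form.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class LexiconEntry:
    """Hypothetical record for one lexical unit (names are assumptions)."""
    lemma: str
    definition: str                       # semantic information
    propositional_form: str               # makes the actants X, Y, Z explicit
    # Syntactic information: actant -> constructions that can express it.
    government_pattern: Dict[str, List[str]] = field(default_factory=dict)
    # Lexical functions: function name -> related lexemes.
    lexical_functions: Dict[str, List[str]] = field(default_factory=dict)

    def actants(self) -> List[str]:
        """Actant variables (X, Y, Z) in order of first appearance."""
        seen: List[str] = []
        for var in re.findall(r"\b[XYZ]\b", self.propositional_form):
            if var not in seen:
                seen.append(var)
        return seen

    def undeclared_actants(self) -> List[str]:
        """Government-pattern keys missing from the propositional form."""
        declared = self.actants()
        return [a for a in self.government_pattern if a not in declared]


# Illustrative English entry; values are placeholders for demonstration.
entry = LexiconEntry(
    lemma="husband",
    definition="man who is married to Y",
    propositional_form="X is the husband of Y",
    government_pattern={"Y": ["of N", "N's"]},
    lexical_functions={"Conv21": ["wife"]},
)

print(entry.actants())             # ['X', 'Y']
print(entry.undeclared_actants())  # []
```

Keeping the government pattern keyed by actant variable makes it straightforward to verify that the syntactic description and the definition stay aligned as entries are revised.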

In this phase of the work, the analysis focused on the study of some terms relating to the Family. The semantic sphere, although limited, allowed us to investigate the concept of marriage in ancient Jewish civilization and its similarities with the legal institution of matrimonium in the Roman world of the time. The terminological repertoire was also organized into a conceptual map useful for the ontological formalization of the domain of interest.

“The Divine Disease” metatextual research project.

Rut Foundation has begun a collaboration with the “Associazione Teatro Patologico” to create a theatrical piece that aims to draw the interest and attention of the public and institutions to issues related to the fight against marginality and to encourage social regeneration.

The show, inspired by Dante Alighieri’s Divine Comedy, is directed by the founder of the Association, Dario D’Ambrosi, and performed by differently abled girls and boys.

The show was performed on 23 and 24 September 2023 in Naples.

Drafting of the "Data Management Plan".

The project’s data management is supported by the CLARIN infrastructure and its national repository, ILC4CLARIN. ILC4CLARIN will host the corpora and lexicons of the project and will support the team in their description and publication, in line with the FAIR and open science principles.

The Data Management Plan (DMP) will be a constantly updated document, but a first draft is expected this year. It will describe the datasets in terms of provenance, legal issues, formats and standards, preservation during and after the project, accessibility, and reusability. The DMP drafting plan was developed during two meetings with the project members.

The following elements have been identified:
  • Data description model, opting for the one proposed by Science Europe, towards which many projects at European level are converging;
  • The tool used for the drafting, Argos, also connected to the Scientific Knowledge Graph of the OpenAIRE platform.

As part of the project on the analysis of the Somali language, the researchers collaborating with Rut Foundation were invited to present their work and scientific contribution at the prestigious international conference on ontologies “TOTh – Terminology & Ontology: Theories and applications” in Chambéry, France.

Financing of research doctorates with "Suor Orsola Benincasa University".

Rut Foundation joins the National School of Doctorates in Religious Sciences by funding two doctoral scholarships at the Suor Orsola Benincasa University of Naples in the Transdisciplinarity curriculum.
