CLASiK, a European project designed to overcome language barriers in science

The technology to be developed will first be tested in the field of climatology, so that experts from different countries can share and understand critical data on droughts, floods, or heat waves, regardless of the original language of the information.

A group of researchers from the I3A (Aragon Engineering Research Institute) is leading the European CLASiK (Multilingual Access to Scientific Knowledge) project, an international initiative designed to overcome the language barriers that fragment global research. Although English is the predominant language in science, a great deal of valuable information is published only in local languages, creating “islands of knowledge” that are inaccessible to anyone who does not speak them.

With this in mind, the project aims to enable anyone, from research groups to society in general, to search, read, and interact with complex scientific data and documents in their own language. To achieve this, the team will build bridges between languages that allow information to be translated accurately, using neurosymbolic artificial intelligence (an approach that combines language models with knowledge graphs).

This technology will initially be tested in the field of climatology, allowing experts from different countries to share and understand critical data on extreme events such as droughts, floods, or heat waves, regardless of the original language of that information.

CLASiK is a three-year European project of the CHIST-ERA programme, coordinated by Jorge Gracia, researcher at I3A, in a consortium comprising the University of Zaragoza, the University of Grenoble-Alpes (France), the University of Tartu (Estonia), and the Pyrenean Institute of Ecology-CSIC.

Its starting point is to open the door to knowledge regardless of the language used. “Although English has been adopted as the lingua franca in almost all scientific disciplines, there is a significant amount of scientific output in other languages, for example, in French in biomedicine or in Spanish in ecology. Their use allows for the inclusion of diverse cultural and epistemological perspectives, which enriches the process of knowledge generation,” explains Jorge Gracia.

Local languages can offer unique concepts and ways of understanding natural phenomena that are not found in dominant languages such as English. However, the predominance of English limits the visibility of research conducted in other languages, which, in turn, affects international collaboration and the dissemination of results. “The lack of resources to access scientific results in multiple languages hinders access to global knowledge and discourages authors from publishing in their own language,” notes the I3A researcher and CLASiK coordinator.

Use case: weather conditions

The aim of this project is to facilitate seamless and interoperable access to monolingual and multilingual scientific and technological data hubs and repositories. Interested users will be able to interact with them in their own language and access knowledge expressed in other languages. These techniques will be demonstrated through their implementation in the use case of climatology, in particular the study of extreme weather events.

The researchers will create and apply neurosymbolic AI approaches, in combination with various linguistic services, to three ends. The first is the extraction and semantic annotation of scientific data and documents in several languages, with the aim of constructing an interconnected multilingual scientific knowledge graph for the field of climatology. The second is cross-lingual access to this knowledge graph and its annotated data, regardless of the natural language used. The third is the translation of the retrieved data and documents into the user's language.

Ultimately, CLASiK seeks to ensure that the scientific knowledge needed to address global challenges does not remain inaccessible behind language barriers.

Explanatory video of the project (created with AI): https://youtu.be/E_Vp-4-HJIs

Caption: Meeting in Zaragoza of the CLASiK project research team.