Research in Language Technology (LT)
Within the framework of the project, LT research will be carried out in all five priority areas of applied knowledge and skills for the recovery and transformation of the national economy:
- national language resources and platforms for their use and analysis;
- language models and toolkits for automatic text analysis and synthesis;
- language technologies for processing audiovisual materials;
- language technologies, tools and infrastructure for learning LT skills and promoting inclusive education;
- a translation process automation platform for training in the development and use of translation technologies.
National language resources and platforms for their use and analysis
The aim of the research is to provide the language resources (datasets) needed to learn how to develop and use LT, taking into account the specifics of Latvia, together with a software platform for the quantitative and qualitative analysis of the language in these resources.
The research will be conducted in three main directions, developing and expanding:
- Representative corpora of Latvian text and speech.
- Wide-coverage machine-readable and computational lexical resources for Latvian.
- Open access platforms (sets of software tools and workflows) for the use and analysis of language data.
Language models and toolkits for automatic text analysis and synthesis
The aim of the research is to develop widely applicable computational models, grammars and lexicons of Latvian (in a multilingual context), as well as universally combinable software foundations.
Three main directions are planned in the study:
- Development of language models and LT software components for text analysis and synthesis using deep learning methods.
- Improvement and expansion of technological support.
- Development of a scalable software platform.
Language technologies for processing audiovisual materials
The aim of the research is to develop innovative language technologies for processing monolingual and multilingual audiovisual data that would promote the acquisition of LT skills and, in turn, the wider use of LT in developing various products and services.
Two main directions are planned in the study:
- Development of Latvian speech recognition and speech synthesis models.
- Tools and methods for translating and localising foreign-language audiovisual materials, technologies for localising educational materials, and empirical evaluation of different localisation strategies.
Language technologies, tools and infrastructure for learning LT skills and promoting inclusive education
The aim of the research is to create and accumulate language resources and tools that support the successful learning of language technologies through modern content and skills-learning tools, promote inclusive education, and provide the infrastructure of language resources and tools needed for education and research.
Three main directions are planned in the study:
- Principles and technologies for creating virtual assistants.
- Language technologies for students with special needs.
- Integration of the language resources and tools created in research and in the study process into the European research infrastructure, in order to promote the learning of language technologies and ensure the sustainability of project results.
Translation process automation platform for training in the development and use of translation technologies
The aim of the research is to create technological tools that allow students to build machine translation systems and that serve as a training environment for using translation technologies.
Research is planned in three main directions:
- Tools and technologies for building neural machine translation systems.
- Computer-aided translation tools, and machine translation and terminology support in computer-aided translation.
- A unified translation memory system that enables translation data to be stored, shared, and reused.