Founded in 2017, TEXTA is the first language technology start-up in Estonia
TEXTA was founded in 2017 by Silver Traat and Raul Sirel. Their ambition was to build a language technology company using state-of-the-art AI capabilities to serve customers in every conceivable text analytics vertical.
Having contributed to text analytics research before founding TEXTA, and continuing that work every day since, the TEXTA team has built deep expertise in language technology.
Silver Traat is the co-founder and CEO of TEXTA. He has worked as Head of Business Development at STACC and as International Projects Manager at the Eurecat Competence Center in Spain.
Raul Sirel is the co-founder and CTO of TEXTA. He holds an MA in Computational Linguistics from the University of Tartu. Raul has worked as a visiting researcher at the University of Western Sydney and the NICTA Canberra Research Lab, and as a researcher and project leader at STACC.
Our team's core competence is in text analytics, natural language processing, machine learning, and artificial intelligence.
Our AI-based, language-independent products enable our customers to improve customer experience, increase efficiency, extract value from unstructured data, manage compliance risks, automate processes, and build online trust and safety.
Hybrid Tagger – An Industry-driven Solution for Extreme Multi-label Text Classification
This paper presents an industry-driven solution for extreme multi-label classification with a massive label collection. The proposed approach combines a large number of binary classification models with label pre-filtering, and employs methods and technologies shown to be applicable in industrial scenarios where high-end computational hardware is limited. The system is evaluated on an Estonian newspaper article dataset containing almost 2000 unique labels, and is shown to run over 80 times faster than applying the binary models for the entire label set, without negative impact on prediction scores.
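The idea of pairing binary models with label pre-filtering can be illustrated with a minimal sketch. The abstract does not say how the pre-filtering is done, so the sketch assumes candidate labels are taken from the most similar already-labelled documents. The names `KeywordModel`, `prefilter_labels` and `hybrid_tag`, the Jaccard similarity, and the toy data are all hypothetical stand-ins, not the paper's implementation.

```python
def tokenize(text):
    """Lowercase whitespace tokenizer (a simplification for illustration)."""
    return text.lower().split()


def jaccard(a, b):
    """Jaccard similarity between two token collections."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


class KeywordModel:
    """Hypothetical stand-in for one trained binary classifier per label."""

    def __init__(self, keywords):
        self.keywords = set(keywords)

    def predict(self, tokens):
        # Positive if the document contains any keyword for this label.
        return bool(self.keywords & set(tokens))


def prefilter_labels(tokens, corpus, top_docs=2):
    """Assumed pre-filter: collect labels of the most similar labelled documents."""
    ranked = sorted(corpus, key=lambda doc: jaccard(tokens, doc[0]), reverse=True)
    candidates = set()
    for _doc_tokens, labels in ranked[:top_docs]:
        candidates.update(labels)
    return candidates


def hybrid_tag(text, corpus, models, top_docs=2):
    """Run binary models only for pre-filtered labels, not the whole label set."""
    tokens = tokenize(text)
    candidates = prefilter_labels(tokens, corpus, top_docs)
    tags = sorted(
        label for label in candidates
        if label in models and models[label].predict(tokens)
    )
    return tags, len(candidates)


# Toy labelled corpus: (tokens, labels) pairs.
corpus = [
    (tokenize("parliament passed the new budget law"), {"politics", "economy"}),
    (tokenize("the football team won the cup final"), {"sports"}),
    (tokenize("central bank raised interest rates"), {"economy"}),
    (tokenize("new opera premiere at the theatre"), {"culture"}),
]

# One binary model per label.
models = {
    "politics": KeywordModel(["parliament", "election", "minister"]),
    "economy": KeywordModel(["budget", "bank", "rates", "tax"]),
    "sports": KeywordModel(["football", "cup", "team"]),
    "culture": KeywordModel(["opera", "theatre", "premiere"]),
}

tags, evaluated = hybrid_tag("parliament debates the budget and tax rates", corpus, models)
print(tags, f"({evaluated} of {len(models)} models applied)")
```

In this toy setup only two of the four binary models are applied. With a label set of almost 2000, the same pruning is where a large speed-up would come from, provided the pre-filter rarely drops a correct label.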
Kratt: Developing an Automatic Subject Indexing Tool for The National Library of Estonia
Manual subject indexing in libraries is a time-consuming and costly process, and the quality of the assigned subjects is affected by the cataloguer's knowledge of the specific topics covered in the book. To address these issues, we exploited the opportunities arising from artificial intelligence to develop Kratt: a prototype of an automatic subject indexing tool. Kratt is able to subject index a book, independent of its extent and genre, with a set of keywords present in the Estonian Subject Thesaurus. It takes Kratt approximately 1 minute to subject index a book, which is 10-15 times faster than humans. Although the resulting keywords were not considered satisfactory by the cataloguers, the ratings of a small sample of regular library users showed more promise. We also argue that the results can be improved by training the model on a bigger corpus and applying more careful preprocessing techniques.
EMBEDDIA Tools, Datasets and Challenges: Resources and Hackathon Contributions
This paper presents tools and data sources collected and released by the EMBEDDIA project, supported by the European Union’s Horizon 2020 research and innovation program. The collected resources were offered to participants of a hackathon organized as part of the EACL Hackashop on News Media Content Analysis and Automated Report Generation in February 2021. The hackathon had six participating teams who addressed different challenges, either from the list of proposed challenges or from their own news-industry-related tasks. This paper goes beyond the scope of the hackathon, as it brings together, in a coherent and compact form, most of the resources developed, collected and released by the EMBEDDIA project. Moreover, it constitutes a handy resource for the news media industry and for researchers in the fields of Natural Language Processing and Social Science.