TEXTA TOOLKIT

TEXTA Toolkit is a set of tools for the most common text analytics tasks. Its main components are:

SEARCHER

The Searcher application is responsible both for creating the searches used by the Toolkit’s other applications and for browsing and summarizing the data. In addition to exploring data, users can cluster documents, build aggregations, visualize the connections between different facts found in the index and look for similar documents.

LEXICON MINER

To expand searches, users can train language models and automatically extract terminology from a dataset with the Lexicon Miner application. Given a base word, the application suggests similar candidate terms, which can then be added to a lexicon. Lexicon Miner is language-independent.
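
The suggestion step can be pictured as a nearest-neighbour lookup over word vectors. The following is only a minimal sketch under that assumption, not the Lexicon Miner implementation: the vectors are made up for illustration, and the names VECTORS, cosine and suggest_terms are hypothetical.

```python
import math

# Toy word vectors, invented for illustration. In practice a language
# model trained on the dataset would supply them.
VECTORS = {
    "car": [0.9, 0.1, 0.0],
    "truck": [0.8, 0.2, 0.1],
    "bus": [0.7, 0.3, 0.0],
    "apple": [0.0, 0.9, 0.4],
    "pear": [0.1, 0.8, 0.5],
}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def suggest_terms(base_word, vectors, top_n=2):
    """Return the top_n words most similar to base_word (hypothetical helper)."""
    base = vectors[base_word]
    scored = [
        (cosine(base, vec), word)
        for word, vec in vectors.items()
        if word != base_word
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [word for _, word in scored[:top_n]]


# The user starts a lexicon from a base word and reviews the candidates.
lexicon = ["car"]
candidates = suggest_terms("car", VECTORS)
lexicon.extend(candidates)
```

In this toy setup the candidates for "car" are the other vehicle words, which the user could accept into the lexicon or reject.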

TAGGER

The Tagger application enables automatic categorisation of documents into predefined classes. The user defines the classes and supplies the training data.

ENTITY EXTRACTOR

Users can manually mark parts of text as “facts” and train statistical models so that similar cases are found and extracted automatically.

TEXTA Toolkit has some built-in preprocessors, and new ones can be added based on clients’ needs.

MULTILINGUAL PREPROCESSOR

TEXTA Multilingual Processor automatically extracts names, e-mail addresses, phone numbers, addresses and locations from text. It works with Estonian, Russian and English, and its output can be analyzed in TEXTA Toolkit.
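
To give a rough idea of what such extraction produces, here is a minimal sketch that covers only e-mail addresses and phone numbers, using simple regular expressions. It is an assumption-laden illustration, not the TEXTA Multilingual Processor: the patterns and the function name extract_contacts are hypothetical, and names, addresses and locations generally require language-specific models rather than patterns like these.

```python
import re

# Deliberately simple patterns, chosen for illustration only.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# Optional "+", then at least seven digits, optionally separated
# by spaces or hyphens.
PHONE_RE = re.compile(r"\+?\d[\d\s-]{5,}\d")


def extract_contacts(text):
    """Return the e-mail addresses and phone numbers found in text."""
    return {
        "emails": EMAIL_RE.findall(text),
        "phones": PHONE_RE.findall(text),
    }


sample = "Contact Mari at mari@example.com or +372 5555 1234."
found = extract_contacts(sample)
```

The extracted values could then be stored alongside the document so they can be searched and aggregated like any other field.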

SENTIMENT ANALYZER

The preprocessor provides lexicon-based and model-based approaches to sentiment analysis. The lexicon-based method is simpler and more naive than the model-based method, but it can still give decent results, especially when there is not enough training data for the model-based approach.
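
The lexicon-based idea can be sketched in a few lines: sum the polarity weights of the words that appear in a sentiment lexicon and read the sign of the total. This is a minimal illustration under assumed names and weights (LEXICON and lexicon_sentiment are hypothetical), not the preprocessor's actual code.

```python
import re

# Hypothetical sentiment lexicon: word -> polarity weight.
LEXICON = {
    "good": 1,
    "great": 2,
    "excellent": 2,
    "bad": -1,
    "poor": -1,
    "terrible": -2,
}


def lexicon_sentiment(text, lexicon=LEXICON):
    """Label text as positive, negative or neutral by summed word weights."""
    tokens = re.findall(r"\w+", text.lower())
    # Words missing from the lexicon contribute nothing to the score.
    score = sum(lexicon.get(token, 0) for token in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


label = lexicon_sentiment("The service was great but the food was bad")
```

The sketch also shows why the method is naive: it ignores negation and context, so "not good" would still count as positive. It needs no training data, though, which is the main advantage described above.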