DiscoverText

About our text analysis data science software

Collaborative text analytics for human and machine learning

We provide dozens of multilingual text mining, data science, human annotation, and machine-learning features. DiscoverText offers a range of simple to advanced cloud-based software tools that help users evaluate large amounts of text data quickly and accurately. Users work in a web browser through a point-and-click graphical user interface to sort the unstructured free text common in market research, along with its associated metadata. Sources include customer feedback platforms, CRMs, chats, email, open-ended survey answers (large-scale HR, customer satisfaction, and others), public comments to government agencies, X/Twitter, RSS feeds, and other forms of text data. Students and professors get free access, training, and project support directly from the founder.

Collect, clean, and analyze text data

Unstructured text data is messy

Data scientists working on text analytics and machine-learning know that cleaning data can be time-consuming. Users of DiscoverText build reusable custom machine classifiers, or “sifters,” to find the most (or least) relevant items before using other classifiers to sort items into topic, sentiment, and other categories. DiscoverText combines hybrid data science methods (e.g., crowdsourcing, measurement, adjudication, iteration, replication, annotator ranking) with established e-discovery and information retrieval text analytics tools to shorten a process that used to last weeks or months when words were sorted in spreadsheets. Our machine-learning sifters can be created in hours, or even just a few minutes, whether working alone or using crowdsourcing. Academics trust DiscoverText to help them do better and more transparent scientific research resulting in scholarly publications. Legal teams use our document redaction capability to remove names, metadata, email addresses, and other sensitive information and to produce Bates-stamped and spreadsheet-indexed PDF collections.
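To make the sifter idea concrete, here is a minimal sketch of a relevance classifier that scores items so the most (or least) relevant can be reviewed first. It is an illustration only, not DiscoverText's implementation: the choice of a naive Bayes model, the function names train_sifter and relevance_score, and the sample texts are all assumptions made for this example.

```python
import math
from collections import Counter

LABELS = ("relevant", "irrelevant")


def tokenize(text):
    # Deliberately simple: lowercase and split on whitespace.
    return text.lower().split()


def train_sifter(labeled_items):
    """Count words per label from (text, label) pairs coded by humans.

    Assumes at least one example of each label is present.
    """
    word_counts = {label: Counter() for label in LABELS}
    doc_counts = Counter()
    for text, label in labeled_items:
        doc_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocab = set(word_counts["relevant"]) | set(word_counts["irrelevant"])
    return word_counts, doc_counts, vocab


def relevance_score(model, text):
    """Return the estimated probability (0..1) that an item is relevant."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    log_prob = {}
    for label in LABELS:
        total_words = sum(word_counts[label].values())
        lp = math.log(doc_counts[label] / total_docs)
        for word in tokenize(text):
            if word in vocab:  # ignore words never seen in training
                # Laplace (add-one) smoothing avoids zero probabilities.
                lp += math.log((word_counts[label][word] + 1)
                               / (total_words + len(vocab)))
        log_prob[label] = lp
    # Normalize the two log scores into a probability.
    top = max(log_prob.values())
    weights = {label: math.exp(lp - top) for label, lp in log_prob.items()}
    return weights["relevant"] / sum(weights.values())


# Hypothetical training items, as if coded by a human annotator.
training = [
    ("the delivery was late and the refund took weeks", "relevant"),
    ("great support but the app crashes", "relevant"),
    ("click here to win a free prize", "irrelevant"),
    ("free followers click now", "irrelevant"),
]
model = train_sifter(training)

# Sort uncoded items so the most relevant appear first.
uncoded = ["win a free prize", "refund was late"]
ranked = sorted(uncoded, key=lambda t: relevance_score(model, t), reverse=True)
```

In this sketch, ranked places "refund was late" ahead of "win a free prize". A reusable sifter would be trained on far more coded items than the four shown here.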

Humans and machines classify text

Point-and-click software anyone can master

We have been doing this work in groups since 2005. Humans are good at some things and computers are good at others. A consistent back-and-forth between humans and machines increases the ability of both to learn. Our text analytics software and data science methods originate in a decade of National Science Foundation-funded research into the measurements that accelerate machine-learning. Text classification is an old, hard problem, according to no less an authority than Plato. Our unique and proven method of adjudication creates gold standard training sets for machine-learning by ranking human annotators over time. A patented CoderRank approach is critical for ensuring accurate, reliable results when the work of humans or machines is finally evaluated. DiscoverText machine-learning is powered by uClassify.
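The general idea of ranking annotators against adjudicated decisions can be sketched in a few lines. This is not the patented CoderRank method, whose details are not described here; it is a simple stand-in that scores each annotator by the share of their codes that match the adjudicated label. The function name rank_annotators, the data layout, and the sample annotators are assumptions for illustration.

```python
from collections import defaultdict


def rank_annotators(codings, adjudicated):
    """Rank annotators by agreement with adjudicated (gold standard) labels.

    codings: list of (annotator, item_id, label) tuples.
    adjudicated: dict mapping item_id to the adjudicated label.
    Returns (annotator, score) pairs, best first.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for annotator, item_id, label in codings:
        if item_id not in adjudicated:
            continue  # only adjudicated items count toward the score
        totals[annotator] += 1
        if label == adjudicated[item_id]:
            hits[annotator] += 1
    scores = {a: hits[a] / totals[a] for a in totals}
    return sorted(scores.items(), key=lambda pair: pair[1], reverse=True)


# Hypothetical codings from three annotators on three items.
adjudicated = {1: "pos", 2: "neg", 3: "pos"}
codings = [
    ("ana", 1, "pos"), ("ana", 2, "neg"), ("ana", 3, "pos"),
    ("ben", 1, "pos"), ("ben", 2, "pos"), ("ben", 3, "pos"),
    ("cy", 1, "neg"), ("cy", 2, "neg"),
]
ranking = rank_annotators(codings, adjudicated)
```

Here ana ranks first, followed by ben and then cy. Recomputing the scores as more items are adjudicated gives a ranking that changes over time, which is the behavior the paragraph above describes.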

eDiscovery tools that work

Advanced search and sampling techniques

Deduplication and automated clustering of near-duplicates give users a high-level sense of the data landscape. With Twitter data, these groupings are a roadmap to the digital footprint of viral Tweets. With public comment data, these groupings are form letters and modified forms. In large-scale surveys, duplicates and near-duplicates represent frequently held but independently expressed opinions among customers or employees. Our interactive machine classifier histograms allow data science teams to identify the items in a collection that add the most value when coded by humans. These text analytics tools enable purposive sampling that further accelerates the process of training machine classifiers.
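As a rough illustration of near-duplicate clustering, the sketch below groups texts whose word-pair overlap (Jaccard similarity) meets a threshold. It does not describe how DiscoverText clusters items: the shingle size, the 0.5 threshold, the greedy single-pass grouping, and the function names are all assumptions chosen to keep the example short.

```python
def shingles(text, k=2):
    """Return the set of k-word sequences in a text (lowercased)."""
    words = text.lower().split()
    if len(words) < k:
        return {tuple(words)}
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Size of the intersection divided by size of the union."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cluster_near_duplicates(texts, threshold=0.5):
    """Greedy grouping: each text joins the first cluster whose
    representative (its first member) is similar enough, else starts
    a new cluster. Returns lists of indices into texts."""
    clusters = []  # list of (representative shingles, member indices)
    for index, text in enumerate(texts):
        current = shingles(text)
        for representative, members in clusters:
            if jaccard(representative, current) >= threshold:
                members.append(index)
                break
        else:
            clusters.append((current, [index]))
    return [members for _, members in clusters]


# Hypothetical public comments: a form letter, a modified form,
# an unrelated comment, and an exact duplicate of the form letter.
comments = [
    "please protect our national parks from drilling",
    "please protect our national parks from drilling now",
    "I oppose the new rule because it hurts small farms",
    "please protect our national parks from drilling",
]
groups = cluster_near_duplicates(comments)
```

With these inputs, the form letter, its modified version, and the exact duplicate land in one group and the unrelated comment in another. This sketch does not strip punctuation and compares every text against every cluster, so a large collection would call for a more scalable technique such as MinHash.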