tōku

tōku is a text mining demo by Chris Emmery. It builds on techniques demonstrated in this paper.

What does it do?

tōku pulls the maximum number of recent tweets from the timeline of the provided @handle and applies multiple text mining techniques to them (e.g. sentiment analysis, author profiling).

What's so special about that?

tōku relies only on distant supervision. No time-intensive annotation procedure was ever needed to tell the algorithms the correct labels for classification.

How does that even work?

tōku uses simple heuristics to gather these labels from the Twitter crowd. This is not always reliable, but it has been shown to work on par with human-provided labels, despite the noise and inaccuracies the heuristics introduce. Moreover, because it finds relevant data by itself, it is also quite cheap in terms of money, computation, and human time.
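As a rough illustration of the idea, the sketch below derives noisy labels from the tweets themselves instead of from annotators. The emoticon and "I'm N years old" patterns, as well as the function names, are hypothetical and chosen for illustration only; they are not the heuristics tōku actually uses.

```python
import re
from typing import List, Optional, Tuple

# Hypothetical heuristics (assumed for illustration, not tōku's actual rules).
# Emoticons act as noisy sentiment labels.
POSITIVE = re.compile(r":\)|:-\)|:D|😀")
NEGATIVE = re.compile(r":\(|:-\(")
# Self-reported age, e.g. "I'm 25 years old".
AGE = re.compile(r"\bi(?:'m| am) (\d{2}) years old\b", re.IGNORECASE)


def sentiment_label(tweet: str) -> Optional[str]:
    """Return a noisy sentiment label based on emoticons, or None if ambiguous."""
    pos = bool(POSITIVE.search(tweet))
    neg = bool(NEGATIVE.search(tweet))
    if pos == neg:  # neither or both present: skip rather than guess
        return None
    return "positive" if pos else "negative"


def age_label(tweet: str) -> Optional[int]:
    """Return a self-reported age if the tweet contains one, else None."""
    match = AGE.search(tweet)
    return int(match.group(1)) if match else None


def distant_dataset(tweets: List[str]) -> List[Tuple[str, str]]:
    """Collect (text, label) pairs using the sentiment heuristic.

    The emoticons that produced the label are removed from the text, so a
    classifier trained on these pairs cannot simply relearn the heuristic.
    """
    data = []
    for tweet in tweets:
        label = sentiment_label(tweet)
        if label is not None:
            text = NEGATIVE.sub("", POSITIVE.sub("", tweet)).strip()
            data.append((text, label))
    return data


if __name__ == "__main__":
    tweets = ["great day :)", "missed the bus :(", "no emoticon here", "mixed :) :("]
    print(distant_dataset(tweets))
    print(age_label("Hey, I'm 25 years old today"))
```

Dropping ambiguous tweets trades recall for label precision: with enough data collected from the crowd, a smaller set of cleaner labels is usually worth more than a larger, noisier one.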

Why should I care?

tōku demonstrates that gathering some rough intuitions about you and your use of social media only requires other people to share details about themselves. As such, it's possible to make inferences about you without you explicitly sharing this information. Twitter is a great example of a social media platform where information such as age, gender, and 'likes' is generally not provided directly by its users. This demo shows that such information can be uncovered regardless, without investing a significant amount of resources.

Hah, it doesn't work for <some handle>!

tl;dr: ¯\_(ツ)_/¯ ― if you're brave enough, you can read the limitations here.