Spark research and algorithm - Scala. Extreme Machine Learning
Goal: classify Twitter messages by life event.
The key strategy in this work is to obtain a relatively clean training dataset from a large quantity of Twitter data with minimal human supervision, sometimes at the expense of recall. To achieve this, we rely on several restrictions and manual screening steps: relying on replies, LDA topic identification, and seed screening. Each part of the system depends on the preceding steps. As a result, each Twitter message is assigned a life-event class.
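
The reply and seed screening steps can be illustrated with a minimal plain-Scala sketch. Everything named here is an assumption for illustration: the `Tweet` fields, the seed phrases, and the rule that a reply containing a seed phrase marks the tweet it answers as a candidate. The project's actual seed list and screening rules are not given above, and the LDA step is left out.

```scala
// Hypothetical data model; the field names are assumptions, not the project's schema.
final case class Tweet(id: Long, text: String, inReplyTo: Option[Long])

object SeedScreening {
  // Illustrative seed phrases only; the real seed list is not given in this document.
  val seeds: Set[String] = Set("congratulations", "congrats", "sorry to hear")

  // A tweet passes screening when it is a reply and its lower-cased text
  // contains at least one seed phrase.
  def isSeedReply(t: Tweet): Boolean =
    t.inReplyTo.isDefined && seeds.exists(s => t.text.toLowerCase.contains(s))

  // Return the tweets that received at least one reply passing the screening.
  def candidates(tweets: Seq[Tweet]): Seq[Tweet] = {
    val repliedTo: Set[Long] = tweets.filter(isSeedReply).flatMap(_.inReplyTo).toSet
    tweets.filter(t => repliedTo.contains(t.id))
  }
}
```

Filtering this strictly discards life-event tweets that received no matching reply, which is the recall trade-off described above. On Spark, the same predicate could be applied to a distributed collection of tweets instead of a local `Seq`.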