UN #LinksSDGs. Natural language processing and data visualization challenge
The goal is to build an automated tool that extracts, from a set of UN publications, all messages about the relationships between urban development (SDG 11) and the other SDG areas, and then visualizes the results.
We followed this algorithm:
1. Extract text data from the PDF files
2. Add missing punctuation to make sentence splitting easier
3. Split the text into sentences
4. For every sentence:
4.1. Apply a lemmatizer and a stemmer to the sentence and to the keywords to get the base forms of the words
4.2. Search for SDG keywords in sentence
4.3. Add all found matches to the result list
5. Classify the sentences in the result list into 3 types (causal, constraint, and recommendation) and detect the direction of the relationship (A causes B).
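
Steps 3 and 4 above can be illustrated with a minimal sketch. It is not the project's actual code: the keyword lists in `SDG_KEYWORDS` are hypothetical, the function names are made up for illustration, and `normalize` is a crude suffix-stripping stand-in for the real lemmatizer and stemmer (a library such as NLTK would normally be used there).

```python
import re

# Hypothetical keyword lists, for illustration only.
SDG_KEYWORDS = {
    "SDG 6": ["water", "sanitation"],
    "SDG 11": ["city", "urban", "housing"],
    "SDG 13": ["climate", "emission"],
}


def split_sentences(text):
    """Step 3: split on whitespace that follows '.', '!' or '?'."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def normalize(word):
    """Step 4.1 stand-in: lowercase and strip simple plural suffixes.

    Assumption: a real lemmatizer/stemmer replaces this in practice.
    """
    word = word.lower()
    if word.endswith("ies") and len(word) > 4:
        return word[:-3] + "y"
    if word.endswith("s") and not word.endswith("ss") and len(word) > 3:
        return word[:-1]
    return word


def base_forms(sentence):
    """Return the set of normalized words in a sentence."""
    return {normalize(w) for w in re.findall(r"[A-Za-z]+", sentence)}


def find_matches(text):
    """Steps 4.2 and 4.3: collect sentences that mention any SDG keyword."""
    # Normalize keywords the same way as sentence words so they compare equal.
    keywords = {
        sdg: {normalize(k) for k in words} for sdg, words in SDG_KEYWORDS.items()
    }
    results = []
    for sentence in split_sentences(text):
        words = base_forms(sentence)
        found = sorted(sdg for sdg, kws in keywords.items() if words & kws)
        if found:
            results.append((sentence, found))
    return results


sample = (
    "Cities need clean water. The weather was nice. "
    "Urban growth increases emissions."
)
for sentence, sdgs in find_matches(sample):
    print(sdgs, sentence)
```

Normalizing both the sentence words and the keywords with the same function is what lets "Cities" match the keyword "city" and "emissions" match "emission". The sentence without any keyword is dropped, and the remaining pairs are what step 5 would then classify.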