Big data processing of Wikipedia data
- calculate the number of Wikipedia page views for the last year and for the last 90 days;
- create a general table with the rank of each page;
- pages with the same number of views should have the same rank.
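The tie rule in the last item can be sketched as follows. This is a minimal illustration, not the project's actual code: the function name `dense_rank` and the dict-based input are assumptions, and it assumes "dense" ranking (the rank after a tie is not skipped), which the requirements above do not specify.

```python
def dense_rank(views):
    """Map each page to a rank; pages with equal view counts share a rank.

    `views` is a hypothetical dict of page title -> total view count.
    """
    # Distinct view counts, highest first, so the most viewed page gets rank 1.
    distinct_counts = sorted(set(views.values()), reverse=True)
    rank_of_count = {count: rank for rank, count in enumerate(distinct_counts, start=1)}
    return {page: rank_of_count[count] for page, count in views.items()}


if __name__ == "__main__":
    print(dense_rank({"Main_Page": 900, "Python": 500, "Java": 500, "Scala": 120}))
```

If gaps after ties are wanted instead (1, 2, 2, 4), the rank of a page would be one plus the number of pages with strictly more views.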
Our main steps in establishing a proper big data pipeline were:
- collecting the data;
- removing punctuation;
- recognizing date formats;
- filtering by language;
- aggregating views for the needed pages;
- building the table;
- calculating ranks.
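The cleaning, filtering, and aggregation steps above could look roughly like the sketch below. Everything in it is an assumption made for illustration: the record layout `(title, date_text, language, views)`, the list of date formats, the default language `"en"`, and the helper names are hypothetical, and a real big data run would likely use a distributed engine rather than plain Python.

```python
import string
from collections import Counter
from datetime import date, datetime, timedelta

# Assumed input date formats; the real data may use others.
DATE_FORMATS = ("%Y-%m-%d", "%Y%m%d", "%d/%m/%Y")


def parse_date(text):
    """Try each assumed format in turn; return None if none matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue
    return None


def clean_title(title):
    """Remove ASCII punctuation (including underscores) from a page title."""
    return title.translate(str.maketrans("", "", string.punctuation)).strip()


def aggregate(records, today, days, language="en"):
    """Sum views per page over the last `days` days for one language."""
    start = today - timedelta(days=days)
    totals = Counter()
    for title, date_text, lang, views in records:
        if lang != language:
            continue  # language filtering
        day = parse_date(date_text)
        if day is None or not (start < day <= today):
            continue  # unparseable date or outside the window
        totals[clean_title(title)] += views
    return totals


if __name__ == "__main__":
    sample = [
        ("Python!", "2024-06-01", "en", 5),
        ("Python", "20240530", "en", 3),
        ("Python", "2023-12-01", "en", 100),
        ("Java", "2024-06-01", "de", 7),
    ]
    today = date(2024, 6, 10)
    print(aggregate(sample, today, days=90))
    print(aggregate(sample, today, days=365))
```

Calling `aggregate` once with `days=90` and once with `days=365` yields the two view counts per page that feed the rank table.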