Machine Learning Spark Project

Challenge

Identification, classification, and categorization of the data in any language
Encrypting and protecting the data
Monitoring, tracking and controlling the data
Detection of anomalies
Governing the data and meeting GDPR requirements

Solution

Application of unsupervised, semi-supervised and supervised methods
Implementation of deep learning approaches to reach high precision and process large volumes of data
Text summarization methods combined with statistical solutions that make each step interpretable
Using Spark Streaming to analyze data in real-time
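
To illustrate the supervised part of the approach listed above, here is a minimal sketch in plain Python. It is not the project's code: it swaps in a small multinomial naive Bayes classifier for the deep learning models, runs on a single machine rather than on Spark, and uses made-up training texts and hypothetical labels ("confidential", "public").

```python
import math
from collections import Counter, defaultdict


def train(samples):
    """Fit a tiny multinomial naive Bayes model from (text, label) pairs."""
    word_counts = defaultdict(Counter)  # label -> word frequencies
    label_counts = Counter()            # label -> number of documents
    vocab = set()
    for text, label in samples:
        tokens = text.lower().split()
        word_counts[label].update(tokens)
        label_counts[label] += 1
        vocab.update(tokens)
    return word_counts, label_counts, vocab


def predict(model, text):
    """Return the label with the highest log-probability for the text."""
    word_counts, label_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(label_counts):
        # Log prior of the label.
        score = math.log(label_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for token in text.lower().split():
            # Laplace (add-one) smoothing so unseen words do not zero out the score.
            score += math.log(
                (word_counts[label][token] + 1) / (total_words + len(vocab))
            )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Made-up training data, for illustration only.
samples = [
    ("salary payroll employee contract", "confidential"),
    ("passport number employee address", "confidential"),
    ("press release product launch", "public"),
    ("blog post product news", "public"),
]
model = train(samples)
label = predict(model, "employee salary review")
print(label)
```

A model this simple also keeps each step inspectable: the per-word log-probabilities show which terms pushed a document toward a given label.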

Technology stack

Hadoop, Apache Spark, HBase, Apache Tika, Play, StanfordNLP, Deeplearning4j, Spray/Akka, Ambari

Result

Our experts made it possible to analyze every electronic data file, understand its content in 70 languages, and predict its confidentiality level and business category. The project followed a targeted, data-centric approach: the platform automatically correlates what it learns about each file with user access rights to give a clear picture of who can access what.
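
The correlation step described above could look roughly like the following Python sketch. Everything in it is an assumption made for illustration: the confidentiality levels, the `FilePrediction` record, the `overexposed` helper, and the sample files and users are hypothetical and are not taken from the platform.

```python
from dataclasses import dataclass

# Hypothetical confidentiality levels, ordered from least to most sensitive.
LEVELS = {"public": 0, "internal": 1, "confidential": 2, "secret": 3}


@dataclass(frozen=True)
class FilePrediction:
    path: str
    confidentiality: str  # predicted level, one of the LEVELS keys
    category: str         # predicted business category


def overexposed(predictions, clearances, access):
    """Return (path, user) pairs where a user can read a file above their clearance.

    predictions: iterable of FilePrediction
    clearances:  dict mapping user -> highest level that user may read
    access:      dict mapping path -> set of users with read access
    """
    findings = []
    for p in predictions:
        needed = LEVELS[p.confidentiality]
        for user in sorted(access.get(p.path, ())):
            # Users without a known clearance are treated as "public" only.
            if LEVELS[clearances.get(user, "public")] < needed:
                findings.append((p.path, user))
    return findings


# Made-up sample data.
predictions = [
    FilePrediction("hr/salaries.xlsx", "confidential", "HR"),
    FilePrediction("web/press.docx", "public", "Marketing"),
]
clearances = {"alice": "confidential", "bob": "internal"}
access = {"hr/salaries.xlsx": {"alice", "bob"}, "web/press.docx": {"bob"}}

findings = overexposed(predictions, clearances, access)
print(findings)  # [('hr/salaries.xlsx', 'bob')]
```

Here `bob` is flagged because he has read access to a file predicted as confidential while his clearance only covers internal data.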