Machine Learning Spark Project

Challenge

  • Identifying, classifying, and categorizing data in any language
  • Encrypting and protecting the data
  • Monitoring, tracking, and controlling the data
  • Detecting anomalies
  • Governing the data and addressing GDPR requirements

Solution

  • Applying unsupervised, semi-supervised, and supervised learning methods
  • Implementing deep learning approaches to achieve high precision and process large volumes of data
  • Combining text summarization with statistical methods so that each step can be interpreted
  • Using Spark Streaming to analyze data in real time

Technology stack

Hadoop, Apache Spark, HBase, Apache Tika, Play, StanfordNLP, Deeplearning4J, Spray/Akka, Ambari

Result

As a result, our experts made it possible to analyze every electronic data file, understand its content in 70 languages, and predict its level of confidentiality and business category. A targeted, data-centric approach was applied in this case. The solution allows the platform to automatically correlate this knowledge with user access rights, giving a clear picture of who can access which data.
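
To illustrate the last point, correlating predicted confidentiality with user access rights could look roughly like the sketch below. It is written in plain Python for brevity and is not the project's code: the level names, the `Document` and `AccessRight` records, and the `find_overexposed` helper are all hypothetical, and the predicted levels are assumed to come from an upstream classifier.

```python
from dataclasses import dataclass

# Hypothetical confidentiality scale, ordered from least to most sensitive.
LEVELS = {"public": 0, "internal": 1, "confidential": 2, "restricted": 3}


@dataclass(frozen=True)
class Document:
    path: str
    predicted_level: str  # assumed output of an upstream classifier
    category: str         # assumed predicted business category


@dataclass(frozen=True)
class AccessRight:
    user: str
    path: str
    clearance: str  # highest level this user is allowed to read


def find_overexposed(documents, access_rights):
    """Return (user, path, predicted_level) for every access grant where
    the document's predicted level is higher than the user's clearance."""
    by_path = {doc.path: doc for doc in documents}
    findings = []
    for right in access_rights:
        doc = by_path.get(right.path)
        if doc is None:
            continue  # grant refers to a file that was not analyzed
        if LEVELS[doc.predicted_level] > LEVELS[right.clearance]:
            findings.append((right.user, doc.path, doc.predicted_level))
    return findings


# Illustrative data only.
documents = [
    Document("/hr/salaries.xlsx", "restricted", "human resources"),
    Document("/web/press-release.docx", "public", "marketing"),
]
access_rights = [
    AccessRight("alice", "/hr/salaries.xlsx", "restricted"),
    AccessRight("bob", "/hr/salaries.xlsx", "internal"),
    AccessRight("bob", "/web/press-release.docx", "internal"),
]

findings = find_overexposed(documents, access_rights)
for user, path, level in findings:
    print(f"{user} can read {path}, which is predicted to be {level}")
```

In a deployment like the one described, a join of this kind would more plausibly run as a distributed job over the full file inventory; the in-memory dictionary here only shows the comparison logic.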