
Healthcare AI Startup Project

Challenge

Developing an efficient, high-load system architecture for writing, reading, and processing log data
Understanding the meaning of the data fields and their importance
Preparing a list of all typical scenarios of abnormal behavior that can be identified from the data
Implementing anomaly detection
Extracting data from the storage (written as text files)

Solution

Use of Amazon Redshift to store rapidly growing volumes of data while providing efficient access to it
Preparation of a list of the types of suspicious user activity to be detected either immediately after it occurs or in scheduled mode
Implementation of various approaches based on time series anomaly detection, classification algorithms such as isolation forest and one-class SVM, clustering, and dimensionality reduction techniques
Design of two unique approaches based on factorization machines. For hypothesis testing and model building, some of the data was saved both in Parquet files and in BigQuery tables for fast processing in Apache Spark and other libraries
Proposal and development of an approach based on clustering algorithms and statistical time series methods for finding patterns of abnormal user behavior
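
To illustrate one of the techniques named above, here is a minimal sketch of isolation forest anomaly detection with Scikit-learn. It is not the project's actual implementation: the feature columns (records accessed per hour, distinct patients viewed per hour), the synthetic data, the `flag_anomalies` helper, and the contamination rate are all assumptions chosen for illustration.

```python
import numpy as np
from sklearn.ensemble import IsolationForest


def flag_anomalies(features, contamination=0.01, seed=0):
    """Fit an isolation forest and return (is_anomaly mask, anomaly scores).

    `contamination` is the assumed share of anomalous rows; it sets the
    decision threshold and would need tuning on real data.
    """
    model = IsolationForest(
        n_estimators=200, contamination=contamination, random_state=seed
    )
    labels = model.fit_predict(features)    # -1 = anomaly, 1 = normal
    scores = model.score_samples(features)  # lower = more anomalous
    return labels == -1, scores


# Hypothetical per-user features:
# column 0 = records accessed per hour, column 1 = distinct patients per hour.
rng = np.random.default_rng(0)
normal_users = rng.normal(loc=[20.0, 5.0], scale=[3.0, 1.0], size=(500, 2))
suspicious_users = np.array([[200.0, 90.0], [150.0, 60.0]])
features = np.vstack([normal_users, suspicious_users])

is_anomaly, scores = flag_anomalies(features)
print("rows flagged:", int(is_anomaly.sum()))
print("suspicious rows flagged:", is_anomaly[-2:].tolist())
```

Because the threshold is driven by `contamination`, a few ordinary rows are flagged alongside the two injected ones; in practice a flagged row would be a candidate for review rather than a confirmed incident.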

Technology stack

Python, PySpark, Parquet, BigQuery, Amazon Redshift, libFM & libFFM, Scikit-learn, Plotly, Matplotlib

Result

We developed a system for monitoring the use of hospital patient data. It identifies suspicious user actions with very few false negatives, notifies security officers about any abnormal action right after it occurs, supports investigation and analysis of user activity through dashboards, and searches for suspicious patterns in user behavior to uncover previously unknown use cases.