Intrusion Detection System
This was a research-oriented task. The goal was to develop architectures for a real-time intrusion detection system (IDS) built on Apache Spark, a powerful big data technology, together with Spark Streaming, Spark MLlib, Apache Kafka and HBase / Cassandra. To compare capabilities, results and performance, a Naive Bayes classifier was run both on this stack and on an alternative stack of Apache Hadoop, HStreaming and Apache Hive.
To predict attack types on the KDD’99, NSL-KDD, CSIC 2010 HTTP, UNB ISCX and DARPA datasets, we used several ML algorithms provided by Spark MLlib, including Naive Bayes, random forest, logistic regression, gradient-boosted trees and SVM, together with established data preprocessing and training techniques, in order to select the best model for each dataset. The proposed system architecture is evaluated for accuracy, in terms of true positives (TP) and false positives (FP); for efficiency, in terms of processing time; and by comparing its results with those of traditional techniques.
The combination of Apache Kafka and Spark Streaming serves as a distributed, fault-tolerant, real-time big data stream processor. Prediction (classification) results were saved in HBase to generate statistics on the IDS's operation. A web-based management console renders a range of visualizations with the D3.js library, helping users quickly assess threats and analyze network traffic.
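The TP/FP-based accuracy evaluation mentioned above can be illustrated with a small sketch. This is not the project's code: it is plain Python rather than Spark, the `NORMAL` label, the `DetectionMetrics` class and the `evaluate` function are hypothetical names, and the sample labels are made up for illustration. It also assumes a binary view of detection, in which any non-normal prediction on attack traffic counts as a true positive even if the attack type is wrong.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple

NORMAL = "normal"  # assumed label for benign traffic (hypothetical)


@dataclass(frozen=True)
class DetectionMetrics:
    """Confusion-matrix counts for attack-vs-normal detection."""
    tp: int  # attacks flagged as attacks
    fp: int  # normal traffic flagged as an attack
    tn: int  # normal traffic passed as normal
    fn: int  # attacks passed as normal

    @property
    def true_positive_rate(self) -> float:
        # Share of real attacks that were detected.
        attacks = self.tp + self.fn
        return self.tp / attacks if attacks else 0.0

    @property
    def false_positive_rate(self) -> float:
        # Share of normal traffic that raised a false alarm.
        benign = self.fp + self.tn
        return self.fp / benign if benign else 0.0


def evaluate(pairs: Iterable[Tuple[str, str]]) -> DetectionMetrics:
    """Count TP/FP/TN/FN from (actual_label, predicted_label) pairs."""
    tp = fp = tn = fn = 0
    for actual, predicted in pairs:
        is_attack = actual != NORMAL
        flagged = predicted != NORMAL
        if is_attack and flagged:
            tp += 1
        elif is_attack:
            fn += 1
        elif flagged:
            fp += 1
        else:
            tn += 1
    return DetectionMetrics(tp, fp, tn, fn)


# Illustrative labels only, not results from the project.
sample = [
    ("dos", "dos"),
    ("normal", "normal"),
    ("normal", "probe"),
    ("probe", "normal"),
    ("dos", "probe"),
]
metrics = evaluate(sample)
print(metrics, metrics.true_positive_rate, metrics.false_positive_rate)
```

In a Spark setting the same counts would typically come from aggregating a predictions DataFrame rather than from a Python loop; the formulas for the two rates stay the same.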
Our main steps in establishing a proper big data pipeline were:
1. Investigate the current state of the art for the problem.
2. Define the requirements for implementing a real-time application.
3. Choose the technology stacks.
4. Build the architectures.
5. Plan how to implement the architectures and compare the approaches.