Spark Binary Data Processing
We have TDMS files generated from sound recordings, which are stored in a binary format. We also have a Python program that converts these files into graphs.
What we are looking for is a way to upload multiple TDMS files into HDFS and process them with Python on Spark (PySpark).
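A minimal sketch of the upload-and-process flow described above, under stated assumptions: the npTDMS package (`nptdms`) stands in for the existing Python reader and is installed on every node; the files have already been copied into a hypothetical HDFS directory (for example with `hdfs dfs -put *.tdms /data/tdms/`); and every channel is numeric. `SparkContext.binaryFiles` hands each whole file to a task as a `(path, bytes)` pair, which suits a binary format that cannot be split line by line. The statistics in `summarize` (mean, RMS, peak) are illustrative placeholders for whatever the current program computes before plotting, and all paths, column names, and helper names are hypothetical.

```python
# Sketch only: nptdms on every executor, numeric channels, hypothetical HDFS paths.
import io
import math
from typing import Iterator, List, Tuple

# One output row per channel: (file path, group, channel, sample count, mean, RMS, peak).
Row = Tuple[str, str, str, int, float, float, float]


def summarize(samples: List[float]) -> Tuple[int, float, float, float]:
    """Plain-Python per-channel statistics, so the logic is testable without Spark."""
    n = len(samples)
    if n == 0:
        return 0, 0.0, 0.0, 0.0
    mean = sum(samples) / n
    rms = math.sqrt(sum(x * x for x in samples) / n)
    peak = max(abs(x) for x in samples)
    return n, mean, rms, peak


def parse_tdms(record: Tuple[str, bytes]) -> Iterator[Row]:
    """Turn one (path, bytes) pair from binaryFiles() into per-channel rows."""
    from nptdms import TdmsFile  # imported inside the function so it resolves on executors

    path, content = record
    tdms = TdmsFile.read(io.BytesIO(content))  # parse the in-memory file contents
    for group in tdms.groups():
        for channel in group.channels():
            n, mean, rms, peak = summarize([float(x) for x in channel[:]])
            yield path, group.name, channel.name, n, mean, rms, peak


def run(input_glob: str = "hdfs:///data/tdms/*.tdms",
        output_dir: str = "hdfs:///data/tdms_summary") -> None:
    """Driver entry point, e.g. called from a script launched with spark-submit."""
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("tdms-summary").getOrCreate()
    # One record per whole file: a TDMS file is never split across tasks,
    # but each file must fit in executor memory.
    rows = spark.sparkContext.binaryFiles(input_glob).flatMap(parse_tdms)
    schema = ("path STRING, group_name STRING, channel_name STRING, "
              "n BIGINT, mean DOUBLE, rms DOUBLE, peak DOUBLE")
    df = spark.createDataFrame(rows, schema)
    # Keep the results inside the cluster as Parquet files on HDFS.
    df.write.mode("overwrite").parquet(output_dir)
```

Converting samples to Python floats keeps `summarize` dependency-free; for long audio channels, computing the same statistics with NumPy on `channel[:]` directly would be considerably faster.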
The main steps were:
1. Create an AWS cluster
2. Load the TDMS files into the AWS cluster
3. Convert the Python reader to PySpark
4. Clean the data with RDDs
5. Transform the data for graph building
6. Define the graph properties
7. Build the graph
8. Create tables and store the analysis results
9. Convert the results to the required format
10. Send an email alert to the user when the data crosses defined thresholds
11. Ensure that all source data, intermediate data, results, and graphs are stored in the cluster
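For the cleaning step with RDDs, a minimal sketch follows. The cleaning rules (drop NaN and infinite samples, optionally clip values to +/- `clip`) are assumptions, not requirements from the question; substitute whatever the sound data actually needs. `clean_rdd` expects a hypothetical pair RDD of `((file, channel), samples)` records and drops channels left empty after cleaning.

```python
# Sketch only: the cleaning rules and record layout are assumptions.
import math
from typing import List, Optional


def clean_samples(samples: List[float], clip: Optional[float] = None) -> List[float]:
    """Drop non-finite samples, then optionally clip to the range [-clip, clip]."""
    cleaned = [float(x) for x in samples if math.isfinite(x)]
    if clip is not None:
        cleaned = [max(-clip, min(clip, x)) for x in cleaned]
    return cleaned


def clean_rdd(channel_rdd, clip: Optional[float] = None):
    """Clean a pair RDD of ((file, channel), samples); channels left empty are removed."""
    return (
        channel_rdd.mapValues(lambda s: clean_samples(s, clip))  # keys stay untouched
        .filter(lambda kv: len(kv[1]) > 0)
    )
```

With a local `SparkContext` named `sc`, `clean_rdd(sc.parallelize([(("a.tdms", "mic1"), [1.0, float("nan")])]), clip=2.0).collect()` returns `[(("a.tdms", "mic1"), [1.0])]`. Using `mapValues` rather than `map` keeps the existing partitioner, which helps if the RDD is later joined or reduced by key.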
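For the email alert, one reasonable design is to check thresholds on the driver after the per-channel summary has been computed and collected (it is small), rather than sending mail from inside executor tasks. The sketch below uses only the Python standard library; the `(file, channel, value)` metric tuples, the per-channel `thresholds` dictionary, the addresses, and the SMTP host and port are all hypothetical placeholders.

```python
# Sketch only: metric layout, thresholds, addresses, and SMTP settings are hypothetical.
import smtplib
from email.message import EmailMessage
from typing import Dict, List, Tuple

# (file path, channel name, metric value), e.g. built from a collected Spark DataFrame.
Metric = Tuple[str, str, float]


def find_breaches(metrics: List[Metric], thresholds: Dict[str, float]) -> List[Metric]:
    """Keep rows whose value exceeds the threshold configured for that channel."""
    return [m for m in metrics if m[1] in thresholds and m[2] > thresholds[m[1]]]


def build_alert(breaches: List[Metric], sender: str, recipient: str) -> EmailMessage:
    """Assemble a plain-text alert listing every breach."""
    msg = EmailMessage()
    msg["Subject"] = f"TDMS alert: {len(breaches)} threshold breach(es)"
    msg["From"] = sender
    msg["To"] = recipient
    lines = [f"{path} / {channel}: {value:.3f}" for path, channel, value in breaches]
    msg.set_content("\n".join(lines))
    return msg


def send_alert(msg: EmailMessage, host: str = "localhost", port: int = 25) -> None:
    """Send through an SMTP relay; host and port are placeholders for a real mail server."""
    with smtplib.SMTP(host, port) as smtp:
        smtp.send_message(msg)
```

Separating `find_breaches` and `build_alert` from `send_alert` lets the threshold logic be tested without a mail server; `send_alert` is only called when `find_breaches` returns a non-empty list.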