- Amirgodshi, S., Rajendran, M., Hall, B. & Mei, S. (2017). Mastering Machine Learning with Apache Spark 2.x. Packt Publishing.
- Teller, A., Pumperla, M. & Malohlava, M. (2015). Advanced Analytics with Spark: Patterns for Learning from Data at Scale. O’Reilly.
- A collection of scientific articles will be used in addition to the above literature.
Course content
The aim of this course is for the student to develop an in-depth understanding of big data analytics on cloud computing infrastructures, and of how software is made available in cloud services. In group projects, the student will also develop their ability to handle big data processing using tools such as Apache Spark.
The course contains the following elements:
- Ecosystem for big data processing
- Large-scale data storage (including cloud file systems, cloud object stores, archival storage)
- Data analytics with Apache Spark
- Spark’s programming model with RDD
- Spark applications with Hadoop/AWS
- Spark SQL
- Alternatives to SQL-based databases for big data
- Streaming with Spark
- Machine learning with Spark MLlib
- Advanced real-world applications with Spark
Entry requirements
- CD122A Databases (7.5 credits)
- CD631E Artificial Intelligence for Data Science (15 credits)
- CM661E Exploratory Data Analysis, Visualization and Storytelling (7.5 credits)
Course literature
Course evaluation
Malmö University provides students who participate in, or who have completed a course, with the opportunity to express their opinions and describe their experiences of the course by completing a course evaluation administered by the University. The University will compile and summarise the results of course evaluations. The University will also inform participants of the results and any decisions relating to measures taken in response to the course evaluations. The results will be made available to the students (HF 1:14).