Advanced Analytics and Machine Learning (AAML)
Block Lecture in the Summer Semester 2021
Prof. Dr. D. Kranzlmüller
Dr. Andre Luckow
Maximilian Höb
News
- 13.05.2021: The oral exam will take place on Friday, May 28, 1pm to 6pm CEST. Approximate schedule. Please be available during the entire slot.
- 13.05.2021: Project Papers
- 01.04.2021: Project Topic Assignments have been posted.
- 20.03.2021: The first lecture will take place on Saturday, March 27, 9 am s.t., Zoom Link
- 10.02.2021: Dates: The lecture will take place remotely via Zoom on March 27, March 29 to April 1, April 6, and April 10, 2021.
Content
The ongoing data deluge, driven by the increasing digitalization of science, society and industry, leads to a significant increase in demand for data storage, processing and analytics within several industrial domains. Science and industry are overwhelmed by the need to store large amounts of transactional and machine-generated data resulting from customer, service and manufacturing processes. Examples of machine-generated data are server logs and sensor data, which are generated at ever finer granularities and higher frequencies. Further, datasets are often enriched with web and open data from social media, blogs or other open data sources. The Internet of Things (IoT) will further blur the boundaries between the physical and the digital world, increasing the world's digital footprint even more. In this course, we will learn about data applications and their requirements. Further, we will discuss the core infrastructure necessary to handle large data volumes and analytical problems. As part of the exercises, students will use different frameworks, e.g., MapReduce, Spark and TensorFlow/Keras, to implement different algorithms.
This class will cover the following topics:
- Data Applications in Industry and Sciences
- Distributed Resource Management: YARN, Mesos and Kubernetes
- Data Processing Engines: Spark, Flink
- SQL Query Engines: Hive, Spark SQL, Presto
- Stream Processing: Kafka, Spark Streaming, Flink, Heron
- Machine Learning (Methods & Tools, scikit-learn, MLlib)
- Deep Learning: Convolutional Neural Networks (TensorFlow, Keras)
- Natural Language Processing: Word Embeddings, Language Models (RNNs, LSTMs, Transformers)
- Scalable Machine Learning: Distributed Training
- AI Ethics
The course will be offered as a block lecture. The lecture will be held in English.
Audience
The lecture is aimed at master's and bachelor's degree students in the computer science and data science programs.
Rules for Online Teaching
While LMU is closed, most teaching currently happens online. As teachers, we ask you to be forgiving if things do not work perfectly right away, and we hope for your constructive participation. In this situation, we would also like to explicitly point out some rules that would be self-evident in real life:
- In live meetings, we ask you to handle audio (off by default) and bandwidth (video as needed) responsibly.
- Recording or redirecting streams by participants is not allowed.
- Distributing content (video, audio, images, PDFs, etc.) through channels other than those intended by the author is not allowed.
If you violate one of these rules, you can expect to be expelled from the respective course, and we reserve the right to take further action. With everyone else, we are looking forward to the joint experiment of an "online semester".
Exercises
Exercises and the accompanying code are available at:
https://github.com/scalable-infrastructure/exercise-students-2021
Scope and Exam
The lecture is a two-hour course with an accompanying exercise (6 ECTS).
The final grade for the course is based on a project and an oral examination. Admission to the examination requires passing the exercise. To pass the course, a grade of 4.0 or better must be achieved.
Pre-Requisites
Prior attendance of the lectures on computer networks and distributed systems, operating systems, and computer architecture, or comparable knowledge, is required. Programming knowledge in Python and familiarity with the Linux command line are also required.
Time and Location
Time / Dates: March 27, March 29 to April 1, April 6, and April 10, 2021
Location: Zoom (an invite will be sent to all participants)
Enrollment: Places will be allocated via UniWorX: Uni2Work-Application.
We ask you to describe your previous knowledge in your application and to explain your motivation for participating.
Downloads
Introduction, HPC, Hadoop
Distributed Execution Engines: Spark and SQL, Introduction to Machine Learning
Deep Learning (Computer Vision)
NLP and Scalable ML
Benchmarks, MLOps, Responsible AI
Exercise Solutions
Contact
For questions or inquiries please contact Andre Luckow.