PCS951 Distributed Computing Systems for High Volume Data Processing

Course description for academic year 2021/2022

Contents and structure

Access to high data volumes is a common feature of research projects in several fields of engineering and science. Such data volumes require novel approaches to extract key contents in a timely and efficient manner. Grid computing is an approach to handling heterogeneous processing across several distributed computing facilities. Cloud computing provides seamless access to remote facilities. The course concentrates on challenges related to safe and efficient utilisation of computing resources managed by heterogeneous operators, including protection concerns between the project owner and facility management. Software technology used for such systems is applied and configured.

This course covers the technology and principles of grid and cloud computing, and gives a practical introduction to grid middleware. The course also covers topics from current research on the development and use of modern systems for distributed computing, including the use of cloud resources for grid computing. Virtualization is covered as a method to obtain task distribution on a global scale.

Learning Outcome

Upon completion of the course the candidate should be able to:

Knowledge

  • discuss challenges and solutions for high volume data processing.
  • explain the philosophy of cloud and grid computing.
  • identify tasks well suited for the different distributed computing models.
  • assess selected research papers in the field of high volume data processing.
  • explain the different cloud service models.
  • describe the different hypervisor models used for virtualization.
  • explain the MapReduce programming model.

Skills

  • define and monitor job management, storage management and security in a grid system.
  • design and implement applications of Service Oriented Computing at a global scale.
  • design, implement and run applications on a MapReduce framework.
  • design, implement and run tasks through a computer clustering management platform.

General competence

  • evaluate and apply distributed computing resources using textual and graphical interfaces.
  • revise application software to make it suitable for distributed computing.

Entry requirements

General admission criteria for the PhD programme.

Recommended previous knowledge

Experience with using a Unix/Linux operating system.

Teaching methods

There will be 2-4 smaller exercises that must be approved before the candidate can sit the exam. In addition, there will be a larger project covering problems to be solved using modern systems for distributed computing, cloud computing or virtualization. The project should include both a theoretical study and a practical problem solution. The theoretical study should be presented as a lecture and the practical solution in a shorter oral presentation. The project should also be documented in a written report covering both the theoretical study and the practical problem solution.

Compulsory learning activities

There will be 2-4 smaller exercises that must be approved before the candidate can sit the exam.

The assignments must be submitted within set deadlines and must be approved before examination can take place.

Approved assignments are valid for the examination semester and the two following semesters.

Assessment

Grading is according to the A-F scale, based on an oral exam and the project report. The project report counts for 40% of the final grade and the oral exam for 60%.

Both parts must receive a passing grade for a final grade to be awarded for the course. If one of the parts receives a failing grade, that part can be retaken as a resit/postponed exam.

Course reductions

  • DAT351 (1) - Distributed Computing Resources for High Volume Data Processing - Reduction: 10 study points