Big Data Analytics: Distributed Computing Environment & Map Reduce

kr 303,07 NOK

Course Overview
This study material covers the essentials of distributed computing and the Map-Reduce paradigm: the underlying technology stack, the need for computational models, and parallel computing concepts and their application with Map-Reduce. It also shows how Map-Reduce processes data and how it is implemented in Hadoop.

Key Topics Covered:

  • Introduction:
    • Technology Stack: Overview of the technology stack used in distributed computing environments.
    • Computational Models: Understanding why computational models are essential for managing complex data processing tasks.
    • Shared Memory Infrastructure: Explore the shared memory model and its role in parallel computing.
    • Distributed Infrastructure: Learn about distributed computing infrastructure and how it supports scalable and efficient data processing.
  • Parallel Computing Speedup:
    • Parallel Computing / Speedup: Concepts of speedup in parallel computing and how parallel tasks can reduce computation time.
    • Parallel Computing / Efficiency: Discussion of efficiency in parallel computing and the factors that affect it.
    • Considerations: Key considerations when implementing parallel computing solutions.
  • Example: Counting Words:
    • Word Count Example: Practical example of a word count problem, demonstrating different paradigms.
    • Paradigms:
      • Shared Memory: Implementation of the word count using the shared memory paradigm.
      • Message Passing: Implementation of the word count using the message-passing paradigm.
  • Map-Reduce:
    • Overview: Introduction to the Map-Reduce paradigm and its significance in distributed data processing.
    • Key-Value Input Data: Understanding the key-value input data format used in Map-Reduce.
    • Map-Reduce Idea: Basic concepts and principles behind the Map-Reduce model.
    • The Paradigm - Formally: Formal description of the Map-Reduce paradigm and its components.
    • Map-Reduce Driver Algorithm: Detailed explanation of the Map-Reduce driver algorithm and its workflow.
    • Word Count Example: Step-by-step example of implementing word count using Map-Reduce.
    • Hadoop Example:
      • Map: Understanding the map phase in Hadoop’s Map-Reduce framework.
      • Reduce: Understanding the reduce phase in Hadoop’s Map-Reduce framework.
      • Main: Overview of the main components and execution process in a Hadoop Map-Reduce job.
    • Execution and Fault Tolerance: Insights into how Map-Reduce handles job execution and ensures fault tolerance.
    • Parallel Efficiency of Map-Reduce: Evaluation of the parallel efficiency of Map-Reduce and factors affecting its performance.

Why Choose This Material?

  • Comprehensive understanding of distributed computing and Map-Reduce.
  • Practical examples and exercises to illustrate concepts and implementations.
  • Ideal for students, data engineers, and Big Data professionals who want to master Map-Reduce and its application in distributed environments.

This material suits anyone who wants to understand distributed computing techniques and apply Map-Reduce to efficient data processing.
