Components of Apache Hadoop 2.6.X and later

The Apache Hadoop system is divided into components, each with a specific role, to achieve a loosely coupled architecture.

Following are the components:

  • Commons:
    1. Apache Hadoop provides web support for monitoring HDFS via a proxy server.
    2. Metrics can now be written directly to Graphite (a sample sink configuration appears after this list).
  • HDFS
    1. The Hadoop Distributed File System (HDFS) is a distributed file system designed to run on commodity hardware. It has many similarities with existing distributed file systems. However, the differences from other distributed file systems are significant.
    2. HDFS is highly fault-tolerant and is designed to be deployed on low-cost hardware. HDFS provides high-throughput access to application data and is suitable for applications that have large data sets.
    3. HDFS relaxes a few POSIX requirements to enable streaming access to file system data.
    4. HDFS was originally built as infrastructure for the Apache Nutch web search engine project. HDFS is now an Apache Hadoop subproject. (A minimal client sketch appears after this list.)
  • MapReduce
    1. The key technologies of Hadoop are the MapReduce programming model and the Hadoop Distributed File System.
    2. Operating on large data sets is not feasible in a serial programming paradigm.
    3. MapReduce runs tasks in parallel to accomplish the work in less time, which is the main aim of this technology. MapReduce requires a special file system: in real scenarios the data runs to petabytes, and to store and maintain that much data on distributed commodity hardware, the Hadoop Distributed File System was invented. It is largely inspired by the Google File System.
    4. MapReduce is a framework for processing highly distributable problems across huge data sets using a large number of computers (nodes), collectively referred to as a cluster (if all nodes use the same hardware) or a grid (if the nodes use different hardware). (A word-count sketch appears after this list.)
    5. Computational processing can occur on data stored either in a file system (unstructured) or in a database (structured).
  • YARN
    1. A REST API was introduced to submit and monitor running jobs (a monitoring sketch appears after this list).
    2. YARN supports task scheduling among the nodes for a given job; the simplest policy, the FIFO Scheduler, allocates tasks to nodes in submission order.
    3. The Fair and Capacity Schedulers are also available; they were introduced by Facebook and Yahoo! respectively for their specific business requirements.
    4. A scheduler can be developed to suit a business requirement and plugged into the Hadoop system as part of YARN (a configuration sketch appears after this list).
    5. Resource management and job scheduling/monitoring are the major functions of YARN. The idea is to have a global ResourceManager (RM) and a per-application ApplicationMaster (AM).
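
For the Graphite point above, here is a minimal sketch of what the sink wiring in hadoop-metrics2.properties can look like. The host, port, and prefix values are placeholders; check the option names against the Metrics documentation for your release.

```
# Route NameNode metrics to a Graphite server (values below are placeholders).
*.sink.graphite.class=org.apache.hadoop.metrics2.sink.GraphiteSink
namenode.sink.graphite.server_host=graphite.example.com
namenode.sink.graphite.server_port=2003
namenode.sink.graphite.metrics_prefix=hadoop.namenode
```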
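To make the HDFS description concrete, here is a minimal Java sketch that writes a file and streams it back through the FileSystem API. The path and contents are illustrative, and it assumes a core-site.xml on the classpath whose fs.defaultFS points at a running cluster.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsRoundTrip {
    public static void main(String[] args) throws Exception {
        // Picks up fs.defaultFS from core-site.xml on the classpath.
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        Path file = new Path("/tmp/hdfs-demo.txt"); // illustrative path

        // HDFS favors large streaming writes over random updates.
        try (FSDataOutputStream out = fs.create(file, true)) {
            out.writeBytes("hello hdfs\n");
        }

        // Read the data back as a stream.
        try (FSDataInputStream in = fs.open(file)) {
            byte[] buf = new byte[64];
            int n = in.read(buf);
            System.out.write(buf, 0, n);
            System.out.flush();
        }
    }
}
```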
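The classic word-count example shows the MapReduce model end to end: a map phase that emits (word, 1) pairs and a reduce phase that sums them. Input and output paths come from the command line; class names here are illustrative.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    // Map: emit (word, 1) for every token in the input split.
    public static class TokenMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context ctx)
                throws IOException, InterruptedException {
            for (String token : value.toString().split("\\s+")) {
                if (!token.isEmpty()) {
                    word.set(token);
                    ctx.write(word, ONE);
                }
            }
        }
    }

    // Reduce: sum the counts for each word.
    public static class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context ctx)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) sum += v.get();
            ctx.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenMapper.class);
        job.setCombinerClass(SumReducer.class);
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```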
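For the YARN REST API item, a small sketch that asks the ResourceManager's web service for the list of applications. The host and port are assumptions about the cluster (8088 is the conventional RM web port); the response is JSON.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public class RmRestMonitor {
    public static void main(String[] args) throws Exception {
        // ResourceManager web service; host and port are placeholders.
        URL url = new URL("http://localhost:8088/ws/v1/cluster/apps");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestProperty("Accept", "application/json");

        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream()))) {
            String line;
            while ((line = in.readLine()) != null) {
                System.out.println(line); // JSON list of applications
            }
        } finally {
            conn.disconnect();
        }
    }
}
```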
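For scheduler selection, the relevant property is yarn.resourcemanager.scheduler.class. On a real cluster it is normally set in yarn-site.xml; the sketch below sets it through YarnConfiguration purely to show the property and the stock pluggable scheduler classes.

```java
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class SchedulerConfigDemo {
    public static void main(String[] args) {
        YarnConfiguration conf = new YarnConfiguration();

        // yarn.resourcemanager.scheduler.class selects the pluggable scheduler.
        conf.set(YarnConfiguration.RM_SCHEDULER,
                "org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler");
        // Other stock choices:
        //   org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler
        //   org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler

        System.out.println(YarnConfiguration.RM_SCHEDULER + " = "
                + conf.get(YarnConfiguration.RM_SCHEDULER));
    }
}
```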

More detail on each component is available in the respective component articles.
