Running Hadoop MapReduce jobs with ScaleOut hServer

ScaleOut hServer executes MapReduce jobs without using the Hadoop job tracker/task tracker infrastructure. Instead, the work is performed by an invocation grid (IG): a set of worker JVMs, each started by its corresponding IMDG grid service. Intermediate data passed between mappers and reducers is stored in the IMDG. If the input and output formats specified for the MapReduce job do not use HDFS as a data store, there is no need to install a Hadoop distribution (Apache or otherwise) or to start Hadoop processes on the IMDG servers. If the job reads its input from or writes its output to HDFS, the name nodes and data nodes must be running for the job to complete.
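
As an illustration, a job driver for hServer can look like the following minimal sketch. It assumes the hServer distribution provides an HServerJob class (here taken to be in com.scaleoutsoftware.soss.hserver) that substitutes for org.apache.hadoop.mapreduce.Job; treat the class and package names, and the constructor signature, as assumptions to check against your distribution. The TokenizerMapper and IntSumReducer classes it references are sketched under Requirements below.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    import com.scaleoutsoftware.soss.hserver.HServerJob; // assumed hServer class

    public class WordCountDriver {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();

            // HServerJob (assumed) stands in for org.apache.hadoop.mapreduce.Job;
            // the invocation grid workers, not the Hadoop task trackers, run the tasks.
            HServerJob job = new HServerJob(conf, "word count");
            job.setJarByClass(WordCountDriver.class);

            job.setMapperClass(TokenizerMapper.class);
            job.setCombinerClass(IntSumReducer.class);
            job.setReducerClass(IntSumReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);

            // HDFS paths are only needed because this job's input/output formats
            // use HDFS; in that case the name nodes and data nodes must be running.
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));

            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }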

Requirements

The following requirements apply to MapReduce applications executed using ScaleOut hServer:

  • The job must use the new MapReduce API (org.apache.hadoop.mapreduce).
  • If a combiner is specified, it must emit at most one key-value pair per call, and the emitted key must be identical to the key passed in.
  • The input/output keys and values of the mapper and the reducer must implement Writable or Serializable. If sorting is enabled, the mapper output key must also implement WritableComparable or Comparable. (A mapper, combiner, and reducer meeting these requirements are sketched after this list.)
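
As a concrete sketch of these requirements, the classic word-count mapper and reducer below use the new API (org.apache.hadoop.mapreduce) and Writable key/value types; Text also implements WritableComparable, so the mapper output can be sorted. Reused as the combiner, IntSumReducer satisfies the combiner constraint above: each call emits exactly one key-value pair, and the emitted key is the key that was passed in. The class names are illustrative, and in practice each public class lives in its own .java file.

    import java.io.IOException;
    import java.util.StringTokenizer;

    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;

    // New-API mapper: the output key (Text) implements WritableComparable,
    // so sorting is supported; the output value (IntWritable) implements Writable.
    public class TokenizerMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, ONE);
            }
        }
    }

    // Used as both the combiner and the reducer. As a combiner it meets the
    // constraint above: each reduce() call emits exactly one key-value pair,
    // keyed by the same key it received.
    public class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private final IntWritable result = new IntWritable();

        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result); // one pair per call, same key
        }
    }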