@InterfaceAudience.Public @InterfaceStability.Stable public interface Reducer<K2,V2,K3,V3> extends JobConfigurable, Closeable
The number of Reducers for the job is set by the user via 
 JobConf.setNumReduceTasks(int). Reducer implementations 
 can access the JobConf for the job via the 
 JobConfigurable.configure(JobConf) method and initialize themselves. 
 Similarly, they can use the Closeable.close() method for
 de-initialization.
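A minimal driver sketch of this lifecycle, assuming an illustrative driver class and the MyReducer from the example further below (the reducer count of 10 is arbitrary):

     import org.apache.hadoop.mapred.JobClient;
     import org.apache.hadoop.mapred.JobConf;

     public class MyJobDriver {
       public static void main(String[] args) throws Exception {
         JobConf job = new JobConf(MyJobDriver.class);
         job.setNumReduceTasks(10);             // user-chosen reducer count
         job.setReducerClass(MyReducer.class);  // configure() runs before,
                                                // close() after, the reduce calls
         JobClient.runJob(job);
       }
     }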
Reducer has 3 primary phases:
1. Shuffle

   Reducer is input the grouped output of a Mapper. In this phase the
   framework, for each Reducer, fetches the relevant partition of the
   output of all the Mappers, via HTTP.
2. Sort

   The framework groups Reducer inputs by keys
   (since different Mappers may have output the same key) in this
   stage.

   The shuffle and sort phases occur simultaneously, i.e. while outputs
   are being fetched they are merged.
   SecondarySort

   If equivalence rules for keys while grouping the intermediates are
   different from those for grouping keys before reduction, then one may
   specify a Comparator via
   JobConf.setOutputValueGroupingComparator(Class). Since
   JobConf.setOutputKeyComparatorClass(Class) can be used to
   control how intermediate keys are sorted, these can be used in conjunction
   to simulate secondary sort on values (see the sketch after this phase list).
3. Reduce

   In this phase the
   reduce(Object, Iterator, OutputCollector, Reporter)
   method is called for each <key, (list of values)> pair in
   the grouped inputs.
The output of the reduce task is typically written to the 
   FileSystem via 
   OutputCollector.collect(Object, Object).
The output of the Reducer is not re-sorted.
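As a hedged sketch of the secondary-sort wiring described above, assuming a composite key that carries a natural key plus a secondary value part, and two hypothetical RawComparator implementations (FullKeyComparator orders by the full composite key; NaturalKeyGroupingComparator compares only the natural-key part):

     import org.apache.hadoop.mapred.JobConf;

     public class SecondarySortSetup {
       // FullKeyComparator and NaturalKeyGroupingComparator are
       // hypothetical RawComparators over a composite key.
       static void wire(JobConf job) {
         // Sort by the full composite key, so each reduce() call sees its
         // values already ordered by the secondary part.
         job.setOutputKeyComparatorClass(FullKeyComparator.class);
         // Group reduce input by the natural key alone, so all values that
         // share it arrive in a single reduce() call.
         job.setOutputValueGroupingComparator(NaturalKeyGroupingComparator.class);
       }
     }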
Example:
     import java.io.IOException;
     import java.util.Iterator;

     import org.apache.hadoop.io.Writable;
     import org.apache.hadoop.io.WritableComparable;
     import org.apache.hadoop.mapred.JobConf;
     import org.apache.hadoop.mapred.MapReduceBase;
     import org.apache.hadoop.mapred.OutputCollector;
     import org.apache.hadoop.mapred.Reducer;
     import org.apache.hadoop.mapred.Reporter;
     import org.apache.hadoop.mapreduce.JobContext;

     public class MyReducer<K extends WritableComparable, V extends Writable>
     extends MapReduceBase implements Reducer<K, V, K, V> {
     
       static enum MyCounters { NUM_RECORDS }
        
       private String reduceTaskId;
       private int noKeys = 0;
       
       public void configure(JobConf job) {
         reduceTaskId = job.get(JobContext.TASK_ATTEMPT_ID);
       }
       
       public void reduce(K key, Iterator<V> values,
                          OutputCollector<K, V> output, 
                          Reporter reporter)
       throws IOException {
       
         // Process
         int noValues = 0;
         while (values.hasNext()) {
           V value = values.next();
           
           // Increment the no. of values for this key
           ++noValues;
           
           // Process the <key, value> pair (assume this takes a while)
           // ...
           // ...
           
           // Let the framework know that we are alive, and kicking!
           if ((noValues%10) == 0) {
             reporter.progress();
           }
         
           // Process some more
           // ...
           // ...
           
           // Output the <key, value> 
           output.collect(key, value);
         }
         
         // Increment the no. of <key, list of values> pairs processed
         ++noKeys;
         
         // Increment counters
          reporter.incrCounter(MyCounters.NUM_RECORDS, 1);
         
         // Every 100 keys update application-level status
         if ((noKeys%100) == 0) {
           reporter.setStatus(reduceTaskId + " processed " + noKeys);
         }
       }
     }
See Also:
   Mapper, Partitioner, Reporter, MapReduceBase

| Modifier and Type | Method and Description |
|---|---|
| void | reduce(K2 key, Iterator<V2> values, OutputCollector<K3,V3> output, Reporter reporter): Reduces values for a given key. |
Methods inherited from interface JobConfigurable:
   configure

reduce
void reduce(K2 key, Iterator<V2> values, OutputCollector<K3,V3> output, Reporter reporter) throws IOException
The framework calls this method for each 
 <key, (list of values)> pair in the grouped inputs.
 Output values must be of the same type as input values.  Input keys must 
 not be altered. The framework will reuse the key and value objects
 that are passed into the reduce, therefore applications should clone
 the objects they want to keep a copy of. In many cases, all values are 
 combined into zero or one value.
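For instance, a reducer that must retain values beyond the current iteration could copy them first; a sketch assuming Text values (WritableUtils.clone deep-copies a Writable via serialization):

     import java.util.ArrayList;
     import java.util.Iterator;
     import java.util.List;

     import org.apache.hadoop.conf.Configuration;
     import org.apache.hadoop.io.Text;
     import org.apache.hadoop.io.WritableUtils;

     class ValueBuffer {
       // The framework re-fills the same Text instance on every next(),
       // so each value is cloned before being stored.
       static List<Text> keepAll(Iterator<Text> values, Configuration conf) {
         List<Text> copies = new ArrayList<Text>();
         while (values.hasNext()) {
           Text reused = values.next();                   // reused instance
           copies.add(WritableUtils.clone(reused, conf)); // safe deep copy
         }
         return copies;
       }
     }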
 
Output pairs are collected with calls to  
 OutputCollector.collect(Object,Object).
Applications can use the Reporter provided to report progress 
 or just indicate that they are alive. In scenarios where the application 
 takes a significant amount of time to process individual key/value 
 pairs, this is crucial since the framework might assume that the task has 
 timed-out and kill that task. The other way of avoiding this is to set
 mapreduce.task.timeout to a high-enough value (or even zero for no
 time-outs).
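A minimal sketch of that alternative, set on the job configuration before submission (the 30-minute value is purely illustrative):

     import org.apache.hadoop.mapred.JobConf;

     public class TimeoutSetup {
       static void raiseTimeout(JobConf job) {
         // Value is in milliseconds; 0 disables task time-outs entirely.
         job.setLong("mapreduce.task.timeout", 30 * 60 * 1000L);
       }
     }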
Parameters:
   key - the key.
   values - the list of values to reduce.
   output - to collect keys and combined values.
   reporter - facility to report progress.
Throws:
   IOException