hadoop - How to run MapReduce on an HBase exported table




I ran the following commands to export an HBase table and copy the output from HDFS to the local filesystem:

hbase org.apache.hadoop.hbase.mapreduce.Export "financiallineitem" "export/output"

hadoop fs -copyToLocal export/output/part-m-00000 /home/cloudera/trf/sequence

The first command kicked off a MapReduce job and transferred the table data to HDFS in SequenceFile format.

Now I want to write a MapReduce job that reads the key/value pairs from this sequence file, but I am getting the following error:

17/08/21 20:43:38 WARN mapred.LocalJobRunner: job_local386751553_0001
java.lang.Exception: java.io.IOException: Could not find a deserializer for the Value class: 'org.apache.hadoop.hbase.client.Result'. Please ensure that the configuration 'io.serializations' is properly configured, if you're using custom serialization.
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:406)
Caused by: java.io.IOException: Could not find a deserializer for the Value class: 'org.apache.hadoop.hbase.client.Result'. Please ensure that the configuration 'io.serializations' is properly configured, if you're using custom serialization.
    at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1973)

This is the driver code:

public class Driver extends Configured implements Tool {

    private static Configuration hbaseConf = null;

    private Configuration getHbaseConfiguration() {
        try {
            if (hbaseConf == null) {
                hbaseConf = HBaseConfiguration.create();
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
        return hbaseConf;
    }

    public static void main(String[] args) throws Exception {
        int exitCode = ToolRunner.run(new Driver(), args);
        System.exit(exitCode);
    }

    public int run(String[] args) throws Exception {
        if (args.length != 2) {
            //System.err.printf("Usage: %s needs 2 arguments   files\n", getClass().getSimpleName());
            return -1;
        }

        String outputPath = args[1];

        if (hbaseConf == null)
            hbaseConf = getHbaseConfiguration();

        FileSystem hfs = FileSystem.get(getConf());
        Job job = new Job();   // note: starts from a fresh Configuration, not hbaseConf
        job.setJarByClass(Driver.class);
        job.setJobName("SequenceFileReader");

        HdfsUtil.removeHdfsSubDirIfExists(hfs, new Path(outputPath), true);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        job.setInputFormatClass(SequenceFileInputFormat.class);
        job.setOutputKeyClass(ImmutableBytesWritable.class);
        job.setOutputValueClass(Result.class);
        job.setMapperClass(Map.class);
        job.setNumReduceTasks(0);

        int returnValue = job.waitForCompletion(true) ? 0 : 1;

        if (job.isSuccessful()) {
            System.out.println("Successful");
        } else if (!job.isSuccessful()) {
            System.out.println("Failed");
        }

        return returnValue;
    }
}

And below is the mapper code:

public class Map extends Mapper<ImmutableBytesWritable, Result, Text, Text> {

    private Text text = new Text();

    @Override
    public void map(ImmutableBytesWritable key, Result value, Context context) throws IOException, InterruptedException {
        // currently writes the same empty Text for both key and value
        context.write(text, text);
    }
}

I don't know what I am missing here.

Here is what was needed to make it work.

Because we use HBase to store our data, and this reducer outputs its result to an HBase table, Hadoop is telling us that it doesn't know how to serialize our data. That is why we need to help it. Inside setup, set the io.serializations variable:

hbaseConf.setStrings("io.serializations", new String[]{hbaseConf.get("io.serializations"), MutationSerialization.class.getName(), ResultSerialization.class.getName()});
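One wiring detail is easy to miss: the driver above creates the job with new Job(), which starts from a fresh Configuration, so a setting made only on hbaseConf never reaches the tasks. Below is a minimal sketch of the relevant driver lines, assuming the rest of the driver stays as posted (MutationSerialization and ResultSerialization come from org.apache.hadoop.hbase.mapreduce):

// Register HBase's serializers alongside the Hadoop defaults so the
// sequence file reader can deserialize org.apache.hadoop.hbase.client.Result values.
hbaseConf.setStrings("io.serializations",
        hbaseConf.get("io.serializations"),
        MutationSerialization.class.getName(),
        ResultSerialization.class.getName());

// Build the job from that configuration so the setting actually ships with the job.
Job job = Job.getInstance(hbaseConf, "SequenceFileReader");
job.setJarByClass(Driver.class);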




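With the serializers registered, the mapper receives fully deserialized Result objects. Since the mapper posted in the question writes an empty Text for both key and value, here is a sketch of one that emits the row key and a single cell; the column family "cf", the qualifier "q", and the class name ExportReadMapper are hypothetical placeholders, not names from the original table:

import java.io.IOException;

import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class ExportReadMapper extends Mapper<ImmutableBytesWritable, Result, Text, Text> {

    private final Text rowKey = new Text();
    private final Text cellValue = new Text();

    @Override
    public void map(ImmutableBytesWritable key, Result value, Context context)
            throws IOException, InterruptedException {
        // Row key of the exported row
        rowKey.set(Bytes.toStringBinary(key.get(), key.getOffset(), key.getLength()));

        // One cell from the row; "cf" and "q" are placeholder names
        byte[] cell = value.getValue(Bytes.toBytes("cf"), Bytes.toBytes("q"));
        if (cell != null) {
            cellValue.set(Bytes.toString(cell));
            context.write(rowKey, cellValue);
        }
    }
}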