hadoop - How to run MapReduce on an HBase exported table
I ran the following command to export an HBase table, which wrote the data to HDFS:

hbase org.apache.hadoop.hbase.mapreduce.Export "financiallineitem" "export/output"

Then I copied the output to the local file system:

hadoop fs -copyToLocal export/output/part-m-00000 /home/cloudera/trf/sequence
The Export command kicked off a MapReduce job and transferred the table data to HDFS in SequenceFile format.

Now I want to write a MapReduce job that reads the key/value pairs from that sequence file, but I am getting the following error:
17/08/21 20:43:38 WARN mapred.LocalJobRunner: job_local386751553_0001
java.lang.Exception: java.io.IOException: Could not find a deserializer for the Value class: 'org.apache.hadoop.hbase.client.Result'. Please ensure that the configuration 'io.serializations' is properly configured, if you're using custom serialization.
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:406)
Caused by: java.io.IOException: Could not find a deserializer for the Value class: 'org.apache.hadoop.hbase.client.Result'. Please ensure that the configuration 'io.serializations' is properly configured, if you're using custom serialization.
    at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1973)
This is my driver code:
public class Driver extends Configured implements Tool {

    private static Configuration hbaseConf = null;

    private Configuration getHbaseConfiguration() {
        try {
            if (hbaseConf == null) {
                hbaseConf = HBaseConfiguration.create();
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
        return hbaseConf;
    }

    public static void main(String[] args) throws Exception {
        int exitCode = ToolRunner.run(new Driver(), args);
        System.exit(exitCode);
    }

    public int run(String[] args) throws Exception {
        if (args.length != 2) {
            //System.err.printf("Usage: %s needs 2 arguments files\n", getClass().getSimpleName());
            return -1;
        }
        String outputPath = args[1];

        if (hbaseConf == null)
            hbaseConf = getHbaseConfiguration();
        FileSystem hfs = FileSystem.get(getConf());
        Job job = new Job();
        job.setJarByClass(Driver.class);
        job.setJobName("SequenceFileReader");

        HdfsUtil.removeHdfsSubDirIfExists(hfs, new Path(outputPath), true);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        job.setInputFormatClass(SequenceFileInputFormat.class);
        job.setOutputKeyClass(ImmutableBytesWritable.class);
        job.setOutputValueClass(Result.class);
        job.setMapperClass(Map.class);
        job.setNumReduceTasks(0);

        int returnValue = job.waitForCompletion(true) ? 0 : 1;

        if (job.isSuccessful()) {
            System.out.println("successful");
        } else if (!job.isSuccessful()) {
            System.out.println("failed");
        }
        return returnValue;
    }
}
And below is my mapper code:
public class Map extends Mapper<ImmutableBytesWritable, Result, Text, Text> {

    private Text text = new Text();

    @Override
    public void map(ImmutableBytesWritable key, Result value, Context context)
            throws IOException, InterruptedException {
        context.write(text, text);
    }
}
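As an aside, the mapper above only ever writes an empty Text value. A minimal sketch of a mapper that actually pulls a cell out of the exported Result might look like the following; the column family "cf" and qualifier "col" are placeholders, not names from the real table:

public class SequenceFileMapper extends Mapper<ImmutableBytesWritable, Result, Text, Text> {

    @Override
    public void map(ImmutableBytesWritable key, Result value, Context context)
            throws IOException, InterruptedException {
        // Placeholder family/qualifier -- replace with the real column names.
        byte[] cell = value.getValue(Bytes.toBytes("cf"), Bytes.toBytes("col"));
        if (cell != null) {
            // Emit the row key and the cell value as text.
            context.write(new Text(Bytes.toString(key.copyBytes())),
                          new Text(Bytes.toString(cell)));
        }
    }
}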
I don't know what I am missing here.
Here is what I needed to do to make it work.
Because we use HBase to store our data and this reducer outputs its result to an HBase table, Hadoop is telling us that it doesn't know how to serialize our data. That is why we need to help it. Inside setUp, set the io.serializations variable:
hbaseConf.setStrings("io.serializations", new String[]{hbaseConf.get("io.serializations"), MutationSerialization.class.getName(), ResultSerialization.class.getName()});
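Note that the driver in the question creates its Job with new Job(), which uses a fresh Configuration, so a setting made on hbaseConf would never reach that job. Below is a minimal sketch, reusing the class, mapper and job names from the driver above, of how the serialization setting and the job creation could be tied together; only ResultSerialization is strictly required for reading the exported Result values:

Configuration hbaseConf = HBaseConfiguration.create();
// Register the HBase Result serialization on top of whatever is already configured.
hbaseConf.setStrings("io.serializations",
        hbaseConf.get("io.serializations"),
        ResultSerialization.class.getName());

// Build the Job from this configuration so the setting actually reaches the tasks.
Job job = Job.getInstance(hbaseConf, "SequenceFileReader");
job.setJarByClass(Driver.class);
job.setInputFormatClass(SequenceFileInputFormat.class);
FileInputFormat.addInputPath(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));
job.setMapperClass(Map.class);
job.setNumReduceTasks(0);

MutationSerialization only matters when the job also writes Put/Delete mutations back to an HBase table; for just reading the exported sequence file, ResultSerialization is the part that resolves the "could not find a deserializer" error.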