Query GitHub data using Hadoop
I am trying to query the GitHub data provided by the GHTorrent API using Hadoop. How can I load this much data (4-5 TB) into HDFS? Also, their databases are real time. Is it possible to process real-time data in Hadoop using tools such as Pig, Hive, and HBase?
Go through this presentation. It describes how you can connect to their MySQL or MongoDB instance and fetch the data. You have to share your public key; they add that key to their repository, and then you can SSH in. As an alternative, you can download their periodic dumps from this link.
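If you go the dump route, the downloaded files still have to be copied into HDFS. A minimal sketch of that step, assuming the dumps already sit on a machine with the Hadoop client installed: the file names and the target directory below are made up for illustration, and the helper only builds the `hdfs dfs` commands rather than running them.

```python
import shlex
from pathlib import PurePosixPath


def hdfs_upload_commands(local_files, hdfs_dir):
    """Build the shell commands that would copy local dump files into HDFS.

    Nothing is executed here; the caller decides how to run the commands.
    """
    # Create the target directory first (-p: no error if it already exists).
    commands = [["hdfs", "dfs", "-mkdir", "-p", hdfs_dir]]
    for local in local_files:
        # Keep the original file name under the HDFS target directory.
        target = str(PurePosixPath(hdfs_dir) / PurePosixPath(local).name)
        commands.append(["hdfs", "dfs", "-put", local, target])
    return commands


if __name__ == "__main__":
    # Hypothetical dump files and HDFS directory, for illustration only.
    dumps = [
        "/data/ghtorrent/dump-part1.tar.gz",
        "/data/ghtorrent/dump-part2.tar.gz",
    ]
    for cmd in hdfs_upload_commands(dumps, "/user/hadoop/ghtorrent"):
        print(shlex.join(cmd))
```

You can paste the printed commands into a shell or pass each list to `subprocess.run`. Compressed archives would probably need to be unpacked before Pig or Hive can read the contents.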
Important link:
For processing real-time data, you cannot use Pig or Hive; they are batch processing tools. Consider using Apache Spark.
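To make the batch versus streaming difference concrete: Spark Streaming handles a live feed by cutting it into small batches and updating its results after each one. The sketch below imitates that idea in plain Python. It is not Spark's API, and the event records, function names, and batch size are assumptions chosen for illustration.

```python
from collections import Counter
from itertools import islice


def micro_batches(events, batch_size):
    """Yield successive lists of up to batch_size events from any iterable."""
    it = iter(events)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


def running_event_counts(events, batch_size=2):
    """Return a snapshot of per-type counts after each micro-batch."""
    totals = Counter()
    snapshots = []
    for batch in micro_batches(events, batch_size):
        # Update the running totals as soon as a batch arrives,
        # instead of waiting for the whole data set as a batch job would.
        totals.update(event["type"] for event in batch)
        snapshots.append(dict(totals))
    return snapshots


if __name__ == "__main__":
    # Sample events for illustration; a real feed would be unbounded.
    feed = [{"type": "PushEvent"}, {"type": "ForkEvent"}, {"type": "PushEvent"}]
    for snapshot in running_event_counts(feed, batch_size=2):
        print(snapshot)
```

A Pig or Hive job would only produce its counts after reading the full input, whereas here a result is available after every small batch.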
wiki