Query GitHub data using Hadoop




I am trying to query GitHub data provided by the GHTorrent API using Hadoop. How can I load this data (4-5 TB) into HDFS? Also, their databases are real time. Is it possible to process real-time data in Hadoop using tools such as Pig, Hive, or HBase?

Go through this presentation. It describes how you can connect to their MySQL or MongoDB instance and fetch the data. You have to share your public key; they will add that key to their repository, and then you can SSH in. As an alternative, you can download their periodic dumps from this link.
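Once the dumps are downloaded to a local disk, one way to get them into HDFS is the standard `hdfs dfs -put` shell command. The sketch below only builds and (optionally) runs those commands; the local file path and the HDFS target directory are hypothetical placeholders, and it assumes the `hdfs` client is installed and on the PATH.

```python
import shlex
import subprocess


def build_hdfs_put_commands(local_files, hdfs_dir):
    """Return the `hdfs dfs` commands that copy local files into hdfs_dir."""
    # Create the target directory first (-p: no error if it already exists).
    commands = [["hdfs", "dfs", "-mkdir", "-p", hdfs_dir]]
    for local_file in local_files:
        # -f overwrites a file of the same name already present in HDFS.
        commands.append(["hdfs", "dfs", "-put", "-f", local_file, hdfs_dir])
    return commands


def run_commands(commands, dry_run=True):
    """Print the commands (dry run) or execute them one by one."""
    for command in commands:
        if dry_run:
            print(shlex.join(command))
        else:
            # check=True raises CalledProcessError if a command fails.
            subprocess.run(command, check=True)


if __name__ == "__main__":
    # Hypothetical dump file and HDFS directory, for illustration only.
    dumps = ["/data/ghtorrent/dump-part-1.tar.gz"]
    run_commands(build_hdfs_put_commands(dumps, "/user/hadoop/ghtorrent"))
```

With `dry_run=True` the script only prints the commands, so they can be checked before copying several terabytes.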


For processing real-time data, you cannot use Pig or Hive. Those are batch processing tools. Consider using Apache Spark.
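To make the difference from batch tools concrete: Spark Streaming treats a live stream as a sequence of small batches that are processed as they arrive. The following is a plain-Python sketch of that micro-batching idea only, not Spark code; the event names and the 5-second window are made up for illustration, and events are assumed to arrive ordered by timestamp.

```python
from collections import Counter


def micro_batches(events, batch_seconds):
    """Group (timestamp, value) events into consecutive fixed-length windows.

    Assumes events are ordered by timestamp. Windows with no events are
    yielded as empty lists.
    """
    batch, window_end = [], None
    for timestamp, value in events:
        if window_end is None:
            # The first event opens the first window.
            window_end = timestamp + batch_seconds
        while timestamp >= window_end:
            # Current window is over: hand it off and open the next one.
            yield batch
            batch, window_end = [], window_end + batch_seconds
        batch.append(value)
    if batch:
        yield batch


# Hypothetical GitHub event types with timestamps in seconds.
events = [(0, "push"), (1, "fork"), (5, "push"), (6, "push")]
counts = [Counter(batch) for batch in micro_batches(events, batch_seconds=5)]
```

Each small batch can be aggregated as soon as its window closes, instead of waiting for the whole dataset as a Pig or Hive job would.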




