Query GitHub data using Hadoop




I am trying to query the GitHub data provided by the GHTorrent API using Hadoop. How can I load this much data (4-5 TB) into HDFS? Also, their databases are real time. Is it possible to process real-time data in Hadoop using tools such as Pig, Hive, or HBase?

Go through this presentation. It describes how you can connect to their MySQL or MongoDB instance and fetch the data. You have to share your public key; they will add that key to their repository, and then you can SSH in. As an alternative, you can download their periodic dumps from this link.
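If you go the dump route, one way to get the downloaded files into HDFS is the standard `hdfs dfs -put` shell command. The following is only a minimal sketch: the helper names `hdfs_put_commands` and `run`, the local file path, and the HDFS target directory are all hypothetical, and it assumes the `hdfs` client is installed and configured on the machine holding the dumps.

```python
import subprocess
from pathlib import PurePosixPath


def hdfs_put_commands(local_files, hdfs_dir):
    """Build the `hdfs dfs` commands that copy local dump files into HDFS.

    Returns a list of argument lists: one `-mkdir -p` for the target
    directory, then one `-put -f` per local file.
    """
    commands = [["hdfs", "dfs", "-mkdir", "-p", hdfs_dir]]
    for local in local_files:
        # Keep the original file name under the HDFS target directory.
        target = str(PurePosixPath(hdfs_dir) / PurePosixPath(local).name)
        commands.append(["hdfs", "dfs", "-put", "-f", local, target])
    return commands


def run(commands, dry_run=True):
    """Print the commands (dry run) or execute them with the hdfs client."""
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical paths; replace with your downloaded dump files.
    cmds = hdfs_put_commands(["/data/ghtorrent/dump.tar.gz"], "/user/me/ghtorrent")
    run(cmds, dry_run=True)
```

With `dry_run=True` the script only prints the commands, so you can check them before copying several terabytes. For data pulled straight from the MySQL instance rather than from dumps, a dedicated import tool may be a better fit than copying files by hand.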

Important link:

For processing real-time data, you cannot use Pig or Hive; they are batch processing tools. Consider using Apache Spark.
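To make the batch versus real-time distinction concrete: a streaming engine typically aggregates events over small time windows as they arrive, instead of scanning the full dataset in one job. The sketch below illustrates that idea in plain Python only. It does not use the Spark API, the function name `tumbling_window_counts` is hypothetical, and the event type names are assumed examples of GitHub event types.

```python
from collections import Counter, defaultdict


def tumbling_window_counts(events, window_seconds):
    """Count event types per fixed-size (tumbling) time window.

    `events` is an iterable of (timestamp_in_seconds, event_type) pairs.
    Returns a dict mapping each window start time to its per-type counts.
    """
    windows = defaultdict(Counter)
    for ts, event_type in events:
        # Align the timestamp to the start of its window.
        window_start = (ts // window_seconds) * window_seconds
        windows[window_start][event_type] += 1
    return {start: dict(counts) for start, counts in sorted(windows.items())}


if __name__ == "__main__":
    sample = [(0, "PushEvent"), (5, "PushEvent"), (12, "WatchEvent")]
    print(tumbling_window_counts(sample, window_seconds=10))
```

A real deployment would leave the windowing, state handling, and fault tolerance to the streaming engine; this only shows the shape of the computation.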





