Python: Merge Pandas Data Frame with Pivot Table -
i have 2 following dataframes want merge.
df1: id time station 0 22.08.2017 12:00:00 a1 1 b 22.08.2017 12:00:00 a3 2 22.08.2017 13:00:00 a2 ... pivot: station a1 a2 a3 0 time 1 22.08.2017 12:00:00 10 12 11 2 22.08.2017 13:00:00 9 7 3 3 22.08.2017 14:00:00 2 3 4 4 22.08.2017 15:00:00 3 2 7 ...
it should like:
merge: id time station value 0 22.08.2017 12:00:00 a1 10 1 b 22.08.2017 12:00:00 a3 11 2 22.08.2017 13:00:00 a2 7 ...
now want add column in data frame right value pivot table. failed including column labels merge. constructed that, not work:
merge = pd.merge(df1, pivot, how="left", left_on=["time", "station"], right_on=["station", pivot.columns])
any help?
edit:
as advised, instead of pivot table tried use following data:
df2: time station value 22.08.2017 12:00:00 a1 10 22.08.2017 12:00:00 a2 12 22.08.2017 12:00:00 a3 11 ... 22.08.2017 13:00:00 a1 9 22.08.2017 13:00:00 a2 7 22.08.2017 13:00:00 a3 3
the table contains 1300 different stations every timestamp. in have more 115.000.000 rows. df1 have 5.000.000 rows.
now tried merge df1.head(100) , df2, in result values nan. therefore used this:
merge = pd.merge(df1.head(100), df2, how="left", on=["time", "station"])
another problem merge takes few minutes expect whole df1 take several days.
i guess got dataframe pivot
using either pivot
or pivot_table
in pandas, if can perform merge using dataframe had before pivot should work fine.
otherwise have reverse pivot using melt
before merging:
melt = pd.concat([pivot[['time']],pivot[['a1']].melt()],axis = 1) melt = pd.concat([melt,pd.concat([pivot[['time']],pivot[['a2']].melt()],axis = 1)]) melt = pd.concat([melt,pd.concat([pivot[['time']],pivot[['a3']].melt()],axis = 1)]) melt.columns = ['time','station','value']
then perform merge expected it:
my_df.merge(melt,on = ['time','station']) id time station value 0 time1 a1 10 1 b time1 a3 11 2 time2 a2 7
edit:
if dataframes big in edit, indeed have perform merges on chunks of them. try reduce chunk both dataframes.
first, sort df1
in order have close values of time:
df1.sort_values('time',inplace = true)
then chunk it, chunk second dataframe in way sure have rows might need, , merge chunks:
chunk1 = df1.head(100) chunk2 = df2.loc[df2.time.between(chunk1.time.min(),chunk1.time.max())] merge = chunk1.merge(chunk2,on = ['time','station'],how = 'left')
wiki
Comments
Post a Comment