Python: Merge Pandas Data Frame with Pivot Table -




i have 2 following dataframes want merge.

df1:      id   time                  station 0       22.08.2017 12:00:00   a1 1     b   22.08.2017 12:00:00   a3 2       22.08.2017 13:00:00   a2 ...  pivot:       station               a1     a2     a3 0     time 1     22.08.2017 12:00:00   10     12     11 2     22.08.2017 13:00:00   9      7      3 3     22.08.2017 14:00:00   2      3      4 4     22.08.2017 15:00:00   3      2      7 ... 

it should like:

merge:       id   time                  station   value 0       22.08.2017 12:00:00   a1        10 1     b   22.08.2017 12:00:00   a3        11 2       22.08.2017 13:00:00   a2        7 ... 

now want add column in data frame right value pivot table. failed including column labels merge. constructed that, not work:

merge = pd.merge(df1, pivot, how="left", left_on=["time", "station"], right_on=["station", pivot.columns]) 

any help?

edit:

as advised, instead of pivot table tried use following data:

df2: time                 station   value 22.08.2017 12:00:00  a1        10 22.08.2017 12:00:00  a2        12 22.08.2017 12:00:00  a3        11               ... 22.08.2017 13:00:00  a1        9 22.08.2017 13:00:00  a2        7 22.08.2017 13:00:00  a3        3 

the table contains 1300 different stations every timestamp. in have more 115.000.000 rows. df1 have 5.000.000 rows.

now tried merge df1.head(100) , df2, in result values nan. therefore used this:

merge = pd.merge(df1.head(100), df2, how="left", on=["time", "station"]) 

another problem merge takes few minutes expect whole df1 take several days.

i guess got dataframe pivot using either pivot or pivot_table in pandas, if can perform merge using dataframe had before pivot should work fine.

otherwise have reverse pivot using melt before merging:

melt = pd.concat([pivot[['time']],pivot[['a1']].melt()],axis = 1) melt = pd.concat([melt,pd.concat([pivot[['time']],pivot[['a2']].melt()],axis = 1)]) melt = pd.concat([melt,pd.concat([pivot[['time']],pivot[['a3']].melt()],axis = 1)]) melt.columns = ['time','station','value'] 

then perform merge expected it:

my_df.merge(melt,on = ['time','station'])      id  time    station value 0     time1   a1      10 1   b   time1   a3      11 2     time2   a2      7 

edit:

if dataframes big in edit, indeed have perform merges on chunks of them. try reduce chunk both dataframes.

first, sort df1 in order have close values of time:

df1.sort_values('time',inplace = true) 

then chunk it, chunk second dataframe in way sure have rows might need, , merge chunks:

chunk1 = df1.head(100) chunk2 = df2.loc[df2.time.between(chunk1.time.min(),chunk1.time.max())] merge = chunk1.merge(chunk2,on = ['time','station'],how = 'left') 




wiki

Comments

Popular posts from this blog

python - Read npy file directly from S3 StreamingBody -

kotlin - Out-projected type in generic interface prohibits the use of metod with generic parameter -

Asterisk AGI Python Script to Dialplan does not work -