python - How to keep column MultiIndex values when merging pandas DataFrames -




i have 2 pandas dataframes, below:

df1 = pd.dataframe({('q1', 'subq1'):[1, 2, 3], ('q1', 'subq2'):[1, 2, 3], ('q2', 'subq1'):[1, 2, 3]}) df1['id'] = ['a', 'b', 'c']  df2 = pd.dataframe({'item_id': ['a', 'b', 'c'], 'url':['a.com', 'blah.com', 'company.com']}) 

df1:

     q1          q2 id   subq1 subq2 subq1    0     1     1     1  1     2     2     2  b 2     3     3     3  c 

df2:

  item_id          url 0              a.com 1       b     blah.com 2       c  company.com 

note df1 has columns hierarchical indexing (eg. ('q1', 'subq1')) , normal indexing (eg. id).

i want merge these 2 data frames on id , item_id fields. using:

result = pd.merge(df1, df2, left_on='id', right_on='item_id') 

gives:

   (q1, subq1)  (q1, subq2)  (q2, subq1) (id, ) item_id          url 0            1            1            1                   a.com 1            2            2            2      b       b     blah.com 2            3            3            3      c       c  company.com 

as can see, merge works fine, multiindex has been lost , has reverted tuples. i've tried recreate multiindex using pd.multiindex.from_tuples, in:

result.columns = pd.multiindex.from_tuples(result) 

but causes problems item_id , url columns, taking first 2 characters of names:

     q1          q2 id             u   subq1 subq2 subq1     t            r 0     1     1     1          a.com 1     2     2     2  b  b     blah.com 2     3     3     3  c  c  company.com 

converting columns in df2 one-element tuples (ie. ('item_id',) rather 'item_id') makes no difference.

how can merge these 2 dataframes , keep multiindex properly? or alternatively, how can take result of merge , columns proper multiindex without mucking names of item_id , url columns?

if can't beat 'em, join 'em. (make both dataframes have same number of index levels before merging):

import pandas pd  df1 = pd.dataframe({('q1', 'subq1'):[1, 2, 3], ('q1', 'subq2'):[1, 2, 3], ('q2', 'subq1'):[1, 2, 3]}) df1['id'] = ['a', 'b', 'c']  df2 = pd.dataframe({'item_id': ['a', 'b', 'c'], 'url':['a.com', 'blah.com', 'company.com']})  df2.columns = pd.multiindex.from_product([df2.columns, ['']]) result = pd.merge(df1, df2, left_on='id', right_on='item_id') print(result) 

yields

     q1          q2 id item_id          url   subq1 subq2 subq1                         0     1     1     1               a.com 1     2     2     2  b       b     blah.com 2     3     3     3  c       c  company.com 

this avoids userwarning:

pandas/core/reshape/merge.py:551: userwarning: merging between different levels can give unintended result (2 levels on left, 1 on right)





wiki

Comments

Popular posts from this blog

python - Read npy file directly from S3 StreamingBody -

kotlin - Out-projected type in generic interface prohibits the use of metod with generic parameter -

Asterisk AGI Python Script to Dialplan does not work -