python - Combining columns in pandas

- July 25, 2011

I have a script whose output is multiple columns put beneath each other. I would like the columns to be merged, with duplicates dropped. I've tried merge, combine, concatenate, and joining, but I can't seem to figure it out. I also tried to merge them as a list, but that doesn't seem to work well either. Below is my code:

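import pandas as pd

data = pd.ExcelFile('path')
# Keep only the sheets whose names start with "zzz"
newlist = [x for x in data.sheet_names if x.startswith("zzz")]

for x in newlist:
    # Older pandas releases spelled this keyword sheetname
    sheets = pd.read_excel(data, sheet_name=x)
    column = sheets.loc[:, 'yyy']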

Any help is appreciated!

Edit

Some more info on the code: at data, the Excel file is loaded. At newlist, the sheet names that start with zzz are collected. In the for loop, these sheets are read. At column, the columns named yyy are selected. These columns are put beneath each other, but they aren't merged yet. For example: here is the output of the columns now, and I want them as one list from 1 to 17.

I hope this is more clear now!

Edit 2.0

Here I tried the concat method mentioned below. However, I still get the output that the picture above shows instead of a list from 1 to 17.

my_concat_series = pd.Series()
for x in newlist:
    # Older pandas releases spelled this keyword sheetname
    sheets = pd.read_excel(data, sheet_name=x)
    column = sheets.loc[:, 'yyy']
    my_concat_series = pd.concat([my_concat_series, column]).drop_duplicates()
    print(my_concat_series)

I don't see why pandas.concat wouldn't work, so let's try an example corresponding to the data in the picture you posted:

import numpy as np
import pandas as pd

col1 = pd.Series(np.arange(1, 12))
print(col1)
0      1
1      2
2      3
3      4
4      5
5      6
6      7
7      8
8      9
9     10
10    11
dtype: int64

col2 = pd.Series(np.arange(7, 18))
print(col2)
0      7
1      8
2      9
3     10
4     11
5     12
6     13
7     14
8     15
9     16
10    17
dtype: int64

Then use pd.concat and drop_duplicates:

pd.concat([col1, col2]).drop_duplicates()

0      1
1      2
2      3
3      4
4      5
5      6
6      7
7      8
8      9
9     10
10    11
5     12
6     13
7     14
8     15
9     16
10    17
dtype: int64
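The repeated index labels in that output (5 to 10 appear twice) are there because concat keeps each input's own labels, while drop_duplicates compares values only. A small check of this, rebuilding col1 and col2 as defined above (the name combined is just for illustration):

```python
import numpy as np
import pandas as pd

col1 = pd.Series(np.arange(1, 12))   # values 1..11, labels 0..10
col2 = pd.Series(np.arange(7, 18))   # values 7..17, labels 0..10

combined = pd.concat([col1, col2]).drop_duplicates()

# Values are unique after drop_duplicates, but the labels are not:
# 12..17 still carry the labels 5..10 they had in col2.
print(combined.is_unique)        # True
print(combined.index.is_unique)  # False
print(len(combined))             # 17
```

So the values already form the wanted 1 to 17 sequence; only the labels look odd.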

You can then reshape the data the way you want, for instance if you don't want a duplicated index:
pd.concat([col1, col2]).drop_duplicates().reset_index(drop=True)

Or, if you want the values as a numpy array instead of a pandas Series:

pd.concat([col1, col2]).drop_duplicates().values

Note that in the last case you can also use numpy arrays from the beginning, which is faster:

import numpy as np
np.unique(np.concatenate((col1.values, col2.values)))
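One difference worth keeping in mind between the two routes: np.unique returns its result sorted, whereas drop_duplicates keeps values in order of first appearance. For the 1 to 17 example both give the same sequence, but on unsorted input they differ. A minimal sketch with made-up series a and b (not from the question):

```python
import numpy as np
import pandas as pd

# Hypothetical unsorted inputs, chosen only to make the ordering visible
a = pd.Series([3, 1, 2])
b = pd.Series([2, 5, 4])

by_pandas = pd.concat([a, b]).drop_duplicates().to_numpy()
by_numpy = np.unique(np.concatenate((a.to_numpy(), b.to_numpy())))

print(by_pandas)  # [3 1 2 5 4]  order of first appearance
print(by_numpy)   # [1 2 3 4 5]  sorted
```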

If you want them as a list:

list(pd.concat([col1,col2]).drop_duplicates())
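Applied to the loop in the question, the same idea might look like the sketch below: collect the yyy column of every matching sheet first, then concatenate once and drop duplicates once. The helper combine_column and the in-memory sheets dictionary are hypothetical stand-ins so the example runs without an Excel file.

```python
import pandas as pd


def combine_column(frames, column):
    """Stack one column from several DataFrames and keep each value once."""
    pieces = [frame[column] for frame in frames]
    return pd.concat(pieces).drop_duplicates().reset_index(drop=True)


# Hypothetical stand-ins for the workbook's sheets, keyed by sheet name
sheets = {
    "zzz1": pd.DataFrame({"yyy": range(1, 12)}),
    "zzz2": pd.DataFrame({"yyy": range(7, 18)}),
    "other": pd.DataFrame({"yyy": [99]}),
}

# Same filter as newlist in the question: names starting with "zzz"
selected = [df for name, df in sheets.items() if name.startswith("zzz")]
result = combine_column(selected, "yyy")
print(result.tolist())
```

With a real workbook, the frames would presumably come from pd.read_excel(data, sheet_name=x) for each x in newlist. Printing once after the loop, rather than inside it, also avoids showing the intermediate series at every iteration.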
