python - Combining columns in pandas
I have a script whose output is multiple columns put beneath each other. I would like the columns merged and the duplicates dropped. I've tried merge, combine, concatenate, and join, but I can't seem to figure it out. I also tried merging them as a list, but that doesn't seem to work well either. Below is my code:
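    import pandas as pd

    # Load the workbook and keep only the sheets whose name starts with "zzz"
    data = pd.ExcelFile('path')
    newlist = [x for x in data.sheet_names if x.startswith("zzz")]

    # Read each matching sheet and pick its 'yyy' column
    for x in newlist:
        sheets = pd.read_excel(data, sheet_name=x)
        column = sheets.loc[:, 'yyy']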
Any help is appreciated!
Edit
Some more info on the code: at data, the Excel file is loaded. At newlist, the sheet names that start with zzz are selected. In the for-loop, these sheets are read. At column, the columns named yyy are selected. These columns are put beneath each other, but they aren't merged yet. For example: here is the output of the columns now, and I would like them as one list from 1 to 17.
I hope this is more clear now!
Edit 2.0
Here I tried the concat method mentioned below. However, I still get the output that the picture above shows instead of a list from 1 to 17.
    my_concat_series = pd.Series()
    for x in newlist:
        sheets = pd.read_excel(data, sheet_name=x)
        column = sheets.loc[:, 'yyy']
        # Append this sheet's column and drop values already seen
        my_concat_series = pd.concat([my_concat_series, column]).drop_duplicates()
    print(my_concat_series)
I don't see how pandas.concat doesn't work. Let's try an example corresponding to the data in the picture you posted:
    import numpy as np
    import pandas as pd

    col1 = pd.Series(np.arange(1,12))
    0      1
    1      2
    2      3
    3      4
    4      5
    5      6
    6      7
    7      8
    8      9
    9     10
    10    11
    dtype: int64

    col2 = pd.Series(np.arange(7,18))
    0      7
    1      8
    2      9
    3     10
    4     11
    5     12
    6     13
    7     14
    8     15
    9     16
    10    17
    dtype: int64
And now use pd.concat and drop_duplicates:
    pd.concat([col1,col2]).drop_duplicates()
    0      1
    1      2
    2      3
    3      4
    4      5
    5      6
    6      7
    7      8
    8      9
    9     10
    10    11
    5     12
    6     13
    7     14
    8     15
    9     16
    10    17
    dtype: int64
You can then reshape the data the way you want it, for instance if you don't want a duplicated index:
    pd.concat([col1,col2]).drop_duplicates().reset_index(drop=True)
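To make the index behaviour concrete, here is a minimal self-contained sketch of the same steps. It assumes the two columns hold 1 to 11 and 7 to 17 as in the example above; the name `combined` is only illustrative.

```python
import numpy as np
import pandas as pd

# Two overlapping columns, as in the example above
col1 = pd.Series(np.arange(1, 12))   # values 1..11
col2 = pd.Series(np.arange(7, 18))   # values 7..17

# Stack them, drop repeated values (the first occurrence is kept),
# then renumber the index so it runs 0..16 without repeats
combined = pd.concat([col1, col2]).drop_duplicates().reset_index(drop=True)
print(combined.tolist())
```

Without `reset_index(drop=True)`, the surviving rows from `col2` keep their original labels 5 to 10, which is why those labels appear twice in the output above.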
Or if you want the values as a numpy array instead of a pandas Series:
    pd.concat([col1,col2]).drop_duplicates().values
Note that in the last case you can also use numpy arrays from the beginning, which is faster:
    import numpy as np

    np.unique(np.concatenate((col1.values,col2.values)))
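One difference worth keeping in mind: `np.unique` returns the unique values sorted, while `drop_duplicates` keeps them in order of first appearance. For the 1 to 17 data both give the same result, but for unsorted input they differ. A small sketch with made-up values (`a` and `b` are hypothetical, not from the question):

```python
import numpy as np
import pandas as pd

# Hypothetical unsorted columns, only to show the ordering difference
a = pd.Series([3, 1, 2])
b = pd.Series([2, 5, 4])

# pandas route: order of first appearance is preserved
in_order = pd.concat([a, b]).drop_duplicates().to_numpy()

# numpy route: the result comes back sorted
sorted_unique = np.unique(np.concatenate((a.to_numpy(), b.to_numpy())))

print(in_order)
print(sorted_unique)
```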
If you want them as a list:
list(pd.concat([col1,col2]).drop_duplicates())
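Applied to the loop in the question, one possible pattern is to collect the `yyy` columns in a list and concatenate once at the end. The sketch below is only an illustration: it replaces the Excel workbook with a hypothetical in-memory dict of DataFrames (the sheet names `zzz_a`, `zzz_b`, and `other` are invented), so it runs without a file.

```python
import pandas as pd

# Hypothetical stand-in for the workbook: sheet name -> DataFrame
sheets = {
    "zzz_a": pd.DataFrame({"yyy": range(1, 12)}),   # 1..11
    "zzz_b": pd.DataFrame({"yyy": range(7, 18)}),   # 7..17
    "other": pd.DataFrame({"yyy": [100, 200]}),     # skipped: name lacks the prefix
}

# Collect the 'yyy' column of every sheet whose name starts with "zzz"
columns = [df["yyy"] for name, df in sheets.items() if name.startswith("zzz")]

# Concatenate once, drop repeated values, and convert to a plain list
result = pd.concat(columns, ignore_index=True).drop_duplicates().tolist()
print(result)
```

With a real workbook, a dict of this shape can be obtained from `pd.read_excel(path, sheet_name=None)`, which returns all sheets keyed by name.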