python - Pandas: issues with conditionally creating columns
I have a sample data set:
import pandas as pd
df = {'columa': ['1a', 'ws rank', 'rank', 'ws rank', 'rank', 'drank'], 'value': [1, 12, 34, 50, 3, 2]}
df = pd.DataFrame(df)
1. I want to create a column 'hp' for the rows where 'columa' is 'ws rank', 'rank', or 'drank': if value is 1, hp is 25; if value is 2, hp is 24; and so on.
First I created a smaller dataset containing only those rows, because the real data set is very big. Then I concatenated that dataset with the original one so that it includes the 'hp' column. After concatenating the datasets there were duplicated rows. There must be an easier way.
My code:
dfrank = df[df["columa"].str.contains('ws rank|rank')]
dfrank['value'] = dfrank['value'].astype(int)
dfrank.loc[dfrank.value == 1, 'hp'] = 25
dfrank.loc[dfrank.value == 2, 'hp'] = 24
dfrank.loc[dfrank.value == 3, 'hp'] = 23
dfrank.loc[dfrank.value == 4, 'hp'] = 22
dfrank.loc[dfrank.value == 5, 'hp'] = 21
dfrank.loc[dfrank.value == 6, 'hp'] = 20
dfrank.loc[dfrank.value == 7, 'hp'] = 19
dfrank.loc[dfrank.value == 8, 'hp'] = 18
dfrank.loc[dfrank.value == 9, 'hp'] = 17
dfrank.loc[dfrank.value == 10, 'hp'] = 16
dfrank.loc[dfrank.value == 11, 'hp'] = 15
dfrank.loc[dfrank.value == 12, 'hp'] = 14
dfrank.loc[dfrank.value == 13, 'hp'] = 13
dfrank.loc[dfrank.value == 14, 'hp'] = 12
dfrank.loc[dfrank.value == 15, 'hp'] = 11
dfrank.loc[dfrank.value == 16, 'hp'] = 10
dfrank.loc[dfrank.value == 17, 'hp'] = 9
dfrank.loc[dfrank.value == 18, 'hp'] = 8
dfrank.loc[dfrank.value == 19, 'hp'] = 7
dfrank.loc[dfrank.value == 20, 'hp'] = 6
dfrank.loc[(dfrank.value > 20) & (dfrank.value <= 50), 'hp'] = 5
df2 = pd.concat([df, dfrank])
Is there an easier way to write these conditions? I also keep getting a warning message, even though I think I'm using the form it suggests:
SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
dfrank['value'] = dfrank['value'].astype(int)
h:/code/pythonscripts/python_work/dataset1.py:20: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
dfrank.loc[dfrank.value == 1, 'hp'] = 25
c:\users\amywang\appdata\local\continuum\anaconda3\lib\site-packages\pandas\core\indexing.py:477: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
self.obj[item] = s
2. I want to create an 'hppoint' column that groups by the 'columa' values and sums the 'hp' values, but this didn't work and returned null:
df2['hppoint']=df2.groupby('columa')['hp'].sum()
In pandas, indexing a DataFrame can return a reference to (a view of) the initial DataFrame rather than an independent object. So when you select data and store it in a new variable, you should copy the DataFrame and then use .loc on the new DataFrame, i.e.
dfrank=df[df["columa"].str.contains('ws rank|rank')].copy()
This creates a new, independent DataFrame, so later assignments index into the new DataFrame instead of a slice of the original.
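As a minimal sketch of the point above, using the sample data from the question: the filtered rows are copied explicitly, a value is assigned through .loc, and the original frame stays untouched. The variable `has_hp_in_original` is only an illustrative name.

```python
import pandas as pd

df = pd.DataFrame({
    'columa': ['1a', 'ws rank', 'rank', 'ws rank', 'rank', 'drank'],
    'value': [1, 12, 34, 50, 3, 2],
})

# Take an explicit copy of the filtered rows so that later assignments
# target an independent DataFrame rather than a slice of df.
dfrank = df[df['columa'].str.contains('ws rank|rank')].copy()

# Assigning through .loc on the copy creates the 'hp' column there only.
dfrank.loc[dfrank['value'] == 2, 'hp'] = 24

# The original frame is unaffected: it has no 'hp' column.
has_hp_in_original = 'hp' in df.columns
```

Note that 'drank' is selected as well, because str.contains matches 'rank' anywhere in the string.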
Since you want to map data, you can get rid of a lot of those lines by creating a dictionary, a mask, and using .loc, and you can fill the NaN values using fillna, i.e.
dicct = {1:25, 2:24, 3:23, 4:22, 5:21, 6:20, 7:19, 8:18, 9:17, 10:16, 11:15, 12:14, 13:13, 14:12, 15:11, 16:10, 17:9, 18:8, 19:7, 20:6}
df['hp'] = 0
mask = df["columa"].str.contains('ws rank|rank')
df.loc[mask, 'hp'] = df.loc[mask, 'value'].map(dicct).fillna(5)
Output:
    columa  value    hp
0       1a    1.0   0.0
1  ws rank   14.0  12.0
2     rank    5.0  21.0
3  ws rank    5.0  21.0
4     rank   23.0   5.0
5    drank   24.0   5.0
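The same dictionary, mask, and .loc approach can be sketched more compactly. This version assumes the lookup follows hp = 26 - value for values 1 to 20, which matches the table listed in the question; `hp_by_value` is a hypothetical name, and the data is the question's sample rather than the data behind the output shown above.

```python
import pandas as pd

df = pd.DataFrame({
    'columa': ['1a', 'ws rank', 'rank', 'ws rank', 'rank', 'drank'],
    'value': [1, 12, 34, 50, 3, 2],
})

# Assumption: hp = 26 - value for values 1..20 (1 -> 25, ..., 20 -> 6),
# as in the table from the question.
hp_by_value = {v: 26 - v for v in range(1, 21)}

# Rows whose 'columa' contains 'rank' ('ws rank', 'rank', 'drank').
mask = df['columa'].str.contains('ws rank|rank')

df['hp'] = 0
# Values missing from the dictionary map to NaN; fillna(5) assigns the
# fallback, and astype(int) keeps the column as integers.
df.loc[mask, 'hp'] = (
    df.loc[mask, 'value'].map(hp_by_value).fillna(5).astype(int)
)
# df['hp'] is now [0, 14, 5, 5, 23, 24]
```

One design caveat: fillna(5) gives 5 to every unmapped value, not only those between 21 and 50 as in the question's last condition, so values outside that range would need an extra check.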
If you want to fill a new column with the groupby sum, you can use transform, i.e.
df['hppoint'] = df.groupby('columa')['hp'].transform('sum')
Output:
    columa  value    hp  hppoint
0       1a    1.0   0.0      0.0
1  ws rank   14.0  12.0     33.0
2     rank    5.0  21.0     26.0
3  ws rank    5.0  21.0     33.0
4     rank   23.0   5.0     26.0
5    drank   24.0   5.0      5.0
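A short sketch of why the original groupby assignment returned nulls while transform works, using the 'hp' values from the output above. The names `per_group` and `misaligned` are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    'columa': ['1a', 'ws rank', 'rank', 'ws rank', 'rank', 'drank'],
    'hp': [0, 12, 21, 21, 5, 5],
})

# sum() returns one row per group, indexed by the group keys
# ('1a', 'drank', 'rank', 'ws rank').
per_group = df.groupby('columa')['hp'].sum()

# Assigning that Series aligns on index labels. The string keys never
# match the integer row labels 0..5, so every row becomes NaN.
misaligned = df.assign(hppoint=per_group)

# transform('sum') broadcasts each group's total back onto its own rows,
# so the result shares df's index and can be assigned directly.
df['hppoint'] = df.groupby('columa')['hp'].transform('sum')
```

Here `misaligned['hppoint']` is entirely NaN, while `df['hppoint']` holds the per-group totals on every row.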
Hope it helps.