python - Sort rows and get column IDs in a pandas dataframe -

- February 25, 2011

with given pandas dataframe, i'd create new columns highest, second highest, third highest, etc... values in row. , create column corresponding column name of each of those. code below max value of row, not follow.

adapted find column name has maximum value each row

import pandas pd  df = pd.dataframe({'a': (23, 24, 55, 77, 33, 66),                    'b': (12, 33, 0.2, 44, 23.5, 66),                    'c': (1, 33, 66, 44, 5, 62),                    'd': (9, 343, 4, 64, 24, 63),                    'e': (123, 33, 2.2, 42, 2, 99)})  # determine max value , column name , add columns df df['max1'] = df.max(axis=1) df['col_max1'] = df.idxmax(axis=1)  # determine 2nd , 3rd max pr , threshold levels , add columns # ???????????  print(df)

this produces:

        b   c    d      e   max1 col_max1 0  23  12.0   1    9  123.0  123.0        e 1  24  33.0  33  343   33.0  343.0        d 2  55   0.2  66    4    2.2   66.0        c 3  77  44.0  44   64   42.0   77.0        4  33  23.5   5   24    2.0   33.0        5  66  66.0  62   63   99.0   99.0        e  process finished exit code 0

only caveat possible have large number of columns, if matters performance. guys.

one approach using underlying array data focus on performance -

a = df.values c = df.columns idx = a.argsort(1)[:,::-1] vals = a[np.arange(idx.shape[0])[:,none], idx] ids = c[idx]  names_vals = ['max'+str(i+1) in range(a.shape[1])] names_ids = ['col_max'+str(i+1) in range(a.shape[1])]  df_vals = pd.dataframe(vals, columns=names_vals) df_ids = pd.dataframe(ids, columns=names_ids) df_out = pd.concat([df, df_vals, df_ids], axis=1)

sample input, output -

in [40]: df out[40]:          b   c    d      e 0  23  12.0   1    9  123.0 1  24  33.0  33  343   33.0 2  55   0.2  66    4    2.2 3  77  44.0  44   64   42.0 4  33  23.5   5   24    2.0 5  66  66.0  62   63   99.0  in [41]: df_out out[41]:          b   c    d      e   max1  max2  max3  max4  max5 col_max1 col_max2  \ 0  23  12.0   1    9  123.0  123.0  23.0  12.0   9.0   1.0        e           1  24  33.0  33  343   33.0  343.0  33.0  33.0  33.0  24.0        d        e    2  55   0.2  66    4    2.2   66.0  55.0   4.0   2.2   0.2        c           3  77  44.0  44   64   42.0   77.0  64.0  44.0  44.0  42.0               d    4  33  23.5   5   24    2.0   33.0  24.0  23.5   5.0   2.0               d    5  66  66.0  62   63   99.0   99.0  66.0  66.0  63.0  62.0        e        b       col_max3 col_max4 col_max5   0        b        d        c   1        c        b          2        d        e        b   3        c        b        e   4        b        c        e   5               d        c

if need values , ids in sequence, need modify last few steps there -

df0 = pd.dataframe(np.dstack((vals, ids)).reshape(a.shape[0],-1)) df0.columns = np.vstack((names_vals, names_ids)).t.ravel() df_out = pd.concat([df, df0], axis=1)

sample output -

in [62]: df_out out[62]:          b   c    d      e max1 col_max1 max2 col_max2  max3 col_max3 max4  \ 0  23  12.0   1    9  123.0  123        e   23           12        b    9    1  24  33.0  33  343   33.0  343        d   33        e    33        c   33    2  55   0.2  66    4    2.2   66        c   55            4        d  2.2    3  77  44.0  44   64   42.0   77          64        d    44        c   44    4  33  23.5   5   24    2.0   33          24        d  23.5        b    5    5  66  66.0  62   63   99.0   99        e   66        b    66          63       col_max4 max5 col_max5   0        d    1        c   1        b   24          2        e  0.2        b   3        b   42        e   4        c    2        e   5        d   62        c

wiki

Search This Blog

tL

python - Sort rows and get column IDs in a pandas dataframe -

Comments

Post a Comment

Popular posts from this blog

python - Read npy file directly from S3 StreamingBody -

Asterisk AGI Python Script to Dialplan does not work -

kotlin - Out-projected type in generic interface prohibits the use of metod with generic parameter -