python - How to use dropna on dataframes with dtype=str?
When I have a dataframe like this:

    import pandas as pd
    import numpy as np

    # Note: set_value was removed in pandas 1.0; df.at['a', 'd'] = 4.0 is the equivalent there.
    df = pd.DataFrame(np.nan, index=list('abc'), columns=list('def'), dtype=float)
    df.set_value('a', 'd', 4.0)
    df.set_value('b', 'e', 10.0)

         d     e   f
    a  4.0   NaN NaN
    b  NaN  10.0 NaN
    c  NaN   NaN NaN

I can get rid of the rows that contain only NaNs by calling:
    df = df.dropna(how='all')

which yields

         d     e   f
    a  4.0   NaN NaN
    b  NaN  10.0 NaN

How can I do the same on a dataframe initialized with dtype=str? The following does not work:
    df2 = pd.DataFrame(np.nan, index=list('abc'), columns=list('def'), dtype='str')
    df2.set_value('a', 'd', 'foo')
    df2.set_value('b', 'e', 'bar')

         d    e  f
    a  foo    n  n
    b    n  bar  n
    c    n    n  n

Then the command

    df2 = df2.dropna(how='all')

returns the unmodified dataframe.
Call df.replace first, and then df.dropna:

    In [1576]: df2.replace('n', np.nan).dropna(how='all')
    Out[1576]:
         d    e   f
    a  foo  NaN NaN
    b  NaN  bar NaN

This seems like the straightforward option. You see, you've lost the NaNs once you initialise the dataframe with dtype=str, so this is more of a best-guess replacement (you may have legitimate non-NaN entries equal to 'n' that get flagged as false positives and removed).
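To make that caveat concrete, here is a minimal sketch of the replace-then-drop idea. The frame is built by hand with 'n' placeholders, which is an assumption for illustration that mirrors the output above, since how np.nan is cast to str may differ between pandas versions. The names placeholder, unchanged, and cleaned are introduced here and are not part of the original post.

```python
import numpy as np
import pandas as pd

# Assumption: build the string frame explicitly with 'n' placeholders so the
# example does not depend on how a given pandas version casts np.nan to str.
df2 = pd.DataFrame({col: ['n'] * 3 for col in 'def'},
                   index=list('abc'), dtype=object)
df2.loc['a', 'd'] = 'foo'
df2.loc['b', 'e'] = 'bar'

# 'n' is an ordinary string, so pandas does not count it as missing...
print(df2.isna().any().any())  # False

# ...which is why dropna has nothing to drop here.
unchanged = df2.dropna(how='all')

# Turn the placeholder back into a real NaN first, then drop all-NaN rows.
# Any legitimate cell equal to the placeholder would be blanked as well.
placeholder = 'n'
cleaned = df2.replace(placeholder, np.nan).dropna(how='all')
print(cleaned)
```

If the placeholder in your own data is something else (for example the string 'nan'), swap it in for placeholder. The false-positive risk described above applies to whatever value you pick.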
Here's a similar solution from John Galt, which keeps the NaNs:

    In [1584]: df2[~df2.eq('n')].dropna(how='all')
    Out[1584]:
         d    e   f
    a  foo  NaN NaN
    b  NaN  bar NaN

Expanding on Andrew L's comment, you don't need to convert to dtype=str to set values. You can use .loc based indexing instead:
    In [1586]: df2 = pd.DataFrame(np.nan, index=list('abc'), columns=list('def'))
          ...: df2.loc['a', 'd'] = 'foo'
          ...: df2.loc['b', 'e'] = 'bar'
          ...:

    In [1587]: df2
    Out[1587]:
         d    e   f
    a  foo  NaN NaN
    b  NaN  bar NaN
    c  NaN  NaN NaN

And now,

    In [1588]: df2.dropna(how='all')
    Out[1588]:
         d    e   f
    a  foo  NaN NaN
    b  NaN  bar NaN
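As a script rather than a console session, the same idea might look like the sketch below. One assumption is added that is not in the answer above: the frame is created with dtype=object, because recent pandas versions warn about (and may reject) writing a string into a float64 column. The names df3 and result are introduced here for illustration.

```python
import numpy as np
import pandas as pd

# Assumption: dtype=object lets each cell hold either a real NaN or a Python
# string, so the missing values stay genuine NaNs instead of becoming text.
df3 = pd.DataFrame(np.nan, index=list('abc'), columns=list('def'), dtype=object)
df3.loc['a', 'd'] = 'foo'
df3.loc['b', 'e'] = 'bar'

# Row 'c' is still entirely NaN, so dropna(how='all') removes it.
result = df3.dropna(how='all')
print(result)
```

Because the NaNs are never converted to strings, there is no placeholder to guess and no risk of dropping a legitimate value.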