python - How to use dropna on DataFrames with dtype=str?
When I have a DataFrame like this:

    import pandas as pd
    import numpy as np

    df = pd.DataFrame(np.nan, index=list('abc'), columns=list('def'), dtype=float)
    # .at replaces DataFrame.set_value, which was removed in pandas 1.0
    df.at['a', 'd'] = 4.0
    df.at['b', 'e'] = 10.0

         d     e   f
    a  4.0   NaN NaN
    b  NaN  10.0 NaN
    c  NaN   NaN NaN
I can get rid of the rows that contain only NaNs by calling:

    df = df.dropna(how='all')

which yields

         d     e   f
    a  4.0   NaN NaN
    b  NaN  10.0 NaN
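A minimal sketch contrasting how='all' (drop a row only when every value in it is missing) with the default how='any' (drop a row as soon as one value is missing). It rebuilds the float frame from the question using .at for the assignments.

```python
import numpy as np
import pandas as pd

# Same 3x3 float frame as in the question, every cell starting as NaN
df = pd.DataFrame(np.nan, index=list('abc'), columns=list('def'), dtype=float)
df.at['a', 'd'] = 4.0
df.at['b', 'e'] = 10.0

# how='all': only row 'c' is entirely NaN, so only 'c' is dropped
all_dropped = df.dropna(how='all')

# how='any' (the default): every row has at least one NaN, so every row is dropped
any_dropped = df.dropna(how='any')

print(all_dropped)
print(any_dropped.empty)
```

With this data, how='any' is far too aggressive, which is why the question uses how='all'.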
How can I do the same on a DataFrame initialized with dtype=str? The following does not work:

    df2 = pd.DataFrame(np.nan, index=list('abc'), columns=list('def'), dtype='str')
    df2.at['a', 'd'] = 'foo'
    df2.at['b', 'e'] = 'bar'

         d    e  f
    a  foo    n  n
    b    n  bar  n
    c    n    n  n

The command

    df2 = df2.dropna(how='all')

returns the DataFrame unmodified.
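dropna only removes values that pandas treats as missing (NaN, None and similar); a placeholder string is ordinary data. The sketch below reproduces the symptom without relying on version-specific dtype=str behaviour. It assumes, based on the printed output above, that the frame literally holds the string 'n' where NaN used to be, and builds that frame explicitly.

```python
import pandas as pd

# Assumption: the dtype=str frame contains the literal string 'n' in place of NaN,
# as the printed output suggests, so build that frame directly
df2 = pd.DataFrame({'d': ['foo', 'n', 'n'],
                    'e': ['n', 'bar', 'n'],
                    'f': ['n', 'n', 'n']},
                   index=list('abc'))

# isna() sees no missing values: 'n' is just a one-character string
missing_count = int(df2.isna().sum().sum())

# ...so dropna(how='all') finds nothing to drop and every row survives
unchanged = df2.dropna(how='all')

print(missing_count)
print(list(unchanged.index))
```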
You could call df.replace first, and then df.dropna:

    In [1576]: df2.replace('n', np.nan).dropna(how='all')
    Out[1576]:
         d    e    f
    a  foo  NaN  NaN
    b  NaN  bar  NaN
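This seems like the most straightforward option. As you can see, the NaNs are lost once you initialise the DataFrame with dtype=str, so this is more of a best-guess replacement: if you have legitimate non-NaN entries equal to 'n', they will be flagged as false positives and converted to NaN as well, and a row made up only of such entries will be dropped.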
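A sketch of the replace-then-dropna approach, including the false-positive caveat. The frames are hypothetical stand-ins that assume the placeholder is exactly the string 'n'; depending on the pandas version, replace may emit a FutureWarning about downcasting the all-NaN column, which does not change the result.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the question's df2: 'n' marks cells that should be NaN
df2 = pd.DataFrame({'d': ['foo', 'n', 'n'],
                    'e': ['n', 'bar', 'n'],
                    'f': ['n', 'n', 'n']},
                   index=list('abc'))

# Turn every cell that is exactly 'n' back into a real NaN, then drop all-NaN rows
cleaned = df2.replace('n', np.nan).dropna(how='all')
print(cleaned)

# Caveat: a row whose genuine values all happen to be 'n' is dropped too
df3 = pd.DataFrame({'d': ['n'], 'e': ['n'], 'f': ['n']}, index=['x'])
lost = df3.replace('n', np.nan).dropna(how='all')
print(lost.empty)  # True: row 'x' disappears even though its data was real
```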
Here's a similar solution by John Galt, which also restores the NaNs:

    In [1584]: df2[~df2.eq('n')].dropna(how='all')
    Out[1584]:
         d    e    f
    a  foo  NaN  NaN
    b  NaN  bar  NaN
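Indexing a DataFrame with a boolean DataFrame, as in df2[~df2.eq('n')], is handled by DataFrame.where: cells where the mask is False become NaN. The sketch below, again using a hypothetical placeholder frame, checks that the indexing form, where, and mask produce the same frame, so any of them can feed dropna.

```python
import pandas as pd

# Hypothetical placeholder frame, assuming 'n' stands for a missing value
df2 = pd.DataFrame({'d': ['foo', 'n', 'n'],
                    'e': ['n', 'bar', 'n'],
                    'f': ['n', 'n', 'n']},
                   index=list('abc'))

is_placeholder = df2.eq('n')             # True where the cell is exactly 'n'

via_indexing = df2[~is_placeholder]      # boolean-frame indexing, as in the answer
via_where = df2.where(~is_placeholder)   # keep cells where the condition is True
via_mask = df2.mask(is_placeholder)      # blank out cells where the condition is True

result = via_mask.dropna(how='all')
print(result)
```

mask reads a little more directly here, since the condition names the cells to blank out rather than the cells to keep.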
Expanding on Andrew L's comment, you don't need to convert to dtype=str to set the values. You can use .loc-based indexing instead:

    In [1586]: df2 = pd.DataFrame(np.nan, index=list('abc'), columns=list('def'))
          ...: df2.loc['a', 'd'] = 'foo'
          ...: df2.loc['b', 'e'] = 'bar'
          ...:

    In [1587]: df2
    Out[1587]:
         d    e    f
    a  foo  NaN  NaN
    b  NaN  bar  NaN
    c  NaN  NaN  NaN
And now:

    In [1588]: df2.dropna(how='all')
    Out[1588]:
         d    e    f
    a  foo  NaN  NaN
    b  NaN  bar  NaN
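A sketch of the .loc approach with one assumption stated up front: the frame is created with dtype=object instead of the default float64. pandas 2.1 and later emit a FutureWarning when a string is assigned into a float64 column, as happens with the default frame in the session above; starting from object avoids that upcast while the untouched cells remain genuine NaN.

```python
import numpy as np
import pandas as pd

# dtype=object: cells start as real NaN but can later hold strings without upcasting
df2 = pd.DataFrame(np.nan, index=list('abc'), columns=list('def'), dtype=object)
df2.loc['a', 'd'] = 'foo'
df2.loc['b', 'e'] = 'bar'

# The 7 untouched cells are still real NaN, so isna and dropna recognise them
missing_count = int(df2.isna().sum().sum())
result = df2.dropna(how='all')

print(missing_count)
print(result)
```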