Python - Using an if statement over two columns in pandas
I am trying to calculate the distance between shotpoints in a seismic navigation file that contains multiple lines. My current code is as follows:
    import os

    import numpy as np
    import pandas as pd


    def delimiter(filename, a, b, c, d, e, f):
        # Read the fixed-width navigation file with the given column names
        data = pd.read_fwf(filename, names=[a, b, c, d, e, f], header=None)
        # Line number of the next row, and whether it matches the current row
        data['lineshift'] = data['line'].shift(-1)
        data['bool'] = data['lineshift'] == data['line']
        for _, row in data.iterrows():
            data['spdif'] = np.abs(data['sp'].astype(float) - data['sp'].astype(float).shift(-1))
            data['xdiff'] = data['x'] - data['x'].shift(-1)
            data['ydiff'] = data['y'] - data['y'].shift(-1)
            data['xydiff'] = np.sqrt(data['xdiff']**2 + data['ydiff']**2)
            data['spdist'] = data['xydiff'] / data['spdif']
            if row['line'] != row['lineshift']:
                data['spdif'] = data['spdif'].replace({0: np.nan})
                data['xdiff'] = data['xdiff'].replace({0: np.nan})
                data['ydiff'] = data['ydiff'].replace({0: np.nan})
                data['xydiff'] = data['xydiff'].replace({0: np.nan})
                data['spdist'] = data['spdist'].replace({0: np.nan})
        data.info()
        print(data)


    delimiter(os.path.splitext(x)[0] + ".csv", "line", "sp", "xcoord", "ycoord", "x", "y")
This code loads the CSV shotpoint data into a pandas DataFrame. However, I want to make sure the code does not calculate the distance between two shotpoints that belong to different lines. If the 'line' column differs from the 'lineshift' column in the same row, I want to display N/A. If they are the same, it should calculate the 5 new columns for that specific row.

However, when I run the code, it gives the following error:

    ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

If possible, what do I need to add to make the code run and check every row?

An example of the data in the CSV file:
          line    sp     ycoord    xcoord       x       y  lineshift   bool
    8   761298  1080  521754.1n  65132.6e  255355  479838     761298   True
    9   761298  1090  5218 2.5n  65154.3e  255760  480107     761298   True
    10  761298  1100  521812.1n  65216.0e  256165  480410     761298   True
    11  761298  1110  521820.7n  65236.8e  256554  480685     771022  False
    12  771022  1020  521835.8n  65238.3e  256573  481153     771022   True
    13  771022  1030  521841.0n  65245.2e  256700  481315     771022   True
    14  771022  1040  521845.8n  65252.2e  256830  481466     771022   True
This:

    data['lineshift'] == data['line']

is a Series, not a boolean, so

    if data['lineshift'] == data['line']

is ambiguous.
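A minimal sketch of the ambiguity, assuming a toy DataFrame built from three of the line numbers in the question (the variable names `same_line`, `any_same` and `all_same` are illustrative only). Comparing two columns yields one boolean per row, and pandas raises rather than guess how to collapse them into a single truth value:

```python
import pandas as pd

# Toy columns; the values are taken from the sample rows in the question.
data = pd.DataFrame({
    "line": [761298, 761298, 771022],
    "lineshift": [761298, 771022, 771022],
})

# Element-wise comparison: the result is a boolean Series, one value per row.
same_line = data["lineshift"] == data["line"]

try:
    if same_line:  # pandas refuses to reduce a Series to a single bool
        pass
except ValueError as exc:
    error_message = str(exc)

# An explicit reduction states what is actually meant.
any_same = same_line.any()  # at least one row matches
all_same = same_line.all()  # every row matches
```

Neither `.any()` nor `.all()` is what the question needs, since the check has to apply per row, but they show why pandas insists on an explicit choice.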
I think you meant to test the current row in the loop, like:

    for _, row in data.iterrows():
        if row['lineshift'] == row['line']:
            # ...
Edit: That fixes the error you reported, but you should not use a loop here.
    def delimiter(filename, a, b, c, d, e, f):
        data = pd.read_fwf(filename, names=[a, b, c, d, e, f], header=None)
        data['lineshift'] = data['line'].shift(-1)
        data['bool'] = data['lineshift'] == data['line']

        # calculate once, for all rows at the same time
        data['spdif'] = np.abs(data['sp'].astype(float) - data['sp'].astype(float).shift(-1))
        data['xdiff'] = data['x'] - data['x'].shift(-1)
        data['ydiff'] = data['y'] - data['y'].shift(-1)
        data['xydiff'] = np.sqrt(data['xdiff']**2 + data['ydiff']**2)
        data['spdist'] = data['xydiff'] / data['spdif']

        # blank out rows where the next shotpoint belongs to a different line
        data.loc[~data['bool'], ['spdif', 'xdiff', 'ydiff', 'xydiff', 'spdist']] = np.nan

        data.info()
        print(data)
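To try the masking approach without the navigation file, here is a small self-contained sketch. It assumes the data has already been parsed into a DataFrame, uses four of the sample rows from the question (index 10 to 13), and wraps the calculation in a hypothetical helper named `add_shotpoint_spacing` that is not part of the original code:

```python
import numpy as np
import pandas as pd

# Four sample rows from the question; the line number changes after the second row.
data = pd.DataFrame({
    "line": [761298, 761298, 771022, 771022],
    "sp": [1100, 1110, 1020, 1030],
    "x": [256165, 256554, 256573, 256700],
    "y": [480410, 480685, 481153, 481315],
})


def add_shotpoint_spacing(df):
    """Hypothetical helper: add the five spacing columns, NaN across line changes."""
    out = df.copy()
    # True where the next row belongs to the same line as the current row.
    same_line = out["line"].shift(-1) == out["line"]

    sp = out["sp"].astype(float)
    out["spdif"] = (sp - sp.shift(-1)).abs()
    out["xdiff"] = out["x"] - out["x"].shift(-1)
    out["ydiff"] = out["y"] - out["y"].shift(-1)
    out["xydiff"] = np.sqrt(out["xdiff"] ** 2 + out["ydiff"] ** 2)
    out["spdist"] = out["xydiff"] / out["spdif"]

    # Mask every computed column on rows whose successor is on another line.
    computed = ["spdif", "xdiff", "ydiff", "xydiff", "spdist"]
    out.loc[~same_line, computed] = np.nan
    return out


result = add_shotpoint_spacing(data)
```

In this sample the second row is followed by a different line and the last row has no successor, so both end up as NaN in all five computed columns, while the other two rows keep their values.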