pandas - How to look up values for multiple columns in one dataframe from another
I have a dataframe (df1) of movie titles:
movie1            movie2             desired
dinosaur planet   screamers          favorite brunette
immortal beloved  strange relations  chump change
clifford          lady chatterley    invader zim
and a dataframe (df2) of vector representations of each movie:
id  year    title            genre             word vector
1   2003.0  dinosaur planet  documentary       [-0.55423898, -0.72544044, 0.33189204, -0.1720...
2   2004.0  isle of man      sports & fitness  [-0.373265237, -1.07549703, -0.469254494, -0.4...
3   1997.0  character        foreign           [-1.57682264, -0.91265768, 2.43038678, -0.2114...
4   1994.0  & dance          sports & fitness  [0.3096168, -0.57186663, 0.39008939, 0.2868615...
My goal is to sum movie1 + movie2 in df1, find the closest matches in df2 using cosine similarity, and see whether they match the desired result in df1. To do this, I need to find the vector representation of each movie in df1 and sum the first 2 columns.
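To make the goal concrete, here is a minimal sketch of the sum-and-compare step on its own. The 3-dimensional vectors and the names `cosine_similarity`, `candidates`, `candidate_a` and `candidate_b` are made up for illustration; the real vectors would come from df2.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two 1-D vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy vectors, made up for illustration only.
movie1_vec = np.array([1.0, 0.0, 1.0])
movie2_vec = np.array([0.0, 1.0, 1.0])
query = movie1_vec + movie2_vec  # element-wise sum: [1, 1, 2]

candidates = {
    "candidate_a": np.array([2.0, 2.0, 4.0]),   # same direction as query
    "candidate_b": np.array([1.0, -1.0, 0.0]),  # orthogonal to query
}

# Score every candidate against the summed vector and keep the best one.
scores = {name: cosine_similarity(query, vec) for name, vec in candidates.items()}
best_match = max(scores, key=scores.get)
```

Cosine similarity ignores vector length, so a candidate pointing the same way as the sum scores highest regardless of its magnitude.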
I've written a function that looks up movie vectors in df2:
def find_movie_vec(movie_str, df):
    '''
    Given a movie string and a dataframe with a 'title' feature,
    find the movies whose titles contain the string.
    Assumes no ambiguity within the movie string title.
    '''
    # Keep only the rows whose title contains movie_str
    row = df[df['title'].str.contains(movie_str)]
    # Return the vector of the first match
    return row.iloc[0]['word vector']
I'm unsure how to apply this across the dataframe and make a separate column for the vector representation of movie1, movie2, and the desired movie. I tried:
df1.apply(find_movie_vec(...))
However, apply only takes functions of one argument, so I was thinking along the lines of:
df1.apply(lambda x, y: find_movie_vec(df1['movie1'], df2) + find_movie_vec(df1['movie2'], df2))
But I'm not sure if this is the cleanest or correct approach. Any suggestions are appreciated!
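One possible direction, as a rough sketch rather than a definitive answer: build a title-to-vector lookup from df2 once, then use `Series.map` on each title column of df1. The tiny dataframes and 2-dimensional vectors below are made up for illustration, the `_vec` and `sum_vec` column names are hypothetical, and the sketch assumes titles in df1 match titles in df2 exactly and that titles in df2 are unique (unlike the substring match in find_movie_vec).

```python
import numpy as np
import pandas as pd

# Toy stand-ins for df1 and df2; the 2-dimensional vectors are made up.
df2 = pd.DataFrame({
    "title": ["dinosaur planet", "screamers", "favorite brunette"],
    "word vector": [np.array([1.0, 0.0]), np.array([0.0, 1.0]), np.array([1.0, 1.0])],
})
df1 = pd.DataFrame({
    "movie1": ["dinosaur planet", "screamers"],
    "movie2": ["screamers", "favorite brunette"],
    "desired": ["favorite brunette", "dinosaur planet"],
})

# Build the title -> vector lookup once instead of filtering df2 per cell.
# Assumption: titles are exact and unique in df2.
title_to_vec = df2.set_index("title")["word vector"]

# Series.map looks up each title in the index of title_to_vec,
# one column at a time; the new column names are hypothetical.
for col in ["movie1", "movie2", "desired"]:
    df1[col + "_vec"] = df1[col].map(title_to_vec)

# Sum the movie1 and movie2 vectors row by row.
df1["sum_vec"] = [a + b for a, b in zip(df1["movie1_vec"], df1["movie2_vec"])]
```

The original helper could also be reused per column with something like `df1['movie1'].apply(lambda t: find_movie_vec(t, df2))`, since `Series.apply` passes one title at a time, though that re-scans df2 for every cell.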