pandas - How to look up values for multiple columns in one dataframe from another




I have a dataframe (df1) of movie titles:

movie1              movie2              desired
dinosaur planet     screamers           favorite brunette
immortal beloved    strange relations   chump change
clifford            lady chatterley     invader zim

and a dataframe (df2) with vector representations of each movie:

id  year    title              genre             word vector
1   2003.0  dinosaur planet    documentary       [-0.55423898, -0.72544044, 0.33189204, -0.1720...
2   2004.0  isle of man        sports & fitness  [-0.373265237, -1.07549703, -0.469254494, -0.4...
3   1997.0  character          foreign           [-1.57682264, -0.91265768, 2.43038678, -0.2114...
4   1994.0  & dance            sports & fitness  [0.3096168, -0.57186663, 0.39008939, 0.2868615...

My goal is to sum movie1 + movie2 in df1, find the closest matches in df2 using cosine similarity, and see whether they match the desired result in df1. To do this, I need to find the vector representation of each movie in df1 and sum the first two columns.
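For reference, the "sum, then find the closest match by cosine similarity" step could be sketched roughly as below. This is only a minimal sketch: the helper names `cosine_similarity` and `closest_titles`, the toy titles, and the 3-dimensional vectors are all made up for illustration and are not taken from the real df2.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two 1-D vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def closest_titles(query_vec, titles, vectors, top_n=1):
    """Return the top_n titles whose vectors are most similar to query_vec."""
    sims = [cosine_similarity(query_vec, v) for v in vectors]
    order = np.argsort(sims)[::-1][:top_n]  # indices sorted by descending similarity
    return [titles[i] for i in order]

# Toy data, invented for illustration only.
titles = ["a", "b", "c"]
vectors = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [1.0, 1.0, 0.0]]

query = np.add(vectors[0], vectors[1])  # stands in for movie1 + movie2
best = closest_titles(query, titles, vectors)
```

The result in `best` would then be compared against the desired column for that row.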

I've written a function that looks up movie vectors in df2:

def find_movie_vec(movie_str, df):
    '''
    Given a movie string and a dataframe with a 'title' feature,
    find movies whose titles contain the string.
    Assumes no ambiguity within the movie string title.
    '''
    row = df[df['title'].str.contains(movie_str)]
    return row.iloc[0]['word vector']

I'm unsure how to apply this across the dataframe and make a separate column with the vector representation of movie1, movie2, and the desired movie. I tried:

df1.apply(find_movie_vec(...)) 

However, apply takes single-argument functions, so I was thinking along the lines of:

df1.apply(lambda x, y: find_movie_vec(df1['movie1'], df2) + find_movie_vec(df1['movie2'], df2))

But I'm not sure if this is the cleanest or correct approach. Any suggestions are appreciated!
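One possible way to do the per-column lookup, sketched under assumptions: `Series.apply` calls the function once per title and accepts extra positional arguments through `args`, so df2 can be passed along without a lambda. The small df1 and df2 below are toy stand-ins with invented 2-dimensional vectors, and the `_vec` and `sum_vec` column names are hypothetical.

```python
import numpy as np
import pandas as pd

# Toy stand-ins for df1 and df2; the vectors are invented for illustration.
df1 = pd.DataFrame({
    "movie1": ["dinosaur planet", "isle of man"],
    "movie2": ["isle of man", "character"],
    "desired": ["character", "dinosaur planet"],
})
df2 = pd.DataFrame({
    "title": ["dinosaur planet", "isle of man", "character"],
    "word vector": [np.array([1.0, 0.0]), np.array([0.0, 2.0]), np.array([1.0, 2.0])],
})

def find_movie_vec(movie_str, df):
    # Same lookup as in the question: first row whose title contains the string
    row = df[df["title"].str.contains(movie_str)]
    return row.iloc[0]["word vector"]

# Look up a vector for every title in each column; df2 is forwarded via args
for col in ["movie1", "movie2", "desired"]:
    df1[col + "_vec"] = df1[col].apply(find_movie_vec, args=(df2,))

# Element-wise sum of the two looked-up vectors, row by row
df1["sum_vec"] = df1["movie1_vec"] + df1["movie2_vec"]
```

This keeps `find_movie_vec` unchanged. If the titles in df1 matched df2 exactly, building a title-to-vector mapping once and using `Series.map` would probably be faster than filtering df2 for every cell, but that is a different matching rule from the substring search above.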




