machine learning - Apache Spark user-user recommendation?




I have a data set of questions and the answers (choices) users have completed. I'm trying to build a user-user recommendation engine that finds similar users based on their answers to the questions. An important point is that the questions are shuffled and not in order, and the data is streaming.

So each user has data like this:

user_1: {"question_1": "choice_1", ...}
user_2: {"question_3": "choice_4", ...}
user_3: {"question_1": "choice_3", ...}
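One common way to turn answers like these into numbers, while making the question order irrelevant, is to treat every (question, choice) pair as its own binary feature (one-hot encoding). A minimal plain-Python sketch, assuming answers arrive as dicts like the ones above; the helper names and the sample users are hypothetical:

```python
# Hypothetical sketch: each (question, choice) pair gets its own feature index,
# so a user's answers become a sparse binary vector regardless of question order.

def build_vocab(users):
    """Assign a stable integer index to every (question, choice) pair seen."""
    vocab = {}
    for answers in users.values():
        for pair in sorted(answers.items()):
            if pair not in vocab:
                vocab[pair] = len(vocab)
    return vocab

def to_sparse(answers, vocab):
    """Return the sorted indices of the features that are 1 for this user."""
    return sorted(vocab[pair] for pair in answers.items() if pair in vocab)

# Illustrative data only (hypothetical users and answers).
users = {
    "user_1": {"question_1": "choice_1", "question_2": "choice_2"},
    "user_2": {"question_3": "choice_4", "question_1": "choice_1"},
    "user_3": {"question_1": "choice_3"},
}
vocab = build_vocab(users)
vectors = {u: to_sparse(a, vocab) for u, a in users.items()}
print(vectors)  # {'user_1': [0, 1], 'user_2': [0, 2], 'user_3': [3]}
```

With streaming data this vocabulary keeps growing; a fixed-size hashing trick (for example Spark's `HashingTF`) is one way to avoid maintaining it.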

I have found tutorials on user-item recommendations, but nothing on user-user recommendations.

I've realized that clustering and cosine similarity might be options, and I've found that `columnSimilarities` is efficient:
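For binary answer vectors, cosine similarity reduces to counting shared (question, choice) pairs: |A ∩ B| / sqrt(|A| · |B|). A small sketch, assuming each user is represented as the set of feature indices that are 1 (the function name is illustrative):

```python
import math

def cosine_binary(a, b):
    """Cosine similarity of two binary vectors given as collections of active indices."""
    a, b = set(a), set(b)
    if not a or not b:
        return 0.0  # a user with no answers is similar to nobody
    return len(a & b) / math.sqrt(len(a) * len(b))

print(cosine_binary([0, 1], [0, 2]))  # 0.5 (one shared answer out of two each)
print(cosine_binary([0, 1], [3]))     # 0.0 (no shared answers)
```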

from pyspark.mllib.linalg.distributed import RowMatrix
rows = sc.parallelize([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]])  # sc: an existing SparkContext
mat = RowMatrix(rows)
sims = mat.columnSimilarities()
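The Spark call above returns cosine similarities between pairs of columns as an upper-triangular result. As a plain-Python illustration of that quantity (not Spark's distributed implementation, which can also sample when given a threshold), here is the same computation on the same 4×3 matrix; the function name is hypothetical:

```python
import math

def column_cosine_similarities(rows):
    """Cosine similarity for every column pair (i < j), i.e. the upper triangle."""
    n_cols = len(rows[0])
    cols = [[row[j] for row in rows] for j in range(n_cols)]
    norms = [math.sqrt(sum(v * v for v in col)) for col in cols]
    sims = {}
    for i in range(n_cols):
        for j in range(i + 1, n_cols):
            if norms[i] == 0 or norms[j] == 0:
                sims[(i, j)] = 0.0  # an all-zero column has no direction
                continue
            dot = sum(a * b for a, b in zip(cols[i], cols[j]))
            sims[(i, j)] = dot / (norms[i] * norms[j])
    return sims

rows = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]]
sims = column_cosine_similarities(rows)
print(sorted(sims))  # [(0, 1), (0, 2), (1, 2)]
```

Because the comparison is between columns, putting each user in a column is what would make this call produce user-user similarities.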

I have two questions:

Is it wise to define each user as a column and the questions/choices as rows to get the result I need?

And how should I vectorize this kind of data into numbers, in case I need clustering?

Thanks in advance :)

Unfortunately, that's not the way it can be done, even if it's true that `columnSimilarities` is efficient.

`columnSimilarities` is meant for tall and skinny matrices (many rows, few columns). If you have a user-user matrix on which you wish to perform this task, it won't work, e.g. if you have 1M users.
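A quick back-of-the-envelope check shows why: with users as columns, the result has one entry per pair of columns, n(n - 1) / 2 of them. The sketch below assumes a worst case where every pair is stored; in practice only nonzero entries are kept, but the count is still what must be computed:

```python
def similarity_entries(n_columns):
    """Number of distinct column pairs (i < j) in an n x n upper-triangular result."""
    return n_columns * (n_columns - 1) // 2

pairs = similarity_entries(1_000_000)
print(pairs)  # 499999500000
# At 8 bytes per double for the value alone, a dense result would need about 4 TB.
print(pairs * 8 / 1e12)
```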

From your description, I see you might have a short and wide matrix, so `columnSimilarities` won't work for you.

If you wish to perform UUCF (user-user collaborative filtering), clustering is the way to go (among others, such as the LSH approach).
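To illustrate the LSH idea, here is a plain-Python MinHash sketch with banding. Note that MinHash approximates Jaccard similarity between answer sets, not cosine similarity. It assumes each user is a set of "question=choice" strings; all names and data are hypothetical. Spark ML also ships `MinHashLSH` in `pyspark.ml.feature`, which may be a better fit for a real cluster.

```python
import hashlib
from collections import defaultdict
from itertools import combinations

def _hash(seed, feature):
    """Deterministic 64-bit hash of a feature under a given seed."""
    digest = hashlib.md5(f"{seed}|{feature}".encode()).digest()
    return int.from_bytes(digest[:8], "big")

def minhash(features, num_hashes):
    """MinHash signature: the minimum hash of the feature set under each seed."""
    return [min(_hash(seed, f) for f in features) for seed in range(num_hashes)]

def lsh_candidates(user_features, bands=8, rows=2):
    """Return user pairs whose signatures agree on at least one whole band."""
    buckets = defaultdict(set)
    for user, features in user_features.items():
        if not features:
            continue  # no answers, nothing to compare
        sig = minhash(features, bands * rows)
        for b in range(bands):
            buckets[(b, tuple(sig[b * rows:(b + 1) * rows]))].add(user)
    pairs = set()
    for members in buckets.values():
        pairs.update(combinations(sorted(members), 2))
    return pairs

users = {
    "user_1": {"question_1=choice_1", "question_2=choice_2", "question_3=choice_4"},
    "user_2": {"question_1=choice_1", "question_2=choice_2", "question_3=choice_4"},
    "user_3": {"question_1=choice_3", "question_5=choice_9"},
}
print(lsh_candidates(users))  # {('user_1', 'user_2')}
```

Only candidate pairs found this way need an exact similarity check, which avoids comparing every user with every other user.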




