machine learning - Apache Spark user-user recommendation?




I have a data set of questions and answers in which users have completed choices. I'm trying to build a user-user recommendation engine to find similar users based on their answers to the questions. An important point is that the questions are shuffled and not in order, and the data is streaming.

So for each user I have data like this:

    user_1: {"question_1": "choice_1", ...}
    user_2: {"question_3": "choice_4", ...}
    user_3: {"question_1": "choice_3", ...}

I have found tutorials on user-item recommendations, but nothing on user-user recommendations.

I've realized that clustering and cosine similarity might be good options, and I've found that columnSimilarities is efficient.

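    from pyspark.mllib.linalg.distributed import RowMatrix

    # sc is an existing SparkContext
    rows = sc.parallelize([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]])
    mat = RowMatrix(rows)
    sims = mat.columnSimilarities()  # cosine similarities between columns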

I have two questions:

Is it wise to define each user as a column and the questions/choices as rows to get the result I need?

And how should I vectorize this kind of data into numbers, if I need to do clustering?

Thanks in advance :)

Unfortunately, that's not the way it can be done. It's too good to be true, isn't it?

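columnSimilarities is meant for tall and skinny matrices (many rows, few columns), because its output has one entry per pair of columns. So if you have a user-user matrix on which you wish to perform this task, it won't work (e.g. if you have 1M users).

From your description, I see that you might have a short and wide matrix, so columnSimilarities won't work for you.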
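On the vectorization question, here is a minimal sketch in plain Python (not Spark) of one way the answers could be turned into numbers. It assumes each (question, choice) pair becomes its own binary feature, so the order in which questions arrive does not matter. The sample data and the helper names `to_features` and `cosine` are made up for illustration.

```python
import math

# Hypothetical sample data in the shape shown in the question.
answers = {
    "user_1": {"question_1": "choice_1", "question_2": "choice_2"},
    "user_2": {"question_3": "choice_4", "question_1": "choice_1"},
    "user_3": {"question_1": "choice_3", "question_2": "choice_2"},
}


def to_features(user_answers):
    """One-hot encode answers as a set of (question, choice) pairs.

    Each pair stands for a column holding 1; every absent pair is 0,
    so this is a sparse binary vector stored as a set.
    """
    return set(user_answers.items())


def cosine(a, b):
    """Cosine similarity between two binary vectors stored as sets."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


features = {user: to_features(ans) for user, ans in answers.items()}

sim_12 = cosine(features["user_1"], features["user_2"])  # share question_1/choice_1
sim_13 = cosine(features["user_1"], features["user_3"])  # share question_2/choice_2
sim_23 = cosine(features["user_2"], features["user_3"])  # share nothing
```

With this layout each user is a row and each (question, choice) pair is a column. Comparing every pair of users this way is still quadratic in the number of users, which is why it does not scale on its own.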

If you wish to perform UUCF (user-user collaborative filtering), clustering is the way to go. (Among others, LSH is also an approach.)
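To give an idea of what the LSH approach looks like, here is a rough sketch of MinHash with banding in plain Python. It only illustrates the idea and is not a Spark implementation (Spark ML does ship a `MinHashLSH` class in `pyspark.ml.feature`). The signature length, the number of bands, the sample users and all function names below are assumptions chosen for the example.

```python
import hashlib
from collections import defaultdict

NUM_HASHES = 8  # assumed signature length
BANDS = 4       # assumed number of bands (2 signature rows per band)


def seeded_hash(seed, item):
    """Deterministic 64-bit hash of a (question, choice) pair for a given seed."""
    data = f"{seed}|{item[0]}|{item[1]}".encode("utf-8")
    return int.from_bytes(hashlib.md5(data).digest()[:8], "big")


def minhash(features):
    """MinHash signature: the minimum hash value of the set under each seed."""
    return [min(seeded_hash(seed, f) for f in features)
            for seed in range(NUM_HASHES)]


def candidate_pairs(user_features):
    """Return user pairs whose signatures agree on at least one full band."""
    rows = NUM_HASHES // BANDS
    buckets = defaultdict(set)
    for user, feats in user_features.items():
        if not feats:
            continue  # users with no answers cannot be hashed
        sig = minhash(feats)
        for band in range(BANDS):
            key = (band, tuple(sig[band * rows:(band + 1) * rows]))
            buckets[key].add(user)
    pairs = set()
    for users in buckets.values():
        ordered = sorted(users)
        for i, u in enumerate(ordered):
            for v in ordered[i + 1:]:
                pairs.add((u, v))
    return pairs


# Hypothetical users: user_1 and user_2 answered identically, user_3 did not.
user_features = {
    "user_1": {("question_1", "choice_1"), ("question_2", "choice_2")},
    "user_2": {("question_1", "choice_1"), ("question_2", "choice_2")},
    "user_3": {("question_3", "choice_4")},
}
pairs = candidate_pairs(user_features)
```

Only users that land in the same bucket are compared afterwards, so the expensive similarity computation runs on candidate pairs instead of on all pairs of users.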




