machine learning - Apache Spark user-user recommendation?
I have a data set of questions and answers in which users have completed the choices. I'm trying to build a user-user recommendation engine to find similar users based on their answers to the questions. An important point is that the questions are shuffled and not in order, and the data is streaming.
So for each user I have data like this:
user_1: {"question_1": "choice_1", ...}
user_2: {"question_3": "choice_4", ...}
user_3: {"question_1": "choice_3", ...}
I have found tutorials on user-item recommendations, but nothing on user-user recommendations.
I've realized that clustering and cosine similarity might be good options, and I've found that columnSimilarities is very efficient.
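    from pyspark.mllib.linalg.distributed import RowMatrix

    # sc is an existing SparkContext
    rows = sc.parallelize([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]])
    mat = RowMatrix(rows)
    sims = mat.columnSimilarities()  # cosine similarities between columns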
I have two questions:
Is it wise to define each user as a column and the questions/choices as rows to get the result I need?
And how should I vectorize this kind of data into numbers, in case I need to do clustering?
Thanks in advance :)
Unfortunately, that's not the way it can be done. It's true, isn't it?
columnSimilarities is meant for tall and skinny matrices, so if you have a user-user matrix on which you wish to perform this task, it won't work (e.g. if you have 1M users).
From your description, I see that you might have a short and wide matrix, so columnSimilarities won't work for you.
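To put a rough number on that: with users as columns, an all-pairs column similarity has to account for every pair of users. A back-of-the-envelope sketch in plain Python (the 1M figure is the example above; the function name and the 1,000-column comparison are only illustrative assumptions):

```python
def column_pairs(n_columns: int) -> int:
    """Number of distinct column pairs an all-pairs similarity has to cover."""
    return n_columns * (n_columns - 1) // 2

# Users as columns: 1M users means roughly 5 * 10^11 pairs.
pairs_users_as_columns = column_pairs(1_000_000)

# A tall and skinny layout, e.g. a hypothetical 1,000 columns, stays manageable.
pairs_skinny = column_pairs(1_000)

print(pairs_users_as_columns)  # 499999500000
print(pairs_skinny)            # 499500
```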
If you wish to perform UUCF (user-user collaborative filtering), clustering is the way to go (among others, there is also the LSH approach).
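As for turning the answers into something comparable, here is a minimal sketch of one possible approach, not a definitive implementation: treat each (question, choice) pair as a token, so shuffled question order does not matter, then use MinHash signatures with LSH banding to find candidate similar users without comparing every pair. It is plain Python for illustration only; the sample data, function names, and the number of hashes and bands are all assumptions, and it assumes every user has answered at least one question.

```python
import hashlib
from collections import defaultdict
from itertools import combinations

# Hypothetical sample data in the shape described in the question.
answers = {
    "user_1": {"question_1": "choice_1", "question_2": "choice_2", "question_3": "choice_4"},
    "user_2": {"question_3": "choice_4", "question_1": "choice_1", "question_2": "choice_2"},
    "user_3": {"question_1": "choice_3", "question_2": "choice_1", "question_3": "choice_2"},
}

def to_features(user_answers):
    """Turn {question: choice} into a set of 'question=choice' tokens.
    A set ignores order, so shuffled questions are not a problem."""
    return {f"{q}={c}" for q, c in user_answers.items()}

def stable_hash(seed, token):
    """Deterministic 64-bit hash of a token under a given seed."""
    digest = hashlib.sha1(f"{seed}:{token}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")

def minhash_signature(features, num_hashes=20):
    """MinHash signature: for each seed, the minimum hash over the set."""
    return [min(stable_hash(seed, t) for t in features) for seed in range(num_hashes)]

def candidate_pairs(signatures, bands=5):
    """LSH banding: users that share an identical band fall into the same bucket."""
    buckets = defaultdict(set)
    for user, sig in signatures.items():
        rows = len(sig) // bands
        for b in range(bands):
            key = (b, tuple(sig[b * rows:(b + 1) * rows]))
            buckets[key].add(user)
    pairs = set()
    for users in buckets.values():
        pairs.update(combinations(sorted(users), 2))
    return pairs

def jaccard(a, b):
    """Exact Jaccard similarity, used to verify the candidates."""
    return len(a & b) / len(a | b)

features = {u: to_features(a) for u, a in answers.items()}
signatures = {u: minhash_signature(f) for u, f in features.items()}
for u, v in sorted(candidate_pairs(signatures)):
    print(u, v, jaccard(features[u], features[v]))
```

Users with identical answers always produce identical signatures, so they always end up as candidates; users with partly overlapping answers become candidates with a probability that grows with their Jaccard similarity. For a distributed version, Spark ML ships a MinHashLSH class in pyspark.ml.feature that may be the natural counterpart; check the documentation for your Spark version before relying on it.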