java - Joining 2 Spark dataframes, getting result as list
I am trying to join 2 DataFrames so that the result has, in a column of the left DataFrame (cdf in the example), a list of the matching rows from the right DataFrame (ddf in the example). I made it work for 1 column, but I am having issues adding more columns.
Seq<String> joinColumns = new Set2<>("c1", "c2").toSeq();
Dataset<Row> allDF = cdf.join(ddf, joinColumns, "inner");
allDF.printSchema();
allDF.show();

// collect_list and col are static imports from org.apache.spark.sql.functions.
Dataset<Row> aggDF = allDF.groupBy(cdf.col("c1"), cdf.col("c2"))
        .agg(collect_list(col("c50")));
aggDF.show();
Output:

+--------+-------+---------------------------+
|c1      |c2     |collect_list(c50)          |
+--------+-------+---------------------------+
|    3744|1160242|         [6, 5, 4, 3, 2, 1]|
|    3739|1150097|                        [1]|
|    3780|1159902|            [5, 4, 3, 2, 1]|
|     132|1200743|               [4, 3, 2, 1]|
|    3778|1183204|                        [1]|
|    3766|1132709|                        [1]|
|    3835|1146169|                        [1]|
+--------+-------+---------------------------+
Also, is there a way to do something like:
Dataset<Row> aggDF = allDF.groupBy(cdf.col("*"))
        .agg(collect_list(col("c50")));
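One way to keep more than one column of ddf per collected element (a minimal sketch, assuming ddf also has a column c51; the alias ddf_rows is arbitrary) is to wrap the right-side columns in a struct before collecting:

// struct, collect_list and col are static imports from org.apache.spark.sql.functions.
Dataset<Row> aggDF = allDF.groupBy(cdf.col("c1"), cdf.col("c2"))
        .agg(collect_list(struct(col("c50"), col("c51"))).alias("ddf_rows"));
aggDF.show(false);

Each element of ddf_rows is then a struct holding c50 and c51, i.e. one entry per joined row of ddf.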
For the second part of the question, you can do:
String[] fields = cdf.columns();
Column[] columns = new Column[fields.length];
// Build a Column reference for every column of cdf.
for (int i = 0; i < fields.length; i++) {
    columns[i] = cdf.col(fields[i]);
}
// groupBy takes Column..., so the array can be passed directly.
Dataset<Row> sdf = allDF.groupBy(columns).agg(...); // e.g. collect_list(col("c50")) as above
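Putting the pieces together, here is a self-contained sketch. The class name, the sample schemas and the sample data are made up for illustration, and the join is written as a column expression so no Scala Seq has to be built from Java:

import java.util.Arrays;

import org.apache.spark.sql.Column;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

import static org.apache.spark.sql.functions.col;
import static org.apache.spark.sql.functions.collect_list;

public class GroupByAllColumnsExample {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("group-by-all-columns")
                .master("local[*]")
                .getOrCreate();

        // Left DataFrame (cdf): the columns to keep as the grouping key.
        StructType cdfSchema = new StructType()
                .add("c1", DataTypes.IntegerType)
                .add("c2", DataTypes.IntegerType)
                .add("name", DataTypes.StringType);
        Dataset<Row> cdf = spark.createDataFrame(Arrays.asList(
                RowFactory.create(3744, 1160242, "a"),
                RowFactory.create(3739, 1150097, "b")), cdfSchema);

        // Right DataFrame (ddf): several rows per (c1, c2) key.
        StructType ddfSchema = new StructType()
                .add("c1", DataTypes.IntegerType)
                .add("c2", DataTypes.IntegerType)
                .add("c50", DataTypes.IntegerType);
        Dataset<Row> ddf = spark.createDataFrame(Arrays.asList(
                RowFactory.create(3744, 1160242, 1),
                RowFactory.create(3744, 1160242, 2),
                RowFactory.create(3739, 1150097, 1)), ddfSchema);

        // Join on the shared key columns with a column expression.
        Dataset<Row> allDF = cdf.join(ddf,
                cdf.col("c1").equalTo(ddf.col("c1"))
                        .and(cdf.col("c2").equalTo(ddf.col("c2"))),
                "inner");

        // Group by every column of the left DataFrame, as in the answer above.
        String[] fields = cdf.columns();
        Column[] columns = new Column[fields.length];
        for (int i = 0; i < fields.length; i++) {
            columns[i] = cdf.col(fields[i]);
        }
        Dataset<Row> sdf = allDF.groupBy(columns)
                .agg(collect_list(col("c50")).alias("c50_list"));
        sdf.show(false);

        spark.stop();
    }
}

Grouping by every column of cdf keeps all left-side columns in the result, with one collected list per left row.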