java - Joining 2 Spark dataframes, getting result as list




I am trying to join 2 dataframes and have the result as a list of rows of the right dataframe (ddf in the example) in a column of the left dataframe (cdf in the example). I made it work for one column, but I am having issues adding more columns.

    Seq<String> joinColumns = new Set2<>("c1", "c2").toSeq();
    Dataset<Row> allDf = cdf.join(ddf, joinColumns, "inner");
    allDf.printSchema();
    allDf.show();

    Dataset<Row> aggDf = allDf.groupBy(cdf.col("c1"), cdf.col("c2"))
            .agg(collect_list(col("c50")));
    aggDf.show();

Output:

    +--------+-------+---------------------------+
    |c1      |c2     |collect_list(c50)          |
    +--------+-------+---------------------------+
    |    3744|1160242|         [6, 5, 4, 3, 2, 1]|
    |    3739|1150097|                        [1]|
    |    3780|1159902|            [5, 4, 3, 2, 1]|
    |     132|1200743|               [4, 3, 2, 1]|
    |    3778|1183204|                        [1]|
    |    3766|1132709|                        [1]|
    |    3835|1146169|                        [1]|
    +--------+-------+---------------------------+

Also, is there a way to do something like:

    Dataset<Row> aggDf = allDf.groupBy(cdf.col("*"))
            .agg(collect_list(col("c50")));

For the second part of the question, you can do:

    // Build one grouping Column per column of cdf
    String[] fields = cdf.columns();
    Column[] columns = new Column[fields.length];
    for (int i = 0; i < fields.length; i++) {
        columns[i] = cdf.col(fields[i]);
    }
    // groupBy accepts a Column array (varargs)
    Dataset<Row> sdf = allDf.groupBy(columns).agg(...);
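For the first part of the question (collecting more than one ddf column per group), a commonly used Spark idiom is to wrap the columns in a struct before collecting, roughly `collect_list(struct(col("c50"), col("c51")))`, where `c51` is a hypothetical second column that does not appear in the question. I have not run that against your data, so treat it as a starting point. The result shape it should give (one list of small rows per key) can be sketched without Spark in plain Java collections; the class name, method name, and sample values below are made up for illustration:

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class CollectRowsSketch {

    // Plain-Java model of groupBy(c1, c2) + collect_list(struct(c50, c51)).
    // Each joined row is {c1, c2, c50, c51}; c51 is a hypothetical second ddf column.
    static Map<List<Integer>, List<List<Integer>>> collectByKey(List<int[]> joinedRows) {
        return joinedRows.stream().collect(Collectors.groupingBy(
                r -> Arrays.asList(r[0], r[1]),          // grouping key: (c1, c2)
                LinkedHashMap::new,                       // keep first-seen key order
                Collectors.mapping(
                        r -> Arrays.asList(r[2], r[3]),  // one "struct" per row: (c50, c51)
                        Collectors.toList())));
    }

    public static void main(String[] args) {
        // Made-up sample rows standing in for the joined dataframe
        List<int[]> joinedRows = Arrays.asList(
                new int[] {3744, 1160242, 2, 20},
                new int[] {3744, 1160242, 1, 10},
                new int[] {3739, 1150097, 1, 99});
        // prints:
        // [3744, 1160242] -> [[2, 20], [1, 10]]
        // [3739, 1150097] -> [[1, 99]]
        collectByKey(joinedRows).forEach((key, rows) -> System.out.println(key + " -> " + rows));
    }
}
```

Each key ends up with a list whose elements carry both right-hand values together, which is the same pairing a struct keeps inside the collected list.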



