sql - Using Google Big Query to Organise an Edgelist for Spreadsheet Use -




i'm having trouble trying reorganise large edgelist have exported .csv gephi.

the structure of data is:

  • two columns, "source" , "target".
  • "source" contains user ids , target contains demographic information such city, country, , university.
  • due structure of data, user may duplicated in "source" if have multiple demographic records.

for example:

source      target  user142     citya  user532     countrya  user352     cityc  user532     citya  user143     countryc  user532     university1 

my desired output each unique user positioned on own row, separate columns "city", "country", , "university" - this:

unique user    city     country     university  user142        citya  user532        citya    countrya    university1  user352        cityc  user143                 countryc 

i've been able separate edgelist desired output in excel using number of steps using following index, match, , array formulae. lengthy, , excel crashes when large edgelists involved.

excel process:

step 1:

=index($a$2:$a$5819, match(0, countif($d$1:$d1, $a$2:$a$5819), 0)) 

this filters users unique records (cola) , places them in new column (cold).

step 2:

=arrayformula(iferror(index($b$2:$b$174, match(0, countif($d2:d2,$b$2:$b$174)+if($a$2:$a$174<>$d2, 1, 0), 0)), 0)) 

this looks down "target" column (b), against unique users in column created in step 1 (cold). not order output across columns, however. left right may read city, country, university, or country, university, city, etc each unique user.

step 3:

=transpose(sort(transpose(e2:h2))) 

this step orders output alphabetically across columns, left-right columns read city, country, university.

my main problem slow , manual process, i'm looking utilise sql in big query make flow efficient possible.

any information on how begin structuring query sort data desired output highly appreciated.

i have experience sql, has been in terms of text extraction , regex.

thanks!

in bigquery, conditional aggregation. this:

select source,        max(case when type = 'city' target end) city,        max(case when type = 'country' target end) country,        max(case when type = 'university' target end) university demographics d group source; 

your question doesn't specify column type. seems need 1 if have different types of information.





wiki

Comments

Popular posts from this blog

python - Read npy file directly from S3 StreamingBody -

kotlin - Out-projected type in generic interface prohibits the use of metod with generic parameter -

Asterisk AGI Python Script to Dialplan does not work -