R - Randomly remove duplicated rows using dplyr()




As a follow-up question to this one: Remove duplicated rows using dplyr, I have the following:

How do I randomly remove duplicated rows using dplyr() (among others)?

My command is

data.uniques <- distinct(data, keyvariable, .keep_all = TRUE)

But it returns the first occurrence of keyvariable. I want that behaviour to be random: anywhere between the 1st and the nth occurrence of that keyvariable.

For instance:

keyvariable  bmi
1            24.2
2            25.3
2            23.2
3            18.9
4            19
4            20.1
5            23.0

Currently my command returns

keyvariable  bmi
1            24.2
2            25.3
3            18.9
4            19
5            23.0

I want it to randomly return one of the n duplicated rows, for instance:

keyvariable  bmi
1            24.2
2            23.2
3            18.9
4            19
5            23.0

Just shuffle the rows before selecting the first occurrence (using distinct). After shuffling, the "first" row for each keyvariable is a random one of its duplicates.

library(dplyr)
# sample(1:nrow(df)) gives a random permutation of the row indices
distinct(df[sample(1:nrow(df)), ], keyvariable, .keep_all = TRUE)
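To try this on the example from the question, here is a minimal sketch. The data frame `df` is rebuilt by hand from the table above, and the seed value is an arbitrary assumption added only so the run is repeatable. The second variant, using `group_by()` with `slice_sample()`, is not part of the answer above; it is an alternative that should work with dplyr 1.0.0 or later, and the name `data.uniques2` is purely illustrative.

```r
library(dplyr)

# Example data rebuilt from the table in the question
df <- data.frame(
  keyvariable = c(1, 2, 2, 3, 4, 4, 5),
  bmi         = c(24.2, 25.3, 23.2, 18.9, 19, 20.1, 23.0)
)

set.seed(42)  # assumption: fixed seed only to make the result repeatable

# Approach from the answer: shuffle the rows, keep the first row per key,
# then sort by key again because shuffling scrambles the row order
data.uniques <- df[sample(nrow(df)), ] %>%
  distinct(keyvariable, .keep_all = TRUE) %>%
  arrange(keyvariable)

# Alternative (assumes dplyr >= 1.0.0): draw one random row per group
data.uniques2 <- df %>%
  group_by(keyvariable) %>%
  slice_sample(n = 1) %>%
  ungroup()

print(data.uniques)
print(data.uniques2)
```

Both results contain exactly one row per keyvariable. Which of the duplicated rows survives for keys 2 and 4 depends on the random draw, so drop the `set.seed()` call if you want a different pick on every run.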



