R - Randomly remove duplicated rows using dplyr()
As a follow-up to the question "Remove duplicated rows using dplyr", I have the following:
How can I randomly remove duplicated rows using dplyr() (among others)?
My command is

    data.uniques <- distinct(data, keyvariable, .keep_all = TRUE)

but it always returns the first occurrence of each keyvariable. I want the choice to be random: any one of the 1 to n occurrences of a keyvariable should be kept.
For instance, given:

    keyvariable  bmi
    1            24.2
    2            25.3
    2            23.2
    3            18.9
    4            19
    4            20.1
    5            23.0

my current command returns:

    keyvariable  bmi
    1            24.2
    2            25.3
    3            18.9
    4            19
    5            23.0

I want it to randomly return 1 of the n duplicated rows, for instance:

    keyvariable  bmi
    1            24.2
    2            23.2
    3            18.9
    4            19
    5            23.0
Just shuffle the rows before selecting the first occurrence (using distinct):

    library(dplyr)
    # Shuffle the rows, then keep the first row for each keyvariable
    distinct(df[sample(1:nrow(df)), ],
             keyvariable,
             .keep_all = TRUE)
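As an alternative sketch (not part of the original answer), dplyr 1.0.0 and later provide slice_sample(), which samples rows within each group directly instead of shuffling the whole data frame first. This assumes the question's data is held in a data frame named df with columns keyvariable and bmi; the set.seed() call is added only to make the random pick reproducible.

```r
library(dplyr)

# Example data from the question (assumed to live in a data frame called df)
df <- data.frame(
  keyvariable = c(1, 2, 2, 3, 4, 4, 5),
  bmi         = c(24.2, 25.3, 23.2, 18.9, 19, 20.1, 23.0)
)

set.seed(42)  # fix the RNG so the random choice can be reproduced

# Keep exactly one randomly chosen row within each keyvariable group
result <- df %>%
  group_by(keyvariable) %>%
  slice_sample(n = 1) %>%
  ungroup()

print(result)
```

Like the shuffle-then-distinct approach, this gives each of the n rows sharing a keyvariable an equal chance of being kept; drop set.seed() if a different pick on every run is wanted.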