r - Randomly remove duplicated rows using dplyr
As a follow-up to an earlier question on removing duplicated rows using dplyr, I have the following:
How can I randomly remove duplicated rows using dplyr (among others)?
My command is
data.uniques <- distinct(data, keyvariable, .keep_all = TRUE)
but it always returns the first occurrence of each keyvariable. I want the behaviour to be random: the row that is kept could be any of the 1, ..., n occurrences of that keyvariable.
For instance:
keyvariable  bmi
1            24.2
2            25.3
2            23.2
3            18.9
4            19
4            20.1
5            23.0
Currently, my command returns
keyvariable  bmi
1            24.2
2            25.3
3            18.9
4            19
5            23.0
I want it to randomly return 1 of the n duplicated rows, for instance:
keyvariable  bmi
1            24.2
2            23.2
3            18.9
4            19
5            23.0
Just shuffle the rows before selecting the first occurrence (using distinct). After the shuffle, the "first" row within each keyvariable is a randomly chosen one.
library(dplyr)
# sample(1:nrow(df)) gives a random permutation of the row indices
distinct(df[sample(1:nrow(df)), ], keyvariable, .keep_all = TRUE)
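For anyone who wants to try this on the data from the question, here is a minimal sketch. The data frame `df` is rebuilt by hand from the example table, and `set.seed(42)` is an assumption added only so the result is repeatable; drop it if you want a different draw each run. The second half uses a different technique from the one above, `group_by()` plus `slice_sample()` (available in dplyr 1.0.0 and later), which should also pick one random row per key without shuffling the whole table.

```r
library(dplyr)

# Example data rebuilt from the table in the question
df <- data.frame(
  keyvariable = c(1, 2, 2, 3, 4, 4, 5),
  bmi         = c(24.2, 25.3, 23.2, 18.9, 19, 20.1, 23.0)
)

set.seed(42)  # assumption: only needed if you want a reproducible draw

# Approach from the answer: shuffle the rows, then keep the first row per key
shuffled <- distinct(df[sample(nrow(df)), ], keyvariable, .keep_all = TRUE)
shuffled <- arrange(shuffled, keyvariable)  # restore key order for readability

# Alternative sketch: draw one random row within each key group
sampled <- df %>%
  group_by(keyvariable) %>%
  slice_sample(n = 1) %>%
  ungroup()

print(shuffled)
print(sampled)
```

Both results have exactly one row per keyvariable. Note that the shuffle approach returns rows in shuffled order unless you sort afterwards, while the grouped version comes back ordered by group.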