R - dplyr filtering with lubridate::hhmm format with minute()
While answering the question "Temperature curve in R", I came across weird behavior of the dplyr::filter - lubridate::minute combination.
See the test data dta below. dta$time is in lubridate::hhmm format.
library(lubridate)
library(dplyr)
dta$time <- hm(dta$time)
To get the rows with full hours (i.e. 0 minutes), one can subset using lubridate::minute like this:
dta[minute(dta$time) == 0,]
#        time    temp1    temp2
# 1        0S 18.62800 18.54458
# 7  1H 0M 0S 18.45733 18.22625
# 13 2H 0M 0S 18.33258 18.04142
However, when using dplyr's filter like this:
dta %>% filter(minute(time) == 0)
#     time    temp1    temp2
# 1     0S 18.62800 18.54458
# 2 10M 0S 18.45733 18.22625
# 3 20M 0S 18.33258 18.04142
The result does not fit the expectation. (Update: the values of temp1 and temp2 are correct, but the time column is corrupt. Thanks @brian, by the way, for giving the hint.)
Additionally, a warning is returned:
Warning message:
In format.data.frame(x, digits = digits, na.encode = FALSE) :
  corrupt data frame: columns will be truncated or padded with NAs
This has been reported and somehow solved here by coercion, which seems to remove the fun (and readable) part of lubridate.
Question: Is there a way (to date) to use dplyr::filter on lubridate::hhmm(ss) formats without coercing to character etc.?
Update:
It seems that the vector created by
minute(dta$time)
# [1]  0 10 20 30 40 50  0 10 20 30 40 50  0
looks like a numeric vector, yet it seems to have some mysterious characteristics.
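To check whether the result of minute() itself is unusual, it can be inspected next to the Period it was extracted from. This is only a minimal sketch, assuming a current lubridate; the short vector p is a stand-in for dta$time.

```r
library(lubridate)

# A few of the sample time strings, parsed the same way as dta$time
p <- hm(c("00:00", "00:10", "01:00"))

# What minute() returns for a Period
m <- minute(p)
class(m)
attributes(m)

# The Period itself is an S4 object that keeps its components in parallel slots
isS4(p)
p@hour    # hour component of each element
p@minute  # minute component of each element
p@.Data   # seconds component of each element
```

If m turns out to be a plain numeric vector, the odd result more likely comes from how the S4 Period column is subset than from the comparison itself.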
Furthermore, @lyngbakr pointed out that the result of the == comparison does not have the usual characteristics of a "normal" logical vector.
tst <- minute(dta$time) == 0
dta %>% filter(tst)
will result in the same strange time column.
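One way to sidestep the problem, sketched here under the assumption that the time column may stay a character vector, is to parse the strings only inside the filter condition, so that filter never has to subset a Period column. The small data frame is an excerpt of the sample data below, and the name full_hours is made up for this illustration.

```r
library(lubridate)
library(dplyr)

# Excerpt of the sample data; time is kept as plain character
dta <- data.frame(
  time  = c("00:00", "00:10", "00:50", "01:00", "01:30", "02:00"),
  temp1 = c(18.62800, 18.60025, 18.48325, 18.45733, 18.39608, 18.33258),
  stringsAsFactors = FALSE
)

# hm() and minute() are applied on the fly; the stored column stays character
full_hours <- dta %>%
  filter(minute(hm(time)) == 0)

full_hours
```

This keeps the lubridate helpers in the condition, but the column itself is not a Period, so it may not suit every use case.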
Sample data:
dta <- read.table(text = "   time    temp1    temp2
1  00:00 18.62800 18.54458
2  00:10 18.60025 18.48283
3  00:20 18.57250 18.36767
4  00:30 18.54667 18.36950
5  00:40 18.51483 18.36550
6  00:50 18.48325 18.34783
7  01:00 18.45733 18.22625
8  01:10 18.43767 18.19067
9  01:20 18.41583 18.22042
10 01:30 18.39608 18.21225
11 01:40 18.37625 18.18658
12 01:50 18.35633 18.05942
13 02:00 18.33258 18.04142", header = TRUE)
I don't know why this works, but it does: the time column needs to be of type datetime, not Period.
dta %>% mutate(time = as_datetime(hm(time))) %>% filter(minute(time) == 0)
                 time    temp1    temp2
1 1970-01-01 00:00:00 18.62800 18.54458
2 1970-01-01 01:00:00 18.45733 18.22625
3 1970-01-01 02:00:00 18.33258 18.04142
This has the side effect of adding the time in the time column to the Unix epoch (1970-01-01), so I advise including the actual date when you're using time-only data.
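Including the actual date could look roughly like the following. This is only a sketch: measurement_date is a hypothetical value, since the question does not say when the data were recorded, and the data frame is a short excerpt of the sample data.

```r
library(lubridate)
library(dplyr)

# Short excerpt of the sample data; time is kept as character
dta <- data.frame(
  time  = c("00:00", "00:10", "01:00"),
  temp1 = c(18.62800, 18.60025, 18.45733),
  stringsAsFactors = FALSE
)

# Hypothetical recording date, not taken from the question
measurement_date <- "2018-01-15"

# Paste date and time together and parse them as a date-time (UTC by default)
with_date <- dta %>%
  mutate(time = ymd_hm(paste(measurement_date, time))) %>%
  filter(minute(time) == 0)

with_date
```

The time column is then an ordinary POSIXct date-time, which filter can subset like any other column.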
If these are minutes elapsed since the start of an experiment, it doesn't matter much, and you don't have to display the 1970-01-01 part.
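Hiding the epoch date for display could be done with format(), for example. In this sketch the explicit period_to_seconds() step and the extra clock column are my own choices rather than part of the solution above, and the data frame is a short excerpt of the sample data.

```r
library(lubridate)
library(dplyr)

# Short excerpt of the sample data; time is kept as character
dta <- data.frame(
  time  = c("00:00", "00:10", "01:00"),
  temp1 = c(18.62800, 18.60025, 18.45733),
  stringsAsFactors = FALSE
)

res <- dta %>%
  # Seconds since midnight become seconds since the Unix epoch (UTC)
  mutate(time = as_datetime(period_to_seconds(hm(time)))) %>%
  filter(minute(time) == 0) %>%
  # Character column that shows only hours and minutes
  mutate(clock = format(time, "%H:%M"))

res
```

The time column stays a date-time for further calculations, while clock is only meant for printing or axis labels.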