R - dplyr filtering with lubridate::hhmm format with minute()
While answering the question "Temperature curve in R", I came across weird behavior of the dplyr::filter and lubridate::minute combination.
See the test data dta below. dta$time is in hh:mm format, as parsed by lubridate::hm.
    library(lubridate)
    library(dplyr)

    dta$time <- hm(dta$time)

To get the rows at full hours (i.e. 0 minutes), one can subset using lubridate::minute like this:

    dta[minute(dta$time) == 0, ]
    #        time    temp1    temp2
    # 1        0s 18.62800 18.54458
    # 7  1h 0m 0s 18.45733 18.22625
    # 13 2h 0m 0s 18.33258 18.04142

However, when using dplyr's filter like this:

    dta %>% filter(minute(time) == 0)
    #     time    temp1    temp2
    # 1     0s 18.62800 18.54458
    # 2 10m 0s 18.45733 18.22625
    # 3 20m 0s 18.33258 18.04142

the result does not fit my expectation. (Update: the values of temp1 and temp2 are correct, but the time column is corrupt. Thanks to @brian, by the way, for the hint.)
Additionally, a warning is returned:

    Warning message:
    In format.data.frame(x, digits = digits, na.encode = FALSE) :
      corrupt data frame: columns will be truncated or padded with NAs
This has been reported, and somehow solved, here, but the coercion seems to remove the fun (and readable) part of lubridate.
Question: is there a way (to date) to dplyr::filter lubridate::hhmm(ss) formats without coercing to character etc.?
Update:
It seems that the vector created by

    minute(dta$time)
    # [1]  0 10 20 30 40 50  0 10 20 30 40 50  0

looks like a numeric vector, yet it seems to have some mysterious characteristics.
Furthermore, as @lyngbakr pointed out, the result of the == comparison does not behave like a "normal" logical vector either.

    tst <- minute(dta$time) == 0
    dta %>% filter(tst)

will result in the same strange time column.
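To narrow down where the oddity comes from, it may help to look at the objects themselves. The following is a minimal sketch, assuming a current lubridate version, that checks what minute() returns and how a Period is stored. One possible reading, not confirmed anywhere in this thread, is that the result of minute() is an ordinary numeric vector and that the trouble lies in how the Period column itself is subset, since a Period keeps hours and minutes in separate slots.

```r
library(lubridate)

# Three of the sample times, parsed the same way as in the question
p <- hm(c("00:00", "00:10", "01:00"))

# minute() on a Period: check whether the result is a plain numeric vector
mins <- minute(p)
is.numeric(mins)
isS4(mins)

# The Period itself is an S4 object; each unit lives in its own slot
isS4(p)
slotNames(p)
p@hour
p@minute
```

If the slots and the underlying data are not subset together, the printed time values can end up misaligned with the remaining rows, which would match the "10m 0s" and "20m 0s" entries seen above.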
Sample data:

    dta <- read.table(text = "     time        temp1       temp2
                               1   00:00     18.62800    18.54458
                               2   00:10     18.60025    18.48283
                               3   00:20     18.57250    18.36767
                               4   00:30     18.54667    18.36950
                               5   00:40     18.51483    18.36550
                               6   00:50     18.48325    18.34783
                               7   01:00     18.45733    18.22625
                               8   01:10     18.43767    18.19067
                               9   01:20     18.41583    18.22042
                               10  01:30     18.39608    18.21225
                               11  01:40     18.37625    18.18658
                               12  01:50     18.35633    18.05942
                               13  02:00     18.33258    18.04142", header = TRUE)
I don't know why this works, but it does: the time column needs to be of type datetime, not Period.

    dta %>%
      mutate(time = as_datetime(hm(time))) %>%
      filter(minute(time) == 0)

                     time    temp1    temp2
    1 1970-01-01 00:00:00 18.62800 18.54458
    2 1970-01-01 01:00:00 18.45733 18.22625
    3 1970-01-01 02:00:00 18.33258 18.04142

This has the side effect of adding a date to the time column, namely the Unix epoch, so I'd advise including the actual date when you're working with time-only data.
If the times are minutes elapsed since the start of an experiment, it doesn't matter much; you just don't have to display the 1970-01-01 part.
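As a rough illustration of the advice above, here is a minimal sketch that attaches a date before filtering and then shows only the clock time. The date "2018-06-01" and the name measurement_date are made up for this example, and the data frame is a shortened copy of the sample data.

```r
library(lubridate)
library(dplyr)

# Shortened copy of the sample data (time kept as character)
dta <- data.frame(
  time  = c("00:00", "00:10", "00:50", "01:00", "01:10", "02:00"),
  temp1 = c(18.62800, 18.60025, 18.48325, 18.45733, 18.43767, 18.33258),
  stringsAsFactors = FALSE
)

# Hypothetical measurement date, an assumption for illustration only
measurement_date <- "2018-06-01"

# Build a proper date-time (POSIXct, UTC by default) and filter on it
full_hours <- dta %>%
  mutate(time = ymd_hm(paste(measurement_date, time))) %>%
  filter(minute(time) == 0)

# Show only the hh:mm part when the date is not of interest
format(full_hours$time, "%H:%M")
```

Keeping a real date-time in the column means the usual lubridate accessors keep working inside filter(), while format() hides the date wherever it would only add noise.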