R - dplyr filtering on a lubridate hh:mm column with minute()

- August 24, 2014

While answering the question "Temperature curve in R", I came across weird behavior of the dplyr::filter and lubridate::minute combination.

See the test data dta below. dta$time is in lubridate's hh:mm format.

library(lubridate)
library(dplyr)

dta$time <- hm(dta$time)

To get only the rows at full hours (i.e. 0 minutes), one can subset using lubridate::minute like this:

dta[minute(dta$time) == 0,]
#        time    temp1    temp2
# 1        0s 18.62800 18.54458
# 7  1h 0m 0s 18.45733 18.22625
# 13 2h 0m 0s 18.33258 18.04142

However, when using dplyr's filter, like this

dta %>% filter(minute(time) == 0)
#     time    temp1    temp2
# 1     0s 18.62800 18.54458
# 2 10m 0s 18.45733 18.22625
# 3 20m 0s 18.33258 18.04142

the result does not fit the expectation. (Update: the values of temp1 and temp2 are correct, but time is corrupt. Thanks to @brian, by the way, for giving the hint.)

Additionally, a warning is returned:

Warning message:
In format.data.frame(x, digits = digits, na.encode = FALSE) :
  corrupt data frame: columns will be truncated or padded with NAs

This has been reported and somehow solved here, by coercion, but that seems to remove the fun (and readable) part of lubridate.

Question: Is there a way (to date) to use dplyr::filter with lubridate::hhmm(ss) formats without coercing to character etc.?

Update:

It seems that the vector created by

minute(dta$time)
# [1]  0 10 20 30 40 50  0 10 20 30 40 50  0

looks like a numeric vector, yet it seems to have some mysterious characteristics.

Furthermore, as @lyngbakr pointed out, the comparison with == does not have the usual characteristics of a "normal" logical vector.

tst <- minute(dta$time) == 0

dta %>% filter(tst)

will result in the same strange time column.
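One way to look into these "mysterious characteristics" is to inspect how lubridate stores a Period. The sketch below uses a small illustrative vector p (not the full sample data) and only shows the object's structure. A Period is an S4 object whose underlying numeric vector holds the seconds, while hours and minutes live in separate slots. It is an assumption, not a confirmed diagnosis, that subsetting code which bypasses the class's own `[` method would shorten only the seconds and leave the other slots at full length, which would be consistent with the corrupted time column and the warning shown above.

```r
library(lubridate)

# Small illustrative vector (hypothetical, not the full sample data)
p <- hm(c("00:00", "00:10", "01:00"))

isS4(p)    # Period is an S4 class
class(p)

p@hour     # hours are stored in their own slot
p@minute   # minute() reads this slot
p@.Data    # the underlying numeric vector holds only the seconds

# Base R subsetting goes through lubridate's `[` method for Period,
# which subsets all slots together
full <- p[minute(p) == 0]
full
```

This would also fit the observation that dta[minute(dta$time) == 0,] gives the expected rows, since base R subsetting dispatches on the Period class.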

Sample data:

dta <- read.table(text = "     time        temp1       temp2
1   00:00     18.62800    18.54458
2   00:10     18.60025    18.48283
3   00:20     18.57250    18.36767
4   00:30     18.54667    18.36950
5   00:40     18.51483    18.36550
6   00:50     18.48325    18.34783
7   01:00     18.45733    18.22625
8   01:10     18.43767    18.19067
9   01:20     18.41583    18.22042
10  01:30     18.39608    18.21225
11  01:40     18.37625    18.18658
12  01:50     18.35633    18.05942
13  02:00     18.33258    18.04142", header = TRUE)

I don't know why this works, but it does: the time column needs to be of type datetime, not Period.

dta %>%
  mutate(time = as_datetime(hm(time))) %>%
  filter(minute(time) == 0)

                 time    temp1    temp2
1 1970-01-01 00:00:00 18.62800 18.54458
2 1970-01-01 01:00:00 18.45733 18.22625
3 1970-01-01 02:00:00 18.33258 18.04142

This has the side effect of attaching a date to the time column, namely the Unix epoch (1970-01-01), so I'd advise including the actual date when you're working with time-only data.

If the values are minutes elapsed since the start of the experiment, it doesn't matter as much; you just don't have to display the 1970-01-01 part.
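If the values really are elapsed time, another option is to keep the elapsed seconds as a plain numeric column and filter on that, which avoids both the Period column and the epoch date. This is a minimal sketch, not the approach used above: dta_small, elapsed_s and full_hours are illustrative names, and the data frame is a small subset of the sample data with time left as character.

```r
library(lubridate)
library(dplyr)

# Small illustrative subset of the sample data; time stays character here
dta_small <- data.frame(
  time  = c("00:00", "00:10", "00:50", "01:00", "02:00"),
  temp1 = c(18.62800, 18.60025, 18.48325, 18.45733, 18.33258),
  stringsAsFactors = FALSE
)

full_hours <- dta_small %>%
  # seconds elapsed since 00:00, stored as an ordinary numeric column
  mutate(elapsed_s = period_to_seconds(hm(time))) %>%
  # a full hour leaves no remainder when divided by 3600 seconds
  filter(elapsed_s %% 3600 == 0)

full_hours
```

The trade-off is that the readable Period is only used as an intermediate step, so the column stored in the data frame is a plain number that dplyr handles like any other.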
