date_time_parse_UTC()? A more flexible date_time_parse_RFC_3339() #376

Open
DavisVaughan opened this issue Oct 1, 2024 · 0 comments

date_time_parse_RFC_3339() is unique because it parses the RFC 3339 format, like:

2019-01-01T00:00:00Z
2019-01-01T00:00:00+0430
2019-01-01T00:00:00+04:30

Notably, it lets you parse and use the +04:30 offset. But it's pretty strict about the format itself. The only things you can customize are the separator (separator = "T" by default) and the offset style (offset = "Z" by default).
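As a rough illustration of those two knobs, here is a minimal sketch. It assumes the `offset` values `"%z"` and `"%Ez"` and the `separator` value `" "` as described in the clock documentation; the variable names are only for illustration.

```r
library(clock)

# Default: "T" separator and a literal "Z" offset
z <- date_time_parse_RFC_3339("2019-01-01T00:00:00Z")

# Numeric offsets need the matching `offset` style
no_colon <- date_time_parse_RFC_3339("2019-01-01T00:00:00+0430", offset = "%z")
colon <- date_time_parse_RFC_3339("2019-01-01T00:00:00+04:30", offset = "%Ez")

# Both offset spellings describe the same instant, 4.5 hours before `z`
no_colon == colon
difftime(z, colon, units = "hours")

# Dropping the seconds is not something the arguments can express,
# so this fails to parse (NA with a warning, suppressed here)
no_seconds <- suppressWarnings(
  date_time_parse_RFC_3339("2019-01-01 00:00+04:30", separator = " ", offset = "%Ez")
)
is.na(no_seconds)
```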

This is an example where the abstraction over sys-time leaks:
https://stackoverflow.com/questions/79043367/how-to-read-in-a-character-datetime-with-a-timezone-offset-in-r/79044161#79044161

2019-01-01 00:00+04:30

Notably, no seconds! We have to drop down to sys_time_parse() to parse this:

library(clock)

x <- c(
  "2023-10-29 00:00+02:00",
  "2023-10-29 01:00+02:00",
  "2023-10-29 02:00+02:00",
  "2023-10-29 02:00+01:00",
  "2023-10-29 03:00+01:00",
  "2023-10-29 04:00+01:00"
)

# Parse into (roughly) UTC, respecting `%Ez`, i.e. the `+HH:MM` bit
x <- sys_time_parse(
  x,
  format = "%Y-%m-%d %H:%M%Ez",
  precision = "minute"
)
x
#> <sys_time<minute>[6]>
#> [1] "2023-10-28T22:00" "2023-10-28T23:00" "2023-10-29T00:00" "2023-10-29T01:00"
#> [5] "2023-10-29T02:00" "2023-10-29T03:00"

# Convert to POSIXct with your expected time zone
as_date_time(x, zone = "Europe/Paris")
#> [1] "2023-10-29 00:00:00 CEST" "2023-10-29 01:00:00 CEST"
#> [3] "2023-10-29 02:00:00 CEST" "2023-10-29 02:00:00 CET" 
#> [5] "2023-10-29 03:00:00 CET"  "2023-10-29 04:00:00 CET"

# Or UTC if you wanted that
as_date_time(x, zone = "UTC")
#> [1] "2023-10-28 22:00:00 UTC" "2023-10-28 23:00:00 UTC"
#> [3] "2023-10-29 00:00:00 UTC" "2023-10-29 01:00:00 UTC"
#> [5] "2023-10-29 02:00:00 UTC" "2023-10-29 03:00:00 UTC"

Possibly we need date_time_parse_UTC() as one more convenience parser to fill this gap. I think this has come up one other time in the past.
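To make the idea concrete, a sketch of what such a helper might look like is below. The name `date_time_parse_UTC_sketch()`, its arguments, and the `zone` default are all hypothetical and not part of clock; it only wraps the two steps from the example above (`sys_time_parse()` followed by `as_date_time()`).

```r
library(clock)

# Hypothetical helper, NOT part of clock: the name, arguments, and defaults
# are assumptions made for illustration only.
date_time_parse_UTC_sketch <- function(x, format, zone = "UTC", precision = "second") {
  # Parse into sys-time (roughly UTC), letting `%z` / `%Ez` in `format`
  # shift each element by its own offset
  out <- sys_time_parse(x, format = format, precision = precision)
  # Convert to POSIXct, displayed in the requested zone
  as_date_time(out, zone = zone)
}

x <- c("2023-10-29 02:00+02:00", "2023-10-29 02:00+01:00")

paris <- date_time_parse_UTC_sketch(
  x,
  format = "%Y-%m-%d %H:%M%Ez",
  zone = "Europe/Paris",
  precision = "minute"
)
paris
```

The two inputs share the same clock reading but differ in offset, so they should come back as two distinct instants one hour apart.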


We'd end up with:

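|                  | UTC offset YES             | UTC offset NO             |
|------------------|----------------------------|---------------------------|
| Full TZ name YES | date_time_parse_complete() | date_time_parse_abbrev()* |
| Full TZ name NO  | date_time_parse_UTC()      | date_time_parse()         |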

With:

  • date_time_parse_abbrev() filling the "UTC offset NO + Full TZ name YES" cell, which isn't exactly accurate. A full time zone name alone, like America/New_York, isn't enough to disambiguate a time that sits in the "fall back" overlap, so we don't include that case. We include the abbreviation case instead, because an abbreviation combined with the supplied zone is enough to disambiguate.
  • date_time_parse_RFC_3339() being a special case of date_time_parse_UTC() that is restricted to just the common RFC format
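The "fall back" point in the first bullet can be sketched as follows. The example date (2023-11-05 in America/New_York, where 01:30 occurs twice) is chosen here for illustration and is not from the discussion above.

```r
library(clock)

zone <- "America/New_York"

# With only the zone name, 01:30 on the fall-back day is ambiguous, so
# date_time_parse() has to be told which of the two instants to pick
earliest <- date_time_parse("2023-11-05 01:30:00", zone, ambiguous = "earliest")
latest <- date_time_parse("2023-11-05 01:30:00", zone, ambiguous = "latest")

# With an abbreviation in the string, the string itself disambiguates
edt <- date_time_parse_abbrev("2023-11-05 01:30:00 EDT", zone)
est <- date_time_parse_abbrev("2023-11-05 01:30:00 EST", zone)

# The two readings are one hour apart
difftime(est, edt, units = "hours")
```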

But then it seems like we actually cover the whole spectrum of possible formats to parse!

This table would probably be pretty useful to include in the help docs for ?date_time_parse.
