On 11/4/21 10:29 AM, Bryn Llewellyn wrote:
/adrian.klaver@xxxxxxxxxxx <mailto:adrian.klaver@xxxxxxxxxxx> wrote:/
Not sure how this can be handled in a totally predictable way given
the unpredictable ways in which datetime strings are formatted?
The only thing I can say it is it points out that when working with
datetimes settling on a standard format is your best defense against
unpredictable results.
Thank you very much, again, for your help with my seemingly endless
nitpicking questions on this matter, Adrian. Here's the most favorable
conclusion that I can draw:
3. The rules are hard to understand and the PG doc gives insufficient
detail to allow the outcomes in corner cases like you just showed us to
be predicted confidently. Some users seek to understand the rules by
reading PG's source code.
I would say that is because datetimes in string formats are often hard
to understand as a result of folks inventing their own formats.
As an example a commit message of mine from some years ago:
"
Fix issue with date parsing of Javascript dates coming from browser
on Windows. This occurred in both Firefox and Chrome. The issue being
that the date had a timezone of Pacific Standard Time instead of PST
like it does from a Linux machine. dateutils choked on that timezone and
therefore the date was not parsed. The fix was to add fuzzy=True to the
dateutils.parse(). This allows dateutils to skip over anything it does
not understand and parse the rest. This works as the date does include
the correct tz offset. For the record the date format of concern is-Tue
Dec 08 2015 00:00:00 GMT-0800 (Pacific Standard Time)
"
Can't remember what version of Windows this was. The dates ended up in a
Postgres database via Python code in a Django app. This fix is one of
the reasons I really like the Python dateutils library. The solution being:
from dateutil.parser import parse
parse('Tue Dec 08 2015 00:00:00 GMT-0800 (Pacific Standard Time)',
fuzzy=True)
datetime.datetime(2015, 12, 8, 0, 0, tzinfo=tzoffset(None, 28800))
To see what it is doing:
parse('Tue Dec 08 2015 00:00:00 GMT-0800 (Pacific Standard Time)',
fuzzy_with_tokens=True)
(datetime.datetime(2015, 12, 8, 0, 0, tzinfo=tzoffset(None, 28800)),
(' ', ' ', ' ', ' (Pacific Standard Time)'))
where everything after the datetime are tokens it ignored.
I include this mainly as illustration that data clean up maybe more
practical before it ever hits the database and in a library that is
specialized for the task at hand.
4. Certainly, the rules set a significant parsing challenge. You hint
that they might even prevent a complete practical solution to be
implemented.
Yes that would depend on someone coding an AI that can fully understand
people and what there intentions where from a string value. Given the
answers I get when asking people directly what they intended, I'm not
holding my breath.
5. None of this matters when the app designer has the freedom to define
how date-time values will be provided, as text values, by user
interfaces or external systems. In these cases, the complexity can be
controlled by edict and correct solutions can be painlessly implemented
and tested. Not a day goes by that I don't have to enter a date value at
a UI. And in every case, a UI gadget constrains my input and makes its
individual fields available to the programmer without the need for
parsing—so there's reason to be optimistic. The programmer can easily
build the text representation of the date-time value to conform to the
simple rules that the overall application design specified.
Yep, constraining the imagination of the end user solves a lot of problems.
6. In some cases text data that's to be converted arrives in a format
that cannot be influenced. And it might contain locutions like we've
been discussing ("zero" meaning "1 BC", unfavorable paradigms for
separating tokens, and the like). In these cases, the diligent
programmer might, just, be able to use the full arsenal of available
tools to implement a scheme that faultlessly parses the input. But the
likelihood of bugs seems to be pretty big.
From my experience that pretty much defines all aspects of programming.
I'll say "case closed, now" — from my side, at least.
--
Adrian Klaver
adrian.klaver@xxxxxxxxxxx