I have absolutely no control over the source file.
The source file is an XML file (er, sort of, it doesn't follow any
particular DTD) and has a tag called VERBATIM_DATE in each record. It
looks to be required in their output, as every record so far has it, but
w/o a DTD it's hard to know. Time of day, on the other hand, is not
required and that tag is usually missing.
Here's the beauty - VERBATIM_DATE in the same XML file uses multiple
different formats, e.g.:
12 March 1945
14 Mar 1967
Apr 1999
12-03-2005
Before 1904
Winter or Spring 1977
etc.
It does seem that if there is a day, the day is always first - but
sometimes it has a space as a delimiter, sometimes a - as a delimiter,
and sometimes it has both, e.g.:
10-15 Dec 1934
12 March-03 April 1956
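For the range case, a rough sketch of one way to split the two examples above into a start and an end date. It assumes both ends share the trailing year and that the start month, when omitted, is the same as the end month. parse_day_range() is a made-up name, and the month lookup only checks the first three letters, so it is looser than it should be:

```php
<?php
// Hypothetical helper: turns "10-15 Dec 1934" or "12 March-03 April 1956"
// into [start, end] as YYYY-MM-DD strings, or null if the shape differs.
function parse_day_range(string $s): ?array
{
    static $months = [
        'jan' => 1, 'feb' => 2, 'mar' => 3, 'apr' => 4,
        'may' => 5, 'jun' => 6, 'jul' => 7, 'aug' => 8,
        'sep' => 9, 'oct' => 10, 'nov' => 11, 'dec' => 12,
    ];

    // day [month] - day month year; the first month is optional
    $re = '/^(\d{1,2})(?:\s+([A-Za-z]+))?-(\d{1,2})\s+([A-Za-z]+)\s+(\d{4})$/';
    if (!preg_match($re, trim($s), $m)) {
        return null;
    }

    // An unmatched optional group comes back as an empty string here,
    // in which case the start date borrows the end month.
    $startName = $m[2] !== '' ? $m[2] : $m[4];
    $from = $months[strtolower(substr($startName, 0, 3))] ?? null;
    $to   = $months[strtolower(substr($m[4], 0, 3))] ?? null;
    if ($from === null || $to === null) {
        return null; // not a month name we recognise
    }

    return [
        sprintf('%04d-%02d-%02d', $m[5], $from, $m[1]),
        sprintf('%04d-%02d-%02d', $m[5], $to, $m[3]),
    ];
}
```

A plain single date such as "12 March 1945" does not match this pattern and returns null, so it can sit alongside the single-date patterns without stealing their input.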
What I'm trying to do is write a preg_match() pattern for each case I
come across - if a date matches the pattern, it then gets parsed
according to that pattern to get me an acceptable YYYY-MM-DD (not sure
how I'll deal with the season case yet ... but I'm serious, that kind of
thing is in there several times).
To at least get started though, is there a wildcard defined that says
match a month?
e.g.
/^([0-9]{2})[\s-](MONTH_MATCH)[\s-]([0-9]{4})$/
where MONTH_MATCH is some special magic that matches Mar March Apr April etc.?
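As far as I know PCRE has no built-in token that matches a month name, so the usual workaround is to spell the names out as an alternation. A minimal sketch, assuming English month names only. MONTH_MATCH, month_number() and parse_day_month_year() are made-up names for illustration, and accepting one-digit days is an assumption on top of the pattern above:

```php
<?php
// Hand-written alternation standing in for the "month wildcard".
// Each entry matches the short and the long form (Mar / March, Sep / Sept / September).
const MONTH_MATCH = 'Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|'
    . 'Jun(?:e)?|Jul(?:y)?|Aug(?:ust)?|Sep(?:t(?:ember)?)?|'
    . 'Oct(?:ober)?|Nov(?:ember)?|Dec(?:ember)?';

// Hypothetical helper: maps a matched month name to 1..12 via its first three letters.
function month_number(string $name): int
{
    static $map = [
        'jan' => 1, 'feb' => 2, 'mar' => 3, 'apr' => 4,
        'may' => 5, 'jun' => 6, 'jul' => 7, 'aug' => 8,
        'sep' => 9, 'oct' => 10, 'nov' => 11, 'dec' => 12,
    ];
    return $map[strtolower(substr($name, 0, 3))];
}

// Hypothetical helper: "12 March 1945" or "14-Mar-1967" -> YYYY-MM-DD, else null.
function parse_day_month_year(string $s): ?string
{
    $re = '/^([0-9]{1,2})[\s-](' . MONTH_MATCH . ')[\s-]([0-9]{4})$/i';
    if (!preg_match($re, trim($s), $m)) {
        return null;
    }
    return sprintf('%04d-%02d-%02d', $m[3], month_number($m[2]), $m[1]);
}
```

Because the regex only lets real month names through, month_number() never sees a key it cannot look up. The same MONTH_MATCH string can be reused in the other patterns (month and year only, ranges, and so on).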
If you must know - it's data from a biology vertebrate museum. Thousands
of records may match a given query. Most of them look fairly easily
parsable and it does look like when a day is specified, it is always
first and year is always last.
I need the data, so I'm planning on having the script die if it comes
across a date I don't have a regex for, before it does anything else,
so I can add the appropriate regex as necessary, but damn - you'd
think a vertebrate museum would have cleaned up their DB somewhat.
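The "die before doing anything" plan could look roughly like this: run every VERBATIM_DATE value past the list of known patterns first, collect the ones nothing matches, and only carry on if that list is empty. find_unparsed() and die_on_unknown() are made-up names, and the two patterns are deliberately loose placeholders rather than the real set:

```php
<?php
// Hypothetical helper: returns the dates that none of the patterns match.
function find_unparsed(array $dates, array $patterns): array
{
    $unknown = [];
    foreach ($dates as $date) {
        $matched = false;
        foreach ($patterns as $re) {
            if (preg_match($re, trim($date)) === 1) {
                $matched = true;
                break;
            }
        }
        if (!$matched) {
            $unknown[] = $date;
        }
    }
    return $unknown;
}

// Hypothetical helper: stop the script and list every date that still needs a regex.
function die_on_unknown(array $dates, array $patterns): void
{
    $unknown = find_unparsed($dates, $patterns);
    if ($unknown !== []) {
        die("No regex for: " . implode(', ', $unknown) . "\n");
    }
}

// Placeholder patterns, one per known format; add more as new cases turn up.
$patterns = [
    '/^\d{1,2}[\s-][A-Za-z]+[\s-]\d{4}$/', // 12 March 1945, 14 Mar 1967
    '/^\d{2}-\d{2}-\d{4}$/',               // 12-03-2005
];
```

Collecting all the misses in one pass, instead of dying on the first one, means a single run shows every format that still needs a pattern.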
--
PHP General Mailing List (http://www.php.net/)