Andrew Ballard wrote:
On Wed, Mar 11, 2009 at 3:06 PM, Michael A. Peters <mpeters@xxxxxxx> wrote:
Andrew Ballard wrote:
On Wed, Mar 11, 2009 at 11:52 AM, Michael A. Peters <mpeters@xxxxxxx> wrote:
If I'm manipulating a DOM object, is there a way to change the tag name?
I know you can manipulate just about everything else in a node, but is
tagName really off-limits?
From the documentation for DOMElement:
/* Properties */
readonly public bool $schemaTypeInfo ;
readonly public string $tagName ;
So if I really needed to change it, I'd have to create a fresh node with
the new name, identical attributes and children, and then replace the
existing node with the new one?
Is there any other way to alter the tagName without doing all that?
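Here is a minimal sketch of that create/copy/replace workaround. The helper name `renameElement` is hypothetical, and it assumes plain, non-namespaced elements and attributes.

```php
<?php
// Hypothetical helper: DOMElement::$tagName is read-only, so "renaming"
// means building a replacement element and swapping it into the tree.
// Assumes non-namespaced markup and that $old has a parent node.
function renameElement(DOMElement $old, string $newName): DOMElement
{
    $doc = $old->ownerDocument;
    $new = $doc->createElement($newName);

    // Copy every attribute (name and value) onto the new element.
    foreach ($old->attributes as $attr) {
        $new->setAttribute($attr->nodeName, $attr->nodeValue);
    }

    // Move (not copy) the children; appendChild() detaches each from $old.
    while ($old->firstChild !== null) {
        $new->appendChild($old->firstChild);
    }

    // Put the new element where the old one was.
    $old->parentNode->replaceChild($new, $old);
    return $new;
}
```

Moving the children with appendChild() rather than cloning them keeps references to descendant nodes valid; only the renamed element itself is a new object.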
If this is related to your earlier post about attributes, is XSLT not
an option? I hate to sound like a broken record, but PHP supports XSL
transformations, and it sounds like that is exactly what you are trying
to do.
Andrew
No.
XSLT is certainly one of the technologies I'm going to look into, but right
now I'm building a filter that will (hopefully) fully implement Mozilla's
Content Security Policy (CSP) on the server side: before the document is
sent to the browser, the filter removes anything that would violate the
specified policy.
My main reason for changing tag names is to make sure all tags are lowercase
so I can then run the rest of the filter. They are all lowercase if you use
loadHTML(), but I don't want my class to assume it is handed a properly
created DOMDocument to start with, so I want to walk the DOM and fix any
non-lowercase tag and attribute names before I apply the CSP filtering.
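A rough sketch of such a normalizing walk, assuming plain, non-namespaced markup. `lowercaseNames` is a hypothetical name; element names are fixed with the same create/copy/replace workaround described earlier in the thread, and attribute names are fixed in place by removing and re-adding them.

```php
<?php
// Hypothetical helper (not part of PHP's DOM API): lower-cases every element
// name and attribute name below $node. Assumes plain, non-namespaced markup.
function lowercaseNames(DOMNode $node): void
{
    // Snapshot the children: replaceChild() below mutates the live list.
    $children = [];
    foreach ($node->childNodes as $child) {
        $children[] = $child;
    }

    foreach ($children as $child) {
        if (!($child instanceof DOMElement)) {
            continue; // text, comments, CDATA, etc. are left untouched
        }

        $lower = strtolower($child->tagName);
        if ($lower !== $child->tagName) {
            // tagName is read-only, so build a replacement element.
            $new = $child->ownerDocument->createElement($lower);
            foreach ($child->attributes as $attr) {
                $new->setAttribute($attr->nodeName, $attr->nodeValue);
            }
            while ($child->firstChild !== null) {
                $new->appendChild($child->firstChild); // moves, not copies
            }
            $node->replaceChild($new, $child);
            $child = $new;
        }

        // Attribute names: collect first, then remove and re-add any
        // that are not already lowercase.
        $names = [];
        foreach ($child->attributes as $attr) {
            $names[] = $attr->nodeName;
        }
        foreach ($names as $name) {
            $lowerName = strtolower($name);
            if ($lowerName !== $name) {
                $value = $child->getAttribute($name);
                $child->removeAttribute($name);
                $child->setAttribute($lowerName, $value);
            }
        }

        lowercaseNames($child); // recurse into the (possibly new) element
    }
}
```

It would be called once on the whole document, as lowercaseNames($doc), before any CSP checks run.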
How are you traversing the DOM if it is not already properly formed?
Every time I've tried to load a DOMDocument with XML that wouldn't
validate, it blew up and the DOMDocument was left empty. I usually
find loadHTML() to be more forgiving.
The problem isn't XML that doesn't validate; the problem is that HTML
is not case sensitive. <script></script> and <scRIpT></scRIpT> are both
legal XML, but to XML they are different elements. In XHTML the first is
a script element, while the second has no meaning and is not executed by
the browser. However, when the document is sent as HTML (necessary for IE
users, for example), case doesn't matter, so the second is also a valid
script tag. So if the content security policy says no scripts are allowed
on the page, I need to catch both.
I was actually doing it with regex, saving the document to a buffer
first, but to avoid altering content I had to make sure I was only
operating inside tags, etc. Then it dawned on me: it's structured data,
so use a tool designed to work with structured data.
If the class were just for me it wouldn't matter, but if I put the class
out in the wild, $doc->createElement("SCriPt"); is legal and can even be
used to produce valid HTML 4.01 via saveHTML(), yet it would slip past
the way my class locates and checks script elements. There doesn't seem
to be a case-insensitive way to find elements by tag name in PHP's XML
tools, so before my class does the filtering it first needs to make sure
all tag and attribute names are lowercase.
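The problem is easy to reproduce. PHP's DOMDocument has no built-in case-insensitive lookup, but as one possible workaround (a swapped-in technique, not what the class above does) an XPath 1.0 query can fold case with translate(). The helper name `findElementsCaseInsensitive` is hypothetical, and it assumes $name is a plain tag name containing no quotes; translate() only folds ASCII letters, which covers HTML element names.

```php
<?php
// Hypothetical helper: find elements whose local name matches $name,
// ignoring ASCII case. Assumes $name is a plain tag name (no quotes).
function findElementsCaseInsensitive(DOMDocument $doc, string $name): DOMNodeList
{
    $expr = sprintf(
        "//*[translate(local-name(), '%s', '%s') = '%s']",
        'ABCDEFGHIJKLMNOPQRSTUVWXYZ',
        'abcdefghijklmnopqrstuvwxyz',
        strtolower($name)
    );
    return (new DOMXPath($doc))->query($expr);
}

// Reproduce the problem: a legally created "SCriPt" element.
$doc = new DOMDocument();
$body = $doc->appendChild($doc->createElement('body'));
$body->appendChild($doc->createElement('SCriPt'));

// The case-sensitive lookup misses it...
echo $doc->getElementsByTagName('script')->length, "\n";        // 0
// ...while the translate()-based XPath query finds it.
echo findElementsCaseInsensitive($doc, 'script')->length, "\n"; // 1
```

Normalizing names once up front, as described above, lets every later check use the simpler case-sensitive calls; the XPath route avoids rewriting the tree but has to be repeated for every lookup.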
How potential users of my class (if there ever are any) get their page
into a DOMDocument is up to them, not me. They can use loadHTML(), create
it from scratch, or import it from some other XML format.
If their source is an HTML file (or buffer), I will recommend they run
it through Tidy first (Tidy does wonders), but it's still up to them,
not me, so my class can't assume the tags are lowercase.
--
PHP General Mailing List (http://www.php.net/)