I'm trying to strip out this text :- <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN"> Using this regex :- $search = array( '@<script[^>]*?>.*?</script>@si', // Strip out javascript '@<!DOCTYPE[^>]*?>.*?</DOCTYPE>@si', // Strip out doctype tags '@<style[^>]*?>.*?</style>@siU', // Strip style tags properly '@<![\s\S]*?--[ \t\n\r]*>@' // Strip multi-line comments including CDATA ); $text = preg_replace($search, '', $text); Any idea what I've got wrong in the 'DOCTYPE' section? Rob. *********************************************************************************** Any opinions expressed in email are those of the individual and not necessarily those of the company. This email and any files transmitted with it are confidential and solely for the use of the intended recipient or entity to who they are addressed. It may contain material protected by attorney-client privilege. If you are not the intended recipient, or a person responsible for delivering to the intended recipient, be advised that you have received this email in error and that any use is strictly prohibited. Random House Group + 44 (0) 20 7840 8400 http://www.randomhouse.co.uk http://www.booksattransworld.co.uk http://www.kidsatrandomhouse.co.uk Generic email address - enquiries@xxxxxxxxxxxxxxxxx Name & Registered Office: THE RANDOM HOUSE GROUP LIMITED 20 VAUXHALL BRIDGE ROAD LONDON SW1V 2SA Random House Group Ltd is registered in the United Kingdom with company No. 00954009, VAT number 102838980 ***********************************************************************************