On Wed, Dec 17, 2008 at 6:28 AM, ioannes <ioannes@xxxxxxxxxxxxxx> wrote:
> shiplu wrote:
>>
>> On Sun, Dec 7, 2008 at 8:22 AM, ioannes <ioannes@xxxxxxxxxxxxxx> wrote:
>>>
>>> shiplu wrote:
>>>>
>>>> When you are dealing with curl, anything can be done as long as
>>>> it's an HTTP request. It's all about sending HTTP headers and
>>>> content.
>>>>
>>>> To parse HTML content you can use an HTML parser. Regular
>>>> expressions may not work every time. Patterns change over time.
>>>>
>>>> Download Wireshark. Collect two sample request and response
>>>> packets from there. Make a format and use it with curl.
>>>> That's it. So simple. You're never going to need to know what is
>>>> generating the site, PHP or ASP.NET.
>>>>
>>>
>>> I downloaded Wireshark onto Windows XP and got as far as Capture
>>> Options from Ethernet, with Capture Filter set to
>>> host <IP address of target page>. I click Start, go to the browser
>>> and access the page, stop Wireshark, then Save the captured file or
>>> Export as HTTP object, which gives me the source of the page again.
>>> Is this what you mean? What do you mean by "make a format"? Do you
>>> mean, for instance, parsing the page with string finder functions
>>> etc.? How does this help over identifying the correct POST variables
>>> of the request (using LiveHTTP etc.) and feeding them into a curl
>>> function? What do you mean by 'make a format' versus 'pattern
>>> changes over time'? Is format a Wireshark function, and if so,
>>> where do I find it? Thanks, John
>>>
>>
>> "Make a format" is not a button in Wireshark labelled "make a
>> format" that does everything for you. You have to do it yourself.
>> With Wireshark you'll see every type of header and content for
>> almost every type of protocol, so you'll use this software to
>> analyze the HTTP conversation. Data will be not only in the content
>> but also in the headers, so parse both if needed. Then use the same
>> data to make successive requests.
>>
>> If you are using a regular expression, it will fail to match if the
>> pattern changes. Your pattern
>> '/<input type="hidden" name="__VIEWSTATE" id="__VIEWSTATE" value="([^"]*?)" \/>/'
>> will match
>> <input type="hidden" name="__VIEWSTATE" id="__VIEWSTATE" value="ABC7D5ACSE" />
>> but won't match
>> <input type="hidden" name="__VIEWSTATE" id="__VIEWSTATE" value="ABC7D5ACSE">
>> Do you see the difference? It won't match
>> <input type="hidden" id="__VIEWSTATE" name="__VIEWSTATE" value="ABC7D5ACSE" />
>> either, because the attribute order has changed. Your regex will not
>> work, but their website will still render fine. To overcome this, you
>> have to use an HTML/XML parser. Go to each input element, look at
>> its name attribute, and if the name is "__VIEWSTATE", fetch the
>> content of the value attribute. For any input element to carry
>> data, the name and value attributes must be present, so your code
>> will match every time. It won't fail in 99.99% of cases.
>>
>> Hope that makes sense.
>>
>
> Yes, thanks. What HTML parser do you suggest?
>
> John
>

For PHP there is a DOM extension. Documentation can be found at
http://www.php.net/dom

Thanks.

--
A K M Mokaddim
http://talk.cmyweb.net
http://twitter.com/shiplu
Stop Top Posting !!
--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php
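
For reference, a minimal sketch of the DOM approach described in the thread: walk the input elements, check the name attribute, and read the value attribute. The function name extractViewState and the sample markup wrapper are hypothetical and are not taken from the thread. The sketch assumes PHP 7.1 or later with the DOM extension enabled. It is only one way to do it, not a definitive implementation.

```php
<?php
// Hypothetical helper (not from the thread): return the value of the
// <input> whose name attribute is "__VIEWSTATE", or null if none exists.
function extractViewState(string $html): ?string
{
    $doc = new DOMDocument();

    // Real-world HTML is rarely well-formed, so collect libxml parse
    // warnings internally instead of printing them, then restore the
    // previous error-handling setting.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Attribute order and the trailing "/>" do not matter here, because
    // we inspect parsed elements rather than raw text.
    foreach ($doc->getElementsByTagName('input') as $input) {
        if ($input->getAttribute('name') === '__VIEWSTATE') {
            return $input->getAttribute('value');
        }
    }
    return null;
}

// The three variants from the thread: self-closing, not self-closing,
// and with the id/name attributes swapped.
$samples = [
    '<input type="hidden" name="__VIEWSTATE" id="__VIEWSTATE" value="ABC7D5ACSE" />',
    '<input type="hidden" name="__VIEWSTATE" id="__VIEWSTATE" value="ABC7D5ACSE">',
    '<input type="hidden" id="__VIEWSTATE" name="__VIEWSTATE" value="ABC7D5ACSE" />',
];

foreach ($samples as $sample) {
    var_dump(extractViewState("<html><body><form>$sample</form></body></html>"));
}
```

All three variants should yield the same value, including the two that the regex quoted in the thread would miss. The returned string can then be passed as the __VIEWSTATE field of the next curl POST.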