[Solved] DOMDocument library automatically adding doctype and html/body tags to the content

Sometimes we need to use the DOMDocument library to parse HTML content, but it automatically adds a doctype declaration and <html></html> and <body></body> tags to the content if they are not already present. These extra tags then show up when you serialize the document with saveHTML().

To prevent this, pass two flags to loadHTML():

LIBXML_HTML_NOIMPLIED: sets the HTML_PARSE_NOIMPLIED flag, which turns off the automatic adding of implied html/body… elements.

LIBXML_HTML_NODEFDTD: sets the HTML_PARSE_NODEFDTD flag, which prevents a default doctype from being added when one is not found.
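To see what each flag is guarding against, here is a small sketch that parses the same fragment with no flags and then with only LIBXML_HTML_NODEFDTD. The sample string '<p>Hello</p>' and the variable names are illustrative assumptions, not part of any API. Without flags, the output contains a doctype plus the implied <html> and <body> wrappers; with LIBXML_HTML_NODEFDTD alone, the doctype goes away but the wrappers stay, which is why both flags are needed.

```php
<?php
// Illustrative input: a fragment with no doctype, <html>, or <body>.
$html = '<p>Hello</p>';

// Default parsing: libxml adds a default doctype plus implied <html>/<body>.
$plain = new DOMDocument();
$plain->loadHTML($html);
$withDefaults = $plain->saveHTML();

// Only LIBXML_HTML_NODEFDTD: no doctype, but the implied wrappers remain.
$noDtd = new DOMDocument();
$noDtd->loadHTML($html, LIBXML_HTML_NODEFDTD);
$withoutDoctype = $noDtd->saveHTML();

echo $withDefaults, $withoutDoctype;
```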

Example:

$html = '<p>Hello</p>';
$doc = new DOMDocument();
$doc->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
echo $doc->saveHTML(); // <p>Hello</p>

With these two flags, DOMDocument no longer adds a doctype or <html> and <body> tags to your HTML content.
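If you load fragments in several places, you might wrap the two flags in a small helper. The function name loadFragment below is hypothetical (it is not part of PHP or DOMDocument), and the class="note" edit is only there to show that the fragment can be modified and serialized back without any wrappers appearing.

```php
<?php
// Hypothetical helper (not part of PHP): parse an HTML fragment without
// letting libxml add a doctype or implied <html>/<body> wrappers.
function loadFragment(string $html): DOMDocument
{
    $doc = new DOMDocument();
    $doc->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
    return $doc;
}

// Parse, edit one node, then serialize the fragment back to a string.
$doc = loadFragment('<p>Hello</p>');
$paragraph = $doc->getElementsByTagName('p')->item(0);
$paragraph->setAttribute('class', 'note');

// saveHTML() ends with a newline, so trim it for comparison.
$result = trim($doc->saveHTML());
echo $result, "\n"; // <p class="note">Hello</p>
```

Returning the DOMDocument instead of a string lets the caller query and modify nodes before deciding how to serialize them.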
