
How do I remove namespaces with lxml?

I am reading an XHTML file:

<?xml version='1.0' encoding='utf-8'?>
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
    <head>
        <meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/>
        <meta name="calibre:cover" content="true"/>
        <title>Cover</title>
        <style type="text/css" title="override_css">
            @page {padding: 0pt; margin:0pt}
            body { text-align: center; padding:0pt; margin: 0pt; }
        </style>
    </head>
    <body>
        <div>
            <svg xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" version="1.1" width="100%" height="100%" viewBox="0 0 200 266" preserveAspectRatio="none">
                <image width="200" height="266" xlink:href="cover1.jpeg"/>
            </svg>
        </div>
    </body>
</html>

I want to save the contents of the XHTML file's body to a new HTML file:

from lxml import etree
with open("test.xhtml", 'r', encoding='utf8') as html:
    tree = etree.parse(html)
    body = tree.find(
        './/xmlns:body',
         namespaces={'xmlns': 'http://www.w3.org/1999/xhtml'}
    )
    nsmap = body.nsmap
    # Without nsmap here, every tag gets its own namespace declaration
    page_xml = etree.Element('div', nsmap=nsmap)
    for child in body.iterchildren():
        page_xml.append(child)
    etree.ElementTree(page_xml).write(
        "new.html",
        pretty_print=True,
        encoding='utf-8',
        method='html'
    )

The resulting new.html has an extra xmlns on the outer div. How do I get rid of it?

<div xmlns="http://www.w3.org/1999/xhtml">
    <div>
        <svg xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" version="1.1" width="100%" height="100%" viewBox="0 0 200 266" preserveAspectRatio="none">
            <image width="200" height="266" xlink:href="cover1.jpeg"/>
        </svg>
    </div>
</div>
Answer
命于你

If you parse the document as HTML, the result carries no namespaces:

# -*- coding: utf-8 -*-

from lxml import etree

content = '''
<?xml version='1.0' encoding='utf-8'?>
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
    <head>
        <meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/>
        <meta name="calibre:cover" content="true"/>
        <title>Cover</title>
        <style type="text/css" title="override_css">
            @page {padding: 0pt; margin:0pt}
            body { text-align: center; padding:0pt; margin: 0pt; }
        </style>
    </head>
    <body>
        <div>
            <svg xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" version="1.1" width="100%" height="100%" viewBox="0 0 200 266" preserveAspectRatio="none">
                <image width="200" height="266" xlink:href="cover1.jpeg"/>
            </svg>
        </div>
    </body>
</html>
'''

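# Parse as HTML (bytes input sidesteps the encoding declaration), then print the first child of <body>
first_child = etree.HTML(content.encode('utf-8')).xpath('//body/*')[0]
print(etree.tostring(first_child, encoding='unicode'))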
April 5, 2018, 03:46
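
If the file has to stay on the XML parser, one possible alternative is to rename the XHTML elements to their local names before copying them into the new wrapper. The following is only a minimal sketch under assumptions: it uses a shortened copy of the XHTML from the question, and `XHTML_NS`, `SOURCE` and `strip_xhtml_namespace` are illustrative names, not part of lxml.

```python
from lxml import etree

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Shortened copy of the XHTML from the question (bytes, so the
# encoding declaration is accepted by the XML parser).
SOURCE = b"""<?xml version='1.0' encoding='utf-8'?>
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
    <body>
        <div>
            <svg xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" version="1.1" viewBox="0 0 200 266">
                <image width="200" height="266" xlink:href="cover1.jpeg"/>
            </svg>
        </div>
    </body>
</html>"""


def strip_xhtml_namespace(root):
    """Hypothetical helper: rename XHTML-namespaced elements to local names, in place."""
    for el in root.iter():
        if not isinstance(el.tag, str):
            continue  # comments and processing instructions have no string tag
        qname = etree.QName(el)
        if qname.namespace == XHTML_NS:
            el.tag = qname.localname
    return root


root = strip_xhtml_namespace(etree.fromstring(SOURCE))
body = root.find(".//body")  # a plain name works once the namespace is gone

page_xml = etree.Element("div")  # no nsmap, so the wrapper declares no xmlns
for child in list(body):  # copy the list first: append() moves nodes out of body
    page_xml.append(child)

result = etree.tostring(page_xml, pretty_print=True, method="html", encoding="unicode")
print(result)
```

Only elements in the XHTML namespace are renamed, so the svg element keeps its own namespace declarations.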