这是我的输入file.xml

<?xml version="1.0" encoding="UTF-8" ?>
<project name="so_project" id="Project-9999">
    <schema name="database1">
        <table name="table1">
            <column name="foo" type="int"/>
            <column name="bar" type="string"/>
            <column name="details_resolution" type="array[object]">
                <column name="timestamp" type="timestamp"/>
                <column name="user_id" type="string"/>
                <column name="user_name" type="string"/>
            </column>
            <column name="details_closure" type="array[object]">
                <column name="timestamp" type="timestamp"/>
                <column name="auto_closure" type="bool"/>
            </column>
        </table>
    </schema>
    <schema name="database2">
        <table name="table1">
            <column name="foo" type="int"/>
            <column name="bar" type="string"/>
            <column name="details" type="array[object]">
                <column name="timestamp" type="timestamp"/>
                <column name="value" type="float"/>
            </column>
        </table>
    </schema>
</project>

..我正在尝试使这个经典的嵌套统计

{
    "database1": {
        "table1": {
            "foo": "int",
            "bar": "string",
            "details_resolution": {
                "timestamp": "timestamp",
                "user_id": "string",
                "user_name": "string"
            },
            "details_closure": {
                "timestamp": "timestamp",
                "auto_closure": "bool"
            }
        }
    },
    "database2": {
        "table1": {
            "foo": "int",
            "bar": "string",
            "details": {
                "timestamp": "timestamp",
                "value": "float"
            }
        }
    }
}

PS:每个数据库最终都可以拥有多个表。

我尝试了一些AI代码,但它们都没有给我预期的结果。
对不起,人们无法展示我的尝试!

因此,任何帮助将不胜感激。

分析解答

使用的解决方案:

from bs4 import BeautifulSoup

with open("your_file.xml", "r") as f_in:
    soup = BeautifulSoup(f_in.read(), "xml")


def parse_columns(t):
    out = {}
    for c in t.find_all("column", recursive=False):
        if c.find("column"):
            out[c["name"]] = parse_columns(c)
        else:
            out[c["name"]] = c["type"]
    return out


def parse_schema(sch):
    out = {}
    for t in sch.select("table"):
        out[t["name"]] = parse_columns(t)
    return out


out = {}
for sch in soup.select("schema"):
    out[sch["name"]] = parse_schema(sch)

print(out)

printing:

{
    "database1": {
        "table1": {
            "foo": "int",
            "bar": "string",
            "details_resolution": {
                "timestamp": "timestamp",
                "user_id": "string",
                "user_name": "string",
            },
            "details_closure": {"timestamp": "timestamp", "auto_closure": "bool"},
        }
    },
    "database2": {
        "table1": {
            "foo": "int",
            "bar": "string",
            "details": {"timestamp": "timestamp", "value": "float"},
        }
    },
}