xml4h is an ISC licensed library for Python to make working with XML a human-friendly activity.
This library exists because Python is awesome, XML is everywhere, and combining the two should be a pleasure. With xml4h, it can be.
xml4h is a simplification layer over existing Python XML processing libraries such as lxml and the minidom. It provides:
The xml4h abstraction layer also offers some other benefits, beyond a nice API and tool set:
Here is an example of parsing and reading data from an XML document using “magic” element and attribute lookups:
>>> import xml4h
>>> doc = xml4h.parse('tests/data/monty_python_films.xml')
>>> for film in doc.MontyPythonFilms.Film[:3]:
... print film['year'], ':', film.Title.text
1971 : And Now for Something Completely Different
1974 : Monty Python and the Holy Grail
1979 : Monty Python's Life of Brian
You can also use a more traditional approach to traverse the DOM:
>>> for film in doc.child('MontyPythonFilms').children('Film')[:3]:
... print film.attributes['year'], ':', film.children.first.text
1971 : And Now for Something Completely Different
1974 : Monty Python and the Holy Grail
1979 : Monty Python's Life of Brian
The xml4h builder makes programmatic document creation simple, with a method-chaining feature that allows for expressive but sparse code that mirrors the document itself:
>>> b = (xml4h.build('MontyPythonFilms')
... .attributes({'source': 'http://en.wikipedia.org/wiki/Monty_Python'})
... .element('Film')
... .attributes({'year': 1971})
... .element('Title')
... .text('And Now for Something Completely Different')
... .up()
... .elem('Description').t(
... "A collection of sketches from the first and second TV"
... " series of Monty Python's Flying Circus purposely"
... " re-enacted and shot for film.").up()
... .up()
... )
>>> # A builder object can be re-used
>>> b = (b.e('Film')
... .attrs(year=1974)
... .e('Title').t('Monty Python and the Holy Grail').up()
... .e('Description').t(
... "King Arthur and his knights embark on a low-budget search"
... " for the Holy Grail, encountering humorous obstacles along"
... " the way. Some of these turned into standalone sketches."
... ).up()
... .up()
... )
Pretty-print your XML document with the flexible write() and xml() methods:
>>> b.write_doc(indent=4, newline=True)
<?xml version="1.0" encoding="utf-8"?>
<MontyPythonFilms source="http://en.wikipedia.org/wiki/Monty_Python">
<Film year="1971">
<Title>And Now for Something Completely Different</Title>
<Description>A collection of sketches from ...</Description>
</Film>
<Film year="1974">
<Title>Monty Python and the Holy Grail</Title>
<Description>King Arthur and his knights embark ...</Description>
</Film>
</MontyPythonFilms>
Python has three popular libraries for working with XML, none of which are particularly easy to use:
Given these three options it can be difficult to choose which library to use, especially if you’re new to XML processing in Python and haven’t already used (struggled with) any of them.
In the past your best bet would have been to go with lxml for the most flexibility, even though it might be overkill, because at least then you wouldn’t have to rewrite your code if you later find you need XPath support or powerful DOM traversal methods.
This is where xml4h comes in. It provides an abstraction layer over the existing XML libraries, taking advantage of their power while offering an improved API and tool set.
This project is heavily inspired by the work of Kenneth Reitz such as the excellent Requests HTTP library.
Currently xml4h includes two adapter implementations that support key XML processing tasks, using either the minidom or lxml‘s ElementTree libraries.
The project is still at the alpha stage, where I am playing with ideas and tweaking the APIs to try and get them right before I build out the feature set.
This project is likely to be in flux for a while yet, so be aware that individual APIs and even broad approaches may change.