A few years back (for a very generous few), we needed to parse a wide variety of XML strings. It was quite tedious to go from the XML to the native-language representations of the data (even from a DOM version). Furthermore, we needed to parse this XML both in Java and in C++.
I wrote (in Java) an XML parser generator that took an XML description of how you’d like the native-language data structures to look and where in the XML it could find the values for those data structures. The Java code-base for this was ugly, ugly, ugly. I tried several times to clean it up into something publishable. I also tried several times to get it to the point where it could generate the parser it used to read its own XML description file. Alas, the meta-ness, combined with the clunky Java code, kept me from completing the circle.
Fast forward to last week. Suddenly, I have a reason to parse a wide variety of XML strings in Objective-C. I certainly didn’t want to pull out the Java parser generator and try to beat it into generating Objective-C, too. That’s fortunate, because I cannot find any of the copies (in various states of repair) that once lurked in
What’s a man to do? Write it in Lisp, of course.
Here’s an example to show how it works. Let’s take some simple XML that lists food items on a menu:
<menu>
  <food name="Belgian Waffles" price="$5.95" calories="650">
    <description>two of our famous Belgian Waffles with plenty of real maple syrup</description>
  </food>
  <!-- ... more food entries, omitted here for brevity ... -->
</menu>
We craft an XML description of how to go from the XML into a native representation:
<parser_generator root="menu" from="/menu">
  <struct name="food item">
    <field type="string" name="name" from="@name" />
    <field type="string" name="price" from="@price" />
    <field type="string" name="description" from="/description/." />
    <field type="integer" name="calories" from="@calories" />
  </struct>
  <struct name="menu">
    <field name="menu items">
      <array_element type="food item" from="/food" />
    </field>
  </struct>
</parser_generator>
Now, you run the parser generator on the above input file:
% sh parser-generator.sh --language=lisp \
--types-package menu \
--reader-package menu-reader \
This generates two files for you: types.lisp and reader.lisp. This is what types.lisp looks like:
(defclass food-item ()
  ((name :initarg :name :type string)
   (price :initarg :price :type string)
   (description :initarg :description :type string)
   (calories :initarg :calories :type integer)))

(defclass menu ()
  ((menu-items :initarg :menu-items :type list :initform nil)))
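To get a feel for what these classes give you, here is a minimal sketch that builds a menu by hand, roughly what a successful parse should hand back. The two classes are repeated so the snippet runs on its own. The names *waffles*, *menu*, and item-summary are made up for this illustration and are not part of the generated code. The classes declare no accessors, so the sketch reads values back with slot-value.

```lisp
;; Repeated from types.lisp so that this sketch is self-contained.
(defclass food-item ()
  ((name :initarg :name :type string)
   (price :initarg :price :type string)
   (description :initarg :description :type string)
   (calories :initarg :calories :type integer)))

(defclass menu ()
  ((menu-items :initarg :menu-items :type list :initform nil)))

;; Hypothetical hand-built objects, using the values from the sample XML.
(defparameter *waffles*
  (make-instance 'food-item
                 :name "Belgian Waffles"
                 :price "$5.95"
                 :description "two of our famous Belgian Waffles with plenty of real maple syrup"
                 :calories 650))

(defparameter *menu*
  (make-instance 'menu :menu-items (list *waffles*)))

;; ITEM-SUMMARY is an illustrative helper, not generated code.
;; The classes define no accessors, so SLOT-VALUE does the reading.
(defun item-summary (item)
  (format nil "~A (~A, ~D calories)"
          (slot-value item 'name)
          (slot-value item 'price)
          (slot-value item 'calories)))

;; One summary string per item on the menu.
(mapcar #'item-summary (slot-value *menu* 'menu-items))
```

Note that calories comes back as a real integer rather than a string, which is the point of declaring the field type in the description file.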
I will not bore you with all of reader.lisp as it’s 134 lines of code you never had to write. The only part you need to worry about is the parse function, which takes either a stream for the XML or a pathname to it and returns an instance of the menu class. Here is a small snippet though:
;;; food-item struct
(defmethod data progn ((handler sax-handler) (item food-item) path value)
  (with-slots (name price description calories) item
    (case path
      (:|@name| (setf name value))
      (:|@price| (setf price value))
      (:|/description/.| (setf description value))
      (:|@calories| (setf calories (parse-integer value))))))
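Two things in that snippet may look odd if you don’t write much CLOS: the progn qualifier and the keywords wrapped in vertical bars. Here is a minimal, self-contained sketch of that style of dispatch. It is not the generated code. The empty sax-handler class, the cut-down food-item, the defgeneric, and the demo function are all stand-ins I am assuming for illustration, and the real reader presumably hooks into a SAX library instead.

```lisp
;; Stand-in for the handler class (assumption: the real one comes
;; from or extends a SAX library's handler).
(defclass sax-handler () ())

;; Cut-down version of the generated class, with defaults so that
;; unfilled slots can still be read.
(defclass food-item ()
  ((name :initarg :name :initform nil)
   (calories :initarg :calories :initform nil)))

;; With the built-in PROGN method combination every applicable method
;; runs, which is why each DEFMETHOD carries the PROGN qualifier.
(defgeneric data (handler item path value)
  (:method-combination progn))

(defmethod data progn ((handler sax-handler) (item food-item) path value)
  (with-slots (name calories) item
    (case path
      (:|@name| (setf name value))
      (:|@calories| (setf calories (parse-integer value))))))

;; DEMO is a hypothetical driver that feeds in values the way a
;; SAX-style reader might.
(defun demo ()
  (let ((handler (make-instance 'sax-handler))
        (item (make-instance 'food-item)))
    ;; Interning a lowercase string in the KEYWORD package yields the
    ;; same symbol that is written :|@name| in source code.
    (data handler item (intern "@name" :keyword) "Belgian Waffles")
    (data handler item :|@calories| "650")
    ;; A path with no matching clause falls through CASE untouched.
    (data handler item :|@unknown| "ignored")
    (list (slot-value item 'name) (slot-value item 'calories))))
```

The vertical bars are there because the reader upcases unescaped symbol names by default, so a bare :@name would not match a keyword interned from the lowercase string "@name".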
Where it’s at
I currently have the parser generator generating its own parser (five times fast). I still have a little more to add: assertions for things like the minimum number of elements in an array or the minimum value of an integer. I also have a few kinks to work out so that you can return some type other than an instance of a class for cases like this, where the menu class just wraps one item.
My next step, though, is to get it generating Objective-C parsers.
Somewhere in there, I’ll post this to a public git repository.