Playing with XML and XSL and need some data to work with? I came across a collection of all of Shakespeare's plays in XML format. They were apparently first marked up by Jon Bosak before the XML recommendation had even been finalized. It's a great source of test data.
I began playing with them because I've appeared in a few Shakespeare plays and I had a few questions about my parts: Exactly how many lines did I have? Which scenes do I appear in? I thought I could answer most of my questions with a simple XSL transformation on the conveniently available XML data.
Here are some of the XSL files I've put together. I've uploaded three of the play XML files for you to try them out on. If you need another play, download it yourself from the link above.
See how many lines each character has...
See who is in each scene...
These samples both rely on finding a way to group XML data, which isn't always the easiest thing
to do. XSL (version 1.0, at least) doesn't have a nice built-in grouping
element. One technique is to loop through
all the elements of the type you want, keeping only those whose value doesn't already appear in
any preceding element of the same type. I'll call that the preceding-sibling method.
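A minimal sketch of the preceding-sibling idea, applied to counting lines per character, might look like the XSLT 1.0 stylesheet below. It is an illustration, not one of the downloadable files, and it assumes the element names from Bosak's markup: each SPEECH holds one or more SPEAKER elements followed by LINE elements. Because every SPEAKER sits inside its own SPEECH, the speaker names are not actually siblings of one another, so this version tests against the `preceding` axis rather than `preceding-sibling`.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- Visit each SPEAKER whose name does not match any earlier SPEAKER,
         i.e. the first occurrence of every distinct name -->
    <xsl:for-each select="//SPEAKER[not(. = preceding::SPEAKER)]">
      <xsl:sort select="."/>
      <xsl:variable name="name" select="string(.)"/>
      <xsl:value-of select="$name"/>
      <xsl:text>: </xsl:text>
      <!-- Count every LINE in the speeches given by this speaker -->
      <xsl:value-of select="count(//SPEECH[SPEAKER = $name]/LINE)"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

Each SPEAKER is compared against every SPEAKER before it, so the work can grow roughly with the square of the number of speeches, which becomes noticeable on a full play.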
There is also the Muenchian method.
It uses the XSL key
element to build lists of nodes that share a common
value or attribute. We then select only the nodes that come first in the key's list for
the value or attribute they have. I know my explanation isn't the greatest,
but there are plenty of
better resources
on the topic out there. While slightly harder to wrap your head around (at least the first time
you use it), the Muenchian method seems to be the preferred one. It is usually faster, since the
processor can look each value up in the key instead of rescanning every preceding element.
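Here is the same line count sketched with the Muenchian method. Again this is only an illustration assuming Bosak's SPEECH/SPEAKER/LINE element names, and the key name `speakers` is a hypothetical choice. `generate-id()` is used to check whether the current SPEAKER is the very same node as the first one the key returns for its name.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Index every SPEAKER element by its text content -->
  <xsl:key name="speakers" match="SPEAKER" use="."/>

  <xsl:template match="/">
    <!-- key() returns nodes in document order, so [1] is the earliest
         SPEAKER with this name; keep only that node for each name -->
    <xsl:for-each select="//SPEAKER[generate-id() = generate-id(key('speakers', .)[1])]">
      <xsl:sort select="."/>
      <xsl:value-of select="."/>
      <xsl:text>: </xsl:text>
      <!-- Step from every SPEAKER with this name up to its SPEECH,
           then count that speech's LINE children -->
      <xsl:value-of select="count(key('speakers', .)/../LINE)"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

This stylesheet should run under any XSLT 1.0 processor; with libxslt's command-line tool that would be something like `xsltproc count-lines.xsl hamlet.xml` (both file names hypothetical).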
I've included an example of each method for the line-counting transformation in the downloads
section.