2007-12-18

Automating the insertion of keybinding code in .el files, surrounded by a bunch of explanatory blabber.

Recently, I have been learning my way around the lisp family
of languages. Naturally, that led me to learning my way around emacs -
the editor-of-choice for most of those languages - and it turns out
that I like using emacs. A lot.

Emacs, as I'm sure most people reading this are already well
aware of, is extensible in a dialect of lisp called elisp - and it's
more extensible than any other editor I've used as of yet. Everything
it does is open to you, and not only because it is an open source
program, although the politics behind that are a large reason for its
extensibility.

Emacs was written to be extended - and it shows. Things like
being able to load changes into the editor on the fly, being able to
hit C-h f and bring up the docstring for a function as
well as a link straight to its source code make developing in it a
breeze - not to mention the loads of documentation and other things
that make editing much more powerful like C-x ( which starts the
definition of a one-off keyboard macro.

I could ramble on about features of emacs that make it nice -
but I don't really want to talk about that right now. What I want to
talk about is the process of writing code in lisp family languages,
especially with emacs - so far I have found it to be much nicer than
in other languages.

Now, I am not a seasoned lisper. I've only been messing around
with these languages (mostly Common Lisp, elisp, and Scheme) for a
month or so now. However, that has been more than enough time to
realize that they are powerful. Very, very powerful. I'm not going to
talk about specific features of the lisps that make them powerful
though - partly because I don't fully understand them as of yet, and
partly because at the moment, I feel that the process that one goes
through while writing code in them is far more important - and I feel
that it's best to illustrate by example.

Earlier today, I was reading through some of the emacs
documentation, specifically the FAQ entry on keybindings. It mentions
that you can add or modify keybindings to modes (assuming that the
author of the mode wrote it correctly - and I don't think that there
are any that come with emacs that aren't) by placing code like this in
your .emacs, or in some other file, or the scratch buffer (or
wherever) and loading it:

(add-hook 'lisp-interaction-mode-hook
          (lambda ()
            (local-set-key "\C-m" 'newline-and-indent)))

This would bind ^M, which is RET or Enter, to the function
newline-and-indent under lisp-interaction mode. (By the way, you can
type C-q followed by a key to output the control character for that
key, which is handy.)

You could also call local-set-key (or its big brother
global-set-key) interactively by typing M-x local-set-key and get
prompted for the values of key and command.

The FAQ also mentioned a useful "trick" for getting the code
to put in the add-hook form - bind the key by calling it
interactively, then type C-x ESC ESC C-a C-k C-g. This runs
repeat-complex-command (the prompt reads "Redo:"), which puts the code
in the minibuffer at the bottom, goes to the beginning of it, kills it,
and quits. Then you can just yank it into your .emacs, or wherever you
want it. Convenient.

Even more convenient than that is that a string or a vector
can be bound to a key, and as such treated as a macro. You could do
something like this:

(global-set-key [f10] "\C-x\e\e\C-a\C-k\C-g")

and from then on, whenever you hit f10, the last command you entered
will be put in the kill ring, ready to be copied.

This got me thinking - could I just tack a "\C-y" on the end
of that and have it auto-yank for me? Unfortunately no, because C-g is
quitting the command loop. Now I just wanted to automate the process
of entering keybinding (re)definitions into my .emacs - it's not
something I do that often, but I need something to do to practice my
emacs-customizing skills, and off I went, hacking away.

Here's where the process comes into play. The code I wrote
went through several iterations. I started off just trying to automate
the process of entering those keybindings - I ended up doing that, and
writing a few general purpose utilities that may or may not end up
being useful - but I have them now, regardless, and the code doesn't
feel right unless things that can be are generalized out (especially
in lisps).

I had some trouble at first figuring out how to get emacs to
print a string representation of a list - until I found princ, prin1,
and pp-to-string. I had managed to hack together a working solution
without building a list, but it was ugly, and I had lost the ability
to have auto-completion in the minibuffer because I couldn't
figure out how to get a string representation of something read
through interactive's C or v codes.

So yes, first workable solution was ugly, and looked a lot
like PHP-style templating code. Stuff like

(insert "(add-hook '" hook)
(newline-and-indent)
(insert "(lambda ()")
(newline-and-indent)
. . .

and so on. Not standing for that, nope. Ugly, and broken to boot.

After searching the documentation for a while, I came upon the
previously mentioned pp-to-string. I looked at the source for it, saw
that it called prin1, followed the chain around, and eventually I
found this variable called print-level - which is used by such
pretty-printing functions to signal when they should stop printing and
start abbreviating.

I can't just assume that it's going to be nil, which is what I
need, and I can't assume that there aren't any more variables like
this that people may have set, have nil as a default, and that I would
like to do something with at some point in time - but what I can do is
set the value of the symbol print-level to nil within the scope of
what I'm doing with a "let".

Once I got the ability to output the code to the
current-buffer working, it was just a matter of making it
interactive. I originally had one function for inserting these
keybindings, but after realizing that I didn't want to do an
"add-hook" every single time (not for hooks that already had one
defined anyhow), I split it into three functions - one for inserting a
new hook and keybinding, one for either returning the list
representation for a keybinding or inserting it depending on how it
was called, and one that handles the interaction and re-indentation.

Here's the code:

(defun insert-new-hook-and-keybinding (hook key command)
  "Outputs the code needed to make a new key binding on KEY to COMMAND under HOOK.
HOOK should be the name of a mode-hook (e.g. lisp-interaction-mode-hook).
KEY should be the key to bind (e.g. ^T )
COMMAND should be the command to bind the key to.

Is meant to be run through insert-keybindings, but could be called directly."
  (interactive "vHook: \nKKey: \nCCommand: ")
  ;; Bind print-level to nil locally so the printer never abbreviates nested forms.
  (let ((print-level nil))
    (let ((result `(add-hook ',hook
                             ,(create-local-keybinding key command))))
      (insert (strip-quotes (pp-to-string result))))))


(defun create-local-keybinding (key command)
  "Do the appropriate action to create the output for a local keybinding of KEY to COMMAND.
If called interactively, output the keybinding to the current buffer.
If called non-interactively, return the list for use in other output."
  (interactive "KKey: \nCCommand: ")
  ;; KEY is a string or vector, so it is spliced in directly;
  ;; COMMAND is a symbol, so it has to be quoted in the generated code.
  (let ((result `(lambda ()
                   (local-unset-key ,key)
                   (local-set-key ,key ',command))))
    (cond ((interactive-p)
           (let ((print-level nil))
             (insert (strip-quotes (pp-to-string result)))))
          (t result))))


(defun insert-keybindings ()
  "Insert keybinding code into the current buffer, prompting the user for values.
If cursor is at the beginning of the line from which insert-keybindings is invoked,
also prompt for hook name and create the appropriate form.

Insert keybindings re-indents and moves point to the bottom of the current keybinding-form
when it is finished in an attempt to act sane and make the code look decent."
  (interactive)
  (let ((line-no (line-number-at-pos (point))))
    (if (equal (point-at-bol) (point))
        (call-interactively #'insert-new-hook-and-keybinding)
      (call-interactively #'create-local-keybinding))
    (while (y-or-n-p "Insert another keybinding? ")
      (goto-line line-no)
      (goto-char (point-at-eol))
      (newline)
      (call-interactively #'create-local-keybinding))
    (goto-line line-no)
    (goto-char (point-at-bol))
    (indent-sexp)
    (end-of-defun)))



(global-set-key [f10] 'insert-keybindings)

It's rather simple.

The thing that struck me as I was writing this, and the thing
that I've been failing rather horribly at articulating is that this
code evolved as I was writing it. I didn't have much of a plan - I did
know what I needed to do, but I didn't put very much thought into how
I would accomplish it. As I wrote it, the form above came about on
its own - a result of wanting to avoid duplication, and wanting to
express things a certain way.

Not that that doesn't happen in other languages, but it seems
to happen more in the lisp family. At least it's more in-your-face.

2007-05-23

Parsing simple XML files in python using etree

So, I'm relatively new to the worlds of python and xml, and as such haven't quite figured everything out, as was demonstrated earlier today.

I needed to parse a really simple XML file, storing all the tags underneath the root tag as items in a dictionary - basically parsing an xml options file.

So, let's say our xml file looks like this:

<?xml version="1.0"?>
<options>
  <option1>foo</option1>
  <option2>bar</option2>
  <option3>zip</option3>
</options>



Pretty simple right? I figured, hey this can't possibly be a pain, I'm sure python has some great XML parsing libraries available.

Some basic internet searching led me to xml.parsers.expat . In fact, this was what most of the "parse xml with python" examples I could find were using, so instead of looking through the rest of the available xml libraries, I started using expat.

It was frustrating.

Now, I've made xml parsers in php before, following the same basic model as expat (probably using expat) - you create a parser, and set some functions to get called whenever a tag is started, whenever a tag ends, or whenever character data is found.
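To make that model concrete, here is a minimal sketch of the callback style using xml.parsers.expat directly. It assumes Python 3, and the handler names, the events list, and the inlined sample XML are all made up for illustration. It simply records every event in the order expat reports it.

```python
import xml.parsers.expat

events = []  # (kind, value) pairs, in the order expat reports them


def on_start(name, attributes):
    # Called once for every opening tag.
    events.append(("start", name))


def on_end(name):
    # Called once for every closing tag.
    events.append(("end", name))


def on_text(data):
    # Called for character data; whitespace between tags arrives here too.
    events.append(("text", data))


parser = xml.parsers.expat.ParserCreate()
parser.StartElementHandler = on_start
parser.EndElementHandler = on_end
parser.CharacterDataHandler = on_text

# Hypothetical sample input, shaped like a small options file.
parser.Parse("<options>\n  <option1>foo</option1>\n</options>", True)

for kind, value in events:
    print(kind, repr(value))
```

Printing with repr makes the whitespace-only character data between tags visible, which matters for any handler that stores text as it arrives.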

I made a simple xmlParser class to inherit specified parsers from, inherited from it, set appropriate functions - which included setting a flag to say we were inside the root element, setting the value of the current tag being parsed, adding to the dictionary. . . here's the inherited class:


from xmlParser import *
import time

class optionsParser(xmlParser):
    """
    This parser is used to generate a dictionary of overall options
    for our script
    """
    def __init__(self, xmlFile):
        xmlParser.__init__(self, xmlFile)
        self.inOptions = False
        self.curTag = " "
        self.options = {}

    def handleCharacterData(self, data):
        #print "Handling: " , data
        if self.inOptions and self.curTag != "options":
            self.options[self.curTag] = data
        #time.sleep(1)

    def handleStartElement(self, name, attributes):
        # print "Starting: ", name
        if name == "options":
            self.inOptions = True
        if self.inOptions:
            self.curTag = name
        # time.sleep(1)

    def handleEndElement(self, name):
        # print "Ending: ", name
        if name == "options":
            self.inOptions = False
        self.curTag = ""
        # time.sleep(1)

    def getOptions(self):
        return self.options



The commented out print statements and sleep statements were so I could watch it parse. The first time I ran it, I noticed that it was outputting a bunch of


Handling:
Handling:
Starting: option1
Handling: foo
Handling:
Ending: option1


And so on - lots of empty stuff, I'm assuming they were new lines in the text. Then, if I was to print out the dictionary, it would look like this:

:

option1 : foo
option2 : bar
option3 : zip


This annoyed the crap out of me. I figured out that it was because I was setting the curTag value to "" after processing the first set of character data under a tag - but if I didn't do that, then the tags got overwritten with blank space.

My options were to set another flag, "tagNameAlreadyProcessed" or similar, or change the
if self.inOptions and self.curTag != "options":

line to read
if self.inOptions and self.curTag != "options" and self.curTag != "":


That worked, but damn is it ugly. I knew there had to be a better way.
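One way to stay with expat without the extra comparison is to buffer the character data for each element and store it, stripped, only when the end tag arrives. This is a sketch under assumptions, not the parser class above: it assumes Python 3, and parse_options and its local names are hypothetical.

```python
import xml.parsers.expat


def parse_options(xml_text):
    """Return {tag: text} for each direct child of <options> (hypothetical helper)."""
    options = {}
    stack = []   # names of the elements that are currently open
    chunks = []  # character data seen since the last start or end tag

    def start(name, attributes):
        stack.append(name)
        chunks.clear()

    def char_data(data):
        # expat may deliver text in several pieces, so just collect them.
        chunks.append(data)

    def end(name):
        stack.pop()
        # Only elements sitting directly under the root <options> count.
        if stack == ["options"]:
            options[name] = "".join(chunks).strip()
        chunks.clear()

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start
    parser.CharacterDataHandler = char_data
    parser.EndElementHandler = end
    parser.Parse(xml_text, True)
    return options


sample = "<options>\n  <option1>foo</option1>\n  <option2>bar</option2>\n</options>"
print(parse_options(sample))
```

Because nothing is stored until an end tag is seen, the whitespace between tags never gets a dictionary key of its own.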

I posted to the python-list explaining what was happening (python-list rocks by the way - if you use python, and aren't subscribed, do it now) and an awesome fellow by the name of Steven Bethard pointed me to xml.etree.ElementTree

etree is way easier to use, at least for simple xml parsing like I needed to do here. I haven't tried to use it for anything remotely complex, but for this problem it works like a charm. Compare the above code to the equivalent with etree:


import xml.etree.ElementTree as etree

optionsXML = etree.parse("options.xml")
options = {}

# iter() walks every element; on Python 2.5 this method was called getiterator()
for child in optionsXML.iter():
    if child.tag != optionsXML.getroot().tag:
        options[child.tag] = child.text

Yeah, that's it. 6 lines of code. Suited more for being embedded into a class as a function. Short and sweet.
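As a rough idea of what the function form could look like, here is a sketch assuming Python 3. load_options is a hypothetical name, and the sample XML is inlined through io.StringIO so the snippet runs without an options.xml on disk. Unlike the loop above, it only looks at the direct children of the root element, which is enough for a flat options file.

```python
import io
import xml.etree.ElementTree as etree


def load_options(source):
    """Map each direct child tag of the root element to its text.

    SOURCE can be a filename or an open file object (hypothetical helper).
    """
    root = etree.parse(source).getroot()
    return {child.tag: child.text for child in root}


# Stand-in for options.xml so the example is self-contained.
sample = io.StringIO(
    "<options>"
    "<option1>foo</option1>"
    "<option2>bar</option2>"
    "<option3>zip</option3>"
    "</options>"
)
options = load_options(sample)
print(options)
```

With a real file, the call would simply be load_options("options.xml").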

Nerdgasm.

Lessons learned:

  • Look over all available options before choosing one (Christ, I feel stupid)

  • Use etree with python, at least for simple parsing tasks

  • python-list is your friend (we already knew that, right?)