[Ur] Calling all Emacs wizards

Sun Aug 7 11:17:37 EDT 2011

The urweb-mode for Emacs has some very slow syntax highlighting, to the 
point of being a real hindrance to the development of non-trivial 
projects.  I know exactly which code is to blame, but the harder 
question is how the same goal may be accomplished more efficiently.  In 
the past, I've sent pleas to this list, asking for help on the issue, 
with no response.  I'm going to try again, and this time I'm able to 
give more information on the problem.

The issue comes only from detecting text that is literal XML CDATA; that 
is, normal text that, in the case of HTML, should be passed on directly 
to the user.  I built urweb-mode by modifying sml-mode.  I presume 
sml-mode is doing syntax highlighting in a standard way, but, in any 
case, it's based on regular expressions identifying spans of text that 
should have particular Emacs font faces associated with them.

The crux of the problem, then, is that, in Ur/Web, being XML CDATA is a 
context-free property, but not a regular property (in the sense of 
regular languages and regular expressions).  An XML sequence appears 
within <xml>...</xml> brackets, and within there may be "antiquoted" Ur 
sequences appearing within {...} brackets, within which there may be 
further XML, and so on, up to unbounded depth.
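
To make the nesting concrete, here is a rough sketch of how a single 
forward pass with an explicit stack could classify CDATA.  It is written 
in Python rather than Elisp, it is not the urweb-mode code, and the 
function name cdata_spans is made up for illustration.  It also assumes a 
much simpler syntax than real Ur/Web: string literals and comments are 
ignored, and every tag other than <xml> is just skipped over.

```python
def cdata_spans(text):
    """Return (start, end) index pairs of literal XML text in `text`.

    Simplified model (an assumption, not the real Ur/Web lexer):
    `<xml>` opens an XML context, `{` inside XML opens an Ur context,
    and string literals and comments get no special treatment.
    """
    stack = ["ur"]   # innermost context is stack[-1]
    spans = []
    start = None     # start of the CDATA run being accumulated, if any
    i, n = 0, len(text)

    def flush(end):
        # Close the current CDATA run (if any) at index `end`.
        nonlocal start
        if start is not None and end > start:
            spans.append((start, end))
        start = None

    while i < n:
        if stack[-1] == "ur":
            if text.startswith("<xml>", i):
                stack.append("xml")
                i += 5
                continue
            if text[i] == "{":
                stack.append("ur")      # nested braces in Ur code
            elif text[i] == "}" and len(stack) > 1:
                stack.pop()             # back to the enclosing context
            i += 1
        else:  # inside XML
            if text.startswith("</xml>", i):
                flush(i)
                stack.pop()
                i += 6
            elif text[i] == "{":
                flush(i)
                stack.append("ur")      # antiquoted Ur code
                i += 1
            elif text[i] == "<":
                flush(i)                # some other tag: skip to its '>'
                close = text.find(">", i)
                i = n if close == -1 else close + 1
            else:
                if start is None:
                    start = i
                i += 1
    flush(n)  # an unterminated XML block still yields its trailing text
    return spans
```

The stack is what a regular expression cannot express: its depth is 
unbounded, and whether a character is CDATA depends on what is on top of 
it when that character is reached.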

My current urweb-mode code uses a regular expression to identify maximal 
segments of text that could possibly be CDATA.  Then, a custom Elisp 
function is called to search backward from that point, counting open and 
close brackets to figure out whether we are in XML.  This search process 
may proceed arbitrarily far back in the buffer, and the process is 
repeated for each sequence of CDATA between tags/antiquotes.  That can 
add up to many calls to this not-particularly-efficient recursive 
function, each starting from scratch with no reuse of earlier results!
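
For comparison, here is a hedged sketch of what reusing results might 
look like.  Again it is Python, not Elisp, and the class name 
XmlContextCache and its in_xml method are hypothetical.  It remembers the 
bracket stack at the last position it was asked about, so a later query 
only scans the text in between.  It uses the same simplified syntax 
assumptions as before (no strings or comments), and it treats a bracket 
as taking effect at its first character.

```python
class XmlContextCache:
    """Answer "is position p inside XML?" while reusing earlier scans."""

    def __init__(self, text):
        self.text = text
        self.pos = 0          # everything before this index is scanned
        self.stack = ["ur"]   # context stack as of self.pos
        self.steps = 0        # characters scanned so far (for illustration)

    def in_xml(self, p):
        if p < self.pos:
            # Query behind the cached position: restart from the top.
            self.pos, self.stack = 0, ["ur"]
        t = self.text
        while self.pos < p:
            i = self.pos
            if self.stack[-1] == "ur":
                if t.startswith("<xml>", i):
                    self.stack.append("xml")
                elif t[i] == "{":
                    self.stack.append("ur")
                elif t[i] == "}" and len(self.stack) > 1:
                    self.stack.pop()
            else:  # inside XML
                if t.startswith("</xml>", i):
                    self.stack.pop()
                elif t[i] == "{":
                    self.stack.append("ur")
            self.pos += 1
            self.steps += 1
        return self.stack[-1] == "xml"
```

If the candidate segments are checked in buffer order, each character is 
scanned once in total instead of once per segment.  A real mode would 
also have to throw away the cached state whenever the buffer is edited 
before the cached position, which this sketch does not attempt.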

I've tried to bumble my way through Emacs mode authorship without 
sitting down to learn Elisp properly, and I'm hoping to stay on that 
path!  Would any Emacs wizards do us the favor of reworking this part of 
the code to improve the efficiency?  For instance, it wouldn't surprise 
me if there is an easy way to examine formatting already set on some 
text segments to speed up the decision for later segments.

All the relevant source code is in urweb/src/elisp/urweb-mode.el.  
Function 'urweb-in-xml' is where I hypothesize most time is spent.  It's 
called from one of the actions in 'urweb-font-lock-keywords'.

Thanks in advance to anyone who can help fix this long-standing problem!