[Ur] Calling all Emacs wizards
Adam Chlipala
adamc at impredicative.com
Sun Aug 7 11:17:37 EDT 2011
The urweb-mode for Emacs has some very slow syntax highlighting, to the
point of being a real hindrance to the development of non-trivial
projects. I know exactly which code is to blame, but the harder
question is how the same goal may be accomplished more efficiently. In
the past, I've sent pleas to this list, asking for help on the issue,
with no response. I'm going to try again, and this time I'm able to
give more information on the problem.
The slowness comes entirely from detecting text that is literal XML
CDATA, that is, ordinary text that, in the case of HTML, should be passed
directly to the user. I built urweb-mode by modifying sml-mode. I presume
sml-mode is doing syntax highlighting in a standard way, but, in any
case, it's based on regular expressions identifying spans of text that
should have particular Emacs faces associated with them.
The crux of the problem, then, is that, in Ur/Web, being XML CDATA is a
context-free property, but not a regular property (in the sense of
regular languages and regular expressions). An XML sequence appears
within <xml>...</xml> brackets, and within there may be "antiquoted" Ur
sequences appearing within {...} brackets, within which there may be
further XML, and so on, up to unbounded depth.
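To make the nesting concrete, here is a rough sketch in Python (not Elisp, and not code from urweb-mode.el) of a single forward pass that keeps a stack of open contexts and reports the CDATA spans. The function name cdata_spans and the simplifications are assumptions made for illustration: only <xml> opens an XML context, every other <...> inside XML is treated as an ordinary tag, and string literals and comments are ignored.

```python
def cdata_spans(src):
    """Return (start, end) index pairs of non-blank CDATA runs in src."""
    spans = []
    stack = []         # open contexts: "xml" for <xml>, "code" for {
    text_start = None  # start of the CDATA run being collected, if any
    i = 0

    def flush(end):
        # Close the current CDATA run, skipping whitespace-only runs.
        nonlocal text_start
        if text_start is not None and src[text_start:end].strip():
            spans.append((text_start, end))
        text_start = None

    while i < len(src):
        if stack and stack[-1] == "xml":
            if src[i] == "<":
                flush(i)
                close = src.find(">", i)
                if close == -1:
                    break                 # unterminated tag: stop scanning
                if src[i:close + 1] == "</xml>":
                    stack.pop()           # leave this XML context
                i = close + 1
            elif src[i] == "{":
                flush(i)
                stack.append("code")      # antiquote: back to Ur code
                i += 1
            else:
                if text_start is None:
                    text_start = i
                i += 1
        else:
            if src.startswith("<xml>", i):
                stack.append("xml")
                i += 5
                continue
            if src[i] == "{":
                stack.append("code")      # nested braces inside code
            elif src[i] == "}" and stack:
                stack.pop()
            i += 1
    flush(len(src))
    return spans


src = "fun f x = <xml>Hi {[x]} <b>bold {g <xml>in</xml>}</b> bye</xml>"
print([src[a:b] for a, b in cdata_spans(src)])
```

The stack is what a regular expression cannot provide: its depth is unbounded, and whether a character is CDATA depends on what is on top of it.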
My current urweb-mode code uses a regular expression to identify maximal
segments of text that could possibly be CDATA. Then, a custom Elisp
function is called to search backward from that point, counting open and
close brackets to figure out whether we are in XML. This search process
may proceed arbitrarily far back in the buffer, and the process is
repeated for each sequence of CDATA between tags/antiquotes. That can
add up to many separate calls to this not-particularly-efficient
recursive function, with no reuse of results between calls!
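The cost of that pattern can be modeled with a small Python sketch. This is only a model of the backward bracket counting described above, not the actual urweb-in-xml; the name in_xml_backward and the step counter are assumptions added for illustration, and strings and comments are again ignored.

```python
def in_xml_backward(src, pos):
    """Scan backward from pos; return (in_xml, steps_taken)."""
    pending = 0  # closers seen whose matching openers are still to the left
    steps = 0
    i = pos
    while i > 0:
        steps += 1
        if src.endswith("</xml>", 0, i):
            pending += 1
            i -= 6
        elif src.endswith("<xml>", 0, i):
            if pending == 0:
                return True, steps    # nearest unmatched opener is <xml>
            pending -= 1
            i -= 5
        elif src[i - 1] == "}":
            pending += 1
            i -= 1
        elif src[i - 1] == "{":
            if pending == 0:
                return False, steps   # nearest unmatched opener is {
            pending -= 1
            i -= 1
        else:
            i -= 1
    return False, steps               # reached buffer start: top-level code


# One XML block containing three CDATA segments separated by tags.
src = "<xml>" + "ab<br/>" * 3
for pos in (5, 12, 19):
    print(pos, in_xml_backward(src, pos))
```

Each call walks all the way back to the enclosing <xml>, so the work per segment grows with its distance from the opener, and the total over a long XML block grows roughly quadratically.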
I've tried to bumble my way through Emacs mode authorship without
sitting down to learn Elisp properly, and I'm hoping to stay on that
path! Would any Emacs wizards do us the favor of reworking this part of
the code to improve the efficiency? For instance, it wouldn't surprise
me if there is an easy way to examine formatting already set on some
text segments to speed up the decision for later segments.
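One possible shape for that kind of reuse, sketched in Python rather than Elisp: remember the context stack at positions that have already been answered, and resume the forward scan from the nearest earlier one. The class XmlContextCache and its methods are hypothetical names, query positions are assumed to fall on token boundaries, and a real mode would also have to discard checkpoints after any position where the buffer is edited.

```python
import bisect


class XmlContextCache:
    """Answer 'is pos inside XML?' while reusing earlier scan results."""

    def __init__(self, src):
        self.src = src
        self.positions = [0]  # sorted positions whose context is known
        self.states = [()]    # context stack (as a tuple) at each position
        self.steps = 0        # tokens scanned so far, for illustration

    def _step(self, i, stack):
        # Consume one token at i, update stack, return the next index.
        src = self.src
        if stack and stack[-1] == "xml":
            if src.startswith("</xml>", i):
                stack.pop()
                return i + 6
            if src[i] == "{":
                stack.append("code")
            return i + 1
        if src.startswith("<xml>", i):
            stack.append("xml")
            return i + 5
        if src[i] == "{":
            stack.append("code")
        elif src[i] == "}" and stack:
            stack.pop()
        return i + 1

    def in_xml(self, pos):
        # Resume from the closest checkpoint at or before pos.
        k = bisect.bisect_right(self.positions, pos) - 1
        i, stack = self.positions[k], list(self.states[k])
        while i < pos:
            i = self._step(i, stack)
            self.steps += 1
        # Record where the scan stopped so later queries can start here.
        j = bisect.bisect_left(self.positions, i)
        if j == len(self.positions) or self.positions[j] != i:
            self.positions.insert(j, i)
            self.states.insert(j, tuple(stack))
        return bool(stack) and stack[-1] == "xml"


cache = XmlContextCache("<xml>a{b<xml>c</xml>d}e</xml>f")
print([cache.in_xml(p) for p in (5, 7, 13, 20, 22, 29)], cache.steps)
```

With queries arriving in increasing order, as they would when highlighting a buffer top to bottom, each token is scanned once in total, and repeating an earlier query costs no further scanning.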
All the relevant source code is in urweb/src/elisp/urweb-mode.el.
I suspect most of the time is spent in the function 'urweb-in-xml'. It's
called from one of the actions in 'urweb-font-lock-keywords'.
Thanks in advance to anyone who can help fix this long-standing problem!