[Ur] Parsing xml fragments.
Adam Chlipala
adamc at csail.mit.edu
Fri Oct 12 08:53:22 EDT 2012
On 10/12/2012 08:46 AM, Alexei Golovko wrote:
> Yes, they are not legal XML; they are parts corresponding to the user's
> selection.
> I need the following:
> 1) to get the length of a fragment (to store a fragment's position as the
> sum of the lengths of all preceding fragments)
> 2) to compare fragments (hence attributes should be ordered in some way,
> whitespace inside tags normalized, etc.)
> 3) to check that a string forms a valid fragment (no "partial-tag />")
> An additional bonus would be to be sure that one fragment may be safely
> (w.r.t. a fixed XML schema) replaced by another (suggested by the
> application user).
> Currently, I treat the length of a tag (open: <em attr1="value1" ... >, or
> closed: </em>) as 1 (because a tag is atomic for selection and because
> it looks more natural where I use the DOM API).
OK, then I'd expect to parse fragments into a simple tree datatype.
Maybe if you point us to an example of your parsing code, I can give
some advice on making it faster. (Linear-time/space parsing of strings
in pure Ur code should be pretty easy, if you use the right standard
library functions.)
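
As a rough illustration of the kind of datatype and single-pass parsing
discussed here, the sketch below is written in Python, not Ur, and does not
use any Ur standard library function. All names in it (Text, Tag, tokenize,
fragment_length) are hypothetical. It makes several simplifying assumptions:
a fragment is kept as a flat token list rather than a tree (fragments may
contain unbalanced tags), attribute values are double-quoted only, and a
stray ">" in text is taken as the sign of a cut-off tag. Attributes are
sorted and whitespace inside tags is discarded, so comparing fragments is
plain equality, and each tag counts as length 1.

```python
import re
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass(frozen=True)
class Text:
    value: str


@dataclass(frozen=True)
class Tag:
    kind: str                           # "open", "close", or "empty"
    name: str
    attrs: Tuple[Tuple[str, str], ...]  # sorted by attribute name


Token = Union[Text, Tag]

# Assumed tag syntax: names plus double-quoted attribute values only.
_TAG = re.compile(
    r'<(/?)([A-Za-z][\w:.-]*)((?:\s+[\w:.-]+\s*=\s*"[^"]*")*)\s*(/?)>'
)
_ATTR = re.compile(r'([\w:.-]+)\s*=\s*"([^"]*)"')


def tokenize(s: str) -> List[Token]:
    """Scan the fragment once, left to right, into text and tag tokens."""
    tokens: List[Token] = []
    pos = 0
    while pos < len(s):
        if s[pos] == "<":
            m = _TAG.match(s, pos)
            if m is None:
                raise ValueError(f"partial or malformed tag at offset {pos}")
            closing, name, raw_attrs, empty = m.groups()
            # Sorting attributes gives a canonical order for comparison.
            attrs = tuple(sorted(_ATTR.findall(raw_attrs)))
            if closing and (attrs or empty):
                raise ValueError(f"malformed closing tag at offset {pos}")
            kind = "close" if closing else "empty" if empty else "open"
            tokens.append(Tag(kind, name, attrs))
            pos = m.end()
        else:
            end = s.find("<", pos)
            if end == -1:
                end = len(s)
            text = s[pos:end]
            if ">" in text:
                # Assumption: a bare ">" means the fragment starts mid-tag.
                raise ValueError(f"stray '>' in text at offset {pos}")
            tokens.append(Text(text))
            pos = end
    return tokens


def fragment_length(tokens: List[Token]) -> int:
    """Characters of text count individually; every tag counts as 1."""
    return sum(len(t.value) if isinstance(t, Text) else 1 for t in tokens)
```

With these definitions, the fragment
'bla-bla</em> baz-baz-<strong>baz</strong>' yields six tokens and a length
of 22, while 'partial-tag />' is rejected with a ValueError. Checking a
replacement against a fixed schema is not covered by this sketch.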
> 12.10.2012, 03:09, "Adam Chlipala" <adamc at csail.mit.edu>:
>> On 10/10/2012 01:22 PM, Alexei Golovko wrote:
>>> What is the best way to parse XML on the client side? More
>>> precisely, I need to process not only full XML data, but also
>>> fragments like /"bla-bla</em> baz-baz-<strong>baz</strong>"/ with
>>> bounds in the text nodes (that is, not inside a tag, as in
>>> /"end-of-tag-name> text"/).
>>> I have some (quick and dirty) parsec-like combinators, but they are
>>> buggy and too slow.
>>
>> So you want fragments that are not legal XML on their own? Well,
>> which type do you want to target with your translation?
> Thanks.
> Fragments are not HTML, so the first does not solve the problem.
> The second, I thought, doesn't work on the client side, does it?
Right; the feed library is server-side.
>> Two bits of related library code:
>> - A basic & configurable HTML parser (only does legal fragments,
>> though): http://hg.impredicative.com/meta/file/7530b2b54353/html.urs
>> - The XML feed processing library: http://hg.impredicative.com/feed