[Ur] Parsing xml fragments.

Fri Oct 12 08:53:22 EDT 2012

On 10/12/2012 08:46 AM, Alexei Golovko wrote:
> Yes, they are not legal XML; they are parts corresponding to the user's 
> selection.
> I need the following:
> 1) to get the length of a fragment (to save the position of a fragment as 
> the sum of the lengths of all preceding fragments)
> 2) to compare fragments (hence attributes should be ordered in some way, 
> whitespace inside tags normalised, etc.)
> 3) to check that a string forms a valid fragment (no "partial-tag />")
> An additional bonus would be to be sure that one fragment may be safely 
> (w.r.t. a fixed XML schema) replaced by another (suggested by the 
> application user).
> Currently, I treat the length of a tag (open: <em attr1="value1" ... >, or 
> closed: </em>) as 1 (because a tag is atomic for selection and because 
> it looks more natural where I use the DOM API).

OK, then I'd expect to parse fragments into a simple tree datatype.

Maybe if you point us to an example of your parsing code, I can give 
some advice on making it faster.  (Linear-time/space parsing of strings 
in pure Ur code should be pretty easy, if you use the right standard 
library functions.)
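
As a rough illustration of the kind of single left-to-right pass meant here, the sketch below is written in Python rather than Ur, so it says nothing about which Ur standard library functions to use. All names in it (Text, Tag, tokenize, length) are hypothetical. It assumes attribute values are always double-quoted, treats a stray ">" in text as a sign of a partial tag, and produces a flat token list rather than a tree, since a fragment may begin with an unmatched closing tag.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Text:
    value: str


@dataclass(frozen=True)
class Tag:
    kind: str     # "open", "close", or "empty"
    name: str
    attrs: tuple  # sorted (name, value) pairs, so attribute order is ignored


# Either one complete tag, or a maximal run of text without '<' or '>'.
# Assumption: attribute values are double-quoted and contain no '<'.
TOKEN = re.compile(
    r'<(/?)([A-Za-z][\w:.-]*)((?:\s+[\w:.-]+\s*=\s*"[^"<]*")*)\s*(/?)>'
    r'|([^<>]+)'
)
ATTR = re.compile(r'([\w:.-]+)\s*=\s*"([^"<]*)"')


def tokenize(fragment):
    """Split a fragment into Text and Tag tokens, scanning left to right.

    Raises ValueError when the input contains a partial or malformed tag.
    """
    tokens = []
    pos = 0
    while pos < len(fragment):
        m = TOKEN.match(fragment, pos)
        if m is None:
            raise ValueError(f"partial or malformed tag at offset {pos}")
        if m.group(5) is not None:
            tokens.append(Text(m.group(5)))
        else:
            closing, name, raw_attrs, empty = m.group(1, 2, 3, 4)
            if closing and (raw_attrs or empty):
                raise ValueError(f"malformed closing tag at offset {pos}")
            kind = "close" if closing else ("empty" if empty else "open")
            # Sorting the attributes and dropping the original whitespace
            # gives a normalised form that can be compared with ==.
            attrs = tuple(sorted(ATTR.findall(raw_attrs)))
            tokens.append(Tag(kind, name, attrs))
        pos = m.end()
    return tokens


def length(tokens):
    """Length of a fragment: one per character of text, one per tag."""
    return sum(len(t.value) if isinstance(t, Text) else 1 for t in tokens)
```

With these definitions, two fragments can be compared by comparing their token lists, and the position of a fragment is the sum of length over the token lists of the preceding fragments. Checking replacement against a schema is not attempted here.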

>   12.10.2012, 03:09, "Adam Chlipala" <adamc at csail.mit.edu>:
>> On 10/10/2012 01:22 PM, Alexei Golovko wrote:
>>> What is the best way to parse XML on the client side? More 
>>> precisely, I need to process not only full XML data, but also 
>>> fragments like "bla-bla</em> baz-baz-<strong>baz</strong>" with 
>>> bounds in the text nodes (that is, not inside a tag, as in 
>>> "end-of-tag-name> text").
>>> I have some (quick and dirty) parsec-like combinators, but they are 
>>> buggy and too slow.
>>
>> So you want fragments that are not legal XML on their own?  Well, 
>> which type do you want to target with your translation?
> Thanks.
> Fragments are not HTML, so the first does not solve the problem.
> The second, I thought, doesn't work on the client side, does it?

Right; the feed library is server-side.

>> Two bits of related library code:
>> - A basic & configurable HTML parser (only does legal fragments, 
>> though): http://hg.impredicative.com/meta/file/7530b2b54353/html.urs
>> - The XML feed processing library: http://hg.impredicative.com/feed