[Ur] Regular expressions DSEL?

Artyom Shalkhakov artyom.shalkhakov at gmail.com
Tue Mar 7 02:10:31 EST 2017


Hello Ziv,

I've implemented your idea [1], and it works. Looking for feedback.

2017-02-20 15:48 GMT+06:00 Ziv Scully <zivscully at gmail.com>:
> By parametrizing over a type-level record, you can encode the names of
> matched subgroups in the type of the regular expression. The interface
> probably looks something like:
>
> (* Abstract type of uncompiled regular expressions. *)
> con t :: {Unit} (* names of captured subgroups *) -> Type
>
> val literal :
>     string -> t []
> val any : t []
> val concat : r1 ::: {Unit} -> r2 ::: {Unit} -> [r1 ~ r2] =>
>     t r1 -> t r2 -> t (r1 ++ r2)
> val star : r ::: {Unit} ->
>     t r -> t r
> val alt : r ::: {Unit} -> s1 ::: {Unit} -> s2 ::: {Unit} -> [r ~ s1] => [r ~
> s2] => [s1 ~ s2] =>
>     t (r ++ s1) -> t (r ++ s2) -> t (r ++ s1 ++ s2)
> val capture : r ::: {Unit} ->
>     nm :: Name -> [r ~ [nm]] => t r -> t (r ++ [nm])
>
> (* Abstract type of compiled regular expressions *)
> con compiled :: {Unit} -> Type
>
> val compile : r ::: {Unit} ->
>     t r -> compiled r
>
> Obviously this probably won't exactly work. For instance, if the underlying
> library requires explicit names for captured groups, you'll need something
> more like
>
> val capture : r ::: {Unit} -> nm :: Name -> [r ~ [nm]] =>
>     {nm : string} -> t r -> t (r ++ [nm])
>
> because there's no way to extract a string from a type-level Name. In fact,
> this way has another benefit: the internal representation of the type can
> just be a string! All the tracking of subgroups can happen at the type
> level. You may not even need compile.
>
> This interface assumes each subgroup may appear 0 or 1 times. You could
> maybe keep track of the number of matches more precisely by using {Type}
> instead of {Unit} and using a type class to implement type-level functions.
>
> con t : {Type} (* maps names of captured subgroups to type-level
> representation of number of matched fields *) -> Type
>
> type one
> type oneOrNone
>
> class meet : Type -> Type -> Type
> val meet_one_oneOrNone : weaken one oneOrNone oneOrNone
> val meet_oneOrNone_one : weaken oneOrNone one oneOrNone
> val meet_oneOrNone_oneOrNone : weaken oneOrNone oneOrNone oneOrNone
>
> This makes the types more complicated, but I suspect (though I don't know
> for sure) that Ur/Web will be able to infer the extra stuff in any concrete
> (that is, monomorphic) use, so the user code will still look pretty. Here's
> how some of the functions look with this. The place weaken gets used is alt.
>
> val star : r ::: {Type} ->
>     t r -> t (map (fn _ => oneOrNo) r)
>
> val alt : r ::: {Type * Type * Type} -> s1 ::: {Type} -> s2 ::: {Type} ->
>     [map fst3 r ~ s1] => [map snd3 r ~ s2] => [s1 ~ s2] =>
>     $(map (fn (t1, t2, t3) => weaken t1 t2 t3) r)
>     t (map fst3 r ++ s1) -> t (map snd3 r ++ s2)
>     -> t (map thd3 r ++ map (fn _ => oneOrNo) (s1 ++ s2))
>
> You could extend the same sort of idea to types like oneOrMore and
> noneOrMore—useful if the underlying library can return a list of matches
> inside a + or *—by adding more cases to weaken.
>
> Here's a Haskell library that takes basically this approach:
> https://github.com/sweirich/dth/tree/master/popl17/src
>
> Hope this helps!
> Ziv
>
>
> On Mon, Feb 20, 2017 at 12:00 AM, Artyom Shalkhakov
> <artyom.shalkhakov at gmail.com> wrote:
>>
>> 2017-02-20 10:47 GMT+06:00 <fold at tuta.io>:
>>>
>>> Maybe just define another attribute in basis.urs?
>>>
>>> It would be much interesting to define a DSEL though. I have not seen any
>>> attempts to create a DSEL for regular expressions.
>>>
>>
>> Indeed! And it seems like it would be a nice exercise as well having some
>> practical uses to it.
>>
>> I envision something along the lines of:
>>
>> datatype RegExp = ... (* constructors *)
>>
>> type t (* abstract, not exported outside the module *)
>> val compile : RegExp -> t (* the trusted function *)
>> val matches : string -> t -> bool (* check if passed-in string matches the
>> expression *)
>>
>> And the implementation might use something unsafe (like a string, or an
>> actual regexp object in JS). Then we could offload the interpretation of
>> regexps to an off-the-shelf library. The [compile] function would print and
>> escape the AST to a regexp string, and the Ur/Web compiler might actually
>> optimize all of this away.
>>
>> If we are to take care of matching subgroups, that would require a more
>> complicated type. I guess that should be doable as well, but I don't see
>> immediately how to handle this.
>>
>>>
>>>
>>>
>>> 20. Feb 2017 16:28 by artyom.shalkhakov at gmail.com:
>>>
>>>
>>> Hello all,
>>>
>>> 'm doing form validation stuff, following examples on MDN. [1]
>>>
>>> It seems like Ur/Web doesn't handle regular expressions?
>>>
>>> I guess it should not be too diffcult to define a DSEL in Ur/Web (just
>>> AST constructors and a few trusted functions: e.g. [compile : AST ->
>>> regexp], where [regexp] is an abstract type, or maybe just a string :-)) and
>>> have it work at least in the browser.
>>>
>>> Could somebody point me in the right direction here?
>>>
>>> --
>>> Cheers,
>>> Artyom Shalkhakov
>>>
>>> [1]
>>> https://developer.mozilla.org/en-US/docs/Web/Guide/HTML/HTML5/Constraint_validation
>>>
>>
>>
>>
>> --
>> Cheers,
>> Artyom Shalkhakov
>>
>> _______________________________________________
>> Ur mailing list
>> Ur at impredicative.com
>> http://www.impredicative.com/cgi-bin/mailman/listinfo/ur
>>
>
>
> _______________________________________________
> Ur mailing list
> Ur at impredicative.com
> http://www.impredicative.com/cgi-bin/mailman/listinfo/ur
>

-- 
Cheers,
Artyom Shalkhakov

[1] https://github.com/bbarenblat/urweb-regex/pull/1



More information about the Ur mailing list