[Ur] Regular expressions DSEL?
Artyom Shalkhakov
artyom.shalkhakov at gmail.com
Thu Mar 9 10:48:29 EST 2017
Ziv,
Thanks for the catch with the non-matched groups, I just fixed a bug. :-)
Also, thanks to Benjamin for pointing out an extremely useful blog
post in a different thread! I'm not sure exactly if the client-side
tests are actually client-side, though... How does the Ur/Web's
client-side "VM" work? As it is, the JS code is impenetrable (making
it hard to debug). I'll start a new thread for this.
Please, let me ponder on your detailed response for a bit. This is a
deep rabbit hole. Maybe waiting indefinitely for v2 of typed regex is
the best outcome for now.
2017-03-09 21:20 GMT+06:00 Ziv Scully <zivscully at gmail.com>:
> I agree we probably don't want to do fancy things that are only available
> the server. However, I think it's still possible to support nice behavior
> for alt just by postprocessing on the Ur/Web side.
>
> Let's say "names" are capture groups from the library user's perspective and
> "indices" are capture groups from the FFI's perspective. Right now, each
> name corresponds to exactly one index. You could support alt by having each
> name correspond to multiple indices, each of which is matched at most once.
>
> Concrete example: say you have (#X:a)(#Y:b)|(#X:c)(#Z:d)—that is, abusing
> notation somewhat, an expression that captures names either X and Y or X and
> Z. You could represent this using the record of index lists {X = [1,3], Y =
> [2], Z = [4]}. When you get the result list back from the FFI, you have to
> check groups 1 and 3 to see if X was matched. If you only ever concatenate
> index lists as part of alt, you ensure that you never have two non-null
> matches for the same name: given a match, in each name's index list, all but
> at most one of the indices is from an alt branch that wasn't taken.
>
> You could take these ideas further to track statically when a group is
> guaranteed to be matched (like X above) or not (like Y or Z above). The
> easiest way is probably to have two record parameters for the type—one for
> guaranteed match names, another for optional match names. For example
> (leaving out folders and disjoitedness constraints):
>
> star : r ::: Unit -> s ::: Unit -> t r s -> t [] (r + s)
>
> plus : r ::: Unit -> s ::: Unit -> t r s -> t r s
>
> capture : r ::: Unit -> s ::: Unit -> nm :: Name -> t r s -> t (r ++ [nm])
> s.
>
> The left record has the names of guaranteed matches, and the right record
> has the names of optional ones. You can see that plus keeps guaranteed
> matches guaranteed, star makes all of them optional. The most complicated
> type would, of course, be alt.
>
> alt : r ::: Unit -> r1 ::: Unit -> r2 ::: Unit s1 ::: Unit -> s2 ::: Unit
> -> t (r ++ r1) s1 -> t (r ++ r2) s2
> -> t r (r1 ++ r2 ++ s1 ++ s2)
>
> That is, the only groups that are guaranteed to be matched are those
> guaranteed in both of the branches; all others are optional.
>
>
> On Thu, Mar 9, 2017 at 09:17 Artyom Shalkhakov <artyom.shalkhakov at gmail.com>
> wrote:
>>
>> 2017-03-09 19:39 GMT+06:00 Ziv Scully <zivscully at gmail.com>:
>> > Looks good—glad it worked out!
>> >
>> > My main question: what happens if the whole string is matched but a
>> > particular subgroup isn't? This can happen with alt or star, for
>> > instance. I
>> > couldn't tell from the types. I would expect a return type for match to
>> > be
>> > something like `match r (option string)` instead of `option (match r
>> > string)`. I don't know the underlying regex library, so perhaps there's
>> > a
>> > good reason for this.
>>
>> I think a test is in order; I don't know what happens in that case (do
>> you have an example handy?). From JS RegExp object documentation, it
>> seems that subgroups are always present (but, indeed, the strings
>> might be nulls or something else entirely, this is JS, after all).
>> Good catch!
>>
>> >
>> > I also thought it may be useful if alt allowed the same group to be
>> > captured
>> > on either side, but I realize that takes further fiddling with subgroup
>> > indices and post-processing of the results (each group field name would
>> > map
>> > to a list of indices instead of a single index), so it can wait for
>> > version
>> > 2 :).
>> >
>>
>> I was a bit puzzled about the type you gave for [alt], it only dawned
>> on me later what you meant with it. :-)
>>
>> I can see two questions here:
>>
>> 1. say we have an expression like [a(b)*c], if matched against
>> [abbbbbbc] in JS, it will only capture the *first* b -- I guess
>> Boost.Regex which is used on the server behaves the same (Benjamin,
>> could you help out on this?)
>>
>> 2. say we have an expression like [(a)|(b)], in JS the groups will be
>> 1 and 2. PCRE and some other engines support a "group reset" feature,
>> so that we can indeed capture both alternatives with one group. This
>> is unsupported in JS, so I decided to leave it as is, for now.
>>
>> >
>> > On Tue, Mar 7, 2017 at 02:11 Artyom Shalkhakov
>> > <artyom.shalkhakov at gmail.com>
>> > wrote:
>> >>
>> >> Hello Ziv,
>> >>
>> >> I've implemented your idea [1], and it works. Looking for feedback.
>> >>
>> >> 2017-02-20 15:48 GMT+06:00 Ziv Scully <zivscully at gmail.com>:
>> >> > By parametrizing over a type-level record, you can encode the names
>> >> > of
>> >> > matched subgroups in the type of the regular expression. The
>> >> > interface
>> >> > probably looks something like:
>> >> >
>> >> > (* Abstract type of uncompiled regular expressions. *)
>> >> > con t :: {Unit} (* names of captured subgroups *) -> Type
>> >> >
>> >> > val literal :
>> >> > string -> t []
>> >> > val any : t []
>> >> > val concat : r1 ::: {Unit} -> r2 ::: {Unit} -> [r1 ~ r2] =>
>> >> > t r1 -> t r2 -> t (r1 ++ r2)
>> >> > val star : r ::: {Unit} ->
>> >> > t r -> t r
>> >> > val alt : r ::: {Unit} -> s1 ::: {Unit} -> s2 ::: {Unit} -> [r ~ s1]
>> >> > =>
>> >> > [r ~
>> >> > s2] => [s1 ~ s2] =>
>> >> > t (r ++ s1) -> t (r ++ s2) -> t (r ++ s1 ++ s2)
>> >> > val capture : r ::: {Unit} ->
>> >> > nm :: Name -> [r ~ [nm]] => t r -> t (r ++ [nm])
>> >> >
>> >> > (* Abstract type of compiled regular expressions *)
>> >> > con compiled :: {Unit} -> Type
>> >> >
>> >> > val compile : r ::: {Unit} ->
>> >> > t r -> compiled r
>> >> >
>> >> > Obviously this probably won't exactly work. For instance, if the
>> >> > underlying
>> >> > library requires explicit names for captured groups, you'll need
>> >> > something
>> >> > more like
>> >> >
>> >> > val capture : r ::: {Unit} -> nm :: Name -> [r ~ [nm]] =>
>> >> > {nm : string} -> t r -> t (r ++ [nm])
>> >> >
>> >> > because there's no way to extract a string from a type-level Name. In
>> >> > fact,
>> >> > this way has another benefit: the internal representation of the type
>> >> > can
>> >> > just be a string! All the tracking of subgroups can happen at the
>> >> > type
>> >> > level. You may not even need compile.
>> >> >
>> >> > This interface assumes each subgroup may appear 0 or 1 times. You
>> >> > could
>> >> > maybe keep track of the number of matches more precisely by using
>> >> > {Type}
>> >> > instead of {Unit} and using a type class to implement type-level
>> >> > functions.
>> >> >
>> >> > con t : {Type} (* maps names of captured subgroups to type-level
>> >> > representation of number of matched fields *) -> Type
>> >> >
>> >> > type one
>> >> > type oneOrNone
>> >> >
>> >> > class meet : Type -> Type -> Type
>> >> > val meet_one_oneOrNone : weaken one oneOrNone oneOrNone
>> >> > val meet_oneOrNone_one : weaken oneOrNone one oneOrNone
>> >> > val meet_oneOrNone_oneOrNone : weaken oneOrNone oneOrNone oneOrNone
>> >> >
>> >> > This makes the types more complicated, but I suspect (though I don't
>> >> > know
>> >> > for sure) that Ur/Web will be able to infer the extra stuff in any
>> >> > concrete
>> >> > (that is, monomorphic) use, so the user code will still look pretty.
>> >> > Here's
>> >> > how some of the functions look with this. The place weaken gets used
>> >> > is
>> >> > alt.
>> >> >
>> >> > val star : r ::: {Type} ->
>> >> > t r -> t (map (fn _ => oneOrNo) r)
>> >> >
>> >> > val alt : r ::: {Type * Type * Type} -> s1 ::: {Type} -> s2 :::
>> >> > {Type}
>> >> > ->
>> >> > [map fst3 r ~ s1] => [map snd3 r ~ s2] => [s1 ~ s2] =>
>> >> > $(map (fn (t1, t2, t3) => weaken t1 t2 t3) r)
>> >> > t (map fst3 r ++ s1) -> t (map snd3 r ++ s2)
>> >> > -> t (map thd3 r ++ map (fn _ => oneOrNo) (s1 ++ s2))
>> >> >
>> >> > You could extend the same sort of idea to types like oneOrMore and
>> >> > noneOrMore—useful if the underlying library can return a list of
>> >> > matches
>> >> > inside a + or *—by adding more cases to weaken.
>> >> >
>> >> > Here's a Haskell library that takes basically this approach:
>> >> > https://github.com/sweirich/dth/tree/master/popl17/src
>> >> >
>> >> > Hope this helps!
>> >> > Ziv
>> >> >
>> >> >
>> >> > On Mon, Feb 20, 2017 at 12:00 AM, Artyom Shalkhakov
>> >> > <artyom.shalkhakov at gmail.com> wrote:
>> >> >>
>> >> >> 2017-02-20 10:47 GMT+06:00 <fold at tuta.io>:
>> >> >>>
>> >> >>> Maybe just define another attribute in basis.urs?
>> >> >>>
>> >> >>> It would be much interesting to define a DSEL though. I have not
>> >> >>> seen
>> >> >>> any
>> >> >>> attempts to create a DSEL for regular expressions.
>> >> >>>
>> >> >>
>> >> >> Indeed! And it seems like it would be a nice exercise as well having
>> >> >> some
>> >> >> practical uses to it.
>> >> >>
>> >> >> I envision something along the lines of:
>> >> >>
>> >> >> datatype RegExp = ... (* constructors *)
>> >> >>
>> >> >> type t (* abstract, not exported outside the module *)
>> >> >> val compile : RegExp -> t (* the trusted function *)
>> >> >> val matches : string -> t -> bool (* check if passed-in string
>> >> >> matches
>> >> >> the
>> >> >> expression *)
>> >> >>
>> >> >> And the implementation might use something unsafe (like a string, or
>> >> >> an
>> >> >> actual regexp object in JS). Then we could offload the
>> >> >> interpretation
>> >> >> of
>> >> >> regexps to an off-the-shelf library. The [compile] function would
>> >> >> print
>> >> >> and
>> >> >> escape the AST to a regexp string, and the Ur/Web compiler might
>> >> >> actually
>> >> >> optimize all of this away.
>> >> >>
>> >> >> If we are to take care of matching subgroups, that would require a
>> >> >> more
>> >> >> complicated type. I guess that should be doable as well, but I don't
>> >> >> see
>> >> >> immediately how to handle this.
>> >> >>
>> >> >>>
>> >> >>>
>> >> >>>
>> >> >>> 20. Feb 2017 16:28 by artyom.shalkhakov at gmail.com:
>> >> >>>
>> >> >>>
>> >> >>> Hello all,
>> >> >>>
>> >> >>> 'm doing form validation stuff, following examples on MDN. [1]
>> >> >>>
>> >> >>> It seems like Ur/Web doesn't handle regular expressions?
>> >> >>>
>> >> >>> I guess it should not be too diffcult to define a DSEL in Ur/Web
>> >> >>> (just
>> >> >>> AST constructors and a few trusted functions: e.g. [compile : AST
>> >> >>> ->
>> >> >>> regexp], where [regexp] is an abstract type, or maybe just a string
>> >> >>> :-)) and
>> >> >>> have it work at least in the browser.
>> >> >>>
>> >> >>> Could somebody point me in the right direction here?
>> >> >>>
>> >> >>> --
>> >> >>> Cheers,
>> >> >>> Artyom Shalkhakov
>> >> >>>
>> >> >>> [1]
>> >> >>>
>> >> >>>
>> >> >>> https://developer.mozilla.org/en-US/docs/Web/Guide/HTML/HTML5/Constraint_validation
>> >> >>>
>> >> >>
>> >> >>
>> >> >>
>> >> >> --
>> >> >> Cheers,
>> >> >> Artyom Shalkhakov
>> >> >>
>> >> >> _______________________________________________
>> >> >> Ur mailing list
>> >> >> Ur at impredicative.com
>> >> >> http://www.impredicative.com/cgi-bin/mailman/listinfo/ur
>> >> >>
>> >> >
>> >> >
>> >> > _______________________________________________
>> >> > Ur mailing list
>> >> > Ur at impredicative.com
>> >> > http://www.impredicative.com/cgi-bin/mailman/listinfo/ur
>> >> >
>> >>
>> >> --
>> >> Cheers,
>> >> Artyom Shalkhakov
>> >>
>> >> [1] https://github.com/bbarenblat/urweb-regex/pull/1
>> >>
>> >> _______________________________________________
>> >> Ur mailing list
>> >> Ur at impredicative.com
>> >> http://www.impredicative.com/cgi-bin/mailman/listinfo/ur
>> >
>> >
>> > _______________________________________________
>> > Ur mailing list
>> > Ur at impredicative.com
>> > http://www.impredicative.com/cgi-bin/mailman/listinfo/ur
>> >
>>
>>
>>
>> --
>> Cheers,
>> Artyom Shalkhakov
>>
>> _______________________________________________
>> Ur mailing list
>> Ur at impredicative.com
>> http://www.impredicative.com/cgi-bin/mailman/listinfo/ur
>
>
> _______________________________________________
> Ur mailing list
> Ur at impredicative.com
> http://www.impredicative.com/cgi-bin/mailman/listinfo/ur
>
--
Cheers,
Artyom Shalkhakov
More information about the Ur
mailing list