[Ur] An issue with Cyrillic characters
Artyom Shalkhakov
artyom.shalkhakov at gmail.com
Fri Jul 5 00:46:53 EDT 2013
Hello list,
I'm trying to persist some strings with Cyrillic characters in them
into a Postgres 9.1 database. Here's my program:
table entry : {Id : int, Title : string}
  PRIMARY KEY Id

sequence entryS

fun new_handle r =
    id <- nextval entryS;
    dml (INSERT INTO entry (Id, Title) VALUES ({[id]}, {[r.Title]}));
    return <xml><body><p>OK</p></body></xml>

fun main (): transaction page =
    return <xml><body>
      <form>
        Title: <textbox {#Title}/>
        <submit action={new_handle}/>
      </form>
    </body></xml>
When I submit "текст" to Ur/Web, I get an error along these lines:
Fatal error: /home/user/proj/simple.ur:7:2-10:2: DML failed:
INSERT INTO uw_Simple_entry (uw_Id, uw_Title) VALUES (20::int8,
E'\377\377\377\377\377\377\377\377'::text)
ERROR: invalid byte sequence for encoding "UTF8": 0xff
I've prepared a patch (attached; it is made against the tip revision).
The behaviour of sprintf/printf for characters with the high bit set is
unexpected on my system. For instance, the following program:
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char** argv) {
    char c = (char)255;
    printf("%03o\n", c);
    return 0;
}
prints "37777777777". If [c] is cast to [unsigned char], then the
program prints "377" (as expected). I'm wondering whether this has to do
with the locale. FYI, on my system, LANG is set to en_US.UTF-8.
--
Cheers,
Artyom Shalkhakov
-------------- next part --------------
A non-text attachment was scrubbed...
Name: tip.patch
Type: application/octet-stream
Size: 570 bytes
Desc: not available
URL: <http://www.impredicative.com/pipermail/ur/attachments/20130705/30864730/attachment.obj>