[Ur] CMS like features ? unsafe XML - encodings?
Marc Weber
marco-oweber at gmx.de
Wed Dec 15 13:28:07 EST 2010
Excerpts from Adam Chlipala's message of Wed Dec 15 15:35:32 +0100 2010:
> types), then simple code like this gets the job done.
Thanks
> > If we are at it: Does it make sense to encode the encoding of a string
> > somehow?
> Maybe so, but I'm woefully underinformed about encodings. The last time
> I looked into this, I think my conclusion was that sticking with UTF-8
> could please everybody reasonably well.
Let me quote two lines from gian's blog code:
Body = {Nam = "Entry Body",
Show = (fn b => <xml>{[if strlen b > 25 then substring b 0 25 else b]}...</xml>),
I don't expect C's substring to be UTF-8 aware. In UTF-8 a single
character may be encoded as up to 4 bytes, so cutting at byte 25 can
split a character in the middle.
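To illustrate the problem, here is a minimal sketch in Haskell (not Ur/Web,
and not how the Ur/Web runtime works). All names (encodeChar, encodeUtf8,
isContinuation, truncateUtf8) are made up for this example. It encodes a
string to UTF-8 by hand and compares a plain byte-wise cut with one that
backs off to the previous character boundary:

```haskell
import Data.Bits (shiftR, (.&.), (.|.))
import Data.Char (ord)
import Data.Word (Word8)

-- Encode one code point as UTF-8 (1 to 4 bytes).
encodeChar :: Char -> [Word8]
encodeChar c
  | n < 0x80    = [fromIntegral n]
  | n < 0x800   = [0xC0 .|. hi 6, cont 0]
  | n < 0x10000 = [0xE0 .|. hi 12, cont 6, cont 0]
  | otherwise   = [0xF0 .|. hi 18, cont 12, cont 6, cont 0]
  where
    n      = ord c
    -- leading bits of the code point, placed in the lead byte
    hi k   = fromIntegral (n `shiftR` k)
    -- six payload bits, marked as a continuation byte (10xxxxxx)
    cont k = 0x80 .|. fromIntegral ((n `shiftR` k) .&. 0x3F)

encodeUtf8 :: String -> [Word8]
encodeUtf8 = concatMap encodeChar

-- A continuation byte has the bit pattern 10xxxxxx.
isContinuation :: Word8 -> Bool
isContinuation b = b .&. 0xC0 == 0x80

-- Keep at most n bytes, but never end in the middle of a character:
-- if the first dropped byte is a continuation byte, the kept part ends
-- with an incomplete character, so remove that partial character too.
truncateUtf8 :: Int -> [Word8] -> [Word8]
truncateUtf8 n bs = case splitAt n bs of
  (kept, b : _)
    | isContinuation b ->
        reverse (drop 1 (dropWhile isContinuation (reverse kept)))
  (kept, _) -> kept
```

With the input "aé" (bytes 0x61 0xC3 0xA9), "take 2" on the bytes keeps the
lead byte 0xC3 without its continuation byte, which is invalid UTF-8, while
truncateUtf8 2 keeps only 0x61. Counting characters instead of bytes (e.g.
"take 25" on a decoded Haskell String) avoids the problem altogether.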
For PDF files the encoding may make a difference: viewers used to ship
with non-UTF-8 fonts, so .pdf files relying on them can be smaller.
Shipping those fonts is no longer mandatory going forward, although it
still seems to work. But that's a corner case, so for now it is not
important enough.
E.g. in Haskell you could use phantom types:
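{-# LANGUAGE MultiParamTypeClasses, FunctionalDependencies, FlexibleInstances #-}
import Prelude hiding (concat)
-- empty types, used only as encoding tags
data UTF8
data ISOXX
-- the type parameter records the encoding; it is not used at run time
data Buffer a = Buffer String
x :: Buffer UTF8
x = Buffer "text"
class ConcatStrs a b c | a b -> c where
  concat :: Buffer a -> Buffer b -> Buffer c
instance ConcatStrs a a a where
  -- same encoding: trivial
  concat (Buffer s) (Buffer t) = Buffer (s ++ t)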
By not providing an instance of the form "ConcatStrs UTF8 ISOXX c" you
disallow concatenating them: such code is rejected at compile time.
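A simpler variant of the same idea, without the type class, might look
like the sketch below. It is only an illustration under assumptions:
Latin1 (ISO-8859-1) stands in for ISOXX, the buffer holds raw bytes
instead of a String, and the names unBuffer, append and latin1ToUtf8 are
made up here. Mixing encodings is then only possible after an explicit
conversion:

```haskell
import Data.Bits (shiftR, (.&.), (.|.))
import Data.Word (Word8)

-- Empty types, used only as encoding tags.
data UTF8
data Latin1   -- assumption: stands in for ISOXX

-- Raw bytes tagged with their encoding.
newtype Buffer enc = Buffer [Word8]

unBuffer :: Buffer enc -> [Word8]
unBuffer (Buffer bs) = bs

-- Both arguments must carry the same encoding tag.
append :: Buffer enc -> Buffer enc -> Buffer enc
append (Buffer a) (Buffer b) = Buffer (a ++ b)

-- ISO-8859-1 bytes are the code points U+0000..U+00FF, so each byte
-- becomes either one or two UTF-8 bytes.
latin1ToUtf8 :: Buffer Latin1 -> Buffer UTF8
latin1ToUtf8 (Buffer bs) = Buffer (concatMap conv bs)
  where
    conv b
      | b < 0x80  = [b]
      | otherwise = [0xC0 .|. (b `shiftR` 6), 0x80 .|. (b .&. 0x3F)]

greeting :: Buffer UTF8
greeting = Buffer [0x68, 0x69, 0x20]   -- "hi " in UTF-8

name :: Buffer Latin1
name = Buffer [0xE9]                   -- "é" in ISO-8859-1

combined :: Buffer UTF8
combined = append greeting (latin1ToUtf8 name)
-- append greeting name   -- would be rejected by the type checker
```

The type class version is more flexible (you could add instances that
convert implicitly), but the plain function already gives the compile
time check.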
Marc Weber