Ticket #87 (assigned enhancement)

Opened 6 months ago

Last modified 5 months ago

UTF-8 encoding

Reported by: nsteiner Assigned to: mikey (accepted)
Priority: minor Milestone:
Component: XML/XSL view engine Version:
Keywords: Cc: schst, rist

Description (Last modified by mikey)

Make Stubbles UTF-8 ready.

Change History

11/09/07 00:32:04 changed by mikey

  • description changed.

11/09/07 17:51:22 changed by mikey

  • status changed from new to assigned.

11/09/07 18:25:41 changed by mikey

  • priority changed from major to minor.
  • version deleted.
  • milestone deleted.

Changesets 1017, 1018, 1019 and 1020 changed the default encoding of the XML stream writers from ISO-8859-1 to UTF-8. As this introduces a BC break, the XML stream writer factory can now be configured with the XML version and the encoding to be used by the instances it creates.
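To illustrate the idea of a factory that carries the XML version and encoding, here is a minimal sketch in Python. It is not the Stubbles API: the class names XmlStreamWriterFactory and XmlStreamWriter and their methods are hypothetical and only show how callers depending on the old ISO-8859-1 default could opt back in.

```python
from dataclasses import dataclass
from xml.sax.saxutils import escape


class XmlStreamWriter:
    """Hypothetical writer that remembers the version and encoding it was created with."""

    def __init__(self, version, encoding):
        self.version = version
        self.encoding = encoding
        # The XML declaration announces the encoding used for the output bytes.
        self._parts = ['<?xml version="%s" encoding="%s"?>' % (version, encoding)]

    def write_element(self, name, text):
        # Escape &, < and > in the text content.
        self._parts.append("<%s>%s</%s>" % (name, escape(text), name))

    def as_bytes(self):
        # Encoding happens once, at the output boundary.
        return "".join(self._parts).encode(self.encoding)


@dataclass(frozen=True)
class XmlStreamWriterFactory:
    """Hypothetical factory; UTF-8 is the new default, the old one stays selectable."""

    version: str = "1.0"
    encoding: str = "UTF-8"

    def create(self):
        return XmlStreamWriter(self.version, self.encoding)


# Code relying on the previous default can request it explicitly.
legacy_writer = XmlStreamWriterFactory(encoding="ISO-8859-1").create()
legacy_writer.write_element("name", "Müller")
```

With this shape the BC break is limited to callers that relied on the implicit default; they only need to change the factory configuration, not every place a writer is used.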

Currently I'm not sure about further places where we need to take care of the encoding. Therefore the ticket does not get closed, but is decreased to minor priority and no longer associated with any milestone. Probably some of the XML files delivered by Stubbles need to be converted to UTF-8.

11/21/07 21:47:37 changed by mikey

Changeset 1058 fixed a bug in net::stubbles::xml::stubDomXMLStreamWriter so that strings which are already UTF-8 encoded are not encoded again.
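For readers unfamiliar with the double-encoding problem, a minimal Python sketch of one way to avoid it follows. This is not the code from changeset 1058; the function name to_utf8 and the ISO-8859-1 fallback are assumptions made for illustration.

```python
def to_utf8(raw: bytes, fallback: str = "iso-8859-1") -> bytes:
    """Return raw as UTF-8 bytes without encoding valid UTF-8 a second time."""
    try:
        # If the bytes already decode as UTF-8, leave them untouched.
        raw.decode("utf-8")
        return raw
    except UnicodeDecodeError:
        # Otherwise assume the fallback encoding and convert once.
        return raw.decode(fallback).encode("utf-8")
```

The check is a heuristic: some ISO-8859-1 byte sequences happen to be valid UTF-8 as well, so knowing the source encoding is more reliable than guessing it.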

Changeset 1059 converted the default encoding of request value error messages to UTF-8.

12/19/07 13:38:33 changed by mikey

  • cc set to schst, rist.

Decision: internal encoding of all Stubbles strings should be UTF-8. Conversion should be done when reading data from any input stream and when writing data to any output stream.
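The decision amounts to converting only at the boundaries. A rough Python sketch, assuming the external encoding is already known; the helper names read_as_internal and write_from_internal are hypothetical:

```python
import io

INTERNAL = "utf-8"  # everything between input and output is UTF-8


def read_as_internal(stream, source_encoding):
    """Read all bytes from an input stream and convert them to UTF-8."""
    return stream.read().decode(source_encoding).encode(INTERNAL)


def write_from_internal(stream, data, target_encoding):
    """Convert UTF-8 data to the encoding the output stream expects."""
    stream.write(data.decode(INTERNAL).encode(target_encoding))


# Round trip: ISO-8859-1 in, UTF-8 inside, ISO-8859-1 out.
source = io.BytesIO("Grüße".encode("iso-8859-1"))
internal = read_as_internal(source, "iso-8859-1")
sink = io.BytesIO()
write_from_internal(sink, internal, "iso-8859-1")
```

Code between the two helpers never has to ask which encoding a string is in, which is the main benefit of fixing a single internal encoding.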

This means several steps have to be undertaken:

  • Detect input and output encoding to be able to do the correct conversions.
  • Think about type classes that support the handling of strings and binary data inside Stubbles.
  • Check if any regular expressions must be changed to use the /u modifier.
  • Database connections must state that the client character set is UTF-8; drop support for databases that do not consider character sets.
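Why the regular expressions need checking can be shown with a small Python analogy. Python has no /u modifier; a bytes pattern behaves roughly like a PCRE pattern without /u (it matches byte by byte), while a str pattern matches whole characters. The function names are hypothetical.

```python
import re

# "." matches exactly one unit: one byte in a bytes pattern, one character in a str pattern.
SINGLE_BYTE = re.compile(rb"^.$")
SINGLE_CHAR = re.compile(r"^.$")


def is_single_unit_bytewise(text):
    """Match against the raw UTF-8 bytes, as a pattern without /u would."""
    return SINGLE_BYTE.match(text.encode("utf-8")) is not None


def is_single_unit_charwise(text):
    """Match against decoded characters, as a pattern with /u would."""
    return SINGLE_CHAR.match(text) is not None
```

For "é", which is two bytes in UTF-8, the byte-wise check fails while the character-wise check succeeds; for plain ASCII such as "e" both agree. Patterns that use ".", character classes or quantifiers on non-ASCII input are the ones to review first.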

See RFC #0146 of the XP-Framework for how UTF-8 is handled there.