Unicode policy
==============

.. contents:: :local:

Motivation
----------
To reduce the number of conversions between unicode and regular strings,
it is useful to be consistent about the encoding in which each API accepts
its arguments.

Without a clear policy, it is easy to apply the wrong encoding or to
end up with indexing bugs.

Policy
------

Revision and file ids
~~~~~~~~~~~~~~~~~~~~~
Revision and file ids will be passed as regular strings, consistent with 
Bazaar itself.

Subversion paths
~~~~~~~~~~~~~~~~
Subversion itself returns regular strings using the utf-8 encoding. For that 
reason, all branch paths also use that encoding and are not converted to 
unicode objects until they are added to the inventory.
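
As a rough sketch of this convention (not bzr-svn's actual code), the
hypothetical helpers below keep a branch path as a utf-8 byte string and
decode it only at the point where it would enter the inventory. The example
is written for Python 3, where ``bytes`` plays the role of the regular string
and ``str`` that of the unicode object; the names ``join_branch_path`` and
``inventory_path`` are assumptions made for illustration.

.. code-block:: python

    def join_branch_path(branch_path, relpath):
        """Join two utf-8 encoded Subversion paths without decoding them."""
        if not isinstance(branch_path, bytes) or not isinstance(relpath, bytes):
            raise TypeError("Subversion paths must be utf-8 byte strings")
        parts = (branch_path.strip(b"/"), relpath.strip(b"/"))
        # Skip empty components so that a root branch path adds no slash.
        return b"/".join(p for p in parts if p)


    def inventory_path(svn_path):
        """Decode a utf-8 Subversion path at the inventory boundary."""
        return svn_path.decode("utf-8")


    # The path stays a byte string while it is handled as a Subversion path...
    svn_path = join_branch_path(b"trunk", "docs/r\u00e9sum\u00e9.txt".encode("utf-8"))
    # ...and becomes a unicode object only when it is added to the inventory.
    unicode_path = inventory_path(svn_path)

Rejecting non-byte arguments early is one way to surface accidental mixing of
the two string types close to where it happens.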

Subversion properties
~~~~~~~~~~~~~~~~~~~~~
Subversion itself returns property values as regular strings using the utf-8
encoding. Because the bzr:file-ids property contains metadata used to
construct Bazaar inventories, the file id dictionaries returned by various
APIs should map paths, as unicode objects, to file ids, as regular string
objects.
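
The key and value types can be illustrated with a small parser. This is only
a sketch: the function name ``parse_file_ids`` is hypothetical, and the
property layout it reads (one tab-separated path and file id per line) is an
assumption made for the example rather than a statement of the real format.
In Python 3 terms, the unicode keys are ``str`` and the regular-string values
are ``bytes``.

.. code-block:: python

    def parse_file_ids(text):
        """Map unicode paths to byte-string file ids.

        Assumes, for illustration only, one "path<TAB>file-id" pair per line.
        """
        file_ids = {}
        for line in text.splitlines():
            if not line:
                continue
            path, file_id = line.split(b"\t", 1)
            # Keys are decoded to unicode; values stay as byte strings.
            file_ids[path.decode("utf-8")] = file_id
        return file_ids


    # Made-up property value, as a utf-8 byte string like Subversion returns.
    raw = (
        "docs/r\u00e9sum\u00e9.txt\tresume-20070101-abc\n"
        "README\treadme-20070101-def\n"
    ).encode("utf-8")
    file_ids = parse_file_ids(raw)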

Branching schemes
~~~~~~~~~~~~~~~~~
Since serialized branching schemes are part of revision ids, which are 
regular strings, they are also regular strings.

SQLite cache
~~~~~~~~~~~~~
Since SQLite tends to return unicode strings, most strings need to be
encoded as utf-8 before they are returned for use by other parts of the
code. The LogWalker object should take care of this.
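
A minimal sketch of that encoding step, using Python's standard ``sqlite3``
module, might look as follows. The table ``changed_path``, its columns, and
the helper ``fetch_paths`` are hypothetical and do not describe the real
cache schema or the LogWalker interface; the point is only that text read
from SQLite is encoded before it leaves the cache layer.

.. code-block:: python

    import sqlite3


    def fetch_paths(conn, revnum):
        """Return the paths recorded for revnum as utf-8 byte strings."""
        cursor = conn.execute(
            "SELECT path FROM changed_path WHERE rev = ? ORDER BY path",
            (revnum,),
        )
        # sqlite3 hands back TEXT columns as unicode, so encode each one here.
        return [row[0].encode("utf-8") for row in cursor]


    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE changed_path (rev INTEGER, path TEXT)")
    conn.execute(
        "INSERT INTO changed_path VALUES (?, ?)", (1, "trunk/caf\u00e9.txt")
    )
    paths = fetch_paths(conn, 1)

Doing the conversion in one place means callers never see the unicode values
that SQLite produces.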

Working Trees
~~~~~~~~~~~~~
Bazaar's working tree functions assume all relative paths are unicode,
so bzr-svn's working tree-related code uses the same convention.

..
    vim: ft=rest
