concatenation nodes, or substring nodes that reference the tree node. (We'll see later that references from some iterators are also included.) When the reference count of a tree node becomes zero, the tree node is deallocated, and reference counts of any subtrees are correspondingly decremented.

In a few cases, the reference counts are also used to allow in-place updates of ropes. If the reference counts of all tree nodes on the path from a rope R's root node to the leaf node L holding a specific character are 1, then L occurs exactly once in R, and in no other rope. Thus R can safely be updated by updating L in place.

If the rope implementation is compiled with __GC defined, it will assume that there is an underlying garbage collector and inaccessible tree nodes will be automatically reclaimed. In this case rope must be instantiated with a suitable garbage-collecting allocator, and no reference count is maintained. Thus the above optimization for in-place updates is also not implemented. Since even non-destructive updates copy only portions of a rope, and since many rope clients will use them purely as immutable strings, this is often not a serious loss. But it may be for some applications.

The remainder of this section assumes that __GC is not defined, and that reference counts are used.

Since rope nodes can be shared by different ropes, which can be concurrently copied, updated, or destroyed by different threads, reference counts must be updated atomically. This is the only explicit synchronization performed by the implementation, since the reference count is the only part of a potentially shared data structure that is updated.

The synchronization required for reference count updates may consume a significant fraction of the time required for rope operations. Reference count updates should be implemented in terms of an atomic add operation whenever such an operation is available. It is important that the reference count decrement operation not only atomically decrement the count, but also return the result as part of the atomic operation. If the zero test is performed outside the atomic part of the operation, the same tree node may be deallocated twice.

On Irix and win32 platforms, the current implementation maintains reference counts using an atomic add operation. A more generic implementation based on PThread mutexes is also provided. But it is unlikely to provide optimal performance for applications that use ropes extensively.

Allocator Use

The rope implementation can use either standard-conforming allocators (compiler permitting) or SGI-style simplified allocators. In the former case and if there are distinct allocator instances of a given allocator type, the allocator instance is stored in each rope tree node, as well as in the rope itself. It is illegal to concatenate ropes built with different allocator instances.

This representation was chosen because it keeps the implementation comparatively clean, and the instance-less case reasonably efficient. The alternative of storing the allocator instance only in the rope would have added additional allocator arguments to many internal functions. It would have been difficult to eliminate this overhead for allocator types that do not have distinct instances.

Basic Algorithms and Rope Balancing

Concatenation is normally implemented by allocating a new concatenation node, and having it refer to the two arguments to the concatenation operation. Thus in most cases its execution time is independent of the length of strings.

The case in which a short rope consisting of a single leaf is concatenated onto the right of a rope which is either a leaf, or a concatenation node whose right child is a leaf, is handled specially. In this case, if the leaves in question are sufficiently short, we may either allocate a new leaf holding the combined contents of the two leaves or, under the right circumstances, even update the left operand in place. In order to allow the destructive update, the actual arrays holding leaf characters are grown in increments of 8.

For example, the rope 'abcedefghigklmnopqrstuvwxy' might be concatenated to 'z' as shown in the following figure:

Handling this case specially guarantees that ropes built by repeatedly concatenating short strings onto the right will be composed of leaves of a minimum size, and thus can be stored and processed efficiently. It has a similar effect on repeated character insertions in the same position.

Although concatenation is efficient independent of the shape of the tree, some other operations such as retrieving the ith character, are more efficient if the tree representing the rope is approximately balanced. Ropes can be rebalanced either via an explicit call to the balance member function, or implicitly as the result of a concatenation. Ropes are implicitly rebalanced whenever the depth is in danger of overflowing, or the rope both exceeds a smaller depth threshold (currently 20) and is below a minimum length (currently 1000).

The balance operation proceeds as described in the paper cited above. The operation is non-destructive; rebalanced pieces formerly shared with other ropes are no longer shared after the operation. As a rope is being balanced, the balanced bit is set in each concatenation node that has sufficiently small depth for its length. Tree nodes with the balanced bit set are not examined by further balancing operations. Thus the balance operation tends to not rebalance the same substring.

The worst-case cost of rebalancing is nonetheless linear in the string. However, the observed behavior is that rebalancing typically consumes a small fraction of the running time. Indeed, all natural ways of building a rope of length N by repeated concatenation require total time linear in N. It is possible, but nontrivial, to design concatenation sequences that violate this.

The substring operation is performed differently depending on the tree node representing the root of the rope. The operation is recursive:

1. For a leaf node, we either return a leaf node with the substring, or if that would be too long, we return a subscript node.

2. For a concatenation node, we return the concatenation of the appropriate substrings of the left and right subtrees. We stop if we find that we need a zero length substring. Similarly, we simply return a pointer to the entire rope when that's appropriate.

3. For a substring node, we either return a short leaf node, or a new substring node. referring to the rope from which the original substring was obtained. Thus we do not build up nested substring nodes.

4. Any other function node is treated roughly as a leaf.

Note that this process requires time proportional to the rope depth, and doesn't directly depend on the length. The algorithm only copies pieces of leaf nodes at the beginning and end of the substring, and it builds new concatenation nodes along the paths from the root to either end of the substring. The interior of the substring can remain shared with the original rope.

Iterators

Iterators generated by the normal begin() and end() operations generate const_iterators, i.e. iterators that do not permit updates to the underlying rope. Such iterators basically contain three kinds of information:

1. A pointer to the root node of the rope with which the iterator is associated. This pointer is not included in the reference count, Const_iterators become invalid if the underlying rope is modified or destroyed. (In the garbage collected case they remain valid and continue to refer to the original pre-modification rope.)

2. The position inside the rope.

3. Cached information used to speed up access to sections of the rope close to the current position.

We maintain two kinds of cache information in the iterator:

1. The limits of a contiguous block of storage holding the characters surrounding the current character position, and the current offset within that block. Most iterator references can be resolved with access to only this part of the cache. The referenced block of storage can either be part of a leaf in the rope representation, or it can be a small block of characters reserved inside the iterator itself. The latter is used when the iterator refers to a rope section represented by a function node.

2. The bottom few rope nodes on the path from the root node to the leaf or function node currently referenced by the iterator. This is used to quickly increment or decrement the iterator across node boundaries. We do not cache the entire path, since that would make rope iterators unpleasantly large.

This implementation differs significantly from that used in the C 'cord' package. We do not reserve space inside iterators for a complete path from the root to the current node. This change was made to accommodate STL code that assumes small iterators that can be cheaply passed by value. We try to aid such code further by

Добавить отзыв
ВСЕ ОТЗЫВЫ О КНИГЕ В ИЗБРАННОЕ

0

Вы можете отметить интересные вам фрагменты текста, которые будут доступны по уникальной ссылке в адресной строке браузера.

Отметить Добавить цитату
×