computer hardware instead of an abstract machine model. Thus we settle for the following guidelines:

1. For an algorithm A to have running time O(f(n)), there must be a corresponding algorithm A' that is correct on machines with arbitrarily long pointer and size_t types, such that A and A' perform essentially the same sequence of operations on the actual hardware. (In simple cases A and A' will be the same. In other cases A may have been simplified with the knowledge that adresses are bounded.) For inputs of sufficiently large size n, A' must take at most time Cf(n) , where C is a constant, independent of both n and the address size. (Pointer, size_t, and ptrdiff_t operations are presumed to take constant time independent of their size.)

2. All container or iterator complexity specifications refer to amortized complexity. An individual operation may take longer than specified. But any sufficiently long sequence of operations on the same container or iterator will take at most as long as the corresponding sum of the specified operation costs.

3. Algorithms specify either worst-case or average case performance, and identify which. Unless otherwise stated, averages assume that container elements are chosen from a finite type with more possible values than the size of the container, and that container elements are independently uniformly distributed.

4. A complexity specification for an operation f assumes that operations invoked by f require at most the specified runtime. But algorithms generally remain appropriate if the invoked operations are no more than a logarithmic factor slower than specified in the expected case.

5. If operations are more expensive than assumed by a function F in the current STL, then F will slow down at most in proportion to the added cost. Any future operations that fail to satisfy this property will make that explicit.

To make this precise, assume F is specified to use time f(m) for input of size m. F uses operations Gk, with specified running times gk(n) on input size n. If F is used in a context in which each Gk is slower than expected by at most a factor h(n), then F slows down by at most a factor h (m). This holds because none of the current algorithms ever apply the operations Gk to inputs significantly larger than m.

Strings in SGI STL

This is an attempt to answer some of the questions related to the use of strings with SGI STL.

What's wrong with the string class defined by the draft standard?

There are several problems, but the most serious ones relate to the specification for lifetimes of references to characters in a string. The second committee draft disallows the expression s[1] == s[2] where s is a nonconstant string. This is not simply an oversight; current reference counted implementations may fail for more complicated examples. They may fail even for s[1] == s[2] if the string s is simultaneously examined (merely examined, not necessarily modified) by another thread. It is hard to define precisely what constitutes a correct use of one of the current reference counted implementation.

This problem was partially addressed at the July 1997 meeting of the C++ standardization committee; the solution was to adopt more complicated rules about reference lifetimes. Unfortunately, these new rules still do not address the multi-threading issues.

A related problem was pointed out in the French national body comments on the second committee draft. The following program produces the wrong answer for most reference counted basic_string implementations that we have tested; the problem is that, if two strings share a common representation, they are vulnerable to modification through a pre-existing reference or iterator. # include &ltstring&gt # include &ltstdio.h&gt

main() {

 string s('abc');

 string t;

 char& c(s[1]);

 t = s; // Data typically shared between s and t.

 c = 'z'; // How many strings does this modify?

 if (t[1] == 'z') {

  printf('wrong ');

 } else {

  printf('right ');

 }

}

The draft standard (as well as common sense) says that updating a reference to one of s's elements should only modify s, not t as well; the fact that s and t might share a representation is an implementation detail that should have no effect on program behavior. Given the design of basic_string , though, it is very difficult for a reference-counted implementation to satisfy that requirement.

The only known way for a reference-counted implementation to avoid this problem is to mark a string as unsharable whenever there might be an existing reference or iterator to that string. That is, whenever a program obtains a reference or an iterator to a string (e.g. by using operator[] or begin()), that particular string will no longer use reference counting; assignment and copy construction will copy the string's elements instead of just copying a pointer. (We are not aware of any implementation that uses this technique and that also attempts to be thread-safe.)

This is a drastic solution: since almost all ways of referring to characters involve references or iterators, this solution implies, in effect, that the only strings that can be reference-counted are the ones that are never used. In practice, then, a reference counted implementation of basic_string can't achieve the performance gains that one might otherwise expect, since reference counting is forbidden in all but a few special cases.

A different solution is to abandon the goal of reference-counted strings altogether, and to provide a non- reference-counted implementation of basic_string instead. The draft standard permits non-reference-counted implementations, and several vendors already provide them. The performance characteristics of a non-reference-counted basic_string are predicable, and are very similar to those of a vector<char>: copying a string, for example, is always an O(N) operation.

In this implementation, basic_string does not use reference counting. I have been using a reference counted implementation, and it works fine. Why haven't I seen problems?

The current implementations do work correctly, most of the time: preserving a reference to a character in a string is uncommon. (Although preserving iterators to strings may be more frequent, and exactly the same issues apply to iterators.) Some less contrived sequential programs also fail, though, or else behave differently on different platforms.

Multi-threaded applications that use a reference counted basic_string are likely to fail intermittently, perhaps once every few months; these intermittent failures are difficult to reproduce and debug. But it is likely that a large fraction of multi-threaded clients will fail occasionally, thus making such a library completely inappropriate for multi-threaded use.

Добавить отзыв
ВСЕ ОТЗЫВЫ О КНИГЕ В ИЗБРАННОЕ

0

Вы можете отметить интересные вам фрагменты текста, которые будут доступны по уникальной ссылке в адресной строке браузера.

Отметить Добавить цитату