Potentially long latency during system service blocking. | |
Simple; the implementation may even be (mostly) portable.*[5] | Single-process applications cannot take advantage of multiprocessor hardware. |
TABLE 5.2
5.6.2 One-to-one (kernel level)
One-to-one thread mapping is also sometimes called a 'kernel thread' implementation. The Pthreads library assigns each thread to a kernel entity. It generally must use blocking kernel functions to wait on mutexes and condition variables. While synchronization may occur either within the kernel or in user mode, thread scheduling occurs within the kernel.
Pthreads threads can take full advantage of multiprocessor hardware in a one-to-one implementation without any extra effort on your part, for example, separating your code into multiple processes. When a thread blocks in the kernel,
it does not affect other threads any more than the blocking of a normal UNIX process affects other processes. One thread can even process a page fault without affecting other threads.
One-to-one implemenations suffer from two main problems. The first is that they do not scale well. That is, each thread in your application is a kernel entity. Because kernel memory is precious, kernel objects such as processes and threads are often limited by preallocated arrays, and most implementations will limit the number of threads you can create. It will also limit the number of threads that can be created on the entire system — so depending on what other processes are doing, your process may not be able to reach its own limit.
The second problem is that blocking on a mutex and waiting on a condition variable, which happen frequently in many applications, are substantially more expensive on most one-to-one implementations, because they require entering the machine's protected kernel mode. Note that locking a mutex, when it was not already locked, or unlocking a mutex, when there are no waiting threads, may be no more expensive than on a many-to-one implementation, because on most systems those functions can be completed in user mode.
A one-to-one implementation can be a good choice for CPU-bound applications, which don't block very often. Many high-performance parallel applications begin by creating a worker thread for each physical processor in the system, and. once started, the threads run independently for a substantial time period. Such applications will work well because they do not strain the kernel by creating a lot of threads, and they don't require a lot of calls into the kernel to block and unblock their threads.
Figure 5.6 shows the mapping ofPthreads threads (left column) to kernel entities (middle column) to physical processors (right column). In this case, the process has four Pthreads threads, labeled 'Pthread 1' through 'Pthread 4.* Each Pthreads thread is permanently bound to the corresponding kernel entity. The kernel schedules the four kernel entities (along with those from other processes) onto the two physical processors, labeled 'processor 1' and 'processor 2.' The important characteristics of this model are shown in Table 5.3.
FIGURE 5.6
Advantages | Disadvantages |
Can take advantage of multiprocessor hardware within a single process. | Relatively slow thread context switch (calls into kernel). |
No latency during system service blocking. | Poor scaling when many threads are used, because each Pthreads thread takes kernel resources from the system. |
TABLE 5.3
5.6.3 Many-to-few (two level)
The many-to-few model tries to merge the advantages of both the many-to-one and one-to-one models, while avoiding their disadvantages. This model requires cooperation between the user-level Pthreads library and the kernel. They share scheduling responsibilities and may communicate information about the threads between each other.
When the Pthreads library needs to switch between two threads, it can do so directly, in user mode. The new Pthreads thread runs on the same kernel entity without intervention from the kernel. This gains the performance benefit of many-to-one implementations for the most common cases, when a thread blocks on a mutex or condition variable, and when a thread terminates.
When the kernel needs to block a thread, to wait for an I/O or other resource, it does so. The kernel may inform the Pthreads library, as in Digital UNIX 4.0, so that the library can preserve process concurrency by immediately scheduling a new Pthreads thread, somewhat like the original 'scheduler activations' model proposed by the famous University of Washington research [Anderson, 1991]. Or. the kernel may simply block the kernel entity, in which case it may allow programmers to increase the number of kernel entities that are allocated to the process, as in Solaris 2.5 — otherwise the process could be stalled when all kernel entities have blocked, even though other user threads are ready to run.
Many-to-few implementations excel in most real-world applications, because in most applications, threads perform a mixture of CPU-bound and I/O-bound operations, and block both in I/O and in Pthreads synchronization. Most applications also create more threads than there are physical processors, either directly or because an application that creates a few threads also uses a parallel library that creates a few threads, and so forth.
Figure 5.7 shows the mapping of Pthreads threads (left column) to kernel entities (middle column) to physical processors (right column). In this case, the process has four Pthreads threads, labeled 'Pthread 1' through 'Pthread 4.' The Pthreads library creates some number of kernel entities at initialization (and may create more later). Typically, the library will start with one kernel entity (labeled 'kernel entity 1' and 'kernel entity 2') for each physical processor. The kernel schedules these kernel entities (along with those from other processes) onto the
FIGURE 5.7
two physical processors, labeled 'processor 1' and 'processor 2.' The important characteristics ofthis model are shown in Table 5.4.