Ideal Threading Model

The following is what I currently regard as the ideal threading model in a programming language.
  • Process — contains all the kernel resources, shared between troops and threads
  • Troop — group of threads sharing a common set of resource limits (language‐enforced)
  • Thread — group of tasks, mapped M:N onto kernel threads; each thread may be very lightweight
  • Task — cooperatively scheduled using the yield keyword
Each task may represent a single operation, or more likely a chain of tasks carrying out a single operation. As they are cooperatively scheduled and only one runs at a time within each thread, they are given full access to objects used by other tasks.
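The post predates modern Python, but the task layer can be sketched with generators: a minimal round-robin scheduler runs each task until its next yield, then gives the next task a turn (the `scheduler` and `worker` names here are illustrative, not part of the proposal):

```python
from collections import deque

def scheduler(tasks):
    """Run tasks cooperatively: each runs until it yields, then the next gets a turn."""
    ready = deque(tasks)
    results = []
    while ready:
        task = ready.popleft()
        try:
            value = next(task)          # run the task up to its next yield
        except StopIteration as stop:
            results.append(stop.value)  # task finished; record its return value
        else:
            ready.append(task)          # task yielded; put it back in the queue
    return results

def worker(name, steps):
    total = 0
    for i in range(steps):
        total += i
        yield                           # cooperative scheduling point
    return (name, total)

print(scheduler([worker("a", 3), worker("b", 2)]))
```

Because only one task runs at a time within the scheduler, tasks can freely touch shared objects between yields without locks.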

Multiple threads may be running concurrently on a multi‐CPU server, and this leads to some restrictions. No mutable objects may be shared between them, only immutable ones. The exception is thread-safe builtins, which use the appropriate low-level mechanisms to do atomic updates.
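In today's Python the rule can be approximated by convention rather than enforced by the language, but the shape is the same. A sketch, with `queue.Queue` standing in for a "thread-safe builtin" (it does its own internal locking, so concurrent writes need no user-level coordination):

```python
import queue
import threading

# Immutable data (a tuple of strings) may be shared freely between threads.
SHARED_CONFIG = ("host=localhost", "port=8080")

# Thread-safe builtin stand-in: all mutation goes through this queue.
results = queue.Queue()

def worker(idx):
    # Reading immutable shared state is always safe; writes go
    # through the thread-safe structure only.
    results.put((idx, len(SHARED_CONFIG)))

threads = [threading.Thread(target=worker, args=(i,)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

collected = sorted(results.get() for _ in range(4))
print(collected)
```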

A group of threads that can directly access each other's objects is called a troop, because they trust each other not to include malicious objects. A thread that isn't trusted would be in a different troop, and would be required to use proxy objects and explicit copies for all accesses (although simple objects like numbers or strings may be optimized).
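A cross-troop proxy could look something like the following sketch (the `Proxy` and `Ledger` classes are hypothetical illustrations): every call crosses the trust boundary via deep copies, so the untrusted side never holds a reference to the real object.

```python
import copy

class Proxy:
    """Cross-troop proxy: arguments are copied in and results copied out,
    so no mutable object is ever shared across the trust boundary."""
    def __init__(self, target):
        self._target = target

    def call(self, method, *args):
        safe_args = copy.deepcopy(args)                  # copy inputs in
        result = getattr(self._target, method)(*safe_args)
        return copy.deepcopy(result)                     # copy results out

class Ledger:
    def __init__(self):
        self.entries = []
    def add(self, item):
        self.entries.append(item)
        return list(self.entries)

ledger = Ledger()
proxy = Proxy(ledger)
snapshot = proxy.call("add", "deposit")
snapshot.append("tampered")   # mutating the copy...
print(ledger.entries)         # ...cannot affect the real object
```

As the post notes, simple immutable values like numbers or strings could skip the copy entirely as an optimization.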

A threading model such as this accomplishes many goals:
  1. Security. You can run untrusted bits of code, such as in a web browser.
  2. SMP. All CPUs could be utilized.
  3. Performance. By not allocating large stacks or kernel resources for every thread, you can easily have thousands (or millions!) of threads on a single server.
  4. Reliability. Only safe operations are permitted across threads, preventing the corruption and race conditions which plague C code.
  5. Ease of Programming. A task using Python 2.5's yield keyword is the closest you can get to traditional blocking code, letting you avoid the spaghetti code of event‐driven programming.
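Point 5 can be illustrated with a small sketch: a task yields a request to its scheduler and receives the result back at the same spot, so a multi-step operation reads top to bottom like blocking code instead of being scattered across callbacks. The driver here is a trivial synchronous stand-in for a real scheduler, and the "data from" responses are invented for the example:

```python
def get_pair():
    # Each yield hands a request to the scheduler; the value sent back
    # is the "blocking" call's result. No callbacks, no inverted control.
    a = yield "u1"
    b = yield "u2"
    return (a, b)

def run(task):
    """Minimal driver: services each yielded request synchronously."""
    result = None
    try:
        while True:
            request = task.send(result)   # first send(None) starts the task
            result = "data from " + request
    except StopIteration as stop:
        return stop.value

print(run(get_pair()))
```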
Unfortunately, this would require a near‐total rewrite of the CPython codebase. Perhaps PyPy will have some hope here?

1 comment:

Adam Olsen said...

Some further comments:

* Although a bit awkward, it'd still be possible to call C functions. The easiest way would be to designate a single thread as the "main thread" and only allow it to call C functions.
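That mediation could be sketched with a request queue: workers enqueue a call and block on a reply queue while the designated main thread is the only one that actually invokes the C function (here `math.sqrt` stands in for an arbitrary C function; the structure is an assumption, not part of the original comment):

```python
import math
import queue
import threading

requests = queue.Queue()

def main_thread_loop(n_requests):
    # Only this loop actually invokes the C function.
    for _ in range(n_requests):
        func, arg, reply = requests.get()
        reply.put(func(arg))

def worker(arg, out):
    reply = queue.Queue(maxsize=1)
    requests.put((math.sqrt, arg, reply))   # ask the main thread to call C
    out.append(reply.get())                 # block until it responds

out = []
w = threading.Thread(target=worker, args=(9.0, out))
w.start()
main_thread_loop(1)
w.join()
print(out)
```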
* In a clustered environment, each thread could be on a different box. The only change is that your thread‐safe builtins have to support clustering.
* The only likely way I see to implement this would be with PyPy. Unfortunately, it needs to advance a lot to get performance comparable to CPython or Java. While Java does get reasonable performance from a tracing garbage collector, it benefits from not exclusively using first‐class objects; Python's performance would invariably be worse.