2006-12-29

Concurrency Comparison

Coarse: Very large stacks (often several MiB, but may be dropped to around 10 KiB if the usage is predictable enough). Uses a lot of kernel resources. Significant costs in switching between them.
  • Processes: Transferring data between them is expensive. Effective for specific, independent operations. Allows low-level failures or corruption (which leads to crashes) to be isolated.
  • Preemptive threads: Shared, mutable, non-atomic data structures. Independent operations can be done in a straightforward manner, but it's very hard to ensure they really are independent, and mistakes are very hard to detect. Somewhat effective for specific, independent operations, but tends to be used more generally.
  • Cooperative threads with deep stacks: Doesn't scale to multiple CPUs. Easier to program with than preemptive threads, but possible to trigger a switch while in a function that does not expect it. Not obvious to the programmer where switching may occur.

Fine: Very small stacks (could be a few hundred bytes, or a lot more if not optimized by the language). Uses no kernel resources. Cheap to switch between them.
  • Coroutines: Can be used to implement Fine cooperative threads (and a few other things). More of a low-level tool than a high-level tool.
  • Generator-based tasks: Doesn't scale to multiple CPUs. Easy to use and switching locations are obvious, but can't cooperate with existing libraries that are not already designed to be asynchronous.
  • Event-driven state machines: Doesn't scale to multiple CPUs. Confusing spaghetti code. Doesn't require any special language features.
  • Erlang-style "processes": Scales to multiple CPUs. Interactions are easy and predictable. As a predominantly functional language, it is unpopular with those who would prefer a more imperative style.

It should be fairly obvious no single style is optimal in all situations. I believe a selection of complementary approaches is ideal, which I've explained in my previous post, Ideal Threading Model.

2006-12-18

Securely Establishing Wireless Connections

Current wireless mechanisms are very insecure. They either offer no security or securing them properly requires so much effort as to rarely be done properly. I propose a mechanism that would solve this.

In the most basic form, a person would touch two devices together (on specified contact points), and simultaneously press each of their connect buttons. Once done the devices could be separated, and would communicate securely as if there was a physical wire running between them.

Moving the devices in order to make a connection may be impractical, especially for larger devices such as a television. To connect them you would want to use an intermediary device, called a “wand”, with which you would first establish connections to both desired devices, then use these connections to instruct them to connect directly to each other. However, since this implies each device may have more than one connection, it requires a further mechanism to prevent a visitor from establishing connections and then surreptitious controlling or snooping on your devices.

The security of the basic form is done by exchanging public keys in a rapid burst, fast enough and carefully timed so as to ensure both ends are in very close proximity to each other (preferably a few centimeters, if not closer). A randomly generated number would be integrated into this key exchange protocol to prevent attackers from trying to send preemptive messages before they detect the communication.

This would not protect against tampering with the devices, including putting a false front on one of them.

2006-11-25

Ideal Threading Model

The following is what I currently regard as the ideal threading model in a programming language.
  • Process — contains all the kernel resources, shared between troops and threads
  • Troop — group of threads sharing a common set of resource limits (language‐enforced)
  • Thread — group of tasks, uses m-n to map to kernel threads, each thread may be very light weight
  • Task — cooperatively scheduled using the yield keyword
Each task may represent a single operation, or more likely a chain of tasks to carry out a single operation. As they are cooperatively scheduled at only one runs at a time within each thread, they are given full access to objects used by other tasks.

Multiple threads may be running concurrently on a multi‐cpu server, and this leads to some restrictions. No mutable objects may be shared between them, only immutable ones. The exception is thread-safe builtins, which use the appropriate low-level mechanisms to do atomic updates.

A group of threads that can directly access each other's objects is called a troop, because they trust each other not to include malicious objects. A thread that isn't trusted would be in a different troop, and would be required to use proxy objects and explicit copies for all accesses (although simple objects like numbers or strings may be optimized).

A threading model such as this accomplishes many goals:
  1. Security. You can run untrusted bits of code, such as in a web browser.
  2. SMP. All CPUs could be utilized.
  3. Performance. By not allocating large stacks or kernel resources for every thread, you can easily have thousands (or millions!) of threads on a single server.
  4. Reliability. Only safe operations are permitted across threads, preventing the corruption and race conditions which plague C code.
  5. Ease of Programming. A task using Python 2.5's yield keyword is the closest you can get to traditional blocking code, letting you avoid the spaghetti code of event‐driven programming.
Unfortunately, this would require a near‐total rewrite of the CPython codebase. Perhaps PyPy will have some hope here?

2006-10-24

Robust Packaging System

Far too long since I last posted. Ah well.

One thing that's bothered me for a long time is how fragile Debian's package architecture is. Currently over 15000 packages, most with dozens of versions over the years, if not hundreds or even thousands. To install a package you must first download an archive, then extract all of the files as if you had compiled and installed it locally without a package system, and then run various scripts which may do even more. A myriad of things can (and do) go wrong.

The name "Debian unstable" refers not to the software, but to the packages themselves. It takes an immense amount of effort to get things right, and although the debian community should certainly be lauded for providing that effort, mistakes still slip through. Even after all that, it's a very slow process.

It shouldn't be this way.

We need a packaging system that's inherently robust. Minimal moving parts to wear out, less interaction between packages. I believe this means that the concept of extracting from an archive needs to be scrapped, replaced with a kernel module that makes the contents directly accessible. Then we need to eliminate the global namespace that is /usr/bin, giving each installed program its own namespace with only relevant packages (and of the right version!) exposed. Finally, most installation scripts should be replaced with a description file (probably XML based, for extensibility).

Although implementing this would require a large amount of up front work, the long run payoff would be enormous. How could you not want to do this or something similar to it?

2006-07-28

Controlling Floating Point Math Without Contexts

It's funny, a great deal of the problem with floating point is because different applications desire different behaviours, but there's no good way to control those behaviours. Usually there's only a single global context, but this breaks encapsulation in all sorts of nasty ways. Sometimes (such as in my previous post, Floating Point and NaNs in Python) there is a context object, but using methods from it becomes quite tedious.

I think there is better option out there, and it goes like this...

n = roundup(1 / 3)

The operation “1 / 3” would not do the math. Instead it would create an object that says what operation is to be performed and with what operands. This then gets passed to the roundup function which does the specified operation at its own precision.

Some more complicated examples:

e = roundup(m*rounddown(c**2))
rn = roundnearest
z = rn(rn(sum([
rn(list_old[b-1] % mod),
rn(list_old[b] % mod),
rn(list_new[b-1] % mod)
])) % mod)

As you can see this “delayed rounding” allows you to retain the use of normal math operators. There are, however, still some questions to be answered, mostly about what happens when you don't use the round functions immediately. Some possible answers are:
  • Raise an exception. Failing to use a round function immediately is always an error.
  • Apply a default rounding. This allows the current Python behaviour to be retained.
  • Compound operations with no intermediate loss. “roundup(m*c**2)” would only round once. Possibly very difficult and expensive to do.
  • Switch to intervals. “1 / 3” would produce an interval along the lines of “0.333..0.334”. Repeated operations would expand the “size” of the interval but it would remain correct in a mathematical sense. There are further questions of how to round an interval which does not have a minimal “size”. I plan to post about this later.
My personal favourite is intervals, as they are mathematically correct while still having good performance and a simple implementation. Unfortunately I have yet to explore how useful they would be for real programs.

2006-01-14

Copyright Reform, First Draft

At this point things aren't well organized. Just a list of items. At some point I'll make a new post once I have sufficient changes collected and stop editing this one. I'll edit it to add a link at that point.

  1. The intent of this law is support a diverse field of artistic works. The intent is not to prevent the use or distribution of such works. (XXX or to be used as corporate leverage?)
  2. This law shall not impose any effects or restrictions on works not of direct artistic intent, or who's practical necessity exceeds that of the artistic intent. This includes (but is not limited to) any legal documents, data collections, specifications of interactions(XXX), XXX.
    1. A specification of interaction is defined as a work explaining the size, behavior, or other property of a subject of interaction, as necessary for it to be used or interacted with.
    2. Subjects of interaction include (for example) physical objects (such as a bolt), processes, software protocols, formats or other interfaces.
  3. This law shall only apply to the expression of the work, not to the abstract concept underlying it.
  4. An entity in posession of a work shall be permitted to do anything(XXX) necessary to display or make personal or private use of the work. This includes redistribution of altered forms (such as translations) to other posessors of the work when they are not otherwise readily available.
  5. Upon receiving a work an entity shall be obligated, if they wish to duplicate it (other than otherwise permitted in this law), they shall be obligated to acquire a license from the author.
  6. If an entity has possessed a work for atleast five years the author shall be obligated to provide, as an option to the licensee, a flat fee of 50% of revenues relevent to the duplication (XXX or selling of the duplication?) with no other restrictions. It is not expected that all such licenses shall have appreciable revenue.
  7. If an entity has possessed a work for atleast ten years they are no longer obligated to obtain a license and may duplicate the work without restriction.
  8. No contract shall restrict an entity from duplicating or distributing a work for more than fifteen years after they first received it. Attempts to do so shall be deemed null and void (XXX).
  9. Within fifteen years of first receiving a work an entity is obligated to accurately inform the receiver of any duplicates as to the identity of the author, to the best of their ability.
  10. XXX I need something involving advertising. Using a character to advertise your work should be protected for longer than the normal terms. [Update] First 20 years consider the work to be an endorsement of the author, and as such any use of the work that operates as endorsement require the author's consent. XXX is there a specific law I should reference here?
  11. [Update] XXX Need a way for publically distributed materials to revert to the earliest date of distribution. Maybe private as well? Is there already common law on this?
  12. [Update] XXX I think the “upon receiving the work” dates are too complicated. Too many different people with different dates. Instead I'd prefer “public dissemination” (aka publishing) or something of the sort to be the primary date, with an extra 5 years for an “upon receiving the work” fallback.
  13. [Update] XXX The identity aspects should perhaps be expanded to rewrite trademark laws as well. There's essentially two purposes to identity laws. One is to ensure a consumer gets the product they expect (this is what current trademark laws do and it's essentially to make fraud less ambiguous.) The other is allowing consumers to locate the author of a work and support them (i.e. by buying merchandise or other works.) Notably lacking is a desire to give the author fame.

2006-01-09

Floating Point and NaNs in Python

Python's current support for floating point is very poorly defined. In fact all it does is say it depends on what C does. C in turn says very little, which leaves the programmer with no practical way to reason about floating point in his program.

I believe this could be fixed, although not without significant effort, and (unavoidably) harming the performance of floating point. First though I'd like to discuss NaNs.

NaN stands for “Not a Number.” It is used when we don't know how to represent the result of the operation, or rather the form we're currently using can't represent the result. One common way to produce a NaN is to evaluate Infinity/Infinity. anything/Infinity always produces 0, Infinity/anything always produces Infinity, so Infinity/Infinity should produce both 0 and Infinity. Obviously it can't be both so we say it produces a NaN instead. Or maybe it should raise an exception, but that's not what I want to discuss. I want to discuss how to handle a NaN once it is created.

Most people “know” that IEEE Floating Point requires “any comparison involving a NaN must return False, even NaN==NaN.” However, from my digging on the web it seems this is not what IEEE requires at all! A post by DWCantrell[1] quotes:

Four mutually exclusive relations are possible: "less than," "equal," "greater than," and "unordered." The last case arises when at least one operand is NaN. Every NaN shall compare unordered with everything, including itself.

So what IEEE requires is you treat NaN as “unordered.” In C the closest that can be done is to treat all comparisons as False, which is probably why people think returning False is what IEEE requires. But in Python we can do better, by raising an exception. Those that have a sane fallback could catch the exception, but the rest of us could rest easy in the fact that our errors will be reported.

Unfortunately there's another problem. In Python identity supersedes value. “a is b” automatically implies “a == b.” We may be able to treat it as “unordered” in some cases, but not all (notably those involving containers.) But this begs the question, do we really want it to be “unordered” in all cases? We say “unordered” because we have no reasonable ways to compare by value, and because IEEE Floating Point only deals with comparisons by value, “unordered” is the only answer it gives. However, basic math does use identity comparisons! “a=a” is obviously a true statement. So why should Python limit itself to value comparisons when the identity comparison is just as valid and provides us with more information? Thus, I believe Python should not only tolerate identity comparisons involving NaN, but it should embrace them!

However, that leads us to needing to preserve the identity of NaN objects. This is actually quite easy. We simply need to create a NaN type which cannot be interned (unlike other number types), and have floating point operations return it instead. All comparisons would raise an exception unless they are comparisons with itself. This should all be the default behavior for non-number types, which is not surprising since NaN is “not a number.”

As an aside, you may wonder if this affects numeric arrays. It does not. A copied NaN will not share identity with the original object, and thus will not compare equal with the original object. Numeric needs only document that it always copies the NaN objects.

One final note on NaNs. If identity is significant then you obviously don't want a global NaN constant. Instead I propose a makeNaN() function, which would create a unique object with every call.

Okay, back to floating point in general. What I believe we need is to do is provide identical results on all platforms by default and provide an option for faster computation if less stringent results are required. Ideally it should also have a way to simulate those less stringent results even if they're not native to the platform, to allow testing and development of all platforms from a single computer.

The way I believe this should be done is using Context objects, similar (but not identical) to what decimal provides. A few notes:
  • What should the size of the output be? I believe the simplest behavior is to have the output be exactly specified by the Context, and not be affected by the size of the inputs. This makes operations such as 1/somefloat(3) produce an obvious result. [Edit] It also means Infinity can be a singleton, maybe even a separate class with no concept of size.
  • I believe Context objects should be created by a factory function and should be immutable. The reason is that altering a hardware context is actually quite expensive, and allowing attribute modification would be misleading. It's easier to swap two unchanging Context objects than it is to repeatedly compute the state of a single changing Context object. Additionally, having Contexts be immutable means there is less exposed when the user wants to create their own Context class (to produce intervals for instance.)
  • There should be sys.getfloatcontext() and sys._setfloatcontext() functions, providing access to per-thread contexts. Note that sys._setfloatcontext() has a leading underscore, indicating that it is private—only those hacking on the implementation are expected to use it, not those wanting to changing the default context (use a private context instead.)
  • How to implement this all is probably the biggest issue. The best solution I have is to use LLVM to access the hardware in a portable way while circumventing C entirely. This has the added advantage that it could be reused by the PyPy project.
  • The most common Contexts would be IEEE-like 32bit, 64bit, and 80bit sizes. They would provide exact results for all operations and functions (even transcendental functions such as sqrt()!)
  • If exact results are not necessary then we could either provide functions named loosesqrt() or provide a loose context where all functions are loose. We would then have to decide on whether the functions will be documented to provide results within a certain range, or if they're defined to provide any result at all (random()?)
  • A loose context may be permitted to arbitrarily pick the size of the objects it outputs, so long as those objects are stable once created. For instance, “c.divide(1, 3) == c.divide(1, 3)” may return one object in 32bits and the other in 80bits, thus causing the comparison to return False.
  • [Edit] It may not be obvious so I'll say so explicitly: math.sqrt() would become a wrapper for sys.getfloatcontext().sqrt()
[1] http://groups.google.ca/group/sci.math.num-analysis
/msg/912e7246f03b4185?hl=en

2006-01-06

Safe Threads

I need to explain why safe threads are so important. Basically you have three options:
  1. Honour system. The programmer uses explicit locks and is expected to do things right. Any mistakes invoke a memory model (as in Java) or undefined behavior (as in C). Either way you get bogus results. May significantly harm performance when the compiler can't reason about the locks used, as it has to fall back to the most generic code possible (that conforms to the memory model.)
  2. Compiler-enforced locks. As above, but when the compiler can't reason about a lock it emits an error instead of emitting generic code. Performance is optimal, but forces the language specification to include compiler internals.
  3. Inherently safe primitives. Inter-thread queues that only permit deeply immutable (or otherwise thread-safe, such as the queue itself) objects as contents. Atomic reference objects that allow alteration and retrieval of a single reference in an atomic (thread-safe) manor, and again requiring that reference to be deeply immutable. These primitives cannot be subverted, thus keeping the compiler easy, code reliable, and language specifications small.
The first option violates many of Python's principles:
  • Explicit is better than implicit.
  • Readability counts.
  • Errors should never pass silently.
  • In the face of ambiguity, refuse the temptation to guess.
  • If the implementation is hard to explain, it's a bad idea.
The second option, while better, still violates at least one principle:
  • If the implementation is hard to explain, it's a bad idea.
I believe the third option is the only acceptable one for python.

BTW, "deeply immutable" means that the object itself is immutable, all objects it references are immutable, all objects they reference are immutable, etc.

With the third option most of it could be kept out of the language specification, although it would need to be a core part of the implementation, not just an extension module. Unfortunately there is one aspect of the language that requires such threading, namely finalization (including weakref callbacks.) Not everybody realizes this but finalization is a form of concurrency. It causes functions to execute at unpredictable times and with no natural granularity to determine how they should interact with already executing functions.

The only reasonable way I see to handle finalization is to use safe thread mechanisms. It doesn't matter whether this is done using a single queue that the programmer must check explicitly, or by spawning a new safe thread for each finalizer function, just so long as it's entierly safe and predictable.

2006-01-01

User-defined Operators

I'm not convinced they should be added but if they are to be added they can be done fairly simply.
x = (foo oper bar)
Equivalent to:
x = oper(foo, bar)
There's a lot of notes to be made though.
  1. oper is looked up in the local (and global) namespaces, not as a method of foo or bar. If you want it to be a method you need to make the local/global function do all the lookups itself.
  2. oper can only be a single name. It cannot be an expression. The reason is that ((foo)(bar)(baz)) is currently legal, meaning (foo(bar))(baz). A limited subset of expressions could be permitted, but I feel it is better to educate users on a "no tolerance" policy.
  3. Custom operators only become really useful when Unicode variable names are permitted.
  4. There's never a question of precedence with this syntax (unlike builtin operators).
  5. Most builtin operators could be altered to use this syntax, but it would make them a bit more verbose. 1+2*3 becomes (1 + (2 * 3)). However, this bloat can be countered by making parsing aware of word boundaries and by making the parenthesis optional under most conditions. Then you return to 1+(2*3).
  6. [Edit] Perhaps the biggest drawback is requiring the user to explicitly import all operators that may be used, ie from math import ×, ≺, ≻, ≼, ≽, ⊀, ⊁, ∈, ∉, ∋, ∌, ⊂, ⊃, ⊄, ⊅, ⊆, ⊇, ⊈, ⊉, ⊊, ⊋, ∩, ∪, ∖, etc. An alternative would be defining a protocol that does use a method of one of the arguments. Unfortunately that significantly increases the complexity of my proposal, so I'll leave it as an exercise for the reader.
I think that's about it. As I said, I'm not convinced they should be added, but if they are I think this is how it should be done.