To complete our three-part series on permissions, which began with Race-Safe Strategies, let’s talk about the transitional nature of reference permissions. When are permissions transitional? When we can safely create a copy of a reference which has a different permission than the reference it copied from. There are several ways in which this can happen, which this diagram summarizes (and the following sections explain): The following sections describe the nature of several one-way transitions that flow downward in the diagram.
In my last post, Race-safe Strategies, one footnote stated “safety issues which look suspiciously similar to race conditions can crop up when a language supports the creation of “interior references” to shared, mutable values of certain types”. Let’s explore that now. I will begin by recapitulating Manish Goregaokar’s excellent post “The Problem With Single-Threaded Shared Mutable”. His post clearly explains why the Rust language wishes to steer developers towards RefCell for shared references over use of Cell, its inflexible shared, mutable counterpart.
I recently made the observation that many people seem unaware of the full collection of constraint mechanisms available for protecting race safety. Someone sensibly asked for a link to an article that provides a modern, comprehensive review. It turns out that the pickings are very slim; the best I could find is this Wikipedia article on thread safety. It’s accurate, but incomplete. To close that gap, let me take a stab here at more comprehensive treatment.
Is it possible to improve on Rust’s single-owner strategy to support more complex data structures? Before digging into this challenge, let’s summarize the story so far… The Promise and Limitations of Single-Owner Rust’s single-owner memory management is a form of automatic memory management; a garbage collection strategy that is distinct from tracing and reference-counting. Fundamentally, it is an improvement on RAII, which automatically finalizes some defined resource at the end of its defined lexical scope.
Most programming languages support only copy semantics. A value passed to a function or stored in a variable is a copy of the original value. We know it is a copy, because any change we make to the copy has no impact on the original value. A few languages, like C++ and Rust, also support move semantics. Unlike a copy, a transfer moves the original value to its new home; that value is no longer accessible at its previous home.
In the world of automatic memory management, reference counting is considered to be one of the easiest to implement. The rules seem simple: When a reference is created to an allocated memory area, set its counter to 1 When the reference is copied (aliased), increment the counter When an alias is destroyed (de-aliased), decrement the counter When the counter reaches zero, free the reference’s memory area The simplicity of these rules does not always translate to a simple implementation.
The Cone compiler performs a data flow analysis pass after name resolution and type checking. Given that this sort of analysis is rarely covered by compiler literature, I thought it might be useful to jot down some thoughts about its purpose and intriguing mechanics. Goals Like Rust (and unlike C), Cone applies constraints to references that ensure they can only access memory safely, even in the face of concurrency. Some of these constraints are completely enforced by the compiler.
As mentioned in the previous post, all typed nodes use the TypedNodeHdr common header. It only contains a pointer to the node’s type. This type field applies to every node in the “expression” group. This group holds all node types that return a value, including leaf nodes like literals and variables, function call nodes, and even the block and if nodes. The type check pass focuses largely on this type field, ensuring it specifies a valid, consistent type.
Working out effective name handling mechanisms in Cone’s IR took several tries. The key challenges: Although most node types don’t use names, the ones that do are spread out unevenly across every node group: a few expressions (variables and functions), most types, and some statements (e.g., module nodes). What is the best way to make a node’s name information generically accessible regardless of node type, without wasting a lot of space for non-named nodes?
Having spoken about the compiler’s IR tree in general terms, let’s focus in on an important detail: how to represent a node that could be any arbitary type. Cone’s IR makes use of dozens of different types of nodes, each defined using a different struct. However, sometimes the compiler needs to point to a node without restricting which type of node it must be. For example, consider an assignment node.