Data Flow Analysis

The Cone compiler performs a data flow analysis pass after name resolution and type checking. Given that this sort of analysis is rarely covered by compiler literature, I thought it might be useful to jot down some thoughts about its purpose and intriguing mechanics.

Goals: Like Rust (and unlike C), Cone applies constraints to references that ensure they can only access memory safely, even in the face of concurrency. Some of these constraints are completely enforced by the compiler.

The IR Tree: Typed Nodes

As mentioned in the previous post, all typed nodes use the TypedNodeHdr common header. It only contains a pointer to the node’s type. This type field applies to every node in the “expression” group. This group holds all node types that return a value, including leaf nodes like literals and variables, function call nodes, and even the block and if nodes. The type check pass focuses largely on this type field, ensuring it specifies a valid, consistent type.
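To make the shared-header idea concrete, here is a minimal sketch in C. Only the name TypedNodeHdr and the fact that it holds a pointer to the node's type come from the text above; the field name `vtype`, the `Type` descriptor, the two node structs, and the `check_if` helper are hypothetical illustrations, not Cone's actual definitions.

```c
#include <stddef.h>

/* Hypothetical type descriptor; a real compiler has far richer types. */
typedef enum { TYPE_I32, TYPE_F64, TYPE_BOOL } TypeKind;
typedef struct { TypeKind kind; } Type;

/* Common header for every node in the "expression" group.
   Per the text, it only carries a pointer to the node's type.
   The field name vtype is an assumption. */
typedef struct { Type *vtype; } TypedNodeHdr;

/* Two hypothetical expression nodes. Each begins with the header,
   so a pointer to the node is also a valid pointer to its header. */
typedef struct { TypedNodeHdr hdr; long value; } IntLitNode;
typedef struct {
    TypedNodeHdr hdr;
    TypedNodeHdr *cond, *then_val, *else_val;
} IfNode;

/* Read the type of any expression node through its leading header. */
Type *node_type(void *node) {
    return ((TypedNodeHdr *)node)->vtype;
}

/* Toy type check for an 'if' expression: the condition must be bool
   and both branches must share one type, which becomes the type of
   the 'if' node itself. Returns 1 on success, 0 on a type error. */
int check_if(IfNode *n, Type *bool_type) {
    if (n->cond->vtype != bool_type) return 0;
    if (n->then_val->vtype != n->else_val->vtype) return 0;
    n->hdr.vtype = n->then_val->vtype;
    return 1;
}
```

Because the header is the first member of each node struct, a pass can read or set the type of any expression node without knowing which concrete node it is looking at.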

The IR Tree: Named Nodes and Namespaces

Working out effective name handling mechanisms in Cone’s IR took several tries. The key challenges: Although most node types don’t use names, the ones that do are spread out unevenly across every node group: a few expressions (variables and functions), most types, and some statements (e.g., module nodes). What is the best way to make a node’s name information generically accessible regardless of node type, without wasting a lot of space for non-named nodes?

The IR Tree: The INode Interface

Having spoken about the compiler’s IR tree in general terms, let’s focus on an important detail: how to represent a node that could be any arbitrary type. Cone’s IR makes use of dozens of different types of nodes, each defined using a different struct. However, sometimes the compiler needs to point to a node without restricting which type of node it must be. For example, consider an assignment node.
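One common way to get such an "any node" pointer in C is a small shared header with a tag that identifies the concrete struct. The sketch below assumes that approach. The name INode comes from the post title, but the `tag` field, the `NodeTag` values, the node structs, and `count_nodes` are hypothetical and may differ from what Cone actually does.

```c
#include <stdint.h>

/* Hypothetical tags naming each concrete node struct. */
typedef enum { NODE_INT_LIT, NODE_VAR, NODE_ASSIGN } NodeTag;

/* Generic view of a node: every concrete node struct starts with it,
   so an INode* can point at any of them. */
typedef struct INode { uint16_t tag; } INode;

typedef struct { INode hdr; long value; } IntLitNode;
typedef struct { INode hdr; const char *name; } VarNode;

/* An assignment does not restrict what its two sides are:
   both are held as generic INode pointers. */
typedef struct { INode hdr; INode *lval; INode *rval; } AssignNode;

/* Count the nodes in a small tree, downcasting only after the tag
   has identified the concrete struct. */
int count_nodes(INode *node) {
    switch (node->tag) {
    case NODE_ASSIGN: {
        AssignNode *a = (AssignNode *)node;
        return 1 + count_nodes(a->lval) + count_nodes(a->rval);
    }
    default: /* leaf nodes: literals and variables */
        return 1;
    }
}
```

For `x = 1`, an AssignNode whose `lval` points at a VarNode and whose `rval` points at an IntLitNode gives a three-node tree. The assignment struct never has to name either child's concrete type.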

The IR Tree - Introduction

The Cone compiler uses a traditional pipeline to transform source programs into object files. The Intermediate Representation (IR) plays a central role in this design. It is the glue binding together the parser, semantic analysis passes, and the LLVM IR generator. The literature on effective IR design is relatively sparse compared to other topics, such as parsing, type inference/theory, and optimization techniques. This is a shame. I have devoted far more time to getting Cone’s IR “right” than to any other aspect of the compiler.