linerwood.blogg.se - Bytecode control flow graph builder

#Bytecode control flow graph builder code#

There's a huge body of research on this topic, so I really recommend reading a compiler book. Muchnick's book can work well as a reference. I really like Modern Compiler Implementation, but I think it's out of print. If you're the kind that learns better by reading code, the Scala compiler does all of the above, and the JVM fits all of the bullet points you listed for your VM.

I would highly recommend a representation that doesn't use the stack. It's painful to keep a stack-based representation consistent when you modify the code: if you remove an instruction that pushes a value on the stack on one branch, you need to add a drop on the other branches too, or else the stack will be inconsistent at the merge point. You can generate stack-based code easily from register-based code after all optimization passes are done.

I know for a start, a forward traversal following jumps will tell us where the reachable instructions are and let us eliminate unreachable instructions. But how about the others: optimizing away code computing unused values, or redundant GOTOs? I know for side-effect-free programs I can compute a dataflow graph, starting from the returned value, and walk backwards to see which computed values are unused. But that doesn't seem to work in the case where there are side effects whose order I need to preserve, since a dataflow graph would just ignore them. Presumably I'll need a control flow graph in addition to the dataflow graph, but I'm not sure how the two graphs would work together to help me optimize the bytecode. Does CPS or SSA or any of the other intermediate representation formats help in this? What kinds of techniques can I use to optimize those away in the presence of control flow and global side effects (both reads and writes) whose order needs to be preserved?
The input will be some valid bytecode for this virtual machine, but with some redundancy: some instructions will be unreachable, some values computed on the operand stack will get popped without being used, some values stored in local variables will get overwritten without ever being read, and some GOTOs will go first to another GOTO rather than straight to the final destination.
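Of the redundancies listed above, the GOTO that lands on another GOTO is the easiest to remove, because an unconditional jump does not touch the operand stack. The sketch below retargets every jump to the end of its GOTO chain. The tuple encoding and the opcode names are assumptions made for illustration, not part of any real instruction set.

```python
# Hypothetical encoding: (opcode, *operands); jump operands are list indices.

def final_target(code, target):
    """Follow a chain of unconditional GOTOs to its last destination."""
    seen = set()
    # The `seen` set stops the walk if the GOTOs form a cycle.
    while code[target][0] == "GOTO" and target not in seen:
        seen.add(target)
        target = code[target][1]
    return target


def thread_jumps(code):
    """Return a copy of `code` where each jump goes straight to its final target."""
    out = []
    for instr in code:
        if instr[0] in ("GOTO", "JUMP_IF"):
            out.append((instr[0], final_target(code, instr[1])))
        else:
            out.append(instr)
    return out


code = [
    ("JUMP_IF", 3),  # 0: target is a GOTO
    ("PUSH", 1),     # 1
    ("GOTO", 4),     # 2: target is a GOTO
    ("GOTO", 4),     # 3
    ("GOTO", 6),     # 4
    ("PUSH", 2),     # 5
    ("RETURN",),     # 6
]

threaded = thread_jumps(code)
print(threaded[0], threaded[2])  # ('JUMP_IF', 6) ('GOTO', 6)
```

Instruction indices do not change, so no jump needs renumbering. The intermediate GOTOs that nothing targets any more can then be dropped by an unreachable-code pass.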

I'm trying to figure out a technique to optimize bytecode for the following virtual machine:

- Bytecode is a flat list of instructions, with execution starting from the first instruction.
- Stack bytecode: instructions like i++, a+b, method calls f(a, b, c), etc. occur on the top 1, 2, N values on an operand stack, and leave their result on the top of the operand stack.
- Bytecode to discard the top value of the stack, e.g. if you call a method for side effects but don't want the return value.
- An array of N local variables, and instructions to copy values between a local variable and the top of the operand stack.
- Unconditional GOTO, and conditional JUMPs with various predicates on the top of the operand stack.
- Side-effecting instructions: reading/writing the top value of the operand stack to global variables.

A static control flow graph builder performs a recursive traversal over all instructions, starting at a provided entrypoint, and adds an edge to the control flow graph for every branching opcode. By repeatedly using the provided IStaticInstructionProvider and IStaticSuccessorResolver instances, it collects every instruction and determines the outgoing edges of each basic block.

A separate interface, IStaticInstructionProvider, is used instead of a normal IEnumerable because a normal list might not always be the most efficient data structure for obtaining instructions at very specific offsets. Furthermore, this allows disassemblers to implement the interface and decode instructions on the fly while simultaneously building the control flow graph.

The algorithm is very efficient and scales linearly in the number of instructions to be processed. It is usually enough for most simple instruction sets, such as CIL from .NET or bytecode from the JVM, and could work for a lot of cases on native platforms such as x86.
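To make the static builder described above more concrete, here is a rough Python sketch of the same idea. It is not the API of the library that defines IStaticInstructionProvider and IStaticSuccessorResolver: the callables `get_instruction` and `get_successors` are hypothetical stand-ins for those two roles, the opcode names are invented, every instruction is assumed to occupy exactly one offset, and an explicit worklist replaces recursion.

```python
# Hypothetical stand-ins for the two roles described in the text:
#   get_instruction(offset)       -> instruction at that offset
#   get_successors(offset, instr) -> offsets that may execute next
# Assumption: every instruction occupies exactly one offset.

def build_cfg(get_instruction, get_successors, entry=0):
    """Return (blocks, edges), both keyed by the offset of each block's leader."""
    succs = {}
    leaders = {entry}
    worklist = [entry]
    while worklist:                      # each offset is processed once
        pc = worklist.pop()
        if pc in succs:
            continue
        targets = get_successors(pc, get_instruction(pc))
        succs[pc] = targets
        if targets != [pc + 1]:          # anything but plain fall-through
            leaders.update(targets)      # every target starts a new block
        worklist.extend(targets)

    blocks, edges = {}, {}
    for leader in sorted(leaders):
        pc, body = leader, [leader]
        # Extend the block while control simply falls into a non-leader.
        while succs[pc] == [pc + 1] and pc + 1 not in leaders:
            pc += 1
            body.append(pc)
        blocks[leader] = body
        edges[leader] = succs[pc]        # outgoing edges of the last instruction
    return blocks, edges


def resolve(pc, instr):
    """Successor resolver for the hypothetical opcodes used below."""
    op = instr[0]
    if op == "GOTO":
        return [instr[1]]
    if op == "JUMP_IF":
        return [instr[1], pc + 1]
    if op == "RETURN":
        return []
    return [pc + 1]


code = [("PUSH", 1), ("JUMP_IF", 4), ("PUSH", 2),
        ("GOTO", 5), ("PUSH", 3), ("RETURN",)]

blocks, edges = build_cfg(code.__getitem__, resolve)
print(blocks)  # {0: [0, 1], 2: [2, 3], 4: [4], 5: [5]}
print(edges)   # {0: [4, 2], 2: [5], 4: [5], 5: []}
```

Passing the provider as a callable rather than a list mirrors the design point above: the builder only ever asks for the instruction at one offset, so a disassembler could decode lazily behind that call.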