Abstract:
As the performance of computer systems stagnates due to the end of Moore’s Law,
there is a need for new models that can understand and optimize the execution
of general purpose code. While there is a growing body of work on using Graph
Neural Networks (GNNs) to learn static representations of source code, these
representations do not understand how code executes at runtime. In this work, we
propose a new approach using GNNs to learn fused representations of general
source code and its execution. Our approach defines a multi-task GNN over
low-level representations of source code and program state (i.e., assembly code
and dynamic memory states), converting complex source code constructs and data
structures into a simpler, more uniform format. We show that this leads to improved
performance over similar methods that do not use execution and it opens the door
to applying GNN models to new tasks that would not be feasible from static code
alone. As an illustration of this, we apply the new model to challenging dynamic
tasks (branch prediction and prefetching) from the SPEC CPU benchmark suite,
outperforming the state-of-the-art by 26% and 45% respectively. Moreover, we
use the learned fused graph embeddings to demonstrate transfer learning with high
performance on an indirectly related algorithm classification task.