Semantic Analysis of Programs using Graph Neural Networks
Aayan Kumar
EECS Department, University of California, Berkeley
Technical Report No. UCB/EECS-2022-269
December 16, 2022
http://www2.eecs.berkeley.edu/Pubs/TechRpts/2022/EECS-2022-269.pdf
We present a framework that, given a piece of code and an input to it, recommends the NumPy API that is semantically closest to that code. Our approach executes the provided code on the given input and builds a representation that captures the execution of the code rather than its syntax. This representation takes the form of a graph, which is then analyzed by a Graph Convolutional Network to predict the relevant API.
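To make the graph-to-prediction step concrete, here is a minimal sketch of a two-layer Graph Convolutional Network forward pass in NumPy, using the standard symmetric normalization D^{-1/2}(A + I)D^{-1/2}. The abstract does not specify how the execution graph is built, what the node features are, or the network's architecture, so the three-node toy graph, the feature sizes, the candidate API list, and the function names (`normalize_adjacency`, `gcn_layer`, `predict_api`) are all assumptions made for illustration. The weights are random and untrained, so the predicted label carries no meaning.

```python
import numpy as np


def normalize_adjacency(adj):
    """Return D^{-1/2} (A + I) D^{-1/2}, the usual GCN normalization."""
    a_hat = adj + np.eye(adj.shape[0])      # add self-loops
    deg = a_hat.sum(axis=1)                 # degree of each node (>= 1)
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def gcn_layer(a_norm, features, weights):
    """One graph convolution: aggregate neighbours, project, apply ReLU."""
    return np.maximum(a_norm @ features @ weights, 0.0)


def predict_api(adj, features, w1, w2, api_names):
    """Two-layer GCN, mean-pooled over nodes to one score per candidate API."""
    a_norm = normalize_adjacency(adj)
    hidden = gcn_layer(a_norm, features, w1)
    logits = (a_norm @ hidden @ w2).mean(axis=0)
    return api_names[int(np.argmax(logits))]


# Hypothetical execution graph: input value -> intermediate value -> output value.
adj = np.array([[0, 1, 0],
                [1, 0, 1],
                [0, 1, 0]], dtype=float)
rng = np.random.default_rng(0)
features = rng.normal(size=(3, 4))          # assumed 4-dimensional node features
apis = ["np.sum", "np.cumsum", "np.sort"]   # assumed candidate label set
w1 = rng.normal(size=(4, 8))                # untrained weights, illustration only
w2 = rng.normal(size=(8, len(apis)))
print(predict_api(adj, features, w1, w2, apis))
```

In a trained system the weights would be learned from labelled execution graphs; only the propagation rule shown here is standard.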
Existing work in semantic program analysis invariably depends on analyzing source code, Abstract Syntax Trees, or similar static representations. Such analysis is brittle: it is sensitive to programming style and can be fooled by simple semantics-preserving transformations such as renamed variables or dead code. The representation we use is agnostic to coding style and hence generalizes across different real-world implementations of the same functionality, eliminating the need to mine multiple implementations of the same method.
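The brittleness of static representations can be illustrated with a small sketch. The two hypothetical implementations below differ in variable names, loop style, and one contains dead code, so their Abstract Syntax Trees differ, yet they return the same result on every input tried. Comparing outputs is only a crude stand-in for the execution-based graph representation described above; the sources `SRC_A` and `SRC_B` and the helper names are assumptions made for illustration.

```python
import ast

SRC_A = """
def total(values):
    acc = 0
    for v in values:
        acc += v
    return acc
"""

SRC_B = """
def total(xs):
    unused = len(xs) * 2  # dead code: never read
    result = 0
    i = 0
    while i < len(xs):
        result = result + xs[i]
        i += 1
    return result
"""


def load(src):
    """Execute a source string and return the `total` function it defines."""
    namespace = {}
    exec(src, namespace)
    return namespace["total"]


def same_ast(src_a, src_b):
    """Static comparison: are the two parse trees structurally identical?"""
    return ast.dump(ast.parse(src_a)) == ast.dump(ast.parse(src_b))


def same_behaviour(src_a, src_b, inputs):
    """Dynamic comparison: do both functions agree on every given input?"""
    f, g = load(src_a), load(src_b)
    return all(f(x) == g(x) for x in inputs)


inputs = [[1, 2, 3], [], [5], [-1, 1]]
print("same AST:", same_ast(SRC_A, SRC_B))
print("same behaviour:", same_behaviour(SRC_A, SRC_B, inputs))
```

Agreement on a handful of inputs does not prove equivalence; it only shows that execution exposes a similarity the syntax hides.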
Advisor: Koushik Sen
BibTeX citation:
@mastersthesis{Kumar:EECS-2022-269,
    Author = {Kumar, Aayan},
    Title = {Semantic Analysis of Programs using Graph Neural Networks},
    School = {EECS Department, University of California, Berkeley},
    Year = {2022},
    Month = {Dec},
    Url = {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2022/EECS-2022-269.html},
    Number = {UCB/EECS-2022-269},
    Abstract = {We present a framework that given a piece of code and an input to it, can recommend the NumPy API which is semantically closest to the input code. Our approach is based on executing the provided code on the given input and creating a representation capturing the execution of the code, rather than relying on the syntax. The representation is expressed in the form of a graph which would be analyzed further using a Graph Convolutional Network for predicting the relevant API. Existing work in semantic program analysis invariably depends on some analysis of source code or Abstract Syntax Trees and similar static representations, which is brittle as it can be sensitive to the style of writing of programs, and can get fooled using simple semantic-preserving transformations, like different variable names, dead code and so on. The representation we use is agnostic of the style of writing the code and hence generalizes across different implementations from the wild of the same functionality, eliminating the need to mine different implementations of the same method.}
}
EndNote citation:
%0 Thesis
%A Kumar, Aayan
%T Semantic Analysis of Programs using Graph Neural Networks
%I EECS Department, University of California, Berkeley
%D 2022
%8 December 16
%@ UCB/EECS-2022-269
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2022/EECS-2022-269.html
%F Kumar:EECS-2022-269