Aayan Kumar

EECS Department, University of California, Berkeley

Technical Report No. UCB/EECS-2022-269

December 16, 2022

http://www2.eecs.berkeley.edu/Pubs/TechRpts/2022/EECS-2022-269.pdf

We present a framework that, given a piece of code and an input to it, recommends the NumPy API that is semantically closest to that code. Our approach executes the provided code on the given input and builds a representation that captures the execution of the code rather than its syntax. The representation takes the form of a graph, which is then analyzed by a Graph Convolutional Network to predict the relevant API.

Existing work in semantic program analysis invariably depends on analysis of source code, Abstract Syntax Trees, or similar static representations. Such analysis is brittle: it is sensitive to programming style and can be fooled by simple semantics-preserving transformations such as renamed variables or dead code. The representation we use is agnostic to coding style and hence generalizes across different real-world implementations of the same functionality, eliminating the need to mine multiple implementations of the same method.
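
To make the idea of a style-agnostic execution representation concrete, the following is a minimal sketch and not the report's actual implementation. It assumes a toy tracer in which each arithmetic operation on a wrapped value becomes a labelled graph node with data-flow edges from its operands. The names `TraceGraph`, `Traced`, `trace`, `dot_loop`, and `dot_zip` are hypothetical and introduced only for illustration. Two differently written dot products, one of them with renamed variables and dead code, yield the same node labels and edges.

```python
import operator


class TraceGraph:
    """Toy execution graph: one node per input value or executed operation."""

    def __init__(self):
        self.labels = []  # labels[i] is the label of node i ("input", "add", "mul")
        self.edges = []   # (source_node, destination_node) data-flow edges

    def add_node(self, label, parents=()):
        self.labels.append(label)
        node = len(self.labels) - 1
        self.edges.extend((parent, node) for parent in parents)
        return node


class Traced:
    """Wraps a number and records each + and * applied to it (hypothetical helper)."""

    def __init__(self, value, graph, node=None):
        self.value = value
        self.graph = graph
        # Values created without an explicit node are treated as program inputs.
        self.node = graph.add_node("input") if node is None else node

    def _apply(self, other, label, fn):
        result = fn(self.value, other.value)
        node = self.graph.add_node(label, parents=(self.node, other.node))
        return Traced(result, self.graph, node)

    def __add__(self, other):
        return self._apply(other, "add", operator.add)

    def __mul__(self, other):
        return self._apply(other, "mul", operator.mul)


def dot_loop(xs, ys):
    # Index-based implementation.
    total = xs[0] * ys[0]
    for i in range(1, len(xs)):
        total = total + xs[i] * ys[i]
    return total


def dot_zip(a, b):
    # Same computation with different names, iteration style, and dead code.
    unused = len(a) * 2  # dead code on plain ints; never reaches the tracer
    pairs = list(zip(a, b))
    acc = pairs[0][0] * pairs[0][1]
    for p, q in pairs[1:]:
        acc = acc + p * q
    return acc


def trace(fn, *arrays):
    """Run fn on traced copies of the inputs; return (value, labels, edges)."""
    graph = TraceGraph()
    args = [[Traced(v, graph) for v in array] for array in arrays]
    out = fn(*args)
    return out.value, graph.labels, graph.edges


trace_a = trace(dot_loop, [1, 2, 3], [4, 5, 6])
trace_b = trace(dot_zip, [1, 2, 3], [4, 5, 6])
print(trace_a == trace_b)
```

In this sketch the node labels and edge list are the kind of input a graph neural network could consume. The two traces match here only because both functions perform the same operations in the same order. An implementation that reorders the additions would produce a differently numbered graph, so a realistic pipeline would presumably need a representation that does not depend on node ordering.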

Advisor: Koushik Sen


BibTeX citation:

@mastersthesis{Kumar:EECS-2022-269,
    Author= {Kumar, Aayan},
    Title= {Semantic Analysis of Programs using Graph Neural Networks},
    School= {EECS Department, University of California, Berkeley},
    Year= {2022},
    Month= {Dec},
    Url= {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2022/EECS-2022-269.html},
    Number= {UCB/EECS-2022-269},
    Abstract= {We present a framework that given a piece of code and an input to it, can recommend the NumPy API which is semantically closest to the input code. Our approach is based on executing the provided code on the given input and creating a representation capturing the execution of the code, rather than relying on the syntax. The representation is expressed in the form of a graph which would be analyzed further using a Graph Convolutional Network for predicting the relevant API.

Existing work in semantic program analysis invariably depends on some analysis of source code or Abstract Syntax Trees and similar static representations, which is brittle as it can be sensitive to the style of writing of programs, and can get fooled using simple semantic-preserving transformations, like different variable names, dead code and so on. The representation we use is agnostic of the style of writing the code and hence generalizes across different implementations from the wild of the same functionality, eliminating the need to mine different implementations of the same method.},
}

EndNote citation:

%0 Thesis
%A Kumar, Aayan 
%T Semantic Analysis of Programs using Graph Neural Networks
%I EECS Department, University of California, Berkeley
%D 2022
%8 December 16
%@ UCB/EECS-2022-269
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2022/EECS-2022-269.html
%F Kumar:EECS-2022-269