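Learning-Based Program Synthesis: Towards Synthesizing Complex Programs from Multi-Modal Specifications in the Wild

Xinyun Chen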

EECS Department, University of California, Berkeley

Technical Report No. UCB/EECS-2022-42

May 9, 2022

http://www2.eecs.berkeley.edu/Pubs/TechRpts/2022/EECS-2022-42.pdf

With the advancement of modern technologies, programming has become ubiquitous, not only among professional software developers but also among general computer users. However, gaining programming expertise is time-consuming and challenging. Program synthesis, in which the computer automatically synthesizes programs from specifications such as natural language descriptions and input-output examples, therefore has many applications. In this dissertation, we present our work on learning-based program synthesis, demonstrating deep learning techniques for synthesizing programs from different specification formats.

First, we present our work on synthesizing programs from multi-modal specifications with real-world applications. In particular, our SpreadsheetCoder work has been integrated into Google Sheets to support the formula suggestion feature, showing the power of learning-based program synthesis in real products. Second, we present our work on execution-guided program synthesis, which brings significant performance gains when synthesizing more complex programs from input-output examples. Our work on program translation and code optimization then demonstrates the importance of representing program structures and designing learning algorithms accordingly, which improves both the generalization of the learned model and the complexity of programs that can be correctly generated. Finally, our work on neural-symbolic frameworks shows that integrating symbolic components into neural networks gives the models better reasoning and generalization capabilities.

Advisor: Dawn Song


BibTeX citation:

@phdthesis{Chen:EECS-2022-42,
    Author= {Chen, Xinyun},
    Title= {Learning-Based Program Synthesis: Towards Synthesizing Complex Programs from Multi-Modal Specifications in the Wild},
    School= {EECS Department, University of California, Berkeley},
    Year= {2022},
    Month= {May},
    Url= {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2022/EECS-2022-42.html},
    Number= {UCB/EECS-2022-42},
    Abstract= {With the advancement of modern technologies, programming has become ubiquitous, not only among professional software developers but also among general computer users. However, gaining programming expertise is time-consuming and challenging. Program synthesis, in which the computer automatically synthesizes programs from specifications such as natural language descriptions and input-output examples, therefore has many applications. In this dissertation, we present our work on learning-based program synthesis, demonstrating deep learning techniques for synthesizing programs from different specification formats.

First, we present our work on synthesizing programs from multi-modal specifications with real-world applications. In particular, our SpreadsheetCoder work has been integrated into Google Sheets to support the formula suggestion feature, showing the power of learning-based program synthesis in real products. Second, we present our work on execution-guided program synthesis, which brings significant performance gains when synthesizing more complex programs from input-output examples. Our work on program translation and code optimization then demonstrates the importance of representing program structures and designing learning algorithms accordingly, which improves both the generalization of the learned model and the complexity of programs that can be correctly generated. Finally, our work on neural-symbolic frameworks shows that integrating symbolic components into neural networks gives the models better reasoning and generalization capabilities.},
}

EndNote citation:

%0 Thesis
%A Chen, Xinyun
%T Learning-Based Program Synthesis: Towards Synthesizing Complex Programs from Multi-Modal Specifications in the Wild
%I EECS Department, University of California, Berkeley
%D 2022
%8 May 9
%@ UCB/EECS-2022-42
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2022/EECS-2022-42.html
%F Chen:EECS-2022-42