Interleaving Static Analysis and LLM Prompting
Chapman, Patrick J. and Rubio-González, Cindy and Thakur, Aditya V.
Proceedings of the 13th ACM SIGPLAN International Workshop on the State Of the Art in Program Analysis (SOAP), 2024
This paper presents a new approach for using Large Language Models (LLMs) to improve static program analysis. Specifically, during program analysis, we interleave calls to the static analyzer and queries to the LLM: the prompt used to query the LLM is constructed using intermediate results from the static analysis, and the result from the LLM query is used in the subsequent analysis of the program. We apply this novel approach to the problem of error-specification inference of functions in systems code written in C; i.e., inferring the set of values returned by each function upon error, which can aid in program understanding as well as in finding error-handling bugs. We evaluate our approach on real-world C programs, such as MbedTLS and zlib, by incorporating LLMs into EESI, a state-of-the-art static analysis for error-specification inference. Compared to EESI, our approach achieves higher recall across all benchmarks (from an average of 52.55% to 77.83%) and a higher F1-score (from an average of 0.612 to 0.804) while maintaining precision (from an average of 86.67% to 85.12%).
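The interleaving described above can be pictured as a simple loop: when static analysis cannot derive a function's error specification, the analyzer builds a prompt from the facts inferred so far, queries the LLM, and feeds the answer back into later analysis. The sketch below is purely illustrative and assumes hypothetical helpers (`analyze_function`, `build_prompt`, `query_llm`); it is not the paper's implementation.

```python
# Illustrative sketch (hypothetical names, not the paper's code): interleave
# a static analyzer with LLM queries for error-specification inference.

def analyze_function(name, known_specs):
    # Placeholder static analysis: derive an error spec only when
    # previously inferred specs make it possible; otherwise give up.
    derived = {"parse_config": "<0", "read_header": "<0"}
    return derived.get(name) if known_specs else None

def build_prompt(name, known_specs):
    # The prompt embeds intermediate analysis results as context.
    facts = "; ".join(f"{f} returns {s} on error" for f, s in known_specs.items())
    return f"Given: {facts}. What values does {name} return on error?"

def query_llm(prompt):
    # Stand-in for a real LLM call; always answers "negative on error".
    return "<0"

def infer_error_specs(functions):
    specs = {}
    for fn in functions:
        spec = analyze_function(fn, specs)
        if spec is None:
            # Static analysis was inconclusive: ask the LLM, passing
            # intermediate results in the prompt...
            spec = query_llm(build_prompt(fn, specs))
        # ...and feed the answer back into subsequent analysis.
        specs[fn] = spec
    return specs
```

Here the LLM answer for one function (e.g., `mbedtls_ssl_setup`) becomes a known fact that lets the static analyzer resolve later functions without further LLM queries.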
@inproceedings{SOAP2024,
  author    = {Chapman, Patrick J. and Rubio-Gonz\'{a}lez, Cindy and Thakur, Aditya V.},
  title     = {Interleaving Static Analysis and LLM Prompting},
  year      = {2024},
  isbn      = {9798400706219},
  publisher = {ACM},
  address   = {New York, NY, USA},
  url       = {https://doi.org/10.1145/3652588.3663317},
  doi       = {10.1145/3652588.3663317},
  booktitle = {Proceedings of the 13th ACM SIGPLAN International Workshop on the State Of the Art in Program Analysis},
  pages     = {9--17},
  numpages  = {9},
  series    = {SOAP 2024}
}