Path-Based Function Embedding and Its Application to Error-Handling Specification Mining

DeFreez, Daniel and Thakur, Aditya V. and Rubio-González, Cindy
Proceedings of the 26th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE’18), 2018

Identifying relationships among program elements is useful for program understanding, debugging, and analysis. One such kind of relationship is synonymy. Function synonyms are functions that play a similar role in code; examples include functions that perform initialization for different device drivers, and functions that implement different symmetric-key encryption schemes. Function synonyms are not necessarily semantically equivalent and can be syntactically dissimilar; consequently, approaches for identifying code clones or functional equivalence cannot be used to identify them. This paper presents Func2vec, a technique that learns an embedding mapping each function to a vector in a continuous vector space such that vectors for function synonyms are in close proximity. We compute the function embedding by training a neural network on sentences generated using random walks over the interprocedural control-flow graph. We show the effectiveness of Func2vec at identifying function synonyms in the Linux kernel. Finally, we apply Func2vec to the problem of mining error-handling specifications in Linux file systems and drivers. We show that the function synonyms identified by Func2vec result in error-handling specifications with high support.

@inproceedings{defreez_thakur_rubio_FSE2018,
  author = {DeFreez, Daniel and Thakur, Aditya V. and Rubio{-}Gonz{\'{a}}lez, Cindy},
  title = {Path-Based Function Embedding and Its Application to Error-Handling Specification Mining},
  booktitle = {Proceedings of the 26th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering {(ESEC/FSE'18)}},
  year = {2018},
  pages = {423--433},
  publisher = {ACM}
}