Instrumentation and sampling strategies for cooperative concurrency bug isolation

Jin, Guoliang and Thakur, Aditya V. and Liblit, Ben and Lu, Shan
Proceedings of the 25th Annual ACM SIGPLAN Conference on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA), 2010

Fixing concurrency bugs (or crugs) is critical in modern software systems. Static analyses to find crugs such as data races and atomicity violations scale poorly, while dynamic approaches incur high run-time overheads. Crugs manifest only under specific execution interleavings that may not arise during in-house testing, thereby demanding a lightweight program monitoring technique that can be used post-deployment.

We present Cooperative Crug Isolation (CCI), a low-overhead instrumentation framework to diagnose production-run failures caused by concurrency bugs. CCI tracks specific thread interleavings at run-time, and uses statistical models to identify strong failure predictors among these. We offer a varied suite of predicates that represent different trade-offs between complexity and fault isolation capability. We also develop variant random sampling strategies that suit different types of predicates and help keep the run-time overhead low. Experiments with 9 real-world bugs in 6 non-trivial C applications show that these schemes span a wide spectrum of performance and diagnosis capabilities, each suitable for different usage scenarios.
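As a rough illustration of how sampled predicate counts might be turned into a ranking of failure predictors, the sketch below scores each predicate by how much more likely a run is to fail when the predicate is observed true than when its instrumentation site is merely sampled. The abstract does not spell out the statistical model, so this difference-of-failure-rates score (in the style of earlier cooperative bug isolation work) is an assumption, and the names `PredicateCounts`, `increase_score`, `rank_predicates`, and the example predicate labels are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PredicateCounts:
    """Per-predicate tallies aggregated over many sampled runs (hypothetical layout)."""
    true_in_failing: int      # failing runs in which the predicate was observed true
    true_in_passing: int      # passing runs in which the predicate was observed true
    observed_in_failing: int  # failing runs in which the predicate's site was sampled
    observed_in_passing: int  # passing runs in which the predicate's site was sampled


def increase_score(c: PredicateCounts) -> float:
    """Assumed score: P(fail | predicate true) - P(fail | site sampled)."""
    true_total = c.true_in_failing + c.true_in_passing
    observed_total = c.observed_in_failing + c.observed_in_passing
    if true_total == 0 or observed_total == 0:
        return 0.0  # never sampled or never true: no evidence either way
    failure = c.true_in_failing / true_total
    context = c.observed_in_failing / observed_total
    return failure - context


def rank_predicates(counts: dict[str, PredicateCounts]) -> list[str]:
    """Return predicate names ordered from strongest to weakest failure predictor."""
    return sorted(counts, key=lambda name: increase_score(counts[name]), reverse=True)


if __name__ == "__main__":
    # Made-up counts for two hypothetical interleaving predicates.
    counts = {
        "remote_write_between_local_accesses": PredicateCounts(40, 2, 50, 50),
        "site_reached": PredicateCounts(50, 50, 50, 50),
    }
    for name in rank_predicates(counts):
        print(name, round(increase_score(counts[name]), 3))
```

A predicate that is true in nearly every run, such as the second one above, scores close to zero because it says nothing beyond the fact that its site was reached. Subtracting the context term is what separates a real predictor from code that just happens to lie on a failing path.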


@inproceedings{jin_etal_OOPSLA10,
  author = {Jin, Guoliang and Thakur, Aditya V. and Liblit, Ben and Lu, Shan},
  title = {Instrumentation and sampling strategies for cooperative concurrency
                 bug isolation},
  booktitle = {Proceedings of the 25th Annual {ACM} {SIGPLAN} Conference on Object-Oriented
                 Programming, Systems, Languages, and Applications ({OOPSLA})},
  year = {2010},
  pages = {241--255},
  doi = {10.1145/1869459.1869481},
  publisher = {ACM}
}