Mining Input Grammars for Security Testing



Knowing which part of a program processes which parts of an input can reveal the structure of the input as well as the structure of the program. In a URL, for instance, the protocol "http", the host, and the path "path" would be handled by different functions and stored in different variables. Given a set of sample inputs, we use _dynamic tainting_ to trace the data flow of each input character, and aggregate the input fragments that are handled by the same function into lexical and syntactic entities. The result is a _context-free grammar_ that accurately reflects valid input structure; because it draws on function and variable names, it can be as readable as a textbook example:

    URL ::= PROTOCOL "://" HOST "/" PATH
    PROTOCOL ::= "http" | "https" | ...
    HOST ::= /[a-zA-Z0-9.]+/
    ...

We expect inferred grammars to considerably ease the understanding of file and input formats. Their most important use, however, will be in automatic fuzz testing, where grammars can easily be turned into producers that help to quickly cover program features. Our grammar-based LANGFUZZ fuzzer is in daily use at Mozilla and has uncovered more than 4,000 defects so far; mining grammars automatically will bring such techniques to a wide range of programs.

For details on our work on grammar mining, see
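The aggregation step can be illustrated with a deliberately simplified sketch. Real dynamic tainting tracks every input character through the running program; here, as an assumption for illustration only, a hypothetical hand-written URL parser (`parse_url` and its helpers) simply reports which input span each of its functions consumed, and a hypothetical `mine_grammar` aggregates those spans across samples. Each function name becomes a nonterminal whose alternatives are the fragments it handled, and characters outside every recorded span become literal terminals. Unlike the approach described above, this sketch does not generalize fragments into patterns such as /[a-zA-Z0-9.]+/.

```python
class Tracer:
    """Stand-in for dynamic tainting (an assumption): records (function, start, end) spans."""

    def __init__(self):
        self.spans = []

    def record(self, name, start, end):
        self.spans.append((name, start, end))


# Hypothetical toy parser: each function reports the input span it consumed.
def parse_protocol(s, i, t):
    j = s.index("://", i)
    t.record("PROTOCOL", i, j)
    return j + len("://")


def parse_host(s, i, t):
    j = s.index("/", i)  # assumes every sample URL has a path
    t.record("HOST", i, j)
    return j + 1


def parse_path(s, i, t):
    t.record("PATH", i, len(s))
    return len(s)


def parse_url(s):
    t = Tracer()
    i = parse_protocol(s, 0, t)
    i = parse_host(s, i, t)
    parse_path(s, i, t)
    return t.spans


def lit(text):
    return '"' + text + '"'


def mine_grammar(samples):
    """Aggregate traced spans from all samples into a grammar (dict of alternatives)."""
    grammar = {"URL": []}
    for s in samples:
        rule, pos = [], 0
        for name, start, end in sorted(parse_url(s), key=lambda span: span[1]):
            if pos < start:
                rule.append(lit(s[pos:start]))  # chars outside any span -> terminal
            rule.append(name)  # function name -> nonterminal
            alts = grammar.setdefault(name, [])
            if lit(s[start:end]) not in alts:
                alts.append(lit(s[start:end]))  # observed fragment -> alternative
            pos = end
        if pos < len(s):
            rule.append(lit(s[pos:]))
        if " ".join(rule) not in grammar["URL"]:
            grammar["URL"].append(" ".join(rule))
    return grammar


SAMPLES = ["http://www.example.com/index.html", "https://example.org/docs"]

if __name__ == "__main__":
    for nonterminal, alts in mine_grammar(SAMPLES).items():
        print(nonterminal, "::=", " | ".join(alts))
    # URL ::= PROTOCOL "://" HOST "/" PATH
    # PROTOCOL ::= "http" | "https"
    # HOST ::= "www.example.com" | "example.org"
    # PATH ::= "index.html" | "docs"
```

Both samples yield the same top-level rule, so only the leaf nonterminals accumulate alternatives; generalizing those leaves (for example into a character class for HOST) would be a separate step.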

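Turning a grammar into a producer can be sketched as random expansion: start from the start symbol, pick one alternative for each nonterminal, and concatenate the resulting terminals. The sketch below assumes a hand-written grammar in the spirit of the URL grammar above; the `SEGMENT` rule, the concrete host names, and the `produce` function are hypothetical, HOST is enumerated rather than generated from its regular expression, and nothing here claims to reflect how LANGFUZZ itself is implemented.

```python
import random

# Hypothetical grammar: keys are nonterminals, every other symbol is a terminal.
GRAMMAR = {
    "URL": [["PROTOCOL", "://", "HOST", "/", "PATH"]],
    "PROTOCOL": [["http"], ["https"]],
    "HOST": [["www.example.com"], ["example.org"], ["localhost"]],
    "PATH": [["SEGMENT"], ["SEGMENT", "/", "PATH"]],  # recursive rule
    "SEGMENT": [["index.html"], ["docs"], ["a"]],
}


def cost(alt, grammar):
    """Number of nonterminals in an alternative (used to steer toward termination)."""
    return sum(1 for sym in alt if sym in grammar)


def produce(symbol, grammar, rng, depth=0, max_depth=8):
    """Randomly expand `symbol` into a string of terminals."""
    if symbol not in grammar:
        return symbol  # terminal: emit as-is
    alts = grammar[symbol]
    if depth >= max_depth:
        # Past the depth limit, only consider alternatives with the fewest nonterminals.
        fewest = min(cost(a, grammar) for a in alts)
        alts = [a for a in alts if cost(a, grammar) == fewest]
    alt = rng.choice(alts)
    return "".join(produce(sym, grammar, rng, depth + 1, max_depth) for sym in alt)


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed for reproducible runs
    for _ in range(3):
        print(produce("URL", GRAMMAR, rng))
```

The depth limit matters only for recursive rules such as PATH: beyond `max_depth`, the producer prefers the non-recursive alternative, so every expansion terminates while shorter inputs still vary freely.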

