Various types of how-to-knowledge are encoded in natural language instructions: from setting up a tent, to preparing a dish for dinner, and to executing biology lab experiments. These types of instructions are based on procedural language, which poses unique challenges. For example, verbal arguments are commonly elided when they can be inferred from context, e.g., "bake for 30 minutes", not specifying bake what and where. Entities frequently merge and split, e.g., "vinegar" and "oil" merging into "dressing", creating challenges to reference resolution. And disambiguation often requires world knowledge, e.g., the implicit location argument of "stir frying" is on "stove". In this talk, I will present our recent approaches to interpreting and composing cooking recipes that aim to address these challenges. In the first part of the talk, I will present an unsupervised approach to interpreting recipes as action graphs, which define what actions should be performed on which objects and in what order. Our work demonstrates that it is possible to recover action graphs without having access to gold labels, virtual environments or simulations. The key insight is to rely on the redundancy across different variations of similar instructions that provides the learning bias to infer various types of background knowledge, such as the typical sequence of actions applied to an ingredient, or how a combination of ingredients (e.g., "flour", "milk", "eggs") becomes a new entity (e.g, "wet mixture"). In the second part of the talk, I will present an approach to composing new recipes given a target dish name and a set of ingredients. The key challenge is to maintain global coherence while generating a goal-oriented text. We propose a Neural Checklist Model that attains global coherence by storing and updating a checklist of the agenda (e.g., an ingredient list) with paired attention mechanisms for tracking what has been already mentioned and what needs to be yet introduced. This model also achieves strong performance on dialogue system response generation. I will conclude the talk by discussing the challenges in modeling procedural language and acquiring the necessary background knowledge, pointing to avenues for future research.
See more on this video at https://www.microsoft.com/en-us/research/video/procedural-language-knowledge/