Recently, a new document metric called the word mover's distance (WMD) has been proposed with unprecedented results on kNN-based document classification. The WMD elevates high-quality word embeddings to a document metric by formulating the distance between two documents as an optimal transport problem between the embedded words. However, the document distances are entirely unsupervised and lack a mechanism to incorporate supervision when available. In this talk we propose an efficient technique to learn a supervised metric, which we call the Supervised-WMD (S-WMD) metric. The supervised training minimizes the stochastic leave-one-out nearest neighbor classification error on a perdocument level by updating an affine transformation of the underlying word embedding space and a word-imporance weight vector. As the gradient of the original WMD distance would result in an inefficient nested optimization problem, we provide an arbitrarily close approximation that results in a practical and efficient update rule. We evaluate S-WMD on eight real-world text classification tasks on which it consistently outperforms almost all of our 26 competitive baselines.