Deep neural networks (DNNs) have led to significant improvements in the accuracy of machine-learning applications. For many problems, such as object classification and object detection, DNNs now achieve levels of accuracy that are acceptable for commercial applications. In other words, thanks to DNNs, an ever-growing range of ML-enabled applications are ready to be put into commercial use. However, the next hurdle is that many DNN-enabled applications achieve their highest value only when deployed on smartphones or other small, low-wattage, embedded hardware.

When deploying DNNs on embedded hardware, there are a number of reasons why small DNN models (i.e., models with few parameters) are either required or strongly recommended. These reasons include:

- Small models require less communication bandwidth when sending updated models from the cloud to the client (e.g., a smartphone or autonomous car).
- Small models train faster.
- Small models require fewer memory transfers during inference, and off-chip memory transfers require 100x more power than arithmetic operations.

To create DNNs that meet the requirements of embedded systems and benefit from the advantages of small DNNs, we set out in 2015 to identify smaller DNN models that can be deployed on embedded devices. The first result of our efforts was SqueezeNet, a DNN targeted at the object-classification problem that achieves the same accuracy as the popular DNN AlexNet but with a 50x reduction in the number of model parameters. SqueezeNet was created using a few basic techniques, including kernel reduction, channel reduction, and delayed pooling.

Over the last year, many other researchers have pursued the same goals of small, fast, energy-efficient DNNs for computer-vision problems ranging from object classification to style transfer. In this talk we review these developments and report our progress in developing a systematic approach to the design of small DNNs.
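To give a sense of why kernel reduction (replacing some 3x3 filters with 1x1 filters) and channel reduction shrink a model, the back-of-the-envelope parameter count below compares a plain 3x3 convolution against a SqueezeNet-style squeeze/expand layer; the specific channel counts (256 in, 32 squeeze, 128+128 expand) are illustrative choices, not taken from the published architecture:

```python
# Parameter count of a conv layer: k * k * C_in * C_out (biases ignored).
def conv_params(k, c_in, c_out):
    return k * k * c_in * c_out

# Baseline: a plain 3x3 layer mapping 256 channels to 256 channels.
plain = conv_params(3, 256, 256)   # 589,824 parameters

# SqueezeNet-style alternative for the same 256 -> 256 mapping:
# a 1x1 "squeeze" layer cuts channels to 32 (channel reduction), then
# parallel 1x1 and 3x3 "expand" layers produce 128 + 128 = 256 channels
# (the 1x1 branch is the kernel reduction).
squeeze = conv_params(1, 256, 32)
expand1 = conv_params(1, 32, 128)
expand3 = conv_params(3, 32, 128)
fire = squeeze + expand1 + expand3  # 49,152 parameters

print(plain, fire, round(plain / fire, 1))  # 589824 49152 12.0
```

With these illustrative sizes, the squeeze/expand structure uses about 12x fewer parameters than the plain 3x3 layer while producing the same number of output channels; stacking such savings across many layers is how reductions on the order of 50x become possible.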