DISCO Nets are a new type of probabilistic model for estimating the conditional distribution over a complex structured output given an input. They allow efficient sampling from a posterior distribution parametrised by a neural network. During training, DISCO Nets are learned by minimising the dissimilarity coefficient between the true distribution and the estimated distribution. This coefficient can be tailored to the loss related to the task at hand. Empirical results in the case of a continuous output show that (i) by modelling uncertainty on the output value, DISCO Nets outperform equivalent non-probabilistic predictive networks and (ii) DISCO Nets accurately model the uncertainty of the output, outperforming existing probabilistic models based on deep neural networks. Since many important problems in probabilistic structured prediction are discrete by nature, we extend DISCO Nets to Discrete DISCO Nets. We show theoretically that the learning objective of Discrete DISCO Nets is a valid divergence measure between two discrete distributions, and that we can directly optimise this non-differentiable objective function while retaining the advantages of DISCO Nets. We show empirically that Discrete DISCO Nets successfully capture the distribution of a discrete structured output in a multiclass multilabel task.
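
As an illustration of the training objective described above, the following is a minimal sketch of how a dissimilarity-coefficient loss might be estimated from samples for a single training example. It is a sketch under stated assumptions, not the implementation used in this work: the function name `disco_loss`, the weight `gamma` and its default value, and the choice of Euclidean distance as the task loss `delta` are all illustrative. The estimate combines the average task loss between the ground truth and the model's samples with a term that rewards diversity among the samples themselves.

```python
import numpy as np


def disco_loss(y_true, samples, gamma=0.5, delta=None):
    """Sample-based estimate of a dissimilarity-coefficient loss (illustrative).

    y_true:  array of shape (d,), the ground-truth output.
    samples: array of shape (K, d) with K >= 2 outputs sampled from the model.
    gamma:   weight on the diversity term (assumed default, not from the text).
    delta:   task loss between two outputs; assumed Euclidean if not given.
    """
    if delta is None:
        # Assumption: Euclidean distance stands in for the task-specific loss.
        delta = lambda a, b: float(np.linalg.norm(a - b))
    K = len(samples)
    # Average task loss between the ground truth and each sampled output.
    fit = sum(delta(y_true, s) for s in samples) / K
    # Average task loss over ordered pairs of distinct samples (diversity).
    diversity = sum(
        delta(samples[i], samples[j])
        for i in range(K)
        for j in range(K)
        if i != j
    ) / (K * (K - 1))
    return fit - gamma * diversity


y = np.zeros(2)
candidates = np.array([[3.0, 4.0], [0.0, 0.0]])
print(disco_loss(y, candidates))
```

Passing a different `delta` is where the loss would be tailored to the task at hand; with `gamma=0` the estimate reduces to the average task loss of the samples against the ground truth.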