In the image synthesis domain, diffusion models have shown very effective image generation and manipulation. However, these models require large amounts of GPU computation because they operate in pixel space. To reduce the computational cost while preserving flexibility and quality, Stable Diffusion (2022) [1] moved the diffusion process into a latent space.
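To make the saving concrete (using the dimensions commonly reported for Stable Diffusion, stated here as an illustration rather than taken from the summary above): a \(512 \times 512 \times 3\) image is encoded into roughly a \(64 \times 64 \times 4\) latent, so each denoising step operates on about \(786{,}432 / 16{,}384 \approx 48\) times fewer values than it would in pixel space.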
summarize the papers on GPT-1, BERT, GPT-3, and the scaling laws.
After the publication of “Attention Is All You Need”, many language models have adopted the transformer as their base architecture. As a result, many powerful models have been derived from it, the most famous being the GPT series and BERT.
Generative deep learning models such as VAEs and GANs have shown brilliant performance. “Denoising Diffusion Probabilistic Models” (DDPM) [1] is a novel generative model published in 2020. The model builds on “Deep Unsupervised Learning Using Nonequilibrium Thermodynamics” (2015).
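As a brief reminder of the mechanics (in standard DDPM notation): the forward process gradually corrupts the data with Gaussian noise,
\[ q(x_t \mid x_{t-1}) = \mathcal{N}\big(x_t;\; \sqrt{1-\beta_t}\, x_{t-1},\; \beta_t \mathbf{I}\big), \]
while a learned reverse process \(p_\theta(x_{t-1} \mid x_t)\) denoises step by step. In the paper's simplified objective, a network \(\epsilon_\theta\) is trained to predict the added noise,
\[ L_{\text{simple}} = \mathbb{E}_{t,\, x_0,\, \epsilon}\Big[\big\lVert \epsilon - \epsilon_\theta\big(\sqrt{\bar\alpha_t}\, x_0 + \sqrt{1-\bar\alpha_t}\,\epsilon,\; t\big)\big\rVert^2\Big], \qquad \bar\alpha_t = \prod_{s=1}^{t}(1-\beta_s). \]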
In the sequence-data domain, ML models based on recurrent or convolutional neural networks were dominant. However, even though many improvements have been made to them, RNNs and CNNs have intrinsic limitations. In 2017, the authors introduced the transformer, which uses only the attention mechanism and no recurrence or convolution at all, and which is now the most common architecture in the field.
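At its core is scaled dot-product attention, which replaces recurrence and convolution with content-based weighting over all positions at once:
\[ \mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{Q K^\top}{\sqrt{d_k}}\right) V, \]
where \(Q\), \(K\), and \(V\) are the query, key, and value matrices and \(d_k\) is the key dimension.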
review one of the applications of the information bottleneck method.
Multi-frame human pose estimation is one of the most interesting problems in computer vision. Accurate estimation is hindered by ambiguous frames caused by fast motion and by intersecting or occluded subjects. There are at least three approaches in this domain: image-based estimation, video-based estimation, and feature alignment. In this paper, the authors use the feature-alignment approach, which supplements the key-frame features with coarse-to-fine features extracted from neighboring frames. Furthermore, they apply a mutual information (MI) objective to maximize the task-relevant information.
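Read schematically (a sketch of the general idea rather than the paper's exact loss, with notation introduced here): if \(F_k\) is the key-frame feature and \(\tilde F\) the feature after aggregating the aligned coarse-to-fine features from neighboring frames, the MI term encourages \(\tilde F\) to be maximally informative about the pose target \(Y\),
\[ \max \; I(\tilde F;\, Y), \]
which in practice is optimized through a tractable bound, since mutual information cannot be computed exactly for high-dimensional features.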
review the information bottleneck method and the follow-up papers to take a look at the mainstream of the IB framework.
The Information Bottleneck framework proposes an information-theoretic principle for representation learning: preserve the information in the input data that is relevant to the label while removing the irrelevant information. Despite its wide range of applications, optimizing the IB objective requires precise estimation of mutual information. In this paper, the authors present a novel strategy, Variational Self-Distillation (VSD), which provides an analytical and scalable solution without explicitly estimating MI. Furthermore, by extending VSD to multi-view learning, they propose Variational Mutual-Learning (VML) and Variational Cross-Distillation (VCD), which remove view-specific and task-irrelevant information, respectively.
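For orientation (a hedged sketch, with \(v\) denoting the observation and \(z\) its representation; this notation is introduced here, not taken from the summary above): the prediction-matching idea behind variational distillation can be written as driving
\[ \mathrm{KL}\big(p(y \mid v)\,\big\|\,p(y \mid z)\big) \;\to\; 0, \]
so that \(z\) retains everything in \(v\) that matters for predicting \(y\), without ever computing \(I(z; y)\) explicitly.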
review the information bottleneck method and the follow-up papers to take a look at the mainstream of the IB framework.
Despite the brilliant success of deep neural networks, deep learning long lacked a comprehensive theoretical foundation. In 2015, Tishby and Zaslavsky proposed analyzing DNNs via the information plane. They also argued that the objective of a neural network is to optimize the Information Bottleneck (IB) tradeoff between compression and prediction for each layer.
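Concretely, the information plane places each hidden representation \(T_\ell\) of the network at the coordinates
\[ \big( I(X;\, T_\ell),\; I(T_\ell;\, Y) \big), \]
so training can be read as each layer trying to achieve high prediction \(I(T_\ell; Y)\) at low complexity \(I(X; T_\ell)\).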
review the information bottleneck method and the follow-up papers to take a look at the mainstream of the IB framework.
“The Information Bottleneck Method” was published in 1999 and has had a strong influence on deep learning. In this paper, the relevant information in a signal \(x \in X\) is defined as the information it provides about another signal \(y \in Y\). Understanding the signal \(x\) is not just predicting \(y\) from \(x\), but specifying which features of \(X\) play a role in that prediction. Formally speaking, the goal is to find a compact code for \(X\) that maximally preserves the information about \(Y\). In this regard, the variational principle developed in the paper provides a framework for dealing with a variety of problems in signal processing.
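Formally, the paper minimizes the IB functional over stochastic encoders \(p(\tilde x \mid x)\),
\[ \mathcal{L}\big[p(\tilde x \mid x)\big] \;=\; I(\tilde X;\, X) \;-\; \beta\, I(\tilde X;\, Y), \]
where \(\tilde X\) is the compressed representation and the Lagrange multiplier \(\beta\) controls the tradeoff between compressing \(X\) and preserving information about \(Y\).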