Attention is All You Need

review the paper that introduced the transformer.

Models based on recurrent or convolutional neural networks used to dominate the sequence data domain. Despite many improvements, RNNs and CNNs have intrinsic limitations. In 2017, the authors introduced the transformer, which relies solely on the attention mechanism and uses no recurrence or convolution at all. It has since become the most common architecture in the field.
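
As a rough illustration of the attention mechanism mentioned above, here is a minimal NumPy sketch of scaled dot-product attention, the building block the paper describes. The function name, the toy shapes, and the random inputs are assumptions made for this example; masking, multiple heads, and learned projections are omitted.

```python
import numpy as np

def scaled_dot_product_attention(q, k, v):
    """Return (output, weights) for queries q, keys k and values v.

    q: (n_queries, d_k), k: (n_keys, d_k), v: (n_keys, d_v).
    """
    d_k = q.shape[-1]
    # Similarity of every query with every key, scaled by sqrt(d_k).
    scores = q @ k.swapaxes(-1, -2) / np.sqrt(d_k)
    # Numerically stable softmax over the key axis.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    # Each output row is a weighted average of the value vectors.
    return weights @ v, weights

# Toy inputs (hypothetical sizes): 4 queries, 6 keys/values.
rng = np.random.default_rng(0)
q = rng.normal(size=(4, 8))
k = rng.normal(size=(6, 8))
v = rng.normal(size=(6, 16))
out, weights = scaled_dot_product_attention(q, k, v)
```

Every query attends to all keys in one matrix product, so no step depends on the previous position as it would in an RNN.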

Temporal Feature Alignment and Mutual Information Maximization for Video Based Human Pose Estimation

review one of the applications of the information bottleneck method.

Multi-frame human pose estimation is an interesting problem in computer vision. Accurate estimation is hampered by ambiguous frames caused by fast movement, overlapping subjects, or occlusion by other objects. There are at least three approaches in this domain: image-based estimation, video-based estimation, and feature alignment. In the paper, the authors use feature alignment, which supplements the key-frame feature with coarse-to-fine features extracted from neighboring frames. Furthermore, they apply a mutual information (MI) objective to maximize the task-relevant information.

Farewell to Mutual Information: Variational Distillation for Cross-Modal Person Re-Identification

review the information bottleneck method and its follow-up papers to survey the mainstream of the IB framework.

The Information Bottleneck framework proposes an information-theoretic principle for representation learning: preserve the information in the input data that is relevant to the label while removing the irrelevant part. Despite its wide range of applications, optimizing the IB objective requires a precise estimate of mutual information. In the paper, the authors present a novel strategy, Variational Self-Distillation (VSD), which avoids estimating MI explicitly while providing an analytic and scalable solution. Furthermore, by extending VSD to multi-view learning, they propose Variational Mutual-Learning (VML) and Variational Cross-Distillation (VCD), which remove view-specific and task-irrelevant information, respectively.

Opening the Black Box of Deep Neural Networks via Information

review the information bottleneck method and its follow-up papers to survey the mainstream of the IB framework.

Despite the brilliant success of deep neural networks, there was no comprehensive theoretical foundation for deep learning as of 2016. In 2015, Tishby and Zaslavsky proposed analyzing DNNs via the information plane. They also argued that the objective of a neural network is to optimize the Information Bottleneck (IB) tradeoff between compression and prediction for each layer.
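
The information plane places each layer according to two mutual information values: how much its representation retains about the input, and how much it retains about the label. As a small sketch of the quantity involved, the snippet below computes mutual information in bits from a discrete joint probability table. The function name and the two toy tables are assumptions for illustration; estimating MI for real network activations requires binning or another estimator, which is not shown here.

```python
import math

def mutual_information(joint):
    """Mutual information I(A;B) in bits from a joint table joint[a][b]."""
    p_a = [sum(row) for row in joint]        # marginal over rows
    p_b = [sum(col) for col in zip(*joint)]  # marginal over columns
    mi = 0.0
    for a, row in enumerate(joint):
        for b, p_ab in enumerate(row):
            if p_ab > 0:  # zero-probability cells contribute nothing
                mi += p_ab * math.log2(p_ab / (p_a[a] * p_b[b]))
    return mi

# Two toy joint distributions over a pair of binary variables.
independent = [[0.25, 0.25], [0.25, 0.25]]  # A tells nothing about B
identical = [[0.5, 0.0], [0.0, 0.5]]        # A determines B exactly

mi_independent = mutual_information(independent)  # 0 bits
mi_identical = mutual_information(identical)      # 1 bit
```

In IB terms, compression corresponds to lowering the first of these values (information about the input) and prediction to keeping the second (information about the label) high.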

The Information Bottleneck Method

review the information bottleneck method and its follow-up papers to survey the mainstream of the IB framework.

“The Information Bottleneck Method” was published in 1999 and has had a lot of influence on deep learning. In this paper, the relevant information in a signal \(x \in X\) is defined as the information it provides about another signal \(y \in Y\). Understanding the signal \(x\) means not just predicting \(y\) from \(x\) but also specifying which features of \(X\) play a role in the prediction. Formally speaking, the goal is to find a compact code for \(X\) that preserves as much information about \(Y\) as possible. In this regard, the variational principle in this paper provides a framework for dealing with various problems in signal processing.
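
The variational principle can be written compactly as follows. The symbols \(\tilde{X}\) for the compressed code and \(\beta\) for the tradeoff parameter do not appear in the summary above and are introduced here for illustration:

\[
\min_{p(\tilde{x} \mid x)} \; \mathcal{L}\big[p(\tilde{x} \mid x)\big] = I(\tilde{X}; X) - \beta \, I(\tilde{X}; Y)
\]

The first term rewards compression of \(X\), the second rewards keeping information about \(Y\), and a larger \(\beta \ge 0\) shifts the balance toward preserving information about \(Y\).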
