In video coding, there are inter-frame dependencies due to motion-compensated prediction. The achievable rate-distortion performance of an inter-coded frame depends on the coding decisions made during the encoding of its reference frames. We have developed a multi-frame rate-distortion optimization algorithm that determines transform coefficient levels such that their impact on subsequent frames is taken into account.

Formally, a numerical optimization problem is stated in which the optimization variables are the transform coefficient levels c of a group of N frames. An affine reconstruction operator is assumed, such that the reconstructed video signal samples s can be derived from the transform coefficient levels c as follows:

    s = A c + p

Here, A is a square matrix representing both inverse transform and motion-compensated prediction, and p is a column vector representing intra-coded blocks as well as motion-compensated prediction from reference samples outside the currently considered group of N frames.
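To make the roles of A and p concrete, the following minimal sketch builds them for a toy group of two P frames and checks that the affine form s = A c + p matches frame-by-frame decoding. Everything in it is an assumption chosen for illustration and not taken from the actual encoder: the two-sample frames, the unnormalized 2-point transform T, the sample-swapping motion operator M, the outside reference r0, and all helper names. Scaling (dequantization) and clipping are not modeled.

```python
def matmul(X, Y):
    """Matrix product of two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def matvec(X, v):
    """Matrix-vector product."""
    return [sum(x * vi for x, vi in zip(row, v)) for row in X]


def vec_add(u, v):
    """Element-wise sum of two vectors."""
    return [a + b for a, b in zip(u, v)]


def decode_sequentially(T, M, r0, c1, c2):
    """Decode two P frames one after the other.

    Frame 1 is predicted from r0 (a reconstruction outside the group),
    frame 2 is predicted from the reconstruction of frame 1.
    """
    s1 = vec_add(matvec(M, r0), matvec(T, c1))
    s2 = vec_add(matvec(M, s1), matvec(T, c2))
    return s1 + s2  # stacked sample vector of both frames


def build_affine_model(T, M, r0):
    """Return (A, p) such that the stacked samples equal A c + p.

    Substituting s1 into s2 gives s2 = M T c1 + T c2 + M M r0, hence
    A = [[T, 0], [M T, T]] and p = [M r0; M M r0].
    """
    n = len(T)
    MT = matmul(M, T)
    A = [T[i] + [0] * n for i in range(n)] + [MT[i] + T[i] for i in range(n)]
    p = matvec(M, r0) + matvec(M, matvec(M, r0))
    return A, p


# Hypothetical toy data (not from the source).
T = [[1, 1], [1, -1]]   # toy inverse transform
M = [[0, 1], [1, 0]]    # toy motion compensation: swap the two samples
r0 = [10, 20]           # reference frame outside the group
c1, c2 = [3, -1], [0, 2]

A, p = build_affine_model(T, M, r0)
s_affine = vec_add(matvec(A, c1 + c2), p)
s_sequential = decode_sequentially(T, M, r0, c1, c2)
print(s_affine, s_sequential)
```

The lower-left block M T of A is what couples the frames: a change in the coefficients of frame 1 propagates into the reconstruction of frame 2, which is the dependency the multi-frame optimization exploits.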

As in the usual Lagrangian approach, a weighted sum of the distortion D(c) and the bit rate R(c) is minimized. The distortion D(c) is measured as the energy of the difference vector between the original signal y and its reconstruction s. For the bit rate R(c), the l1-norm (i.e., the sum of absolute values) of c is used as a simple substitute, since transform coefficient levels with lower absolute values are typically cheaper to encode. In particular, zero coefficients can be encoded very efficiently. With a weighting factor λ > 0 (the Lagrange multiplier), the resulting objective reads:

    minimize over c:  ||y - A c - p||_2^2 + λ ||c||_1

The problem is now cast in the form of an l1-regularized least squares problem, which can efficiently be solved using the iterative shrinkage/thresholding algorithm (ISTA).
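As an illustration of how ISTA handles such a problem, here is a minimal sketch that minimizes ||y - (A c + p)||_2^2 + lam * ||c||_1 using only the Python standard library. It is not the encoder's implementation: the function names, the fixed iteration count, and the step size derived from the Frobenius norm of A are assumptions made for this example. The sketch also treats c as real-valued and does not model how the result is mapped to integer coefficient levels.

```python
import math


def soft_threshold(v, tau):
    """Shrink every entry of v toward zero by tau.

    This is the proximal operator of tau * ||.||_1.
    """
    return [math.copysign(max(abs(x) - tau, 0.0), x) for x in v]


def ista(A, p, y, lam, num_iters=500):
    """Approximately minimize ||y - (A c + p)||_2^2 + lam * ||c||_1.

    A is a list of rows; p and y are lists of the same length as A.
    """
    rows, cols = len(A), len(A[0])
    b = [yi - pi for yi, pi in zip(y, p)]  # target that A c should match

    # The gradient of ||A c - b||^2 is Lipschitz with constant
    # 2 * sigma_max(A)^2. The squared Frobenius norm bounds sigma_max^2
    # from above, so 1 / (2 * ||A||_F^2) is a safe (conservative) step.
    step = 1.0 / (2.0 * sum(a * a for row in A for a in row))

    c = [0.0] * cols
    for _ in range(num_iters):
        residual = [sum(a * ci for a, ci in zip(row, c)) - bi
                    for row, bi in zip(A, b)]
        grad = [2.0 * sum(A[i][j] * residual[i] for i in range(rows))
                for j in range(cols)]
        # Gradient step on the quadratic term, then shrinkage for the l1 term.
        c = soft_threshold([ci - step * g for ci, g in zip(c, grad)],
                           lam * step)
    return c


# Hypothetical toy call: identity "reconstruction", no offset.
c_hat = ista([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0], [3.0, 0.2], lam=1.0)
print(c_hat)
```

In the toy call, the small second entry of y is driven exactly to zero while the large first entry is only shrunk, which mirrors the intended effect of the l1 term: coefficients that contribute little to reducing the distortion are zeroed out. The conservative step size trades convergence speed for simplicity; a tighter estimate of the largest singular value of A would allow larger steps.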

For the experiments, a modified version of the HEVC reference encoder is used with the encoder settings of the official common test conditions. With a simple IPPP... prediction structure with one reference frame (i.e., each P frame references its directly preceding frame) and an infinite intra period (i.e., only the first frame is coded as an I frame), average bit rate savings of around 10% can be observed for N = 4.

With the Random Access configuration, which uses a periodic prediction structure with a group of pictures (GOP) of eight frames, average bit rate savings of about 3% are observed, with a maximum of 10%. Note that only inter-coded blocks of the “key frames” are included in the optimization.

Publications
[1] M. Winken, Multi-Frame Optimized Quantization for High Efficiency Video Coding, Ph.D. dissertation, TU Berlin, 2015. [Online available]
[2] M. Winken, A. Roth, H. Schwarz, and T. Wiegand, Multi-Frame Optimized Quantization for High Efficiency Video Coding, Picture Coding Symposium (PCS) 2015, May 2015, pp. 159–163.
[3] M. Winken, H. Schwarz, and T. Wiegand, Joint rate-distortion optimization of transform coefficients for spatial Scalable Video Coding using SVC, Proceedings of IEEE International Conference on Image Processing (ICIP) 2008, Oct 2008, pp. 1220–1223.
[4] M. Winken, H. Schwarz, D. Marpe, and T. Wiegand, Joint Optimization of Transform Coefficients for Hierarchical B Picture Coding in H.264/AVC, Proceedings of IEEE International Conference on Image Processing (ICIP) 2007, vol. 4, 2007, pp. 89–92.