The Multiview Video Coding (MVC) amendment of the H.264/AVC standard provides view scalability at the bitstream level. This allows multiview video (e.g., video with two views suitable for viewing on a stereo display) to be transmitted efficiently and in a backward-compatible way. This is illustrated in Figure 1 for a bitstream with two views. A legacy H.264/AVC decoder decodes only one of the two views included in the multiview bitstream, the so-called base view. The reconstructed video sequence can be displayed on a conventional 2D display. In contrast, a stereo decoder is capable of decoding both views, and the decoded video sequences (one for the left eye and one for the right eye) are suitable for 3D displays.
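View scalability at the bitstream level means that the base view can be recovered simply by dropping data. The following Python sketch illustrates this, assuming an Annex B byte stream that has already been split into NAL units (start codes removed). It relies on the fact that the MVC extension carries its additional data in NAL unit types that were reserved in the original H.264/AVC standard (14: prefix NAL unit, 15: subset sequence parameter set, 20: coded slice extension), which a legacy decoder ignores. The function names are hypothetical, and the sketch is a simplification of the sub-bitstream extraction process defined in the standard, not a full implementation.

```python
# NAL unit types added by the MVC extension; a legacy H.264/AVC
# decoder treats them as reserved and ignores them.
MVC_NAL_TYPES = {
    14,  # prefix NAL unit
    15,  # subset sequence parameter set
    20,  # coded slice extension (slices of non-base views)
}


def nal_unit_type(nal: bytes) -> int:
    """Return nal_unit_type: the low 5 bits of the one-byte NAL unit header."""
    return nal[0] & 0x1F


def extract_base_view(nal_units: list[bytes]) -> list[bytes]:
    """Keep only the NAL units that a legacy H.264/AVC decoder would process."""
    return [nal for nal in nal_units if nal_unit_type(nal) not in MVC_NAL_TYPES]
```

For a stream containing an SPS (header byte 0x67), a subset SPS (0x6F), a PPS (0x68), a prefix NAL unit (0x6E), an IDR slice (0x65) and a coded slice extension (0x74), only the SPS, PPS and IDR slice remain after extraction, which is exactly the data a conventional decoder uses to reconstruct the base view.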
The desire for an efficient representation of multiview video stems from the growing interest in 3D video. One of the premier application areas of MVC is 3D Blu-ray players. Conventional Blu-ray players decode one of the two views stored on a 3D Blu-ray disc and ignore the data for the other view. 3D-capable Blu-ray players, however, decode both views, which can then be shown on a 3D display.
A simple approach to multiview video coding is to code each view independently with a legacy 2D video coding standard. Although this approach provides the functionality required for multiview video, it does not exploit the dependencies between views and therefore does not achieve sufficient coding efficiency. For efficient coding, the dependencies between the different views must be exploited in addition to the temporal dependencies. On the other hand, for the industry to adopt a multiview coding standard, it was desirable to modify as few aspects of the original H.264/AVC design as possible.
In collaboration with the 3D Coding Group, the Image and Video Coding Group developed a simple but very efficient concept for extending the H.264/AVC standard to multiview video coding. The base view is coded using the unmodified H.264/AVC syntax, so that legacy decoders can always decode it. Another important aspect is that the MVC extension does not change the basic decoding process; only the high-level syntax was modified.
To exploit the inter-view dependencies efficiently, the concept of multi-frame motion-compensated prediction, which was developed by the Image and Video Coding Group, was extended. In conventional 2D video coding, multiple previously decoded pictures can be inserted into so-called reference picture lists. When a block is coded using motion-compensated prediction (MCP), a reference picture index, which signals the reference picture used within a reference picture list, is transmitted in addition to a motion vector. For dependent views, the reference picture lists can contain not only previously decoded pictures of the same view, but also already decoded pictures of other views in the same access unit (time instant). This is illustrated in Figure 2, where the decoded base view picture of the same access unit is inserted at position 1 of the reference picture list. Thus, when a reference index equal to 1 is transmitted for a block, it signals that disparity-compensated prediction (DCP) from the base view picture of the same time instant is used. Any other reference index signals conventional MCP from a previously decoded picture of the same view.
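The reference-list mechanism described above can be sketched in a few lines of Python. The sketch is illustrative only: the `Picture` class and the functions `build_ref_list` and `prediction_type` are hypothetical names, a picture is identified solely by its view and its picture order count (time instant), and the insertion position of the inter-view picture is a parameter (default 1, matching Figure 2) rather than the list initialization and modification process actually specified in the standard. It shows that the decoder needs no new prediction tool: the transmitted reference index alone determines whether a block uses MCP or DCP.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Picture:
    view_id: int  # 0 denotes the base view
    poc: int      # picture order count, i.e. the time instant


def build_ref_list(temporal_refs, inter_view_ref, inter_view_pos=1):
    """Return a reference list of same-view pictures with one inter-view
    picture inserted at inter_view_pos (hypothetical, simplified)."""
    refs = list(temporal_refs)
    refs.insert(inter_view_pos, inter_view_ref)
    return refs


def prediction_type(current, ref_list, ref_idx):
    """Classify the prediction signaled by ref_idx for a block of 'current'."""
    ref = ref_list[ref_idx]
    if ref.view_id == current.view_id:
        return "MCP"  # previously decoded picture of the same view
    if ref.poc == current.poc:
        return "DCP"  # other view, same access unit
    # Inter-view prediction is only allowed within the same access unit.
    raise ValueError("inter-view reference from a different time instant")
```

For a dependent-view picture `Picture(1, 8)` with same-view references `Picture(1, 4)` and `Picture(1, 12)`, inserting the base view picture `Picture(0, 8)` yields the list `[Picture(1, 4), Picture(0, 8), Picture(1, 12)]`; reference index 1 then selects DCP, while indices 0 and 2 select MCP.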
The investigations for multiview video coding also showed that disparity-compensated prediction, and thus the entire multiview video coding approach, is particularly efficient when combined with hierarchical prediction structures, which had previously been developed by the Image and Video Coding Group.
The joint proposal of the Image and Video Coding Group and the 3D Coding Group was chosen as the starting point for the MVC development and, extended by some additional high-level signaling mechanisms, was adopted as the final MVC standard.