MPEG-4

MPEG-4 is an ISO/IEC standard developed by MPEG (Moving Picture Experts Group) for the creation, delivery and playback of interactive multimedia over the Internet. MPEG-4 can be used not only for digital television but also for interactive multimedia and interactive graphics applications. To this end, MPEG-4 not only compresses video and audio but can also handle text, still images, animations, and 2D and 3D objects. All these elements can be combined into interactive multimedia presentations that can be tailored for transport over low-bandwidth networks as well as high-definition broadband broadcasting networks.

MPEG-4 comprises the following components:

Table 1: Different components of MPEG-4
- Scene Description
- Interactivity
- Synchronization
- Application engine (MPEG-J)
- Audio: general audio, speech, synthetic audio, synthetic speech
- Visual: video, still images, text, 2D graphics, 3D graphics, face and body animation
- Intellectual Property Management and Protection (IPMP)
- File Format (MP4)
- Data Transport (FlexMux/TransMux)

MPEG-4 is based on objects that can be encoded and transmitted separately and are then composed into a scene after decoding. To describe these compositions, MPEG-4 includes a scene description language called BIFS (Binary Format for Scenes). Scenes can be interactive: objects can appear, disappear or change their color, for example, in response to user input. An object can be almost anything: video, text, graphics, 2D or 3D objects, audio or speech. Each object is encoded with the coding scheme best suited to it, and video, audio and the other objects can be tightly synchronized.
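
To make the object-based idea concrete, here is a minimal Python sketch of a scene built from independently coded objects that react to user events. It is an analogy only, not BIFS syntax: the classes `MediaObject` and `Scene`, the helpers `hide` and `recolor`, the event names and the codec labels are all hypothetical, chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class MediaObject:
    """One independently coded object in the scene (hypothetical model)."""
    name: str
    kind: str            # e.g. "still image", "video", "text", "audio"
    codec: str           # illustrative label for the object's own coding scheme
    active: bool = True  # whether the compositor currently presents it
    color: str = "white"


@dataclass
class Scene:
    """Composes decoded objects and reacts to user events (not BIFS syntax)."""
    objects: dict = field(default_factory=dict)
    routes: list = field(default_factory=list)

    def add(self, obj: MediaObject) -> None:
        self.objects[obj.name] = obj

    def on(self, event: str, action: Callable[["Scene"], None]) -> None:
        # Register an action to run whenever the given user event occurs
        self.routes.append((event, action))

    def dispatch(self, event: str) -> None:
        # Run every action registered for this event, in registration order
        for ev, action in self.routes:
            if ev == event:
                action(self)

    def active_objects(self) -> list:
        # Names of objects currently presented, in insertion order
        return [o.name for o in self.objects.values() if o.active]


def hide(name):
    def action(scene):
        scene.objects[name].active = False
    return action


def recolor(name, color):
    def action(scene):
        scene.objects[name].color = color
    return action


scene = Scene()
scene.add(MediaObject("background", "still image", "still-image coding"))
scene.add(MediaObject("presenter", "video", "MPEG-4 Visual"))
scene.add(MediaObject("caption", "text", "text"))
scene.add(MediaObject("voice", "audio", "AAC"))

scene.on("click:caption", recolor("caption", "yellow"))
scene.on("click:presenter", hide("presenter"))

scene.dispatch("click:caption")
scene.dispatch("click:presenter")
print(scene.active_objects())          # ['background', 'caption', 'voice']
print(scene.objects["caption"].color)  # yellow
```

Each object carries its own coding label, and interaction changes only the attributes of individual objects, never a monolithic picture.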

MPEG-4 Audio
MPEG-4 Audio supports a wide variety of audio objects, from high-quality multichannel audio to a mono speech signal, as well as synthetic audio and speech.

General audio coding supports a wide range of input signals and bandwidths, from telephone-quality sound (4 kHz bandwidth) up to broadcast-quality multichannel audio. It is based on Advanced Audio Coding (AAC), which was introduced in MPEG-2 and extended in MPEG-4. MPEG-4 also introduces parametric audio coding, which allows audio to be coded at extremely low bitrates, down to 1 kbit/s per channel.
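
A little arithmetic shows why these bitrates matter. By the Nyquist criterion, 4 kHz of bandwidth needs at least 8 kHz sampling, and uncompressed PCM bitrate is sample rate × bits per sample × channels. The sketch below assumes 16-bit samples and a 128 kbit/s stereo AAC operating point; both are illustrative choices, not values fixed by the standard.

```python
def pcm_bitrate_kbps(sample_rate_hz, bits_per_sample, channels):
    """Bitrate of uncompressed PCM audio in kbit/s."""
    return sample_rate_hz * bits_per_sample * channels / 1000


def min_sample_rate_hz(bandwidth_hz):
    """Nyquist criterion: sample at least twice the highest frequency."""
    return 2 * bandwidth_hz


# Telephone-quality sound: 4 kHz bandwidth, 16-bit mono (assumed word length)
telephone_rate = min_sample_rate_hz(4000)                # 8000 Hz
telephone_pcm = pcm_bitrate_kbps(telephone_rate, 16, 1)  # 128.0 kbit/s

# CD-style stereo: 44.1 kHz, 16-bit, 2 channels
cd_pcm = pcm_bitrate_kbps(44100, 16, 2)                  # 1411.2 kbit/s

# Assumed operating points, for illustration only
aac_kbps = 128          # a common stereo AAC bitrate (assumption)
parametric_kbps = 1     # the 1 kbit/s per channel figure quoted above

aac_ratio = cd_pcm / aac_kbps                   # about 11:1
parametric_ratio = telephone_pcm / parametric_kbps  # 128:1

print(f"AAC stereo: about {aac_ratio:.1f}:1")          # about 11.0:1
print(f"Parametric mono: {parametric_ratio:.0f}:1")    # 128:1
```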

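MPEG-4 Audio has dedicated tools for speech coding. Speech can be coded at bitrates from 1.2 kbit/s up to 24 kbit/s, using HVXC (Harmonic Vector eXcitation Coding) for the lowest bitrates and CELP (Code-Excited Linear Prediction) for the higher ones.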
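
CELP coders model the vocal tract with a short-term linear-prediction (LPC) filter and then search a codebook for the excitation that, passed through that filter, best matches the input. The Python sketch below covers only the LPC analysis step, using the Levinson-Durbin recursion; it omits windowing, quantization, the codebook search and everything specific to MPEG-4 CELP or HVXC. Function names are illustrative.

```python
def autocorrelation(x, max_lag):
    """r[k] = sum over n of x[n] * x[n - k], for k = 0..max_lag."""
    n = len(x)
    return [sum(x[i] * x[i - k] for i in range(k, n)) for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Find a[1..p] so that x[n] is approximated by sum(a[k] * x[n - k]).

    Returns (coefficients for lags 1..p, final prediction error).
    """
    a = [0.0] * (order + 1)
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0:
            break  # signal already perfectly predicted; stop to avoid dividing by zero
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err  # reflection coefficient for this order
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k
    return a[1:], err


def residual(x, coeffs):
    """Prediction error e[n] = x[n] - sum(coeffs[k] * x[n - 1 - k])."""
    out = []
    for n in range(len(x)):
        pred = sum(c * x[n - 1 - k] for k, c in enumerate(coeffs) if n - 1 - k >= 0)
        out.append(x[n] - pred)
    return out


# Autocorrelation of an ideal first-order process with coefficient 0.5
coeffs, err = levinson_durbin([1.0, 0.5, 0.25], order=2)
print(coeffs, err)  # [0.5, 0.0] 0.75

# A decaying signal is fully predicted after its first sample
x = [1.0, 0.5, 0.25, 0.125]
print(residual(x, [0.5]))  # [1.0, 0.0, 0.0, 0.0]
```

A speech coder would send the (quantized) predictor coefficients plus a compact description of the residual instead of the waveform itself, which is where the low bitrates come from.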

MPEG-4 also includes tools for synthesizing sound. MPEG-4 Structured Audio is a language for describing instruments and the input that "plays" those instruments. Because the instruments are mathematical objects, it is possible to generate not only the sound of a real instrument such as a piano but also the sound of a waterfall or even "unnatural" sounds. MPEG-4 also provides tools for synthesizing speech from text input.
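
In Structured Audio, instruments are written in SAOL (Structured Audio Orchestra Language) and the input that plays them in SASL (Structured Audio Score Language), so the decoder synthesizes the sound itself. The Python sketch below mimics that split by analogy only; it is not SAOL or SASL. The instrument `sine_pluck`, the `ORCHESTRA`/`SCORE` structures and the 8 kHz sample rate are all assumptions made for illustration.

```python
import math

SAMPLE_RATE = 8000  # assumed; kept low so the example stays small


def sine_pluck(freq_hz, dur_s, amp=0.5, decay=4.0):
    """A toy instrument: a sine wave with an exponential decay envelope.

    Because it is just a formula, any frequency or decay can be requested.
    """
    n = int(dur_s * SAMPLE_RATE)
    return [amp * math.exp(-decay * i / SAMPLE_RATE)
            * math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE)
            for i in range(n)]


# "Orchestra": instrument definitions, looked up by name
ORCHESTRA = {"pluck": sine_pluck}

# "Score": (start time in seconds, instrument name, parameters)
SCORE = [
    (0.0, "pluck", {"freq_hz": 440.0, "dur_s": 0.5}),
    (0.25, "pluck", {"freq_hz": 660.0, "dur_s": 0.5}),
]


def render(score, orchestra):
    """Run every score event through its instrument and mix the results."""
    events = []
    end = 0
    for start_s, name, params in score:
        samples = orchestra[name](**params)
        offset = int(start_s * SAMPLE_RATE)
        events.append((offset, samples))
        end = max(end, offset + len(samples))
    out = [0.0] * end
    for offset, samples in events:
        for i, s in enumerate(samples):
            out[offset + i] += s
    return out


audio = render(SCORE, ORCHESTRA)
print(len(audio))  # 6000 samples, i.e. 0.75 s at 8 kHz
```

Only the instrument definitions and a few score lines would need to be transmitted; the waveform is produced at the receiver.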

MPEG-4 Visual
The MPEG-4 Visual standard allows the coding of natural still images, video, and 2D and 3D graphics. MPEG-4 can compress video into bitstreams ranging from 5 kbit/s to more than 1 Gbit/s, at resolutions from below QCIF (176 × 144 pixels) up to studio resolutions of 4000 × 4000 pixels. The coding principle is the same as in MPEG-1 and MPEG-2: pictures are predicted from other pictures using estimated motion, and essentially only the motion information and the differences from that prediction are sent. The main difference is that coding can be based on individual objects instead of the whole picture. This makes it possible, for example, to separate a fixed background, which is sent as a still image, from a talking person in the foreground, which is sent as a video object.
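
Motion prediction is usually driven by block matching: for each block of the current picture, the encoder looks for the best-matching block in a reference picture and codes the displacement (motion vector) plus the remaining difference. The MPEG standards define the bitstream and decoding rather than how an encoder finds its vectors, so the exhaustive search with a sum-of-absolute-differences (SAD) cost below is just one common approach. The 4 × 4 block size, the search radius and the function names are assumptions for a small example; real codecs typically use 16 × 16 macroblocks.

```python
def sad(cur, ref, bx, by, dx, dy, bs):
    """Sum of absolute differences between the bs x bs block of cur at
    (bx, by) and the block of ref displaced by (dx, dy)."""
    total = 0
    for y in range(bs):
        for x in range(bs):
            total += abs(cur[by + y][bx + x] - ref[by + y + dy][bx + x + dx])
    return total


def motion_search(cur, ref, bx, by, bs, radius):
    """Full search over a +/- radius window; returns ((dx, dy), cost)."""
    h, w = len(ref), len(ref[0])
    best = None
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            # Skip candidate blocks that fall outside the reference picture
            if not (0 <= by + dy and by + dy + bs <= h
                    and 0 <= bx + dx and bx + dx + bs <= w):
                continue
            cost = sad(cur, ref, bx, by, dx, dy, bs)
            if best is None or cost < best[1]:
                best = ((dx, dy), cost)
    return best


def residual_block(cur, ref, bx, by, dx, dy, bs):
    """Difference between the current block and its motion-compensated prediction."""
    return [[cur[by + y][bx + x] - ref[by + y + dy][bx + x + dx]
             for x in range(bs)] for y in range(bs)]


# Reference picture with distinct pixel values; in the current picture the
# content has moved one pixel to the right.
W = H = 8
ref = [[y * W + x for x in range(W)] for y in range(H)]
cur = [[ref[y][x - 1] if x >= 1 else 0 for x in range(W)] for y in range(H)]

vector, cost = motion_search(cur, ref, bx=2, by=2, bs=4, radius=2)
print(vector, cost)  # (-1, 0) 0
res = residual_block(cur, ref, 2, 2, *vector, bs=4)
```

With the right vector the residual is all zeros, so the block costs almost nothing to send; without motion compensation (vector (0, 0)) every pixel would differ by 1.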

Popular video codecs on the Internet, such as DivX and Xvid, are implementations of MPEG-4 Part 2 (Visual).