Getting started
If you don't have much knowledge of MPEG, this is a good place to start: http://en.wikipedia.org/wiki/MPEG-1
Also the ISO/IEC 11172-2 is an excellent reference. Unfortunately you have to pay for it: http://www.iso.org/iso/iso_catalogue/catalogue_tc/catalogue_detail.htm?csnumber=22411
The standard is divided in five sections. For this project using section 2 (Video) will be enough.
Algorithms
Here's an overview of the decoding process, and the classes involved.

InputStream.java: The video stream is a bit stream, so we need to be able to retrieve an arbitrary amount of bits. For example, the Sequence Header contains the width and height of the frame, both 12-bits quantities. Some of the fields are variable length, so this class is used heavily by the VLC decoder.
Vlc.java: As already mentioned, some fields are coded using Huffman variable length codes (ie. frequent items are represented by shorter codes). The problem here is determining how long each code is. The easiest (and naïve) approach would be reading a bit at a time, but needless to say, it is not performance wise, as it involves multiple shifts and masking each time.
The approach used here is making use of tables. As an example, let's consider the variable length code for dct_dc_size luminance (Table 2-B.5a in ISO/IEC 11172):
The strategy here is ''peeking'' 7 bits (the longest variable length code) and using it as an index in the following table:
So when reading a bit sequence like "1110101" we know this is the "1110" code, whose value is 5 and length 4 bits. With this information we can remove these bits from the stream and proceed.
A problem with this approach can also be seen from the example above. We need an array of 128 items (each one holding value and length) for a table of 9 elements. That's a lot of redundancy (about 14:1). A way to deal with this is using hierarchical tables. We choose a convenient value and create an extra level. For example, let's discard the 3 least significant bits and use the remaining bits as an index. This will leave us with two tables: one with 16 entries (2^4) and the other one with 8 entries (2^3). The last entry of the first table will contain a ''escape code'' that will indicate we should use the other table. Now we have 16 + 8 = 24 entries which yields a much better 3:1 ratio, at the expense of some extra computation.
Here's the code:
Idct.java: MPEG uses 8x8 DCT to convert data in time domain to data in frequency domain. This is a critical part as this code is called everytime a block is decoded. Use of fixed point math is made both for efficiency and due to constraints in CLDC 1.0
The equation for the 2D IDCT is

where F(m,n) is the DCT of the signal f(x,y) and M = N = 8
MotionVector.java: used to represent and calculate the forward and backward motion vectors.
Picture.java: A picture consists of three rectangular matrices of eight-bit numbers; a luminance matrix (Y), and two chrominance matrices (Cr and Cb). This class also provides functions for copy, compensation and interpolation of blocks. Each pel is an ''unsigned'' 8 bit sample. Unfortunately Java doesn't support such type, so a ''short'' type is used instead.
Bitmap.java: This class performs the conversion from Y'CbCr 4:2:0 to RGB.
Queue.java: A straight forward unbounded queue, used as a communication between the decoder (producer) and renderer (consumer) threads.
Decoder.java: The decoder itself, as described by the ISO/IEC 11172-2 standard. Here's an excerpt:
Player.java: A very simple MIDlet used for example purposes. It takes care of displaying the frames in display order using a very simple approach (other codecs would require a proper Playout Buffer). No other task is performed.
Optimization
There's certainly plenty of room for optimization. The aim was making the code as clear as possible, without ignoring basic performance issues. The following screen shows where the bottlenecks are:

Some possible optimizations:
- IDCT: handle special cases (ie. single coefficient)
- YCbCr-RGB: use table driven multiplication
- General Java optimizations: loop unrolling, avoid use of casts, object reutilization, etc.
The decoder was tested on a Nokia 7610, getting 4-5 fps for a 160x120 video.
This revision is from 2009-05-26 17:56
