Section 6 - Z-buffer vs. HCM Results

RenderDude! contains 2 camera types, the Hierarchical Coverage Mask camera and the Z-buffer camera. In this section, we compare these two rendering algorithms, using image quality, speed, and memory usage. All benchmarking was done on the Frinkiac 2000, a 266Mhz PentiumII system with 128MB of memory.

6.1 - The Fonz Meets Jurassic Park

The scene the I used for performance benchmarking depicts a bizarre situation in which 3 '57 Chevys (in blue) and 3 stegasauri (in pink) have somehow melded together. The scene describes 10 frames of animation in which the camera starts out in front of the cars and gradually backs out up and to the right. The very low-quality animated GIF below depicts the animation, and the still image on the right is a full size high quality version of the first frame of the animation, rendered with the HCM camera.

Figure 1: What a mess

The scene consists of approximately 110,000 triangles. Vertex normals have been suppressed, so face normals are used for shading computations (hence the blocky shading effect).

The animation was rendered using both per-face and per-vertex (Gouraud) shading options with both the z-buffer and HCM cameras. In per-face shading, a random colour is chosen for each face, and the face is filled uniformly with that colour. For the HCM camera mode, both subpixel (64 visibility samples per pixel) and non-subpixel (1 visibility sample per pixel) were rendered for each shading mode. The following table summarizes the timing and memory stats for all the different options. The links at the top lead to a sample image from each animation.

Input File	ZB_Face	ZB_Vertex	HCM_Face	HCM_Vertex	HCM_FaceSub	HCM_VertexSub
Camera	Z-Buffer	Z-Buffer	HCM	HCM	HCM	HCM
Shading	per-face	per-vertex	per-face	per-vertex	per-face	per-vertex
Visibility Samples/Pixel	1	1	1	1	64	64
Final # of Triangles	110528	110528	687242	687242	687242	687242
Peak Memory	21MB	21MB	98MB	98MB	104MB	104MB
Preprocess Time	0s	0s	29s	29s	29s	29s
Frame 0 Render Time	16s	27s	12s	14s	22s	32s
Frame 1 Render Time	15s	25s	12s	15s	242s	35s
Frame 2 Render Time	13s	22s	13s	17s	27s	41s
Frame 3 Render Time	12s	21s	15s	19s	30s	43s
Frame 4 Render Time	12s	18s	14s	18s	28s	42s
Frame 5 Render Time	10s	16s	14s	18s	29s	42s
Frame 6 Render Time	9s	15s	15s	18s	29s	43s
Frame 7 Render Time	9s	15s	15s	18s	30s	44s
Frame 8 Render Time	9s	13s	14s	18s	28s	42s
Frame 9 Render Time	8s	13s	14s	17s	28s	40s
Frame 10 Render Time	7s	12s	14s	16s	27s	40s
Total Render Time	121s	198s	181s	217s	331s	472s

Figure 2: Timing/Memory Results

Some interesting points to note:

For the per-face shaded cases (ZB_Face, HCM_Face, HCM_FaceSub), visibility determination is the only significant operation happening during rendering, so the total render times for these cases can be used to compare the visibility aspects of the two algorithms.
The HCM algorithm has to render nearly 7 times the triangles rendered by the z-buffer, due to the increased tessellation required for the visibility structure. The per-face shaded images clearly illustrate the increased face counts.
In terms of visibility, HCM is slower than Z-buffer, even at 1 sample per pixel, but HCM can achieve 64 samples per pixel at just under twice the cost of one sample per pixel. Achieving this same resolution with a Z-buffer would probably increase the computation time by a factor of around 64.
In the scene, the benefits of a subpixel visibility solution are quite apparent. The geometric aliasing on the thin pieces of geometry (the steering wheels, in particular) is quite apparent in the single visibility sample cases (HCM_Vertex, ZB_Vertex). In general, the softer edges in the HCM_VertexSub image are quite pleasing.
For the Z-buffer, frame render times monotonically decrease from frame 0 to frame 10, reflecting the fact that the same polygons are rendered for each frame, but they get smaller (and hence are rendered faster) as the camera moves away. With the HCM, the first frame is usually the fastest, since in this frame the Chevys occlude almost the entire image, so far fewer triangles from the stegasori actually get rendered. The frame render times increase as the stegasori come into view, then decrease as the camera moves away and the faces get smaller.

6.2 - Conclusions

Obviously, the subpixel visibility sampling of the HCM algorithm produces superior images to the Z-buffer, even with only a single shading sample per pixel. This increased quality is essential for a serious renderer, as evidenced by the horrible thin-geometry aliasing artifacts present in the Z-buffer images.

Unfortunately, this increased quality comes at a high memory cost - the increased face count due to the visibility structure increases the memory footprint by a factor of 5. This kind of memory usage would not be acceptable in a production environment. However, if a front-to-back traversal of scene triangles could be achieved without having to increase the face count by so much, then the HCM algorithm would most likely beat the Z-buffer handily. In fact, Greene's paper reports that his software HCM algorithm could resolve visibility in subpixel mode as fast as the single-sampling Z-buffer. I believe he achieved this result by using a pre-sorted polygon database, and having both algorithms render exactly the same set of faces.

Another advantage of the HCM approach is that it opens the door for subpixel adaptive shading schemes, since it resolves visibility on a grid of sample positions within the pixel, each of which could potentially be used as a shading sample.

My conclusion is that although a subpixel visibility solution is a requirement for a serious renderer, the memory cost of the HCM algorithm (my implementation of it, at least) probably makes it unsuitable for general use.