AS6

Make a faster raycaster and add reflections. (No partner)

Helix

helix.scd
Rendered at 4rpp (rays per pixel), 4mrd (maximum reflectance depth).

Thousand Spheres

thousandspheres.scd
Rendered at 4rpp, 4mrd.

Frost Wyrm

Triangle mesh example. I didn't make this model, it's from a game that I used to play (time constraints!). Thumbnail Rendered at 128rpp, 4mrd. Click for 1680x1050 render at 1024rpp, 4rmd.

Animated Frost Wyrm

Animated example! Rendered at 16rpp, 4mrd. 180 frames.

Analysis of Speedup

My system is decently fast. These times were tested on an Intel Core 2 Quad Q6600 (B3 Stepping, Kentsfield [about 4 years old at the time of this project, and not a Xeon even though the picture says it is]) overclocked to 3.0GHz, running OS X 10.6.6. However, the accelerated timings are relatively similar for less powerful machines. A 32rpp render of Thousand Spheres using AABB Depth 16 ran in 45 seconds on an Intel Core 2 Duo SL9600 (2.13GHz, Penryn, Late 2010 MacBook Air). This makes the ratio larger for slower machines, on the same code.

Note: Although my final code contains other optimizations, these numbers were taken only with the AABB acceleration structure on top of my AS5. Later, I went back to use pointers everywhere, parallelize, and minimize larger calculations. I also changed things related to memory efficiency, which indirectly affect runtime.

AABB - Thousand Spheres (1rpp, 4mrd)

Acceleration Time Ratio
Max No Acceleration14s1x
Max Depth 025s0.56x
Max Depth 113s1.07x
Max Depth 27s2x
Max Depth 44s3.5x
Max Depth 8*1s14x
Max Depth 16*1s14x

AABB - Thousand Spheres (4rpp, 4mrd)

Acceleration Time Ratio
No Acceleration57s1x
Max Depth 0102s0.56x
Max Depth 156s1.02x
Max Depth 230s1.9x
Max Depth 418s3.2x
Max Depth 8*6s9.5x
Max Depth 16*6s9.5x

AABB - Thousand Spheres (32rpp, 4mrd)

Acceleration Time Ratio
No Acceleration361s1x
Max Depth 0636s0.57x
Max Depth 1348s1.04x
Max Depth 2188s1.92x
Max Depth 4113s3.19x
Max Depth 8*39s9.26x
Max Depth 16*39s9.26x
* denotes that the structure was saturated before this depth

Here is a comparison with a slower machine, my MacBook Air.

AABB - Thousand Spheres (4 rpp, 10mrd)

Acceleration Time Ratio MBA Time MBA Ratio
No Acceleration57s1x81s1x
Max Depth 0102s0.56x142s0.57x
Max Depth 156s1.02x78s1.03x
Max Depth 230s1.9x42s1.93x
Max Depth 418s3.2x24s3.8x
Max Depth 8*6s9.5x7s11.6x
Max Depth 16*6s9.5x7s11.6x
* denotes that the structure was saturated before this depth

For Thousand Spheres, depth 0 AABB structures are noticeably slower. For every intersection test, an extra box intersection test is needed, along with reiterating the entire list of spheres. Depth 1 AABB is about the same as a non-accelerated structure – about an equal number of intersections are computed (there are about 1/2 as many Sphere intersection tests, but every ray computes 1-2 bounding box intersections in addition to the sphere intersection). We notice speedup starting at Depth 2 AABB, where Sphere intersections are cut into 1/4, while only requiring a max of 3 bounding box intersection. Greater speedup is attained at depth 4, with a factor of 2^-4 sphere intersections and a maximum of 5 bounding box intersections. Depth 8 allows for a factor of 2^-8 sphere intersections, and a maximum of 6 bounding box intersections. The thousand spheres saturates at 7, so all depths greater than 7 will be the same, including the last test case, depth 16. This is because there are only 100 nodes, and each leaf node will contain only 1 primitive at depth 7. The 7 sphere saturation is calculated by log_2(100), because each AABB node splits the current number of Spheres into two equal nodes (median on axis). If we increase the number of spheres, we can see a greater speedup because the number of Sphere intersections is an inverse logarithmic function to depth, while the number of bounding box intersections is a linear function.

Increasing the maximum reflectance depth from 4 to 10 for thousand spheres had minimal impact because the reflections become smaller with each depth, so the added cost of more reflections was very small.

And here is a teapot.

AABB - Utah Teapot (6320 triangles, 4 rpp, 4mrd, final optimizations, Phong Normal Interpolation)

Acceleration Single-threaded Time Single-threaded Ratio Multi-threaded Time Multi-threaded Ratio
No Acceleration974s1x122s7.98x
Max Depth 0552s1.76x79s12.33x
Max Depth 1381s2.56x56s17.39x
Max Depth 2284s3.43x43s22.65x
Max Depth 4112s8.70x19s51.26x
Max Depth 826s37.46x4s243.5x
Max Depth 16*20s48.70x3s324.67x
* denotes that the structure was saturated before this depth

After implementing my final speedups, I measured the rendering time for the Utah Teapot. This would not be reasonable to render without accelerated data structures! The code parallelizes very well, giving very large speedups. I could render the Utah Teapot at 32rpp in 18 seconds. Thousand Spheres at 32rpp rendered in 4 seconds (90.25x speedup). These times were measured on delicious.cs by walking into the CITRIS Mac Lab and running my program manually. The time measured on the multi-threaded version is actual time, and the time measured on the single-threaded version is thread time, so the actual speed increase is slightly greater than reported because the computer is shared. Delicious.cs has 8 physical cores (2x Quad-Core Nehalem Xeon @ 2.4GHz).

Extra Credit

Triangle Mesh

I impleneted triangle meshes. Pretty straightforward because the loader already loaded information about them. I modified the default Sphere class to be a subclass of a Primitive class, to which I made a Triangle subclass. Each mesh retains the space efficiency by storing verticies once, but each Triangle subclass contains its own triangle object (and still shares the verticies) and own bounding box to utilize our AS6 speedups.

Phong Normal Interpolation

Normal triangle meshes have a boxy look beacuse of their natural spiky shape. This can be solved by Phong Normal Interpolation. I calculated the "normal" of a vertex by taking all the normals of the triangles that contain the vertex, adding them up, then normalizing the normal (effectively averaging them). This normal is saved to each vertex, and a Gourad Shading-like technique is used to interpolate the normals as a function of distance of the triangle's barycentric coordinates. It makes these things smoother.

Animation

Animation is implemented in a fashion similar to AS3. I output each frame and use ImageMagick to combine them into a gif.

Refraction/Transparency

I got refractions/transparency partially working, but didn't finish due to lack of time. However, it renders some scenes correctly.

Parallel/Threaded Rendering

I tried a few different approaches. I'll summarize them here.

  • OpenMP - The most straightforward framework to use. This is what I ended up using because of time constraints. I use OpenMP to create threads equal to however many processors your computer has, and splits up the work for each scene among the processors.
  • CUDA - Because I have a nVidia GPU (eVGA GeForce GTX280), I looked into CUDA to utilize it and run my program on my GPU. However, nVidia has (very) limited support of C++, so it was not as straightforward as using OpenMP to parallelize. I was overcoming this limitation by writing a CUDA wrapper to interface with my C++ raytracer, but I didn't finish before the assignment was due.
  • Hadoop MapReduce - MapReduce uses a distributed approach of SIMD. The C++ wrapper for Hadoop doesn't allow our raytracing program to fit very well. It's possible, it'll just take a while, and I definitely would not have finished on time.

My OpenMP code parallelizes very well. On my system of 4 cores, I got almost a 4 time speed increase (compared to the single-threaded version), and on delicious.cs, I attained almost a 8 time speed increase (delicious.cs has 8 physical cores).

Extra Credit from AS5

My extra credit from AS5 works in AS6. The same acceleration techniques apply here.

Yoshi

Discrete Penumbra

Infinite/Continuous Penumbra (Soft Shadows)

Depth of Field

Some More Renders

Again, I didn't make these. Models from Blizzard Entertainment. Click the thumbnail for a large, 1680x1050 1024rpp version.

Alexstrasza (World of Warcraft)

Barracks (Starcraft 2)

Siege Tank (Starcraft 2)

Frostwyrm (World of Warcraft)

Excited to learn about texture mapping!