How Mocap Works: Reconstruction

I’ll start this series by jumping to the middle.  Makes sense, right?  Believe me, this is the easy part.  All the other parts will make more sense once you can look at them relative to this one.

What is reconstruction?

Different mocap systems define this differently.  I’m going to define it as the task of taking the 2d data from a calibrated multi-camera system and making it 3d.  This is pretty analogous to what the OptiTrack point cloud SDK does.  I’m going to skip calibration for this blog entry.  You can assume that we have a set of cameras and we know where they are in 3d space.  We know their orientation and we know their lensing.  I’m also going to skip the 2d aspects of the process.  You can assume that each camera is providing us with multiple objects (I call them blips) and, most importantly, their position on the camera’s backplane (where they are in the camera’s view in 2d).  There is one other thing you must assume: there is error in all those values.  Not just the blips, but also the camera positions, orientations and lens data.  We hope they’re all close, but they’re not perfect.  Reconstruction is the task of taking that data and turning it into 3d points in space.


Well, the simplest way to think about it is as follows.  Each blip can be projected out into space from the camera’s nodal point (center of the virtual lens) based on where it is in 2d on that camera’s backplane (virtual imager).  Shooting a point out is generally referred to as shooting a ray or vector.  Any two rays from two separate cameras that intersect are likely to be intersecting at a point in 3d space where there is an actual marker.  This is the reverse process of the cameras seeing the real marker in the first place.  Rays of light bounce from the marker through the lens nodal point and onto the backplane imager, where they are encoded and sent to the computer (that’s a little oversimplified but you get the idea).  If a third ray intersects as well, it’s FAR more likely to be a marker than a coincidence (you’d be surprised how often you end up with coincidences running at 100fps+).  So, while you can reconstruct a 3d point from as few as two rays, if you have enough cameras to spend on verification, you’ll get fewer fake markers by requiring that 3 or more rays agree.
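To make the "shooting a ray" idea concrete, here is a minimal sketch in Python.  It assumes an ideal pinhole camera with no lens distortion, a camera that looks down its own +Z axis, and a focal length measured in pixels.  The function name and all of its parameters are my own inventions for illustration; they are not taken from any mocap SDK.

```python
import math

def blip_to_ray(blip_xy, focal_px, principal_xy, cam_rotation, cam_position):
    """Return (origin, unit direction) of the world-space ray through a 2d blip.

    Assumptions (hypothetical, for illustration only):
      - ideal pinhole camera, no lens distortion
      - cam_rotation is a 3x3 camera-to-world rotation, given as a list of rows
      - the camera looks down its own +Z axis
    """
    # Direction in camera space: offset from the image center, one focal length deep.
    dx = blip_xy[0] - principal_xy[0]
    dy = blip_xy[1] - principal_xy[1]
    d_cam = (dx, dy, focal_px)

    # Rotate that direction into world space.
    d_world = [sum(cam_rotation[r][c] * d_cam[c] for c in range(3)) for r in range(3)]

    # Normalize so later dot products give angles directly.
    length = math.sqrt(sum(v * v for v in d_world))
    direction = tuple(v / length for v in d_world)

    # The ray starts at the camera's nodal point.
    return tuple(cam_position), direction
```

A blip sitting exactly on the image center produces a ray straight down the camera's viewing axis; blips further from center produce rays that fan outward.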

This is often referred to as Triangulation

It’s probably worth noting that this is not the typical triangulation you’ll use when, say, calculating the epicenter of an earthquake from its distance to known points (strictly speaking, that distance-based method is called trilateration).  It’s a different type of triangulation or, should I say, a different subset of triangulation operations: here we work from the directions of rays rather than from distances.


Sorry, did I say those rays intersected?  That was a mistake.  The rays never intersect.  They just pass close to each other.  See, that error I was talking about gets in the way.  So what your typical mocap app will do is deal with residuals to basically say "it’s close enough."  When you give a NaturalPoint system a residual for its point cloud reconstruction, you are telling it that rays that pass within that distance of each other should be treated as having intersected, at the point where the gap between them is smallest.  A high residual could suck discrete markers together into one larger marker if they are close enough.  A low residual could keep rays from intersecting and result in low marker counts per frame.  You’ll want to balance your residual against the overall accuracy in your volume.  You can get an idea of the accuracy of your volume by looking at the residuals that it gives you at the end of the calibration process.  Or, you can just mess around with it.  You’ll also want to pay attention to the units.  Different systems measure residuals in different units: meters, centimeters, millimeters, etc.
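Here is a rough sketch of that "close enough" test for a single pair of rays.  It uses the standard closest-approach formula for two lines in 3d and treats the gap between the two closest points as the residual.  That definition of residual is my assumption; a real system may compute it differently, and will also fold in three or more rays.  The function names are made up for the example.

```python
import math

def _dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def closest_approach(p1, d1, p2, d2):
    """Return (midpoint, gap) where two rays pass closest, or None if near-parallel.

    Each ray is origin p plus any multiple of direction d.  Note this treats
    the rays as infinite lines, so it does not reject points behind a camera.
    """
    w = tuple(x - y for x, y in zip(p1, p2))
    a, b, c = _dot(d1, d1), _dot(d1, d2), _dot(d2, d2)
    d, e = _dot(d1, w), _dot(d2, w)
    denom = a * c - b * b
    if abs(denom) < 1e-12:
        return None  # parallel rays never give a usable crossing point
    t = (b * e - c * d) / denom  # distance along ray 1
    s = (a * e - b * d) / denom  # distance along ray 2
    q1 = tuple(p + t * v for p, v in zip(p1, d1))
    q2 = tuple(p + s * v for p, v in zip(p2, d2))
    midpoint = tuple((u + v) / 2 for u, v in zip(q1, q2))
    return midpoint, math.dist(q1, q2)

def reconstruct_pair(p1, d1, p2, d2, max_residual):
    """Return a 3d point if the rays pass within max_residual, else None."""
    result = closest_approach(p1, d1, p2, d2)
    if result is None:
        return None
    midpoint, gap = result
    return midpoint if gap <= max_residual else None
```

Note that max_residual has to be in the same units as the camera positions, which is exactly the units trap mentioned above.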


There are other factors that play into the accuracy of a reconstruction.  If two rays have a similar angle (they are more parallel than perpendicular), the accuracy of their reconstruction goes down significantly.  It’s harder to determine accurately at what distance they intersect, as a little inaccuracy in the angles translates to a potentially long distance.  Most of the inaccuracy plays into the depth axis.  If you have rays that are more perpendicular, their inaccuracy is spread evenly along all three axes of potential movement, rather than the one depth axis.  Therefore, most NaturalPoint reconstruction parameters include a threshold for the minimum angle between rays.  Rays that intersect at less than the minimum angle are ignored.  The units are important here as well.  I believe they tend to be in radians rather than degrees.
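A minimum-angle check is easy to sketch.  The version below measures the angle between two ray directions in radians and rejects pairs that are too close to parallel.  The function names and the threshold parameter are hypothetical, not the actual NaturalPoint settings.

```python
import math

def angle_between(d1, d2):
    """Angle in radians between two ray directions (they need not be unit length)."""
    dot = sum(x * y for x, y in zip(d1, d2))
    n1 = math.sqrt(sum(x * x for x in d1))
    n2 = math.sqrt(sum(x * x for x in d2))
    # Clamp to guard against rounding pushing the cosine just outside [-1, 1].
    cos_angle = max(-1.0, min(1.0, dot / (n1 * n2)))
    return math.acos(cos_angle)

def passes_min_angle(d1, d2, min_angle_rad):
    """True if the two rays are far enough from parallel to trust their crossing."""
    return angle_between(d1, d2) >= min_angle_rad
```

If you prefer thinking in degrees, math.radians(5) converts a 5 degree threshold into the radians this sketch expects.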

Min and Max Distance 

These are simple.  After a point has been reconstructed, it can be tossed based on its distance from the cameras whose rays produced it.  One really good reason for this is that the light sources on the cameras can flare out objects that are too close, generating WAY too many phantom blips.  Ignoring blips that reconstruct that close is a safe bet.  Likewise, throwing out markers that reconstruct far into the distance is also safe, though often not needed.
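As a sketch, the distance filter could look like the following.  I am assuming the point gets tossed if it is out of range for any one of the cameras that contributed a ray; a real system might apply the rule per camera or in some other way.  The function name and parameters are made up for illustration.

```python
import math

def within_range(point, camera_positions, min_dist, max_dist):
    """True if the reconstructed point is an acceptable distance from every
    camera that contributed a ray to it (all values in the same units)."""
    return all(
        min_dist <= math.dist(point, cam) <= max_dist
        for cam in camera_positions
    )
```

For example, with cameras at (0, 0, 0) and (4, 0, 0) and limits of 0.5 and 10, a point at (2, 0, 3) is kept, while a point at (0.1, 0, 0) is thrown out for being too close to the first camera.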

Hopefully, these basic concepts help explain what is going on inside the black box.  This should give you plenty of concepts with which to start working on camera placement to get better performance out of an optical mocap system.  An obvious freebie would be: don’t place cameras that are meant to cover the same space too close together.  The angle between rays seeing the same object will be too low to get an accurate reconstruction.
