How Mocap Works: Reconstruction

I’ll start this series by jumping to the middle.  Makes sense, right?  Believe me, this is the easy part.  All the other parts will make more sense when they are looked at relative to this part.

What is reconstruction?

Different mocap systems define this differently.  I’m going to define it as the task of taking the 2d data from a calibrated multi-camera system and making it 3d.  This is pretty analogous to what the OptiTrack point cloud SDK does.  I’m going to skip calibration for this blog entry.  You can assume that we have a set of cameras and we know where they are in 3d space.  We know their orientation and we know their lensing.  I’m also going to skip the 2d aspects of the process.  You can assume that each camera is providing us with multiple objects (I call them blips) and, most importantly, their position on the camera’s backplane (where they are in the camera’s view in 2d).  There is one other thing you must assume.  You must assume that there is error in all those values.  Not just the blips, but also the camera positions, orientations, and lens data.  We hope they’re all close, but they’re not perfect.  Reconstruction is the task of taking that data and turning it into 3d points in space.
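
To make those inputs concrete, here’s a rough Python sketch of the kind of data we’re starting from.  This assumes an idealized pinhole camera with no lens distortion, and the names (Camera, Blip, focal_length) are mine for illustration, not anything out of the OptiTrack SDK.

```python
# Rough sketch of the reconstruction inputs, assuming an idealized pinhole
# camera model. Field names and conventions are illustrative only.
from dataclasses import dataclass
import numpy as np

@dataclass
class Camera:
    position: np.ndarray   # nodal point in world space, shape (3,)
    rotation: np.ndarray   # world-from-camera rotation matrix, shape (3, 3)
    focal_length: float    # lens focal length, same units as backplane coords

@dataclass
class Blip:
    x: float               # 2d position on the camera's backplane (imager)
    y: float
```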

Rays

Well, the simplest way to think about it is as follows.  Each blip can be projected out into space from the camera’s nodal point (the center of the virtual lens) based on where it is in 2d on that camera’s backplane (virtual imager).  Shooting a point out is generally referred to as shooting a ray or vector.  Any two rays from two separate cameras that intersect are likely to be intersecting at a point in 3d space where there is an actual marker.  This is the reverse of the process by which the cameras saw the real marker in the first place.  Rays of light bounce from the marker, through the lens’s nodal point, and onto the backplane imager, where they are encoded and sent to the computer (that’s a little oversimplified, but you get the idea).  If a third ray intersects as well, it’s FAR more likely to be a marker than a coincidence (you’d be surprised how often you end up with coincidences running at 100fps+).  So, while you can reconstruct a 3d point from as few as two rays, if you have enough cameras to spend on verification, you’ll get fewer fake markers by requiring that 3 or more rays agree.
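
Here’s what shooting a ray might look like in code, again assuming the idealized pinhole camera from the sketch above; a real system would also undistort the blip through the lens model first.

```python
import numpy as np

def blip_to_ray(cam_position, cam_rotation, focal_length, blip_xy):
    """Shoot a 2d blip out of the camera as a ray in world space.

    cam_position: (3,) nodal point.  cam_rotation: (3, 3) world-from-camera
    rotation.  blip_xy: (x, y) on the backplane.  Returns (origin, unit dir).
    """
    # In camera space the ray leaves the nodal point and passes through the
    # blip's spot on the backplane, one focal length down the view axis.
    direction_cam = np.array([blip_xy[0], blip_xy[1], focal_length])
    direction_world = cam_rotation @ direction_cam
    return cam_position, direction_world / np.linalg.norm(direction_world)
```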

This is often referred to as Triangulation

It’s probably worth noting that this is not the typical triangulation you’d use when, say, calculating the epicenter of an earthquake by knowing its distance from known points.  It’s a different type of triangulation, or should I say, a different subset of triangulation operations.

Residuals 

Sorry, did I say those rays intersected?  That was a mistake.  The rays never intersect.  They just pass closely together.  See, that error I was talking about gets in the way.  So what your typical mocap app will do is deal with residuals, to basically say "it’s close enough."  When you give a NaturalPoint system a residual for its point cloud reconstruction, you are telling it that rays that pass within that distance of each other should be considered to have intersected at the point where they come closest.  A high residual could suck discrete markers together into one larger marker if they are close enough.  A low residual could keep rays from intersecting and result in low marker counts per frame.  You’ll want to balance your residual against the overall accuracy in your volume.  You can get an idea of the accuracy of your volume by looking at the residuals it gives you at the end of the calibration process.  Or, you can just mess around with it.  You’ll also want to pay attention to the units.  Different systems measure residuals in different units: meters, centimeters, millimeters, etc.
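
Here’s a sketch of how a residual test between two rays might work.  The math is the standard closest-approach-of-two-lines calculation; calling the midpoint of that closest-approach segment the reconstructed marker is my assumption for illustration, not necessarily exactly what any particular system does.

```python
import numpy as np

def intersect_rays(o1, d1, o2, d2, residual):
    """Closest approach of two rays (origins o*, unit directions d*).

    Returns the midpoint of the closest-approach segment if the rays pass
    within `residual` of each other, otherwise None.
    """
    w = o1 - o2
    b = d1 @ d2
    d = d1 @ w
    e = d2 @ w
    denom = 1.0 - b * b
    if denom < 1e-12:          # rays are (nearly) parallel
        return None
    t1 = (b * e - d) / denom
    t2 = (e - b * d) / denom
    p1 = o1 + t1 * d1          # closest point on ray 1
    p2 = o2 + t2 * d2          # closest point on ray 2
    if np.linalg.norm(p1 - p2) > residual:
        return None            # not close enough -- no marker here
    return 0.5 * (p1 + p2)     # call the midpoint the reconstructed marker
```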

Angles

There are other factors that play into the accuracy of a reconstruction.  If two rays have a similar angle (they are more parallel than perpendicular), the accuracy of their reconstruction goes down significantly.  It’s harder to determine accurately at what distance they intersect, as a little inaccuracy in the angles translates to a potentially long distance.  Most of the inaccuracy lands on the depth axis.  If you have rays that are more perpendicular, their inaccuracy is spread evenly along all three axes of potential movement, rather than the one depth axis.  Therefore, most NaturalPoint reconstruction parameters include a threshold for the minimum angle between rays.  Rays that intersect but at less than the minimum angle are ignored.  The units are important here as well.  I believe they tend to be in radians rather than degrees.
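
A minimum-angle check might look something like this.  This is just the obvious dot-product test, not NaturalPoint’s actual code, and note that the threshold here is assumed to be in radians.

```python
import numpy as np

def angle_ok(dir1, dir2, min_angle_rad):
    """Reject ray pairs that are too close to parallel for a stable depth.

    dir1, dir2: unit direction vectors.  min_angle_rad: threshold in radians.
    We take the angle between the lines (abs of the dot product), since
    nearly anti-parallel rays are just as badly conditioned as parallel ones.
    """
    angle = np.arccos(np.clip(abs(dir1 @ dir2), 0.0, 1.0))
    return angle >= min_angle_rad
```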

Min and Max Distance 

These are simple.  After a point has been reconstructed, it can be tossed due to its distance from the cameras whose rays produced it.  One really good reason for this is that the light sources on the cameras can flare out objects that are too close, generating WAY too many phantom blips.  Ignoring points that reconstruct that close is a safe bet.  Likewise, throwing out markers that reconstruct far into the distance is also safe, though often not needed.
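
For completeness, a distance cut could be as simple as this sketch.  Whether real systems measure against every contributing camera or just the nearest one is a detail I’m guessing at here.

```python
import numpy as np

def passes_distance_cut(point, camera_positions, min_dist, max_dist):
    """Toss reconstructions that sit too close to (or too far from) the
    cameras whose rays produced them -- too close usually means flare."""
    return all(min_dist <= np.linalg.norm(point - cam) <= max_dist
               for cam in camera_positions)
```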

Hopefully, these basic concepts help explain what is going on inside the black box.  This should give you plenty of concepts with which to start working with camera placement to get better performance out of an optical mocap system.  An obvious freebie would be: don’t place cameras that are meant to cover the same space too close together.  The angle between rays seeing the same object will be too low to get an accurate reconstruction.

How Mocap Works Series

I’m going to write a series of blog entries about optical motion capture and how it works.  Knowing what’s going on inside a mocap system can help an operator better utilize it.  The series will focus on NaturalPoint’s OptiTrack cameras and systems, with references to other mocap systems and ideas.  It will also occasionally diverge into descriptions of Kinearx, because my mind is a bit jumbled.  Sorry.

WPF 3D’s Wrong Direction

So there’s this new kid on the UI block.  It used to be called Avalon.  Now it’s called WPF.  It’s probably the most unsung and misunderstood part of Vista by the general public and programmers alike.  It’s actually pretty awesome.  I’ll explain what it is and how I use it in a bit.  There’s a problem though.  WPF is the next generation UI for Windows, and as such, it seeks to integrate 3d as a first class citizen into the UI environment.  It fails miserably, and I’ll give an in-depth explanation and suggestions as to what needs to change, IMHO.

What is WPF and XAML?

WPF is the brand spankin’ new UI system in Vista, which has apparently been backported to XP as well.  It’s part of the often misunderstood .net framework.  Make no mistake though, WPF is meant to replace the venerable win32 forms GUI that we’ve been living with since before time (read: Windows 95, and a little earlier in some form or another).  WPF is the future of Windows programming.

Rather than just being a revision, WPF is a new approach to GUI construction.  It’s based on a dependency graph model.  People who are familiar with Maya or XSI are familiar with dependency graphs, as 3d scenes in those programs are actually large dependency graphs.  Basically, you build your GUI as a dependency graph and you connect it to your application data, which is also best structured as a dependency graph.  What this means is that your GUI is not really hardcoded.  It’s authored the way you would author an Illustrator graphic or a 3d scene.  It just needs to know how to connect itself to your app’s data, and it kind of runs on its own.  It will then proceed to display itself based on your data, and if you authored it to, change your data.

XAML is the XML-based file format in which you can author WPF dependency graphs (read: UIs and UI pieces).

It’s all about connections.  You define data sources and UI widgets and you connect them all up.  And when you run it, it just works.  It’s great.

So what’s wrong with 3D?

There’s nothing wrong with adding 3d to this mix.  In fact, it’s a great idea.  It’s HOW they did it that bothers me.  One of the cardinal rules of 3d programming is that 3d is usually not any harder than 2d.  It’s just one more dimension.  So if you can solve a problem in 2d, you can usually adapt that solution to 3d by adding an extra dimension.  This can be as simple as adding an extra parameter, or one more layer of nested looping.  You name it.

So, let’s take an example usage scenario.  Let’s say that I’ve got a series of 2d points (unknown quantity) that I need displayed as dots on a black rectangle.  Oh, I dunno… maybe it’s the 2d blips coming from my mocap cameras.  Seems like a reasonable scenario.  These points will be replaced with a new collection of points, say, 100 times a second, so this is essentially a real-time data flow.

WPF makes this easy with what is known as an ItemsControl.  You tell the ItemsControl where to find the collection of Point objects, and it, in turn, keeps an eye on that collection.  When the collection changes, it rebuilds part of the dependency graph.  What it does is take a template for a panel and add children to it.  Those children are also built from a template.  Each of the children is given a Point from the collection as its data source.  So, if I make the panel a simple canvas panel (a Cartesian-coordinate-based layout panel) and I make each item an ellipse that gets its coordinates from the point, I’ve got my 2d mocap viewer.

So that’s great.  It works.  Not only does it draw them, but each of them is its own control and can respond to clicks and events and all kinds of wonderful things.  Go WPF!

But that was just the 2d viewer for camera data.  Let’s build us a 3d viewer for the point cloud.  That should be no problem, right?  WPF does 3d!  In fact, without knowing anything about WPF, you should already sort of know how this should work.  I should have a panel that’s a 3d canvas (a three-axis Cartesian space) and an ItemsControl that puts some templated 3d controls into it.  Right?  OOPS!  WRONG.

See, the problem is, 3D objects don’t inherit from Controls.  There are no Panels either.  You lay down a control that is a 3d viewport, and then everything going on inside is part of a completely different set of objects that don’t interact with the 2d control object model at all unless it’s hardcoded in, object by object.  You can’t template these 3d objects for inclusion into other 3d objects at all.  This is a HUGE problem.

If you look around at every 3d WPF demo out there, you’ll see them all skirting around a very important issue.  They NEVER create an instance of a 3d object for each member of a collection.  They’ll hardcode the number of objects.  They’ll do the work in 2d and then render that 2d image onto a 3d object as a texture.  They’ll manipulate the vertex buffers of a single 3d object based on a collection.  Anything but actually spawn a true 3d object for each member of a collection.  And don’t get me started about the layout of these controls.  It’s the same problem.

What they’ve done is create a 3d engine inside of WPF that can have static data defined by XAML and can have its parameters bound to dependency objects, but they didn’t actually extend WPF into 3d.

It’s actually a mistake that has been made before.  Back in the day, Macromedia had a product called Director.  Director owned the "multi-media" market for a while, until the web and Flash took over.  In Director’s heyday, they wanted to add 3d to it.  Director had an extremely rich UI that allowed the author to manipulate controls (sprites) on a stage (panel) and control them via animation and scripting.  So you’d think the obvious direction to go would be to expand the stage to be able to hold 3d objects and textures and cameras and lights along with 2d sprites.  And then extend the animation keyframing environment to be able to keyframe in 3d.  And then make sure the scripting environment had the same control over the new objects as it did over the 2d objects.  That’s not what they did.

Instead, they made one new type of 2d sprite that was a 3d engine.  There was no way to get inside the 3d engine except via scripting.  They made a valiant effort to fake it by authoring a lot of "behaviors," which are pre-packaged scripts you can apply to sprites.  But that never made 3d break out of its sprite and onto the stage.  Director still exists… it’s just that its market has shrunk to near nothing compared to its potential.

This is EXACTLY the mistake Microsoft seems to have made with WPF’s 3d.  They put 3d in a black box as an engine rather than breaking open the box and making it part of the actual WPF controls API.  In so doing, all they’ve really done is create yet another 3d programming API that’s in competition with DirectX and OpenGL, when they should have been creating a 3d UI API.

So what should they do?

Well, they should start over.  Not WPF, just the 3d part, which happens to be relegated to a separate namespace anyway.  The existing mess can continue to exist for all I care, just as long as they make a new namespace and actually extend WPF into 3d rather than blackboxing 3d into a control in WPF.  It’s really quite obvious.  Subclass Panel (or refactor Panel to a higher level interface that you can implement) into a Panel3D.  Subclass Control (or refactor, etc.) into a Control3D.  And move on from there.  Panels and Controls work together to create layout, and the rest is simple.  Just make sure the ItemsControl works with it and templates work with it.  Just as in normal 2d WPF, Panels need not all be Cartesian-based.  My favorite, the DockPanel, can easily be extended into 3d based on bounding boxes and scales, for example.  Make sure NOT to seal the Panel3D, so developers can create their own layouts for, say, a spinning cylindrical layout, etc.

What about your 3d point cloud viewer?

Luckily, my UI is a prototype UI meant to be replaced.  And luckily, WPF’s overall design lets me replace the UI with very little effort.  So what I did for now is write a 3d object that takes a collection of markers as a binding.  It then goes through its vertex buffer and moves a bunch of vertices around to make a bunch of sphere shapes (read: vertex animation).  Unfortunately, since these are not independent objects, they don’t have individual color or the ability to detect mouse clicks.  They’re all one big mesh.  It’s good enough for now.  When it ceases to be good enough, I may have to break out of WPF and write a pure DirectX viewer, which would be a shame, since it’s clear WPF is meant for exactly this kind of thing.

Kinearx is Coming: Humble Processing

One of the design departures that separates Kinearx from the competition is its approach to processing.  I’ve seen a number of motion capture solutions in multiple software packages, and I’ve identified what I believe to be the single trait that holds most of them back: arrogance.  Don’t get me wrong, I’m pretty arrogant myself.  I’m talking about a very specific kind of arrogance.  It’s arrogant to assume that one’s algorithm is going to be so good that it will be able to make sense of mocap data on its own, in a single pass, and be right about it.

Now you might be thinking, "Brad, that’s silly.  All these programs let you go back and edit their decisions.  They all let you manually fix marker swaps and such.  They’re not assuming anything.  You’re blowing things out of proportion."  Ah, but then I ask you, why should an algorithm make a decision at all?  Why should you need to fix a marker swap that the algorithm has put into reality?

Kinearx approaches the way mocap data is processed in a way I would term a "humbled" approach.  Kinearx knows what it doesn’t know.  The design acknowledges that everything is a guess, and it’s completely willing to give up on assumptions should evidence point to the contrary.  The basic data structure that operators work on is one of recommendations, statistics, and heuristics, rather than one of "the current state of the data, and what I’m going to change about it."  A typical labeling process can consist of running some highly trusted heuristic algorithms that make recommendations on labeling at points of high confidence.  It can also consist of applying less trusted heuristics that are wider in temporal scope.  The recommendations are weighted accordingly when they are eventually committed to the data as decisions.  Algorithms can peek at the existing recommendations to hint them along.  Manual labeling operations can be added to the list of recommendations as having extremely high confidence.  Algorithms can even go as far as to cull recommendations.  The difference between Kinearx and other mocap apps is that this recommendation data lives from operation to operation.  It lives through manual intervention, and as such, is open to being manipulated by the user, either procedurally or manually.
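
To illustrate the flavor of this (and this is purely a simplified sketch to explain the concept, not Kinearx’s actual data structures), imagine labeling recommendations as lightweight records that pile up across operations and only get collapsed into decisions by a confidence-weighted vote:

```python
# Simplified sketch of the "recommendation" idea -- illustrative names only.
from dataclasses import dataclass
from collections import defaultdict

@dataclass
class LabelRecommendation:
    frame: int          # frame the recommendation applies to
    marker_id: int      # which unlabeled 3d marker in that frame
    label: str          # proposed label, e.g. "LWristOut"
    confidence: float   # trust in the source (manual edits near 1.0)
    source: str         # which heuristic or operator produced it

def commit_labels(recommendations):
    """Collapse accumulated recommendations into per-marker decisions."""
    votes = defaultdict(float)
    for r in recommendations:
        votes[(r.frame, r.marker_id, r.label)] += r.confidence
    decisions = {}
    for (frame, marker, label), weight in votes.items():
        key = (frame, marker)
        if key not in decisions or weight > decisions[key][1]:
            decisions[key] = (label, weight)
    return {key: label for key, (label, _) in decisions.items()}
```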

The power of this system will become apparent when looking at the pipeline system, which provides a streamlined environment in which to apply processing operations to data both procedurally and manually.

Kinearx is Coming: Influences

Kinearx will be FIE’s flagship motion capture software offering.  It’s currently in development and not quite ready for show.  However, it is taking a strong shape and is becoming functional, and I’d like to explain the goals and underpinnings of the system.  So, I’ll start with a blog entry about influences.

Software isn’t created in a vacuum of knowledge.  In this period of particularly vicious intellectual property warfare, it might even be dangerous to acknowledge any influence whatsoever.  That would not sit right with me in the long run, however.  Also, I think acknowledging and explaining influences can keep design and goals on track.  So here they are, in no particular order:

  • NaturalPoint’s commoditization of the motion capture hardware market:  The commoditization of the hardware means one of two things.  Either a wide portion of the consumer market will have NP MC hardware, or a wide portion of the consumer market will have some form of competing commodity MC hardware.  Either way, the market will grow.  And more importantly, the size of the new market will so overshadow the old market that previously existing systems will become mostly irrelevant.  So it will basically wipe the slate clean.  This means Kinearx is in a good position to come in and take a large chunk of the new commodity MC software market that will accompany the hardware market.  It’s important not to lose focus on this target market.
  • Vicon IQ’s generalization and configurability:  IQ doesn’t recognize humans, it recognizes kinematic mechanical models.  It then has useful templates of those models for humans. 
    Generally, every operation exposes just about every parameter you could imagine, and many you just don’t want to know about.  It provides reasonable defaults.
  • Vicon IQ’s flexible processing:  While I think there are some serious flaws in their design, the overall concept of the post processing pipeline is a good one.  IQ stops short where it should expand further.
  • Vicon Blade:  Blade is a combination of IQ and HOM Diva.  The Diva half of it, I’m sure, is fine.  The IQ half of it is really disappointing.  In combining it into Blade, they’ve lost sight of what was good about it, and in turn lost functionality.  I’ll be using IQ on Vicon system sessions for some time to come, as I suspect a large portion of their existing customer base will be.
  • Giant’s Autonomous Reliability: Giant’s software systems are proprietary, so you’d have to spend some time on set with them to observe this.  I had the opportunity to spend weeks at a time on The Incredible Hulk with one of their crews/systems.  The specifics of the software aside, the fact of the matter is, they were able to generate extremely clean data from arbitrarily complex takes overnight with minimal human intervention.  What does this look like?  Well, from what I could tell, the operator would scroll through the take, letting the system try to ID the markers (which it kinda did right on its own most of the time anyway).  When they found a frame that it got right, they marked it and sent it off for batch processing.  The system took care of the rest.  I’d come back in the morning and they’d offload a day’s worth of cleaned-up mocap directly to my hard drive.  Were there points where it interpolated the fingers?  Sure.  Did it matter?  No.  To put this in perspective, it was able to do this with two guys freestyle sparring, Greco-Roman wrestling style.  So you know, rolling around on the floor grappling.
  • Softimage XSI user interface: XSI has problems.  Believe me, I can list them.  But their UI is something special.
  • Motion Builder’s Ubiquity:  Motion Builder is an amazing piece of software, but it’s meant to come into the mocap process too late and isn’t really trying to solve the hard problems early in the process.  Optical cleanup is missing half the tools it should have.  However, in the industry, Motion Builder is synonymous with motion capture.  Unless you are having your mocap vendor do all the cleanup and retargeting, you kind of need it (really big jobs can license software from Giant in some circumstances, and Blade seeks to take MB’s place, but those solutions have not solidified as general solutions just yet).
  • Motion Builder’s Bad Reputation:  Fact of the matter is, not many people like Motion Builder.  When I mention it, I get sneers.  I’ve had a lot of trouble reconciling the volume of sneers I get from otherwise reasonable professionals with the software and its production capabilities that I’ve come to know and rely on.  It is my opinion that MB’s bad reputation comes from the complete and utter lack of production-worthy training and demo material.  No one knows how to use it or what is actually in there.  It’s all rather well hidden, and unless someone trains you and shows you through it, you won’t know it’s there.  Also, at this point it should be a no-brainer to plug MB directly into a Maya character animation job.  However, I had to develop a ton of custom tools and workflows to accomplish it for PLF.  Add that to the fact that the 3d industry tends to be more religious and less scientific about its tools and techniques, and well, you end up in a bad spot.