Professional Software Forums

This is a rant.  If you do not like rants, move on.

So, I’ve frequented a number of professional level media creation software forums in my time.  I even worked with the creator of XSIBase for a couple of years (professionally in animation, not on XSIBase itself). I’ve also spent quite a bit of time in forums for MMOGs (Massively Multiplayer Online Games) both as a member and as a moderator.

As you might imagine, the MMOG forums are cesspools of childish behavior, ego jockeying and all-around nastiness. They're also a lot of fun. And no one really expects them to be anything other than a waste of time.

Professional forums, however, are a different matter. What I'm talking about here are forums for software like Maya, XSI, Avid, Final Cut etc. The expectations for these forums are different. They're a lot of things to a lot of people, and I'd like to rant about some of the typical forum diseases I see a lot.

Top Dog Syndrome

This syndrome is common. Users who play into this syndrome have their egos attached to an image of being a top dog within the forum. They feel that they are seen as having superior technical or professional ability and they have to keep that image up. When this is practiced to a minor degree, it's beneficial, as it provides a push toward answering questions. But it gets out of hand more often than not and has seriously detrimental effects.

Now, I KNOW I have a tendency toward falling into this syndrome myself. And I try to keep a check on it. That being said, it's difficult. Specifically because I actually am much more learned and experienced on most of these matters than the general professional community. My usual position within an animation team is that of "Technical Director," which is defined as being the guy/gal with all the answers on technical and technique matters. So I'm paid to be top dog.

So, what I use to define a healthy top dog versus an unhealthy top dog is whether the pressure to answer questions results in answers that should never have been given. When a person on a forum starts skimming questions and responding with erroneous information, it's a problem. And more importantly, some forums are pandemic with it, to the point that it's the norm.

For example, I logged into the central forum for a very high end video editing application recently, because I was having a problem with a feature. After searching the forum for answers and reading the documentation, I was unsure if what I was seeing was user error, a bug, or a design limitation. I did have enough evidence that I was fairly certain it was user error or a bug, as I had been able to force the software to work correctly under some very specific settings that were, unfortunately, not good enough to let me work in the general case. Anyhow, I posted a good detailed explanation of what I was seeing, what I thought I should be seeing, forum threads that had talked about similar problems, and what resulted when I tried to implement the recommendations in those threads. I asked if anyone had any experience or ideas related to what I was seeing. So I wrote what I believe to be any professional forum's dream post.

What did I get?  I got a guy with over 20,000 posts responding almost immediately with a suggestion that I was doing everything wrong and should change my entire workflow.  He also recommended that I read the manual.

Now, I’ve been using non-linear editing software for over 12 years at this point.  I was using it professionally for broadcast before a single system cost less than $200,000.  I’m currently acting as a combo post production supervisor and visual effects supervisor on a feature film and I know more about video compression and post production workflow than most people with the title of "post production supervisor" on this planet. I work with the software.  I can write the software.  I developed those skills in a professional environment as the technology developed over the past decade.

So as you might imagine, when a 20,000 post top dog fails to actually read my post and comprehend it, and gives me a canned line for amateurs, I don't say "thank you" and throw away my post production pipeline because he said so. Instead, I posted that while the advice was appreciated, it's not valid due to reasons a, b, c, d and e that were explained in the initial post. I also added that I'd prefer it if the thread remained focused on the features I was having an issue with, rather than commenting on the general post production workflow. This, again, is a professional way of dealing with the issue. Keep the thread on target. And if the problem is not solved, don't let the thread die, if for no other reason than that others will find the thread when they run into the problem, and they're as entitled to a reasonable conclusion as you are.

So, a 2,000 post user then came to his defense and reaffirmed that I was doing everything wrong and made even more suggestions that were immediately invalidated by the information in the original post (he didn’t read it).

So what's really going on here? It's top dog syndrome. They are not actually interested in the problem. They're interested in being seen answering my question, especially since my tone and technical explanation indicate that I'm a threat to their top dog status. By composing an initial post that's very high level, I've put myself in their line of fire. I'm a threat and they have to respond.

There was a little back and forth while I refuted their claims with tests and information to the contrary. They continued to tell me everything I was doing was wrong. I made an extreme effort not to make personal attacks and to stop at the level of suggesting the topic was steering off course.

Eventually I gave up, frustrated and angry. I posted a quick rant at the end of the thread where I declared the forum useless due to a focus on ego polishing and rampant misinformation.

I then proceeded to investigate the problem further myself until I was convinced I understood the behavior enough to classify it as a bug or design flaw. Either way, at that point it should actually be submitted to the developer in the form of a bug report. It's also clear at that point that you won't get any relief from it. Possibly ever. Just because you can isolate a bug, give full repro steps and get it into a developer's system doesn't mean that it's ever going to be fixed. In fact, it often won't be if the developer is large enough. Internal politics and bureaucracy almost always get in the way. So at that point, if the functionality is important, you have to find another solution (workaround). And that's what forums at this level are really for. They allow exchange of information on bugs and software misbehaviors. More importantly, they provide workarounds and ideas. But this particular forum was not serving that purpose and probably never will. All because of the rampant top dog syndrome. The result of my attempts to combat it in just my one little post, because I really needed someone to take the problem seriously? I was belittled and attacked. Some forums are beyond help.

In the past, I've been able to overcome the top dog issue with the approach I tried here. Generally, repeated appeals to fact and reason force the top dogs to actually deal with the problem in order to be seen ultimately solving it, or at least be part of the confirmation of the problem. However, that only seems to work if the larger forum community is technical enough to see those facts and reasons for what they are, even if they can't provide an answer. If the top dog feels the general community is smart enough to see them messing up, they'll try to save their skin. Film and video editors working in a generally Macintosh community do not meet that threshold, and therefore the top dogs in that community had no fear of being seen playing ego games when it's clear to a technically inclined individual that there's something wrong going on. So it didn't work. And I declared the forum a lost cause.

Professionals vs. Amateurs vs. Prosumers

These forums tend to be populated by users at varying levels of usage. Amateurs tend to be looking for training and answers to questions that require a certain level of expertise to research on one's own. These users drive professionals mad, because they often ask to be able to do incredibly complex or difficult things without actually studying and training enough to even understand what they're asking. Add to that, they often belittle the fact that it does take a lot of training and dedication. They often feel a certain level of entitlement to these more difficult techniques, but don't feel the answer that it requires time and experience is fair, and it breeds anger.

Prosumers are people who see themselves as professional but are actually unaware of what the professional level actually means. For example, animators who work on projects of 30 to 300 seconds with teams of fewer than 10 people. They don't comprehend the issues involved with projects of 20 or more minutes with teams of 50 – 500 people. They think it's just a matter of hiring more people and being organized. So their responses and approaches to issues are often not scalable and would bring a full scale production to a halt. But they and their peers don't understand that and therefore are unable to evaluate or comprehend it. These users make up the majority of the user base. These types of users are frustrating to professionals but often not infuriating. They're frustrating for a number of reasons. Firstly, because they often spurn the advice of professionals, not fully understanding it and seeing it as overly complex. Second, because the software is usually written for prosumers and not professionals. The developers often confuse the prosumers for the professionals and cater to them, often creating features that are useless in a professional environment at the expense of professional level features or functionality. Thirdly, it is the prosumer user base that professionals recruit from, and it's frustrating to see the prosumer base become accustomed to working in a non-scalable manner, because you know you're going to have to retrain them when you eventually recruit them. Both they and you would be better off if they'd just listen and try to understand… but well, that won't happen. So you just let it go and move on. But the chorus of prosumer voices completely overpowers the professional voice.

What's the solution? The forum moderators need to try to categorize their forums. Create subforums. Create beginners' forums. Create topic forums. This keeps everyone from getting in everyone else's way. This is the way XSIBase is organized, actually. And it's a good approach. You'll find most of the professional level users who are concerned with scalable solutions in the "scripting" and "programming" forums.

WingIDE for Python

Thought I’d just put in a quick shout out for my favorite Python coding tool, ever.  WingIDE from WingWare. WingIDE is by far the best Python coding environment I’ve ever used.

I know the first question that comes to mind when looking at the price tag: "With all the free Python IDEs and script editors, why bother buying one? They're all about the same." Well, that's mostly true. Most Python script editors I've used are about the same. They provide some mediocre code completion and code folding. Not bad… just not as good as it could be.

For me, it's all about code completion. Smart code completion. The kind that reads APIs on the fly, knows what kind of object you're working with, and tells you what is possible with that object. It's sort of a combination of code completion and an object browser. Visual Studio is renowned for its ability to do this on the fly. Most good Python script editors attempt this level of completion but they're confounded by Python's dynamic typing. I'll give an example.

[code] 

import xml.dom.minidom

def myFunction (doc, element):
    pass

[/code]

So, here's the question. Since Python has dynamic, loose typing (the opposite of static typing), when I try to code with the objects doc and element, how is the editor to know what types of objects they are so it can tell me what I can do with them? It might be able to look at the code that is calling the function, but that's backwards. A function can be called multiple times from anywhere. Perhaps with completely different object types. And it's possible both of those calls could be valid. The same problem shows up when trying to figure out what type of object a function returned. There's no rule that a function always has to return the same kind of object. So how could the system know?

It's at this point that most script editors give up. Code completion stops working the moment you get outside the scope of objects you create yourself within a single function.

With WingIDE, you can hint the system and you get your code completion back. All you have to do is put in a particular type of assert statement. For example:

[code]

import FIE.Constraints

def myFunction(obj, const):
    assert isinstance(const, FIE.Constraints.ParentConstraint)

[/code]

From the assert statement on down, code completion works again. There's also an added benefit, in that the script will throw an exception should the assertion fail. Without the assert check, my script could go for another 20 lines working on the wrong type of object and giving me a vague error; the assertion cuts straight to the heart of the matter.

WingIDE will also parse source files for documentation and display it for you as you code, eliminating the need to constantly look up the API docs yourself.

Now, I know there's a hardcore base of programmers out there who say all they need is a text editor and be damned with all these fancy IDEs and their crutches. Well, I simply disagree. I'm sure if you are a coder who has maybe 2 APIs to work with on a regular basis, perhaps that is all you need. But in my job, I am required to learn a new API within a few hours and repeat that as much as necessary. That can sometimes be 2-3 APIs a day. Do I know the full API? No. I know enough to get the job done. And that's what I'm paid to do. For that kind of coding (and scripting, I think, lends itself to that kind of coding more than development does) there is no better tool than WingIDE. Call me a weak coder if you wish. I'll just keep coding, getting the job done faster and better, and keep getting paid to do it. I have a job to do.


How Mocap Works: Trajectorization and Labeling

So far in the series, we’ve started in the middle at reconstruction.  Then we took a step back and talked about reflectivity and markers.  Now, we’re going to move forward again, into the steps after reconstruction.

This article will be a little different from the previous ones, in that it's more theoretical than practical. That is to say, it's the theory of how these kinds of things are done, not necessarily how it's done in Arena or in Vicon's IQ. Both systems are really closed boxes when it comes to a lot of this. I can say that the theory explained here is the basis for a series of operators in Kinearx, my "in development" mocap software. And most of the theory is used in some form or another in Arena and IQ as well. It just may not work exactly as I'm describing it. Also, it's entirely possible I'm overlooking some other techniques. It would be good if this post spurred some discussion of alternate techniques.

So, to review, the mocap system has triangulated the markers in 3d space for each frame.  However, it has no idea which marker is which.  They are not strung together in time.  Each frame simply contains a bunch of 3d points that are separate from the 3d points in the previous and next frames. I’ll term this "raw point cloud data."

Simple Distance Based Trajectorization

Theory: Each point in a given frame can be compared to each point in the previous frame. If the current point is closer than a given distance to a point in the previous frame, there's a good chance it's the same marker, just moved a little.

Caveats: The initial desire here will be to turn up the threshold, so that when the marker is moving, it registers as being close enough. The problem is that the distance one would expect markers to be from one another on a medium to small object is close to the distance they would be expected to travel if the object were moved at a medium speed. It's the same order of magnitude. Therefore, there's a good chance that it will make mistakes.

Recommendation: This can be a useful heuristic. However, the threshold must be kept low. What will result is trajectorization of markers that are moving slowly, or are mostly still. However, movement will very quickly pass over the threshold and keep moving markers from being trajectorized. This technique could be useful for creating a baseline or starting point. However, it should probably be ignored if another more reliable heuristic disagrees with it.
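
To make this concrete, here's a minimal sketch of a distance based linking pass, assuming each frame is just a list of (x, y, z) tuples. This is purely illustrative; it is not code from Arena, IQ or Kinearx.

[code]

import math

def dist(a, b):
    # Euclidean distance between two (x, y, z) points.
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

def link_by_distance(prev_frame, cur_frame, threshold):
    """Greedily link each point in cur_frame to the nearest unused point in
    prev_frame, but only if it falls under the distance threshold."""
    links = {}          # index in cur_frame -> index in prev_frame
    used = set()
    for i, p in enumerate(cur_frame):
        best, best_d = None, threshold
        for j, q in enumerate(prev_frame):
            if j in used:
                continue
            d = dist(p, q)
            if d < best_d:
                best, best_d = j, d
        if best is not None:
            links[i] = best
            used.add(best)
    return links  # linked pairs are assumed to belong to the same trajectory

[/code]

As noted above, keep the threshold low and treat these links as a weak recommendation rather than a decision.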

Trajectorization Based on Velocity

Theory: When looking at an already trajectorized frame, one can use the velocity of a trajectory to predict the location of a point in the next frame. Comparing every point in the new frame against the predicted location with a small distance threshold should yield a good match. Since we are tracking real world objects that actually have real world momentum, this should be a valid assumption. This technique can also be run in reverse. It can be augmented further by measuring acceleration and using it to modify the prediction.

Caveats: Since there is often a lot of noise involved in raw mocap data, a simple two frame velocity calculation could be WAY off. A more robust velocity calculation taking multiple samples into consideration can help, but increases the likelihood that the data samples are from too far back in time to be relevant to the current velocity and acceleration of the marker (by now, maybe the muscle has engaged and is pushing the marker in a different direction entirely). An elastic collision will totally throw this algorithm off. Since the orientation of the surfaces that are colliding is unknown to the system, it's not realistic for it to be able to predict direction. And since most collisions are only partially elastic, the distance can not be predicted. Therefore, an elastic collision will almost always result in a break of the trajectory.

Recommendation:  This heuristic is way more trustworthy than the simple distance calculation.  The threshold can be left much lower and should be an order of magnitude smaller than the velocity of a moving marker.  It can also be run multiple times with different velocity calculations and thresholds.  The results should be biased appropriately, but in general, confidence in this technique should be high.
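
A rough sketch of the prediction and matching step, on the same hypothetical data structures (a trajectory here is just a time ordered list of positions), using a constant velocity assumption:

[code]

def predict_next(trajectory):
    """Predict the next position of a trajectory from its last two samples
    (constant velocity). trajectory is a time ordered list of (x, y, z)."""
    (x1, y1, z1), (x2, y2, z2) = trajectory[-2], trajectory[-1]
    return (2 * x2 - x1, 2 * y2 - y1, 2 * z2 - z1)

def match_by_velocity(trajectory, cur_frame, threshold):
    """Return the index of the point in cur_frame closest to the predicted
    position, or None if nothing falls under the threshold."""
    pred = predict_next(trajectory)
    best, best_d = None, threshold
    for i, p in enumerate(cur_frame):
        d = dist(p, pred)  # dist() from the previous sketch
        if d < best_d:
            best, best_d = i, d
    return best

[/code]

A more robust version would fit the velocity (and acceleration) over several samples, with the tradeoffs described in the caveats above.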

Manual Trajectorization

Theory: You, the human, can do the work yourself. You are trustworthy. And it's your own fault if you're not.

Caveats:  Who has time to click click click every point in every frame to do this?

Recommendation:  Manual trajectorization should be reserved for extremely difficult small sections of mocap, and for sparse seeding of the data with factual information.  Confidence in a manual trajectory should be extremely high however.

Labeling enforces Trajectorization

Theory:  If the labeling of two points says they’re the same label, then they should be part of the same trajectory.

Caveats:  Better hope that labeling is right.

Recommendation: We're about to get into labeling in a bit. So you might think of this as a bit of a circular argument. The points are not labeled yet. And they're trajectorized before we get to labeling. So it's too late, right? Or too early? Not necessarily. I can only really speak for Kinearx here, not Arena or IQ. However, Kinearx will approach the labeling and trajectorization problems in parallel. So in a robust pipeline, there will be labeling data and trajectorization data available. The deeper into the pipeline, the more data will be available. So, assuming you limit a trajectorization decision to labeling data that is highly trusted, this technique can also be highly trusted.

Trajectorization enforces Labeling

Theory: If a string of points in time is trajectorized, and one of those points is labeled, all the points in the trajectory can be labeled the same.

Caveats: Better hope that trajectorization is right.

Recommendation: Similar to the previous technique, this one is based on execution order. IQ uses this very clearly. You can see it operate when you start manually labeling trajectories. The degree to which Arena uses it is unknown, but I suspect it's in there. Kinearx will make this part of its parallel solving system. It will also likely split trajectories based on labeling, if conflicting labels exist on a single trajectory. I prefer to rely on this quite a bit. I prefer to spot label the data with highly trusted labeling techniques, erring on the side of not labeling when unsure, and have this technique fill in the blanks.
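
Here's a sketch of that propagate and split behavior, assuming a trajectory is stored as a list of per frame point records where a few of them carry a trusted label (the field names are hypothetical, not lifted from any of the packages mentioned):

[code]

def propagate_labels(trajectory):
    """trajectory: list of dicts like {'pos': (x, y, z), 'label': 'LKNE' or None}.
    Splits the trajectory wherever two different trusted labels conflict and
    fills the label in across each resulting piece."""
    pieces, current, current_label = [], [], None
    for point in trajectory:
        label = point.get('label')
        if label and current_label and label != current_label:
            pieces.append((current_label, current))  # conflicting label: split here
            current, current_label = [], None
        current.append(point)
        if label:
            current_label = label
    pieces.append((current_label, current))
    for label, piece in pieces:  # fill in the blanks along each piece
        for point in piece:
            point['label'] = label
    return pieces

[/code]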

Manual Labeling

Theory: You, the human, can do the work yourself. You are trustworthy. And it's your own fault if you're not.

Caveats:  Who has time to click click click every point in every frame to do this?

Recommendation: Manual labeling should be reserved for extremely difficult sections of mocap, and for sparse seeding of the data with factual information. Confidence in a manual label should be extremely high, however. When I use IQ, I take an iterative approach to the process and have the system do an automatic labeling pass, to see where it's having trouble on its own. I then step back to before the automatic labeling pass and seed the trouble areas with some manual labeling. Then I save and set off the automatic labeling again. Iterating this process, adding more manual labeling data, eventually results in a mostly correct solve. Kinearx will make sure to allow a similar workflow, as I've found it to be the most reliable to date.

Simple Rigid Body Distance Based Labeling

Theory: If you know a certain number of markers to move together because they are attached to the same object, you can inform the system of that fact. It can measure their distances from one another (calibrate the rigid body) and then use that information to identify them on subsequent frames.

Caveats:  Isosceles triangles and equilateral triangles cause issues here.  There is a lot of inaccuracy and noise involved in optical mocap and therefore, the distances between markers will vary to a point.  When it comes to the human body, there is a lot of give and stretch.  Even though you might want to treat the forearm as a single rigid body, the fact is, it twists along its length and markers spread out over the forearm will move relative to one another.

Recommendation: This is still the single best hope for automatic marker recognition. When putting markers on objects, it's important to play to the strengths and weaknesses of this technique. So, make sure you vary the distances between markers. Avoid making equilateral and isosceles triangles with your markers. Always look for a scalene triangle setup. When markering similar or identical objects, make sure to vary the marker locations so they can be individually identified by the system (this includes left and right sides of the human body). If this is difficult, consider adding an additional superfluous marker on the objects in a different location on each, simply for identification purposes. On deforming objects (such as the human body), try to keep the markers in an area with less deformation (closer to bone and farther from flesh). Make good use of slack factors to forgive deformation and inaccuracy. Know the resolution of your volume. Don't place markers so close that your volume resolution will get in the way of an accurate identification.
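
A brute force sketch of the idea: calibrate the rigid as its pairwise marker distances, then try assignments of candidate points to those markers and keep the assignment whose distances all match within a slack factor. Real systems prune this search far more cleverly; this sketch also reuses the dist() helper from earlier.

[code]

from itertools import permutations

def calibrate_rigid(labeled_points):
    """labeled_points: dict of label -> (x, y, z) from a trusted frame.
    Returns the pairwise distance table that defines the rigid body."""
    labels = sorted(labeled_points)
    return {(a, b): dist(labeled_points[a], labeled_points[b])
            for a in labels for b in labels if a < b}

def label_rigid(rigid, labels, candidates, slack=0.01):
    """Try every assignment of candidate points to the rigid's marker labels
    and return the best one, as long as every inter-marker distance is within
    the slack factor of its calibrated distance (same units as the volume)."""
    best, best_err = None, None
    for perm in permutations(candidates, len(labels)):
        err, ok = 0.0, True
        for (a, b), d_cal in rigid.items():
            d = dist(perm[labels.index(a)], perm[labels.index(b)])
            if abs(d - d_cal) > slack:
                ok = False
                break
            err += abs(d - d_cal)
        if ok and (best_err is None or err < best_err):
            best, best_err = dict(zip(labels, perm)), err
    return best  # dict of label -> point, or None if nothing fit

[/code]

Scalene marker placement matters precisely because it keeps those pairwise distances unambiguous, so only one assignment survives the slack test.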

Articulated Rigid Body Distance and Range of Motion Based Labeling

Theory: This is an expansion of the previous technique to include the concept of connected, jointed or articulated rigid body systems. If two rigids are connected by a joint (humerus to radius in a human arm, for example), the joint location can be considered an extra temporary marker for distance based identification on either rigid. Therefore, if one rigid is labeled enough to find the location of the joint, the joint can be used to help label the other rigid. Furthermore, information regarding the range of motion of the joint can help cull misidentifications.

Caveats: It's possible that the limits on a joint's rotation could be too restrictive compared with the reality of the subject, and cull valid labels.

Recommendation: This is perhaps the most powerful technique of all. It's nonlinear and therefore somewhat recursive in nature. However, most importantly, it has a concept of structure and pose and therefore can be a lot more intelligent about what it's doing than other more generic methods. It won't help you track a bunch of marbles or a swarm of ants, but anything that can be abstracted to an articulated jointed system (most things you'd want to mocap) is greatly assisted by this technique. You can also go so far as to check the pose of the system from previous frames against the current solution to throw out labeling that would create too much discontinuity from frame to frame.
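
A crude sketch of the "joint as an extra temporary marker" idea, building on the rigid labeling sketch above. It assumes the joint location can be approximated as a fixed weighted combination of the parent rigid's markers (weights coming from calibration) and it leaves the range of motion check out; none of this is lifted from IQ, Arena or Kinearx.

[code]

def joint_from_parent(parent_points, joint_weights):
    """Estimate the joint location as a fixed weighted combination of the
    parent rigid's labeled markers. parent_points: label -> (x, y, z);
    joint_weights: label -> weight (summing to 1, from calibration)."""
    return tuple(sum(parent_points[l][axis] * w for l, w in joint_weights.items())
                 for axis in range(3))

def label_child_rigid(child_rigid, child_labels, joint_distances,
                      candidates, joint, slack=0.01):
    """Label the child rigid as before, then require each labeled marker to
    sit at its calibrated distance from the joint (joint_distances:
    label -> distance). The joint acts as an extra, already known marker."""
    assignment = label_rigid(child_rigid, child_labels, candidates, slack)
    if assignment is None:
        return None
    for label, point in assignment.items():
        if abs(dist(point, joint) - joint_distances[label]) > slack:
            return None  # the best internal fit violates the joint constraint
    return assignment

[/code]

In a fuller version, the joint constraint would help choose among candidate assignments rather than just vetoing the winner, and a range of motion check on the resulting joint angle would cull the rest.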

Conclusion

These techniques get you what you need to trajectorize and label your data.  However, there are plenty of places to go from here.  These steps serve multiple purposes.  They’ll be executed for realtime feedback.  They’ll be the first steps in a cleanup process.  They may be used and their results exported to a 3rd party app such as motion builder.  Later steps may include:

  • more cleanup
  • export
  • tracking of skeletons and rigids
  • retargeting
  • motion editing

IQ, Arena, Blade and Kinearx may or may not support all of those paths. For example, currently, Arena will allow more cleanup. It will track skeletons and rigids. It will stream data into Motion Builder. It will export data to Motion Builder. It will not retarget. It will not get into motion editing. Motion Builder can retarget and motion edit, and it also has some cleanup functionality. IQ will allow more cleanup, export and tracking. It does not perform retargeting or motion editing. Blade supports all of this. Kinearx will likely support some retargeting but will stay clear of too much motion editing in favor of a separate product that will be integrated into an animator's favorite 3d package (Maya or XSI for example).

The next topic will likely be tracking of skeletons and rigids. You might notice that we've kind of gotten into this a bit with the labeling of articulated rigid systems. And you'd be correct in making that identification. A lot of code would be shared between the labeler and the tracker. However, what's best for labeling may not be best for tracking. So the implementation is usually different at a higher level because the goals are different.

How Mocap Works: Markers and Retroreflectivity

The NaturalPoint cameras, as well as your typical Vicon and Motion Analysis systems, are what are known as Optical Motion Capture Systems. More specifically, in their more common configuration, they're Retroreflective Optical Motion Capture Systems. Though they can also be configured as active marker systems. It's just less common.

Diffuse Bounce, Reflectivity and Retroreflectivity

Wikipedia has a page on these different types of reflected light (doesn't it always?). However, it's a bit dense. I'll summarize and provide context.

There are plenty of potential light sources in your mocap space. Light can come through a window. It can come from light bulbs. It can come from the LED ring around the lens of your cameras. When light hits the surface of an object, you tend to think about it as a whole bunch of individual rays, generally coming from the same direction if they come from a single light source, and generally having the same angle (orientation). Anyhow, when the light strikes the surface, lots of different things happen to it. For example, some of the light can be absorbed. The resulting energy needs to go somewhere and can become heat, light, electricity etc. This is how most pigments work. Most of the light is usually not absorbed, however. It's either reflected or refracted. A simplified explanation of refracted light is that it passes through the object, like, say, glass. Reflected light, however, is what we're more concerned with.

Simple reflection, or specular reflection, is what you find in a mirror. The light ray bounces off a surface as per the law of reflection. More important than any one ray following the law of reflection, in a material that has high specularity, most if not all the rays follow the law and end up having a similar angle after being reflected. Hence an image as seen in a highly specular material maintains its general appearance. It doesn't blur or distort beyond recognition. This is true of a mirror as an extreme example. It's also true of, say, car paint. You can see things reflected in car paint and as such, it can be said that a significant number of light rays hitting car paint exhibit a tight specular reflection. Or you could say car paint has high specularity (not as high as a mirror).

Diffuse bounce light is another form of reflection. Diffuse bounce light is the light that you see when looking at a matte object, such as, say, concrete or paper. In the case of diffuse light, the incoming rays still respect the law of reflection. However, the material is rough enough that it's highly faceted at a microscopic level. That is to say, at any given point on the surface, its orientation or surface normal is somewhat random. So while individual rays reflect, as a whole, they scatter all over the place because the material doesn't exhibit a single smooth uniform surface for all the rays to bounce off of in the same direction. The appearance and general characteristics of such a surface can generally be predicted through Lambert's cosine law. Hence why, in 3d animation, we're often applying "Lambert" shaders to objects for their diffuse component. Diffuse bounce light makes up the majority of light you see when looking at objects in our world. Anything that's a sorta matte finish is putting out a lot more diffuse bounced light than other types of light.

Retroreflected light is light that manages to reflect directly back at the light source. Retroreflection doesn't usually happen naturally all that much. However, it is incredibly useful for optical motion capture and safety. "Reflective" paint on the road at night and road signs are examples of man made retroreflective materials used for safety. Also, those strips of "reflective" material you put on Halloween costumes are good examples. Notice these materials are marketed as "reflective" when in reality it's not their simple reflective characteristics that make them desirable. It's their retroreflective characteristics, a subset of reflectivity, that make them work. Marketing often isn't concerned with being precise. Technically, a roll of masking tape is reflective tape. It's just mostly diffuse reflection is all. And it probably won't alert anyone driving a car to its presence.

What does this have to do with Mocap?

So, how do we use this knowledge to get our mocap cameras to see markers and nothing else, making the task of tracking those markers easier? Well, it's generally a matter of contrast. If you can make your markers brighter than anything else in the frame, you can adjust your exposure and threshold the image to knock everything else out of contention, leaving you with a mostly black image with little gray and white dots that are your markers.
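
As a rough illustration of what that thresholding boils down to (and emphatically not NaturalPoint's actual on-camera processing), here's a sketch that knocks everything below a brightness threshold out of a grayscale frame:

[code]

import numpy as np

def find_bright_pixels(gray_frame, threshold=200):
    """gray_frame: 2d numpy array of 8-bit pixel values from one camera.
    Everything dimmer than the threshold is thrown away; what survives are
    small bright blobs whose centroids become the 2d blips."""
    mask = gray_frame >= threshold
    ys, xs = np.nonzero(mask)
    # A real system segments connected blobs and computes centroids; this
    # just returns the surviving pixel coordinates to show the contrast idea.
    return list(zip(xs.tolist(), ys.tolist()))

[/code]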

It's probably worth noting that this is not the only way to accomplish the task of tracking markers. Another approach would be pattern recognition. A system based on pattern recognition would probably count as an optical mocap system, but doesn't fall into the historical category of an optical system as used in the entertainment industry.

Anyhow, back to contrast: the task of making your markers brighter than everything else. Simple specular reflectivity makes some pretty bright highlights. You could theoretically conceive of a scenario where you know where your light source is, and if you catch a reflection in a marker in a camera, you could solve for the marker. In reality though, this isn't useful. It's rare that you'll catch a reflection of a light source in a camera. You'd need way too many light sources to make it common enough to use. It's possible you could take this to an extreme and set up a colored dome and then use the color of the dome reflected in a marker to track the ray back to its source location, but again, this is speculative and the kind of setup you'd need is expensive and quite disruptive to the shooting environment. Remember, one of the goals of viable mocap systems is to be able to be used in parallel with principal photography on a movie set.

Diffuse light is potentially useful. However, the fact of the matter is, most things are fairly diffuse. Things that are white or light gray are highly diffuse. A diffuse object can only put out as much light as it takes in. It's not possible to be SO much more efficient than a white piece of paper. So instead, approaches to using diffuse light to generate contrast go the other direction. You try to make everything in your environment matte black (full absorption, no diffuse bounce). That way, your markers show up bright by contrast. Again, this solution isn't ideal. The room, the cameras, the people, everything but the markers must be matte black to get contrast this way.

As you might imagine, the solution here is retroreflection. Again, retroreflection is light that reflects back at the light source. So it's super bright like specular reflection, but unlike specular reflection, it's easy to pick up. You know exactly where it's going: right back to the source. All you need to do is make sure your light source is also your camera lens (or close enough). This is, of course, why NaturalPoint cameras, and optical mocap cameras in general, tend to have LED rings around the lens. NP camera LEDs show up a dull pink when they're active, but don't let this fool you. They are actually putting out a ton of light. It's just infrared… about 850 nanometers in wavelength. According to Jim Richardson, the CMOS sensors in the cameras are actually more responsive to visible light than IR. However, IR light is usually used in mocap because a) we can't see it, so it doesn't distract us, and b) motion picture film and video cameras already filter it out because they are mimicking our own visual response. This way, the mocap system's lighting doesn't interfere with human vision based imaging.

Markers

If you've got your light source and camera all set up to pick up retroreflective light, then all that's left to do is make sure your marker actually is retroreflective. There are typically two ways this is done by contemporary humans.

Firstly, we can use "corner reflectors."  An example of a corner reflector is a bicycle reflector.  Corner reflectors are made by butting three mirrors together at right angles.  A bicycle reflector often has hundreds of little mirrors set up in triplets in this manner.  Believe it or not, this does actually work.  I have to cover up my bicycle all the time when I use cameras in my apartment.  I have looked into getting a bunch of small 1" bicycle reflectors to use as markers and in some situations, they may actually be useful.  Though, there are better solutions.

The second retroreflective material is what's known as 3M Scotchlite. Pretty much any retroreflective material you can think of besides corner reflectors comes back to 3M and Scotchlite. Even those reflective paints on the road are made with materials bought from 3M. I have a can of "reflective" spray paint from Rustoleum. They bought their materials from 3M. Scotchlite is based on glass beads and can be bought in many forms, from raw beads (sand like) to textiles to tapes to paints. Scotchlite comes in different grades and colors. Generally though, the best retroreflectivity comes from Scotchlite products in which the beads have been bonded to a material by 3M, rather than bonding done by other parties. So, buying 3M tape or textile is your best bet for mocap. The material that NaturalPoint sells in their own store is actually the highest quality material I've come across. Markers built from that material perform better than some of the "hard" markers in their store, which clearly had the material sprayed on by a 3rd party.

Emissive Markers

You may have noticed that to this point, we've been talking about generating contrast on materials that are bouncing light from a separate light source. However, it's possible for a marker to emit its own light. Generally, these types of markers are known as active markers. I have actually constructed active markers in the past and will probably do so again within the year. NaturalPoint actually sells wide throw 850nm LEDs in their store for this kind of application. Mocap systems by PhaseSpace also work off of active LED markers. Active markers have benefits and detriments. They often put out a lot more light than a retroreflective marker will and therefore are really easy to track. They are, however, expensive, and they do require mounting electronics on your mocap talent. This can be problematic in some cases. In some cases, they heat up quite a bit, though this problem can be designed away.

Hopefully some of this has helped give an understanding of what is going on in your mocap volume. You can use this information to help get better quality captures. Throwing your cameras into grayscale mode and looking at the environment as the camera sees it will let you see these concepts in action. It should also give you a better idea of how to go about optimizing your mocap environment and exposure settings for capture.

How Mocap Works: Reconstruction

I’ll start this series by jumping to the middle.  Makes sense right?  Believe me, this is the easy part. All the other parts will make more sense when they are looked at relative to this part.

What is reconstruction?

Different mocap systems define this differently. I'm going to define it as the task of taking the 2d data from a calibrated multi-camera system and making it 3d. This is pretty analogous to what the OptiTrack point cloud SDK does. I'm going to skip calibration for this blog entry. You can assume that we have a set of cameras and we know where they are in 3d space. We know their orientation and we know their lensing. I'm also going to skip the 2d aspects of the process. You can assume that each camera is providing us with multiple objects (I call them blips) and, most importantly, their position on the camera's backplane (where they are in the camera's view in 2d). There is one other thing you must assume. You must assume that there is error in all those values. Not just the blips, but also the camera positions, orientations and lens data. We hope they're all close, but they're not perfect. Reconstruction is the task of taking that data and turning it into 3d points in space.

Rays

Well, the simplest way to think about it is as follows. Each blip can be projected out into space from the camera's nodal point (center of the virtual lens) based on where it is in 2d on that camera's backplane (virtual imager). Shooting a point out is generally referred to as shooting a ray or vector. Any two rays from two separate cameras that intersect are likely to be intersecting at a point in 3d space where there is an actual marker. This is the reverse process of the cameras seeing the real marker in the first place. Rays of light bounce from the marker through the lens nodal point and onto the backplane imager, where they are encoded and sent to the computer (that's a little oversimplified but you get the idea). If a third ray intersects as well, it's FAR more likely to be a marker than a coincidence (you'd be surprised how often you end up with coincidences running at 100fps+). So, while you can reconstruct a 3d point from as little as two rays, if you have enough cameras to spend on verification, you'll get fewer fake markers by requiring that 3 or more rays agree.
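
Here's a sketch of that ray casting step under a bare bones pinhole model (known camera position, orientation and focal length, and no lens distortion). The conventions are mine, not the OptiTrack SDK's:

[code]

import numpy as np

def blip_to_ray(cam_pos, cam_rot, focal_length, blip_xy):
    """Project a 2d blip out of a pinhole camera as a world space ray.
    cam_pos: (3,) world position of the nodal point; cam_rot: (3, 3) rotation
    matrix whose columns are the camera's right, up and forward axes;
    focal_length and blip_xy are in the same backplane units.
    Returns (origin, unit direction)."""
    x, y = blip_xy
    direction_cam = np.array([x, y, focal_length], dtype=float)  # camera space
    direction_world = cam_rot @ direction_cam
    direction_world /= np.linalg.norm(direction_world)
    return np.asarray(cam_pos, dtype=float), direction_world

[/code]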

This is often referred to as Triangulation

It's probably worth noting that this is not the typical triangulation you'll use when, say, calculating the epicenter of an earthquake by knowing its distance from known points. It's a different type of triangulation or, should I say, a different subset of triangulation operations.

Residuals 

Sorry, did I say those rays intersected? That was a mistake. The rays never intersect. They just pass closely together. See, that error I was talking about gets in the way. So what your typical mocap app will do is deal with residuals to basically say "it's close enough." When you give a NaturalPoint system a residual for its point cloud reconstruction, you are telling it that rays that pass within a distance below the residual should be considered to have intersected where the residual is the lowest. A high residual could suck discrete markers together into one larger marker if they are close enough. A low residual could keep rays from intersecting and result in low marker counts per frame. You'll want to balance your residual against the overall accuracy in your volume. You can get an idea of the accuracy of your volume by looking at the residuals that it gives you at the end of the calibration process. Or, you can just mess around with it. You'll also want to pay attention to the units. Different systems measure residuals in different units. Meters, centimeters, millimeters etc.
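
The math behind "close enough" is the closest approach between two rays. A sketch, continuing with numpy from the previous block (my own formulation, not NaturalPoint's):

[code]

def ray_ray_residual(o1, d1, o2, d2):
    """Closest approach between two rays given as (origin, unit direction).
    Returns (midpoint, distance): the candidate 3d marker position and the
    residual, i.e. how far apart the rays are at their closest point."""
    w = o1 - o2
    b = float(d1 @ d2)
    denom = 1.0 - b * b
    if denom < 1e-12:                      # near parallel: no stable answer
        return None, float('inf')
    t1 = (b * (d2 @ w) - (d1 @ w)) / denom
    t2 = ((d2 @ w) - b * (d1 @ w)) / denom
    p1, p2 = o1 + t1 * d1, o2 + t2 * d2
    return (p1 + p2) / 2.0, float(np.linalg.norm(p1 - p2))

[/code]

Accept the midpoint as a reconstructed point only if the returned distance is below your residual setting, in whatever units your system expects.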

Angles

There are other factors that play into the accuracy of a reconstruction. If two rays have a similar angle (they are more parallel than perpendicular), the accuracy of their reconstruction goes down significantly. It's harder to determine accurately at what distance they intersect, as a little inaccuracy in the angles translates to a potentially long distance. Most of the inaccuracy plays into the depth axis. If you have rays that are more perpendicular, their inaccuracy is spread evenly along all three axes of potential movement, rather than the one depth axis. Therefore, most NaturalPoint reconstruction parameters include a threshold for the minimum angle between rays. Rays that intersect at less than the minimum angle are ignored. The units are important here as well. I believe they tend to be in radians rather than degrees.
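
Expressed as code, that check is just the angle between the two unit direction vectors (again, a sketch):

[code]

def rays_angle_ok(d1, d2, min_angle_radians):
    """Reject ray pairs that are too close to parallel. The abs() treats rays
    pointing toward each other the same as rays pointing the same way, since
    either case gives poor depth accuracy."""
    angle = float(np.arccos(np.clip(abs(float(d1 @ d2)), 0.0, 1.0)))
    return angle >= min_angle_radians

[/code]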

Min and Max Distance 

These are simple. After a point has been reconstructed, it can be tossed due to its distance from the cameras from which its rays were cast. One really good reason for this is that the light sources on the cameras can flare out objects that are too close, generating WAY too many phantom blips. Ignoring blips that reconstruct that close is a safe bet. Likewise, throwing out markers that reconstruct far into the distance is also safe, though often not needed.
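
As a sketch, the cull is just a distance test against the cameras that contributed rays (the exact policy here is my assumption, not documented NaturalPoint behavior):

[code]

def distance_cull_ok(point, contributing_cam_positions, min_dist, max_dist):
    """Keep a reconstructed point only if it sits within [min_dist, max_dist]
    of every camera whose ray contributed to it."""
    point = np.asarray(point, dtype=float)
    for cam in contributing_cam_positions:
        d = float(np.linalg.norm(point - np.asarray(cam, dtype=float)))
        if d < min_dist or d > max_dist:
            return False
    return True

[/code]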

Hopefully, these basic concepts help explain what is going on inside the black box. This should give you plenty of concepts with which to start working with camera placement to get better performance out of an optical mocap system. An obvious freebie would be: don't place cameras that are meant to cover the same space too close together. The angle between rays seeing the same object will be too low to get an accurate reconstruction.

How Mocap Works Series

I'm going to write a series of blog entries about optical motion capture and how it works. Knowing what's going on inside a mocap system can help an operator better utilize it. The series will focus on NaturalPoint's OptiTrack cameras and systems, with references to other mocap systems and ideas. It will also occasionally diverge into descriptions of Kinearx because my mind is a bit jumbled. Sorry.

WPF 3D’s Wrong Direction

So there's this new kid on the UI block. It used to be called Avalon. Now it's called WPF. It's probably the most unsung and misunderstood part of Vista by the general public and programmers alike. It's actually pretty awesome. I'll explain what it is and how I use it in a bit. There's a problem though. WPF is the next generation UI for Windows, and as such, it seeks to integrate 3d as a first class citizen into the UI environment. It fails miserably, and I'll give an in depth explanation and suggestions as to what needs to change, IMHO.

What is WPF and XAML?

WPF is the brand spankin' new UI system in Vista, which has apparently been backported to XP as well. It's part of the often misunderstood .NET framework. Make no mistake though, WPF is meant to replace the venerable Win32 forms GUI that we've been living with since before time (read: Windows 95, and a little earlier in some form or another). WPF is the future of Windows programming.

Rather than just being a revision, WPF is a new approach to GUI construction. It's based on a dependency graph model. People who are familiar with Maya or XSI are familiar with dependency graphs, as 3d scenes in those programs are actually large dependency graphs. Basically, you build your GUI as a dependency graph and you connect it to your application data, which is also best done as a dependency graph. What this means is that your GUI is not really hardcoded. It's authored like you would author an Illustrator graphic or 3d scene. It just needs to know how to connect itself to your app's data and it kind of runs on its own. It will then proceed to display itself based on your data and, if you authored it to, change your data.

XAML is the XML-based file format in which you can author WPF dependency graphs (read: UIs and UI pieces).

It's all about connections. You define data sources and UI widgets and you connect them all up. And when you run it, it just works. It's great.

So what’s wrong with 3D?

There's nothing wrong with adding 3d to this mix. In fact, it's a great idea. It's HOW they did it that bothers me. One of the cardinal rules of 3d programming is that 3d is usually not any harder than 2d. It's just one more dimension. So if you can solve a problem in 2d, you can usually adapt that solution to 3d by adding an extra dimension. This can be as simple as adding an extra parameter, or one more layer of nested looping. You name it.

So, let's take an example usage scenario. Let's say that I've got a series of 2d points (unknown quantity) that I need displayed as dots on a black rectangle. Oh I dunno… maybe it's 2d blips coming from my mocap cameras. Seems like a reasonable scenario. ;) These points will be replaced with a new collection of points say, 100 times a second, so this is essentially a real time data flow.

WPF makes this easy with what is known as an ItemsControl. You tell the ItemsControl where to find the collection of Point objects and it, in turn, keeps an eye on that collection. When the collection changes, it rebuilds part of the dependency graph. What it does is take a template for a panel and add children to it. Those children are also built from a template. Each of the children is given a Point from the collection as its data source. So, if I make the panel a simple canvas panel (a Cartesian coordinate based layout panel) and I make each item an ellipse that gets its coordinates from the point, I've got my 2d mocap viewer.

So that's great. It works. Not only does it draw them, but each of them is its own control and can respond to clicks and events and all kinds of wonderful things. Go WPF!

But that was just the 2d viewer for camera data. Let's build us a 3d viewer for the point cloud. That should be no problem, right? WPF does 3d! In fact, without knowing anything about WPF, you should already sort of know how this should work. I should have a panel that's a 3d canvas (a 3d Cartesian coordinate space) and an ItemsControl that puts some templated 3d controls into it. Right? OOPS! WRONG.

See, the problem is, 3D objects don't inherit from Controls. There are no Panels either. You lay down a control that is a 3d viewport, and then everything going on inside is part of a completely different set of objects that don't interact with the 2d control object model at all unless it's hardcoded in, object by object. You can't template these 3d objects for inclusion into other 3d objects at all. This is a HUGE problem.

If you look around at every 3d WPF demo around, you'll see them all skirting over a very important issue. They NEVER create an instance of a 3d object for each member of a collection. They'll hardcode the number of objects. They'll do the work in 2d and then render that 2d image onto a 3d object as a texture. They'll manipulate the vertex buffers of a single 3d object based on a collection. Anything but actually spawn a true 3d object for each member of a collection. And don't get me started about the layout of these controls. It's the same problem.

What they've done is created a 3d engine inside of WPF that can have static data defined by XAML and can have its parameters bound to dependency objects, but they didn't actually extend WPF into 3d.

It's actually a mistake that has been made before. Back in the day, Macromedia had a product called Director. Director owned the "multi-media" market for a while, until the web and Flash took over. In Director's heyday, they wanted to add 3d to it. Director had an extremely rich UI that allowed the author to manipulate controls (sprites) on a stage (panel) and control them via animation and scripting. So you'd think the obvious direction to go would be to expand the stage to be able to hold 3d objects and textures and cameras and lights along with 2d sprites. And then extend the animation keyframing environment to be able to keyframe in 3d. And then make sure the scripting environment had the same control over the new objects as it did the 2d objects. That's not what they did.

Instead, they made one new type of 2d sprite that was a 3d engine. There was no way to get inside the 3d engine except via scripting. They made a valiant effort to fake it by authoring a lot of "behaviors," which are pre-packaged scripts you can apply to sprites. But that never made 3d break out of its sprite and onto the stage. Director still exists… it's just that its market has shrunk to near nothing compared to its potential.

This is EXACTLY the mistake Microsoft seems to have made with WPF's 3d. They put 3d in a black box as an engine rather than breaking open the box and making it part of the actual WPF controls API. In so doing, all they've really done is created yet another 3d programming API that's in competition with DirectX and OpenGL, when they should have been creating a 3d UI API.

So what should they do?

Well, they should start over. Not WPF, just the 3d part, which happens to be relegated to a separate namespace anyway. The existing mess can continue to exist for all I care, just as long as they make a new namespace and actually extend WPF into 3d rather than blackboxing 3d into a control in WPF. It's really quite obvious. Subclass Panel (or refactor Panel to a higher level interface that you can implement) into a Panel3D. Subclass Control (or refactor etc.) to Control3D. And move in from there. Panels and Controls work together to create layout and the rest is simple. Just make sure the ItemsControl works with it and templates work with it. Just as in normal 2d WPF, Panels are not all Cartesian based. My favorite, the DockPanel, can easily be expanded into 3d based on bounding boxes and scales, for example. Make sure NOT to seal the Panel3D, so developers can create their own layouts for say… a spinning cylindrical layout etc.

What about your 3d point cloud viewer?

Luckily, my UI is a prototype UI meant to be replaced. And luckily, WPF's overall design lets me replace the UI with very little effort. So what I did for now is write a 3d object that takes a collection of markers as a binding. It then goes through its vertex buffer and moves a bunch of vertices around to make a bunch of sphere shapes (read: vertex animation). Unfortunately, since these are not independent objects, they don't have individual color or the ability to detect mouse clicks. They're all one big mesh. It's good enough for now. When it ceases to be good enough, I may have to break out of WPF and write a pure DirectX viewer, which would be a shame since it's clear WPF is supposed to be meant for this kind of thing.

Kinearx is Coming: Humble Processing

One of the design departures that separates Kinearx from the competition is its approach to processing. I've seen a number of motion capture solutions in multiple software packages and I've identified what I believe to be the single trait that holds most of them back: arrogance. Don't get me wrong, I'm pretty arrogant myself. :) I'm talking about a very specific kind of arrogance. It's arrogant to assume that one's algorithm is going to be so good that it will be able to make sense of mocap data on its own, in a single pass, and be right about it.

Now you might be thinking, "Brad, that's silly. All these programs let you go back and edit their decisions. They all let you manually fix marker swaps and such. They're not assuming anything. You're blowing things out of proportion." Ah, but then I ask you, why should an algorithm make a decision at all? Why should you need to fix a marker swap that the algorithm has put into reality?

Kinearx approaches the way mocap data is processed in a way I would term a "humbled" approach. Kinearx knows what it doesn't know. The design acknowledges that everything is a guess, and it's completely willing to give up on assumptions should evidence point to the contrary. The basic data structure that operators work on is one of recommendations, statistics and heuristics, rather than one of "the current state of the data and what I'm going to change about it." A typical labeling process can consist of running some highly trusted heuristic algorithms that make recommendations on labeling at points of high confidence. It can also consist of applying less trusted heuristics that are wider in temporal scope. The recommendations are weighted accordingly when they are eventually committed to the data as decisions. Algorithms can peek at the existing recommendations to hint them along. Manual labeling operations can be added to the list of recommendations as having extremely high confidence. Algorithms can even go as far as to cull recommendations. The difference between Kinearx and other mocap apps is that this recommendation data lives from operation to operation. It lives through manual intervention and as such, is open to being manipulated by the user, either procedurally or manually.
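
To make that a little more concrete, here's a toy sketch of what a recommendation record and a commit step could look like. This is purely illustrative; it is not Kinearx's actual data model.

[code]

from dataclasses import dataclass

@dataclass
class LabelRecommendation:
    """One operator's opinion that a point carries a label. These pile up
    from many operators (and from manual edits) before anything is committed."""
    frame: int
    point_id: int
    label: str
    confidence: float   # 0.0 - 1.0, the weight used when recommendations merge
    source: str         # e.g. "rigid_distance", "velocity", "manual"

def commit_labels(recommendations, min_confidence=0.5):
    """Collapse recommendations into decisions: for each point, sum the
    confidence behind each candidate label and keep the winner only if it
    clears the threshold. Anything below it stays undecided rather than wrong."""
    tally = {}
    for r in recommendations:
        key = (r.frame, r.point_id)
        tally.setdefault(key, {}).setdefault(r.label, 0.0)
        tally[key][r.label] += r.confidence
    decisions = {}
    for key, votes in tally.items():
        label, score = max(votes.items(), key=lambda kv: kv[1])
        if score >= min_confidence:
            decisions[key] = label
    return decisions

[/code]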

The power of this system will become apparent when looking at the pipeline system, which allows a streamlined processing environment in which to apply processing operations to data both procedurally and manually. 

Kinearx is Coming: Influences

Kinearx will be FIE's flagship motion capture software offering. It's currently in development and not quite ready for show. However, it is taking a strong shape and is becoming functional, and I'd like to explain the goals and underpinnings of the system. So, I'll start with a blog entry about influences.

Software isn't created in a vacuum of knowledge. In this period of particularly vicious intellectual property warfare, it might even be dangerous to acknowledge any influence whatsoever. That would not sit right with me in the long run, however. Also, I think acknowledging and explaining influences can keep design and goals on track. So here they are, in no particular order:

  • Natural Point's commoditization of the Motion Capture hardware market: The commoditization of the hardware means one of two things. Either a wide portion of the consumer market will have NP MC hardware, or a wide portion of the consumer market will have some form of competing commodity MC hardware. Either way, the market will grow. And more importantly, the size of the new market will so overshadow the old market that previously existing systems will become mostly irrelevant. So it will basically wipe the slate clean. This means Kinearx is in a good position to come in and take a large chunk of the new commodity MC software market that will accompany the hardware market. It's important not to lose focus on this target market.
  • Vicon IQ's generalization and configurability: IQ doesn't recognize humans, it recognizes kinematic mechanical models. It then has useful templates of those models for humans. Generally, every operation exposes just about every parameter you could imagine, and many you just don't want to know about. It provides reasonable defaults.
  • Vicon IQ’s flexible processing:  While I think there are some serious flaws in their design, the overall concept of the post processing pipeline is a good one.  IQ stops short where it should expand further.
  • Vicon Blade:  Blade is a combination of IQ and HOM Diva.  The Diva half of it, I’m sure is fine.  The IQ half of it is really disappointing.  In combining it into Blade, they’ve lost sight of what was good about it, and in turn lost functionality.  I’ll be using IQ on Vicon system sessions for some time to come as I suspect a large portion of their existing customer base will be.
  • Giant's Autonomous Reliability: Giant's software systems are proprietary, so you'd have to spend some time on set with them to observe this. I had the opportunity to spend weeks at a time on The Incredible Hulk with one of their crews/systems. The specifics of the software aside, the fact of the matter is, they were able to generate extremely clean data from arbitrarily complex takes overnight with minimal human intervention. What does this look like? Well, from what I could tell, the operator would scroll through the take, letting the system try to ID the markers (which it kinda did right on its own most of the time anyway). When they found a frame that it got right, they marked it and sent it off for batch processing. The system took care of the rest. I'd come back in the morning and they'd offload a day's worth of cleaned up mocap directly to my hard drive. Were there points where it interpolated the fingers? Sure. Did it matter? No. To put this in perspective, it was able to do this with two guys freestyle sparring, Greco-Roman wrestling style. So you know, rolling around on the floor grappling.
  • Softimage XSI user interface: XSI has problems.  Believe me, I can list them.  But their UI is something special.
  • Motion Builder's Ubiquity: Motion Builder is an amazing piece of software, but it's meant to come into the mocap process too late and isn't really trying to solve the hard problems early in the process. Optical cleanup is missing half the tools it should have. However, in the industry, Motion Builder is synonymous with motion capture. Unless you are having your mocap vendor do all the cleanup and retargeting, you kind of need it (really big jobs can license software from Giant in some circumstances, and Blade seeks to take MB's place, but those solutions have not solidified as general solutions just yet).
  • Motion Builder's Bad Reputation: Fact of the matter is, not many people like Motion Builder. When I mention it, I get sneers. I've had a lot of trouble reconciling the volume of sneers I get from otherwise reasonable professionals with the software and its production capabilities that I've come to know and rely on. It is my opinion that MB's bad reputation comes from the complete and utter lack of production worthy training and demo material. No one knows how to use it or what is actually in there. It's all rather well hidden, and unless someone trains you and shows you through it, you won't know it's there. Also, at this point it should be a no-brainer to plug MB directly into a Maya character animation job. However, I had to develop a ton of custom tools and workflows to accomplish it for PLF. Add that to the fact that the 3d industry tends to be more religious and less scientific about its tools and techniques, and well, you end up in a bad spot.