Stereoscopic Imaging Device

Stereoscopic imaging exploits the same principle the human visual system uses to perceive depth and add dimension to the world. The eyes capture two horizontally offset views of a scene, and the brain fuses them into a single three-dimensional visualization. The effect can be demonstrated by observing an object with one eye closed and noting the depth that appears when the other eye is opened. Without two image inputs the output is only a two-dimensional view, like a photograph taken with a single camera. As technology rapidly advances and the cutting edge of computing pushes the limits of the imagination, there will be a need to reproduce this property of the human visual system in digital form. Potential applications range from vision for artificially intelligent robots to three-dimensional mapping and modeling of space. The foundation of this technology rests heavily on trigonometric equations and the practical application of imaging systems.

Prototype Design and Fabrication

The first task in the prototype design process was selecting a camera that would provide adequate resolution and quality while remaining cost effective. Two Microsoft VX-3000 webcams were used; they have a native resolution of 640 x 480 pixels. The sensor inside these devices is the OmniVision 7663 CMOS imaging chip, with an active area of 2.76 mm x 2.05 mm and a pixel pitch of 4.2 μm. This model was chosen because it provided adequate output for image processing and was easy to deconstruct for mounting purposes.
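As a quick sanity check (illustrative only, not part of the original report), the published pixel pitch and native resolution can be compared against the quoted active area:

```python
# Sanity check: pixel pitch x native resolution vs. the quoted active area.
PIXEL_PITCH_MM = 0.0042          # 4.2 um pixel pitch
H_PIXELS, V_PIXELS = 640, 480    # native output resolution

width_mm = H_PIXELS * PIXEL_PITCH_MM     # ~2.688 mm
height_mm = V_PIXELS * PIXEL_PITCH_MM    # ~2.016 mm
print(f"imaged area: {width_mm:.3f} mm x {height_mm:.3f} mm")
```

The computed 2.688 mm x 2.016 mm window is slightly smaller than the quoted 2.76 mm x 2.05 mm active area, which is plausible for an image sensor that carries border pixels outside the output window.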

Since the device would be used repeatedly to image, measure, and map three-dimensional spaces, the separation of the cameras had to remain constant. Once this distance is entered into the image analysis software, it must stay fixed for the program to calculate accurately. A mount was therefore designed to maintain a static separation so that stereoscopic image pairs could be captured consistently. To preserve the geometry of the system and enable measurements, the optical axes of the two cameras must remain parallel to each other.

The first version was a lightweight piece of aluminum angle bar with the webcams left inside their plastic cases. This design was extremely fragile and allowed a significant amount of movement, which shifted the optical axes of the two cameras and made measurements inaccurate and inconsistent.

This design failure gave insight into the need for an extremely rigid camera mount to minimize variation in the system. The second design used a wider and thicker bar of aluminum, which was more solid. In addition, the cameras were removed from their plastic housings so that only the circuit boards and USB connections remained. The cameras were then attached using 2-56 threaded bolts. To attach each camera, a 1/16" hole was drilled to line up with the existing mounting holes on the camera circuit board, which had to be enlarged slightly so the hardware would fit. Attaching the cameras in this manner allowed fine-tuning adjustments so that the optical axes would line up correctly.

The next modification to the camera mount added a hinge at the midpoint of the aluminum bar, allowing the two cameras to pivot at equal distances around a central point. This made it possible to change the separation angle between the cameras so that close objects could be imaged with a significant amount of parallax. The change also made the system inconsistent, so a holder was fabricated that let the mount be placed at known angles. This allowed an accurate, measured angle between 180° and 140° to be selected by the user.

The device can be attached to a tripod and set to a height that places the optical axes at the desired location for measurement. For proper functionality the device must be connected via USB to a computer with the IDL programming language installed and the program compiled and running.

Image Analysis Program

The first step in designing an application for image analysis is to determine the necessary functionality and the desired user interaction. The easier and more intuitive a software application is, the more likely it is to be adopted by users.

As seen in Figure 7, there are two windows which hold the images to be analyzed. Both windows have a browse button that lets the user search for the desired file through the operating system's native file manager. Once the correct left and right images are selected, they are displayed when the user presses the display button located underneath each image window.

After the left and right images are selected, the user chooses which angle was used during capture. This sets the base geometry used to take the pictures so the program calculates accordingly; the different options allow many different imaging situations to be handled if desired. The user then selects the same object point in both the left and right images, which defines that point in space as the target for measurement.

When the calculate button is pressed, the program determines the size of the images and normalizes the data so that the center pixel of each image becomes its relative origin. From these two-dimensional Cartesian planes, the x and y coordinates from the left and right images are converted to polar coordinates. In polar format the selected point, the left relative origin, and the right relative origin form a triangle in absolute space. The absolute origin lies at the midpoint between the left and right optical axes, which theoretically holds constant because the axes remain parallel as they approach infinity. With only the angles and lengths in pixels, derived from the x and y coordinates with respect to the positive x axis, along with the known distance between the two webcams, the distance from the imaging device can be calculated.
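The normalization and polar conversion step can be sketched as follows. The original program was written in IDL; this is a Python sketch, and the function name and the row-flip convention (image rows grow downward) are assumptions, not the report's actual code:

```python
import math

def to_polar(px, py, width=640, height=480):
    """Shift a pixel coordinate so the image center becomes the relative
    origin, then convert to polar form: (radius in pixels, angle in
    radians measured from the positive x axis)."""
    x = px - width / 2.0
    y = (height / 2.0) - py   # assumed: flip so y grows upward (Cartesian)
    r = math.hypot(x, y)
    theta = math.atan2(y, x)
    return r, theta
```

For example, `to_polar(320, 240)` returns the relative origin itself (radius 0), while a point on the right edge of the center row lies at angle 0 along the positive x axis.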

This employs the concept of angular subtense, which is how the human visual system perceives its surroundings. To determine the length in pixels from the selected object point to the absolute origin, the Law of Cosines was used. Since the absolute origin was set midway between the two optical axes, the distance from that midpoint to the object point yields the absolute distance from the object point to the absolute origin. This was accomplished in two calculations, since the two triangles created by the bisection share a common side. First, the shared side was solved for by setting the Law of Cosines expressions for the left and right triangles equal to each other, shown in equation 1.

XAbsOrigintoOpticalAxis = (XLeft ^2 - XRight^2) / (-2* XRight * COS(ΘRight) + 2* XLeft * COS(ΘLeft))       (1)

The value from equation 1 was then substituted into the Law of Cosines for each side individually, and the two results were averaged for an accurate object point to absolute origin distance, shown in equations 2a, 2b, and 2c.

XAbsOrigintoObjectPointLeft = SQRT(XAOtoOA^2 + XLeft^2 - 2* XAOtoOA * XLeft * COS(ΘLeft))       (2a)
XAbsOrigintoObjectPointRight = SQRT(XAOtoOA^2 + XRight^2 - 2* XAOtoOA * XRight * COS(ΘRight))       (2b)
XAbsOrigintoObjectPoint = (XAbsOrigintoObjectPointLeft + XAbsOrigintoObjectPointRight) / 2       (2c)
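Equations 1 through 2c can be sketched directly in code. The original implementation was in IDL; this Python sketch assumes the angles are the interior angles at the left and right relative origins, as in the equations above:

```python
import math

def shared_side(x_left, th_left, x_right, th_right):
    """Equation 1: solve for the common side X_AOtoOA (the pixel distance
    from the absolute origin to each optical axis) by equating the two
    Law of Cosines expressions."""
    return (x_left**2 - x_right**2) / (
        -2 * x_right * math.cos(th_right) + 2 * x_left * math.cos(th_left))

def origin_to_object(x_aoto_oa, x_left, th_left, x_right, th_right):
    """Equations 2a-2c: apply the Law of Cosines to each triangle, then
    average the two results for the object point to absolute origin
    distance in pixels."""
    left = math.sqrt(x_aoto_oa**2 + x_left**2
                     - 2 * x_aoto_oa * x_left * math.cos(th_left))
    right = math.sqrt(x_aoto_oa**2 + x_right**2
                      - 2 * x_aoto_oa * x_right * math.cos(th_right))
    return (left + right) / 2
```

As a worked example: with the relative origins 200 pixels apart (so the true half-baseline is 100 pixels) and an object point offset 50 pixels horizontally and 200 pixels vertically from the absolute origin, equation 1 recovers the half-baseline of 100 pixels and equations 2a-2c recover the object distance of √42500 ≈ 206.2 pixels.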

After the true distance in pixels was determined, it was necessary to calculate the number of pixels comprising an angular subtense of one degree. This was done by calculating the diagonal length of the image plane in pixels and dividing it by the published field of view of fifty-five degrees diagonal, shown in equation 3.

NPixelsperOneDegree = SQRT(NHorizontalPixels^2 + NVerticalPixels^2) / ΘCameraFieldofView       (3)
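For the 640 x 480 images used here, equation 3 works out neatly, since the pixel diagonal is exactly 800:

```python
import math

H, V = 640, 480      # native resolution
FOV_DEG = 55.0       # published diagonal field of view

pixels_per_degree = math.hypot(H, V) / FOV_DEG   # 800 / 55
print(round(pixels_per_degree, 2))               # -> 14.55
```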

Next, the angular subtense from the absolute origin to the object point was calculated by dividing the pixel distance from equation 2c by the pixels per degree from equation 3, equation 4.

ΘAbsOrigintoObjectPoint = XAbsOrigintoObjectPoint / NPixelsperOneDegree        (4)

The millimeter length of a pixel, given by the actual camera separation divided by twice the pixel distance from equation 1, was then multiplied by the pixel distance between the object point and the absolute origin to yield the actual distance from the absolute origin to the object point, shown in equation 5.

XAbsOrigintoObjectPoint = XAbsOrigintoObjectPoint * (XCameratoCamera / (2 * XAOtoOA))       (5)

To solve for the distance from the center of the stereoscopic imaging device to the object point in space, the angular subtense from equation 4 and the actual distance from equation 5 were used.

XCameratoObjectPoint = XAbsOrigintoObjectPoint / TAN(ΘAbsOrigintoObjectPoint)       (6)

The result from equation 6 is the distance in millimeters, which can be used to locate the object point in three-dimensional space using the calculated x, y, and z values.
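Equations 4 through 6 can be sketched as a single conversion from pixel space to physical distance. Again this is a Python sketch of the method described above, not the report's IDL code; the function and parameter names are assumptions:

```python
import math

def camera_to_object_mm(r_pixels, half_baseline_pixels, baseline_mm,
                        pixels_per_degree):
    """Convert the pixel-space solution to a physical distance.
    r_pixels: object point to absolute origin distance (equation 2c).
    half_baseline_pixels: X_AOtoOA from equation 1.
    baseline_mm: measured camera-to-camera separation.
    pixels_per_degree: result of equation 3."""
    theta_deg = r_pixels / pixels_per_degree            # equation 4
    mm_per_pixel = baseline_mm / (2 * half_baseline_pixels)
    r_mm = r_pixels * mm_per_pixel                      # equation 5
    return r_mm / math.tan(math.radians(theta_deg))     # equation 6
```

For instance, an object point subtending exactly one degree from the absolute origin, imaged with a 100 mm baseline that spans 200 pixels, scales to half a millimeter per pixel before the tangent projection in equation 6 gives the camera-to-object distance.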

Initial Testing Results

The proof of concept was successful, demonstrating that with two webcams and a programming environment a digital three-dimensional stereoscopic imaging device can be created at very low cost. The prototype measures to within a tenth of a meter and has shown accuracy on both near and far objects in different quadrants of the images.

Future Work

Real Time Video/Camera Connectivity
Having real-time input would enable a system like this to map a three-dimensional space and allow the area to be updated rapidly. This could be used to interact with a computer as a motion tracking and gesture recognition program.

Complete System Automation
For this technology to be viable, it needs to run all the calculations initiated by a single press of the calculate button, providing a continuous stream of output that constantly updates the end product.

Enhanced Image Processing
To achieve a fully automated system, the program would need to dynamically detect the edges of the objects in the scene. After finding the edges in horizontal and vertical passes, the vertices where object planes intersect must be recognized. These points would be placed inside an array as x and y coordinates and then run through the image processing program. The output would be the distances along with the [x, y, z] values with respect to the absolute origin.
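The proposed horizontal and vertical edge passes could be sketched as follows. This is an illustrative toy only: the simple central-difference filter, the threshold, and the rule that a strong response in both directions marks a candidate vertex are all assumptions, not a method described in the report:

```python
def find_vertex_candidates(img, threshold=50):
    """Toy vertex detector over a grayscale image stored as a list of
    rows.  A pixel whose horizontal AND vertical gradients both exceed
    the threshold is recorded as an (x, y) candidate vertex."""
    h, w = len(img), len(img[0])
    vertices = []
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = abs(img[y][x + 1] - img[y][x - 1])   # horizontal pass
            gy = abs(img[y + 1][x] - img[y - 1][x])   # vertical pass
            if gx > threshold and gy > threshold:
                vertices.append((x, y))
    return vertices
```

On a small test image containing one bright square, only the square's corner pixel produces strong gradients in both directions, so the array of candidates holds a single (x, y) pair ready to be run through the distance calculation.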

Angle Functionality
The angle selection must be completed to allow for varying distances to be mapped. This has the potential to three dimensionally model small items assuming a high quality camera was used.

Copyright © 2009 - 2010 TechniCapture, Inc.