Class Note for EECS 841 with Professor Potetz at KU
A Taxonomy and Evaluation of Dense Two-Frame Stereo Correspondence Algorithms

Daniel Scharstein
Dept. of Math and Computer Science, Middlebury College, Middlebury, VT 05753
schar@middlebury.edu

Abstract

Stereo matching is one of the most active research areas in computer vision. While a large number of algorithms for stereo correspondence have been developed, relatively little work has been done on characterizing their performance. In this paper we present a taxonomy of dense, two-frame stereo methods. Our taxonomy is designed to assess the different components and design decisions made in individual stereo algorithms. Using this taxonomy, we compare existing stereo methods and present experiments evaluating the performance of many different variants. In order to establish a common software platform and a collection of data sets for easy evaluation, we have designed a stand-alone, flexible C++ implementation that enables the evaluation of individual components and that can easily be extended to include new algorithms. We have also produced several new multi-frame stereo data sets with ground truth, and are making both the code and data sets available on the Web. Finally, we include a comparative evaluation of a large set of today's best-performing stereo algorithms.

1. Introduction

Stereo correspondence has traditionally been, and continues to be, one of the most heavily investigated topics in computer vision. However, it is sometimes hard to gauge progress in the field, as most researchers only report qualitative results on the performance of their algorithms. Furthermore, a survey of stereo methods is long overdue, with the last exhaustive surveys dating back about a decade [7, 37, 25]. This paper provides an update on the state of the art in the field, with particular emphasis on stereo methods that (1) operate on two frames under known camera geometry, and (2) produce a dense disparity map, i.e., a disparity estimate at each pixel. Our goals are two-fold:

1. To provide a taxonomy of existing stereo algorithms
Richard Szeliski
Microsoft Research, Microsoft Corporation, Redmond, WA 98052
szeliski@microsoft.com

   that allows the dissection and comparison of individual algorithm components and design decisions;

2. To provide a test bed for the quantitative evaluation of stereo algorithms. Towards this end, we are placing sample implementations of correspondence algorithms along with test data and results on the Web at www.middlebury.edu/stereo.

We emphasize calibrated two-frame methods in order to focus our analysis on the essential components of stereo correspondence. However, it would be relatively straightforward to generalize our approach to include many multi-frame methods, in particular multiple-baseline stereo [85] and its plane-sweep generalizations [30, 113].

The requirement of dense output is motivated by modern applications of stereo such as view synthesis and image-based rendering, which require disparity estimates in all image regions, even those that are occluded or without texture. Thus, sparse and feature-based stereo methods are outside the scope of this paper, unless they are followed by a surface-fitting step, e.g., using triangulation, splines, or seed-and-grow methods.

We begin this paper with a review of the goals and scope of this study, which include the need for a coherent taxonomy and a well thought-out evaluation methodology. We also review disparity space representations, which play a central role in this paper. In Section 3, we present our taxonomy of dense two-frame correspondence algorithms. Section 4 discusses our current test bed implementation in terms of the major algorithm components, their interactions, and the parameters controlling their behavior. Section 5 describes our evaluation methodology, including the methods we used for acquiring calibrated data sets with known ground truth. In Section 6 we present experiments evaluating the different algorithm components, while Section 7 provides an overall comparison of 20 current stereo algorithms. We conclude in Section 8 with a discussion of
planned future work.

2. Motivation and scope

Compiling a complete survey of existing stereo methods, even restricted to dense two-frame methods, would be a formidable task, as a large number of new methods are published every year. It is also arguable whether such a survey would be of much value to other stereo researchers, besides being an obvious catch-all reference. Simply enumerating different approaches is unlikely to yield new insights.

Clearly, a comparative evaluation is necessary to assess the performance of both established and new algorithms and to gauge the progress of the field. The publication of a similar study by Barron et al. [8] has had a dramatic effect on the development of optical flow algorithms. Not only is the performance of commonly used algorithms better understood by researchers, but novel publications have to improve in some way on the performance of previously published techniques [86]. A more recent study by Mitiche and Bouthemy [78] reviews a large number of methods for image flow computation and isolates central problems, but does not provide any experimental results.

In stereo correspondence, two previous comparative papers have focused on the performance of sparse feature matchers [54, 19]. Two recent papers [111, 80] have developed new criteria for evaluating the performance of dense stereo matchers for image-based rendering and telepresence applications. Our work is a continuation of the investigations begun by Szeliski and Zabih [116], which compared the performance of several popular algorithms, but did not provide a detailed taxonomy or as complete a coverage of algorithms. A preliminary version of this paper appeared in the CVPR 2001 Workshop on Stereo and Multi-Baseline Vision [99].

An evaluation of competing algorithms has limited value if each method is treated as a black box and only final results are compared. More insights can be gained by examining the individual components of various algorithms. For example, suppose a method based on global energy minimization
outperforms other methods. Is the reason a better energy function, or a better minimization technique? Could the technique be improved by substituting different matching costs?

In this paper we attempt to answer such questions by providing a taxonomy of stereo algorithms. The taxonomy is designed to identify the individual components and design decisions that go into a published algorithm. We hope that the taxonomy will also serve to structure the field and to guide researchers in the development of new and better algorithms.

2.1. Computational theory

Any vision algorithm, explicitly or implicitly, makes assumptions about the physical world and the image formation process. In other words, it has an underlying computational theory [74, 72]. For example, how does the algorithm measure the evidence that points in the two images match, i.e., that they are projections of the same scene point? One common assumption is that of Lambertian surfaces, i.e., surfaces whose appearance does not vary with viewpoint. Some algorithms also model specific kinds of camera noise, or differences in gain or bias.

Equally important are assumptions about the world or scene geometry and the visual appearance of objects. Starting from the fact that the physical world consists of piecewise-smooth surfaces, algorithms have built-in smoothness assumptions (often implicit) without which the correspondence problem would be underconstrained and ill-posed. Our taxonomy of stereo algorithms, presented in Section 3, examines both matching assumptions and smoothness assumptions in order to categorize existing stereo methods.

Finally, most algorithms make assumptions about camera calibration and epipolar geometry. This is arguably the best-understood part of stereo vision; we therefore assume in this paper that we are given a pair of rectified images as input. Recent references on stereo camera calibration and rectification include [130, 70, 131, 52, 39].

2.2. Representation

A critical issue in understanding an algorithm is the representation used
internally and output externally by the algorithm. Most stereo correspondence methods compute a univalued disparity function d(x, y) with respect to a reference image, which could be one of the input images, or a "cyclopean" view in between some of the images.

Other approaches, in particular multi-view stereo methods, use multivalued [113], voxel-based [101, 67, 34, 33, 24], or layer-based [125, 5] representations. Still other approaches use full 3D models such as deformable models [120, 121], triangulated meshes [43], or level-set methods [38].

Since our goal is to compare a large number of methods within one common framework, we have chosen to focus on techniques that produce a univalued disparity map d(x, y) as their output. Central to such methods is the concept of a disparity space (x, y, d). The term disparity was first introduced in the human vision literature to describe the difference in location of corresponding features seen by the left and right eyes [72]. Horizontal disparity is the most commonly studied phenomenon, but vertical disparity is possible if the eyes are verged.

In computer vision, disparity is often treated as synonymous with inverse depth [20, 85]. More recently, several researchers have defined disparity as a three-dimensional projective transformation (collineation or homography) of 3D space (X, Y, Z). The enumeration of all possible matches in such a generalized disparity space can be easily achieved with a plane sweep algorithm [30, 113], which for every disparity d projects all images onto a common plane using a perspective projection (homography). (Note that this is different from the meaning of plane sweep in computational geometry.)

In general, we favor the more generalized interpretation of disparity, since it allows the adaptation of the search space to the geometry of the input cameras [113, 94]; we plan to use it in future extensions of this work to multiple images. (Note that plane sweeps can also be generalized to other sweep surfaces such as cylinders [106].)

In this study, however, since all our images
are taken on a linear path with the optical axis perpendicular to the camera displacement, the classical inverse-depth interpretation will suffice [85]. The (x, y) coordinates of the disparity space are taken to be coincident with the pixel coordinates of a reference image chosen from our input data set. The correspondence between a pixel (x, y) in reference image r and a pixel (x', y') in matching image m is then given by

    x' = x + s d(x, y),   y' = y,   (1)

where s = ±1 is a sign chosen so that disparities are always positive. (Note that since our images are numbered from leftmost to rightmost, the pixels move from right to left.)

Once the disparity space has been specified, we can introduce the concept of a disparity space image or DSI [127, 18]. In general, a DSI is any image or function defined over a continuous or discretized version of disparity space (x, y, d). In practice, the DSI usually represents the confidence or log likelihood (i.e., cost) of a particular match implied by d(x, y).

The goal of a stereo correspondence algorithm is then to produce a univalued function in disparity space d(x, y) that best describes the shape of the surfaces in the scene. This can be viewed as finding a surface embedded in the disparity space image that has some optimality property, such as lowest cost and best (piecewise) smoothness [127]. Figure 1 shows examples of slices through a typical DSI. More figures of this kind can be found in [18].

3. A taxonomy of stereo algorithms

In order to support an informed comparison of stereo matching algorithms, we develop in this section a taxonomy and categorization scheme for such algorithms. We present a set of algorithmic "building blocks" from which a large set of existing algorithms can easily be constructed. Our taxonomy is based on the observation that stereo algorithms generally perform (subsets of) the following four steps [97, 96]:

1. matching cost computation;
2. cost (support) aggregation;
3. disparity computation / optimization; and
4. disparity refinement.

The actual sequence of steps taken depends on the specific algorithm.
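The four steps just listed can be sketched end-to-end for the simplest case. The following is an illustrative Python sketch, not the paper's actual C++ test bed; the function name, the absolute-difference cost, and the array conventions are assumptions made for the example:

```python
import numpy as np

def stereo_pipeline(left, right, max_disp, window=3):
    """Minimal dense two-frame stereo following the four-step
    taxonomy: (1) matching cost, (2) aggregation, (3) WTA
    disparity computation; step (4), refinement, is omitted."""
    h, w = left.shape
    # Step 1: matching cost -- absolute intensity difference,
    # giving one (x, y) cost slice (a DSI layer) per disparity d.
    cost = np.full((max_disp + 1, h, w), np.inf)
    for d in range(max_disp + 1):
        cost[d, :, d:] = np.abs(left[:, d:] - right[:, :w - d])
    # Step 2: aggregation -- sum costs over a square window at
    # constant disparity (the traditional SAD/SSD scheme).
    k = window // 2
    agg = np.full_like(cost, np.inf)
    for d in range(max_disp + 1):
        for y in range(k, h - k):
            for x in range(k, w - k):
                agg[d, y, x] = cost[d, y - k:y + k + 1,
                                    x - k:x + k + 1].sum()
    # Step 3: winner-take-all -- at each pixel, pick the
    # disparity with the minimum aggregated cost.
    return np.argmin(agg, axis=0)
```

With squared instead of absolute differences, step 1 yields the classical SSD algorithm; the naive window sums in step 2 would, in an efficient implementation, be replaced by the separable box filters discussed in Section 4.2.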
For example, local (window-based) algorithms, where the disparity computation at a given point depends only on intensity values within a finite window, usually make implicit smoothness assumptions by aggregating support. Some of these algorithms can cleanly be broken down into steps 1, 2, 3. For example, the traditional sum-of-squared-differences (SSD) algorithm can be described as:

1. the matching cost is the squared difference of intensity values at a given disparity;
2. aggregation is done by summing matching cost over square windows with constant disparity;
3. disparities are computed by selecting the minimal (winning) aggregated value at each pixel.

Some local algorithms, however, combine steps 1 and 2 and use a matching cost that is based on a support region, e.g., normalized cross-correlation [51, 19] and the rank transform [129]. (This can also be viewed as a preprocessing step; see Section 3.1.)

On the other hand, global algorithms make explicit smoothness assumptions and then solve an optimization problem. Such algorithms typically do not perform an aggregation step, but rather seek a disparity assignment (step 3) that minimizes a global cost function that combines data (step 1) and smoothness terms. The main distinction between these algorithms is the minimization procedure used, e.g., simulated annealing [75, 6], probabilistic (mean-field) diffusion [97], or graph cuts [23].

In between these two broad classes are certain iterative algorithms that do not explicitly state a global function that is to be minimized, but whose behavior mimics closely that of iterative optimization algorithms [73, 97, 132]. Hierarchical (coarse-to-fine) algorithms resemble such iterative algorithms, but typically operate on an image pyramid, where results from coarser levels are used to constrain a more local search at finer levels [126, 90, 11].

3.1. Matching cost computation

The most common pixel-based matching costs include squared intensity differences (SD) [51, 1, 77, 107] and absolute intensity differences (AD) [58]. In the video processing community, these
matching criteria are referred to as the mean-squared error (MSE) and mean absolute difference (MAD) measures; the term displaced frame difference is also often used [118].

More recently, robust measures, including truncated quadratics and contaminated Gaussians, have been proposed [15, 16, 97]. These measures are useful because they limit the influence of mismatches during aggregation.

Figure 1: Slices through a typical disparity space image (DSI): (a) original color image; (b) ground-truth disparities; (c)-(e) three (x, y) slices for d = 10, 16, 21; (f) an (x, d) slice for y = 151 (the dashed line in Figure (b)). Different dark (matching) regions are visible in Figures (c)-(e), e.g., the bookshelves, table and cans, and head statue, while three different disparity levels can be seen as horizontal lines in the (x, d) slice. Note the dark bands in the various DSIs, which indicate regions that match at this disparity. (Smaller dark regions are often the result of textureless regions.)

Other traditional matching costs include normalized cross-correlation [51, 93, 19], which behaves similar to sum-of-squared-differences (SSD), and binary matching costs (i.e., match / no match) [73], based on binary features such as edges [4, 50, 27] or the sign of the Laplacian [82]. Binary matching costs are not commonly used in dense stereo methods, however.

Some costs are insensitive to differences in camera gain or bias, for example gradient-based measures [100, 95] and non-parametric measures such as rank and census transforms [129]. Of course, it is also possible to correct for different camera characteristics by performing a preprocessing step for bias-gain or histogram equalization [48, 32]. Other matching criteria include phase and filter-bank responses [74, 63, 56, 57]. Finally, Birchfield and Tomasi have proposed a matching cost that is insensitive to image sampling [12]. Rather than just comparing pixel values shifted by integral amounts (which may miss a valid match), they compare each pixel in the reference image against a linearly interpolated function of the other image.

The matching cost
values over all pixels and all disparities form the initial disparity space image C0(x, y, d). While our study is currently restricted to two-frame methods, the initial DSI can easily incorporate information from more than two images by simply summing up the cost values for each matching image m, since the DSI is associated with a fixed reference image r (Equation (1)). This is the idea behind multiple-baseline SSSD and SSAD methods [85, 62, 81]. As mentioned in Section 2.2, this idea can be generalized to arbitrary camera configurations using a plane sweep algorithm [30, 113].

3.2. Aggregation of cost

Local and window-based methods aggregate the matching cost by summing or averaging over a support region in the DSI C(x, y, d). A support region can be either two-dimensional at a fixed disparity (favoring fronto-parallel surfaces), or three-dimensional in (x, y, d) space (supporting slanted surfaces). Two-dimensional evidence aggregation has been implemented using square windows or Gaussian convolution (traditional), multiple windows anchored at different points, i.e., shiftable windows [2, 18], windows with adaptive sizes [84, 60, 124, 61], and windows based on connected components of constant disparity [22]. Three-dimensional support functions that have been proposed include limited disparity difference [50], limited disparity gradient [88], and Prazdny's coherence principle [89].

Aggregation with a fixed support region can be performed using 2D or 3D convolution,

    C(x, y, d) = w(x, y, d) * C0(x, y, d),   (2)

or, in the case of rectangular windows, using efficient (moving average) box-filters. Shiftable windows can also be implemented efficiently using a separable sliding min-filter (Section 4.2). A different method of aggregation is iterative diffusion, i.e., an aggregation (or averaging) operation that is implemented by repeatedly adding to each pixel's cost the weighted values of its neighboring pixels' costs [114, 103, 97].

3.3. Disparity computation and optimization

Local methods. In local methods, the emphasis is on the matching cost computation and on the cost aggregation
steps. Computing the final disparities is trivial: simply choose at each pixel the disparity associated with the minimum cost value. Thus, these methods perform a local "winner-take-all" (WTA) optimization at each pixel. A limitation of this approach (and many other correspondence algorithms) is that uniqueness of matches is only enforced for one image (the reference image), while points in the other image might get matched to multiple points.

Global optimization. In contrast, global methods perform almost all of their work during the disparity computation phase and often skip the aggregation step. Many global methods are formulated in an energy-minimization framework [119]. The objective is to find a disparity function d that minimizes a global energy,

    E(d) = E_data(d) + lambda E_smooth(d).   (3)

The data term, E_data(d), measures how well the disparity function d agrees with the input image pair. Using the disparity space formulation,

    E_data(d) = sum_(x,y) C(x, y, d(x, y)),   (4)

where C is the (initial or aggregated) matching cost DSI.

The smoothness term E_smooth(d) encodes the smoothness assumptions made by the algorithm. To make the optimization computationally tractable, the smoothness term is often restricted to only measuring the differences between neighboring pixels' disparities,

    E_smooth(d) = sum_(x,y) rho(d(x, y) - d(x+1, y)) + rho(d(x, y) - d(x, y+1)),   (5)

where rho is some monotonically increasing function of disparity difference. (An alternative to smoothness functionals is to use a lower-dimensional representation such as splines [112].)

In regularization-based vision [87], rho is a quadratic function, which makes d smooth everywhere and may lead to poor results at object boundaries. Energy functions that do not have this problem are called discontinuity-preserving and are based on robust rho functions [119, 16, 97]. Geman and Geman's seminal paper [47] gave a Bayesian interpretation of these kinds of energy functions [110] and proposed a discontinuity-preserving energy function based on Markov Random Fields (MRFs) and additional line processes. Black and Rangarajan [16] show how line processes can often be
subsumed by a robust regularization framework. The terms in E_smooth can also be made to depend on the intensity differences, e.g.,

    rho_d(d(x, y) - d(x+1, y)) * rho_I(||I(x, y) - I(x+1, y)||),   (6)

where rho_I is some monotonically decreasing function of intensity differences that lowers smoothness costs at high intensity gradients. This idea [44, 42, 18, 23] encourages disparity discontinuities to coincide with intensity/color edges, and appears to account for some of the good performance of global optimization approaches.

Once the global energy has been defined, a variety of algorithms can be used to find a (local) minimum. Traditional approaches associated with regularization and Markov Random Fields include continuation [17], simulated annealing [47, 75, 6], highest confidence first [28], and mean-field annealing [45]. More recently, max-flow and graph-cut methods have been proposed to solve a special class of global optimization problems [92, 55, 23, 123, 65]. Such methods are more efficient than simulated annealing and have produced good results.

Dynamic programming. A different class of global optimization algorithms are those based on dynamic programming. While the 2D optimization of Equation (3) can be shown to be NP-hard for common classes of smoothness functions [123], dynamic programming can find the global minimum for independent scanlines in polynomial time. Dynamic programming was first used for stereo vision in sparse, edge-based methods [3, 83]. More recent approaches have focused on the dense, intensity-based scanline optimization problem [10, 9, 46, 31, 18, 13]. These approaches work by computing the minimum-cost path through the matrix of all pairwise matching costs between two corresponding scanlines. Partial occlusion is handled explicitly by assigning a group of pixels in one image to a single pixel in the other image. Figure 2 shows one such example.

Problems with dynamic programming stereo include the selection of the right cost for occluded pixels and the difficulty of enforcing inter-scanline consistency, although several methods propose ways of addressing
the latter [83, 9, 31, 18, 13]. Another problem is that the dynamic programming approach requires enforcing the monotonicity or ordering constraint [128]. This constraint requires that the relative ordering of pixels on a scanline remain the same between the two views, which may not be the case in scenes containing narrow foreground objects.

Cooperative algorithms. Finally, cooperative algorithms, inspired by computational models of human stereo vision, were among the earliest methods proposed for disparity computation [36, 73, 76, 114]. Such algorithms iteratively perform local computations, but use nonlinear operations that result in an overall behavior similar to global optimization algorithms. In fact, for some of these algorithms, it is possible to explicitly state a global function that is being minimized [97]. Recently, a promising variant of Marr and Poggio's original cooperative algorithm has been developed [132].

Figure 2: Stereo matching using dynamic programming. For each pair of corresponding scanlines, a minimizing path through the matrix of all pairwise matching costs is selected. Lowercase letters symbolize the intensities along each scanline. Uppercase letters represent the selected path through the matrix. Matches are indicated by M, while partially occluded points (which have a fixed cost) are indicated by L and R, corresponding to points only visible in the left and right image, respectively. Usually, only a limited disparity range is considered (indicated by the non-shaded squares in the figure). Note that this diagram shows an "unskewed" x-d slice through the DSI.

3.4. Refinement of disparities

Most stereo correspondence algorithms compute a set of disparity estimates in some discretized space, e.g., for integer disparities (exceptions include continuous optimization techniques such as optical flow [11] or splines [112]). For applications such as robot navigation or people tracking, these may be perfectly adequate. However, for image-based rendering, such
quantized maps lead to very unappealing view synthesis results (the scene appears to be made up of many thin shearing layers). To remedy this situation, many algorithms apply a sub-pixel refinement stage after the initial discrete correspondence stage. (An alternative is to simply start with more discrete disparity levels.)

Sub-pixel disparity estimates can be computed in a variety of ways, including iterative gradient descent and fitting a curve to the matching costs at discrete disparity levels [93, 71, 122, 77, 60]. This provides an easy way to increase the resolution of a stereo algorithm with little additional computation. However, to work well, the intensities being matched must vary smoothly, and the regions over which these estimates are computed must be on the same (correct) surface.

Recently, some questions have been raised about the advisability of fitting correlation curves to integer-sampled matching costs [105]. This situation may even be worse when sampling-insensitive dissimilarity measures are used [12]. We investigate this issue in Section 6.4 below.

Besides sub-pixel computations, there are of course other ways of post-processing the computed disparities. Occluded areas can be detected using cross-checking (comparing left-to-right and right-to-left disparity maps) [29, 42]. A median filter can be applied to "clean up" spurious mismatches, and holes due to occlusion can be filled by surface fitting or by distributing neighboring disparity estimates [13, 96]. In our implementation we are not performing such clean-up steps, since we want to measure the performance of the raw algorithm components.

3.5. Other methods

Not all dense two-frame stereo correspondence algorithms can be described in terms of our basic taxonomy and representations. Here we briefly mention some additional algorithms and representations that are not covered by our framework.

The algorithms described in this paper first enumerate all possible matches at all possible disparities, then select the best set of matches in some way. This is a useful
approach when a large amount of ambiguity may exist in the computed disparities. An alternative approach is to use methods inspired by classic (infinitesimal) optical flow computation. Here, images are successively warped and motion estimates incrementally updated until a satisfactory registration is achieved. These techniques are most often implemented within a coarse-to-fine (hierarchical) refinement framework [90, 118, 112].

A univalued representation of the disparity map is also not essential. Multi-valued representations, which can represent several depth values along each line of sight, have been extensively studied recently, especially for large multi-view data sets. Many of these techniques use a voxel-based representation to encode the reconstructed colors and spatial occupancies or opacities [113, 101, 67, 34, 33, 24]. Another way to represent a scene with more complexity is to use multiple layers, each of which can be represented by a plane plus residual parallax [5, 14, 117]. Finally, deformable surfaces of various kinds have also been used to perform 3D shape reconstruction from multiple images [120, 121, 43, 38].

3.6. Summary of methods

Table 1 gives a summary of some representative stereo matching algorithms and their corresponding taxonomy, i.e., the matching cost, aggregation, and optimization techniques used by each. The methods are grouped to contrast different matching costs (top), aggregation methods (middle), and optimization techniques (third section), while the last section lists some papers outside the framework. As can be seen from this table, quite a large subset of the possible algorithm design space has been explored over the years, albeit not very systematically.

Table 1: Summary taxonomy of several dense two-frame stereo correspondence methods. The methods are grouped to contrast different matching costs (top), aggregation methods (middle), and optimization techniques (third section). The last section lists some papers outside our framework. Key to abbreviations: hier. - hierarchical (coarse-to-fine), WTA - winner-take-all, DP - dynamic programming, SA - simulated annealing, GC - graph cut.

Method                         | Matching cost      | Aggregation              | Optimization
SSD (traditional)              | squared difference | square window            | WTA
Hannah [51]                    | cross-correlation  | square window            | WTA
Nishihara [82]                 | binarized filters  | square window            | WTA
Kass [63]                      | filter banks       | none                     | WTA
Fleet et al. [40]              | phase              | none                     | phase-matching
Jones and Malik [57]           | filter banks       | none                     | WTA
Kanade [58]                    | absolute difference| square window            | WTA
Scharstein [95]                | gradient-based     | Gaussian                 | WTA
Zabih and Woodfill [129]       | rank transform     | square window            | WTA
Cox et al. [32]                | histogram eq.      | none                     | DP
Frohlinghaus and Buhmann [41]  | wavelet phase      | none                     | phase-matching
Birchfield and Tomasi [12]     | shifted abs. diff. | none                     | DP
Marr and Poggio [73]           | binary images      | iterative aggregation    | WTA
Prazdny [89]                   | binary images      | 3D aggregation           | WTA
Szeliski and Hinton [114]      | binary images      | iterative 3D aggregation | WTA
Okutomi and Kanade [84]        | squared difference | adaptive window          | WTA
Yang et al. [127]              | cross-correlation  | non-linear filtering     | hier. WTA
Shah [103]                     | squared difference | non-linear diffusion     | regularization
Boykov et al. [22]             | thresh. abs. diff. | connected-component      | WTA
Scharstein and Szeliski [97]   | robust sq. diff.   | iterative 3D aggregation | mean-field
Zitnick and Kanade [132]       | squared difference | iterative aggregation    | WTA
Veksler [124]                  | abs. diff. (avg.)  | adaptive window          | WTA
Quam [90]                      | cross-correlation  | none                     | hier. warp
Barnard [6]                    | squared difference | none                     | SA
Geiger et al. [46]             | squared difference | shiftable window         | DP
Belhumeur [9]                  | squared difference | none                     | DP
Cox et al. [31]                | squared difference | none                     | DP
Ishikawa and Geiger [55]       | squared difference | none                     | graph cut
Roy and Cox [92]               | squared difference | none                     | graph cut
Bobick and Intille [18]        | absolute difference| shiftable window         | DP
Boykov et al. [23]             | squared difference | none                     | graph cut
Kolmogorov and Zabih [65]      | squared difference | none                     | graph cut
Birchfield and Tomasi [14]     | shifted abs. diff. | none                     | GC + planes
Tao et al. [117]               | squared difference | (color segmentation)     | WTA + regions

4. Implementation

We have developed a stand-alone, portable C++ implementation of several stereo algorithms. The implementation is closely tied to the taxonomy presented in Section 3 and currently includes window-based algorithms, diffusion algorithms, as well as global optimization methods using dynamic programming, simulated annealing, and graph cuts. While many published methods include special features and post-processing steps to improve the results, we have chosen to implement the basic versions of such algorithms, in order to assess their respective merits most directly.

The implementation is modular and can easily be extended to include other algorithms or their components. We plan to add several other algorithms in the near future, and we hope that other authors will contribute their methods to our framework as well. Once a new algorithm has been integrated, it can easily be compared with other algorithms using our evaluation module, which can measure disparity error and reprojection error (Section 5.1). The implementation contains a sophisticated mechanism for specifying parameter values that supports recursive script files for exhaustive performance comparisons on multiple data sets.

We provide a high-level description of our code using the same division into four parts as in our taxonomy. Within our code, these four sections are optionally executed in sequence, and the performance/quality evaluator is then invoked. A list of the most important algorithm parameters is given in Table 2.

4.1. Matching cost computation

The simplest possible matching cost is the squared or absolute difference in color / intensity between corresponding pixels (match_fn). To approximate the effect of a robust matching score [16, 97], we truncate the matching score to a maximal value match_max. When color images are being compared, we sum the squared or absolute intensity difference in each channel before applying the clipping. If fractional disparity evaluation is being performed (disp_step < 1), each scanline is first interpolated up using either a linear or cubic interpolation filter (match_interp) [77]. We also optionally apply Birchfield and Tomasi's
We apply this criterion separately to each color channel, which is not physically plausible (the sub-pixel shift must be consistent across channels), but which is easier to implement.

4.2 Aggregation

The aggregation section of our test bed implements some commonly used aggregation methods (aggr_fn):

- Box filter: use a separable moving average filter (add one right/bottom value, subtract one left/top). This implementation trick makes such window-based aggregation insensitive to window size in terms of computation time, and accounts for the fast performance seen in real-time matchers [59, 64].

Figure 3: Shiftable window. The effect of trying all 3 x 3 shifted windows around the black pixel is the same as taking the minimum matching score across all centered (non-shifted) windows in the same neighborhood. Only 3 of the neighboring shifted windows are shown here for clarity.

- Binomial filter: use a separable FIR (finite impulse response) filter. We use the coefficients 1/16 {1, 4, 6, 4, 1}, the same ones used in Burt and Adelson's [26] Laplacian pyramid.

Other convolution kernels could also be added later, as could recursive (bi-directional) IIR filtering, which is a very efficient way to obtain large window sizes [35]. The width of the box or convolution kernel is controlled by aggr_window_size.

To simulate the effect of shiftable windows [2, 18, 117], we can follow this aggregation step with a separable square min filter. The width of this filter is controlled by the parameter aggr_min_filter. The cascaded effect of a box filter and an equal-sized min filter is the same as evaluating a complete set of shifted windows, since the value of a shifted window is the same as that of a centered window at some neighboring pixel (Figure 3). This step adds very little additional computation, since a moving 1-D min filter can be computed efficiently by only recomputing the min when a minimum value leaves the window.
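The two separable 1-D passes just described might look like the sketch below: a running-sum box filter (add the incoming value, subtract the outgoing one), and a moving min filter that only rescans the window when the current minimum falls out of it. Border handling by replication and all function names are our own assumptions, not the authors' code.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Moving-average box filter via a running sum: O(1) per output sample,
// independent of the window radius.
std::vector<float> box_filter_1d(const std::vector<float>& in, int radius) {
    int n = (int)in.size(), w = 2 * radius + 1;
    std::vector<float> out(n);
    auto at = [&](int i) { return in[std::clamp(i, 0, n - 1)]; }; // replicate border
    float sum = 0.f;
    for (int i = -radius; i <= radius; ++i) sum += at(i);
    for (int i = 0; i < n; ++i) {
        out[i] = sum / w;
        sum += at(i + radius + 1) - at(i - radius); // slide window by one
    }
    return out;
}

// Moving 1-D min filter: the min is only recomputed (by rescanning the
// window) when the current minimum leaves the window, as in the text.
std::vector<float> min_filter_1d(const std::vector<float>& in, int radius) {
    int n = (int)in.size();
    std::vector<float> out(n);
    auto at = [&](int i) { return in[std::clamp(i, 0, n - 1)]; };
    int lo = -radius, hi = radius, argmin = lo;
    for (int i = lo + 1; i <= hi; ++i)
        if (at(i) < at(argmin)) argmin = i;
    for (int i = 0; i < n; ++i) {
        out[i] = at(argmin);
        ++lo; ++hi;
        if (argmin < lo) {               // minimum left the window: rescan
            argmin = lo;
            for (int j = lo + 1; j <= hi; ++j)
                if (at(j) < at(argmin)) argmin = j;
        } else if (at(hi) < at(argmin)) { // incoming value is the new minimum
            argmin = hi;
        }
    }
    return out;
}
```

Applying each pass along rows and then columns gives the separable 2-D box and min filters used for window-based and shiftable-window aggregation.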
The value of aggr_min_filter can be less than that of aggr_window_size, which simulates the effect of a partially shifted window. (The converse does not make much sense, since the window would then no longer include the reference pixel.)

We have also implemented all of the diffusion methods developed in [97] except for local stopping: regular diffusion, the membrane model, and Bayesian (mean-field) diffusion. While this last algorithm can also be considered an optimization method, we include it in the aggregation module, since it closely resembles the other iterative aggregation algorithms. The maximum number of aggregation iterations is controlled by aggr_iter. Other parameters controlling the diffusion algorithms are listed in Table 2.

Table 2: The most important stereo algorithm parameters of our implementation.

Name                     | Typical values   | Description
disp_min                 | 0                | smallest disparity
disp_max                 | 15               | largest disparity
disp_step                | 0.5              | disparity step size
match_fn                 | SD, AD           | matching function
match_interp             | Linear, Cubic    | interpolation function
match_max                | 20               | maximum difference for truncated SAD/SSD
match_interval           | false            | 1/2 disparity match [12]
aggr_fn                  | Box, Binomial    | aggregation function
aggr_window_size         | 9                | size of window
aggr_min_filter          | 9                | spatial min filter (shiftable window)
aggr_iter                | 1                | number of aggregation iterations
diff_lambda              | 0.15             | parameter λ for regular and membrane diffusion
diff_beta                | 0.5              | parameter β for membrane diffusion
diff_scale_cost          | 0.01             | scale of cost values (needed for Bayesian diffusion)
diff_mu                  | 0.5              | parameter μ for Bayesian diffusion
diff_sigmaP              | 0.4              | parameter σ_P for robust prior of Bayesian diffusion
diff_epsP                | 0.01             | parameter ε_P for robust prior of Bayesian diffusion
opt_fn                   | WTA, DP, SO, SA, GC | optimization function
opt_smoothness           | 1.0              | weight of smoothness term (λ)
opt_grad_thresh          | 8.0              | threshold for magnitude of intensity gradient
opt_grad_penalty         | 2.0              | smoothness penalty factor if gradient is too small
opt_occlusion_cost       | 20               | cost for occluded pixels in DP algorithm
opt_sa_var               | Gibbs, Metropolis | simulated annealing update rule
opt_sa_start_T           | 10.0             | starting temperature
opt_sa_end_T             | 0.01             | ending temperature
opt_sa_schedule          | Linear           | annealing schedule
refine_subpix            | true             | fit sub-pixel value to local correlation
eval_bad_thresh          | 1.0              | acceptable disparity error
eval_textureless_width   | 3                | box filter width applied to (∇ₓI)²
eval_textureless_thresh  | 4.0              | threshold applied to filtered (∇ₓI)²
eval_disp_gap            | 2.0              | disparity jump threshold
eval_discont_width       | 9                | width of discontinuity region
eval_ignore_border       | 10               | number of border pixels to ignore
eval_partial_shuffle     | 0.2              | analysis interval for prediction error

4.3 Optimization

Once we have computed the (optionally aggregated) costs, we need to determine which discrete set of disparities best represents the scene surface. The algorithm used to determine this is controlled by opt_fn, and can be one of:

- winner-take-all (WTA);
- dynamic programming (DP);
- scanline optimization (SO);
- simulated annealing (SA);
- graph cut (GC).

The winner-take-all method simply picks the lowest (aggregated) matching cost as the selected disparity at each pixel. The other methods require, in addition to the matching cost, the definition of a smoothness cost. Prior to invoking one of the optimization algorithms, we set up tables containing the values of ρ_d(Δd) in Equation 6 and precompute the spatially varying weights ρ_I(ΔI). These tables are controlled by the parameter opt_smoothness, which controls the overall scale of the smoothness term (i.e., λ in Equation 3), and the parameters opt_grad_thresh and opt_grad_penalty, which control the gradient-dependent smoothness costs. We currently use the smoothness terms defined by Veksler [123]:

    ρ_I(ΔI) = { p   if ΔI < opt_grad_thresh
              { 1   if ΔI ≥ opt_grad_thresh,        (7)

where p = opt_grad_penalty. Thus, the smoothness cost is multiplied by p for low intensity gradients, to encourage disparity jumps to coincide with intensity edges. All of the optimization algorithms minimize the same objective function, enabling a more meaningful comparison of their performance.

Our first global optimization technique, DP, is a dynamic programming method similar to the one proposed by Bobick and Intille [18].
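The gradient-dependent smoothness weight of Equation 7 can be sketched as follows. Parameter names follow Table 2, but the function itself is an illustrative reconstruction, not the authors' exact code.

```cpp
#include <cassert>

// Smoothness weight λ · ρ_I(ΔI) for a pair of neighboring pixels:
// the penalty factor p = opt_grad_penalty is applied when the local
// intensity gradient is small, encouraging disparity jumps to
// coincide with intensity edges (Equation 7).
float smoothness_weight(float delta_I,        // |ΔI| across the pixel pair
                        float opt_smoothness, // overall scale λ (Equation 3)
                        float opt_grad_thresh,
                        float opt_grad_penalty) {
    float rho = (delta_I < opt_grad_thresh) ? opt_grad_penalty : 1.0f;
    return opt_smoothness * rho;
}
```

In practice this weight is precomputed into a per-pixel table, as described above, before any of the global optimizers run.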
The algorithm works by computing the minimum-cost path through each x-d slice in the DSI (see Figure 2). Every point in this slice can be in one of three states: M (match), L (left-visible only), or R (right-visible only). Assuming the ordering constraint is being enforced, a valid path can take at most three directions at a point, each associated with a deterministic state change. Using dynamic programming, the minimum cost of all paths to a point can be accumulated efficiently. Points in state M are simply charged the matching cost at this point in the DSI. Points in states L and R are charged a fixed occlusion cost, opt_occlusion_cost. Before evaluating the final disparity map, we fill all occluded pixels with the nearest background disparity value on the same scanline.

The DP stereo algorithm is fairly sensitive to this parameter (see Section 6). Bobick and Intille address this problem by precomputing ground control points (GCPs) that are then used to constrain the paths through the DSI slice. GCPs are high-confidence matches that are computed using SAD and shiftable windows. At this point we are not using GCPs in our implementation, since we are interested in comparing the basic versions of different algorithms. However, GCPs are potentially useful in other algorithms as well, and we plan to add them to our implementation in the future.

Our second global optimization technique, scanline optimization (SO), is a simple (and, to our knowledge, novel) approach designed to assess different smoothness terms. Like the previous method, it operates on individual x-d DSI slices and optimizes one scanline at a time. However, the method is asymmetric and does not utilize visibility or ordering constraints. Instead, a d value is assigned at each point x such that the overall cost along the scanline is minimized. (Note that without a smoothness term, this would be equivalent to a winner-take-all optimization.) The global minimum can again be computed using dynamic programming.
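A minimal sketch of this per-scanline dynamic program is given below. For brevity we assume a constant smoothness weight lambda and a linear penalty lambda * |Δd| between neighboring pixels; the actual implementation uses the tabulated, gradient-dependent weights of Equations 6 and 7. This is our own reconstruction, not the authors' code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdlib>
#include <vector>

// Scanline optimization sketch: cost[x][d] is one x-d slice of the
// (possibly aggregated) DSI; returns the disparity per pixel that
// minimizes sum_x cost[x][d(x)] + lambda * sum_x |d(x) - d(x-1)|.
std::vector<int> scanline_optimize(const std::vector<std::vector<float>>& cost,
                                   float lambda) {
    int w = (int)cost.size(), ndisp = (int)cost[0].size();
    std::vector<std::vector<float>> acc(w, std::vector<float>(ndisp));
    std::vector<std::vector<int>> back(w, std::vector<int>(ndisp, 0));
    acc[0] = cost[0];
    for (int x = 1; x < w; ++x)
        for (int d = 0; d < ndisp; ++d) {
            float best = acc[x - 1][0] + lambda * std::abs(d - 0);
            int bestp = 0;
            for (int dp = 1; dp < ndisp; ++dp) {
                float c = acc[x - 1][dp] + lambda * std::abs(d - dp);
                if (c < best) { best = c; bestp = dp; }
            }
            acc[x][d] = cost[x][d] + best;  // matching cost plus best predecessor
            back[x][d] = bestp;
        }
    // Backtrack from the cheapest final state.
    std::vector<int> disp(w);
    int d = (int)(std::min_element(acc[w - 1].begin(), acc[w - 1].end())
                  - acc[w - 1].begin());
    for (int x = w - 1; x >= 0; --x) { disp[x] = d; d = back[x][d]; }
    return disp;
}
```

Note how, unlike the three-state M/L/R dynamic program above, no ordering constraint or occlusion cost appears here; the smoothness term alone regularizes the solution.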
Unlike in traditional (symmetric) DP algorithms, however, the ordering constraint does not need to be enforced, and no occlusion cost parameter is necessary. Thus, the SO algorithm solves the same optimization problem as the graph-cut algorithm described below, except that vertical smoothness terms are ignored.

Both DP and SO algorithms suffer from the well-known difficulty of enforcing inter-scanline consistency, resulting in horizontal "streaks" in the computed disparity map. Bobick and Intille's approach to this problem is to detect edges in the DSI slice and to lower the occlusion cost for paths along those edges. This has the effect of aligning depth discontinuities with intensity edges. In our implementation, we achieve the same goal by using an intensity-dependent smoothness cost (Equation 6), which in our DP algorithm is charged at all L-M and R-M state transitions.

Our implementation of simulated annealing supports both the Metropolis variant (where downhill steps are always taken, and uphill steps are sometimes taken) and the Gibbs sampler, which chooses among several possible states according to the full marginal distribution [47]. In the latter case, we can either select one new state (disparity) to flip to at random, or evaluate all possible disparities at a given pixel. Our current annealing schedule is linear, although we plan to add a logarithmic annealing schedule in the future.

Our final global optimization method, GC, implements the α-β swap move algorithm described in [23, 123]. (We plan to implement the α-expansion in the future.) We randomize the α-β pairings at each (inner) iteration and stop the algorithm when no further (local) energy improvements are possible.

4.4 Refinement

The sub-pixel refinement of disparities is controlled by the boolean variable refine_subpix. When this is enabled, the three aggregated matching cost values around the winning disparity are examined to compute the sub-pixel disparity estimate. Note that if the initial DSI was formed with fractional disparity steps, these are really sub-sub-pixel values.
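One common way to turn those three cost values into a sub-pixel estimate, and the one used here (Section 4.4), is to fit a parabola and keep its minimum when it is valid. The sketch below is our reconstruction under that reading; the function name and the validity checks mirror the description in the text.

```cpp
#include <cassert>

// Sub-pixel refinement sketch: fit a parabola to the aggregated costs at
// disparities d-1, d, d+1 (in disparity steps) and return its minimum if
// the curvature is positive and the minimum lies within half a step of d;
// otherwise keep the integer winner d.
float refine_subpixel(float d, float c_minus, float c0, float c_plus) {
    float curvature = c_minus - 2.f * c0 + c_plus;   // second difference
    if (curvature <= 0.f) return d;                   // not a valid minimum
    float offset = 0.5f * (c_minus - c_plus) / curvature;
    if (offset < -0.5f || offset > 0.5f) return d;    // outside half a step
    return d + offset;
}
```

For example, costs (2, 1, 4) around a winning disparity of 5 shift the estimate slightly toward the lower-cost neighbor, to 4.75.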
A more appropriate name might be floating-point disparity values. A parabola is fit to these three values (the three ending values are used if the winning disparity is either disp_min or disp_max). If the curvature is positive and the minimum of the parabola is within a half-step of the winning disparity (and within the search limits), this value is used as the final disparity estimate.

In future work, we would like to investigate whether initial (or aggregated) matching scores should be used, or whether some other approach, such as Lucas-Kanade [122], might yield higher-quality estimates.

5 Evaluation methodology

In this section, we describe the quality metrics we use for evaluating the performance of stereo correspondence algorithms, and the techniques we used for acquiring our image data sets and ground truth estimates.

5.1 Quality metrics

To evaluate the performance of a stereo algorithm, or the effects of varying some of its parameters, we need a quantitative way to estimate the quality of the computed correspondences. Two general approaches to this are to compute error statistics with respect to some ground truth data [8], and to evaluate the synthetic images obtained by warping the reference or unseen images by the computed disparity map [111]. In the current version of our software, we compute the following two quality measures based on known ground-truth data:

1. RMS (root-mean-squared) error (measured in disparity units) between the computed disparity map d_C(x, y) and the ground-truth map d_T(x, y), i.e.,

       R = ( 1/N Σ_(x,y) |d_C(x, y) - d_T(x, y)|² )^(1/2),        (8)

   where N is the total number of pixels.

2. Percentage of bad matching pixels,

       B = 1/N Σ_(x,y) ( |d_C(x, y) - d_T(x, y)| > δ_d ),          (9)

   where δ_d (eval_bad_thresh) is a disparity error tolerance. For the experiments in this paper we use δ_d = 1.0, since this coincides with some previously published studies [116, 132, 65].

In addition to computing these statistics over the whole image, we also focus on three different kinds of regions. These regions are computed by preprocessing the reference image and ground truth
disparity map to yield the following three binary segmentations (Figure 4):

Table 3: Error (quality) statistics computed by our evaluator. See the notes in the text regarding the treatment of occluded regions.

Name                    | Symbol | Description
rms_error_all           | R      | RMS disparity error
rms_error_nonocc        | R_Ō    | " (no occlusions)
rms_error_occ           | R_O    | " (at occlusions)
rms_error_textured      | R_T    | " (textured)
rms_error_textureless   | R_T̄    | " (textureless)
rms_error_discont       | R_D    | " (near discontinuities)
bad_pixels_all          | B      | bad pixel percentage
bad_pixels_nonocc       | B_Ō    | " (no occlusions)
bad_pixels_occ          | B_O    | " (at occlusions)
bad_pixels_textured     | B_T    | " (textured)
bad_pixels_textureless  | B_T̄    | " (textureless)
bad_pixels_discont      | B_D    | " (near discontinuities)
predict_err_near        | P-     | view extrapolation error (near)
predict_err_middle      | P1/2   | view extrapolation error (mid)
predict_err_match       | P1     | view extrapolation error (match)
predict_err_far         | P+     | view extrapolation error (far)

- textureless regions T: regions where the squared horizontal intensity gradient, averaged over a square window of a given size (eval_textureless_width), is below a given threshold (eval_textureless_thresh);

- occluded regions O: regions that are occluded in the matching image, i.e., where the forward-mapped disparity lands at a location with a larger (nearer) disparity; and

- depth discontinuity regions D: pixels whose neighboring disparities differ by more than eval_disp_gap, dilated by a window of width eval_discont_width.

These regions were selected to support the analysis of matching results in typical problem areas. For the experiments in this paper we use the values listed in Table 2.

The statistics described above are computed for each of the three regions and their complements, e.g.,

       B_T̄ = 1/N_T̄ Σ_((x,y)∈T̄) ( |d_C(x, y) - d_T(x, y)| > δ_d ),     (10)

and so on for R_T̄, B_T̄, B_Ō, etc. Table 3 gives a complete list of the statistics we collect. Note that for the textureless, textured, and depth discontinuity statistics, we exclude pixels that are in occluded regions, on the assumption that algorithms generally do not produce meaningful results in such occluded regions. Also, we exclude a border of eval_ignore_border pixels when computing all statistics, since many algorithms do not compute meaningful disparities near the image boundaries.
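Equations 8 and 9 can be sketched in a few lines; this is an illustrative reconstruction in which region masks and the border exclusion are omitted, and the struct and function names are ours.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Stats { double rms; double bad_percent; };

// RMS disparity error (Equation 8) and percentage of bad pixels
// (Equation 9) between a computed and a ground-truth disparity map,
// with tolerance delta_d = eval_bad_thresh.
Stats disparity_stats(const std::vector<double>& d_computed,
                      const std::vector<double>& d_truth,
                      double eval_bad_thresh) {
    std::size_t n = d_computed.size();
    double sq_sum = 0.0;
    std::size_t bad = 0;
    for (std::size_t i = 0; i < n; ++i) {
        double e = std::abs(d_computed[i] - d_truth[i]);
        sq_sum += e * e;                 // accumulate squared error for RMS
        if (e > eval_bad_thresh) ++bad;  // count bad pixels
    }
    return { std::sqrt(sq_sum / n), 100.0 * bad / n };
}
```

The per-region statistics of Table 3 follow the same pattern, with the sums restricted to the pixels of each binary segmentation.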
Figure 4: The three binary segmentations used in our evaluation: textureless regions, occluded regions, and regions near depth discontinuities.

The second major approach to gauging the quality of reconstruction algorithms is to use the color images and disparity maps to predict the appearance of other views [111]. Here again, there are two major flavors possible:

1. Forward warp the reference image by the computed disparity map to a different (potentially unseen) view (Figure 5), and compare it against this new image to obtain a forward prediction error.

2. Inverse warp a new view by the computed disparity map to generate a stabilized image (Figure 6), and compare it against the reference image to obtain an inverse prediction error.

There are pros and cons to either approach. The forward warping algorithm has to deal with tearing problems: if a single-pixel splat is used, gaps can arise, even between adjacent pixels with similar disparities. One possible solution would be to use a two-pass renderer [102]. Instead, we render each pair of neighboring pixels as an interpolated color line in the destination image (i.e., we use Gouraud shading). If neighboring pixels differ by more than a disparity of eval_disp_gap, the segment is replaced by single-pixel splats at both ends, which results in a visible tear (light magenta regions in Figure 5).

For inverse warping, the problem of gaps does not occur. Instead, we get "ghosted" regions when pixels in the reference image are not actually visible in the source. We eliminate such pixels by checking for visibility (occlusions) first, and then drawing these pixels in a special color (light magenta in Figure 6). We have found that looking at the inverse-warped sequence, based on the ground-truth disparities, is a very good way to determine if the original sequence is properly calibrated and rectified.
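The inverse-warping step can be sketched as below. Each pixel (x, y) of the stabilized image samples the new view at (x - s·d(x, y), y), where s is the relative baseline of the new view; the sign convention, the nearest-neighbor sampling, and the omission of the visibility check and ghost handling described in the text are all simplifying assumptions of ours.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Inverse warp a new view (single-channel, row-major, w x h) by a
// disparity map to produce a stabilized image; s scales the disparity
// for the chosen view (s = 1 would reproduce the matching image under
// the assumed sign convention). Unsampled pixels are marked with -1.
std::vector<float> inverse_warp(const std::vector<float>& view, int w, int h,
                                const std::vector<float>& disp, float s) {
    std::vector<float> stabilized(static_cast<std::size_t>(w) * h, -1.f);
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x) {
            int xs = (int)std::lround(x - s * disp[y * w + x]);
            if (xs >= 0 && xs < w)                    // stay inside the image
                stabilized[y * w + x] = view[y * w + xs];
        }
    return stabilized;
}
```

Comparing the stabilized image against the reference image then yields the inverse prediction error described above.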
In computing the prediction error, we need to decide how to treat gaps. Currently, we ignore pixels flagged as gaps in computing the statistics and report the percentage of such missing pixels. We can also optionally compensate for small misregistrations [111]. To do this, we convert each pixel in the original and predicted image to an interval, by blending the pixel's value with some fraction (eval_partial_shuffle) of its neighboring pixels' min and max values. This idea is a generalization of the sampling-insensitive dissimilarity measure [12] and the shuffle transformation of [66]. The reported difference is then the (signed) distance between the two computed intervals. We plan to investigate these and other sampling-insensitive matching costs in the future [115].

5.2 Test data

To quantitatively evaluate our correspondence algorithms, we require data sets that either have a ground-truth disparity map, or a set of additional views that can be used for prediction error tests (or preferably both).

We have begun to collect such a database of images, building upon the methodology introduced in [116]. Each image sequence consists of 9 images, taken at regular intervals with a camera mounted on a horizontal translation stage, with the camera pointing perpendicularly to the direction of motion. We use a digital high-resolution camera (Canon G1) set in manual exposure and focus mode, and rectify the images using tracked feature points. We then downsample the original 2048 x 1536 images to 512 x 384 using a high-quality 8-tap filter, and finally crop the images to normalize the motion of background objects to a few pixels per frame.

All of the sequences we have captured are made up of piecewise planar objects (typically posters or paintings, some with cut-out edges). Before downsampling the images, we hand-label each image into its piecewise planar components (Figure 7). We then use a direct alignment technique on each planar region [5] to
estimate the affine motion of each patch. The horizontal component of these motions is then used to compute the ground-truth disparity. In future work we plan to extend our acquisition methodology to handle scenes with quadric surfaces (e.g., cylinders, cones, and spheres).

Of the six image sequences we acquired, all of which are available on our web page, we have selected two (Sawtooth and Venus) for the experimental study in this paper. We also use the University of Tsukuba "head and lamp" data set [81], a 5 x 5 array of images together with hand-labeled integer ground-truth disparities for the center image. Finally, we use the monochromatic Map data set first introduced by Szeliski and Zabih [116], which was taken with a Point Grey Research trinocular stereo camera, and whose ground-truth disparity map was computed using the piecewise planar technique described above. Figure 7 shows the reference image and the ground-truth disparities for each of these four sequences. We exclude a border of 18 pixels in the Tsukuba images, since no ground-truth disparity values are provided there. For all other images, we use eval_ignore_border = 10 for the experiments reported in this paper.

In the future, we hope to add further data sets to our collection of "standard" test images, in particular other sequences from the University of Tsukuba, and the GRASP Laboratory's "Buffalo Bill" data set with registered laser range finder ground truth [80]. There may also be suitable images among the CMU Computer Vision Home Page data sets. Unfortunately, we cannot use data sets for which only a sparse set of feature matches has been computed [19, 54].

It should be noted that high-quality ground-truth data is critical for a meaningful performance evaluation. Accurate sub-pixel disparities are hard to come by, however. The ground-truth data for the Tsukuba images, for example, is strongly quantized, since it only provides integer disparity estimates for a very small disparity range (d = 5...14). This is clearly visible when the images are stabilized using
Figure 7: Stereo images with ground truth used in this study. The Sawtooth and Venus images are two of our new 9-frame stereo sequences of planar objects; the figure shows the reference image, the planar region labeling, and the ground-truth disparities. We also use the familiar Tsukuba "head and lamp" data set and the monochromatic Map image pair.

the ground-truth data and viewed in a video loop. In contrast, the ground-truth disparities for our piecewise planar scenes have high (sub-pixel) precision, but at the cost of limited scene complexity. To provide an adequate challenge for the best-performing stereo methods, new stereo test images with complex scenes and sub-pixel ground truth will soon be needed.

Synthetic images have been used extensively for qualitative evaluations of stereo methods, but they are often restricted to simple geometries and textures (e.g., random-dot stereograms). Furthermore, issues arising with real cameras are seldom modeled, e.g., aliasing, slight misalignment, noise, lens aberrations, and fluctuations in gain and bias. Consequently, results on synthetic images usually do not extrapolate to images taken with real cameras. We have experimented with the University of Bonn's synthetic Corridor data set [41], but have found that the clean, noise-free images are unrealistically easy to solve, while the noise-contaminated versions are too difficult, due to the complete lack of texture in much of the scene. There is a clear need for synthetic, photo-realistic test imagery that properly models real-world imperfections, while providing accurate ground truth.

6 Experiments and results

In this section, we describe the experiments used to evaluate the individual building blocks of stereo algorithms. Using our implementation framework, we examine the four main algorithm components identified in Section 3 (matching cost, aggregation, optimization, and sub-pixel fitting). In Section 7, we perform an overall comparison of a large set of stereo
algorithms, including other authors' implementations. We use the Tsukuba, Sawtooth, Venus, and Map data sets in all experiments, and report results on subsets of these images. The complete set of results (all experiments run on all data sets) is available on our web site at www.middlebury.edu/stereo.

Using the evaluation measures presented in Section 5.1, we focus on common problem areas for stereo algorithms. Of the 12 ground-truth statistics we collect (Table 3), we have chosen three as the most important subset. First, as a measure of overall performance, we use B_Ō, the percentage of bad pixels in non-occluded areas. We exclude the occluded regions for now, since few of the algorithms in this study explicitly model occlusions, and most perform quite poorly in these regions. As algorithms get better at matching occluded regions [65], however, we will likely focus more on the total matching error B.

The other two important measures are B_T̄ and B_D, the percentages of bad pixels in textureless areas and in areas near depth discontinuities. These measures provide important information about the performance of algorithms in two critical problem areas. The parameter names for these three measures are bad_pixels_nonocc, bad_pixels_textureless, and bad_pixels_discont, and they appear in most of the plots below. We prefer the percentage of bad pixels over RMS disparity errors, since this gives a better indication of the overall performance of an algorithm. For example, an algorithm is performing reasonably well if B_Ō < 10%. The RMS error figure, on the other hand, is contaminated by the (potentially large) disparity errors in those poorly matched 10% of the image. RMS errors become important once the percentage of bad pixels drops to a few percent and the quality of a sub-pixel fit needs to be evaluated (see Section 6.4).

Note that the algorithms always take exactly two images as input, even when more are available. For example, with our 9-frame sequences, we use the third and seventh frame as input pair. The other frames are used
to measure the prediction error.

6.1 Matching cost

We start by comparing different matching costs, including absolute differences (AD), squared differences (SD), truncated versions of both, and Birchfield and Tomasi's [12] sampling-insensitive dissimilarity measure (BT).

An interesting issue when trying to assess a single algorithm component is how to fix the parameters that control the other components. We usually choose good values based on experiments that assess the other algorithm components. (The inherent boot-strapping problem disappears after a few rounds of experiments.) Since the best settings for many parameters vary depending on the input image pair, we often have to compromise and select a value that works reasonably well for several images.

Experiment 1: In this experiment, we compare the matching costs AD, SD, AD+BT, and SD+BT, using a local algorithm. We aggregate with a 9 x 9 window, followed by winner-take-all optimization (i.e., we use the standard SAD and SSD algorithms). We do not compute sub-pixel estimates. Truncation values used are 1, 2, 5, 10, 20, 50, and ∞ (no truncation); these values are squared when truncating SD.

Results: Figure 8 shows plots of the three evaluation measures B_Ō, B_T̄, and B_D for each of the four matching costs as a function of truncation values, for the Tsukuba, Sawtooth, and Venus images. Overall, there is little difference between AD and SD. Truncation matters mostly for points near discontinuities. The reason is that for windows containing mixed populations (both foreground and background points), truncating the matching cost limits the influence of wrong matches. Good truncation values range from 5 to 50, typically around 20. Once the truncation values drop below the noise level (e.g., 2 and 1), the errors become very large. Using Birchfield-Tomasi (BT) helps for these small truncation values, but yields little improvement for good truncation values.

Figure 8: Experiment 1. Performance of different matching costs (SAD, SSD, SAD+BT, SSD+BT), aggregated with a 9 x 9 window, as a function of truncation value match_max, for three different image pairs (Tsukuba, Sawtooth, Venus). Intermediate truncation values (~20) yield the best results. Birchfield-Tomasi helps when truncation values are low.

The results are consistent across all data sets; however, the best truncation value varies. We have also tried a window size of 21, with similar results.

Conclusion: Truncation can help for AD and SD, but the best truncation value depends on the images' signal-to-noise ratio (SNR), since truncation should happen right above the noise level present (see also the discussion in [97]).

Experiment 2: This experiment is identical to the previous one, except that we also use a 9 x 9 min filter (in effect, we aggregate with shiftable windows).

Results: Figure 9 shows the plots for this experiment, again for the Tsukuba, Sawtooth, and Venus images. As before, there are negligible differences between AD and SD. Now, however, the non-truncated versions perform consistently the best. In particular, for points near discontinuities we get the lowest errors overall, but the total errors are also comparable to the best settings of truncation in Experiment 1. BT helps bring down larger errors, but, as before, does not significantly decrease the best (non-truncated) errors. We again also tried a window size of 21, with similar results.

Conclusion: The problem of selecting the best truncation value can be avoided by instead using a shiftable window (min filter). This is an interesting result, as both robust matching costs (truncated functions) and shiftable windows have been proposed to deal with outliers in windows that straddle object
boundaries. The above experiments suggest that avoiding outliers by shifting the window is preferable to limiting their influence using truncated cost functions.

Experiment 3: We now assess how matching costs affect global algorithms, using dynamic programming (DP), scanline optimization (SO), and graph cuts (GC) as optimization techniques. A problem with global techniques that minimize a weighted sum of data and smoothness terms (Equation 3) is that the range of matching cost values affects the optimal value for λ, i.e., the relative weight of the smoothness term. For example, squared differences require much higher values for λ than absolute differences. Similarly, truncated difference functions result in lower matching costs and require lower values for λ. Thus, in trying to isolate the effect of the matching costs, we are faced with the problem of how to choose λ. The cleanest solution to this dilemma would perhaps be to find a (different) optimal λ independently for each matching cost under consideration, and then to report which matching cost gives the overall best results. The optimal λ, however, would not only differ across matching costs, but also across different images. Since in a practical matcher we need to choose a constant λ, we have done the same in this experiment. We use λ = 20, guided by the results discussed in Section 6.3 below, and restrict the matching costs to absolute differences (AD), truncated by varying amounts. For the DP algorithm, we use a fixed occlusion cost of 20.

Results: Figure 10 shows plots of the bad pixel percentages B_Ō, B_T̄, and B_D as a function of truncation values, for the Tsukuba, Sawtooth, and Venus images. Each plot has six curves, corresponding to DP, DP+BT, SO, SO+BT, GC, and GC+BT. It can be seen that the truncation value affects the performance. As with the local algorithms, if the truncation value is too small (in the noise range), the errors get very large. Intermediate truncation values (between 5 and 50, depending on algorithm and image pair), however, can sometimes improve the performance. The effect of
Birch eldTomasi is mixed39 as with the local algorithms in Experiments 1 and 2 it limits the errors if the truncation values are too small It can be seen that ET is most bene cial for the SO algorithm how ever this is due to the fact that SO really requires a higher value of A to work well see Experiment 5 in which case the positive effect of ET is less pronounced Conclusion Using robust truncated matching costs can slightly improve the performance of global algorithms The best truncation value however varies with each image pair Setting this parameter automatically based on an estimate of the image SNR may be possible and is a topic for fur ther research Birch eld and Tomasi s matching measure can improve results slightly lntuitively truncation should not be necessary for global algorithms that operate on un aggregated matching costs since the problem of outliers in a window does not exist An important problem for global algorithms however is to nd the correct balance between data and smoothness terms see Experiment 5 below Trun cation can be useful in this context since it limits the range of possible cost values 62 Aggregation We now turn to comparing different aggregation methods used by local methods While global methods typically op erate on raw unaggregated costs aggregation can be useful for those methods as well for example to provide starting values for iterative algorithms or a set of highcon dence matches or graundcanz ralpaints GCPs l 8 usedto restrict the search of dynamicprogramming methods In this section we examine aggregation with square win dows shiftable windows min lter binomial lters reg ular diffusion and membrane diffusion 97 Results for Bayesian diffusion which combines aggregation and opti mization can be found in Section 7 Experiment 4 In this experiment we use nontruncated absolute differences as matching cost and perform a winner takeall optimization after the aggregation step no subpixel estimation We compare the following aggregation 
meth ods 1 square windows with window sizes 3 5 7 2939 2 shiftable square windows min lter with window sizes 3 5 7 2939 60 50 Tsukuba 40 9x9 min filter 30 badiplxelsinonooc 20 badiplxelsitextureless badiplxelsidlscont 10 0 Egaewwr Egaewwr ES SWNF EE SWNF SAD SSD SADBT SSDBT matchimax 60 50 Sawtooth 40 9x9 min filter 30 A badiplxelsinonooc 20 1 f badiplxelsitextureless badiplxelsidlscont m l 0 E3aewwr Egaewwr Egaewwr Egaewwr SAD SSD SADBT SSDBT matchimax 60 50 f Venus 40 9x9 min filter 30 badiplxelsinonooc 20 badiplxelsitextureless badiplxelsidscont 10 0 Egaewwr Egaewwr Egaewwr Egaewwr SAD SSD SADBT SSDBT matchimax Figure 9 Experiment 2 Performance of dij erent matching costs aggregated with a 9 X 9 shiftable window min lter as a xnction of truncation values matchmaxfor three di erent image pairs Large truncation values no truncation work best when using shiftable windows 18 60 50 Tsukuba 40 30 bad7pxelsinonocc 20 bad7pxelsitextureless bad7pxelsidsoont 10 0 E882 N aneww E882 N E882 N 882 E883 DP DPBT SO SOBT GC GCBT matchimax r r r I 50 A m Sawtooth 40 1quot fl 1 III k 30 bad7pxelsinon0cc 200 bad7pxelsitextureless o bad7pxelsidsoont 10 0 Egaeww Egaeww Egaeww EE SWN E382 E382 DP DPBT SO SOBT GC GCBT matchimax H II M r 50 I I Venus 40 1 30 bad7pxelsinonocc 20 quot bad7pxelsitextureless bad7pxelsidsoont 10 39 0 E882 N aneww E882 N E882 N DP DPBT SO Inf 50 C 20 CBT Figure 10 Experiment 3 Performance of dij erent matching costs for global algorithms as a Anction of truncation values matchmax for three di erent image pairs Intermediate truncation values N 20 can sometimes improve the performance 19 50 40 Tsukuba 30 bad7plxelsinonocc bad7plxelsitextureless bad7plxelsidlscont 20 10 0 MW NW NW comm WW mewm bm manages 50 T 40 30 7 window w windowminf w binomial f iter diffusion iter membrane 3 Sawtooth J bad7plxelsinonocc 20 bad7plxelsitextureless bad7plxelsidlscont 10 0 WMWWWWW WWW window w windowminf w binomial f iter diffusion iter membrane B 50 40 Venus 30 
Figure 11: Experiment 4. Performance of different aggregation methods as a function of spatial extent (window size, number of iterations, and diffusion β). Larger window extents do worse near discontinuities, but better in textureless areas, which tend to dominate the overall statistics. Near discontinuities, shiftable windows have the best performance.

3. iterated binomial (1-4-6-4-1) filter, for 2, 4, 6, ..., 28 iterations;
4. regular diffusion [97] for 10, 20, 30, ..., 150 iterations;
5. membrane diffusion [97] for 150 iterations and β = 0.9, 0.8, 0.7, ..., 0.0.

Note that for each method we are varying the parameter that controls the spatial extent of the aggregation, i.e., the equivalent of window size. In particular, for the binomial filter and regular diffusion, this amounts to changing the number of iterations. The membrane model, however, converges after sufficiently many iterations, and the spatial extent of the aggregation is controlled by the parameter β, the weight of the original cost values in the diffusion equation [97].

Results: Figure 11 shows plots of B_O (bad_pixels_nonocc), B_T (bad_pixels_textureless), and B_D (bad_pixels_discont) as a function of spatial extent of aggregation for the Tsukuba, Sawtooth, and Venus images. Each plot has five curves, corresponding to the five aggregation methods listed above. The most striking feature of these curves is the opposite trends of errors in textureless areas (B_T) and at points near discontinuities (B_D). Not surprisingly, more aggregation (larger window sizes or higher numbers of iterations) clearly helps to recover textureless areas (note especially the Venus images, which contain large untextured regions). At the same time, too much aggregation causes errors near object boundaries (depth discontinuities). The overall error in non-occluded regions, B_O, exhibits a mixture of both trends. Depending on the image, the best performance is usually achieved at an intermediate amount of aggregation. Among the five aggregation methods, shiftable windows clearly perform
best, most notably in discontinuity regions, but also overall. The other four methods (square windows, binomial filter, regular diffusion, and membrane model) perform very similarly, except for differences in the shape of the curves, which are due to our somewhat arbitrary definition of spatial extent for each method. Note, however, that even for shiftable windows, the optimal window size for recovering discontinuities is small, while much larger windows are necessary in untextured regions.

Discussion: This experiment exposes some of the fundamental limitations of local methods. While large windows are needed to avoid wrong matches in regions with little texture, window-based stereo methods perform poorly near object boundaries (i.e., depth discontinuities). The reason is that such methods implicitly assume that all points within a window have similar disparities. If a window straddles a depth boundary, some points in the window match at the foreground disparity, while others match at the background disparity. The (aggregated) cost function at a point near a depth discontinuity is thus bimodal in the d direction, and the stronger of the two modes will be selected as the winning disparity. Which one of the two modes will win? This depends on the amount of (horizontal) texture present in the two regions.

Consider first a purely horizontal depth discontinuity (top edge of the foreground square in Figure 12). Whichever of the two regions has more horizontal texture will create a stronger mode, and the computed disparities will thus bleed into the less-textured region. For non-horizontal depth boundaries, however, the most prominent horizontal texture is usually the object boundary itself, since different objects typically have different colors and intensities. Since the object boundary is at the foreground disparity, a strong preference for the foreground disparity at points near the boundary is created, even if the background is textured. This is the explanation for the well-known "foreground fattening" effect exhibited
by window-based algorithms. This can be seen at the right edge of the foreground in Figure 12; the left edge is an occluded area, which can't be recovered in any case.

Adaptive window methods have been developed to combat this problem. The simplest variant, shiftable windows (min-filters), can be effective, as is shown in the above experiment. Shiftable windows can recover object boundaries quite accurately if both foreground and background regions are textured, and as long as the window fits as a whole within the foreground object. The size of the min-filter should be chosen to match the window size. As is the case with all local methods, however, shiftable windows fail in textureless areas.

Conclusion: Local algorithms that aggregate support can perform well, especially in textured (even slanted) regions. Shiftable windows perform best, in particular near depth discontinuities. Large amounts of aggregation are necessary in textureless regions.

6.3 Optimization

In this section we compare the four global optimization techniques we implemented: dynamic programming (DP), scanline optimization (SO), graph cuts (GC), and simulated annealing (SA).

Experiment 5: In this experiment we investigate the role of opt_smoothness, the smoothness weight λ in Equation (3). We compare the performance of DP, SO, GC, and SA for λ = 5, 10, 20, 50, 100, 200, 500, and 1000. We use unaggregated absolute differences as the matching cost (squared differences would require much higher values for λ) and no sub-pixel estimation. The number of iterations for simulated annealing (SA) is 500.

Results: Figure 13 shows plots of B_O, B_T, and B_D as a function of λ for the Tsukuba, Venus, and Map images. To show more varied results, we use the Map images instead of Sawtooth in this experiment. Since DP has an extra parameter, the occlusion cost, we include three runs, for

Figure 12: Illustration of the "foreground fattening" effect, using the Map image pair and a 21 x 21 SAD algorithm, with and without a min-filter. The error maps encode the
signed disparity error, using gray for 0, light for positive errors, and dark for negative errors. Note that without the min-filter (middle column), the foreground region grows across the vertical depth discontinuity towards the right. With the min-filter (right column), the object boundaries are recovered fairly well.

opt_occlusion_cost = 20, 50, and 80. Using, as before, B_O (bad_pixels_nonocc) as our measure of overall performance, it can be seen that the graph-cut method (GC) consistently performs best, while the other three (DP, SO, and SA) perform slightly worse, with no clear ranking among them. GC also performs best in textureless areas and near discontinuities. The best performance for each algorithm, however, requires different values for λ depending on the image pair. For example, the Map images, which are well textured and only contain two planar regions, require high values (around 500), while the Tsukuba images, which contain many objects at different depths, require smaller values (20–200, also depending on the algorithm). The occlusion cost parameter for the DP algorithm, while not changing the performance dramatically, also affects the optimal value for λ. Although GC is the clear winner here, it is also the slowest algorithm: DP and SO, which operate on each scanline independently, typically run in less than 2 seconds, while GC and SA require 10–30 minutes.

Conclusion: The graph-cut method consistently outperforms the other optimization methods, although at the cost of much higher running times. GC is clearly superior to simulated annealing, which is consistent with other published results [23, 116]. When comparing GC and scanline methods (DP and SO), however, it should be noted that the latter solve a different (easier) optimization problem, since vertical smoothness terms are ignored. While this enables the use of highly efficient dynamic programming techniques, it negatively affects the performance, as exhibited in the characteristic "streaking" in the disparity maps (see Figures 17 and 18 below). Several authors have
proposed methods for increasing inter-scanline consistency in dynamic-programming approaches, e.g., [9, 31, 13]. We plan to investigate this area in future work.

Experiment 6: We now focus on the graph-cut optimization method to see whether the results can be improved. We try both Birchfield–Tomasi matching costs and a smoothness cost that depends on the intensity gradients.

Results: Figure 14 shows the usual set of performance measures B_O, B_T, and B_D for four different experiments for the Tsukuba, Sawtooth, Venus, and Map images. We use a smoothness weight of λ = 20, except for the Map images, where λ = 50. The matching costs are non-truncated absolute differences. The parameters for the gradient-dependent smoothness costs are opt_grad_thresh = 8 (same in all experiments) and opt_grad_penalty = 1, 2, or 4 (denoted p1, p2, and p4 in the plots). Recall that the smoothness cost is multiplied by opt_grad_penalty if the intensity gradient is below opt_grad_thresh, to encourage disparity jumps to coincide with intensity edges. Each plot in Figure 14 shows four runs: p1, p1+BT, p2+BT, and p4+BT. In the first run, the penalty is 1, i.e., the gradient dependency is turned off. This gives the same results as in Experiment 5. In the second run we
add Birchfield–Tomasi, still without a penalty. We then add penalties of 2 and 4 in the last two runs. It can be seen that the low-gradient penalty clearly helps in recovering the discontinuities, and also in the other regions. Which of the two penalties works better depends on the image pair. Birchfield–Tomasi also yields a slight improvement. We have also tried other values for the threshold, with mixed results. In future work we plan to replace the simple gradient threshold with an edge detector, which should improve edge localization. The issue of selecting the right penalty factor is closely related to selecting the right value for λ, since it affects the overall relationship between the data term and the smoothness term. This also deserves more investigation.

Figure 13: Experiment 5. Performance of global optimization techniques as a function of the smoothness weight λ (opt_smoothness) for the Map, Tsukuba, and Venus images. Note that each image pair requires a different value of λ for optimal performance.

Figure 14: Experiment 6. Performance of the graph-cut optimization technique with different gradient-dependent smoothness penalties (p1, p2, p4), and with and without Birchfield–Tomasi.

Conclusion: Both Birchfield–Tomasi's matching cost and the gradient-based smoothness cost improve the performance of the graph-cut algorithm. Choosing the right parameters (threshold and penalty) remains difficult and image-specific. We have performed these experiments for scanline-based optimization methods (DP and SO) as well, with similar results. Gradient-based penalties usually increase performance, in particular for the SO method. Birchfield–Tomasi
always seems to increase overall performance, but it sometimes decreases performance in textureless areas. As before, the algorithms are highly sensitive to the weight of the smoothness term λ and the penalty factor.

6.4 Sub-pixel estimation

Experiment 7: To evaluate the performance of the sub-pixel refinement stage, and also to evaluate the influence of the matching criteria and disparity sampling, we cropped a small planar region from one of our image sequences (Figure 15a, second column of images). The image itself is a page of newsprint mounted on cardboard, with high-frequency text and a few low-frequency white and dark regions. (These textureless regions were excluded from the statistics we gathered.) The disparities in this region are in the order of 0.8–5.8 pixels and are slanted both vertically and horizontally.

Results: We first run a simple 9 x 9 SSD window (Figure 15b). One can clearly see the discrete disparity levels computed. The disparity error map (second column of images) shows the staircase error, and the histogram of disparities (third column) also shows the discretization. If we apply the sub-pixel parabolic fit to refine the disparities, the disparity map becomes smoother (note the drop in RMS error in Figure 15c), but still shows some soft staircasing, which is visible in the disparity error map and histogram as well. These results agree with those reported by Shimizu and Okutomi [105]. In Figure 15d we investigate whether using the Birchfield–Tomasi sampling-invariant measure [12] improves or degrades this behavior. For integral sampling, their idea does help slightly, as can be seen by the reduced RMS value and the smoother histogram in Figure 15d. In all other instances, it leads to poorer performance (see Figure 16a, where the sampling-invariant results are the even data points).

      disp_step     refine subpix  Birchf.–Tomasi  preproc blur  RMS disp error
(a)   ground truth  -              -               -             0
(b)   1             no             no              no            0.296
(c)   1             yes            no              no            0.088
(d)   1             yes            yes             no            0.082
(e)   1             yes            no              yes           0.135
(f)   1/2           yes            no              no            0.051
(g)   1/4           no             no              no            0.087
(h)   1/4           yes            no              no            0.046

Figure 15: RMS disparity errors for the cropped image sequence (planar region of newspaper). The reference image is shown in row (a) in the "disp. error" column. The columns indicate the disparity step, the sub-pixel refinement option, Birchfield–Tomasi's sampling-insensitive matching option, the optional initial blur, and the RMS disparity error from ground truth. The first image column shows the computed disparity map, the second shows the signed disparity error, and the last column shows a histogram of computed disparities.

Figure 16: Plots of RMS disparity error and inverse prediction errors as a function of disp_step and match_interval. The even data points are with sampling-insensitive matching (match_interval) turned on. The second set of plots in each figure is with preproc_blur enabled (1 blur iteration).

In Figure 15e we investigate whether lightly blurring the input images with a (1/4, 1/2, 1/4) kernel helps sub-pixel refinement, because the first-order Taylor series expansion of the imaging function becomes more valid. Blurring does indeed slightly reduce the staircasing effect (compare Figure 15e to Figure 15c), but the overall RMS performance degrades, probably because of loss of high-frequency detail. We also tried 1/2 and 1/4 pixel disparity sampling at the initial matching stages, with and without later sub-pixel refinement. Sub-pixel refinement always helps to reduce the RMS disparity error, although it has negligible effect on the inverse prediction error (Figure 16b). From these prediction error plots, and also from visual inspection of the inverse-warped (stabilized) image sequence, it appears that using sub-pixel refinement after any original matching scheme is sufficient to reduce the prediction error (and the appearance of "jitter" or "shearing") to negligible levels. This is despite the fact that the theoretical justification for sub-pixel refinement is based on a quadratic fit to an adequately sampled quadratic energy function. At the moment, for global methods, we rely on the per-pixel costs that go into the optimization to do the sub-pixel disparity estimation. Alternative approaches, such as using local plane fits [5, 14, 117], could also be used to get sub-pixel precision.

Conclusions: To eliminate "staircasing" in the computed disparity map, and to also eliminate the appearance of "shearing" in reprojected sequences, it is necessary to initially evaluate the matches at a fractional disparity (1/2 pixel steps appear to be adequate). This should be followed by finding the minima of local quadratic fits applied to the computed matching costs.

7 Overall comparison

We close our experimental investigation with an overall comparison of 20 different stereo methods, including 5 algorithms implemented by us and 15 algorithms implemented by their authors, who have kindly sent us their results. We evaluate all algorithms using our familiar set of Tsukuba, Sawtooth, Venus, and Map images. All algorithms are run with constant parameters over all four images. We do not perform sub-pixel estimation in this comparison.

Among the algorithms in our implementation framework, we have selected the following five:
1. SSD – 21 x 21 shiftable-window SSD;
2. DP – dynamic programming;
3. SO – scanline optimization;
4. GC – graph-cut optimization; and
5. Bay – Bayesian diffusion.
We chose shiftable-window SSD as the best-performing representative of all local (aggregation-based) algorithms. We are not including simulated annealing here, since GC solves the same optimization problem better and more efficiently. For each of the five algorithms, we have chosen fixed parameters that yield reasonably good performance over a variety of input images (see Table 4).

We compare the results of our implementation with results provided by the authors of the following algorithms:
6. Max-flow / min-cut algorithm (Roy and Cox [92, 91]) – one of the first methods
to formulate matching as a graph-flow problem;
7. Pixel-to-pixel stereo (Birchfield and Tomasi [13]) – scanline algorithm using gradient-modulated costs, followed by vertical disparity propagation into unreliable areas;

                      SSD    DP    SO    GC    Bay
Matching cost
  match_fn            SD     AD    AD    AD    AD
  Truncation          no     no    no    no    no
  Birchfield–Tomasi   no     yes   yes   yes   no
Aggregation
  aggr_window_size    21     -     -     -     -
  aggr_minfilter      21     -     -     -     -
  aggr_iter           1      -     -     -     1000
  diff_mu             -      -     -     -     0.5
  diff_sigmaP         -      -     -     -     0.4
  diff_epsP           -      -     -     -     0.01
  diff_scale_cost     -      -     -     -     0.01
Optimization
  opt_fn              WTA    DP    SO    GC    Bayesian
  opt_smoothness (λ)  -      20    50    20    -
  opt_occlusion_cost  -      20    -     -     -
  opt_grad_thresh     -      8     8     8     -
  opt_grad_penalty    -      4     2     2     -

Table 4: Parameters for the five algorithms implemented by us.

8. Multiway cut (Birchfield and Tomasi [14]) – alternate segmentation and finding affine parameters for each segment using graph cuts;
9. Cooperative algorithm (Zitnick and Kanade [132]) – a new variant of Marr and Poggio's algorithm [73];
10. Graph cuts (Boykov et al. [23]) – same as our GC method, but a much faster implementation;
11. Graph cuts with occlusions (Kolmogorov and Zabih [65]) – an extension of the graph-cut framework that explicitly models occlusions;
12. Compact windows (Veksler [124]) – an adaptive window technique allowing non-rectangular windows;
13. Genetic algorithm (Gong and Yang [49]) – a global optimization technique operating on quadtrees;
14. Realtime method (Hirschmüller [53]) – 9 x 9 SAD with shiftable windows, followed by consistency checking and interpolation of uncertain areas;
15. Stochastic diffusion (Lee et al. [68]) – a variant of Bayesian diffusion [97];
16. Fast correlation algorithm (Mühlmann et al. [79]) – an efficient implementation of correlation-based matching with consistency and uniqueness validation;
17. Discontinuity-preserving regularization (Shao [104]) – a multiview technique for virtual view generation;
18. Maximum-surface technique (Sun [108]) – a fast stereo algorithm using rectangular subregions;
19. Belief propagation (Sun et al. [109]) – an MRF formulation using Bayesian belief propagation;
20. Layered stereo (Lin and Tomasi [69]) – a
preliminary version of an extension of the multiway-cut method [14].

Some of these algorithms do not compute disparity estimates everywhere, in particular those that explicitly model occlusions (3 and 11), but also (16), which leaves low-confidence regions unmatched. In these cases we fill unmatched areas as described for the DP method in Section 4.3.

Table 5 summarizes the results for all algorithms. As in the previous section, we report B_O (bad_pixels_nonocc) as a measure of overall performance, as well as B_T (bad_pixels_textureless) and B_D (bad_pixels_discont). We do not report B_T for the Map images, since the images are textured almost everywhere. The algorithms are listed roughly in order of overall performance.

The disparity maps for the Tsukuba and Venus images are shown in Figures 17 and 18. The full set of disparity maps, as well as signed and binary error maps, are available on our web site at www.middlebury.edu/stereo.

Looking at these results, we can draw several conclusions about the overall performance of the algorithms. First, global optimization methods based on 2-D MRFs generally perform the best in all regions of the image (overall, textureless, and discontinuities). Most of these techniques are based on graph-cut optimization (4, 8, 10, 11, 20), but belief propagation (19) also does well. Approaches that explicitly model planar surfaces (8, 20) are especially good with piecewise planar scenes such as Sawtooth and Venus.

Figure 17: Comparative results on the Tsukuba images. The results are shown in decreasing order of overall performance (B_O). Algorithms implemented by us are marked with a star.

Figure 18: Comparative results on the Venus images. The results are shown in decreasing order of overall performance (B_O). Algorithms implemented by us are marked with a star.

                   Tsukuba                        Sawtooth                       Venus                          Map
                   B_O      B_T      B_D          B_O      B_T      B_D          B_O      B_T       B_D          B_O      B_D
20 Layered         1.58(3)  1.06(4)  8.82(3)      0.34(1)  0.00(1)  3.35(1)      1.52(3)  2.96(10)  2.62(2)      0.37(6)  5.24(6)
 4 *Graph cuts     1.94(5)  1.09(5)  9.49(5)      1.30(6)  0.06(3)  6.34(6)      1.79(7)  2.61(8)   6.91(4)      0.31(4)  3.88(4)
19 Belief prop.    1.15(1)  0.42(1)  6.31(1)      0.98(5)  0.30(5)  4.83(5)      1.00(2)  0.76(2)   9.13(6)      0.84(10) 5.27(7)
11 GC+occl.        1.27(2)  0.43(2)  6.90(2)      0.36(2)  0.00(1)  3.65(2)      2.79(12) 5.39(13)  2.54(1)      1.79(13) 10.08(12)
10 Graph cuts      1.86(4)  1.00(3)  9.35(4)      0.42(3)  0.14(4)  3.76(3)      1.69(6)  2.30(6)   5.40(3)      2.39(16) 9.35(10)
 8 Multiway cut    8.08(17) 6.53(14) 25.33(18)    0.61(4)  0.46(8)  4.60(4)      0.53(1)  0.31(1)   8.06(5)      0.26(3)  3.27(3)
12 Compact win.    3.36(8)  3.54(8)  12.91(9)     1.61(9)  0.45(7)  7.87(7)      1.67(5)  2.18(4)   13.24(9)     0.33(5)  3.94(5)
14 Realtime        4.25(12) 4.47(12) 15.05(13)    1.32(7)  0.35(6)  9.21(8)      1.53(4)  1.80(3)   12.33(7)     0.81(9)  11.35(15)
 5 *Bay. diff.     6.49(16) 11.62(19) 12.29(7)    1.45(8)  0.72(9)  9.29(9)      4.00(14) 7.21(16)  18.39(13)    0.20(1)  2.49(2)
 9 Cooperative     3.49(9)  3.65(9)  14.77(11)    2.03(10) 2.29(14) 13.41(13)    2.57(11) 3.52(11)  26.38(17)    0.22(2)  2.37(1)
 1 *SSD+MF         5.23(15) 3.80(10) 24.66(17)    2.21(11) 0.72(10) 13.97(15)    3.74(13) 6.82(15)  12.94(8)     0.66(8)  9.35(10)
15 Stoch. diff.    3.95(10) 4.08(11) 15.49(15)    2.45(14) 0.90(11) 10.58(10)    2.45(9)  2.41(7)   21.84(15)    1.31(12) 7.79(9)
13 Genetic         2.96(6)  2.66(7)  14.97(12)    2.21(12) 2.76(16) 13.96(14)    2.49(10) 2.89(9)   23.04(16)    1.04(11) 10.91(14)
 7 Pix-to-pix      5.12(14) 7.06(17) 14.62(10)    2.31(13) 1.79(12) 14.93(17)    6.30(17) 11.37(18) 14.57(10)    0.50(7)  6.83(8)
 6 Max flow        2.98(7)  2.00(6)  15.10(14)    3.47(15) 3.00(17) 14.19(16)    2.16(8)  2.24(5)   21.73(14)    3.13(17) 15.98(18)
 3 *Scanl. opt.    5.08(13) 6.78(15) 11.94(6)     4.06(16) 2.64(15) 11.90(11)    9.44(19) 14.59(19) 18.20(12)    1.84(14) 10.22(13)
 2 *Dyn. prog.     4.12(11) 4.63(13) 12.34(8)     4.84(19) 3.71(19) 13.26(12)    10.10(20) 15.01(20) 17.12(11)   3.33(18) 14.04(17)
17 Shao            9.67(18) 7.04(16) 35.63(19)    4.25(17) 3.19(18) 30.14(20)    6.01(16) 6.70(14)  43.91(20)    2.36(15) 33.01(20)
16 Fast Correl.    9.76(19) 13.85(20) 24.39(16)   4.76(18) 1.87(13) 22.49(18)    6.48(18) 10.36(17) 31.29(18)    8.42(20) 12.68(16)
18 Max surf.       11.10(20) 10.70(18) 41.99(20)  5.51(20) 5.56(20) 27.39(19)    4.36(15) 4.78(12)  41.13(19)    4.17(19) 27.88(19)

Table 5: Comparative performance of stereo algorithms, using the three performance measures B_O (bad_pixels_nonocc), B_T (bad_pixels_textureless), and B_D (bad_pixels_discont). All algorithms are run with constant parameter settings across all images. The numbers in parentheses indicate the rank of each algorithm in each column. The algorithms are listed roughly in decreasing order of overall performance. Algorithms implemented by us are marked with a star.

Next, cooperative and diffusion-based methods (5, 9, 15) do reasonably well, but often get the boundaries wrong, especially on the more complex Tsukuba images. On the highly textured and relatively simple Map sequence, however, they can outperform some of the full optimization approaches. The Map sequence is also noisier than the others, which works against algorithms that are sensitive to internal parameter settings. (In these experiments, we asked everyone to use a single set of parameters for all four datasets.)

Lastly, local (1, 12, 14, 16) and scanline methods (2, 3, 7) perform less well, although (14), which performs additional consistency checks and clean-up steps, does reasonably well, as does the compact window approach (12), which uses a sophisticated adaptive window. Simpler local approaches such as SSD+MF (1) generally do poorly in textureless areas (if the window size is small) or near discontinuities (if the window is large). The disparity maps created by the scanline-based algorithms (DP and SO) are promising and show a lot of detail, but the larger quantitative errors are clearly a result of the streaking due to the lack of inter-scanline consistency.

To demonstrate the importance of parameter settings, Table 6 compares the overall results (B_O) of algorithms 1–5 for the fixed parameters listed in Table 4 with the best results when parameters are allowed to vary for each image. Note that we did not perform a true optimization over all parameter values, but rather simply
chose the overall best results among the entire set of experiments we performed. It can be seen that for some of the algorithms, the performance can be improved substantially with different parameters. The global optimization algorithms are particularly sensitive to the parameter λ, and DP is also sensitive to the occlusion cost parameter. This is consistent with our observations in Section 6.3. Note that the Map image pair can virtually be "solved" using GC, Bay, or SSD, since the images depict a simple geometry and are well textured. More challenging data sets with many occlusions and textureless regions may be useful in future extensions of this study.

Finally, we take a brief look at the efficiency of different methods. Table 7 lists the image sizes and number of disparity levels for each image pair, and the running times for 9 selected algorithms. Not surprisingly,

           Tsukuba         Sawtooth        Venus           Map
           fixed   best    fixed   best    fixed   best    fixed   best
1 SSD      5.23    5.23    2.21    1.55    3.74    2.92    0.66    0.22
2 DP       4.12    3.82    4.84    3.70    10.10   9.13    3.33    1.21
3 SO       5.08    4.66    4.06    3.47    9.44    8.31    1.84    1.04
4 GC       1.94    1.94    1.30    0.98    1.79    1.48    0.31    0.09
5 Bay      6.49    6.49    1.45    1.45    4.00    4.00    0.20    0.20

Table 6: Overall performance B_O (bad_pixels_nonocc) for algorithms 1–5, both using fixed parameters across all images and best parameters for each image. Note that for some algorithms, significant performance gains are possible if parameters are allowed to vary for each image.

the speed-optimized methods (14 and 16) are fastest, followed by local and scanline-based methods (1 – SSD, 2 – DP, 3 – SO). Our implementations of graph cuts (4) and Bayesian diffusion (5) are several orders of magnitude slower. The authors' implementations of the graph-cut methods (10 and 11), however, are much faster than our implementation. This is due to the new max-flow code by Boykov and Kolmogorov [21], which is available at www.cs.cornell.edu/People/vnk/software.html.

In summary, if efficiency is an issue, a simple shiftable-window method is a good choice. In particular, method 14 by Hirschmüller [53] is among the fastest and
produces very good results. New implementations of graph-cut methods give excellent results and have acceptable running times. Further research is needed to fully exploit the potential of scanline methods without sacrificing their efficiency.

8 Conclusion

In this paper we have proposed a taxonomy for dense two-frame stereo correspondence algorithms. We use this taxonomy to highlight the most important features of existing stereo algorithms and to study important algorithmic components in isolation. We have implemented a suite of stereo matching algorithm components and constructed a test harness that can be used to combine these, to vary the algorithm parameters in a controlled way, and to test the performance of these algorithms on interesting data sets. We have also produced some new calibrated multi-view stereo data sets with hand-labeled ground truth. We have performed an extensive experimental investigation in order to assess the impact of the different algorithmic components. The experiments reported here have demonstrated the limitations of local methods, and have assessed the value of different global techniques and their sensitivity to key parameters.

We hope that publishing this study along with our sample code and data sets on the Web will encourage other stereo researchers to run their algorithms on our data and to report their comparative results. Since publishing the initial version of this paper as a technical report [98], we have already received experimental results (disparity maps) from 15 different research groups, and we hope to obtain more in the future. We are planning to maintain the on-line version of Table 5 that lists the overall results of the currently best-performing algorithms on our web site. We also hope that some researchers will take the time to add their algorithms to our framework for others to use and to build upon. In the long term, we hope that our efforts will lead to some set of data and testing methodology becoming an accepted standard in
the stereo correspondence community, so that new algorithms will have to pass a "litmus test" to demonstrate that they improve on the state of the art.

There are many other open research questions we would like to address. How important is it to devise the right cost function (e.g., with better gradient-dependent smoothness terms) in global optimization algorithms, vs. how important is it to find a global minimum? What are the best (sampling-invariant) matching metrics? What kind of adaptive/shiftable windows work best? Is it possible to automatically adapt parameters to different images? Also, is prediction error a useful metric for gauging the quality of stereo algorithms? We would also like to try other existing data sets and to produce some labeled data sets that are not all piecewise planar.

Once this study has been completed, we plan to move on to study multi-frame stereo matching with arbitrary camera geometry. There are many technical solutions possible to this problem, including voxel representations, layered representations, and multi-view representations. This more general version of the correspondence problem should also prove to be more useful for image-based rendering applications.

Developing the taxonomy and implementing the algorithmic framework described in this paper has given us a much deeper understanding of what does and does not work well in stereo matching. We hope that other researchers will also take the time to carefully analyze the behavior of their own algorithms using the framework and methodology developed in this paper, and that this will lead to a deeper understanding of the complex behavior of stereo correspondence algorithms.

                     Tsukuba   Sawtooth   Venus    Map
width                  384       434       434     284
height                 288       380       383     216
disparity levels        16        20        20      30

Time (seconds):
14 - Realtime           0.1       0.2       0.2     0.1
16 - Efficient          0.2       0.3       0.3     0.2
 1 - *SSD+MF            1.1       1.5       1.7     0.8
 2 - *DP                1.0       1.8       1.9     0.8
 3 - *SO                1.1       2.2       2.3     1.3
10 - GC                23.6      48.3      51.3    22.3
11 - GC+occlusions     69.8     154.4     239.9    64.0
 4 - *GC              662.0     735.0     829.0   480.0
 5 - *Bay            1055.0    2049.0    2047.0  1236.0
Table 7: Running times of selected algorithms on the four image pairs. Algorithms implemented by us are marked with a star.

Acknowledgements

Many thanks to Ramin Zabih for his help in laying the foundations for this study and for his valuable input and suggestions throughout this project. We are grateful to Y. Ohta and Y. Nakamura for supplying the ground-truth imagery from the University of Tsukuba, and to Lothar Hermes for supplying the synthetic images from the University of Bonn. Thanks to Padma Ugbabe for helping to label the image regions, and to Fred Lower for providing his paintings for our image data sets. Finally, we would like to thank Stan Birchfield, Yuri Boykov, Minglun Gong, Heiko Hirschmüller, Vladimir Kolmogorov, Sang Hwa Lee, Mike Lin, Karsten Mühlmann, Sébastien Roy, Juliang Shao, Harry Shum, Changming Sun, Jian Sun, Carlo Tomasi, Olga Veksler, and Larry Zitnick for running their algorithms on our data sets and sending us the results, and Gary Bradski for helping to facilitate this comparison of stereo algorithms. This research was supported in part by NSF CAREER grant 9984485.

References

[1] P. Anandan. A computational framework and an algorithm for the measurement of visual motion. IJCV, 2(3):283–310, 1989.
[2] R. D. Arnold. Automated stereo perception. Technical Report AIM-351, Artificial Intelligence Laboratory, Stanford University, 1983.
[3] H. Baker and T. Binford. Depth from edge and intensity based stereo. In IJCAI81, pages 631–636, 1981.
[4] H. H. Baker. Edge based stereo correlation. In L. S. Baumann, editor, Image Understanding Workshop, pages 168–175. Science Applications International Corporation, 1980.
[5] S. Baker, R. Szeliski, and P. Anandan. A layered approach to stereo reconstruction. In CVPR, pages 434–441, 1998.
[6] S. T. Barnard. Stochastic stereo matching over scale. IJCV, 3(1):17–32, 1989.
[7] S. T. Barnard and M. A. Fischler. Computational stereo. ACM Comp. Surveys, 14(4):553–572, 1982.
[8] J. L. Barron, D. J. Fleet, and S. S. Beauchemin. Performance of optical flow techniques. IJCV, 12(1):43–77, 1994.
[9] P. N. Belhumeur. A Bayesian approach to binocular stereopsis. IJCV, 19(3):237–260, 1996.
[10] P. N. Belhumeur and D.
Mumford A Bayesian treatment of the stereo correspondence problem using halfoccluded regions In CWR pages 5067512 1992 J R Bergen P Anandan K J Hanna and R Hingorani Hi erarchical modelbased motion estimation InECCV pages 2377252 1992 S Birch eld and C Tomasi Depth discontinuities by pixel to pixel stereo In ICCV pages 107371080 1998 S Birch eld and C Tomasi A pixel dissimilarity mea sure that is insensitive to image sampling IEEE TPAMI 2044017406 1998 S Birch eld and C Tomasi Multiway cut for stereo and motion with slanted surfaces In ICCV pages 4897495 1999 M J Black and P Anandan A framework for the robust estimation of optical ow In ICCV pages 2317236 1993 M J Black and A Rangarajan On the uni cation of line processes outlier rejection and robust statistics with appli cations in early vision UCV 19157791 1996 A Blake and A Zisserman Visual Reconstruction MIT Press Cambridge MA 1987 A F Bobick and S S Intille Large occlusion stereo UCV 333 1817200 1999 R C Bolles H H Baker and M J Hannah The JISCT stereo evaluation In DARPA Image Understanding Work shop pages 2637274 1993 20 R C Bolles H H Baker and D H Marimont Epipolar plane image analysis An approach to determining structure from motion IJCV 1755 1987 21 Y Boykov and V Kolmogorov A new algorithm for energy 7 8 9 10 11 12 13 14 15 16 17 18 19 minimization with discontinuities In Intl Workshop on En ergyMinimization Methods in Computer Vision and Pattern Recognition pages 2057220 2001 Y Boykov O Veksler and R Zabih A variable window approach to early vision IEEE TPAMI 2012128371294 1998 Y Boykov O Veksler and R Zabih Fast approxi mate energy minimization via graph cuts IEEE TPAMI 2311l222il239 2001 A Broadhurst T Drurnmond and R Cipolla A probabilis tic frarnework for space carving In ICCV volume 1 pages 3887393 2001 L G Brown A survey of image registration techniques Computing Surveys 2443257376 1992 P J Burt and E H Adelson The Lap1acian pyramid as a compact image code IEEE Transactions on Communica tions 
COM3145327540 1983 J F Canny A computational approach to edge detection IEEE TPAMI 8634413 1986 P B Chou and C M Brown The theory and practice of Bayesian image labeling IJCV 431857210 1990 S D Cochran and G Medioni 3D surface description from binocular stereo IEEE TPAMI 14109817994 1992 30 R T Collins A spacesweep approach to true multiimage matching In CWR pages 3587363 1996 31 I J Cox S L Hingorani S B Rao and B M Maggs A maximum likelihood stereo algorithm CVIU 6335427 567 1996 I J Cox S Roy and S L Hingorani Dynamic his togram warping of image pairs for constant image bright ness In IEEE International Conference on Image Process ing ICE3 95 volume 2 pages 3667369 IEEE Computer Society 1995 B Culbertson T Malzbender and G Slabaugh Generalized voxel coloring In International Workshop on Vision Algo rithms pages 1007114 Kerkyra Greece 1999 Springer 34 J S De Bonet and P Viola Poxels Probabilistic voxelized volume reconstruction In ICCV pages 4184125 1999 R Deriche Fast algorithms for lowlevel vision IEEE TPAMI l2l78787 1990 P Dev Segmentation processes in visual perception A cooperative neural model COINS Technical Report 74C5 University of Massachusetts at Amherst 1974 U R Dhond and J K Aggarwal Structure from stereoi a review IEEE Trans on Systems Man and Cybern l96l489il510 1989 O Faugeras and R Keriven Variational principles surface evolution PDE s level set methods and the stereo problem IEEE Trans Image Proc 733367344 1998 39 O Faugeras and QT Luong The Geometry ofMultiple Images MIT Press Cambridge MA 2001 40 D J Fleet A D Jepson and M R M Jenkin Phasebased disparity measurement CVGIP 5321987210 1991 41 T Frohlinghaus and J M Buhrnann Regularizing phase based stereo In ICPR volume A pages 4517455 1996 42 P Fua Aparallel stereo algorithm that produces dense depth maps and preserves image features Machine Vision and Applications 635749 1993 22 23 24 25 26 27 28 29 32 33 35 36 37 38 33 43 P Fua andY G Leclerc Objectcentered surface reconstruc tion Combining 
multiimage stereo and shading IJCV 1635756 1995 44 E Gamble andT Poggio Visual integration and detection of discontinuities the key role of intensity edges A 1 Memo 970 A1 Lab MIT 1987 45 D Geiger and F Girosi Parallel and deterministic algo rithms for MRF s Surface reconstruction IEEE TPAMI l3540174l2 1991 46 D Geiger B Ladendorf and A Yuille Occlusions and binocular stereo In ECCV pages 4257433 1992 47 S Geman and D Geman Stochastic relaxation Gibbs dis tribution and the Bayesian restoration of images IEEE TPAMI 6672li74l 1984 48 M A Gennert Brightnessbased stereo matching In ICCV pages 1397143 1988 49 M Gong and YH Yang Multibaseline stereo matching using genetic algorithm In IEEE Workshop on Stereo and MultiBaseline Vision 2001 IJCV this issue 50 W E L Grimson Computational experiments with a feature based stereo algorithm IEEE TPAMI 7117734 1985 51 M J Hannah Computer Matching ofAreas in Stereo Im ages PhD thesis Stanford University 1974 R I Hartley and A Zisserrnan Multiple View Geometry Cambridge University Press Cambridge UK 2000 H Hirschm ller Improvements in realtime correlation based stereo vision In IEEE Workshop on Stereo andMulti Baseline Vision 2001 IJCV this issue Y C Hsieh D McKeown and F P Perlant Performance evaluation of scene registration and stereo matching for car tographic feature extraction IEEE TPAMI 1422147238 1992 H Ishikawa and D Geiger Occlusions discontinuities and epipolar lines in stereo InECCV pages 2327248 1998 M R M JenkinA D Jepson and J K Tsotsos Techniques for disparity measurement CVGH Image Understanding 53114730 1991 D G Jones and J Malik A computational framework for determining stereo correspondence from a set of linear spa tial lters In ECCV pages 3957410 1992 58 T Kanade Development of avideorate stereo machine In Image Understanding Workshop pages 5497557 Monterey CA 1994 Morgan Kaufmann Publishers T Kanade et al A stereo machine for videorate dense depth mapping and its new applications In CWR pages 1967202 1996 T 
Kanade and M Okutomi A stereo matching algorithm with an adaptive window Theory and experiment IEEE TPAMI l699207932 1994 S B Kang R Szeliski and J Chai Handling occlusions in dense multiview stereo In CWR 2001 S B Kang J Webb L Zitnick and T Kanade A multi baseline stereo system with active illumination and realtime image acquisition In ICCV pages 88793 1995 M Kass Linear image features in stereopsis 143577368 1988 64 R Kimura et al A convolverbasedrealtime stereo machine SAZAN In CVPR volume 1 pages 4577463 1999 52 53 54 55 56 57 59 60 61 62 631 IJCV 65 V Kolmogorov and R Zabih Computing visual correspon dence with occlusions using graph cuts In ICCV volume II pages 5087515 2001 66 K N Kutulakos Approximate Nview stereo In ECCV volume 1 pages 67783 2000 K N Kutulakos and S M Seitz A theory ofshape by space carving IJCV 3831997218 2000 S H LeeY Kanatsugu andJI Park Hierarchical stochas tic diffusion for disparity estimation In IEEE Workshop on Stereo andMultiBaseline Vision 2001 IJCV this issue M Lin and C Tomasi Surfaces with occlusions from lay ered stereo Technical report Stanford University 2002 In preparation C Loop andZ Zhang Computing rectifying homographies for stereovision In CJPR volume I pages 1257131 1999 B D Lucas and T Kanade An iterative image regis tration technique with an application in stereo vision In Seventh International Joint Conference onArti cial Intelli gence UCAI 8I pages 6747679 Vancouver 1981 D Marr Vision W H Freeman and Company New York 1982 D Marr and T Poggio Cooperative computation of stereo disparity Science 1942837287 1976 D C Marr andT Poggio A computational theory of human stereo vision Proceedings of the Royal Society ofLondon B 2043017328 1979 J Marroquin S Mitter andT Poggio Probabilistic solution of illposed problems in computational vision Journal of theAmerican StatisticalAssociation 8239776789 1987 J L Marroquin Design of cooperative networks Working Paper 253 A1 Lab MIT 1983 L Matthies R Szeliski and T Kanade Kalman 
lterbased algorithms for estimating depth from image sequences IJCV 32097236 1989 78 A Mitiche and P Bouthemy Computation and analysis of image motion A synopsis of current problems and methods IJCV l9l29755 1996 K Muhlmann D Maier J Hesser and R Manner Cal culating dense disparity maps from color stereo images an e icient implementation In IEEE Workshop on Stereo and MultiBaseline Vision 2001 IJCV this issue J Mulligan V Isler and K Daniilidis Performance evalu ation of stereo for telepresence In ICCV volume II pages 5587565 2001 Y Nakamura T Matsuura K Satoh andY Ohta Occlusion detectable stereo i occlusion patterns in camera matrix In CVPR pages 3717378 1996 H K Nishihara Practical realtime imaging stereo matcher Optical Engineering 2355367545 1984 Y Ohta and T Kanade Stereo by intra and inter scanline search using dynamic programming IEEE TPAMI 721397154 1985 M Okutorni and T Kanade A locally adaptive window for signal matching IJCV 721437162 1992 M Okutorni and T Kanade A multiplebaseline stereo IEEE TPAMI 1543537363 1993 86 M Otte and HH Nagel Optical ow estimation advances and comparisons In ECCV volume 1 pages 5140 1994 67 68 69 70 71 72 73 74 75 76 77 79 80 81 82 83 84 85 34 87 T Poggio V Torre and C Koch Computational vision and regularization theory Nature 31760353147319 1985 S B Pollard J E W Mayhew and J P Frisby PMF A stereo correspondence algorithm using a disparity gradient limit Perception 144497470 1985 K Prazdny Detection of binocular disparities Biological Cybernetics 52293799 1985 L H Quarn Hierarchical warp stereo In Image Understanding Workshop pages 1497155 New Orleans Louisiana 1984 Science Applications International Corpo ration S Roy Stereo without epipolar lines A maximum ow formulation IJCV 3423l47il6l 1999 92 S Roy and I J Cox A maximum ow formulation of the Ncamera stereo correspondence problem In ICCV pages 4927499 1998 93 T W Ryan R T Gray and B R Hunt Prediction of cor relation errors in stereopair images Optical Engineering 
l933l27322 1980 94 H Saito and T Kanade Shape reconstruction in projective grid space fromlarge number of images In CJPR volume 2 pages 4954 1999 95 D Scharstein Matching images by comparing their gradient elds In ICPR volume 1 pages 5727575 1994 96 D Scharstein View Synthesis Using Stereo Vision vol ume 1583 ofLecture Notes in Computer Science LNCS SpringerVerlag 1999 D Scharstein and R Szeliski Stereo matching with nonlin ear diffusion IJCV 282 1557174 1998 D Scharstein and R Szeliski A taxonomy and evaluation of dense twoframe stereo correspondence algorithms Tech nical Report MSRTR200181 Microso Research 2001 D Scharstein R Szeliski and R Zabih A taxonomy and evaluation of dense twoframe stereo correspondence algo rithms In IEEE Workshop on Stereo and MultiBaseline Vision 2001 P Seitz Using local orientation information as image prim itive for robust object recognition In SPIE Visual Com munications and Image Processing IV volume 1199 pages 163071639 1989 S M Seitz and C M Dyer Photorealistic scene reconstruc tion by voxel coloring IJCV 3521723 1999 J Shade S Gortler LW He and R Szeliski Layered depth images In SIGGRAPH pages 2317242 1998 J Shah A nonlinear diffusion model for discontinuous dis parity and halfocclusion in stereo In CJPR pages 34410 1993 J Shao Combination of stereo motion and rendering for 3d footage display In IEEE Workshop on Stereo andMulti Baseline Vision 2001 IJCV this issue M Shimizu and M Okutorni Precise subpixel estimation on areabased matching In ICCV volume 1 pages 90797 2001 HY Shum and R Szeliski Stereo reconstruction from multiperspective panoramas In ICCV pages 14721 1999 107 E P Simoncelli E HAdelson andD J Heeger Probability distributions of optic ow In CJPR pages 3107315 1991 108 C Sun Rectangular subregioning and 3d maximum 88 89 90 91 97 98 99 100 101 102 103 104 105 106 surface techniques for fast stereo matching In IEEE Work shop on Stereo andMultiBaseline Vision 2001 IJCV this issue 109 J Sun H Y Shum andN N Zheng Stereo 
matching using belief propagation InECCV 2002 Submitted 110 R Szeliski BayesianModeling ofUneertainty inLow Level Vision Kluwer Academic Publishers Boston MA 1989 111 R Szeliski Prediction error as a quality metric for motion and stereo In ICCV pages 7817788 1999 112 R Szeliski and J Coughlan Splinebased image registra tion IJCV 223l9972l8 1997 113 R Szeliski and P Golland Stereo matching With trans parency and matting IJCV 321454l 1999 Special Issue for Mair Prize papers 1 l 4 R Szeliski and G Hinton Solving randomdot stereograms using the heat equation In CVPR pages 2847288 1985 115 R Szeliski and D Scharstein Symmetric subpixel stereo matching In ECCV 2002 Submitted 116 R Szeliski and R Zabih An experimental comparison of stereo algorithms In International Workshop on VisionAl gorithms pages 1719 Kerkyra Greece 1999 Springer 117 H Tao H SaWhney and R Kumar A global matching framework for stereo computation InICCV volume I pages 5327539 2001 118 M Tekalp Digital Video Processing Prentice Hall Upper Saddle River NJ 1995 119 D Terzopoulos Regularization of inverse visual prob lems involving discontinuities IEEE TPAMI 844l34124 1986 120 D Terzopoulos and K Fleischer Deformable models The Visual Computer 463067331 1988 121 D Terzopoulos and D Metaxas Dynamic 3D models With local and global deformations Deformable superquadrics IEEE TPAMI l3770377l4 1991 122 Q Tian and M N Huhns Algorithms for subpixel registra tion CVGH 352207233 1986 123 O Veksler E eient Graphbased Energy Minimization Methods in Computer Vision PhD thesis Comell Univer sity 1999 124 O Veksler Stereo matching by compact Windows via mini mum ratio cycle In ICCV volume I pages 5407547 2001 125 J Y A Wang and E H Adelson Layered representation for motion analysis In CWR pages 3617366 1993 126 A Witkin D Terzopoulos and M Kass Signal matching through scale space UCV ll33il44 1987 127 Y Yang A Yuille and J Lu Local global and multilevel stereo matching In CVPR pages 2747279 1993 128 A L Yuille and T 
Poggio A generalized ordering constraint for stereo correspondence AI Memo 777 Al Lab MIT 1984 129 R Zabih and J Wood ll Nonparametric local transforms for computing visual correspondence In ECCV volume II pages 1517158 1994 130 Z Zhang Determining the epipolar geometry and its uncer tainty ArevieW UCV 2721617195 1998 131 Z Zhang A exible new technique for camera calibration IEEE TPAMI 221 1 133071334 2000 132 C L Zitnick and T Kanade A cooperative algorithm for 35 stereo matching and occlusion detection IEEE TPAMI 2276757684 2000