# Computer Vision EECS 841

KU

This 213-page set of class notes was uploaded by Melissa Metz on Monday, September 7, 2015. The notes belong to EECS 841 at Kansas, taught by Brian Potetz in Fall. Since its upload it has received 26 views. For similar materials see /class/186758/eecs-841-kansas in Elect Engr & Computer Science at Kansas.


EECS 841 Computer Vision, Brian Potetz, Fall 2008

## Lecture 10: Active Contours, Energy Models

Suggested reading:

- D. H. Ballard, C. M. Brown, "Computer Vision," Sect. 4.5
- L. G. Shapiro, G. C. Stockman, "Computer Vision," Section 14.3
- Kass, Witkin, Terzopoulos, "Snakes: Active Contour Models," IJCV 1988

### From Edges to Weighted Graphs

One way to define a directed graph from an image: compute the gradient magnitude $|\nabla I(x,y)|$ of the original image, and let the cost of each arc be $M - |\nabla I(x,y)|$, where $(x,y)$ is the point on the image corresponding to the destination node of the arc and $M$ is the maximum gradient magnitude. Least-cost paths then follow strong edges.

### Finding the Least-Cost Path: The A* Algorithm

1. Initialize the queue with the path from the source vertex to itself.
2. Until the first path in the queue reaches the destination vertex:
   1. Remove the first path from the queue. For each neighbor of the last node in this path, create a new path.
   2. If a new path terminates in a node that has already been explored, and no path in the queue terminates in that node, delete that new path.
   3. If a new path terminates in a node that has already been explored, and there is a path in the queue that terminates in that node, delete whichever of the two paths has the greater cost.
   4. Sort the queue by the cost of the paths.

### Improving Search Performance

When performing A*, let the cost of each path be the sum of the weights of all arcs traversed, plus some lower-bound estimate of the cost of traversing the remaining distance to the destination node. (In fact, the search algorithm is not called A* unless it uses this technique.) If every arc weight is bounded below, $\min_j w_j \ge \varepsilon > 0$, one such lower bound gives

$$\mathrm{Cost}(v_1,\dots,v_n) = w_{12} + w_{23} + \cdots + w_{(n-1)n} + \varepsilon \, d(v_n, \mathit{destination}).$$

### Energy Minimization

Define a continuous curve through the image, $v(s) : [0,1] \to \mathbb{R}^2$, $v(s) = (x(s), y(s))$, and minimize some energy functional of that curve:

$$E(v) = \int_0^1 F(s, v, v', v'')\, ds$$

### Typical Snake Energies

- Elasticity: $E_{\mathrm{elastic}}(v) = \int_0^1 \alpha(s)\, |v'(s)|^2\, ds$
- Stiffness: $E_{\mathrm{stiffness}}(v) = \int_0^1 \beta(s)\, |v''(s)|^2\, ds$
- Edge proximity: $E_{\mathrm{edge}}(v) = -\gamma \int_0^1 |\nabla I(x(s), y(s))|^2\, ds$
- User interaction: $E_{\mathrm{user}}(v) = \int_0^1 \mathrm{User}(x(s), y(s))\, ds$

### Numerical Solution: Euler–Lagrange Equation

If $E(v) = \int F(s, v, v', v'')\, ds$ is minimized, then

$$F_x - \frac{\partial}{\partial s} F_{x'} + \frac{\partial^2}{\partial s^2} F_{x''} = 0, \qquad F_y - \frac{\partial}{\partial s} F_{y'} + \frac{\partial^2}{\partial s^2} F_{y''} = 0.$$

### Example: Facial Tracking Using Snakes

The slides demonstrate facial tracking with snakes, together with empirical distributions estimated from images: the marginal $P(I)$ over pixel intensity, the marginal $P(G)$ over gradient magnitude, and the conditional $P(I \mid G)$, obtained by normalizing columns of a 2D joint histogram. In Matlab (reconstructed from the slide; `hist2D` is a course helper function):

```
hist = full(hist2D(GM2, im));     % joint histogram of gradient magnitude and intensity
hist = flipud(hist);
P_I_given_G = hist ./ repmat(sum(hist), size(hist,1), 1);   % normalize each column
imshow(P_I_given_G)
```

### Conditional Probability

Definition: if $A$ and $B$ are events in $S$ and $P(B) > 0$, then the conditional probability of $A$ given $B$, written $P(A \mid B)$, is

$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}.$$

### The Chain Rule

A simple rearrangement of the above equation yields

$$P(A \cap B) = P(A \mid B)\, P(B).$$

### Overfitting Density

If an estimated probability is ever exactly zero, it means there are certain combinations that we have learned are impossible — a symptom of overfitting the density to limited data.

### How to Make a Bayes Net

Example (slides adapted from Carnegie Mellon): a Bayes net over car attributes with $P(\mathrm{Mpg})$, $P(\mathrm{Horse} \mid \mathrm{Mpg})$, and $P(\mathrm{Accel} \mid \mathrm{Mpg}, \mathrm{Horse})$, with conditional probability tables such as:

| Mpg | P(Horse = low) | P(Horse = high) |
|---|---|---|
| good | 0.90 | 0.10 |
| bad | 0.21 | 0.79 |

| Mpg, Horse | P(Accel = slow) | P(Accel = fast) |
|---|---|---|
| good, low | 0.97 | 0.03 |
| good, high | 0.15 | 0.85 |
| bad, low | 0.90 | 0.10 |
| bad, high | 0.05 | 0.95 |

If a domain expert tells us that Accel is conditionally independent of Mpg given Horse, we can replace $P(\mathrm{Accel} \mid \mathrm{Mpg}, \mathrm{Horse})$ with $P(\mathrm{Accel} \mid \mathrm{Horse})$ (e.g. $P(\mathrm{slow} \mid \mathrm{low}) = 0.93$, $P(\mathrm{fast} \mid \mathrm{high}) = 0.81$). Thank you, domain expert: now we only need to learn 5 parameters instead of 7 from the data, and the parameter estimates will be more accurate as a result.
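The A* procedure above can be sketched directly from the numbered steps. This is an illustrative sketch, not code from the notes: the 3×3 grid, its arc costs, and the Manhattan-distance heuristic are all made up (the heuristic is admissible here because every arc costs at least 1).

```python
import heapq

def a_star(neighbors, cost, heuristic, source, destination):
    """Least-cost path search. `heuristic` must be a lower bound on the
    remaining cost (admissible), as the notes require for A*."""
    # Queue entries: (cost so far + heuristic, cost so far, path).
    queue = [(heuristic(source), 0.0, [source])]
    best = {}  # node -> least path cost expanded so far (compact form of steps 2.ii-2.iii)
    while queue:
        est, g, path = heapq.heappop(queue)
        node = path[-1]
        if node == destination:
            return g, path
        if node in best and best[node] <= g:
            continue  # a cheaper path to this node was already expanded
        best[node] = g
        for nxt in neighbors(node):
            g2 = g + cost(node, nxt)
            heapq.heappush(queue, (g2 + heuristic(nxt), g2, path + [nxt]))
    return None

# Hypothetical 4-connected grid; each arc costs the value of the destination
# cell (cf. arc cost = M - |grad I| along an edge).
grid = [[1, 1, 9],
        [9, 1, 9],
        [9, 1, 1]]

def neighbors(p):
    r, c = p
    for r2, c2 in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
        if 0 <= r2 < 3 and 0 <= c2 < 3:
            yield (r2, c2)

def cost(a, b):
    return grid[b[0]][b[1]]

def manhattan(p):  # admissible lower bound: every arc costs at least 1
    return abs(p[0] - 2) + abs(p[1] - 2)

g, path = a_star(neighbors, cost, manhattan, (0, 0), (2, 2))
```

The heap keeps the queue sorted by estimated total cost, so the first path to reach the destination is optimal.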
### Bayes Nets in Computer Vision

Yuille's deformable eye template (the first eye template): the eye is modeled by a parameterized template (iris circle, whites, bounding contours of the image region) fit to the image.

### Snakes as a Bayes Net

Recall the typical snake energies (elasticity, stiffness, edge proximity, user interaction). A snake with control points $v_1, \dots, v_N$ can be written as a Bayes net in which each point depends only on its predecessor:

$$P(v_1, \dots, v_N) = P(v_1) \prod_{i=1}^{N-1} P(v_{i+1} \mid v_i), \qquad P(v_{i+1} \mid v_i) = \frac{1}{Z}\, e^{-C(v_i,\, v_{i+1})},$$

where the pairwise cost combines the snake energies,

$$C(v_1, v_2) = C_{\mathrm{edge}}(v_1, v_2) + C_{\mathrm{user}}(v_1, v_2) + C_{\mathrm{elastic}}(v_1, v_2),$$

$$C_{\mathrm{edge}}(v_1, v_2) = -\int_0^1 |\nabla I(t v_1 + (1-t) v_2)|^2\, dt, \qquad C_{\mathrm{user}}(v_1, v_2) = \int_0^1 \mathrm{User}(t v_1 + (1-t) v_2)\, dt, \qquad C_{\mathrm{elastic}}(v_1, v_2) = |v_2 - v_1|^2.$$

### Solving Snakes via Dynamic Programming

Minimizing the total cost $C(v_1,v_2) + C(v_2,v_3) + C(v_3,v_4) + \cdots$ naively requires a joint minimization over all points, but the minimizations can be pushed inward:

$$\min_{v_1, \dots, v_4} \big[ C(v_1,v_2) + C(v_2,v_3) + C(v_3,v_4) \big] = \min_{v_4} \min_{v_3} \Big[ C(v_3,v_4) + \min_{v_2} \big[ C(v_2,v_3) + \min_{v_1} C(v_1,v_2) \big] \Big].$$

For each intermediate function $\tilde{C}(v_i)$, store both the minimal cost and the best $v_{i-1}$ for each $v_i$; then go back down the chain to find each optimal $v_i$. This technique works for all non-cyclic Bayes nets.

### Belief Propagation

You have just derived a powerful statistical inference technique called Belief Propagation. Belief Propagation works by passing messages up a tree from the leaves and then down from the root:

$$m_{ij}(x_j) = \sum_{x_i} \psi_{ij}(x_i, x_j) \prod_{k \in \mathcal{N}(i) \setminus j} m_{ki}(x_i).$$

Before we get into the details, let's generalize our models of probability.

## Lecture 14: Principal Component Analysis

Suggested reading: Forsyth & Ponce, "Computer Vision: A Modern Approach," Chapter 7.6 and Chapters 22.1–22.3.

### Images as Points: Feature Detection and Matching

A $K \times L$ image patch can be treated as a point in $KL$-dimensional space. Matching involves deciding how far apart two patches are in this space.

Negative sum-squared error:

$$\mathrm{NSSE}(x,y) = -\sum_{i,j} \big( T(i,j) - I(x+i,\, y+j) \big)^2$$

```
W = ones(size(template));
NSSE = 2*filter2(template, im) - filter2(W, im.^2);   % constant template term omitted
```

Normalized correlation instead uses the angle between the patches viewed as vectors, $u \cdot v = |u|\,|v| \cos\theta$:

```
W = ones(size(template));
NC = filter2(template, im) ./ sqrt(filter2(W, im.^2));
```

### Covariance and Eigenvectors

A Matlab demo of dependent variables and the eigenstructure of their covariance:

```
x = randn(500,1);
g = randn(500,1) * 0.5;
y = 1*x + g;                  % y depends on x
Z2 = [x y];
scatter(Z2(:,1), Z2(:,2), 'r');
cov(Z2)                       % e.g. [0.9701 0.9596; 0.9596 1.2049]
corrcoef(Z2)                  % e.g. [1.0000 0.8876; 0.8876 1.0000]
Xhat = [x - mean(x), y - mean(y)];
S = Xhat' * Xhat;             % (unnormalized) covariance matrix
[vec, val] = eig(S);          % e.g. vec = [-0.8723 0.4890; 0.4890 0.8723],
                              %      val = diag([224.3, 1442.0])
```

The eigenvector with the largest eigenvalue points along the direction of greatest variance, and $S = \mathrm{vec}\cdot\mathrm{val}\cdot\mathrm{vec}'$. Transforming the centered data by $\mathrm{inv}(\mathrm{vec}\,\sqrt{\mathrm{val}})$ whitens it (independent, unit-variance coordinates).
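A NumPy sketch of normalized-correlation template matching, mirroring the matching metric described above. The image and template here are made up, and a real implementation would use convolution (as the Matlab `filter2` calls do) rather than explicit loops:

```python
import numpy as np

def normalized_correlation(image, template):
    """Slide `template` over `image`; each score is cos(theta) between the
    template and the image patch, treating both as vectors (valid positions)."""
    th, tw = template.shape
    t = template / np.linalg.norm(template)
    H = image.shape[0] - th + 1
    W = image.shape[1] - tw + 1
    out = np.empty((H, W))
    for r in range(H):
        for c in range(W):
            patch = image[r:r + th, c:c + tw]
            out[r, c] = (patch * t).sum() / np.linalg.norm(patch)
    return out

# Made-up example: the template is cut out of the image at position (1, 2),
# so the exact match scores cos(theta) = 1 there.
rng = np.random.default_rng(0)
image = rng.random((8, 8))
template = image[1:4, 2:5].copy()
scores = normalized_correlation(image, template)
best = np.unravel_index(np.argmax(scores), scores.shape)
```

Unlike raw correlation, the score is invariant to patch magnitude, which is why the exact match is a clean global maximum at 1.0.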
### Line Fitting Using PCA

```
X0 = mean(X)';                        % centroid of the points
A  = [X(:,1) - X0(1), X(:,2) - X0(2)];
[U, S] = eig(A' * A);
[s, i] = max(diag(S));                % find the larger eigenvalue in S
a  = U(:, i);                         % and extract the corresponding eigenvector
e1 = X0 + 4*a;  e2 = X0 - 4*a;        % two points on the fitted line
plot([e1(1) e2(1)], [e1(2) e2(2)], 'r')
```

This is the line that minimizes the sum of squared shortest (perpendicular) distances from the points to the line.

### Dimensionality Reduction

We can throw $V_3$ away and keep $w = (V_1, V_2)$, and still represent the information almost equally well. $V_1$ and $V_2$ also provide good dimensions in which different objects and textures form nice clusters in this 2D space.

## Lecture 22: Shape from Shading

Suggested reading: Zhang, Tsai, Cryer, and Shah, "Shape from shading: A survey," IEEE Trans. Pattern Analysis and Machine Intelligence, 21(8):690–706, 1999.

(Homework 2 results were presented by Hongliang Fei.)

### Integrability Constraint

Recall that integrating surface normals was overconstrained: two equations per surface normal. Not every needle map corresponds exactly to a real 3D surface. Real needle maps have zero curl:

$$\frac{\partial p}{\partial y} = \frac{\partial q}{\partial x}, \qquad \text{i.e.} \qquad z_{xy} = z_{yx}.$$

(Remember that convolution is commutative: a horizontal derivative followed by a vertical derivative equals a vertical derivative followed by a horizontal one, just like multiplication in the Fourier domain.)

### Energy Minimization for SFS

Just like snakes. For Lambertian surfaces, the reflectance map forces the surface normal to lie along a cone. Minimize

$$E_{\mathrm{reconstruction}} = \iint \big( I(x,y) - R(p(x,y),\, q(x,y)) \big)^2\, dx\, dy, \qquad E_{\mathrm{integrability}} = \iint \left( \frac{\partial p}{\partial y} - \frac{\partial q}{\partial x} \right)^2 dx\, dy.$$

Now find $p(x,y)$ and $q(x,y)$ that minimize $E = E_{\mathrm{reconstruction}} + \lambda\, E_{\mathrm{integrability}}$.

### Is That Enough?

Suppose I have an image of size $H \times W$. Size of $p$: $H \times (W-1)$; size of $q$: $(H-1) \times W$. Total number of unknowns: $2HW - H - W$.
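The PCA line-fitting recipe above translates directly to NumPy. This sketch uses invented sample points lying near the hypothetical line y = 2x:

```python
import numpy as np

def fit_line_pca(X):
    """Total-least-squares line fit: return (centroid, direction), where the
    direction is the eigenvector of the scatter matrix A'A with the largest
    eigenvalue, i.e. the direction of greatest variance."""
    X = np.asarray(X, dtype=float)
    centroid = X.mean(axis=0)
    A = X - centroid                        # center the points
    vals, vecs = np.linalg.eigh(A.T @ A)    # eigenvalues in ascending order
    direction = vecs[:, -1]                 # eigenvector of the largest one
    return centroid, direction

# Made-up noisy points near y = 2x.
pts = [(0, 0.1), (1, 2.0), (2, 3.9), (3, 6.1), (4, 8.0)]
c, d = fit_line_pca(pts)
slope = d[1] / d[0]   # sign of the eigenvector is arbitrary; the ratio is not
```

This minimizes perpendicular distances to the line, unlike ordinary least squares, which minimizes vertical distances only.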
Image reconstruction gives $HW$ equations and integrability gives another $HW$, for a total of $2HW$ equations against $2HW - H - W$ unknowns — so in principle there are enough constraints. But what if there are points in shadow? Solution: include depth cues besides shading, and consider statistical priors — what shapes are more common in real scenes?

### Smoothness Constraint

In nature, objects are cohesive and typically have smooth surfaces. A smoothness constraint relates the surface normals of neighboring surface points: minimize

$$E_{\mathrm{smoothness}} = \iint \big( p_x^2 + p_y^2 + q_x^2 + q_y^2 \big)\, dx\, dy,$$

which penalizes rapid changes in surface normals over the image.

## Lecture 23: Project Ideas

### EECS 841 Final Project

What am I really looking for?

- Find something that interests you. It could be an application, an algorithm, an idea — lots of potential topic ideas follow.
- Play around with it: implement it, make it work.
- See if you can make it better. How can you get the best performance? How can you tailor an approach to a particular application? Get creative, but don't get risky: start simple and work upwards.
- Write about it.
- Present it in class.

### Grading

35% of your final grade:

- Writeup: 7.5%
- Oral presentation: 7.5%
- Project content: 20% — understanding of the material, quality of results, creativity, thorough literature search.

Sticking to the proposal is encouraged; see me if you get stuck.

### Timeline

- Project proposals due Mon 10/27/08
- Written projects due Mon 12/01/08
- Oral presentations 12/3–12/10

No milestone reports are required, but if you get stuck, come talk to me (email, appointment, office hours). Unexpected hangups are viewed more sympathetically if you come to me early.

### Project Proposals

Due Mon 10/27/08 — but no need to wait until Monday; I will provide feedback anytime. Informal: a couple of paragraphs. You're not graded on it; this is to help you. Have some concrete ideas and know one or two papers in the area — some prior research will help you decide which issues are hard and which problems are doable. In return, I will let you know whether the topic is appropriate and the scope is on target, provide you with some references to get you started, and point out possible pitfalls.

### The Written Report

Around 6 pages, for reasonable figure density. Pretend it is a CVPR submission, only shorter and less ambitious (see the CVPR 2009 author kit at cvpr2009.org). Don't sweat formatting details; just follow the basic style. Now is a good time to learn LaTeX and BibTeX. You are responsible for a thorough literature search.

### The Presentation

10-minute talks with 2–3 minutes for questions. Provide a brief overview of the field, but don't just regurgitate the literature: talk about your own work, show your results, and discuss limitations and problems with the approach. Don't cover every mathematical derivation; keep it high level. High-level topics from the projects will be on the final exam.

### Other Miscellaneous Points

Any language (Matlab, C, etc.) is fine. Implement your own code; use code from online only with my permission (Matlab and the Image toolbox are OK).

### Topic Ideas

- Image segmentation: Normalized Cuts; graph-based segmentation.
- Image colorization: http://www.cs.huji.ac.il/~yweiss/Colorization/
- Image inpainting — parametric methods (Fields of Experts): Roth & Black, "Fields of Experts: A Framework for Learning Image Priors," 2005; non-parametric, exemplar-based methods: Criminisi, Pérez, Toyama, "Region filling and object removal by exemplar-based inpainting," 2004.
- Texture synthesis — non-parametric, example-based methods: http://graphics.cs.cmu.edu/people/efros/research/EfrosLeung.html
- Single-image super-resolution and image denoising: Freeman, Pasztor, Carmichael, "Learning Low-Level Vision," 2000.
- Spectral matting (input image → hard segmentation → alpha matte): Levin, Rav-Acha, Lischinski, "Spectral Matting," CVPR 2006. Applications include restoring painted or poorly scanned images: http://grail.cs.washington.edu/projects/query/
- AdaBoost car detector — UIUC Image Database for Car Detection: 550 car training images, 500 non-car training images, approximately 170 test images in which cars are present at the same scale as in training, and approximately 170 test images in which cars are present at more than one scale.
- Space carving: Kutulakos & Seitz, "A Theory of Shape by Space Carving," 2000.
- Photo tourism / Photosynth: http://phototour.cs.washington.edu, http://livelabs.com/photosynth
- Games — nevertheless, some real progress has been made: PlayStation 2 EyeToy: Play 2, Sega SuperStars, Virtua Fighter.
- Space-time video completion: Wexler, Shechtman & Irani, "Space-Time Video Completion," CVPR 2004. Event detection: Zelnik-Manor & Irani, http://www.wisdom.weizmann.ac.il/~vision/VideoAnalysis/Demos/EventDetection/EventDetection.html

### Other Ideas for Projects

Ideas related to your own research (it has to be computer vision). Books: Forsyth & Ponce, "Computer Vision"; Shapiro & Stockman, "Computer Vision"; Davies, "Machine Vision: Theory, Algorithms, Practicalities." Here is a list of project ideas from another course: http://www.andrew.cmu.edu/course/16-720/2008/projects.html

Have fun!

## Lecture 20: Photometric Stereo

Suggested reading: Forsyth & Ponce, "Computer Vision: A Modern Approach," Chapters 4–5.

### Computing Pixel Brightness from the BRDF

Let $E(\theta_i, \phi_i)$ be the irradiance entering the surface from direction $(\theta_i, \phi_i)$, and $L(\theta_r, \phi_r)$ the radiance leaving the surface in direction $(\theta_r, \phi_r)$. The BRDF $f$ relates them:

$$L(\theta_r, \phi_r) = \int f(\theta_i, \phi_i;\, \theta_r, \phi_r)\, E(\theta_i, \phi_i) \cos\theta_i\, d\omega_i.$$

### Diffuse Reflection and the Lambertian BRDF

Light rays shining on a surface element: source intensity $I$, incident direction $\bar{s}$, viewing direction $\bar{v}$, surface normal $\bar{n}$. A Lambertian surface appears equally bright from ALL directions, independent of $\bar{v}$. The Lambertian BRDF is simply a constant (the albedo $\rho_d$ over $\pi$):

$$f(\theta_i, \phi_i;\, \theta_r, \phi_r) = \frac{\rho_d}{\pi}, \qquad \text{surface radiance } L = \frac{\rho_d}{\pi}\, I \cos\theta_i = \frac{\rho_d}{\pi}\, I\, (\bar{n} \cdot \bar{s}).$$

This model is commonly used in vision and graphics. A diffuse (Lambertian) surface is an idealization of a matte surface.
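The Lambertian image-formation rule L = (ρ/π) I (n·s) is a one-liner; here is a minimal sketch. The normal, light direction, and albedo values are made up, and the max() clamp for surfaces facing away from the light is an assumption added beyond the bare formula:

```python
import numpy as np

def lambertian_radiance(n, s, albedo=1.0, intensity=1.0):
    """Radiance of a Lambertian surface point: L = (albedo/pi) * I * max(n.s, 0).
    n and s are normalized inside; the clamp zeroes self-shadowed points."""
    n = np.asarray(n, float)
    s = np.asarray(s, float)
    n = n / np.linalg.norm(n)
    s = s / np.linalg.norm(s)
    return (albedo / np.pi) * intensity * max(float(n @ s), 0.0)

# Surface facing the light head-on vs. lit at 60 degrees off the normal.
L0 = lambertian_radiance([0, 0, 1], [0, 0, 1])
L60 = lambertian_radiance([0, 0, 1], [np.sin(np.pi / 3), 0, np.cos(np.pi / 3)])
```

Note the view direction never appears: the same radiance is seen from every viewpoint, which is exactly the Lambertian idealization.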
Its radiance depends only on the illumination, not the viewing direction: each point appears equally bright from all directions. It takes light incident from any direction and scatters it to all directions.

### Specular Reflection and the Mirror BRDF

Source intensity $I$, incident direction $\bar{s}$, viewing direction $\bar{v}$, specular (mirror) direction $\bar{r}$, surface normal $\bar{n}$. Valid for very smooth surfaces: all incident light energy is reflected in a SINGLE direction, seen only when $\bar{v} = \bar{r}$. The mirror BRDF is simply a double delta function with specular albedo $\rho_s$:

$$f(\theta_i, \phi_i;\, \theta_v, \phi_v) = \rho_s\, \delta(\theta_i - \theta_v)\, \delta(\phi_i + \pi - \phi_v), \qquad \text{surface radiance } L = I\, \rho_s\, \delta(\theta_i - \theta_v)\, \delta(\phi_i + \pi - \phi_v).$$

### Specular Lobes

Shiny surfaces are not always perfect mirrors: light from a point source is spread into a lobe around the specular direction. Real surfaces typically combine a diffuse and a specular component.

### Reflectance Map

For a fixed lighting arrangement, viewing direction, and surface BRDF, the reflectance map $R(p, q)$ describes the pixel intensity produced by a particular surface normal. For a surface of constant albedo and BRDF, if the illumination conditions don't vary across the surface, then

$$L(x, y) = R(p(x,y),\, q(x,y)),$$

where $L$ is pixel intensity. If the albedo varies across the surface but the BRDF is otherwise static, we can write

$$L(x, y) = \rho(x, y)\, R(p(x,y),\, q(x,y)).$$

### Computing Surface Normals

Suppose the points $(x, y, z)$ in the 3D scene are described by $z = f(x, y)$. The slopes of the tangent lines are

$$p = \frac{\partial z}{\partial x}, \qquad q = \frac{\partial z}{\partial y},$$

so $(1, 0, p)$ and $(0, 1, q)$ are tangent vectors. The surface normal is perpendicular to the tangent plane:

$$\bar{n} = (1, 0, p) \times (0, 1, q) = (-p, -q, 1).$$

Note that this is not a unit vector.

### Lambertian Reflectance Map

Let us derive the reflectance map of a Lambertian surface whose lighting direction is given in gradient coordinates by $(p_s, q_s)$. Scene radiance is $L = \frac{\rho}{\pi} I \cos\theta_i$, and $\cos\theta_i$ is the inner product of the unit lighting vector and the unit surface normal:

$$\cos\theta_i = \frac{1 + p_s p + q_s q}{\sqrt{1 + p^2 + q^2}\,\sqrt{1 + p_s^2 + q_s^2}}, \qquad R(p, q) = \frac{\rho I}{\pi}\cdot\frac{1 + p_s p + q_s q}{\sqrt{1 + p^2 + q^2}\,\sqrt{1 + p_s^2 + q_s^2}}.$$

For a Lambertian surface, the contours of constant brightness $R(p,q) = c$ are nested conic sections in the $(p, q)$ plane.
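The Lambertian reflectance map can be evaluated directly from the formula above. A small sketch with the constant ρI/π set to 1 (the particular (p_s, q_s) values are made up):

```python
import numpy as np

def lambertian_R(p, q, ps, qs):
    """Lambertian reflectance map R(p, q) for light direction (ps, qs):
    R = (1 + ps*p + qs*q) / (sqrt(1+p^2+q^2) * sqrt(1+ps^2+qs^2)),
    clamped at 0 for self-shadowed orientations (cos(theta_i) < 0)."""
    num = 1 + ps * p + qs * q
    den = np.sqrt(1 + p**2 + q**2) * np.sqrt(1 + ps**2 + qs**2)
    return np.maximum(num / den, 0.0)

# Brightness is maximal (R = 1) when the normal points at the source,
# i.e. when (p, q) = (ps, qs); any other orientation is dimmer.
R_at_source = lambertian_R(0.5, 0.3, 0.5, 0.3)
R_elsewhere = lambertian_R(0.0, 0.0, 0.5, 0.3)
```

Evaluating this function over a (p, q) grid reproduces the nested-conic iso-brightness contours the notes describe.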
The maximum of $R(p,q)$ is at $(p, q) = (p_s, q_s)$: brightness is greatest when the surface normal points at the light source. The iso-brightness contour for $\theta_i = 90°$ is the straight line

$$p\, p_s + q\, q_s + 1 = 0,$$

called the terminator: it separates the shadowed surface orientations from the illuminated ones. (Under what condition would a terminator not exist?) Don't forget: reflectance maps depend on the light source — e.g. the Lambertian reflectance map for lighting direction $(1, 0.5, 1)$.

### Non-Lambertian Reflectance Maps

Glossy surfaces: the Torrance–Sparrow reflectance model combines a diffuse term ($\propto \cos\theta_i$) and a specular term, so its reflectance map has both a diffuse peak and a specular peak in the $(p, q)$ plane.

### Photometric Stereo

What if I have several different images of the same scene, from the same viewing angle, lit by different lighting conditions? Assume:

- The BRDF is known for every point in the scene (generally the BRDF will be constant throughout the scene), and
- The illumination conditions are known throughout the scene. Generally this means: no shadows, known lighting directions, and all incoming light rays parallel.

### The Lambertian Case

For each image $i$, with $\rho$ absorbing the constant factors, the image irradiance is

$$L_i = \rho\, \bar{n} \cdot \bar{s}_i, \qquad i = 1, 2, 3.$$

We can write this in matrix form:

$$\begin{pmatrix} L_1 \\ L_2 \\ L_3 \end{pmatrix} = S\, \rho\bar{n}, \qquad S = \begin{pmatrix} \bar{s}_1^{\,T} \\ \bar{s}_2^{\,T} \\ \bar{s}_3^{\,T} \end{pmatrix},$$

with $L$ $3{\times}1$, $S$ $3{\times}3$, and $\rho\bar{n}$ $3{\times}1$.

### Solving the Equations

$$\rho\bar{n} = S^{-1} L, \qquad \rho = |\rho\bar{n}|, \qquad \bar{n} = \frac{\rho\bar{n}}{\rho}.$$

### More than Three Light Sources

Get better results by using more lighting angles: stack one row per light, so $L = S\, \rho\bar{n}$ with $L$ $N{\times}1$ and $S$ $N{\times}3$. Multiplying both sides by $S^T$ gives the normal equations $S^T L = S^T S\, \rho\bar{n}$, so

$$\rho\bar{n} = (S^T S)^{-1} S^T L$$

(the Moore–Penrose pseudo-inverse). Solve for $\rho$ and $\bar{n}$ as before.

## Lecture 26: Stereo and Optical Flow

Suggested reading:

- Stereo: Forsyth & Ponce, Chapter 11; Scharstein & Szeliski, "A Taxonomy and Evaluation of Dense Two-Frame Stereo Correspondence Algorithms," 2002; http://vision.middlebury.edu/stereo
- Motion: Shapiro & Stockman, Chapter 9

### Vergence

The field of view of a stereo rig and the depth uncertainty of a scene point (one pixel of disparity error) trade off against each other: field of view decreases with increasing baseline and vergence, while accuracy increases with baseline and vergence.

[Figure: strength of the stereo cue (depth-contrast sensitivity) as a function of distance from the observer, after Cutting & Vishton, 1995.]

### Basic Stereo Algorithm

For each epipolar line, and for each pixel in the left image: compare it with every pixel on the same epipolar line in the right image, and pick the pixel with the minimum match cost over a small window (correlation, sum of squared differences (SSD), etc.).

### Size of the Matching Window

Smaller windows are sensitive to noise; larger windows do not localize well (this should remind you of edge detection). Better results are obtained with adaptive windows:

- T. Kanade and M. Okutomi, "A stereo matching algorithm with an adaptive window: theory and experiment," 1991.
- D. Scharstein and R. Szeliski, "Stereo matching with nonlinear diffusion," IJCV 28(2):155–174, July 1998.

### Dynamic Programming

We might assume the ordering of points is the same in the left and right eyes. Then, just as with snakes, solve for the best match of $p_4$ given $p_3$, and so on; each of the $N$ points has $N$ possible correspondences.

### Modern Stereo Algorithms

Modern stereo algorithms often use color-based adaptive windows.
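The linear solve above takes a few lines of NumPy. This sketch fabricates a normal and albedo, renders three Lambertian measurements, and recovers both — under the notes' assumptions of no shadows and known parallel lights (all numeric values here are made up):

```python
import numpy as np

# Known lighting directions, one per row; they must be non-coplanar so S is invertible.
S = np.array([[0.0, 0.0, 1.0],
              [0.7, 0.0, 0.7],
              [0.0, 0.7, 0.7]])

# Ground truth used only to fabricate the measurements: albedo rho, unit normal n.
rho_true = 0.8
n_true = np.array([0.2, -0.1, 1.0])
n_true = n_true / np.linalg.norm(n_true)

L = S @ (rho_true * n_true)    # Lambertian image irradiances L_i = rho * (n . s_i)

g = np.linalg.solve(S, L)      # g = rho * n   (use np.linalg.pinv(S) @ L for >3 lights)
rho = np.linalg.norm(g)        # albedo is the magnitude of the recovered vector
n = g / rho                    # unit surface normal
```

With more than three lights, replacing the solve with the pseudo-inverse gives the least-squares estimate from the normal equations in the notes.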
The match cost uses an adaptive metric of error between image patches $L$ and $R$: each pixel $(x + \Delta x, y + \Delta y)$ in the window is weighted by its similarity in color to the central pixel $(x, y)$, in both the left and right images, and the weighted errors are normalized by the sum of all the weights:

$$C(x, y, d) = \frac{\sum_{\Delta} w_L(\Delta)\, w_R(\Delta)\, e\big(L(x{+}\Delta),\, R(x{+}\Delta{+}d)\big)}{\sum_{\Delta} w_L(\Delta)\, w_R(\Delta)}.$$

The results are then cleaned up using a global energy function, i.e. by minimizing over the whole disparity function $d(x,y)$:

$$E(d) = E_{\mathrm{data}}(d) + \lambda\, E_{\mathrm{smooth}}(d), \qquad E_{\mathrm{data}}(d) = \sum_{x,y} C\big(x, y, d(x,y)\big),$$

$$E_{\mathrm{smooth}}(d) = \sum_{x,y} \rho\big(d(x{+}1, y) - d(x, y)\big) + \rho\big(d(x, y{+}1) - d(x, y)\big).$$

### Smoothness Priors

Disparity is smooth almost everywhere but jumps at occlusion contours. The Gaussian approach, $\rho(\Delta) = \Delta^2$, penalizes those jumps heavily; a robust Lorentzian-style penalty such as $\rho(\Delta) = \log\big(1 + \Delta^2 / 2\sigma^2\big)$ tolerates occasional large discontinuities.

### Visual Cues for 3D

Texture (cf. "The Visual Cliff," photograph by William Vandivert, 1960), shading, focus (from "The Art of Photography," Canon), motion, and others: highlights, shadows, silhouettes, inter-reflections, symmetry, light polarization. Hence "shape from X," where X = shading, texture, focus, motion, ... We'll focus on the motion cue.

### Motion Field

The motion field is the image velocity of a point moving in the scene. With scene point velocity $\bar{v}_0 = d\bar{r}_0/dt$ and image velocity $\bar{v}_i = d\bar{r}_i/dt$, perspective projection $\bar{r}_i = f\, \bar{r}_0 / z_0$ gives

$$\bar{v}_i = f\, \frac{z_0\, \bar{v}_0 - v_z\, \bar{r}_0}{z_0^2}.$$

### Optical Flow

Optical flow is the motion of the brightness pattern in the image. Ideally, optical flow = motion field — but not always: (a) a rotating featureless sphere under fixed lighting has a motion field but no optical flow; (b) a static sphere under a moving light source has no motion field, but the shading changes.

### Problem Definition

Given image $H$ at time $t$ and image $I$ at time $t + \delta t$, estimate the optical flow velocities $(u, v)$ at each pixel — that is, find pixel correspondences: given a pixel in $H$, look for nearby pixels of the same color in $I$. Key assumptions:

- Color constancy: a point in $H$ looks the same in $I$ (for grayscale images, brightness constancy).
- Small motion: points do not move very far.

### Optical Flow Constraint Equation

Over time $\delta t$, a pixel at $(x, y)$ is displaced by $(\delta x, \delta y) = (u\, \delta t,\, v\, \delta t)$. Assume the brightness of the patch remains the same in both images:

$$E(x + u\,\delta t,\; y + v\,\delta t,\; t + \delta t) = E(x, y, t).$$

Assume small motion and take a first-order Taylor expansion of $E$:

$$E(x + \delta x,\, y + \delta y,\, t + \delta t) \approx E(x, y, t) + \frac{\partial E}{\partial x}\delta x + \frac{\partial E}{\partial y}\delta y + \frac{\partial E}{\partial t}\delta t,$$

which yields the constraint $E_x u + E_y v + E_t = 0$.

## Lecture 35: Finishing Gibbs Sampling; Starting on Modern Object Recognition

(Schedule reminder.) Suggested reading:

- Gibbs sampling: Walsh, "Markov Chain Monte Carlo and Gibbs Sampling," 2004.
- SIFT features: David Lowe, "Object Recognition from Local Scale-Invariant Features," 1999.

### Max-Flow / Min-Cut

Suppose flow $F$ is maximal. Then there must be some cut $S$ composed only of (1) full pipes moving toward $T$ and (2) empty pipes moving away from $T$ — because otherwise I could construct an augmenting path, and $F$ would not be optimal. The capacity of $S$ equals the flow $F$. Since min-cut ≥ max-flow in general, this shows min-cut = max-flow.

### Graph Cuts

Label costs for segmentation: the cost of pixel $p$ not belonging to class $\alpha$ (the data term), plus the cost of neighboring pixels $p$ and $q$ not belonging to the same class (the smoothness term). See also graph-based segmentation (IJCV 2004).

### Common Methods of Statistical Inference in Computer Vision

- Gradient descent: when $P(X)$ is smooth.
- Belief propagation: powerful, but can be slow and can fail for graphs with tight loops.
- Graph cuts: restrictive limitations on the allowable potential functions.
- Sampling: can be slow; hard to detect convergence.

### Gibbs Sampling

Goal: given a multidimensional distribution, try to draw samples from it.
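A quick numerical check of the constraint equation on a synthetic 1-D translation (the signal, velocity, and grid are all made up): solving E_x u + E_t = 0 in the least-squares sense recovers the velocity.

```python
import numpy as np

# Brightness pattern translating with velocity u = 2 px/frame: E(x, t) = f(x - u*t).
# The constraint E_x * u + E_t = 0 then determines u.
u_true = 2.0
x = np.arange(0.0, 100.0)
f = lambda x: np.sin(0.1 * x)
E0 = f(x)               # frame at time t
E1 = f(x - u_true)      # frame at time t + 1

Ex = np.gradient(E0, x) # central-difference spatial derivative
Et = E1 - E0            # forward-difference temporal derivative

# Least-squares solution of Ex*u + Et = 0 over all pixels.
u_est = -np.sum(Et * Ex) / np.sum(Ex * Ex)
```

The small residual error comes from the finite differences standing in for the true derivatives — the "small motion" assumption in action.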
With a large database of samples in hand, one can easily find the most likely sample or the expected value of some variable.

Procedure: guess $X_1$ using the marginal $P(X_1)$; then guess $X_2$ using the conditional $P(X_2 \mid X_1)$; then $X_3$ using $P(X_3 \mid X_1, X_2)$; and so on. More precisely:

1. Initialize $x_i^{(0)}$ for $i = 1, \dots, n$.
2. For $t = 1, \dots, T$:
   - Sample $x_1^{(t)} \sim P(x_1 \mid x_2^{(t-1)}, \dots, x_n^{(t-1)})$
   - Sample $x_2^{(t)} \sim P(x_2 \mid x_1^{(t)}, x_3^{(t-1)}, \dots, x_n^{(t-1)})$
   - ...
   - Sample $x_n^{(t)} \sim P(x_n \mid x_1^{(t)}, \dots, x_{n-1}^{(t)})$

Gibbs sampling is especially suited to graphical models where the conditional independencies are known:

- MRF: $P(X_1 \mid X_{\neq 1})$ depends only on the immediate neighbors of $X_1$.
- Bayes net: $P(X_1 \mid X_{\neq 1})$ depends only on the parents, children, and co-parents of $X_1$.

### Markov Random Fields

MRFs represent probabilities as a product over the cliques of the graph:

$$P(f) = \frac{1}{Z} \prod_{(i,j)} \Phi(f_i, f_j).$$

Unfortunately, learning $\Phi$ from data is no longer as trivial as taking histograms and measuring $P(\mathrm{child} \mid \mathrm{parents})$. Training MRFs is a major topic of machine learning; many applications hand-tune the parameters $\Phi$. (Example: Fields of Experts for image inpainting — Roth & Black, "Fields of Experts: A Framework for Learning Image Priors," 2005.)

### Troubles with Gibbs Sampling

- The first few samples reflect the starting position. How many should be discarded?
- The sampler can get stuck in one mode of a multimodal distribution.
- Gibbs sampling always moves at right angles (one coordinate at a time), so it moves slowly over highly correlated dimensions.
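A minimal Gibbs sampler for a correlated bivariate Gaussian (the correlation value and sample counts are made up) illustrates both the axis-parallel moves and the burn-in the notes warn about:

```python
import numpy as np

def gibbs_bivariate_gaussian(rho, n_steps, burn_in, seed=0):
    """Gibbs sampling for (X1, X2) ~ N(0, [[1, rho], [rho, 1]]).
    The full conditionals are 1-D Gaussians: X1 | X2=x2 ~ N(rho*x2, 1 - rho^2)."""
    rng = np.random.default_rng(seed)
    x1, x2 = 0.0, 0.0
    sd = np.sqrt(1 - rho**2)
    samples = []
    for t in range(n_steps):
        x1 = rng.normal(rho * x2, sd)   # resample coordinate 1 (axis-parallel move)
        x2 = rng.normal(rho * x1, sd)   # resample coordinate 2
        if t >= burn_in:                # discard samples that reflect the start
            samples.append((x1, x2))
    return np.array(samples)

s = gibbs_bivariate_gaussian(rho=0.9, n_steps=20000, burn_in=1000)
emp_corr = np.corrcoef(s.T)[0, 1]   # should approach the true correlation 0.9
```

Raising rho toward 1 shrinks each conditional step, which is exactly the slow mixing over highly correlated dimensions described above.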
Gibbs sampling:
• The first few samples reflect the starting position. How many to discard?
• It can get stuck in one mode of a multimodal distribution.
• The Gibbs sampler always moves at right angles (one variable at a time), so it moves slowly over highly correlated dimensions.

Content Based Image Search — by Aravindkumar Ilangovan
Content: Introduction, Problem, Solution, Implementation, Results.
Introduction: content-based image search using feature extraction and learning.
Problem: web-based image search is based on the tag information associated with images.
Solution: content-based image search.
Implementation:
• In this project we implemented searching for object content in an image (example: car).
• Extract features for the corresponding object from a set of images in the database using the SIFT algorithm.
• Learn each feature separately using the AdaBoost algorithm.
• Search for images that match the features; the returned images are sorted by the number of features matched.
• Finally, apply a threshold to decide whether the image contains the searched object or not.
Pipeline: object training database → feature extraction → feature learning → object detection over the search space → result.

Results (training sample), P/N percentages per threshold:

                 Thr 10    Thr 15     Thr 20     Thr 25     Thr 30
Car (P/N)        100/0     98/2       77/23      55/45      28/72
Non-car (P/N)    33/67     26/74      16/84      7/93       0/100
TP/FP rates      0.75/0    0.79/0.02  0.82/0.21  0.88/0.32  1/0.41
Accuracy         0.835     0.86       0.805      0.74       0.64

Results (non-training sample):

                 Thr 10    Thr 15     Thr 20     Thr 25     Thr 30
Car (P/N)        100/0     100/0      87/13      38/62      3/97
Non-car (P/N)    96/4      81/19      50/50      20/80      0/100
TP/FP rates      0.51/0    0.55/0     0.63/0.20  0.65/0.43  1/0.49
Accuracy         0.52      0.595      0.685      0.59       0.515

[figure: ROC curve for trained images]
Future work and conclusion: more images for learning; more objects; use as a filter for the current image search. Questions? Thank you.

Suggested Reading (Motion): Shapiro & Stockman, Chapter 9.

EECS 841 Computer Vision, Brian Potetz, Fall 2008 — Lecture 27: Optical Flow

Modern Stereo Algorithms:
• Often use color-based adaptive windows: an adaptive metric of error between image patches L and R, combining similarity in color to the central pixel x with similarity in color between pixel x and pixel x + Δx in the left/right images.
• Dynamic programming: N points have N possible correspondences, BUT we might assume the ordering of points is the same along a scanline in the left & right eyes. Then solve for the best matching of p4 given p3, etc.
• Clean up results using a global energy function, i.e. minimize over the disparity function d(x, y):
  E(d) = E_data(d) + E_smooth(d),
  where E_data sums a per-pixel matching cost C(x, y, d(x, y)), and E_smooth penalizes differences between neighboring disparities, e.g. Σ ρ(d(x, y) − d(x+1, y)) + ρ(d(x, y) − d(x, y+1)).

Motion Field vs. Optical Flow:
• Motion field: the image velocity of a point moving in the scene.
• Optical flow: the motion of the brightness pattern in the image.
• Ideally, optical flow = motion field.
Perspective projection: r_i = f r_0 / z. With scene point velocity v_0 = dr_0/dt and image velocity v_i = dr_i/dt, the motion field is
  v_i = f (z v_0 − v_z r_0) / z² = f (r_0 × v_0) × ẑ / z²,
where v_z is the component of v_0 along the optical axis.

Optical Flow Constraint Equation: a point (x, y) at time t moves to (x + uδt, y + vδt) at time t + δt; the optical flow velocities are (u, v), the displacement is (δx, δy) = (u δt, v δt). Assume the brightness of the patch remains the same in both images:
  E(x + uδt, y + vδt, t + δt) = E(x, y, t).

• A motion field can exist with no optical flow (e.g. a rotating uniform sphere), and optical flow can exist with no motion field (e.g. shading changes under a moving light).

Aperture Problem — Barber pole illusion:
[figure: barber's pole — the motion field is horizontal, but the perceived optical flow is vertical]

Optical Flow Constraint Equation: assume the brightness of the patch remains the same in both images:
  E(x + uδt, y + vδt, t + δt) = E(x, y, t).
Assume small motion and take a first-order Taylor expansion of E:
  E(x, y, t) + δx ∂E/∂x + δy ∂E/∂y + δt ∂E/∂t ≈ E(x, y, t),
so δx ∂E/∂x + δy ∂E/∂y + δt ∂E/∂t = 0. Divide by δt and take the limit δt → 0:
  (dx/dt) ∂E/∂x + (dy/dt) ∂E/∂y + ∂E/∂t = 0,
giving the constraint equation
  E_x u + E_y v + E_t = 0.
NOTE: (u, v) must lie on a straight line in velocity space. We can compute E_x, E_y, E_t using gradient operators, but (u, v) cannot be found uniquely from this constraint alone. Intuitively: the component of the flow in the gradient direction is determined; the component of the flow parallel to an edge is unknown.

Computing Optical Flow: formulate the error in the optical flow constraint over the image:
  e_c = ∬ (E_x u + E_y v + E_t)² dx dy.
We need additional constraints. Smoothness constraint (as in shape from shading and stereo): usually the motion field varies smoothly in the image, so penalize departure from smoothness:
  e_s = ∬ (u_x² + u_y² + v_x² + v_y²) dx dy.
Find the (u, v) at each image point that minimizes e = e_c + λ e_s (λ a weighting factor).

Discrete Optical Flow Algorithm: consider image pixel (i, j). Departure from the smoothness constraint:
  s_ij = ¼ [ (u_{i+1,j} − u_{ij})² + (u_{i,j+1} − u_{ij})² + (v_{i+1,j} − v_{ij})² + (v_{i,j+1} − v_{ij})² ].
Error in the optical flow constraint equation:
  c_ij = (E_x u_ij + E_y v_ij + E_t)².
We seek the set {u_ij, v_ij} that minimizes e = Σ_ij (c_ij + λ s_ij). NOTE: each u_ij and v_ij shows up in more than one term. Differentiating e w.r.t. u_ij and v_ij and setting to zero — with ū, v̄ the averages of u, v around pixel (i, j) — gives the update rule
  u^{n+1} = ū^n − E_x (E_x ū^n + E_y v̄^n + E_t) / (λ + E_x² + E_y²),
  v^{n+1} = v̄^n − E_y (E_x ū^n + E_y v̄^n + E_t) / (λ + E_x² + E_y²).

[figure: example optical flow result — arrow field]
• Low-texture regions: bad (gradients have small magnitude).
• Edges: so-so (aperture problem: large gradients, all in the same direction).
• Highly textured regions: good (gradients are different, with large magnitudes).

Revisiting the Small Motion Assumption: is this motion small enough? Probably not — it's much larger than one pixel, so 2nd-order terms dominate. How might we solve this problem? Reduce the resolution.

Coarse-to-fine Optical Flow Estimation: build Gaussian pyramids of image H and image I. A motion of u = 10 pixels at full resolution becomes u = 5, 2.5, 1.25 pixels at successively coarser levels. Run iterative optical flow at the coarsest level, then use the result to initialize (warp) the next finer level, and repeat.

EECS 841 Computer Vision, Brian Potetz, Fall 2008 — Lecture 8: Hough Transforms
Suggested Reading: Forsyth & Ponce, "Computer Vision: A Modern Approach," Chapter 15.1; E. R. Davies, "Machine Vision," Chapters 9-11; L. G. Shapiro & G. C. Stockman, "Computer Vision," Chapter 10.

Marr-Hildreth Edge Detection — Solution: Instead of thresholding the first
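Returning to the iterative optical-flow update above: here is a pure-Python sketch of the Horn–Schunck iteration, using alpha² as the smoothness weight (playing the role of λ in the notes). The derivative approximations, parameter names, and toy test geometry are my own choices, not from the slides:

```python
def horn_schunck(I1, I2, alpha=1.0, iters=200):
    """Iterative Horn-Schunck optical flow on two grayscale frames given
    as lists of lists; returns the flow fields u, v."""
    H, W = len(I1), len(I1[0])
    u = [[0.0] * W for _ in range(H)]
    v = [[0.0] * W for _ in range(H)]
    # simple forward-difference derivative estimates (one common choice)
    Ex = [[(I1[y][min(x + 1, W - 1)] - I1[y][x]
            + I2[y][min(x + 1, W - 1)] - I2[y][x]) / 2.0
           for x in range(W)] for y in range(H)]
    Ey = [[(I1[min(y + 1, H - 1)][x] - I1[y][x]
            + I2[min(y + 1, H - 1)][x] - I2[y][x]) / 2.0
           for x in range(W)] for y in range(H)]
    Et = [[I2[y][x] - I1[y][x] for x in range(W)] for y in range(H)]

    def avg(F, y, x):
        # 4-neighbour average with replicated borders
        return (F[max(y - 1, 0)][x] + F[min(y + 1, H - 1)][x]
                + F[y][max(x - 1, 0)] + F[y][min(x + 1, W - 1)]) / 4.0

    for _ in range(iters):
        un = [[0.0] * W for _ in range(H)]
        vn = [[0.0] * W for _ in range(H)]
        for y in range(H):
            for x in range(W):
                ub, vb = avg(u, y, x), avg(v, y, x)
                t = ((Ex[y][x] * ub + Ey[y][x] * vb + Et[y][x])
                     / (alpha ** 2 + Ex[y][x] ** 2 + Ey[y][x] ** 2))
                un[y][x] = ub - Ex[y][x] * t   # the update rule from the notes
                vn[y][x] = vb - Ey[y][x] * t
        u, v = un, vn
    return u, v
```

On an intensity ramp shifted one pixel to the right, the recovered flow converges to u ≈ 1, v ≈ 0, as the constraint equation predicts.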
derivative nd zero crossings of the second derivative Common 2nd Order Derivative Operator Laplacian Af 1777 gt 7710 V 20 Laplacian of Gaussian MarrHildreth Edge Detection Solution Instead of thresholding the rst derivative nd zero crossings of the second derivative Edges can be localized to subpixel accuracy But all contours must be closed Spaghetti Localizing Edges Thresholding V 2 can result in wide poorly localized edges Especially for high a Solution For each pixel above threshold nd the local maximum of Vfl2 along the direction of the gradient Vf MarrHildreth Canny Edge Detection Edge Detection Choice of 039 Determines Features Large 0 results in coarse large scale features Small 0 results in ne small scale features 5 i Qt I I39ll arJ 39II I Canny with 0 1 Choice of 039 Determines Features Large 0 results in coarse large scale features Small 0 results in fine small scale features a I LJ39 LJ Canny with 0 2 Edges in ScaleSpace As sigma increases edges can combine but never form 5 39 If n Er l illl Willi I I 5 1 I i I 4 X The Gaussian Pyramid G4 G3 gaussian J 2 down blur n39Sam fl G2 G1gaussian I 2 G1 G0 gaussian J 2 Edge Detection using Pyramids Coarsetofine strategy Do edge detection at higher level Consider edges of finer scales only near the edges of higher scales Gaussian Pyramid Frequency Composition Level 3 Laplacian Pyramid Frequency Composition The Laplacian Pyramid is Level 3 a band pass representation vice a low pass representation for the Gaussian f Horror Photo prof dmartin EECS 841 Computer Vision Brian Potetz Fall 2008 Lecture 8 Hough Transforms Suggested Reading Forsyth amp Ponce quotComputer Vision A Modern Approachquot Chapter 151 E R Davies quotMachine Vision Chapters 911 L G Shapiro G C Stockman quotComputer Vision Chapter 10 Applications of Line Detection Automated Inspection Applications of Line Detection Applications of Line Detection Automated Inspection Automated Inspection Model tting amp Object detection Model tting amp Object 
detection. Automated driving: drive straight, steer left, steer right. Applications of Line Detection: automated inspection; model fitting & object detection; automated driving; inferring shape from blocks.

Hough Transforms: Equation for a line: y = mx + b. Suppose I know (x, y) but not m & b. What is the set of all lines that pass through (x, y)? In parameter space — the space of (m, b) — each point in (m, b) specifies a line in (x, y) space. Given x & y, the lines through the point satisfy b = −mx + y: a single line in (m, b) space.

Hough Transform Algorithm:
1. Quantize the parameter space between appropriate maximum and minimum values for b and m.
2. Form an accumulator array A(b, m), initialized to 0.
3. For each point above threshold in the edge map, increment all the points in the accumulator along the appropriate line in (m, b) space: b = −mx + y.
4. Local maxima in the accumulator correspond to lines.

Reparameterization of Lines: the original parameterization y = mx + b fails for vertical lines (m goes to infinity). New parameterization:
  x sin φ + y cos φ = ρ.
For a fixed point (x₀, y₀), the curve traced in (φ, ρ) space is a sinusoid:
  ρ = √(x₀² + y₀²) sin(φ + θ),  tan θ = y₀ / x₀.
[figures: accumulator sinusoids for 2, 3, 4, and 5 example points, and for two lines; axes are φ (degrees, 0-160) vs. ρ (pixels from center)]
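A small runnable sketch of the accumulator voting just described, using the notes' parameterization ρ = x sin φ + y cos φ (the grid sizes and the synthetic test line are my own choices):

```python
import math

def hough_lines(points, n_phi=180, n_rho=200, rho_max=200.0):
    """Vote in (phi, rho) space using the parameterization from the notes:
    rho = x*sin(phi) + y*cos(phi).  Each edge point traces a sinusoid."""
    acc = [[0] * n_rho for _ in range(n_phi)]
    for (x, y) in points:
        for i in range(n_phi):
            phi = math.pi * i / n_phi
            rho = x * math.sin(phi) + y * math.cos(phi)
            # quantize rho in [-rho_max, rho_max] into n_rho bins
            ri = int(round((rho + rho_max) / (2.0 * rho_max) * (n_rho - 1)))
            if 0 <= ri < n_rho:
                acc[i][ri] += 1
    return acc

def accumulator_peak(acc):
    """Indices (phi_i, rho_i) and vote count of the strongest cell."""
    best, bp, br = -1, 0, 0
    for i, row in enumerate(acc):
        for j, c in enumerate(row):
            if c > best:
                best, bp, br = c, i, j
    return bp, br, best
```

Twenty collinear points all vote into the same accumulator cell, which is exactly the local maximum the algorithm looks for.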
Radon Transform: the sum of image intensity f(x, y) along each line at an angle θ from the x-axis.
[figure: real-world example — original image, edge detection, parameter space]

Mechanics of the Hough transform:
• Quantizing (φ, ρ) space: how big should the cells be? Too big, and we merge quite different lines; too small, and noise causes lines to be missed.
• What counts as a line? Thresholding results in clusters of lines; use non-maximum suppression to choose only the local maximum of each cluster.
• Which points belong to each line? Often we want to find line segments, not infinite lines. Search along each infinite line to locate segment endpoints; each line could have several segments.
• Don't waste the gradient magnitude: initially, each point over threshold gets equal say; other methods give more weight to points with greater gradient magnitude.
• Don't waste the gradient direction: edge detection gives us direction as well as gradient magnitude. Weight the curve ρ = x sin φ + y cos φ higher near the edge direction; other methods allow each edge point only one vote.
[figure: line space — accumulator image]

Finding Circles by Hough Transform: Equation of a circle: (x − a)² + (y − b)² = r². If the radius is known: 2D Hough space, accumulator array A(a, b); each edge point votes for the circle of possible centers (the locus of centers at distance r from the point, shown as red circles). Question: what is the advantage of this Hough transform over convolution? Often, only accumulator points consistent with the edge direction are incremented.
[figures: finding coins — original, edges (note noise), results assuming penny size and quarter size; real-world circle examples — the crosshair indicates the Hough transform result, the bounding box was found via motion differencing]

Finding Circles by Hough Transform:
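The known-radius center voting described above can be sketched as follows (the angular sampling step and the synthetic test circle are my own choices):

```python
import math

def hough_circles_known_r(edge_points, r, width, height):
    """Known-radius circle Hough: each edge point (x, y) votes for all
    candidate centres (a, b) = (x - r*cos t, y - r*sin t), i.e. the
    circle of possible centres around that edge point."""
    acc = [[0] * width for _ in range(height)]
    for (x, y) in edge_points:
        for deg in range(0, 360, 2):        # 2-degree sampling step
            t = math.radians(deg)
            a = int(round(x - r * math.cos(t)))
            b = int(round(y - r * math.sin(t)))
            if 0 <= a < width and 0 <= b < height:
                acc[b][a] += 1
    return acc
```

Every edge point's vote-circle passes through the true center, so the center cell accumulates at least one vote per edge point and stands out as the peak.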
Equation of Circle xi 12 yi b2 r2 If radius is unknown 3D Hough Space AccumulatorArray Aabr What is he surface in the hough space Red circles locus of the center of possible circles Finding Circles by Hough Transform Finding Circles by Hough Transform Equation of Circle x agt2 y bgt2 r2 Equation of Circle x agt2 y b2 r2 If radius is unknown 3D Hough Space If radius is unknown 3D Hough Space AccumulatorArray Aabr AccumUIatorArray 1401919 What is the surface in the hough space Alternatively use a 2D Accumulator array Aab and increment it for each radius a in some range of possible radii Red circles locus of the Red circles locus of the center of poss39ble C39rdes39 Question Why wouldn t this falsely center Of poss39ble C39rdes39 accept ellipses b W 55 56 Generalized Hough Transform Generalized Hough Transform Find contours that match a xed template Template can be any shape Works just like Hough transform for circles of known size Moat Hz 49 ivrlnng ill uiwmliicd Hmmh Isthl39muc lal nl5 clu mm 1H llmwh lV39l39HllM l39U I u l lrr mullllc shan ti Dc39mlm filrw M an39c hrr m n inchI m vynny 57 53 for each table entry for as do for each 5 and G j 39 392Xr 5Cos x 0 3 yr 3yr Ssinadz0 I Finnllystep 22bis now 2 A 5 yr S 0 A x6 yr S 9 1 M EDGE FOLLOWING AS GRAPH SEARCHING A graph is a general object that consists ofa set of nodes 11 and arcs between nodes lt gt In this section we consider graphs whose arcs may have numeri cal weights or costs associated with them The search for the boundary ohm object L for the henI quot two nodes of 39 As ume that a gradient operator is applied to the graylevel image creating the magnitude image 3 x and direction image x Now interpret the elements of the direction image tbx us nodes in a graph each with a weighting factor six es x x have arcs between t em it39the contourdirections x zb x are ap propriately aligned with the are directed in the same sense as the contour direction Figure 410 shows the interpretation To generate Figi 4llOb 
impose the following restrictions. For x_j to be connected to x_i, x_j must be one of the three possible eight-neighbors in front of the contour direction φ(x_i); furthermore, g(x_i) > T and g(x_j) > T, where T is a chosen constant, and |[φ(x_i) − φ(x_j)] mod 2π| < π/2. (Fig. 4.10 shows the interpretation of a gradient image as a graph.) Any or all of these restrictions may be modified to suit the requirements of a particular problem.

To generate a path in a graph from x_A to x_B, one can apply the well-known technique of heuristic search [Nilsson 1971, 1980]. The specific use of heuristic search to follow edges in images was first proposed by Martelli [1972]. Suppose:

1. That the path should follow contours that are directed from x_A to x_B.
2. That we have a method for generating the successor nodes of a given node (such as the heuristic described above).
3. That we have an evaluation function f(x_i) which is an estimate of the optimal cost path from x_A to x_B constrained to go through x_i.

Nilsson expresses f(x_i) as the sum of two components: g(x_i), the estimated cost of journeying from the start node x_A to x_i, and h(x_i), the estimated cost of the path from x_i to the goal node. The algorithm, called the A* algorithm by Nilsson, can be stated as:

Algorithm 4.4: Heuristic Search (the A* Algorithm)
1. Expand the start node: put the successors on a list called OPEN, with pointers back to the start node.
2. Remove the node x_i of minimum f from OPEN. If x_i = x_B, then stop: trace back through the pointers to find the optimal path. If OPEN is empty, fail.
3. Else expand node x_i, putting successors on OPEN with pointers back to x_i. Go to step 2.

The component h(x_i) plays an important role in the performance of the algorithm. If h(x_i) = 0 for all i, the algorithm is a minimum-cost search (as opposed to a heuristic search). If h(x_i) > h*(x_i), the actual optimal cost, the algorithm may run faster but may miss the minimum-cost path. If h(x_i) ≤ h*(x_i), the search will always produce a minimum-cost path, provided that h also satisfies the following consistency condition: if for any two nodes x_i and x_j, k(x_i, x_j) is the minimum cost of getting from x_i to x_j (if possible), then k(x_i, x_j) ≥ h(x_i) − h(x_j).

With our edge elements there is no guarantee that a path can be found, since there may be insurmountable gaps between x_A and x_B. If finding the edge is crucial, steps should be taken to interpolate edge elements prior to the search, or gaps may be crossed by using the edge element definition of Martelli [1972]. He defines
the minimum cost ol gcttmg from x to x ifpossiblc then kxx1 139x h li With our edge elements there is no guarantee that a path can be found since there may be insurmountable gaps between x and x5 If nding lhe edge is cru cial steps should be taken to interpolate edge elements prior to the search or ga 5 may be crossed by using the edge element de nition of Martelli 1972 He de nes Ch 4 Boundary DPthmn edges on the 39 though quot 39 quot d quot39 Fig4lla a 441 Good Evaluation Functions articular task as well as A A 39 l nnctinn L p components that are relatively task independent The latter components are dis cussed her Edge strength ll edge strength is a factor the cost of adding a particular edge element at x can be include as M 5x where M max six 2 Curioum ll39 lowcurvature boundaries are desirable curvature can be meats ured as some monotonically increasing function 0 diffldzix dzlx where diff measures the angle between the edge elements at x and x1 3 Proximin Ia an approximation if an approximate boundary is known boun daries near this approximation can be favored by adding 1 dist xB to the cost measure The dist operator measures the minimum distance of the new point x to the approximate boundary B 4 51mm quot 39 I quot IillLdl points near the goal may be favored by estimating has dix rim where dis a distance measure Speci c implementations of these measures appear in Ashkar and Modestino 1978 Lester eta I978 442 Finding All the Boundaries What if the objective is to nd all boundaries in the image using heuristic search in one system Ramer i975 Hueckel s operator Chapter 3 is used to obtain l T at in let Fig JJI Successor mmcntlonxut heuttsttcs cutch lscclcut m 4 4 ldgr Fullmwu n limplv Stain hing 133 strokes another name for the magnitude and direction of the local graylevel changes Then these strokes are combined by heuristic search to form sequences 0 quotg quot streaks quot92 N m39 quot39 39 w is are used to assure a slightly broader coherence than is 
provided by the individual Hueckel edges A bidirectional search is used with four eightneighbors de ned in front of the edge and four eight neighbors behind the edge as shown in Fig 4t lb The search algorithm is as follows 1 Scan the stroke edge array for the most prominent edge 2 Search in front ofthe edge until no more successors exist ie a gap is encoun Iered 3 Search behind the edge until no more predecessors exist 4 If the bidirectional search generates a path of or more strokes the path is a streak Store it in a streak list and go to step 1 Strokes that are part ofa streak cannot be reused they are marked when used and subsequently skippe are other heuristic procedures for pruning the streaks to retain only prime streaks These are shown in Fig 412 They are essentially similar to the re e a K i i Z Q g 2 39J ll i 39 i i i Fig 412 Operations in the creation ofprimestreaks Ch 4 Boundary Detection Fig 413 Ramer srewtu laxation operations described in Section 335 The resull nl streaks must still be analyzed to determine the objects they represent Nevertheless this method RI mnle or anize 39 39 quot 39 imaee Fig 413 shows an example ofRamer s technique SN 4 4 ram fullnumg a Cnph 5mquot hint 413 Allernatives to the A Algorithm The primary disadvantage with the heuristic search method is that the algorithm must keep track ofa set of current best paths nodes and this set may become very large These nodes represent tip nodes for the portion of the tree of possible tages other less rigorous search procedures have proven to be more practical ve ofwhich are described below Pruning IllE Tree DfAIIEIIlUIll39ES At various points in the algorithm the tip nodes on the OPEN list can be pruned in some way For example paths that are short or have a high cost per unit length can be discriminated against This pruning operation can be carried out whenever the number ofalternzttive tip nodes exceeds some bound Modi ed DepIhFirsl Search Depth rst search is a meaningful concept ifthe 
search space is structuredas a tree Depth rst search means always evaluating the most recent expanded son This type ofsearch is performed if the OPEN list is structured as a stack in the A algorithm and the top node is always evaluated next Modi cations to this method use an evaluation function f to rate the successor nodes and expand the best of these Practical examples can be seen in Ballard and Sklansky 1976 Wechsler and Sklansky 1977 Persoon 1976 Leas Maximum C 0 In this elegant idea Lester 19781 only the maximumcost arc ofeach path is kept as an estimate ofgt This is like nding a mountain pass at minimum altitude The advantage is that gdoes not build up continuously with d 39 tree so that good paths may be followed for along time This technique has been applied to nding h e A 39 m quot 39 39 39 gt same results are shown in Fig 4J4 Branch altdBmmd The crux of this method is to have some upper bound on the cost oflhe path Chien and Fu I974 This maybe known beforehand or may be computed y actu ally generating a path between the desired end points Also the evaluation func tion must be monotonically increasing with the length ofthe path With these con ditions we start generating paths excluding partial paths when they exceed the current bound Modi ed Heuristic Search Sometimes an evaluation function that assigns negative costs leads to good results Thus good paths keep getting better with respect to the evaluation func tion avoiding the problem of having to look at all paths near the starting point Ch 4 Uulmddn Dormnon a b tquot 4J4 L39slng lust mtnlmum cost in hsurlstlc arch tc llntJ tcll 39nmmdmzs m nllirtv cnpc mug in A stagt in ma search process lb The complaint huummt However the price paid is the sacri ce of the mathematical guarantee of nding the leastcost path This Could be re ected in unsung uctor boundaries This method has been used in cineangingrams with salisfaclory results Ashkar 4nd Modestino 1978 45 EDGE FOLLOWING AS DYNAMIC PROGRAMMING 451 Dynamic 
Programming. Dynamic programming [Bellman and Dreyfus 1962] is a technique for solving optimization problems when not all variables in the evaluation function are interrelated simultaneously. Consider the problem

  max h(x1, x2, x3, x4)    (4.8)

If nothing is known about h, the only technique that guarantees a global maximum is exhaustive enumeration of all combinations of discrete values of x1, ..., x4. Suppose that

  h = h1(x1, x2) + h2(x2, x3) + h3(x3, x4)    (4.9)

h1 only depends on x1 and x2. Maximize over x1 and tabulate the best value of h1(x1, x2) for each x2:

  f1(x2) = max over x1 of h1(x1, x2)    (4.10)

Since the values of h2 and h3 do not depend on x1, they need not be considered at this point.

EECS 841 Computer Vision, Brian Potetz, Fall 2008 — Lecture 11: Deformable Templates
Suggested Reading: L. G. Shapiro & G. C. Stockman, "Computer Vision," Section 14.3 (energy model); Kass, Witkin, Terzopoulos, "Snakes: Active Contour Models," IJCV 1988 (energy model); Yuille, Hallinan, Cohen, "Feature Extraction from Faces Using Deformable Templates," IJCV 1992; Felzenszwalb, "Representation and Detection of Deformable Shapes," PAMI 2005.

Energy Minimization: define a continuous curve through the image, ν(s): [0, 1] → R², ν(s) = (x(s), y(s)). Minimize some energy function of that curve:

  E(ν) = ∫₀¹ F(s, ν, ν′, ν″) ds

Typical Snake Energies:
• Edge proximity: E_edge(ν) = ∫₀¹ EdgeCosts(x(s), y(s)) ds, with e.g. EdgeCosts(x, y) = λ − |∇I(x, y)|² or EdgeCosts(x, y) = 1 / (1 + |∇I(x, y)|²).
• User interaction: E_user(ν) = ∫₀¹ User(x(s), y(s)) ds.
• Elasticity: E_elastic(ν) = α ∫₀¹ |ν′(s)|² ds.
• Stiffness: E_stiffness(ν) = β ∫₀¹ |ν″(s)|² ds.

Numerical Solution — Euler-Lagrange Equation: if E(ν) = ∫₀¹ F(s, ν, ν′, ν″) ds is minimized, then

  F_x − (∂/∂s) F_{x′} + (∂²/∂s²) F_{x″} = 0
  F_y − (∂/∂s) F_{y′} + (∂²/∂s²) F_{y″} = 0

With F = EdgeCosts(x(s), y(s)) + α|ν′(s)|² + β|ν″(s)|², this gives (up to constant factors absorbed into α and β)

  α x″(s) − β x⁗(s) = (∂/∂x) EdgeCosts(x(s), y(s))
  α y″(s) − β y⁗(s) = (∂/∂y) EdgeCosts(x(s), y(s))

In practice we use a polyline instead of a continuous curve; derivatives can be approximated using finite differences, and the optimality condition becomes a matrix equation: Mx x = Cx, My y = Cy.

Example: see the JavaScript demo at http://www.markschulze.net/snakes

Facial Tracking Using Snakes. Comparing the two techniques:
• Energy minimization allows for more flexible energy functions.
• A* is guaranteed to find a globally optimum solution, so long as its conditions are met.
• Energy minimization can get stuck in holes (local minima), and energy-minimization snakes are not attracted to curves that are far away.

Deformable Templates: locating, tracking, and identifying specific shapes within an image. Similar objects can be deformed into one another via template deformations. Deformable boundaries: a polygonal boundary model; consider deformations which preserve local properties (angles, ratios of lengths a2/a1); measure deformation cost in terms of bending and stretching. Example: finding a butternut squash (template → optimal match).

Part-based models: pictorial structures — a constellation of parts, i.e. rigid parts arranged in a deformable configuration. Each part represents local visual properties; the spatial configuration is captured by a statistical model or spring-like connections.

Yuille's Deformable Eye Template: the first eye template. [figures: original image; gradient magnitude; iris & pupil; intensity "valleys" and "peaks" used as image forces] Lip tracking works similarly.

Factor Graphs: Factor Graphs are the most general graphical
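Before moving on, the discrete snake above can be illustrated with a toy, runnable sketch: a greedy per-control-point update that minimizes an elastic term plus an external (image) energy, standing in for the finite-difference matrix solution. The greedy scheme and the synthetic circular "edge" potential are my own simplifications, not the method from the slides:

```python
def greedy_snake(points, external, alpha=0.5, iters=100):
    """Greedy active-contour iteration on a closed polyline.  Each control
    point moves to the neighbouring pixel (8-neighbourhood) that minimises
    alpha * (elastic term) + external(x, y)."""
    n = len(points)
    pts = list(points)
    for _ in range(iters):
        moved = False
        for i in range(n):
            px, py = pts[(i - 1) % n]        # previous neighbour
            nx_, ny_ = pts[(i + 1) % n]      # next neighbour
            cx, cy = pts[i]
            def energy(x, y):
                elastic = ((x - px) ** 2 + (y - py) ** 2
                           + (x - nx_) ** 2 + (y - ny_) ** 2)
                return alpha * elastic + external(x, y)
            best, best_e = (cx, cy), energy(cx, cy)
            for dx in (-1, 0, 1):
                for dy in (-1, 0, 1):
                    e = energy(cx + dx, cy + dy)
                    if e < best_e:           # strict: prefer staying put
                        best_e, best = e, (cx + dx, cy + dy)
            if best != pts[i]:
                pts[i], moved = best, True
        if not moved:
            break                            # converged
    return pts
```

With an external energy that is low on a circle of radius 5, a square contour started at radius 8 shrinks onto that circle — the same attract-to-edges behaviour the continuous formulation describes.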
model 1 i Pd H di dz39 Cd Just like MRFs but CD does not have to range over a graph clique Variable Node Shape from Shading Using LBP Image Constraint Nodes single pixe Potential function is a level set of the Shading a 6 equation e 0 s 9 R Any BRDF could be used 0 0 J 9 Po qli e o o N a J ef Plcb U e ll 6 CD 0 p Shape from Shading Using LBP Integrability Constraint Nodes Overcomplete Representation x 44 pmay 1113774 qx l ly pxy l l O Linear dependencies between variables Real surfaces have zero curl 9 Smoothness Nodes Shape from Shading is highly ambiguous Requires a strong understanding of what 3D shapes are typical in real scenes Known as a Statistical Prior For now just exploit Smoothness Common Methods of Statistical Inference in Computer Vision Gradient Descent When PX is smooth and all else fails Belief Propagation Powerful but can be slow can fail for graphs with tight loops Graph Cuts Restrictive limitations on allowable potential functions Sampling Can be slow hard to detect convergence SumProduct Belief Propagation For any tree shaped factor graph exact marginals are computed by passing messages up then down Mme Mme h Nf were 2 W H Wm be H lime Nfm y Nfm heAm Finds all messages in 0NM2 time M is the number of states 0in N is the number of nodes ofX What if our graph is not a tree Loopy Belief Propagation o What if our graph is not a tree 0 Equations for Belief Propagation are entirely local The proof of exactness depends on tree structure but the equations could be applied to any graph 1 h lisz 90 Z lawful f yENO t be H limo h Nac Loopy Belief Propagation Despite lack of formal justi cation loopy belief propagation performs well for many realworld problems LBP was laterjustified as minimizing an approximate error metric between marginals and factored distributions This approximation is the Bethe free energy popular in statistical physics The Bethe approximation tends to degrade for factor graphs with many tight loops nearlydeterministic potential 
functions However good performance has been achieved even under these conditions 14 Klaus Sormann Karner quotSegmentbased stereo matching using belief propagation and a selfadapting dissimilarity measure ICPR 2006 39 L L i Right Image Left Image True Depth Inferred Depth Klaus Sormann Karner quotSegmentbased stereo matching using belief propagation and a selfadapting dissimilarity measure ICPR 2006 gquot 39 xv Kl 7 r 33 Inferred iepth Shape From Shading Results 1 Original Input b Linear Constraint Nodes c Lee amp Km 1 Zheng amp Chellappa Mean Squared En or 108 Mean Squared Error 3390 Mean Squared Error 4240 2 A i f I I39F H u l quot 39V a I 6 qquot V h Good image rerenderings imply that the Lambertian and Integrability constraints were wholly satisfied Improved performance can only be achieved through improving the spatial prior model pZ 17 Common Methods of Statistical Inference in Computer Vision Gradient Descent When PX is smooth and all else fails Belief Propagation Powerful but can be slow can fail for graphs with tight loops Graph Cuts Restrictive limitations on allowable potential functions Sampling Can be slow hard to detect convergence 18 Gradient Descent for Computer Vision Optimization problems in computer vision are often highly nonlinear and very large Be aware of smarter variations of gradient descent Conjugate Gradient Descent L BFGS Genetic algorithms simulated annealing Conjugate Gradient Descent Y For classical gradient descent each search Q the previous Can get zigzag while running along narrow channeb direction is orthogonal to Conjugate Gradient Descent For classical gradient descent each search direction is orthogonal to the previous Can get zigzag while running along narrow channels Conjugate each search direction so it is orthogonal in Mahalanobis space 132 Recall Newton s Method Newton s method nds fx 0 0 Can be used to nd maxima of pX which occur when fX a E a XpX 0 Utilizing the 2nd derivative of P gives faster convergence amp better 
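Stepping back to sum-product belief propagation: on a chain (a tree with no branches), the message equations above compute exact marginals in two sweeps. A self-contained sketch, verified against brute-force enumeration (the toy unary and pairwise potentials are my own):

```python
def chain_marginals_bp(unary, pairwise):
    """Sum-product message passing on a chain MRF with
    p(x) proportional to  prod_i unary[i][x_i] * prod_i pairwise[x_i][x_{i+1}].
    Forward/backward messages give exact marginals on a tree (here, a chain)."""
    n, K = len(unary), len(unary[0])
    fwd = [[1.0] * K for _ in range(n)]   # message arriving at node i from the left
    bwd = [[1.0] * K for _ in range(n)]   # message arriving at node i from the right
    for i in range(1, n):
        for b in range(K):
            fwd[i][b] = sum(fwd[i - 1][a] * unary[i - 1][a] * pairwise[a][b]
                            for a in range(K))
    for i in range(n - 2, -1, -1):
        for a in range(K):
            bwd[i][a] = sum(bwd[i + 1][b] * unary[i + 1][b] * pairwise[a][b]
                            for b in range(K))
    margs = []
    for i in range(n):
        m = [fwd[i][k] * unary[i][k] * bwd[i][k] for k in range(K)]
        z = sum(m)
        margs.append([val / z for val in m])
    return margs
```

On a loopy graph the same local equations give loopy BP: the marginals become approximate, but on this chain they are exact, which the brute-force check confirms.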
Recall Newton's Method
- Newton's method finds f(x) = 0. It can be used to find maxima of p(x), which occur when f(x) ≡ ∇_x p(x) = 0.
- Utilizing the 2nd derivative of p gives faster convergence and better results.
- Requires computing the D×D Hessian matrix of p(x) (D is the dimensionality of x) — intractable for large D.

L-BFGS
- L-BFGS approximates the Hessian from the last M search directions (M is usually around 8-20).
- L-BFGS does not need to explicitly construct the approximate Hessian; it can compute each search direction using only the gradient and the last M directions.
- Many software packages exist for L-BFGS.

Common Methods of Statistical Inference in Computer Vision
- Gradient descent: when P(x) is smooth, and all else fails.
- Belief propagation: powerful, but can be slow; can fail for graphs with tight loops.
- Graph cuts: restrictive limitations on allowable potential functions.
- Sampling: can be slow; hard to detect convergence.

Graph Cuts: Using the Max-Flow / Min-Cut Theorem to Optimize MRFs

Max Flow
- Given a weighted directed graph with a source and a sink, find the flow along each edge such that total flow from source to sink is maximized.
- At each vertex besides the source and sink: flow in − flow out = 0.
- 0 ≤ flow along each edge ≤ edge capacity.
- Compute max flow by searching for augmenting paths; the simplest augmenting-path algorithms are O(n²m). Modern methods are even faster.

Min Cut
- Given a weighted directed graph with a source and a sink, partition the nodes into sets S and T so that the sum of edge weights between S and T is minimized.
- Min cut seems harder to compute. But: Maximum Flow = Minimum Cut.
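A minimal augmenting-path sketch of max flow (an Edmonds-Karp-style BFS search, pure Python; the graph and capacities are made up for illustration):

```python
from collections import deque

# Capacities of a small directed graph from source 's' to sink 't'.
cap = {('s', 'a'): 3, ('s', 'b'): 2,
       ('a', 'b'): 1, ('a', 't'): 2, ('b', 't'): 3}
nodes = {'s', 'a', 'b', 't'}

# Residual capacities, including zero-capacity reverse arcs.
res = {}
for (u, v), c in cap.items():
    res[(u, v)] = c
    res.setdefault((v, u), 0)

def bfs_path(src, dst):
    """Shortest augmenting path in the residual graph, or None."""
    parent = {src: None}
    q = deque([src])
    while q:
        u = q.popleft()
        for v in nodes:
            if v not in parent and res.get((u, v), 0) > 0:
                parent[v] = u
                if v == dst:
                    path = [v]
                    while parent[path[-1]] is not None:
                        path.append(parent[path[-1]])
                    return path[::-1]
                q.append(v)
    return None

max_flow = 0
path = bfs_path('s', 't')
while path:
    bottleneck = min(res[(u, v)] for u, v in zip(path, path[1:]))
    for u, v in zip(path, path[1:]):
        res[(u, v)] -= bottleneck   # push flow forward...
        res[(v, u)] += bottleneck   # ...and allow it to be undone later
    max_flow += bottleneck
    path = bfs_path('s', 't')
```

Here the max flow saturates both source edges (3 + 2 = 5), so the minimum cut separates the source from everything else — the two quantities coincide, as the theorem states.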
EECS 841 Computer Vision — Brian Potetz, Fall 2008
Lecture 2: Image Processing & Filtering

Suggested Reading
- Forsyth & Ponce, "Computer Vision: A Modern Approach," Chapter 7.
- An excellent reference for Fourier analysis is Ronald Bracewell, "The Fourier Transform and Its Applications."

What is Image Processing?
- Image processing: image in → image out (denoising, restoration, histogram equalization).
- Image analysis: image in → measurements out (ex: ventricle volume from 2D MRI).
- Computer vision: image in → high-level description out (object recognition, depth reconstruction).

Images are Functions
- We can think of an image as a function f: R² → R; f(x, y) gives the intensity at position (x, y).
- Realistically, we expect the image only to be defined over a rectangle, with a finite range: f: [a,b] × [c,d] → [0,1].
- A color image is just three functions pasted together. We can write this as a vector-valued function: f(x, y) = (r(x,y), g(x,y), b(x,y)).

Image Processing
- Function: f: R² → R, (x, y) ↦ f(x, y). Example: f(x, y) = sin(x² + y²).
- Functional: F: (R² → R) → R, f ↦ F[f].
- Functional transform: T: (R² → R) → (R² → R), f ↦ F[f](u, v). Example: F[sin(x² + y²)](u, v).
- Range transformation: T[f](x, y) = t(f(x, y)). Some operations preserve the domain but change the range of f; others preserve the range but change the domain.

Linear Shift-Invariant Systems (LSIS)
- Linearity: if T[f₁] = g₁ and T[f₂] = g₂, then T[a·f₁ + b·f₂] = a·g₁ + b·g₂.
- Shift invariance: if T[f(x)] = g(x), then T[f(x − a)] = g(x − a).

Example of LSIS: an ideal lens. The defocused image g is a processed version of the focused image f: f(x) → LSIS → g(x). Linearity: brightness variation. Shift invariance: scene movement. (Not valid for lenses with nonlinear distortions.)
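Discrete convolution — the canonical LSIS — makes both properties easy to check numerically. A pure-Python sketch (signals and kernel are made-up illustration values):

```python
def convolve(f, g):
    """Discrete 1-D convolution h(x) = sum_i f(i) g(x - i), 'full' output."""
    n = len(f) + len(g) - 1
    return [sum(f[i] * g[x - i] for i in range(len(f)) if 0 <= x - i < len(g))
            for x in range(n)]

kernel = [0.25, 0.5, 0.25]          # a small blur, like a defocused lens
f1 = [0.0, 1.0, 2.0, 1.0, 0.0]
f2 = [1.0, 0.0, 0.0, 0.0, 1.0]

# Linearity: T[a f1 + b f2] == a T[f1] + b T[f2].
a, b = 2.0, -3.0
lhs = convolve([a * u + b * v for u, v in zip(f1, f2)], kernel)
rhs = [a * u + b * v
       for u, v in zip(convolve(f1, kernel), convolve(f2, kernel))]
linear = all(abs(x - y) < 1e-12 for x, y in zip(lhs, rhs))

# Shift invariance: shifting the input shifts the output by the same amount.
shifted = convolve([0.0] + f1, kernel)
unshifted = convolve(f1, kernel)
shift_ok = all(abs(x - y) < 1e-12 for x, y in zip(shifted[1:], unshifted))
```

Any operation that passes both checks for all inputs can be written as a convolution with some kernel — which is why convolution is the workhorse of this lecture.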
Convolution: 1D Definition
- Continuous signals: h(x) = f(x) ∗ g(x) = ∫ f(u) g(x − u) du
- Discrete signals: h(x) = f(x) ∗ g(x) = Σᵢ f(i) g(x − i)

Convolution: 2D Definition
- Continuous signals: h(x, y) = f(x, y) ∗ g(x, y) = ∫∫ f(u, v) g(x − u, y − v) du dv
- Discrete signals: h(x, y) = f(x, y) ∗ g(x, y) = Σᵢ Σⱼ f(i, j) g(x − i, y − j)

Convolution Examples
- Single pixel (impulse): the output reproduces the filter.
- 3×3 array set to one, each value 1/9: a small box blur.
- 8×8 array, each value 1/64; 16×16 array, each value 1/256: stronger box blurs.
- 16×16 Gaussian with sigma 5: a smooth blur.

Border Handling
- For discrete signals: convolve an N×N image with a K×K filter, and the result is only fully defined for an (N−K+1) × (N−K+1) output.
- Several options: crop, pad, wrap.

EECS 841 Computer Vision — Brian Potetz, Fall 2008
Lecture 30: Graphical Models in Computer Vision

Post-Midterm Grades
[Histogram of scores: >95, >90, >85, >80, >75, >70, >50, >25, >0]

Line Fitting using PCA
  X = [1.1 2.4; 2.2 3.1; 2.9 3.9; 4.1 5.2];
  x0 = mean(X);
  A = [X(:,1)-x0(1), X(:,2)-x0(2)];
  [U, S] = eig(A'*A);
  % find the larger eigenvalue in S and extract from U the correct eigenvector
  [s, i] = max(diag(S));
  a = U(:,i);
  e1 = x0' - 40*a;  e2 = x0' + 40*a;
  plot([e1(1) e2(1)], [e1(2) e2(2)], 'r');
  % this is the line that minimizes the distance from the points to the line

Joint Distribution (gradient magnitude vs. pixel intensity)
  hist = full(hist2D(GM2, im));
  hist = flipud(hist);
  hist = hist / sum(hist(:));
  imshow(hist);

Joint Distribution, Normalized for Display Purposes
  hist = full(hist2D(GM2, im));
  hist = flipud(hist);
  hist = hist / sum(hist(:));
  imshow(hist / max(max(hist)));
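The PCA line-fitting recipe above can be sketched in plain Python — made-up points roughly along y = x, with a hand-rolled 2×2 eigenvector in place of MATLAB's eig:

```python
import math

# Points roughly along a line (invented data, in the spirit of the slide).
pts = [(1.0, 1.1), (2.0, 2.4), (2.0, 2.2), (3.0, 3.1),
       (2.0, 2.9), (3.0, 3.9), (4.0, 4.1), (5.0, 5.2)]
mx = sum(p[0] for p in pts) / len(pts)
my = sum(p[1] for p in pts) / len(pts)

# 2x2 scatter matrix (covariance up to a constant) of the centered points.
sxx = sum((x - mx) ** 2 for x, _ in pts)
sxy = sum((x - mx) * (y - my) for x, y in pts)
syy = sum((y - my) ** 2 for _, y in pts)

# Largest eigenvalue of [[sxx, sxy], [sxy, syy]] via the quadratic formula;
# its eigenvector is the direction of the fitted line.
tr, det = sxx + syy, sxx * syy - sxy * sxy
lam = tr / 2 + math.sqrt((tr / 2) ** 2 - det)
vx, vy = sxy, lam - sxx          # solves (sxx - lam)*vx + sxy*vy = 0
norm = math.hypot(vx, vy)
vx, vy = vx / norm, vy / norm    # unit direction of the best-fit line
```

The line through (mx, my) with direction (vx, vy) minimizes the sum of squared perpendicular distances to the points — exactly what the slide's plotted segment shows.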
Conditional Distribution P(G | I)
  hist = full(hist2D(GM2, im));
  hist = flipud(hist);
  hist = hist / sum(hist(:));
  PGgivenI = hist ./ repmat(sum(hist,1), size(hist,1), 1);
  imshow(PGgivenI);

Conditional Distribution P(I | G)
  hist = full(hist2D(GM2, im));
  hist = flipud(hist);
  hist = hist / sum(hist(:));
  PIgivenG = hist ./ repmat(sum(hist,2), 1, size(hist,2));
  imshow(PIgivenG);

Marginal P(I):  plot(sum(hist,1));
Marginal P(G):  plot(sum(hist,2));

Conditional Probability   [Carnegie Mellon]
- Assume you are a doctor. This is the sample space of patients you might see on any given day.
- Outcomes: non-smoker, female, diabetic, headache, good insurance, etc.; smoker, male, herniated disk, back pain, mildly schizophrenic, delinquent medical bills, etc.
- Event F = flu: P(F) = 0.02.
- Event H = headache: P(H) = 0.10.
- We still need to specify the interaction between flu and headache. Define P(H|F) = the fraction of F's outcomes which are also in H.
- P(H|F) = 0.50. (Region sizes: 0.89 neither, 0.09 headache only, 0.01 flu only, 0.01 both. H = headache, F = flu.)
- P(H|F) = (size of "H and F" region) / (size of F region) = P(H ∧ F) / P(F) = the fraction of flu worlds in which the patient has a headache.

Conditional Probability: Definition
If A and B are events in S, and P(B) > 0, then the conditional probability of A given B, written P(A|B), is
  P(A|B) = P(A ∧ B) / P(B)
The Chain Rule: a simple rearrangement of the above equation yields
  P(A ∧ B) = P(A|B) P(B)

Main Bayes Net Concept: Probabilistic Inference
H = "have a headache", F = "coming down with flu"; P(H) = 0.10, P(F) = 0.02, P(H|F) = 0.50.
One day you wake up with a headache. You think: "Drat! 50% of flus are associated with headaches, so I must have a 50-50 chance of coming down with flu." Is this reasoning good?

  P(F|H) = P(F ∧ H) / P(H) = P(H|F) P(F) / P(H) = (0.50 × 0.02) / 0.10 = 0.10

What we just did:
  P(A|B) = P(A ∧ B) / P(B) = P(B|A) P(A) / P(B)
This is Bayes Rule. (Bayes, Thomas (1763): "An essay towards solving a problem in the doctrine of chances," Philosophical Transactions of the Royal Society of London, 53, 370-418.)
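The flu/headache calculation above is small enough to write out directly (numbers taken from the slides):

```python
# The flu/headache numbers from the slides.
p_f = 0.02          # P(flu)
p_h = 0.10          # P(headache)
p_h_given_f = 0.50  # P(headache | flu)

# Bayes rule: P(F|H) = P(H|F) P(F) / P(H).
p_f_given_h = p_h_given_f * p_f / p_h
```

So despite the scary-sounding P(H|F) = 0.5, the posterior probability of flu given a headache is only 0.10 — the "50-50" reasoning confuses P(H|F) with P(F|H).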
More General Forms of Bayes Rule
  P(A|B) = P(B|A) P(A) / [ P(B|A) P(A) + P(B|¬A) P(¬A) ]
  P(A | B ∧ C) = P(B | A ∧ C) P(A|C) / P(B|C)

Independence
Definition: two events A and B are statistically independent if P(A ∧ B) = P(A) P(B), which is equivalent to P(A|B) = P(A).

Important for Bayes Nets: Representing P(A, B, C)
How can we represent the functions P(A), P(A ∧ B), P(A ∧ B ∧ C)?

The Joint Probability Table
Recipe for making a joint distribution of M variables:
1. Make a truth table listing all combinations of values of your variables (if there are M boolean variables, the table will have 2^M rows).
2. For each combination of values, say how probable it is.
3. If you subscribe to the axioms of probability, those numbers must sum to 1.

Example P(A, B, C): one row for each of the eight assignments of (A, B, C), each with its probability (e.g. 0.05, 0.10, 0.25, ...), summing to 1.

Using the Joint
Once you have the JPT, you can ask for the probability of any logical expression:
  P(E) = Σ_{rows matching E} P(row)

Example joint over (gender, hours worked, wealth):
  Female, <40.5 hrs: poor 0.253122, rich 0.0245395
  Female, ≥40.5 hrs: poor 0.0421763, rich 0.0116293
  Male, <40.5 hrs: poor 0.331313, rich 0.0971295
  Male, ≥40.5 hrs: poor 0.134106, rich 0.105933

What is P(Poor)?  P(Poor) = 0.7604
What is P(Poor ∧ Male)?  P(Poor ∧ Male) = 0.4654
What is P(Male | Poor)?
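The "sum the matching rows" rule above is a one-liner over the table. A sketch using the census-style joint from the slide:

```python
# Joint distribution over (gender, hours, wealth); values from the slide.
joint = {
    ('F', '<40.5', 'poor'): 0.253122, ('F', '<40.5', 'rich'): 0.0245395,
    ('F', '>=40.5', 'poor'): 0.0421763, ('F', '>=40.5', 'rich'): 0.0116293,
    ('M', '<40.5', 'poor'): 0.331313, ('M', '<40.5', 'rich'): 0.0971295,
    ('M', '>=40.5', 'poor'): 0.134106, ('M', '>=40.5', 'rich'): 0.105933,
}

def prob(match):
    """P(E) = sum of joint entries whose rows satisfy the predicate."""
    return sum(p for row, p in joint.items() if match(row))

p_poor = prob(lambda r: r[2] == 'poor')
p_male_and_poor = prob(lambda r: r[0] == 'M' and r[2] == 'poor')
p_male_given_poor = p_male_and_poor / p_poor   # conditional, by the definition
```

This recovers the slide's numbers: P(Poor) ≈ 0.7604, P(Poor ∧ Male) ≈ 0.4654, and P(Male | Poor) ≈ 0.612.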
Inference with the Joint
  P(E1 | E2) = P(E1 ∧ E2) / P(E2) = Σ_{rows matching E1 and E2} P(row) / Σ_{rows matching E2} P(row)

  P(Male | Poor) = 0.4654 / 0.7604 = 0.612

Inference is a Big Deal
- "I've got this evidence. What's the chance that this conclusion is true?"
- "I've got a sore neck: how likely am I to have meningitis?"
- There's a thriving set of industries growing based around Bayesian inference. Highlights are: medicine, pharma, help desk support, engine fault diagnosis.

Learning a JPT
Build a joint probability table for your attributes in which the probabilities are unspecified. Then fill in each row with
  P(row) = (# records matching row) / (total number of records)
e.g., the fraction of all records in which A and B are true but C is false.

Where are we?
- We have recalled the fundamentals of probability.
- We have become content with what JPTs are and how to use them.
- And we even know how to learn JPTs from data.

Density Estimation
- Our joint probability table (JPT) learner is our first example of something called density estimation.
- A density estimator learns a mapping from a set of attributes to a probability.

Summary: The Good News
The JPT allows us to learn P(X) from data. We
- can do inference: P(E1|E2) (automatic doctor, recommender, etc.);
- can do anomaly detection: spot suspicious / incorrect records (e.g., credit card fraud);
- can do Bayesian classification: predict the class (e.g., predict cancerous / not cancerous).

Summary: The Bad News
Density estimation with JPTs is trivial, mindless, and dangerous.

Using a test set:
  Training set (size 196): log likelihood −466.1905
  Test set (size 196): log likelihood −614.6157
An independent test set with 196 cars has a much worse log likelihood than it had on the training set — actually, it's a billion quintillion quintillion quintillion quintillion times less likely!
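The "Learning a JPT" recipe above is just counting. A sketch with invented boolean records over (A, B, C):

```python
from collections import Counter

# Learning a joint probability table from records: each row's probability is
# the fraction of records matching it (records here are invented).
records = [
    (1, 1, 0), (1, 1, 0), (1, 0, 0), (0, 1, 1),
    (0, 1, 1), (0, 0, 0), (1, 1, 0), (0, 1, 0),
]
counts = Counter(records)
jpt = {row: c / len(records) for row, c in counts.items()}

p_110 = jpt.get((1, 1, 0), 0.0)   # fraction of records with A, B true, C false
total = sum(jpt.values())         # learned probabilities must sum to 1
```

Note the overfitting hazard discussed above is visible here too: any combination absent from the records — e.g. (1, 0, 1) — gets probability 0, i.e. the learned model calls it impossible.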
Overfitting Density Estimators
Density estimators can overfit — and the JPT estimator is the overfittiest of them all. If this ever happens (a log likelihood of −∞), it means there are certain combinations that we learn are impossible.
[Table of mpg / modelyear / maker combinations with zero estimated probability omitted]

Is there a better way?
The problem with the JPT is that it just mirrors the training data. In fact, it is just another way of storing the data — we could reconstruct the original dataset perfectly from it! We need to represent the probability function with fewer parameters.

Bayes Nets
- What are they? Bayesian nets are a framework for representing and analyzing models involving uncertainty.
- What are they used for? Intelligent decision aids, data fusion, 3D feature recognition, intelligent diagnostic aids, automated free text understanding, data mining.
- How are they different from other knowledge representation and probabilistic analysis tools? Uncertainty is handled in a mathematically rigorous yet efficient and simple way.

Bayes Net Concepts
1. Chain rule: P(A ∧ B) = P(A) P(B|A)
2. Conditional independence: P(A | B ∧ C) = P(A|B)

A Simple Bayes Net
Let's assume that we already have P(Mpg, Horse). How would you rewrite this using the chain rule?

Review: Chain Rule
  P(Mpg, Horse) = P(Mpg) P(Horse | Mpg)
  P(good) = 0.4, P(bad) = 0.6
  P(low|good) = 0.89, P(high|good) = 0.11, P(low|bad) = 0.21, P(high|bad) = 0.79

Multiplying out:
  P(good, low) = P(good) P(low|good) = 0.4 × 0.89 ≈ 0.36
  P(good, high) = 0.4 × 0.11 ≈ 0.04
  P(bad, low) = 0.6 × 0.21 ≈ 0.12
  P(bad, high) = 0.6 × 0.79 ≈ 0.47

How to Make a Bayes Net
  P(Mpg, Horse) = P(Mpg) P(Horse | Mpg)
Draw Mpg → Horse, with the CPT P(Mpg) at the Mpg node: P(good) = 0.4, P(bad) = 0.6; and the CPT P(Horse|Mpg) at the Horse node: P(low|good), P(high|good), P(low|bad), P(high|bad).
How to Make a Bayes Net
- Each node is a probability function; each arc denotes conditional dependence.
- So what have we accomplished thus far? Nothing! We've just "Bayes Net-ified" the P(Mpg, Horse) JPT using the chain rule. The real excitement starts when we wield conditional independence.

Before we continue, we need a worthier opponent than puny P(Mpg, Horse). We'll use P(Mpg, Horse, Accel). (Note: I made these numbers up.)

Step 1: Rewrite the joint using the chain rule:
  P(Mpg, Horse, Accel) = P(Mpg) P(Horse | Mpg) P(Accel | Mpg, Horse)
(Note: obviously we could have written this 3! = 6 different ways.)
This gives the net Mpg → Horse → Accel, with Mpg → Accel as well, and CPTs:
  P(Mpg): P(good) = 0.4, P(bad) = 0.6
  P(Horse|Mpg): P(low|good) = 0.89, P(high|good) = 0.11, P(low|bad) = 0.21, P(high|bad) = 0.79
  P(Accel|Mpg, Horse):
    P(slow|good, low) = 0.97, P(slow|good, high) = 0.15, P(slow|bad, low) = 0.90, P(slow|bad, high) = 0.05
    P(fast|good, low) = 0.03, P(fast|good, high) = 0.85, P(fast|bad, low) = 0.10, P(fast|bad, high) = 0.95
(Note: I made these up too.)

A Miracle Occurs
You are told by God (or another domain expert) that Accel is independent of Mpg given Horse, i.e.
  P(Accel | Mpg, Horse) = P(Accel | Horse)
So the net becomes Mpg → Horse → Accel, and the Accel node only needs the CPT P(Accel|Horse): P(slow|low), P(slow|high), P(fast|low), P(fast|high).

Thank you, domain expert! Now I only need to learn 5 parameters instead of 7 from my data. My parameter estimates will be more accurate as a result.
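The factored net can rebuild the full joint. A sketch of P(M, H, A) = P(M) P(H|M) P(A|H) — P(Mpg) and P(Horse|Mpg) use the slide's numbers, while the P(Accel|Horse) values are made up here, since the slide leaves them unspecified:

```python
# CPTs for the net Mpg -> Horse -> Accel.
p_mpg = {'good': 0.4, 'bad': 0.6}
p_h = {('low', 'good'): 0.89, ('high', 'good'): 0.11,
       ('low', 'bad'): 0.21, ('high', 'bad'): 0.79}
# Invented P(Accel | Horse) values (not on the slide):
p_a = {('slow', 'low'): 0.9, ('fast', 'low'): 0.1,
       ('slow', 'high'): 0.2, ('fast', 'high'): 0.8}

# Reconstruct the joint from the factorization.
joint = {}
for m, pm in p_mpg.items():
    for h in ('low', 'high'):
        for a in ('slow', 'fast'):
            joint[(m, h, a)] = pm * p_h[(h, m)] * p_a[(a, h)]

total = sum(joint.values())                       # should be 1
p_good_low = (joint[('good', 'low', 'slow')] +    # marginalize out Accel
              joint[('good', 'low', 'fast')])     # = P(good) P(low|good)
```

Five CPT parameters (1 + 2 + 2) determine all eight joint entries — the saving the slide advertises, and it grows dramatically with more variables.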
Independence
"The acceleration does not depend on the Mpg once I know the horsepower." This can be specified very simply:
  P(Accel | Mpg, Horse) = P(Accel | Horse)
This is a powerful statement. It required extra domain knowledge — a different kind of knowledge than numerical probabilities. It needed an understanding of causation.

Bayes Nets Formalized
A Bayes net (also called a belief network) is an augmented directed acyclic graph, represented by the pair (V, E) where:
- V is a set of vertices;
- E is a set of directed edges joining vertices. No loops of any length are allowed.
Each vertex in V contains the following information: a conditional probability table (CPT) indicating how this variable's probabilities depend on all possible combinations of parental values.

Bayes Nets Summary
Bayes nets are a factorization of the full JPT which uses the chain rule and conditional independence. They can do everything a JPT can do, like quick, cheap lookups of probabilities.

The good news: we can do inference. We can compute any conditional probability P(some variable | some other variable values):
  P(E1 | E2) = P(E1 ∧ E2) / P(E2) = Σ_{joint entries matching E1 and E2} P(entry) / Σ_{joint entries matching E2} P(entry)

How much work is the above computation? Suppose you have m binary-valued variables in your Bayes net, and expression E2 mentions k variables.

The sad, bad news: doing inference JPT-style, by enumerating all matching entries in the joint, is expensive — exponential in the number of variables. But perhaps there are faster ways of querying Bayes nets. In fact, if I ever ask you to manually do a Bayes net inference, you'll find there are often many tricks to save you time. So we've just got to program our computer to do those tricks too, right?
Sadder and worse news: general querying of Bayes nets is NP-complete.

Case Study I: the Pathfinder System
(Heckerman 1991, Probabilistic Similarity Networks, MIT Press, Cambridge MA)
- Diagnostic system for lymph-node diseases: 60 diseases and 100 symptoms and test results; 14,000 probabilities.
- An expert was consulted to make the net: 8 hours to determine the variables, 35 hours for the net topology, 40 hours for the probability table values.
- Apparently the experts found it quite easy to invent the causal links and probabilities. Pathfinder is now outperforming the world experts in diagnosis, and is being extended to several dozen other medical domains.

Bayes Net Info
- GUI packages: Genie (free), Netica, Hugin.
- Non-GUI packages: all of the above have APIs; BNT for MATLAB; AUTON code (learning extremely large networks of tens of thousands of nodes).

Bayes Nets in Computer Vision
Yuille's deformable eye template — "The First Eye Template."
[Figure: eye template with iris, pupil, and image intensity terms]

EECS 841 Computer Vision — Brian Potetz, Fall 2008
Lecture 13: Deformable Templates

Suggested Reading
- Forsyth & Ponce, "Computer Vision: A Modern Approach," Chapter 7.6.
- Forsyth & Ponce, "Computer Vision: A Modern Approach," Chapters 22.1-22.3.

Schedule
- Mon Sept 22: Normalized Correlation, PCA (Hw 1 due)
- Wed Sept 24: PCA (Hw 2 out)
- Fri Sept 26: Image Formation
- Mon Sept 29: Image Formation & Stereo
- Wed Oct 1: Stereo
- Fri Oct 3: Radiosity (Hw 2 → Hw 3)
- Mon Oct 6: Photometric Stereo
- Wed Oct 8: Shape From Shading
- Fri Oct 10: Shape From Texture
- Mon Oct 13: Texture Classification
- Wed Oct 15: Project Ideas Lecture (Hw 3 due)
- Fri Oct 17: Fall Break
- Mon Oct 20: Texture Synthesis
- Wed Oct 22: Midterm
- Fri Oct 24: Statistics of Natural Images
- Mon Oct 27: Project Proposals Due

Background: Statistical Inference
Given a joint probability distribution P(x), we wish to infer:
- the most likely value of x;
- the expected value of x;
- marginals of each variable in x.
Applications: speech recognition, disease diagnosis, fraud detection, genetic linkage analysis, error-correcting codes, data compression, computer vision.
Example application — 3D shape inference: P(x) = P(Shape | Image).

In general, statistical inference is NP-hard, and gradient ascent often struggles with local maxima. Key insight: exploit local independencies using graphical models. Many probability distributions can be factorized:
  P(x) = ∏ᵢ fᵢ(xᵢ),  xᵢ ⊆ x
Example: P(a, b, c, d, e) = f₁(a, b, c) f₂(b, d) f₃(c, e) f₄(d, e) — drawn as a factor graph.

[Figures: color image and range image, Riegl LMS-Z360 scanner]

Seminar: "Relative Brightness: An Ecological Explanation and Neurophysiological Basis," Friday, Sept 26, 12:00-1:20pm, Fraser 547.

Deformable Template for Matching
- Matching based on snakes and gradient descent requires an initial guess that is close to the real object, can get stuck in local minima, and is not attracted to distant features.
- Triangulated deformable templates address this. [Figure: template deformation]

Efficient Energy Minimization
- Total energy is the sum of energies at each triangle.
- Every planar graph has a dual; the dual of a template should be singly connected.
- Dynamic programming can be used to minimize energy in O(n·g³) time (g = number of possible vertex locations, n = number of triangles).
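The dynamic-programming idea above — exact minimization when the dependency structure is singly connected — can be sketched on the simplest such structure, a chain, with min-sum recursion (all energy values are invented for illustration):

```python
from itertools import product

# Min-sum dynamic programming along a chain: each of n nodes takes one of
# g candidate states; total energy = unary terms + pairwise smoothness.
g = 3
E_unary = [[1.0, 4.0, 2.0],
           [3.0, 0.5, 2.5],
           [2.0, 2.0, 0.2]]
E_pair = lambda a, b: abs(a - b) * 1.0   # smoothness between neighbors

n = len(E_unary)
cost = list(E_unary[0])
for t in range(1, n):
    # cost[b] = cheapest energy of any assignment to nodes 0..t ending in b.
    cost = [E_unary[t][b] + min(cost[a] + E_pair(a, b) for a in range(g))
            for b in range(g)]
best = min(cost)

# Brute-force check over all g**n assignments.
brute = min(sum(E_unary[t][s[t]] for t in range(n)) +
            sum(E_pair(s[t], s[t + 1]) for t in range(n - 1))
            for s in product(range(g), repeat=n))
```

The recursion costs O(n·g²) here; for the triangulated templates above, each step couples the vertices of a triangle, giving the O(n·g³) bound quoted on the slide.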
Edge Detection Isn't Perfect
Perceptually salient contours are not always near sharp intensity changes.

[Convolution and filtering examples omitted]

Images as Points: Feature Detection
- Images are points in a K×L-dimensional space, R^{K×L}. Matching involves deciding how far apart they are in this space.
- Negative sum squared error:
  SSE(x, y) = Σ_{i,j} ( f(x + i, y + j) − g(i, j) )²
  In MATLAB (dropping the constant Σ g² term):
    W = ones(size(template));
    NSSE = 2*filter2(template, im) - filter2(W, im.^2);

Normalized Correlation
  u · v = |u| |v| cos θ
  In MATLAB (up to the constant |template|, this is the cosine of the angle between the template and each image window):
    W = ones(size(template));
    NC = filter2(template, im) ./ sqrt(filter2(W, im.^2));

Application 1: Template Matching

Stereo Correspondence
Depth can be recovered through the horizontal disparity (displacement) of points in the two images.
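A 1-D sketch of the two matching scores above (the 2-D case is identical in structure; signal and template values are made up):

```python
import math

signal = [0.0, 1.0, 2.0, 1.0, 0.0, 4.0, 8.0, 4.0, 0.0]
template = [1.0, 2.0, 1.0]
tnorm = math.sqrt(sum(t * t for t in template))

def sse(x):
    """Sum squared error between the template and the window at x."""
    return sum((signal[x + j] - template[j]) ** 2
               for j in range(len(template)))

def ncorr(x):
    """Normalized correlation: cosine of the angle between window and template."""
    w = signal[x:x + len(template)]
    wnorm = math.sqrt(sum(v * v for v in w))
    return sum(a * b for a, b in zip(w, template)) / (wnorm * tnorm)

positions = range(len(signal) - len(template) + 1)
best_sse_x = min(positions, key=sse)   # exact copy of the template at x = 1
```

Note the difference the normalization makes: the window at x = 5 is the template scaled by 4, so its SSE is large (54) while its normalized correlation is a perfect 1.0 — normalized correlation is invariant to overall brightness scaling.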
Content-Based Image Retrieval — Jason Kroge

Overview: introduction, query by content, methodology, application, results, conclusion.

Introduction
- Millions of images in existence; we need some way to search for images.
- Text-based search can be inaccurate, and it is difficult and expensive to label all images.
- What if we could quickly and easily perform a search based on content within the image, rather than keywords assigned to the image?

Query by Content
- Take a user-drawn sketch and match it to images in a database.
- Has applications in: graphic design and intellectual property; retail catalogs; medical imaging; geographical information and remote sensing systems; crime prevention and filtering objectionable images.

Methodology
- Used a discrete wavelet transform, which extracts features that resemble some form of similarity which a human subject can perceive. Features contain information about color, texture, and shape.
- Wavelet type: Haar wavelet transform.
- Color space: YIQ instead of RGB.
- Truncation: keep the 40 most significant coefficients.
- Quantization: store only the sign of the coefficients (+1 or −1).

Algorithm
  proc DecomposeArray(A: array[0..h−1] of color)
    A ← A / sqrt(h)
    while h > 1 do
      h ← h/2
      for i ← 0 to h−1 do
        A'[i]   ← (A[2i] + A[2i+1]) / sqrt(2)
        A'[h+i] ← (A[2i] − A[2i+1]) / sqrt(2)
      end for
      A ← A'
    end while
  end proc

  proc DecomposeImage(T: array[0..r−1, 0..r−1] of color)
    for row ← 1 to r do
      DecomposeArray(T[row, :])
    end for
    for col ← 1 to r do
      DecomposeArray(T[:, col])
    end for
  end proc

Application
[Screenshots of the application form]

Results
- Database queries were surprisingly fast: with 1000 images, the average query was less than 0.1 seconds.
- Accuracy was difficult to determine: a small number of images in the database, and no time to perform an experiment with human subjects.

Conclusion
- Works well if you have already seen the image, or the image has distinct features.
- The algorithm could be combined with text-based search to improve results.
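The DecomposeArray pseudocode above translates directly to Python (a sketch; the pseudocode follows Jacobs, Finkelstein & Salesin's fast multiresolution image querying scheme):

```python
import math

def decompose_array(a):
    """1-D Haar wavelet decomposition; len(a) must be a power of two."""
    h = len(a)
    a = [x / math.sqrt(h) for x in a]       # overall normalization
    while h > 1:
        h //= 2
        nxt = list(a)
        for i in range(h):
            # Averages in the front half, differences in the back half.
            nxt[i] = (a[2 * i] + a[2 * i + 1]) / math.sqrt(2)
            nxt[h + i] = (a[2 * i] - a[2 * i + 1]) / math.sqrt(2)
        a = nxt
    return a

coeffs = decompose_array([5.0, 3.0, 2.0, 2.0])
energy = sum(c * c for c in coeffs)
```

With this normalization the first coefficient is the overall average of the input (here 3.0), and the 1/√2 steps preserve the (scaled) energy — which is why keeping only the largest-magnitude coefficients, as the truncation step above does, discards the least information.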
Eigenvector Decomposition
- A symmetric ⇒ all eigenvalues are real.
- A can be decomposed as A = R D Rᵀ, where R is a rotation matrix and D is a diagonal matrix.

Characterizing Covariance Matrices
[Examples of 2×2 covariance matrices: independent (diagonal) vs. dependent (nonzero off-diagonal)]

Distribution of P(x, y) in MATLAB:
  X = randn(500,1);
  g = randn(500,1) * 0.5;
  y = 1*X + g;
  Z2 = [X y];
  scatter(Z2(:,1), Z2(:,2), 'r.');
  cov(Z2)
    % ≈ [0.9701 0.9596; 0.9596 1.2049]
  corrcoef(Z2)
    % ≈ [1.0000 0.8876; 0.8876 1.0000]

Eigenvectors
  Xhat = [X - mean(X), y - mean(y)];
  S = Xhat' * Xhat;
  [vec, val] = eig(S);
  vec
    % ≈ [-0.8723 0.4890; 0.4890 0.8723]  (columns are the eigenvectors)
  % val is diagonal, with the eigenvalues on the diagonal, and
  vec * val * vec'
    % = S  (the decomposition S = R D R')

  % Rotate the data into the eigenvector basis:
  Xhat * inv(vec)
  % ...and rescale each axis by 1/sqrt(eigenvalue) to whiten:
  Xhat * inv(vec) * sqrt(inv(val))

What is Matlab?
- MATrix LABoratory. Numeric: Matlab is not Mathematica or Maple. Interactive: variables are persistent, like many scientific or graphing calculators.

EECS 841 Computer Vision — Brian Potetz, Fall 2008
Lecture 5: Matlab, Edge Detection

Accessing MATLAB at KU
- EECS has a site license.
- Or buy it: Student Version $99, Image Processing Toolbox $60.
- Terminal mode: matlab -nodisplay -nojvm. Via X (very slow). Using VPN to access the license server (only for EECS-owned computers).

Finding Help on Matlab
- Built-in help features: type "help imread"; also online at the MathWorks site.
- Matlab Primer: httpwvwvittclmedu potetzEE imllMatlabPrimenpdf
- Google.

Commands You Should Know
beam over the surface of sample Raster pattern is similar to television raster pattern Electron Beam Forming an Image If electron beam position is known the image is easily formed Assign position xy the value from the electron detector SEM images are typically formed in this way If electron beam position is not known is it possible to form an image Sometimes position xy must be inferred from electron detector data Does not need electron beam position Can be used as a redundant system Forming an Image without Position Method for forming image is dependent on raster pattern Sample electron detector while electron beam is rastering to get a series of samples Form image using four parameters Number of initial samples Find the beginning of the image Number of samples per row Find the width of the image Number of samples between rows Find erroneous samples Number of rows Find the height of the image Raster Pattern Series of Samples Ni N311 Nsbr NSF Nibl39 N5 at W I39 l W Fl f W Image h Nspr Nab 1 l ll CED ED l Ix Number of Rows Nr Ex 1 Series of Samples Ex 1 Autocorrelation Autocorrelate series of samples Index of first peakin envelope is equal to Nr N Ex 2 Series of Samples Ex 2 Autocorrelation spr Number of Samples per Row N Ex 1 Series of Samples zoomed Ex 1 Autocorrelation zoomed spr Autocorrelate series of samples Index of first m aj o r peak is eq u Nspr Ex 2 Series of Samples zoomed Ex 2 Autocorrelation zoomed Form First Image Ex 1 First Image Nsprand thave been determined Therefore the image height and width have been determined Form an image assuming Nis and Nsbr are 0 Detect horizontal line for Nis and slanted line for N sbr Number of Initial Samples Nis Ex 1 Horizontal Line Histogram Ex 1 Edges of First Image 0 Horizontal edge detection Continuous Canny Edge Operator Sum horizontal 5 0 10D 20D 30D 40D Sun 6 D max sum Ex 2 Horizontal Line Histogram Ex 2 Edges of First Image 0 Indexofthe horizontal edge I is equal to NiS N spr zuu Form Second Image Ex 1 Second 
Image 1 i1 x as Nspr N and Nis have been determined Form an image assuming Nsbr is O Detect slanted line for Nsbr Number of Samples between Rows Nsbr Ex 1 Hough Space of Second Slanted edge 39mage detection Continuous Canny Edge Operator Hough Transform maximum Hough Space of Second peak Nsbr is the slope of the slanted line 0 r Form Final Image 39 39 Ex1 Final lmagefixed Fmal Image Z T r 77 formed using 4 lt a 7 r parameters User Interface allows user to fix poorly formed images User Interface Determines parameters when signal is loaded Allows user to edit parameters Allows user to manually detect edges Allows user to save final image I swmgeg 17 Eelvveen Ravvs 1 Skew lnmal Onset mm mm LenRight I Future Work Ex 1 Final Image Ex 1 State of the Art Edge detection fails in many cases Can be fixed by user interface Create a new edge operator State of the art has better image quality Higher resolution 512x512 vs 1024x1024 Slower Scan rate Recontrasting Suggested Reading Forsyth amp Ponce quotComputer Vision A Modern EECS 841 Computer Vision Approach Chapter 151 E R Davies quotMachine Vision Chapters 911 Brian Potetz L G Shapiro G C Stockman quotComputer Vision Fall 2008 Chapter 10 Lecture 9 Active Contours Reparameterization of Lines Reparameterization of Lines p 5Example Points 600 20 40 so so m 120 140 160 0 Two Lines Finding Circles by Hough Transform Finding Circles by Hough Transform Equation of Circle xi 602 yi b2 7 2 Equation of Circle 99 602 yz b2 7 2 If radius is known 2D Hough Space If radius is unknown 3D Hough Space Accumulator Array Aa b Accumulator Array Aa b 1quot Question What is the advantage of this What is he surface in the hough Space Hough transform over Convolution a Red circles locus of the Red circles locus of the center of possible circles center of possible circles b N Finding Circles by Hough Transform Generalized Hough Transform Find contours that match a xed template Template can be any shape Works just like Hough transform for circles of 
EECS 841 Computer Vision — Brian Potetz, Fall 2008
Lecture 9: Active Contours

Suggested Reading
- Forsyth & Ponce, "Computer Vision: A Modern Approach," Chapter 15.1.
- E. R. Davies, "Machine Vision," Chapters 9-11.
- L. G. Shapiro & G. C. Stockman, "Computer Vision," Chapter 10.

Reparameterization of Lines
[Example: two lines resampled at evenly spaced points]

Finding Circles by Hough Transform
- Equation of a circle: (xᵢ − a)² + (yᵢ − b)² = r²
- If the radius is known: a 2D Hough space, with accumulator array A(a, b). Each edge point votes along the locus of possible circle centers (the red circles in the figure).
  Question: what is the advantage of this Hough transform over convolution?
- If the radius is unknown: a 3D Hough space, with accumulator array A(a, b, r). What is the surface in the Hough space? Alternatively, use a 2D accumulator array A(a, b) and increment it for each radius in some range of possible radii.
  Question: why wouldn't this falsely accept ellipses?

Generalized Hough Transform
- Find contours that match a fixed template; the template can be any shape.
- Works just like the Hough transform for circles of known size.

Following Edges Using Graph Searching
1. Use traditional techniques to find edges in the image.
2. Convert the edge image into a weighted, directional graph.
3. Find the path of least cost through the graph using the A* search algorithm.

Suggested Reading
- D. H. Ballard & C. M. Brown, "Computer Vision," Sect. 4.5 (A* approach).
- L. G. Shapiro & G. C. Stockman, "Computer Vision," Section 14.3 (energy model).
- Kass, Witkin & Terzopoulos, "Snakes: Active Contour Models," IJCV 1988 (energy model).

Energy Model: From Edges to Weighted Graphs
One way to define a directed graph: start from the original image and its gradient magnitude |∇I(x, y)|, and let the cost of each arc be M − |∇I(x, y)|, where (x, y) is the point on the image corresponding to the destination node of the arc.

Finding the Least-Cost Path: The A* Algorithm
1. Initialize the queue with the path from the source vertex to itself.
2. Until the first path in the queue reaches the destination vertex:
   i. Remove the first path from the queue.
   ii. For each neighbor of the last node in this path, create a new path.
   iii. If a new path terminates in a node that has already been explored, and no path in the queue terminates in that node, delete that new path.
   iv. If a new path terminates in a node that has already been explored, and there is a path in the queue that terminates in that node, delete the path that has the greatest cost.
   v. Sort the queue by the cost of the paths.
addition to weighing paths according to how closely they follow edges in the image, we want our snake to respond to advice from the user. Some additional costs to weight include: distance to the original estimate; terms that allow the user to discourage or encourage particular regions of the image; and limitations on curvature.

Improving Search Performance. When performing A*, let the cost of each path be the sum of the weights of all arcs traversed, plus some lower-bound estimate of the cost of traversing the remaining distance to the destination node. (In fact, the search algorithm is not called A* unless it uses this technique.) If min_ij w_ij = epsilon > 0, then

  Cost(v_1 -> v_n) = w_{1,2} + w_{2,3} + ... + w_{n-1,n} + epsilon * d(v_n, destination)

(reconstructed from a garbled slide: the heuristic multiplies the smallest arc weight epsilon by the remaining distance to the destination, which never overestimates the true remaining cost). Carefully choose your cost functions: flat terrains take more time to search.

Improving Search Performance. Use the edge direction to eliminate arcs in your graph. (Figure 5.19: Graph representation of an edge image. (a) Edge directions; (b) the corresponding graph.)

[Application slides: National/Naval Ice Center, Amery Sea icebergs B17B and B15K, ENVISAT images, 04-05 June 2008, Analyst AG1 S.W. Heisler.]

A Taxonomy and Evaluation of Dense Two-Frame Stereo Correspondence Algorithms
Daniel Scharstein, Dept. of Math and Computer Science, Middlebury College, Middlebury, VT 05753, schar@middlebury.edu

Abstract. Stereo matching is one of the most active research areas in computer vision. While a large number of algorithms for stereo correspondence have been developed, relatively little work has been done on characterizing their performance. In this paper
we present a taxonomy of dense, two-frame stereo methods. Our taxonomy is designed to assess the different components and design decisions made in individual stereo algorithms. Using this taxonomy, we compare existing stereo methods and present experiments evaluating the performance of many different variants. In order to establish a common software platform and a collection of data sets for easy evaluation, we have designed a stand-alone, flexible C++ implementation that enables the evaluation of individual components and that can easily be extended to include new algorithms. We have also produced several new multi-frame stereo data sets with ground truth, and are making both the code and data sets available on the Web. Finally, we include a comparative evaluation of a large set of today's best-performing stereo algorithms.

(Second author: Richard Szeliski, Microsoft Research, Microsoft Corporation, Redmond, WA 98052, szeliski@microsoft.com.)

1. Introduction. Stereo correspondence has traditionally been, and continues to be, one of the most heavily investigated topics in computer vision. However, it is sometimes hard to gauge progress in the field, as most researchers only report qualitative results on the performance of their algorithms. Furthermore, a survey of stereo methods is long overdue, with the last exhaustive surveys dating back about a decade [7, 37, 25]. This paper provides an update on the state of the art in the field, with particular emphasis on stereo methods that (1) operate on two frames under known camera geometry, and (2) produce a dense disparity map, i.e., a disparity estimate at each pixel.

Our goals are twofold:
1. To provide a taxonomy of existing stereo algorithms that allows the dissection and comparison of individual algorithm components and design decisions.
2. To provide a test bed for the quantitative evaluation of stereo algorithms. Towards this end, we are placing sample implementations of correspondence algorithms along with test data and results on the Web at www.middlebury.edu/stereo.

We emphasize
calibrated two-frame methods in order to focus our analysis on the essential components of stereo correspondence. However, it would be relatively straightforward to generalize our approach to include many multi-frame methods, in particular multiple-baseline stereo [85] and its plane-sweep generalizations [30, 113].

The requirement of dense output is motivated by modern applications of stereo such as view synthesis and image-based rendering, which require disparity estimates in all image regions, even those that are occluded or without texture. Thus, sparse and feature-based stereo methods are outside the scope of this paper, unless they are followed by a surface-fitting step, e.g., using triangulation, splines, or seed-and-grow methods.

We begin this paper with a review of the goals and scope of this study, which include the need for a coherent taxonomy and a well thought-out evaluation methodology. We also review disparity space representations, which play a central role in this paper. In Section 3, we present our taxonomy of dense two-frame correspondence algorithms. Section 4 discusses our current test bed implementation in terms of the major algorithm components, their interactions, and the parameters controlling their behavior. Section 5 describes our evaluation methodology, including the methods we used for acquiring calibrated data sets with known ground truth. In Section 6 we present experiments evaluating the different algorithm components, while Section 7 provides an overall comparison of 20 current stereo algorithms. We conclude in Section 8 with a discussion of planned future work.

2. Motivation and scope. Compiling a complete survey of existing stereo methods, even restricted to dense two-frame methods, would be a formidable task, as a large number of new methods are published every year. It is also arguable whether such a survey would be of much value to other stereo researchers, besides being an obvious catch-all reference. Simply enumerating different approaches is unlikely to yield new insights. Clearly, a comparative evaluation is
necessary to assess the performance of both established and new algorithms and to gauge the progress of the field. The publication of a similar study by Barron et al. [8] has had a dramatic effect on the development of optical flow algorithms. Not only is the performance of commonly used algorithms better understood by researchers, but novel publications have to improve in some way on the performance of previously published techniques [86]. A more recent study by Mitiche and Bouthemy [78] reviews a large number of methods for image flow computation and isolates central problems, but does not provide any experimental results.

In stereo correspondence, two previous comparative papers have focused on the performance of sparse feature matchers [54, 19]. Two recent papers [111, 80] have developed new criteria for evaluating the performance of dense stereo matchers for image-based rendering and telepresence applications. Our work is a continuation of the investigations begun by Szeliski and Zabih [116], which compared the performance of several popular algorithms, but did not provide a detailed taxonomy or as complete a coverage of algorithms. A preliminary version of this paper appeared in the CVPR 2001 Workshop on Stereo and Multi-Baseline Vision [99].

An evaluation of competing algorithms has limited value if each method is treated as a black box and only final results are compared. More insights can be gained by examining the individual components of various algorithms. For example, suppose a method based on global energy minimization outperforms other methods. Is the reason a better energy function, or a better minimization technique? Could the technique be improved by substituting different matching costs? In this paper we attempt to answer such questions by providing a taxonomy of stereo algorithms. The taxonomy is designed to identify the individual components and design decisions that go into a published algorithm. We hope that the taxonomy will also serve to structure the field and to guide researchers
in the development of new and better algorithms.

2.1 Computational theory. Any vision algorithm, explicitly or implicitly, makes assumptions about the physical world and the image formation process. In other words, it has an underlying computational theory [74, 72]. For example, how does the algorithm measure the evidence that points in the two images match, i.e., that they are projections of the same scene point? One common assumption is that of Lambertian surfaces, i.e., surfaces whose appearance does not vary with viewpoint. Some algorithms also model specific kinds of camera noise, or differences in gain or bias.

Equally important are assumptions about the world or scene geometry and the visual appearance of objects. Starting from the fact that the physical world consists of piecewise-smooth surfaces, algorithms have built-in smoothness assumptions (often implicit), without which the correspondence problem would be underconstrained and ill-posed. Our taxonomy of stereo algorithms, presented in Section 3, examines both matching assumptions and smoothness assumptions in order to categorize existing stereo methods.

Finally, most algorithms make assumptions about camera calibration and epipolar geometry. This is arguably the best-understood part of stereo vision; we therefore assume in this paper that we are given a pair of rectified images as input. Recent references on stereo camera calibration and rectification include [130, 70, 131, 52, 39].

2.2 Representation. A critical issue in understanding an algorithm is the representation used internally and output externally by the algorithm. Most stereo correspondence methods compute a univalued disparity function d(x, y) with respect to a reference image, which could be one of the input images, or a "cyclopean" view in between some of the images.

Other approaches, in particular multi-view stereo methods, use multi-valued [113], voxel-based [101, 67, 34, 33, 24], or layer-based [125, 5] representations. Still other approaches use full 3D models such as deformable models [120, 121], triangulated meshes [43], or
level-set methods [38].

Since our goal is to compare a large number of methods within one common framework, we have chosen to focus on techniques that produce a univalued disparity map d(x, y) as their output. Central to such methods is the concept of a disparity space (x, y, d). The term disparity was first introduced in the human vision literature to describe the difference in location of corresponding features seen by the left and right eyes [72]. Horizontal disparity is the most commonly studied phenomenon, but vertical disparity is possible if the eyes are verged.

In computer vision, disparity is often treated as synonymous with inverse depth [20, 85]. More recently, several researchers have defined disparity as a three-dimensional projective transformation (collineation or homography) of 3D space (X, Y, Z). The enumeration of all possible matches in such a generalized disparity space can be easily achieved with a plane sweep algorithm [30, 113], which for every disparity d projects all images onto a common plane using a perspective projection (homography). (Note that this is different from the meaning of plane sweep in computational geometry.)

In general, we favor the more generalized interpretation of disparity, since it allows the adaptation of the search space to the geometry of the input cameras [113, 94]; we plan to use it in future extensions of this work to multiple images. (Note that plane sweeps can also be generalized to other sweep surfaces such as cylinders [106].) In this study, however, since all our images are taken on a linear path with the optical axis perpendicular to the camera displacement, the classical inverse-depth interpretation will suffice [85]. The (x, y) coordinates of the disparity space are taken to be coincident with the pixel coordinates of a reference image chosen from our input data set. The correspondence between a pixel (x, y) in reference image r and a pixel (x', y') in matching image m is then given by

  x' = x + s d(x, y),    y' = y,    (1)

where s = +/-1 is a sign chosen so that disparities are always positive. Note that since our images are numbered
from leftmost to rightmost, the pixels move from right to left.

Once the disparity space has been specified, we can introduce the concept of a disparity space image or DSI [127, 18]. In general, a DSI is any image or function defined over a continuous or discretized version of disparity space (x, y, d). In practice, the DSI usually represents the confidence or log likelihood (i.e., cost) of a particular match implied by d(x, y).

The goal of a stereo correspondence algorithm is then to produce a univalued function in disparity space d(x, y) that best describes the shape of the surfaces in the scene. This can be viewed as finding a surface embedded in the disparity space image that has some optimality property, such as lowest cost and best (piecewise) smoothness [127]. Figure 1 shows examples of slices through a typical DSI; more figures of this kind can be found in [18].

3. A taxonomy of stereo algorithms. In order to support an informed comparison of stereo matching algorithms, we develop in this section a taxonomy and categorization scheme for such algorithms. We present a set of algorithmic "building blocks" from which a large set of existing algorithms can easily be constructed. Our taxonomy is based on the observation that stereo algorithms generally perform (subsets of) the following four steps [97, 96]:

1. matching cost computation;
2. cost (support) aggregation;
3. disparity computation / optimization; and
4. disparity refinement.

The actual sequence of steps taken depends on the specific algorithm. For example, local (window-based) algorithms, where the disparity computation at a given point depends only on intensity values within a finite window, usually make implicit smoothness assumptions by aggregating support. Some of these algorithms can cleanly be broken down into steps 1, 2, 3. For example, the traditional sum-of-squared-differences (SSD) algorithm can be described as:

1. the matching cost is the squared difference of intensity values at a given disparity;
2. aggregation is done by summing matching cost over square windows with constant disparity;
3. disparities are
computed by selecting the minimal (winning) aggregated value at each pixel.

Some local algorithms, however, combine steps 1 and 2 and use a matching cost that is based on a support region, e.g., normalized cross-correlation [51, 19] and the rank transform [129]. (This can also be viewed as a preprocessing step; see Section 3.1.)

On the other hand, global algorithms make explicit smoothness assumptions and then solve an optimization problem. Such algorithms typically do not perform an aggregation step, but rather seek a disparity assignment (step 3) that minimizes a global cost function that combines data (step 1) and smoothness terms. The main distinction between these algorithms is the minimization procedure used, e.g., simulated annealing [75, 6], probabilistic (mean-field) diffusion [97], or graph cuts [23].

In between these two broad classes are certain iterative algorithms that do not explicitly state a global function that is to be minimized, but whose behavior mimics closely that of iterative optimization algorithms [73, 97, 132]. Hierarchical (coarse-to-fine) algorithms resemble such iterative algorithms, but typically operate on an image pyramid, where results from coarser levels are used to constrain a more local search at finer levels [126, 90, 11].

3.1 Matching cost computation. The most common pixel-based matching costs include squared intensity differences (SD) [51, 1, 77, 107] and absolute intensity differences (AD) [58]. In the video processing community, these matching criteria are referred to as the mean-squared error (MSE) and mean absolute difference (MAD) measures; the term displaced frame difference is also often used [118]. More recently, robust measures, including truncated quadratics and contaminated Gaussians, have been proposed [15, 16, 97]. These measures are useful because they limit the influence of mismatches during aggregation.

(Figure 1: Slices through a typical disparity space image (DSI): (a) original color image; (b) ground-truth disparities; (c)-(e) three (x, y) slices for d = 10, 16, 21; (f) an (x, d) slice for y = 151 (the dashed line in Figure 1(b)). Different dark (matching) regions are visible in Figures (c)-(e), e.g., the bookshelves, table and cans, and head statue, while three different disparity levels can be seen as horizontal lines in the (x, d) slice (Figure 1(f)). Note the dark bands in the various DSIs, which indicate regions that match at this disparity. Smaller dark regions are often the result of textureless regions.)

Other traditional matching costs include normalized cross-correlation [51, 93, 19], which behaves similarly to sum-of-squared-differences (SSD), and binary matching costs (i.e., match / no match) [73], based on binary features such as edges [4, 50, 27] or the sign of the Laplacian [82]. Binary matching costs are not commonly used in dense stereo methods, however.

Some costs are insensitive to differences in camera gain or bias, for example gradient-based measures [100, 95] and non-parametric measures such as rank and census transforms [129]. Of course, it is also possible to correct for different camera characteristics by performing a preprocessing step for bias-gain or histogram equalization [48, 32]. Other matching criteria include phase and filter-bank responses [74, 63, 56, 57]. Finally, Birchfield and Tomasi have proposed a matching cost that is insensitive to image sampling [12]: rather than just comparing pixel values shifted by integral amounts (which may miss a valid match), they compare each pixel in the reference image against a linearly interpolated function of the other image.

The matching cost values over all pixels and all disparities form the initial disparity space image C_0(x, y, d). While our study is currently restricted to two-frame methods, the initial DSI can easily incorporate information from more than two images, by simply summing up the cost values for each matching image m, since the DSI is associated with a fixed reference image r (Equation (1)). This is the idea behind multiple-baseline SSSD and SSAD methods [85, 62, 81]. As mentioned in Section 2.2, this idea can be generalized to arbitrary camera configurations using a plane sweep algorithm [30, 113].

3.2 Aggregation
of cost. Local and window-based methods aggregate the matching cost by summing or averaging over a support region in the DSI C(x, y, d). A support region can be either two-dimensional at a fixed disparity (favoring fronto-parallel surfaces), or three-dimensional in (x, y, d) space (supporting slanted surfaces). Two-dimensional evidence aggregation has been implemented using square windows or Gaussian convolution (traditional), multiple windows anchored at different points, i.e., shiftable windows [2, 18], windows with adaptive sizes [84, 60, 124, 61], and windows based on connected components of constant disparity [22]. Three-dimensional support functions that have been proposed include limited disparity difference [50], limited disparity gradient [88], and Prazdny's coherence principle [89].

Aggregation with a fixed support region can be performed using 2D or 3D convolution,

  C(x, y, d) = w(x, y, d) * C_0(x, y, d),    (2)

or, in the case of rectangular windows, using efficient (moving average) box-filters. Shiftable windows can also be implemented efficiently using a separable sliding min-filter (Section 4.2). A different method of aggregation is iterative diffusion, i.e., an aggregation (or averaging) operation that is implemented by repeatedly adding to each pixel's cost the weighted values of its neighboring pixels' costs [114, 103, 97].

3.3 Disparity computation and optimization

Local methods. In local methods, the emphasis is on the matching cost computation and on the cost aggregation steps. Computing the final disparities is trivial: simply choose at each pixel the disparity associated with the minimum cost value. Thus, these methods perform a local "winner-take-all" (WTA) optimization at each pixel. A limitation of this approach (and many other correspondence algorithms) is that uniqueness of matches is only enforced for one image (the reference image), while points in the other image might get matched to multiple points.

Global optimization. In contrast, global methods perform almost all of their work during the disparity computation phase and often skip the
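(The local pipeline just described, per-pixel matching cost, box-filter aggregation as in Equation (2), then winner-take-all, can be sketched in a few lines. This is a toy illustration under our own assumptions, not the paper's test-bed code; the function name and the cumulative-sum box filter are our choices.)

```python
import numpy as np

def local_ssd_disparity(left, right, max_disp, radius=1):
    """Local stereo via the three steps described above (a sketch):
    1. matching cost: per-pixel squared difference forms the DSI C0(x, y, d);
    2. aggregation: a box filter sums costs over a square window;
    3. optimization: winner-take-all argmin over d at each pixel."""
    h, w = left.shape
    big = 1e9  # cost for pixels with no valid match at disparity d
    dsi = np.full((max_disp + 1, h, w), big)
    for d in range(max_disp + 1):
        # Step 1: squared difference at disparity d (x' = x - d here).
        dsi[d, :, d:] = (left[:, d:] - right[:, :w - d]) ** 2
    # Step 2: moving-average box filter over a (2r+1)^2 window, done with
    # integral images (the "add one value, subtract one value" idea).
    k = 2 * radius + 1
    pad = np.pad(dsi, ((0, 0), (radius, radius), (radius, radius)), mode="edge")
    ii = np.zeros((max_disp + 1, h + k, w + k))
    ii[:, 1:, 1:] = pad.cumsum(axis=1).cumsum(axis=2)
    agg = ii[:, k:, k:] - ii[:, :-k, k:] - ii[:, k:, :-k] + ii[:, :-k, :-k]
    # Step 3: winner-take-all over the disparity axis.
    return agg.argmin(axis=0)
```

On a right image shifted horizontally by a constant amount, the recovered map equals that shift away from the left border (where no valid match exists at the true disparity).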
aggregation step. Many global methods are formulated in an energy-minimization framework [119]. The objective is to find a disparity function d that minimizes a global energy,

  E(d) = E_data(d) + lambda * E_smooth(d).    (3)

The data term, E_data(d), measures how well the disparity function d agrees with the input image pair. Using the disparity space formulation,

  E_data(d) = sum_{(x,y)} C(x, y, d(x, y)),    (4)

where C is the (initial or aggregated) matching cost DSI.

The smoothness term E_smooth(d) encodes the smoothness assumptions made by the algorithm. To make the optimization computationally tractable, the smoothness term is often restricted to only measuring the differences between neighboring pixels' disparities,

  E_smooth(d) = sum_{(x,y)} rho(d(x, y) - d(x+1, y)) + rho(d(x, y) - d(x, y+1)),    (5)

where rho is some monotonically increasing function of disparity difference. (An alternative to smoothness functionals is to use a lower-dimensional representation such as splines [112].)

In regularization-based vision [87], rho is a quadratic function, which makes d smooth everywhere and may lead to poor results at object boundaries. Energy functions that do not have this problem are called discontinuity-preserving and are based on robust rho functions [119, 16, 97]. Geman and Geman's seminal paper [47] gave a Bayesian interpretation of these kinds of energy functions [110] and proposed a discontinuity-preserving energy function based on Markov Random Fields (MRFs) and additional line processes. Black and Rangarajan [16] show how line processes can often be subsumed by a robust regularization framework.

The terms in E_smooth can also be made to depend on the intensity differences, e.g.,

  rho_d(d(x, y) - d(x+1, y)) * rho_I(||I(x, y) - I(x+1, y)||),    (6)

where rho_I is some monotonically decreasing function of intensity differences that lowers smoothness costs at high intensity gradients. This idea [44, 42, 18, 23] encourages disparity discontinuities to coincide with intensity/color edges, and appears to account for some of the good performance of global optimization approaches.

Once the global energy has been defined, a variety of algorithms can be used to find a local
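(As a concrete aside, Equations (3)-(5) amount to a simple evaluation rule for any candidate disparity map. The sketch below uses a truncated-linear robust rho of our choosing; the paper leaves rho generic, and the function name is ours.)

```python
import numpy as np

def global_energy(dsi, disp, lam=1.0, trunc=2.0):
    """Evaluate E(d) = E_data(d) + lambda * E_smooth(d), eqs. (3)-(5).

    dsi:  cost volume C[d, y, x]
    disp: integer candidate disparity map d(x, y)
    rho(t) = min(|t|, trunc) is a discontinuity-preserving choice (ours).
    """
    h, w = disp.shape
    ys, xs = np.mgrid[0:h, 0:w]
    # Data term (eq. 4): sum of matching costs at the chosen disparities.
    e_data = dsi[disp, ys, xs].sum()
    # Smoothness term (eq. 5): robust penalty on horizontal and vertical
    # neighbor differences.
    rho = lambda t: np.minimum(np.abs(t), trunc)
    e_smooth = (rho(disp[:, 1:] - disp[:, :-1]).sum()
                + rho(disp[1:, :] - disp[:-1, :]).sum())
    return e_data + lam * e_smooth
```

A global optimizer (annealing, graph cuts, etc.) then searches over disparity maps for the one minimizing this value; the evaluator itself is the easy part.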
minimum. Traditional approaches associated with regularization and Markov Random Fields include continuation [17], simulated annealing [47, 75, 6], highest confidence first [28], and mean-field annealing [45]. More recently, max-flow and graph-cut methods have been proposed to solve a special class of global optimization problems [92, 55, 23, 123, 65]. Such methods are more efficient than simulated annealing and have produced good results.

Dynamic programming. A different class of global optimization algorithms are those based on dynamic programming. While the 2D optimization of Equation (3) can be shown to be NP-hard for common classes of smoothness functions [123], dynamic programming can find the global minimum for independent scanlines in polynomial time. Dynamic programming was first used for stereo vision in sparse, edge-based methods [3, 83]. More recent approaches have focused on the dense, intensity-based scanline optimization problem [10, 9, 46, 31, 18, 13]. These approaches work by computing the minimum-cost path through the matrix of all pairwise matching costs between two corresponding scanlines. Partial occlusion is handled explicitly by assigning a group of pixels in one image to a single pixel in the other image. Figure 2 shows one such example.

Problems with dynamic programming stereo include the selection of the right cost for occluded pixels and the difficulty of enforcing inter-scanline consistency, although several methods propose ways of addressing the latter [83, 9, 31, 18, 13]. Another problem is that the dynamic programming approach requires enforcing the monotonicity or ordering constraint [128]. This constraint requires that the relative ordering of pixels on a scanline remain the same between the two views, which may not be the case in scenes containing narrow foreground objects.

Cooperative algorithms. Finally, cooperative algorithms, inspired by computational models of human stereo vision, were among the earliest methods proposed for disparity computation [36, 73, 76, 114]. Such algorithms iteratively perform local
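(Another concrete aside: the scanline dynamic program described above, minimum-cost path through the matrix of pairwise matching costs, with M/L/R moves for match and left/right occlusion, can be sketched as follows. This is a bare-bones illustration of the idea with a fixed occlusion cost; real implementations add disparity limits and inter-scanline bookkeeping, and the function name is ours.)

```python
import numpy as np

def scanline_dp(costs, occ=1.0):
    """Minimum-cost path through a pairwise matching-cost matrix.

    costs[i, j] = cost of matching left pixel i to right pixel j;
    occ = fixed cost of declaring a pixel occluded (an L or R move).
    Returns the list of (i, j) matches on the optimal path.
    """
    n, m = costs.shape
    D = np.full((n + 1, m + 1), np.inf)
    D[0, :] = occ * np.arange(m + 1)   # leading right-only (R) pixels
    D[:, 0] = occ * np.arange(n + 1)   # leading left-only (L) pixels
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i, j] = min(D[i-1, j-1] + costs[i-1, j-1],  # match (M)
                          D[i-1, j] + occ,                # left occluded (L)
                          D[i, j-1] + occ)                # right occluded (R)
    # Backtrack to recover the matched pairs.
    matches, i, j = [], n, m
    while i > 0 and j > 0:
        if D[i, j] == D[i-1, j-1] + costs[i-1, j-1]:
            matches.append((i-1, j-1)); i -= 1; j -= 1
        elif D[i, j] == D[i-1, j] + occ:
            i -= 1
        else:
            j -= 1
    return matches[::-1]
```

Because moves only go down and right in the matrix, the recovered matches are monotonically ordered, which is exactly the ordering constraint discussed above.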
computations, but use nonlinear operations that result in an overall behavior similar to global optimization algorithms. In fact, for some of these algorithms, it is possible to explicitly state a global function that is being minimized [97]. Recently, a promising variant of Marr and Poggio's original cooperative algorithm has been developed [132].

3.4 Refinement of disparities. Most stereo correspondence algorithms compute a set of disparity estimates in some discretized space, e.g., for integer disparities (exceptions include continuous optimization techniques such as optic flow [11] or splines [112]).

(Figure 2: Stereo matching using dynamic programming. For each pair of corresponding scanlines, a minimizing path through the matrix of all pairwise matching costs is selected. Lowercase letters (a...k) symbolize the intensities along each scanline. Uppercase letters represent the selected path through the matrix. Matches are indicated by M, while partially occluded points (which have a fixed cost) are indicated by L and R, corresponding to points only visible in the left and right image, respectively. Usually, only a limited disparity range is considered, indicated by the non-shaded squares in the figure. Note that this diagram shows an "unskewed" x-d slice through the DSI.)

For applications such as robot navigation or people tracking, integer disparities may be perfectly adequate. However, for image-based rendering, such quantized maps lead to very unappealing view synthesis results (the scene appears to be made up of many thin shearing layers). To remedy this situation, many algorithms apply a sub-pixel refinement stage after the initial discrete correspondence stage. (An alternative is to simply start with more discrete disparity levels.)

Sub-pixel disparity estimates can be computed in a variety of ways, including iterative gradient descent and fitting a curve to the matching costs at discrete disparity levels [93, 71, 122, 77, 60]. This provides an easy way to increase the resolution of a stereo algorithm with little additional computation. However, to work well, the intensities being matched must vary smoothly, and the regions over which these estimates are computed must be on the same (correct) surface.

Recently, some questions have been raised about the advisability of fitting correlation curves to integer-sampled matching costs [105]. This situation may even be worse when sampling-insensitive dissimilarity measures are used [12]. We investigate this issue in Section 6.4 below.

Besides sub-pixel computations, there are of course other ways of post-processing the computed disparities. Occluded areas can be detected using cross-checking (comparing left-to-right and right-to-left disparity maps) [29, 42]. A median filter can be applied to clean up spurious mismatches, and holes due to occlusion can be filled by surface fitting or by distributing neighboring disparity estimates [13, 96]. In our implementation we are not performing such clean-up steps, since we want to measure the performance of the raw algorithm components.

3.5 Other methods. Not all dense two-frame stereo correspondence algorithms can be described in terms of our basic taxonomy and representations. Here we briefly mention some additional algorithms and representations that are not covered by our framework.

The algorithms described in this paper first enumerate all possible matches at all possible disparities, then select the best set of matches in some way. This is a useful approach when a large amount of ambiguity may exist in the computed disparities. An alternative approach is to use methods inspired by classic (infinitesimal) optic flow computation. Here, images are successively warped and motion estimates incrementally updated until a satisfactory registration is achieved. These techniques are most often implemented within a coarse-to-fine (hierarchical refinement) framework [90, 118, 112].

A univalued representation of the disparity map is also not essential. Multi-valued representations, which can represent several depth values along each line of sight, have
been extensively studied recently, especially for large multi-view data sets. Many of these techniques use a voxel-based representation to encode the reconstructed colors and spatial occupancies or opacities [113, 101, 67, 34, 33, 24]. Another way to represent a scene with more complexity is to use multiple layers, each of which can be represented by a plane plus residual parallax [5, 14, 117]. Finally, deformable surfaces of various kinds have also been used to perform 3D shape reconstruction from multiple images [120, 121, 43, 38].

3.6 Summary of methods. Table 1 gives a summary of some representative stereo matching algorithms and their corresponding taxonomy, i.e., the matching cost, aggregation, and optimization techniques used by each. The methods are grouped to contrast different matching costs (top), aggregation methods (middle), and optimization techniques (third section), while the last section lists some papers outside the framework. As can be seen from this table, quite a large subset of the possible algorithm design space has been explored over the years, albeit not very systematically.

Table 1: Summary taxonomy of several dense two-frame stereo correspondence methods, grouped to contrast different matching costs (top), aggregation (middle), and optimization techniques (third section); the last section lists some papers outside our framework. Key to abbreviations: hier. = hierarchical (coarse-to-fine); WTA = winner-take-all; DP = dynamic programming; SA = simulated annealing; GC = graph cut.

Method | Matching cost | Aggregation | Optimization
SSD (traditional) | squared difference | square window | WTA
Hannah [51] | cross-correlation | square window | WTA
Nishihara [82] | binarized filters | square window | WTA
Kass [63] | filter banks | none | WTA
Fleet et al. [40] | phase | none | phase-matching
Jones and Malik [57] | filter banks | none | WTA
Kanade [58] | absolute difference | square window | WTA
Scharstein [95] | gradient-based | Gaussian | WTA
Zabih and Woodfill [129] | rank transform | square window | WTA
Cox et al. [32] | histogram eq. | none | DP
Frohlinghaus and Buhmann [41] | wavelet phase | none | phase-matching
Birchfield and Tomasi [12] | shifted abs. diff. | none | DP
Marr and Poggio [73] | binary images | iterative aggregation | WTA
Prazdny [89] | binary images | 3D aggregation | WTA
Szeliski and Hinton [114] | binary images | iterative 3D aggregation | WTA
Okutomi and Kanade [84] | squared difference | adaptive window | WTA
Yang et al. [127] | cross-correlation | non-linear filtering | hier. WTA
Shah [103] | squared difference | non-linear diffusion | regularization
Boykov et al. [22] | thresh. abs. diff. | connected-component | WTA
Scharstein and Szeliski [97] | robust sq. diff. | iterative 3D aggregation | mean-field
Zitnick and Kanade [132] | squared difference | iterative aggregation | WTA
Veksler [124] | abs. diff. - avg. | adaptive window | WTA
Quam [90] | cross-correlation | none | hier. warp
Barnard [6] | squared difference | none | SA
Geiger et al. [46] | squared difference | shiftable window | DP
Belhumeur [9] | squared difference | none | DP
Cox et al. [31] | squared difference | none | DP
Ishikawa and Geiger [55] | squared difference | none | graph cut
Roy and Cox [92] | squared difference | none | graph cut
Bobick and Intille [18] | absolute difference | shiftable window | DP
Boykov et al. [23] | squared difference | none | graph cut
Kolmogorov and Zabih [65] | squared difference | none | graph cut
Birchfield and Tomasi [14] | shifted abs. diff. | none | GC + planes
Tao et al. [117] | squared difference | (color segmentation) | WTA + regions

4. Implementation. We have developed a stand-alone, portable C++ implementation of several stereo algorithms. The implementation is closely tied to the taxonomy presented in Section 3 and currently includes window-based algorithms, diffusion algorithms, as well as global optimization methods using dynamic programming, simulated annealing, and graph cuts. While many published methods include special features and post-processing steps to improve the results, we have chosen to implement the basic versions of such algorithms, in order to assess their respective merits most directly.

The implementation is modular and can easily be extended to include other algorithms or their components. We plan to add several other algorithms in the near future, and we
hope that other authors will contribute their methods to our framework as well. Once a new algorithm has been integrated, it can easily be compared with other algorithms using our evaluation module, which can measure disparity error and reprojection error (Section 5.1). The implementation contains a sophisticated mechanism for specifying parameter values that supports recursive script files for exhaustive performance comparisons on multiple data sets.

We provide a high-level description of our code using the same division into four parts as in our taxonomy. Within our code, these four sections are optionally executed in sequence, and the performance/quality evaluator is then invoked. A list of the most important algorithm parameters is given in Table 2.

4.1 Matching cost computation. The simplest possible matching cost is the squared or absolute difference in color / intensity between corresponding pixels (match_fn). To approximate the effect of a robust matching score [16, 97], we truncate the matching score to a maximal value (match_max). When color images are being compared, we sum the squared or absolute intensity difference in each channel before applying the clipping. If fractional disparity evaluation is being performed (disp_step < 1), each scanline is first interpolated up using either a linear or cubic interpolation filter (match_interp) [77]. We also optionally apply Birchfield and Tomasi's sampling-insensitive interval-based matching criterion (match_interval) [12], i.e., we take the minimum of the pixel matching score and the score at half-step displacements, or 0 if there is a sign change in either interval. We apply this criterion separately to each color channel, which is not physically plausible (the sub-pixel shift must be consistent across channels), but is easier to implement.

4.2 Aggregation. The aggregation section of our test bed implements some commonly used aggregation methods (aggr_fn):

- Box filter: use a separable moving average filter (add one right/bottom value, subtract one left/top). This implementation trick makes such window-based aggregation insensitive to window size in terms of computation time, and accounts for the fast performance seen in real-time matchers [59, 64].
- Binomial filter: use a separable FIR (finite impulse response) filter. We use the coefficients 1/16 {1, 4, 6, 4, 1}, the same ones used in Burt and Adelson's [26] Laplacian pyramid.

(Figure 3: Shiftable window. The effect of trying all 3x3 shifted windows around the black pixel is the same as taking the minimum matching score across all centered (non-shifted) windows in the same neighborhood. Only three of the neighboring shifted windows are shown here, for clarity.)

Other convolution kernels could also be added later, as could recursive (bi-directional) IIR filtering, which is a very efficient way to obtain large window sizes [35]. The width of the box or convolution kernel is controlled by aggr_window_size.

To simulate the effect of shiftable windows [2, 18, 117], we can follow this aggregation step with a separable square min-filter. The width of this filter is controlled by the parameter aggr_minfilter. The cascaded effect of a box filter and an equal-sized min-filter is the same as evaluating a complete set of shifted windows, since the value of a shifted window is the same as that of a centered window at some neighboring pixel (Figure 3). This step adds very little additional computation, since a moving 1D min-filter can be computed efficiently by only recomputing the min when a minimum value leaves the window. The value of aggr_minfilter can be less than that of aggr_window_size (which simulates the effect of a partially shifted window). The converse doesn't make much sense, since the window then no longer includes the reference pixel.

We have also implemented all of the diffusion methods developed in [97] except for local stopping, i.e., regular diffusion, the membrane model, and Bayesian (mean-field) diffusion. While this last algorithm can also be considered an optimization method, we include it in the aggregation module since it resembles other iterative aggregation algorithms closely. The maximum number
makes such window-based aggregation insensitive to window size in terms of computation time, and accounts for the fast performance seen in real-time matchers [59, 64].

Figure 3: Shiftable window. The effect of trying all 3 × 3 shifted windows around the black pixel is the same as taking the minimum matching score across all centered (non-shifted) windows in the same neighborhood. Only three of the neighboring shifted windows are shown here, for clarity.

- Binomial filter: use a separable FIR (finite impulse response) filter. We use the coefficients 1/16 {1, 4, 6, 4, 1}, the same ones used in Burt and Adelson's [26] Laplacian pyramid.

Other convolution kernels could also be added later, as could recursive (bidirectional) IIR filtering, which is a very efficient way to obtain large window sizes [35]. The width of the box or convolution kernel is controlled by aggr_window_size.

To simulate the effect of shiftable windows [2, 18, 117], we can follow this aggregation step with a separable square min-filter. The width of this filter is controlled by the parameter aggr_minfilter. The cascaded effect of a box filter and an equal-sized min-filter is the same as evaluating a complete set of shifted windows, since the value of a shifted window is the same as that of a centered window at some neighboring pixel (Figure 3). This step adds very little additional computation, since a moving 1-D min-filter can be computed efficiently by only recomputing the min when a minimum value leaves the window. The value of aggr_minfilter can be less than that of aggr_window_size, which simulates the effect of a partially shifted window. (The converse doesn't make much sense, since the window then no longer includes the reference pixel.)

We have also implemented all of the diffusion methods developed in [97] except for local stopping: i.e., regular diffusion, the membrane model, and Bayesian (mean-field) diffusion. While this last algorithm can also be considered an optimization method, we include it in the aggregation module, since it closely resembles other iterative aggregation algorithms. The maximum number
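The box-filter trick and the min-filter simulation of shiftable windows can be sketched in a few lines. This is our own illustrative Python, not the authors' testbed code; the function names are ours.

```python
def box_sum_1d(vals, radius):
    """Sliding-window sums of width 2*radius+1, clipped at the borders.
    Each step adds the value entering on the right and subtracts the one
    leaving on the left, so the cost per pixel is independent of window size."""
    n = len(vals)
    s = sum(vals[:radius + 1])          # initial (clipped) window around x = 0
    out = [s]
    for x in range(1, n):
        if x + radius < n:
            s += vals[x + radius]       # value entering on the right
        if x - radius - 1 >= 0:
            s -= vals[x - radius - 1]   # value leaving on the left
        out.append(s)
    return out

def min_filter_1d(vals, radius):
    """Brute-force 1-D min-filter; cascading it after a box filter simulates
    shiftable windows.  (The text describes a faster variant that only
    recomputes the min when the minimum value leaves the window.)"""
    n = len(vals)
    return [min(vals[max(0, x - radius):min(n, x + radius + 1)])
            for x in range(n)]
```

Applying both filters separably (rows, then columns, over each disparity layer of the cost volume) gives the 2-D aggregation; a 9 × 9 shiftable window corresponds to aggr_window_size = 9 followed by aggr_minfilter = 9.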
of aggregation iterations is controlled by aggr_iter. Other parameters controlling the diffusion algorithms are listed in Table 2.

4.3 Optimization

Once we have computed the (optionally aggregated) costs, we need to determine which discrete set of disparities best represents the scene surface. The algorithm used to determine this is controlled by opt_fn, and can be one of:

- winner-take-all (WTA);
- dynamic programming (DP);
- scanline optimization (SO);
- simulated annealing (SA);
- graph cut (GC).

  disp_min                 0                  smallest disparity
  disp_max                 15                 largest disparity
  disp_step                0.5                disparity step size
  match_fn                 SD, AD             matching function
  match_interp             Linear, Cubic      interpolation function
  match_max                20                 maximum difference for truncated SAD/SSD
  match_interval           false              1/2 disparity match [12]
  aggr_fn                  Box, Binomial      aggregation function
  aggr_window_size         9                  size of window
  aggr_minfilter           9                  spatial min-filter (shiftable window)
  aggr_iter                1                  number of aggregation iterations
  diff_lambda              0.15               parameter λ for regular and membrane diffusion
  diff_beta                0.5                parameter β for membrane diffusion
  diff_scale_cost          0.01               scale of cost values (needed for Bayesian diffusion)
  diff_mu                  0.5                parameter μ for Bayesian diffusion
  diff_sigmaP              0.4                parameter σ_P for robust prior of Bayesian diffusion
  diff_epsP                0.01               parameter ε_P for robust prior of Bayesian diffusion
  opt_fn                   WTA, DP, SA, GC    optimization function
  opt_smoothness           1.0                weight of smoothness term (λ)
  opt_grad_thresh          8.0                threshold for magnitude of intensity gradient
  opt_grad_penalty         2.0                smoothness penalty factor if gradient is too small
  opt_occlusion_cost       20                 cost for occluded pixels in DP algorithm
  opt_sa_var               Gibbs, Metropolis  simulated annealing update rule
  opt_sa_start_T           10.0               starting temperature
  opt_sa_end_T             0.01               ending temperature
  opt_sa_schedule          Linear             annealing schedule
  refine_subpix            true               fit subpixel value to local correlation
  eval_bad_thresh          1.0                acceptable disparity error
  eval_textureless_width   3                  box filter width applied to ‖∇_x I‖²
  eval_textureless_thresh  4.0                threshold applied to filtered ‖∇_x I‖²
  eval_disp_gap            2.0                disparity jump threshold
  eval_discont_width       9                  width of discontinuity region
  eval_ignore_border       10                 number of border pixels to ignore
  eval_partial_shuffle     0.2                analysis interval for prediction error

Table 2: The most important stereo algorithm parameters of our implementation.

The winner-take-all method simply picks the lowest (aggregated) matching cost as the selected disparity at each pixel. The other methods require, in addition to the matching cost, the definition of a smoothness cost. Prior to invoking one of the optimization algorithms, we set up tables containing the values of ρ_d in Equation (6), and precompute the spatially varying weights ρ_I(ΔI). These tables are controlled by the parameter opt_smoothness, which controls the overall scale of the smoothness term (i.e., λ in Equation (3)), and the parameters opt_grad_thresh and opt_grad_penalty, which control the gradient-dependent smoothness costs. We currently use the smoothness terms defined by Veksler [123]:

    ρ_I(ΔI) = p   if ΔI < opt_grad_thresh,
              1   if ΔI ≥ opt_grad_thresh,        (7)

where p = opt_grad_penalty. Thus, the smoothness cost is multiplied by p for low intensity gradients, to encourage disparity jumps to coincide with intensity edges. All of the optimization algorithms minimize the same objective function, enabling a more meaningful comparison of their performance.

Our first global optimization technique, DP, is a dynamic programming method similar to the one proposed by Bobick and Intille [18]. The algorithm works by computing the minimum-cost path through each x-d slice in the DSI (see Figure 2). Every point in this slice can be in one of three states: M (match), L (left-visible only), or R (right-visible only). Assuming the ordering constraint is being enforced, a valid path can take at most three directions at a point, each associated with a deterministic state change. Using dynamic programming, the minimum cost of all paths to a point can be accumulated efficiently. Points in state M are simply charged the matching cost at this point in the DSI. Points in states L and R are charged a fixed
occlusion cost (opt_occlusion_cost). Before evaluating the final disparity map, we fill all occluded pixels with the nearest background disparity value on the same scanline. The DP stereo algorithm is fairly sensitive to this parameter (see Section 6). Bobick and Intille address this problem by precomputing ground control points (GCPs), which are then used to constrain the paths through the DSI slice. GCPs are high-confidence matches that are computed using SAD and shiftable windows. At this point we are not using GCPs in our implementation, since we are interested in comparing the basic version of different algorithms. However, GCPs are potentially useful in other algorithms as well, and we plan to add them to our implementation in the future.

Our second global optimization technique, scanline optimization (SO), is a simple (and, to our knowledge, novel) approach designed to assess different smoothness terms. Like the previous method, it operates on individual x-d DSI slices and optimizes one scanline at a time. However, the method is asymmetric and does not utilize visibility or ordering constraints. Instead, a d value is assigned at each point x such that the overall cost along the scanline is minimized. (Note that without a smoothness term, this would be equivalent to a winner-take-all optimization.) The global minimum can again be computed using dynamic programming; however, unlike in traditional (symmetric) DP algorithms, the ordering constraint does not need to be enforced, and no occlusion cost parameter is necessary. Thus, the SO algorithm solves the same optimization problem as the graph-cut algorithm described below, except that vertical smoothness terms are ignored.

Both the DP and SO algorithms suffer from the well-known difficulty of enforcing inter-scanline consistency, resulting in horizontal streaks in the computed disparity map. Bobick and Intille's approach to this problem is to detect edges in the DSI slice and to lower the occlusion cost for paths along those edges. This has the effect of aligning depth
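The SO idea, per-scanline dynamic programming over matching cost plus a horizontal smoothness term, with no ordering or occlusion constraints, can be sketched as follows. This is our own illustrative Python, written O(n·D²) for clarity rather than speed, and is not the authors' implementation.

```python
def scanline_optimize(cost, smooth):
    """cost[x][d]: (aggregated) matching cost at pixel x, disparity d.
    smooth(d1, d2): smoothness penalty between neighboring pixels.
    Returns the disparity per pixel minimizing the total scanline cost."""
    n, ndisp = len(cost), len(cost[0])
    total = [list(cost[0])]   # total[x][d]: best cost of pixels 0..x ending at d
    back = []                 # back[x-1][d]: argmin predecessor disparity
    for x in range(1, n):
        prev = total[x - 1]
        row, brow = [], []
        for d in range(ndisp):
            best_dp = min(range(ndisp),
                          key=lambda dp: prev[dp] + smooth(dp, d))
            row.append(prev[best_dp] + smooth(best_dp, d) + cost[x][d])
            brow.append(best_dp)
        total.append(row)
        back.append(brow)
    # Backtrack from the cheapest final state.
    d = min(range(ndisp), key=lambda dd: total[-1][dd])
    disp = [d]
    for x in range(n - 2, -1, -1):
        d = back[x][d]
        disp.append(d)
    disp.reverse()
    return disp
```

With a zero smoothness term this reduces to winner-take-all along the scanline, matching the note in the text; a nonzero term trades data fidelity against smoothness exactly as in Equation (3), restricted to horizontal neighbors.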
discontinuities with intensity edges. In our implementation, we achieve the same goal by using an intensity-dependent smoothness cost (Equation 6), which in our DP algorithm is charged at all L-M and R-M state transitions.

Our implementation of simulated annealing (SA) supports both the Metropolis variant (where downhill steps are always taken, and uphill steps are sometimes taken), and the Gibbs Sampler, which chooses among several possible states according to the full marginal distribution [47]. In the latter case, we can either select one new state (disparity) to flip to at random, or evaluate all possible disparities at a given pixel. Our current annealing schedule is linear, although we plan to add a logarithmic annealing schedule in the future.

Our final global optimization method, GC, implements the α-β swap move algorithm described in [23, 123]. (We plan to implement the α-expansion in the future.) We randomize the α-β pairings at each (inner) iteration, and stop the algorithm when no further (local) energy improvements are possible.

4.4 Refinement

The sub-pixel refinement of disparities is controlled by the boolean variable refine_subpix. When this is enabled, the three aggregated matching cost values around the winning disparity are examined to compute the sub-pixel disparity estimate. (Note that if the initial DSI was formed with fractional disparity steps, these are really sub-sub-pixel values. A more appropriate name might be floating point disparity values.) A parabola is fit to these three values (the three ending values are used if the winning disparity is either disp_min or disp_max). If the curvature is positive and the minimum of the parabola is within a half-step of the winning disparity (and within the search limits), this value is used as the final disparity estimate.

In future work, we would like to investigate whether initial (or aggregated) matching scores should be used, or whether some other approach (such as Lucas-Kanade [122]) might yield higher-quality estimates.

5 Evaluation methodology

In this section, we describe the quality metrics we use
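The parabola-based sub-pixel step just described can be sketched as follows (our own illustrative Python; the argument and parameter names are ours, not the testbed's):

```python
def refine_subpix(c_minus, c_0, c_plus, d_win, disp_step=1.0):
    """Fit a parabola to the three costs at d_win - step, d_win, d_win + step
    and return the refined disparity, or the integer winner if the fit is
    invalid (non-positive curvature or minimum more than half a step away)."""
    denom = c_minus - 2.0 * c_0 + c_plus       # proportional to the curvature
    if denom <= 0:
        return d_win                           # no valid minimum to refine
    offset = 0.5 * (c_minus - c_plus) / denom  # parabola minimum, in steps
    if abs(offset) > 0.5:
        return d_win                           # keep the integer winner
    return d_win + offset * disp_step
```

A symmetric cost triple leaves the winner unchanged, while an asymmetric one shifts the estimate toward the cheaper side, which is exactly the behavior the half-step acceptance test guards.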
for evaluating the performance of stereo correspondence algorithms, and the techniques we used for acquiring our image data sets and ground truth estimates.

5.1 Quality metrics

To evaluate the performance of a stereo algorithm, or the effects of varying some of its parameters, we need a quantitative way to estimate the quality of the computed correspondences. Two general approaches to this are to compute error statistics with respect to some ground truth data [8], and to evaluate the synthetic images obtained by warping the reference or unseen images by the computed disparity map [111].

In the current version of our software, we compute the following two quality measures based on known ground truth data:

1. RMS (root-mean-squared) error (measured in disparity units) between the computed disparity map d_C(x, y) and the ground truth map d_T(x, y), i.e.,

       R = ( (1/N) Σ_{(x,y)} |d_C(x, y) − d_T(x, y)|² )^{1/2},        (8)

   where N is the total number of pixels.

2. Percentage of bad matching pixels,

       B = (1/N) Σ_{(x,y)} ( |d_C(x, y) − d_T(x, y)| > δ_d ),        (9)

   where δ_d (eval_bad_thresh) is a disparity error tolerance. For the experiments in this paper we use δ_d = 1.0, since this coincides with some previously published studies [116].

In addition to computing these statistics over the whole image, we also focus on three different kinds of regions. These regions are computed by preprocessing the reference image and ground truth disparity map to yield the following three binary segmentations (Figure 4):

- textureless regions T̄: regions where the squared horizontal intensity gradient, averaged over a square window of a given size (eval_textureless_width), is below a given threshold (eval_textureless_thresh);

- occluded regions O: regions that are occluded in the matching image, i.e., where the forward-mapped disparity lands at a location with a larger (nearer) disparity; and

- depth discontinuity regions D: pixels whose neighboring disparities differ by more than eval_disp_gap, dilated by a window of width eval_discont_width.

These regions were selected to support the analysis of matching results in typical problem areas. For the experiments in this paper we use the values listed in Table 2.

The statistics described above are computed for each of the three regions and their complements, e.g.,

       B_T = (1/N_T) Σ_{(x,y)∈T} ( |d_C(x, y) − d_T(x, y)| > δ_d ),

and so on for R_T, B_T̄, R_Ō, etc. Table 3 gives a complete list of the statistics we collect:

  rms_error_all           R      RMS disparity error
  rms_error_nonocc        R_Ō    "  (no occlusions)
  rms_error_occ           R_O    "  (at occlusions)
  rms_error_textured      R_T    "  (textured)
  rms_error_textureless   R_T̄    "  (textureless)
  rms_error_discont       R_D    "  (near discontinuities)
  bad_pixels_all          B      bad pixel percentage
  bad_pixels_nonocc       B_Ō    "  (no occlusions)
  bad_pixels_occ          B_O    "  (at occlusions)
  bad_pixels_textured     B_T    "  (textured)
  bad_pixels_textureless  B_T̄    "  (textureless)
  bad_pixels_discont      B_D    "  (near discontinuities)
  predict_err_near        P_−    view extrapolation error (near)
  predict_err_middle      P_1/2  view extrapolation error (mid)
  predict_err_match       P_1    view extrapolation error (match)
  predict_err_far         P_+    view extrapolation error (far)

Table 3: Error (quality) statistics computed by our evaluator. See the notes in the text regarding the treatment of occluded regions.

Note that for the textureless, textured, and depth discontinuity statistics, we exclude pixels that are in occluded regions, on the assumption that algorithms generally do not produce meaningful results in such occluded regions. Also, we exclude a border of eval_ignore_border pixels when computing all statistics, since many algorithms do not compute meaningful disparities near the image boundaries.

Figure 4: Segmented region maps for the reference image: textureless, occluded, and depth discontinuity regions.

The second major approach to gauging the quality of reconstruction algorithms is to use the color images and disparity maps to predict the appearance of other views [111]. Here again, two major flavors are possible:

1. Forward warp the reference image by the computed disparity map to a different (potentially unseen) view (Figure 5),
and compare it against this new image to obtain a forward prediction error.

2. Inverse warp a new view by the computed disparity map to generate a stabilized image (Figure 6), and compare it against the reference image to obtain an inverse prediction error.

There are pros and cons to either approach.

The forward warping algorithm has to deal with tearing problems: if a single-pixel splat is used, gaps can arise even between adjacent pixels with similar disparities. One possible solution would be to use a two-pass renderer [102]. Instead, we render each pair of neighboring pixels as an interpolated color line in the destination image (i.e., we use Gouraud shading). If neighboring pixels differ by more than a disparity of eval_disp_gap, the segment is replaced by single pixel splats at both ends, which results in a visible tear (light magenta regions in Figure 5).

For inverse warping, the problem of gaps does not occur. Instead, we get "ghosted" regions when pixels in the reference image are not actually visible in the source. We eliminate such pixels by checking for visibility (occlusions) first, and then drawing these pixels in a special color (light magenta in Figure 6). We have found that looking at the inverse warped sequence, based on the ground-truth disparities, is a very good way to determine if the original sequence is properly calibrated and rectified.

In computing the prediction error, we need to decide how to treat gaps. Currently, we ignore pixels flagged as gaps in computing the statistics, and report the percentage of such missing pixels. We can also optionally compensate for small misregistrations [111]. To do this, we convert each pixel in the original and predicted image to an interval, by blending the pixel's value with some fraction eval_partial_shuffle of its neighboring pixels' min and max values. This idea is a generalization of the sampling-insensitive dissimilarity measure [12] and the shuffle transformation of [66]. The reported difference is then the signed distance between the two computed intervals. We
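The two ground-truth quality measures of Section 5.1 (Equations 8 and 9) reduce to a few lines; this is our own illustrative Python over flattened disparity maps, not the evaluator's actual code.

```python
import math

def rms_error(d_c, d_t):
    """Equation (8): RMS disparity error between computed and ground truth."""
    n = len(d_c)
    return math.sqrt(sum((c - t) ** 2 for c, t in zip(d_c, d_t)) / n)

def bad_pixels(d_c, d_t, eval_bad_thresh=1.0):
    """Equation (9): percentage of pixels whose disparity error exceeds
    the tolerance delta_d = eval_bad_thresh."""
    n = len(d_c)
    bad = sum(1 for c, t in zip(d_c, d_t) if abs(c - t) > eval_bad_thresh)
    return 100.0 * bad / n
```

Restricting the sums to a binary region mask (textureless, occluded, or near discontinuities) yields the per-region variants of Table 3.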
plan to investigate these and other sampling-insensitive matching costs in the future [115].

5.2 Test data

To quantitatively evaluate our correspondence algorithms, we require data sets that either have a ground truth disparity map, or a set of additional views that can be used for prediction error tests (or, preferably, both).

We have begun to collect such a database of images, building upon the methodology introduced in [116]. Each image sequence consists of 9 images, taken at regular intervals with a camera mounted on a horizontal translation stage, with the camera pointing perpendicularly to the direction of motion. We use a digital high-resolution camera (Canon G1) set in manual exposure and focus mode, and rectify the images using tracked feature points. We then downsample the original 2048 × 1536 images to 512 × 384 using a high-quality 8-tap filter, and finally crop the images to normalize the motion of background objects to a few pixels per frame.

All of the sequences we have captured are made up of piecewise planar objects (typically posters or paintings, some with cut-out edges). Before downsampling the images, we hand-label each image into its piecewise planar components (Figure 7). We then use a direct alignment technique on each planar region [5] to estimate the affine motion of each patch. The horizontal component of these motions is then used to compute the ground truth disparity. In future work, we plan to extend our acquisition methodology to handle scenes with quadric surfaces (e.g., cylinders, cones, and spheres).

Of the six image sequences we acquired, all of which are available on our web page, we have selected two (Sawtooth and Venus) for the experimental study in this paper. We also use the University of Tsukuba "head and lamp" data set [81], a 5 × 5 array of images together with hand-labeled integer ground-truth disparities for the center image. Finally, we use the monochromatic Map data set first introduced by Szeliski and Zabih [116], which was taken with a Point Grey Research trinocular stereo camera, and whose ground truth disparity map
was computed using the piecewise planar technique described above. Figure 7 shows the reference image and the ground-truth disparities for each of these four sequences. We exclude a border of 18 pixels in the Tsukuba images, since no ground-truth disparity values are provided there. For all other images, we use eval_ignore_border = 10 for the experiments reported in this paper.

In the future, we hope to add further data sets to our collection of "standard" test images, in particular other sequences from the University of Tsukuba, and the GRASP Laboratory's "Buffalo Bill" data set with registered laser range finder ground truth [80]. There may also be suitable images among the CMU Computer Vision Home Page data sets. Unfortunately, we cannot use data sets for which only a sparse set of feature matches has been computed [19, 54].

It should be noted that high-quality ground-truth data is critical for a meaningful performance evaluation. Accurate sub-pixel disparities are hard to come by, however. The ground-truth data for the Tsukuba images, for example, is strongly quantized, since it only provides integer disparity estimates for a very small disparity range (d = 5 ... 14). This is clearly visible when the images are stabilized using the ground-truth data and viewed in a video loop.

Figure 7: Stereo images and ground truth used in this paper: for our Sawtooth and Venus sequences of piecewise planar objects, the reference image, the planar region labeling, and the ground-truth disparities; also the University of Tsukuba head and lamp data set and the monochromatic Map image pair with their ground-truth disparities.

In contrast, the ground-truth disparities for our piecewise planar scenes have high sub-pixel precision, but at the cost of limited scene complexity. To provide an adequate challenge for the best-performing stereo methods, new stereo test images with complex scenes and sub-pixel ground truth will soon be needed.

Synthetic images have been used extensively for qualitative evaluations of stereo methods, but they are often restricted to simple
geometries and textures (e.g., random-dot stereograms). Furthermore, issues arising with real cameras are seldom modeled, e.g., aliasing, slight misalignment, noise, lens aberrations, and fluctuations in gain and bias. Consequently, results on synthetic images usually do not extrapolate to images taken with real cameras. We have experimented with the University of Bonn's synthetic "Corridor" data set [41], but have found that the clean, noise-free images are unrealistically easy to solve, while the noise-contaminated versions are too difficult, due to the complete lack of texture in much of the scene. There is a clear need for synthetic, photo-realistic test imagery that properly models real-world imperfections, while providing accurate ground truth.

6 Experiments and results

In this section, we describe the experiments used to evaluate the individual building blocks of stereo algorithms. Using our implementation framework, we examine the four main algorithm components identified in Section 3 (matching cost, aggregation, optimization, and sub-pixel fitting). In Section 7, we perform an overall comparison of a large set of stereo algorithms, including other authors' implementations. We use the Tsukuba, Sawtooth, Venus, and Map data sets in all experiments, and report results on subsets of these images. The complete set of results (all experiments run on all data sets) is available on our web site at www.middlebury.edu/stereo.

Using the evaluation measures presented in Section 5.1, we focus on common problem areas for stereo algorithms. Of the 12 ground-truth statistics we collect (Table 3), we have chosen three as the most important subset. First, as a measure of overall performance, we use B_Ō, the percentage of bad pixels in non-occluded areas. We exclude the occluded regions for now, since few of the algorithms in this study explicitly model occlusions, and most perform quite poorly in these regions. As algorithms get better at matching occluded regions [65], however, we will likely focus more on the total matching error B. The other two
important measures are B_T̄ and B_D, the percentages of bad pixels in textureless areas and in areas near depth discontinuities. These measures provide important information about the performance of algorithms in two critical problem areas. The parameter names for these three measures are bad_pixels_nonocc, bad_pixels_textureless, and bad_pixels_discont, and they appear in most of the plots below. We prefer the percentage of bad pixels over RMS disparity errors, since this gives a better indication of the overall performance of an algorithm. For example, an algorithm is performing reasonably well if B_Ō < 10%. The RMS error figure, on the other hand, is contaminated by the (potentially large) disparity errors in those poorly matched 10% of the image. RMS errors become important once the percentage of bad pixels drops to a few percent and the quality of a sub-pixel fit needs to be evaluated (see Section 6.4).

Note that the algorithms always take exactly two images as input, even when more are available. For example, with our 9-frame sequences, we use the third and seventh frame as the input pair. (The other frames are used to measure the prediction error.)

6.1 Matching cost

We start by comparing different matching costs, including absolute differences (AD), squared differences (SD), truncated versions of both, and Birchfield and Tomasi's [12] sampling-insensitive dissimilarity measure (BT).

An interesting issue when trying to assess a single algorithm component is how to fix the parameters that control the other components. We usually choose good values based on experiments that assess the other algorithm components. (The inherent boot-strapping problem disappears after a few rounds of experiments.) Since the best settings for many parameters vary depending on the input image pair, we often have to compromise and select a value that works reasonably well for several images.

Experiment 1: In this experiment, we compare the matching costs AD, SD, AD+BT, and SD+BT, using a local algorithm. We aggregate with a 9 × 9 window, followed by
winner-take-all optimization (i.e., we use the standard SAD and SSD algorithms). We do not compute sub-pixel estimates. Truncation values used are 1, 2, 5, 10, 20, 50, and ∞ (no truncation); these values are squared when truncating SD.

Results: Figure 8 shows plots of the three evaluation measures B_Ō, B_T̄, and B_D for each of the four matching costs, as a function of truncation values, for the Tsukuba, Sawtooth, and Venus images. Overall, there is little difference between AD and SD. Truncation matters mostly for points near discontinuities. The reason is that for windows containing mixed populations (both foreground and background points), truncating the matching cost limits the influence of wrong matches. Good truncation values range from 5 to 50, typically around 20. Once the truncation values drop below the noise level (e.g., 2 and 1), the errors become very large. Using Birchfield-Tomasi (BT) helps for these small truncation values, but yields little improvement for good truncation values.

Figure 8: Experiment 1. Performance of different matching costs, aggregated with a 9 × 9 window, as a function of truncation values match_max for three different image pairs (the plots show bad_pixels_nonocc, bad_pixels_textureless, and bad_pixels_discont for SAD, SSD, SAD+BT, and SSD+BT). Intermediate truncation values (~20) yield the best results. Birchfield-Tomasi helps when truncation values are low.

The results are consistent across all data sets; however, the best truncation value varies. We have also tried a window size of 21, with similar results.

Conclusion: Truncation can help for AD and SD, but the best truncation value depends on the images' signal-to-noise ratio (SNR), since truncation should happen right above the noise level present; see also the
consideration and then to report which matching cost gives the overall best results The optimal A however would not only differ across matching costs but also across different images Since in a practical matcher we need to choose a constant A we have done the same in this experiment We use A 20 guided by the results dis cussed in Section 63 below and restrict the matching costs to absolute differences AD truncatedby varying amounts For the DP algorithm we use a xed occlusion cost of 20 Results Figure 10 shows plots of the bad pixel percent ages Ba B and Bp as a function of truncation values for Tsukuba Sawtooth and Venus images Each plot has six curves corresponding to DP DPBT SO SOBT GC GCBT It can be seen that the truncation value affects the performance As with the local algorithms if the truncation value is too small in the noise range the errors get very large Intermediate truncation values of 5075 depending on algorithm and image pair however can sometimes improve the performance The effect of Birch eldTomasi is mixed39 as with the local algorithms in Experiments 1 and 2 it limits the errors if the truncation values are too small It can be seen that ET is most bene cial for the SO algorithm how ever this is due to the fact that SO really requires a higher value of A to work well see Experiment 5 in which case the positive effect of ET is less pronounced Conclusion Using robust truncated matching costs can slightly improve the performance of global algorithms The best truncation value however varies with each image pair Setting this parameter automatically based on an estimate of the image SNR may be possible and is a topic for fur ther research Birch eld and Tomasi s matching measure can improve results slightly lntuitively truncation should not be necessary for global algorithms that operate on un aggregated matching costs since the problem of outliers in a window does not exist An important problem for global algorithms however is to nd the correct balance 
consideration, and then to report which matching cost gives the overall best results. The optimal λ, however, would not only differ across matching costs, but also across different images. Since in a practical matcher we need to choose a constant λ, we have done the same in this experiment. We use λ = 20 (guided by the results discussed in Section 6.3 below), and restrict the matching costs to absolute differences (AD), truncated by varying amounts. For the DP algorithm, we use a fixed occlusion cost of 20.

Results: Figure 10 shows plots of the bad pixel percentages B_Ō, B_T̄, and B_D as a function of truncation values for the Tsukuba, Sawtooth, and Venus images. Each plot has six curves, corresponding to DP, DP+BT, SO, SO+BT, GC, and GC+BT. It can be seen that the truncation value affects the performance. As with the local algorithms, if the truncation value is too small (in the noise range), the errors get very large. Intermediate truncation values (5-50, depending on algorithm and image pair), however, can sometimes improve the performance. The effect of Birchfield-Tomasi is mixed; as with the local algorithms in Experiments 1 and 2, it limits the errors if the truncation values are too small. It can be seen that BT is most beneficial for the SO algorithm; however, this is due to the fact that SO really requires a higher value of λ to work well (see Experiment 5), in which case the positive effect of BT is less pronounced.

Conclusion: Using robust (truncated) matching costs can slightly improve the performance of global algorithms. The best truncation value, however, varies with each image pair. Setting this parameter automatically based on an estimate of the image SNR may be possible and is a topic for further research. Birchfield and Tomasi's matching measure can improve results slightly. Intuitively, truncation should not be necessary for global algorithms that operate on unaggregated matching costs, since the problem of outliers in a window does not exist. An important problem for global algorithms, however, is to find the correct balance between data and smoothness terms (see Experiment 5 below). Truncation can be useful in this context, since it limits the range of possible cost values.

Figure 9: Experiment 2. Performance of different matching costs, aggregated with a 9 × 9 shiftable window (min-filter), as a function of truncation values match_max, for three different image pairs. Large truncation values (no truncation) work best when using shiftable windows.

Figure 10: Experiment 3. Performance of different matching costs for global algorithms, as a function of truncation values match_max, for three different image pairs. Intermediate truncation values (~20) can sometimes improve the performance.

6.2 Aggregation

We now turn to comparing different aggregation methods used by local methods. While global methods typically operate on raw (unaggregated) costs, aggregation can be useful for those methods as well, for example to provide starting values for iterative algorithms, or a set of high-confidence matches or ground control points (GCPs) [18] used to restrict the search of dynamic-programming methods.

In this section, we examine aggregation with square windows, shiftable windows (min-filter), binomial filters, regular diffusion, and membrane diffusion [97]. Results for Bayesian diffusion, which combines aggregation and optimization, can be found in Section 7.

Experiment 4: In this experiment, we use (non-truncated) absolute differences as matching cost and perform a winner-take-all optimization after the aggregation step (no sub-pixel estimation). We compare the following aggregation methods:

1. square windows with window sizes 3, 5, 7, ..., 29;
2. shiftable square windows (min-filter) with window sizes 3, 5, 7, ..., 29;
3. iterated binomial (1-4-6-4-1) filter, for 2, 4, 6, ..., 28 iterations;
4. regular diffusion [97] for 10, 20, 30, ..., 150 iterations; and
5. membrane diffusion [97] for 150 iterations and β = 0.9, 0.8, 0.7, ..., 0.0.

Figure 11: Experiment 4. Performance of different aggregation methods as a function of spatial extent (window size, number of iterations, and diffusion β). Larger window extents do worse near discontinuities, but better in textureless areas, which tend to dominate the overall statistics. Near discontinuities, shiftable windows have the best performance.

Note that for each method, we are varying the parameter that controls the spatial extent of the aggregation, i.e., the equivalent of window size. In particular, for the binomial filter and regular diffusion, this amounts to changing the number of iterations. The membrane model, however, converges after sufficiently many iterations, and the spatial extent of the aggregation is controlled by the parameter β, the weight of the original cost values in
the diffusion equation [97].

Results: Figure 11 shows plots of B_O, B_T, and B_D as a function of the spatial extent of aggregation for the Tsukuba, Sawtooth, and Venus images. Each plot has five curves, corresponding to the five aggregation methods listed above. The most striking feature of these curves is the opposite trends of errors in textureless areas (B_T) and at points near discontinuities (B_D). Not surprisingly, more aggregation (larger window sizes or a higher number of iterations) clearly helps to recover textureless areas (note especially the Venus images, which contain large untextured regions). At the same time, too much aggregation causes errors near object boundaries (depth discontinuities). The overall error in non-occluded regions, B_O, exhibits a mixture of both trends. Depending on the image, the best performance is usually achieved at an intermediate amount of aggregation. Among the five aggregation methods, shiftable windows clearly perform best, most notably in discontinuity regions, but also overall. The other four methods (square windows, binomial filter, regular diffusion, and membrane model) perform very similarly, except for differences in the shape of the curves, which are due to our somewhat arbitrary definition of spatial extent for each method. Note, however, that even for shiftable windows, the optimal window size for recovering discontinuities is small, while much larger windows are necessary in untextured regions.

Discussion: This experiment exposes some of the fundamental limitations of local methods. While large windows are needed to avoid wrong matches in regions with little texture, window-based stereo methods perform poorly near object boundaries (i.e., depth discontinuities). The reason is that such methods implicitly assume that all points within a window have similar disparities. If a window straddles a depth boundary, some points in the window match at the foreground disparity, while others match at the background disparity. The aggregated cost function at a point near a depth discontinuity is thus
bimodal in the d direction, and the stronger of the two modes will be selected as the winning disparity. Which one of the two modes will win? This depends on the amount of (horizontal) texture present in the two regions.

Consider first a purely horizontal depth discontinuity (top edge of the foreground square in Figure 12). Whichever of the two regions has more horizontal texture will create a stronger mode, and the computed disparities will thus bleed into the less-textured region. For non-horizontal depth boundaries, however, the most prominent horizontal texture is usually the object boundary itself, since different objects typically have different colors and intensities. Since the object boundary is at the foreground disparity, a strong preference for the foreground disparity at points near the boundary is created, even if the background is textured. This is the explanation for the well-known "foreground fattening" effect exhibited by window-based algorithms. This can be seen at the right edge of the foreground in Figure 12; the left edge is an occluded area, which can't be recovered in any case.

Adaptive window methods have been developed to combat this problem. The simplest variant, shiftable windows (min-filters), can be effective, as is shown in the above experiment. Shiftable windows can recover object boundaries quite accurately if both foreground and background regions are textured, and as long as the window fits as a whole within the foreground object. The size of the min-filter should be chosen to match the window size. As is the case with all local methods, however, shiftable windows fail in textureless areas.

Conclusion: Local algorithms that aggregate support can perform well, especially in textured (even slanted) regions. Shiftable windows perform best, in particular near depth discontinuities. Large amounts of aggregation are necessary in textureless regions.

6.3. Optimization

In this section we compare the four global optimization techniques we implemented: dynamic programming (DP), scanline
optimization (SO), graph cuts (GC), and simulated annealing (SA).

Experiment 5: In this experiment we investigate the role of opt_smoothness, the smoothness weight lambda in Equation (3). We compare the performance of DP, SO, GC, and SA for lambda = 5, 10, 20, 50, 100, 200, 500, and 1000. We use unaggregated absolute differences as the matching cost (squared differences would require much higher values for lambda) and no sub-pixel estimation. The number of iterations for simulated annealing (SA) is 500.

Results: Figure 13 shows plots of B_O, B_T, and B_D as a function of lambda for Tsukuba, Venus, and Map images. (To show more varied results, we use the Map images instead of Sawtooth in this experiment.) Since DP has an extra parameter, the occlusion cost, we include three runs, for opt_occlusion_cost = 20, 50, and 80.

[Figure 12: Illustration of the foreground fattening effect, using the Map image pair and a 21 x 21 SAD algorithm with and without a min-filter (columns: input image / true disparities, SAD and its error, SAD+MF and its error). The error maps encode the signed disparity error, using gray for 0, light for positive errors, and dark for negative errors. Note the foreground fattening without the min-filter (middle column); with the min-filter (right column) the object boundaries are recovered fairly well.]

Using, as before, B_O (bad_pixels_nonocc) as our measure of overall performance, it can be seen that the graph-cut method (GC) consistently performs best, while the other three (DP, SO, and SA) perform slightly worse, with no clear ranking among them. GC also performs best in textureless areas and near discontinuities. The best performance for each algorithm, however, requires different values for lambda depending on the image pair. For example, the Map images, which are well textured and only contain two planar regions, require high values (around 500), while the Tsukuba images, which contain many objects at different depths, require smaller values (20-200, also depending on the algorithm). The occlusion cost parameter for the DP algorithm, while not changing the performance dramatically, also affects the optimal value for lambda.
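To make the quantity being tuned here concrete, the sketch below evaluates a global energy E(d) = E_data(d) + lambda * E_smooth(d) for a given disparity map. This is a simplified stand-in for Equation (3), not the paper's test-harness code: we assume an absolute-difference data term (as used in this experiment) and a plain linear smoothness penalty over horizontal and vertical neighbors; the function name and the toy images are our own.

```python
import numpy as np

def stereo_energy(left, right, disp, lam):
    """Evaluate E(d) = E_data(d) + lam * E_smooth(d) for a disparity map.

    Data term: unaggregated absolute differences (as in Experiment 5).
    Smoothness term: sum of |d_p - d_q| over 4-neighbor pairs.
    """
    h, w = left.shape
    e_data = 0.0
    for y in range(h):
        for x in range(w):
            xr = x - int(disp[y, x])   # matching pixel in the right image
            if 0 <= xr < w:            # skip pixels that map outside the image
                e_data += abs(float(left[y, x]) - float(right[y, xr]))
    # Disparity jumps between vertical and horizontal neighbors.
    e_smooth = (np.abs(np.diff(disp.astype(float), axis=0)).sum()
                + np.abs(np.diff(disp.astype(float), axis=1)).sum())
    return e_data + lam * e_smooth

# Toy example: the right image is the left image shifted by 2 pixels,
# so a constant disparity of 2 yields zero energy for any lambda.
left = np.array([[0., 1., 2., 3.],
                 [4., 5., 6., 7.]])
right = np.array([[2., 3., 0., 0.],   # right[y, x] = left[y, x + 2];
                  [6., 7., 0., 0.]])  # last two columns are unused padding
flat = np.full((2, 4), 2, dtype=int)
print(stereo_energy(left, right, flat, lam=20))   # -> 0.0
```

Sweeping lam over the values used above (5 ... 1000) while re-minimizing reproduces the trade-off this experiment measures: larger lambda smooths textureless regions but over-smooths depth discontinuities, which is why each image pair has a different optimal lambda.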
Although GC is the clear winner here, it is also the slowest algorithm: DP and SO, which operate on each scanline independently, typically run in less than 2 seconds, while GC and SA require 10-30 minutes.

Conclusion: The graph-cut method consistently outperforms the other optimization methods, although at the cost of much higher running times. GC is clearly superior to simulated annealing, which is consistent with other published results [23, 116]. When comparing GC and scanline methods (DP and SO), however, it should be noted that the latter solve a different (easier) optimization problem, since vertical smoothness terms are ignored. While this enables the use of highly efficient dynamic programming techniques, it negatively affects the performance, as exhibited in the characteristic streaking in the disparity maps (see Figures 17 and 18). Several authors have proposed methods for increasing inter-scanline consistency in dynamic-programming approaches, e.g., [9, 31, 13]. We plan to investigate this area in future work.

Experiment 6: We now focus on the graph-cut optimization method to see whether the results can be improved. We try both Birchfield-Tomasi matching costs and a smoothness cost that depends on the intensity gradients.

Results: Figure 14 shows the usual set of performance measures B_O, B_T, and B_D for four different experiments for Tsukuba, Sawtooth, Venus, and Map images. We use a smoothness weight of lambda = 20, except for the Map images, where lambda = 50. The matching costs are non-truncated absolute differences. The parameters for the gradient-dependent smoothness costs are opt_grad_thresh = 8 (the same in all experiments) and opt_grad_penalty = 1, 2, or 4 (denoted p1, p2, and p4 in the plots). Recall that the smoothness cost is multiplied by opt_grad_penalty if the intensity gradient is below opt_grad_thresh, to encourage disparity jumps to coincide with intensity edges. Each plot in Figure 14 shows 4 runs: p1, p1+BT, p2+BT, and p4+BT. In the first run, the penalty is 1, i.e., the gradient dependency is turned off. This gives the same results as in Experiment 5. In the second run, we
[Figure 13: Experiment 5. Performance of global optimization techniques (DP with occlusion cost 20, 50, and 80; SO; GC; SA) as a function of the smoothness weight lambda (opt_smoothness), for Map, Tsukuba, and Venus images. Note that each image pair requires a different value of lambda for optimal performance.]

[Figure 14: Experiment 6. Performance of the graph-cut optimization with different gradient-dependent smoothness penalties (p1, p2, p4), and with and without Birchfield-Tomasi (BT), for Tsukuba, Sawtooth, Venus, and Map images.]

add Birchfield-Tomasi, still without a penalty. We then add a penalty of 2 and 4 in the last two runs. It can be seen that the low-gradient penalty clearly helps recovering the discontinuities, and also in the other regions. Which of the two penalties works better depends on the image pair. Birchfield-Tomasi also yields a slight improvement. We have also tried other values for the threshold, with mixed results. In future work we plan to replace the simple gradient threshold with an edge detector, which should improve edge localization. The issue of selecting the right penalty factor is closely related to selecting the right value for lambda, since it affects the overall relationship between the data term and the smoothness term. This also deserves more investigation.

Conclusion: Both Birchfield-Tomasi's matching cost and the gradient-based smoothness cost improve the performance of the graph-cut algorithm. Choosing the right parameters (threshold and penalty) remains
difficult and image-specific. We have performed these experiments for scanline-based optimization methods (DP and SO) as well, with similar results. Gradient-based penalties usually increase performance, in particular for the SO method. Birchfield-Tomasi always seems to increase overall performance, but it sometimes decreases performance in textureless areas. As before, the algorithms are highly sensitive to the weight of the smoothness term lambda and the penalty factor.

6.4. Sub-pixel estimation

Experiment 7: To evaluate the performance of the sub-pixel refinement stage (and also to evaluate the influence of the matching criteria and disparity sampling), we cropped a small planar region from one of our image sequences (Figure 15a, second column of images). The image itself is a page of newsprint mounted on cardboard, with high-frequency text and a few low-frequency white and dark regions. (These textureless regions were excluded from the statistics we gathered.) The disparities in this region are in the order of 0.8-5.8 pixels and are slanted both vertically and horizontally.

Results: We first run a simple 9 x 9 SSD window (Figure 15b). One can clearly see the discrete disparity levels computed. The disparity error map (second column of images) shows the staircase error, and the histogram of disparities (third column) also shows the discretization. If we apply the sub-pixel parabolic fit to refine the disparities, the disparity map becomes smoother (note the drop in RMS error in Figure 15c), but still shows some soft staircasing, which is visible in the disparity error map and histogram as well. These results agree with those reported by Shimizu and Okutomi [105]. In Figure 15d, we investigate whether using the Birchfield-Tomasi sampling-invariant measure [12] improves or degrades this behavior. For integral sampling, their idea does help slightly, as can be seen by the reduced
     disp_step  subpix_refine  Birchf.-Tomasi  preproc_blur  RMS disp. error
(a)  (ground truth)                                          0
(b)  1          no             no              no            0.296
(c)  1          yes            no              no            0.088
(d)  1          yes            yes             no            0.082
(e)  1          yes            no              yes           0.135
(f)  1/2        yes            no              no            0.051
(g)  1/4        no             no              no            0.087
(h)  1/4        yes            no              no            0.046

Figure 15: RMS disparity errors for the cropped image sequence (planar region of newspaper); the reference image is shown in row (a) in the disparity error column. The columns indicate the disparity step, the sub-pixel refinement option, Birchfield-Tomasi's sampling-insensitive matching option, the optional initial blur, and the RMS disparity error from ground truth. The first image column shows the computed disparity map, the second shows the signed disparity error, and the last column shows a histogram of computed disparities.

[Figure 16: Plots of RMS disparity error and inverse prediction error as a function of disp_step and match_interval. The even data points are with sampling-insensitive matching (match_interval) turned on. The second set of plots in each figure is with preproc_blur enabled (1 blur iteration).]

RMS value and the smoother histogram in Figure 15d. In all other instances, it leads to poorer performance (see Figure 16a, where the sampling-invariant results are the even data points). In Figure 15e, we investigate whether lightly blurring the input images with a (1/4, 1/2, 1/4) kernel helps sub-pixel refinement, because the first-order Taylor series expansion of the imaging function becomes more valid. Blurring does indeed slightly reduce the staircasing effect (compare Figure 15e to Figure 15c), but the overall RMS performance degrades, probably because of loss of high-frequency detail.

We also tried 1/2 and 1/4 pixel disparity sampling at the initial matching stages, with and without later sub-pixel refinement. Sub-pixel refinement always helps to reduce the RMS disparity error, although it has negligible effect on the inverse prediction error (Figure 16b). From these prediction error plots, and also from visual inspection of the inverse warped (stabilized)
image sequence, it appears that using sub-pixel refinement after any original matching scheme is sufficient to reduce the prediction error (and the appearance of "jitter" or "shearing") to negligible levels. This is despite the fact that the theoretical justification for sub-pixel refinement is based on a quadratic fit to an adequately sampled quadratic energy function. At the moment, for global methods, we rely on the per-pixel costs that go into the optimization to do the sub-pixel disparity estimation. Alternative approaches, such as using local plane fits [5, 14, 117], could also be used to get sub-pixel precision.

Conclusions: To eliminate "staircasing" in the computed disparity map, and to also eliminate the appearance of "shearing" in reprojected sequences, it is necessary to initially evaluate the matches at a fractional disparity (1/2 pixel steps appear to be adequate). This should be followed by finding the minima of local quadratic fits applied to the computed matching costs.

7. Overall comparison

We close our experimental investigation with an overall comparison of 20 different stereo methods, including 5 algorithms implemented by us and 15 algorithms implemented by their authors, who have kindly sent us their results. We evaluate all algorithms using our familiar set of Tsukuba, Sawtooth, Venus, and Map images. All algorithms are run with constant parameters over all four images. We do not perform sub-pixel estimation in this comparison.

Among the algorithms in our implementation framework, we have selected the following five:

1. SSD - 21 x 21 shiftable window SSD,
2. DP - dynamic programming,
3. SO - scanline optimization,
4. GC - graph-cut optimization, and
5. Bay - Bayesian diffusion.

We chose shiftable-window SSD as the best-performing representative of all local (aggregation-based) algorithms. We are not including simulated annealing here, since GC solves the same optimization problem better and more efficiently. For each of the five algorithms, we have chosen fixed parameters that yield reasonably good performance over a
variety of input images (see Table 4). We compare the results of our implementation with results provided by the authors of the following algorithms:

6. Max-flow / min-cut algorithm (Roy and Cox [92, 91]): one of the first methods to formulate matching as a graph flow problem;
7. Pixel-to-pixel stereo (Birchfield and Tomasi [13]): scanline algorithm using gradient-modulated costs, followed by vertical disparity propagation into unreliable areas;

                        SSD    DP    SO    GC    Bay
Matching cost:
match_fn                SD     AD    AD    AD    AD
Truncation              no     no    no    no    no
Birchfield-Tomasi       no     yes   yes   yes   no
Aggregation:
aggr_window_size        21     -     -     -     -
aggr_minfilter          21     -     -     -     -
aggr_iter               1      -     -     -     1000
diff_mu                 -      -     -     -     0.5
diff_sigmaP             -      -     -     -     0.4
diff_epsP               -      -     -     -     0.01
diff_scale_cost         -      -     -     -     0.01
Optimization:
opt_fn                  WTA    DP    SO    GC    Bayesian
opt_smoothness (lambda) -      20    50    20    -
opt_occlusion_cost      -      20    -     -     -
opt_grad_thresh         -      8     8     8     -
opt_grad_penalty        -      4     2     2     -

Table 4: Parameters for the five algorithms implemented by us.

8. Multiway cut (Birchfield and Tomasi [14]): alternate segmentation and finding affine parameters for each segment using graph cuts;
9. Cooperative algorithm (Zitnick and Kanade [132]): a new variant of Marr and Poggio's algorithm [73];
10. Graph cuts (Boykov et al. [23]): same as our GC method, but a much faster implementation;
11. Graph cuts with occlusions (Kolmogorov and Zabih [65]): an extension of the graph-cut framework that explicitly models occlusions;
12. Compact windows (Veksler [124]): an adaptive window technique allowing non-rectangular windows;
13. Genetic algorithm (Gong and Yang [49]): a global optimization technique operating on quadtrees;
14. Realtime method (Hirschmüller [53]): 9 x 9 SAD with shiftable windows, followed by consistency checking and interpolation of uncertain areas;
15. Stochastic diffusion (Lee et al. [68]): a variant of Bayesian diffusion [97];
16. Fast correlation algorithm (Mühlmann et al. [79]): an efficient implementation of correlation-based matching with consistency and uniqueness validation;
17. Discontinuity-preserving regularization (Shao [104]): a multiview technique for virtual view generation;
18.
Maximum-surface technique (Sun [108]): a fast stereo algorithm using rectangular subregions;
19. Belief propagation (Sun et al. [109]): an MRF formulation using Bayesian belief propagation;
20. Layered stereo (Lin and Tomasi [69]): a preliminary version of an extension of the multiway-cut method [14].

Some of these algorithms do not compute disparity estimates everywhere, in particular those that explicitly model occlusions (3 and 11), but also (16), which leaves low-confidence regions unmatched. In these cases we fill unmatched areas as described for the DP method in Section 4.3.

Table 5 summarizes the results for all algorithms. As in the previous section, we report B_O (bad_pixels_nonocc) as a measure of overall performance, as well as B_T (bad_pixels_textureless) and B_D (bad_pixels_discont). We do not report B_T for the Map images, since the images are textured almost everywhere. The algorithms are listed roughly in order of overall performance.

The disparity maps for the Tsukuba and Venus images are shown in Figures 17 and 18. The full set of disparity maps, as well as signed and binary error maps, are available on our web site at www.middlebury.edu/stereo.

Looking at these results, we can draw several conclusions about the overall performance of the algorithms. First, global optimization methods based on 2-D MRFs generally perform the best in all regions of the image (overall, textureless, and discontinuities). Most of these techniques are based on graph-cut optimization (4, 8, 10, 11, 20), but belief propagation (19) also does well. Approaches that explicitly model planar surfaces (8, 20) are especially good with piecewise planar scenes such as Sawtooth and Venus.

[Figure 17: Comparative results on the Tsukuba images: true disparities and the maps produced by (10) graph cuts, (3) scanline opt.*, (7) pixel-to-pixel stereo, (1) SSD+MF*, (5) Bayesian diffusion*, (8) multiway cut, (17) disc.-pres. regul., and (16) fast correlation. The results are shown in decreasing order of overall performance (B_O). Algorithms implemented by us are marked with a star.]
[Figure 18: Comparative results on the Venus images: true disparities and the maps produced by (8) multiway cut, (20) layered stereo, (14) realtime SAD, (10) graph cuts, (4) graph cuts*, (6) max flow, (15) stochastic diffusion, (13) genetic algorithm, (9) cooperative alg., (11) GC + occlusions, (1) SSD+MF*, (5) Bayesian diffusion*, (18) max surface, (17) disc.-pres. regul., (7) pixel-to-pixel stereo, (16) fast correlation, and (3) scanline opt.*. The results are shown in decreasing order of overall performance (B_O). Algorithms implemented by us are marked with a star.]

                  Tsukuba                        Sawtooth                       Venus                          Map
                  B_O      B_T       B_D         B_O      B_T      B_D          B_O       B_T       B_D        B_O      B_D
20 Layered        1.58(3)  1.06(4)   8.82(3)     0.34(1)  0.00(1)  3.35(1)      1.52(3)   2.96(10)  2.62(2)    0.37(6)  5.24(6)
4  Graph cuts*    1.94(5)  1.09(5)   9.49(5)     1.30(6)  0.06(3)  6.34(6)      1.79(7)   2.61(8)   6.91(4)    0.31(4)  3.88(4)
19 Belief prop.   1.15(1)  0.42(1)   6.31(1)     0.98(5)  0.30(5)  4.83(5)      1.00(2)   0.76(2)   9.13(6)    0.84(10) 5.27(7)
11 GC + occl.     1.27(2)  0.43(2)   6.90(2)     0.36(2)  0.00(1)  3.65(2)      2.79(12)  5.39(13)  2.54(1)    1.79(13) 10.08(12)
10 Graph cuts     1.86(4)  1.00(3)   9.35(4)     0.42(3)  0.14(4)  3.76(3)      1.69(6)   2.30(6)   5.40(3)    2.39(16) 9.35(10)
8  Multiw. cut    8.08(17) 6.53(14)  25.33(18)   0.61(4)  0.46(8)  4.60(4)      0.53(1)   0.31(1)   8.06(5)    0.26(3)  3.27(3)
12 Compact win.   3.36(8)  3.54(8)   12.91(9)    1.61(9)  0.45(7)  7.87(7)      1.67(5)   2.18(4)   13.24(9)   0.33(5)  3.94(5)
14 Realtime       4.25(12) 4.47(12)  15.05(13)   1.32(7)  0.35(6)  9.21(8)      1.53(4)   1.80(3)   12.33(7)   0.81(9)  11.35(15)
5  Bay. diff.*    6.49(16) 11.62(19) 12.29(7)    1.45(8)  0.72(9)  9.29(9)      4.00(14)  7.21(16)  18.39(13)  0.20(1)  2.49(2)
9  Cooperative    3.49(9)  3.65(9)   14.77(11)   2.03(10) 2.29(14) 13.41(13)    2.57(11)  3.52(11)  26.38(17)  0.22(2)  2.37(1)
1  SSD+MF*        5.23(15) 3.80(10)  24.66(17)   2.21(11) 0.72(10) 13.97(15)    3.74(13)  6.82(15)  12.94(8)   0.66(8)  9.35(10)
15 Stoch. diff.   3.95(10) 4.08(11)  15.49(15)   2.45(14) 0.90(11) 10.58(10)    2.45(9)   2.41(7)   21.84(15)  1.31(12) 7.79(9)
13 Genetic        2.96(6)  2.66(7)   14.97(12)   2.21(12) 2.76(16) 13.96(14)    2.49(10)  2.89(9)   23.04(16)  1.04(11) 10.91(14)
7  Pix-to-pix     5.12(14) 7.06(17)  14.62(10)   2.31(13) 1.79(12) 14.93(17)    6.30(17)  11.37(18) 14.57(10)  0.50(7)  6.83(8)
6  Max flow       2.98(7)  2.00(6)   15.10(14)   3.47(15) 3.00(17) 14.19(16)    2.16(8)   2.24(5)   21.73(14)  3.13(17) 15.98(18)
3  Scanl. opt.*   5.08(13) 6.78(15)  11.94(6)    4.06(16) 2.64(15) 11.90(11)    9.44(19)  14.59(19) 18.20(12)  1.84(14) 10.22(13)
2  Dyn. prog.*    4.12(11) 4.63(13)  12.34(8)    4.84(19) 3.71(19) 13.26(12)    10.10(20) 15.01(20) 17.12(11)  3.33(18) 14.04(17)
17 Shao           9.67(18) 7.04(16)  35.63(19)   4.25(17) 3.19(18) 30.14(20)    6.01(16)  6.70(14)  43.91(20)  2.36(15) 33.01(20)
16 Fast Correl.   9.76(19) 13.85(20) 24.39(16)   4.76(18) 1.87(13) 22.49(18)    6.48(18)  10.36(17) 31.29(18)  8.42(20) 12.68(16)
18 Max surf.      11.10(20) 10.70(18) 41.99(20)  5.51(20) 5.56(20) 27.39(19)    4.36(15)  4.78(12)  41.13(19)  4.17(19) 27.88(19)

Table 5: Comparative performance of stereo algorithms, using the three performance measures B_O (bad_pixels_nonocc), B_T (bad_pixels_textureless), and B_D (bad_pixels_discont). All algorithms are run with constant parameter settings across all images. The numbers in parentheses indicate the rank of each algorithm in each column. The algorithms are listed roughly in decreasing order of overall performance; the minimum (best) value in each column is the entry with rank 1. Algorithms implemented by us are marked with a star.

Next, cooperative and diffusion-based methods (5, 9, 15) do reasonably well, but often get the boundaries wrong, especially on the more complex Tsukuba images. On the highly textured and relatively simple Map sequence, however, they can outperform some of the full optimization approaches. The Map sequence is also noisier than the others, which works against algorithms that are sensitive to internal parameter settings. In these experiments, we asked everyone to use a single set of parameters for all four datasets.

Lastly, local (1, 12, 14, 16) and scanline methods (2, 3, 7) perform less well, although (14), which performs additional consistency checks and clean-up steps, does reasonably well, as does the compact window approach (12), which uses a sophisticated adaptive window. Simpler local approaches such as SSD+MF (1) generally do poorly in textureless areas (if the window size is small) or near discontinuities (if the window is large). The disparity maps created by the scanline-based algorithms (DP and SO) are promising and show a lot of detail, but the larger quantitative errors are clearly a result of the streaking due to the lack of inter-scanline consistency.

To demonstrate the importance of parameter settings, Table 6 compares the overall results (B_O) of algorithms 1-5 for the fixed parameters listed in Table 4 with the best results when parameters are allowed to vary for each image. Note that we did not
perform a true optimization over all parameter values, but rather simply chose the overall best results among the entire set of experiments we performed. It can be seen that for some of the algorithms, the performance can be improved substantially with different parameters. The global optimization algorithms are particularly sensitive to the parameter lambda, and DP is also sensitive to the occlusion cost parameter. This is consistent with our observations in Section 6.3. Note that the Map image pair can virtually be "solved" using GC, Bay, or SSD, since the images depict a simple geometry and are well textured. More challenging data sets with many occlusions and textureless regions may be useful in future extensions of this study.

Finally, we take a brief look at the efficiency of different methods. Table 7 lists the image sizes and number of disparity levels for each image pair, and the running times for 9 selected algorithms.

         Tsukuba        Sawtooth       Venus          Map
         fixed  best    fixed  best    fixed  best    fixed  best
1 SSD    5.23   5.23    2.21   1.55    3.74   2.92    0.66   0.22
2 DP     4.12   3.82    4.84   3.70    10.10  9.13    3.33   1.21
3 SO     5.08   4.66    4.06   3.47    9.44   8.31    1.84   1.04
4 GC     1.94   1.94    1.30   0.98    1.79   1.48    0.31   0.09
5 Bay    6.49   6.49    1.45   1.45    4.00   4.00    0.20   0.20

Table 6: Overall performance B_O (bad_pixels_nonocc) for algorithms 1-5, both using fixed parameters across all images and best parameters for each image. Note that for some algorithms, significant performance gains are possible if parameters are allowed to vary for each image.

Not surprisingly, the speed-optimized methods (14 and 16) are fastest, followed by local and scanline-based methods (1 - SSD, 2 - DP, 3 - SO). Our implementations of graph cuts (4) and Bayesian diffusion (5) are several orders of magnitude slower. The authors' implementations of the graph-cut methods (10 and 11), however, are much faster than our implementation. This is due to the new max-flow code by Boykov and Kolmogorov [21], which is available at www.cs.cornell.edu/People/vnk/software.html.

In summary, if efficiency is an issue, a simple shiftable-window method is a good choice. In
particular, method 14 by Hirschmüller [53] is among the fastest and produces very good results. New implementations of graph-cut methods give excellent results and have acceptable running times. Further research is needed to fully exploit the potential of scanline methods without sacrificing their efficiency.

8. Conclusion

In this paper, we have proposed a taxonomy for dense two-frame stereo correspondence algorithms. We use this taxonomy to highlight the most important features of existing stereo algorithms and to study important algorithmic components in isolation. We have implemented a suite of stereo matching algorithm components and constructed a test harness that can be used to combine these, to vary the algorithm parameters in a controlled way, and to test the performance of these algorithms on interesting data sets. We have also produced some new calibrated multi-view stereo data sets with hand-labeled ground truth. We have performed an extensive experimental investigation in order to assess the impact of the different algorithmic components. The experiments reported here have demonstrated the limitations of local methods, and have assessed the value of different global techniques and their sensitivity to key parameters.

We hope that publishing this study along with our sample code and data sets on the Web will encourage other stereo researchers to run their algorithms on our data and to report their comparative results. Since publishing the initial version of this paper as a technical report [98], we have already received experimental results (disparity maps) from 15 different research groups, and we hope to obtain more in the future. We are planning to maintain the on-line version of Table 5, which lists the overall results of the currently best-performing algorithms, on our web site. We also hope that some researchers will take the time to add their algorithms to our framework for others to use and to build upon. In the long term, we hope that our efforts will lead to some set
of data and testing methodology becoming an accepted standard in the stereo correspondence community, so that new algorithms will have to pass a "litmus test" to demonstrate that they improve on the state of the art.

There are many other open research questions we would like to address. How important is it to devise the right cost function (e.g., with better gradient-dependent smoothness terms) in global optimization algorithms, vs. how important is it to find a global minimum? What are the best (sampling-invariant) matching metrics? What kind of adaptive/shiftable windows work best? Is it possible to automatically adapt parameters to different images? Also, is prediction error a useful metric for gauging the quality of stereo algorithms? We would also like to try other existing data sets and to produce some labeled data sets that are not all piecewise planar.

Once this study has been completed, we plan to move on to study multi-frame stereo matching with arbitrary camera geometry. There are many technical solutions possible to this problem, including voxel representations, layered representations, and multi-view representations. This more general version of the correspondence problem should also prove to be more useful for image-based rendering applications.

Developing the taxonomy and implementing the algorithmic framework described in this paper has given us a much deeper understanding of what does and does not work well in stereo matching. We hope that other researchers will also take the time to carefully analyze the behavior of their own algorithms using the framework and methodology developed in this paper, and that this will lead to a deeper understanding of the complex behavior of stereo correspondence algorithms.

                     Tsukuba  Sawtooth  Venus   Map
width                384      434       434     284
height               288      380       383     216
disparity levels     16       20        20      30
Time (seconds):
14 - Realtime        0.1      0.2       0.2     0.1
16 - Efficient       0.2      0.3       0.3     0.2
1  - SSD+MF          1.1      1.5       1.7     0.8
2  - DP              1.0      1.8       1.9     0.8
3  - SO              1.1      2.2       2.3     1.3
10 - GC              23.6     48.3      51.3    22.3
11 - GC + occlusions 69.8     154.4     239.9   64.0
4  - GC              662.0    735.0     829.0   480.0
5  - Bay             1055.0   2049.0    2047.0  1236.0

Table 7: Running times of selected algorithms on the four image pairs.

Acknowledgements

Many thanks to Ramin Zabih for his help in laying the foundations for this study and for his valuable input and suggestions throughout this project. We are grateful to Y. Ohta and Y. Nakamura for supplying the ground-truth imagery from the University of Tsukuba, and to Lothar Hermes for supplying the synthetic images from the University of Bonn. Thanks to Padma Ugbabe for helping to label the image regions, and to Fred Lower for providing his paintings for our image data sets. Finally, we would like to thank Stan Birchfield, Yuri Boykov, Minglun Gong, Heiko Hirschmüller, Vladimir Kolmogorov, Sang Hwa Lee, Mike Lin, Karsten Mühlmann, Sébastien Roy, Juliang Shao, Harry Shum, Changming Sun, Jian Sun, Carlo Tomasi, Olga Veksler, and Larry Zitnick for running their algorithms on our data sets and sending us the results, and Gary Bradski for helping to facilitate this comparison of stereo algorithms. This research was supported in part by NSF CAREER grant 9984485.

References

[1] P. Anandan. A computational framework and an algorithm for the measurement of visual motion. IJCV, 2(3):283-310, 1989.
[2] R. D. Arnold. Automated stereo perception. Technical Report AIM-351, Artificial Intelligence Laboratory, Stanford University, 1983.
[3] H. Baker and T. Binford. Depth from edge and intensity based stereo. In IJCAI-81, pages 631-636, 1981.
[4] H. H. Baker. Edge based stereo correlation. In L. S. Baumann, editor, Image Understanding Workshop, pages 168-175. Science Applications International Corporation, 1980.
[5] S. Baker, R. Szeliski, and P. Anandan. A layered approach to stereo reconstruction. In CVPR, pages 434-441, 1998.
[6] S. T. Barnard. Stochastic stereo matching over scale. IJCV, 3(1):17-32, 1989.
[7] S. T. Barnard and M. A. Fischler. Computational stereo. ACM Comp. Surveys, 14(4):553-572, 1982.
[8] J. L. Barron, D. J. Fleet, and S. S. Beauchemin. Performance of optical flow techniques. IJCV, 12(1):43-77, 1994.
[9] P. N. Belhumeur. A
Bayesian approach to binocular stereopsis. IJCV, 19(3):237-260, 1996.
[10] P. N. Belhumeur and D. Mumford. A Bayesian treatment of the stereo correspondence problem using half-occluded regions. In CVPR, pages 506-512, 1992.
[11] J. R. Bergen, P. Anandan, K. J. Hanna, and R. Hingorani. Hierarchical model-based motion estimation. In ECCV, pages 237-252, 1992.
[12] S. Birchfield and C. Tomasi. Depth discontinuities by pixel-to-pixel stereo. In ICCV, pages 1073-1080, 1998.
[13] S. Birchfield and C. Tomasi. A pixel dissimilarity measure that is insensitive to image sampling. IEEE TPAMI, 20(4):401-406, 1998.
[14] S. Birchfield and C. Tomasi. Multiway cut for stereo and motion with slanted surfaces. In ICCV, pages 489-495, 1999.
[15] M. J. Black and P. Anandan. A framework for the robust estimation of optical flow. In ICCV, pages 231-236, 1993.
[16] M. J. Black and A. Rangarajan. On the unification of line processes, outlier rejection, and robust statistics with applications in early vision. IJCV, 19(1):57-91, 1996.
[17] A. Blake and A. Zisserman. Visual Reconstruction. MIT Press, Cambridge, MA, 1987.
[18] A. F. Bobick and S. S. Intille. Large occlusion stereo. IJCV, 33(3):181-200, 1999.
[19] R. C. Bolles, H. H. Baker, and M. J. Hannah. The JISCT stereo evaluation. In DARPA Image Understanding Workshop, pages 263-274, 1993.
[20] R. C. Bolles, H. H. Baker, and D. H. Marimont. Epipolar-plane image analysis: An approach to determining structure from motion. IJCV, 1:7-55, 1987.
[21] Y. Boykov and V. Kolmogorov. A new algorithm for energy minimization with discontinuities. In Intl. Workshop on Energy Minimization Methods in Computer Vision and Pattern Recognition, pages 205-220, 2001.
[22] Y. Boykov, O. Veksler, and R. Zabih. A variable window approach to early vision. IEEE TPAMI, 20(12):1283-1294, 1998.
[23] Y. Boykov, O. Veksler, and R. Zabih. Fast approximate energy minimization via graph cuts. IEEE TPAMI, 23(11):1222-1239, 2001.
[24] A. Broadhurst, T. Drummond, and R. Cipolla. A probabilistic framework for space carving. In ICCV, volume 1, pages 388-393, 2001.
[25] L. G. Brown. A survey of image registration techniques. Computing Surveys, 24(4):325-376, 1992.
[26] P. J. Burt and E. H. Adelson. The Laplacian pyramid as a compact image code. IEEE Transactions on Communications, COM-31(4):532-540, 1983.
[27] J. Canny. A computational approach to edge detection. IEEE TPAMI, 8(6):679-698, 1986.
[28] P. B. Chou and C. M. Brown. The theory and practice of Bayesian image labeling. IJCV, 4(3):185-210, 1990.
[29] S. D. Cochran and G. Medioni. 3-D surface description from binocular stereo. IEEE TPAMI, 14(10):981-994, 1992.
[30] R. T. Collins. A space-sweep approach to true multi-image matching. In CVPR, pages 358-363, 1996.
[31] I. J. Cox, S. L. Hingorani, S. B. Rao, and B. M. Maggs. A maximum likelihood stereo algorithm. CVIU, 63(3):542-567, 1996.
[32] I. J. Cox, S. Roy, and S. L. Hingorani. Dynamic histogram warping of image pairs for constant image brightness. In IEEE International Conference on Image Processing (ICIP'95), volume 2, pages 366-369. IEEE Computer Society, 1995.
[33] B. Culbertson, T. Malzbender, and G. Slabaugh. Generalized voxel coloring. In International Workshop on Vision Algorithms, pages 100-114, Kerkyra, Greece, 1999. Springer.
[34] J. S. De Bonet and P. Viola. Poxels: Probabilistic voxelized volume reconstruction. In ICCV, pages 418-425, 1999.
[35] R. Deriche. Fast algorithms for low-level vision. IEEE TPAMI, 12(1):78-87, 1990.
[36] P. Dev. Segmentation processes in visual perception: A cooperative neural model. COINS Technical Report 74C-5, University of Massachusetts at Amherst, 1974.
[37] U. R. Dhond and J. K. Aggarwal. Structure from stereo - a review. IEEE Trans. on Systems, Man, and Cybern., 19(6):1489-1510, 1989.
[38] O. Faugeras and R. Keriven. Variational principles, surface evolution, PDE's, level set methods, and the stereo problem. IEEE Trans. Image Proc., 7(3):336-344, 1998.
[39] O. Faugeras and Q.-T. Luong. The Geometry of Multiple Images. MIT Press, Cambridge, MA, 2001.
[40] D. J. Fleet, A. D. Jepson, and M. R. M. Jenkin. Phase-based disparity measurement. CVGIP, 53(2):198-210, 1991.
[41] T. Frohlinghaus and J. M. Buhmann. Regularizing phase-based stereo. In ICPR, volume A, pages 451-455, 1996.
[42] P. Fua. A parallel stereo algorithm that produces dense depth maps and preserves image features. Machine Vision and Applications, 6:35-49, 1993.
[43] P. Fua and Y. G. Leclerc. Object-centered surface reconstruction: Combining multi-image stereo and shading. IJCV, 16:35-56, 1995.
[44] E. Gamble and T. Poggio. Visual integration and detection of discontinuities: the key role of intensity edges. A.I. Memo 970, AI Lab, MIT, 1987.
[45] D. Geiger and F. Girosi. Parallel and deterministic algorithms for MRF's: Surface reconstruction. IEEE TPAMI, 13(5):401-412, 1991.
[46] D. Geiger, B. Ladendorf, and A. Yuille. Occlusions and binocular stereo. In ECCV, pages 425-433, 1992.
[47] S. Geman and D. Geman. Stochastic relaxation, Gibbs distribution, and the Bayesian restoration of images. IEEE TPAMI, 6(6):721-741, 1984.
[48] M. A. Gennert. Brightness-based stereo matching. In ICCV, pages 139-143, 1988.
[49] M. Gong and Y.-H. Yang. Multi-baseline stereo matching using genetic algorithm. In IEEE Workshop on Stereo and Multi-Baseline Vision, 2001. IJCV (this issue).
[50] W. E. L. Grimson. Computational experiments with a feature based stereo algorithm. IEEE TPAMI, 7(1):17-34, 1985.
[51] M. J. Hannah. Computer Matching of Areas in Stereo Images. PhD thesis, Stanford University, 1974.
[52] R. I. Hartley and A. Zisserman. Multiple View Geometry. Cambridge University Press, Cambridge, UK, 2000.
[53] H. Hirschmüller. Improvements in real-time correlation-based stereo vision. In IEEE Workshop on Stereo and Multi-Baseline Vision, 2001. IJCV (this issue).
[54] Y. C. Hsieh, D. McKeown, and F. P. Perlant. Performance evaluation of scene registration and stereo matching for cartographic feature extraction. IEEE TPAMI, 14(2):214-238, 1992.
[55] H. Ishikawa and D. Geiger. Occlusions, discontinuities, and epipolar lines in stereo. In ECCV, pages 232-248, 1998.
[56] M. R. M. Jenkin, A. D. Jepson, and J. K. Tsotsos. Techniques for disparity measurement. CVGIP: Image Understanding, 53(1):14-30, 1991.
[57] D. G. Jones and J. Malik. A computational framework for determining stereo correspondence from a set of linear spatial filters. In ECCV, pages 395-410, 1992.
[58] T. Kanade. Development of a video-rate stereo machine. In Image Understanding Workshop, pages 549-557, Monterey, CA, 1994. Morgan Kaufmann Publishers.
[59] T. Kanade et al. A stereo machine for video-rate dense depth mapping and its new applications. In CVPR
pages 1967202 1996 T Kanade and M Okutomi A stereo matching algorithm with an adaptive window Theory and experiment IEEE TPAMI l699207932 1994 S B Kang R Szeliski and J Chai Handling occlusions in dense multiview stereo In CWR 2001 S B Kang J Webb L Zitnick and T Kanade A multi L 1 I 5 5 L I I I I 1 real tlme image acquisition In ICCV pages 88793 1995 63 M Kass Linear image features in stereopsis 143577368 1988 64 R Kimura et al A convolverbasedrealtime stereo machine SAZAN In CVPR volume 1 pages 4577463 1999 56 57 59 60 61 62 IJCV 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 B 8 81 82 83 84 85 86 V Kolmogorov and R Zabih Computing visual correspon dence with occlusions using graph cuts In ICCV volume II pages 5087515 2001 K N Kutulakos Approximate Nview stereo In ECCV volume 1 pages 67783 2000 K N Kutulakos and S M Seitz A theory ofshape by space carving IJCV 3831997218 2000 S H LeeY Kanatsugu andJI Park Hierarchical stochas tic diffusion for disparity estimation In IEEE Workshop on Stereo andMultiBaseline Vision 2001 IJCV this issue M Lin and C Tomasi Surfaces with occlusions from lay ered stereo Technical report Stanford University 2002 In preparation C Loop andZ Zhang Computing rectifying homographies for stereovision In CJPR volume I pages 1257131 1999 B D Lucas and T Kanade An iterative image regis tration technique with an application in stereo vision In Seventh International Joint Conference onArti cial Intelli gence UCAI 8I pages 6747679 Vancouver 1981 D Marr Vision W H Freeman and Company New York 1982 D Marr and T Poggio Cooperative computation of stereo disparity Science 1942837287 1976 D C Marr andT Poggio A computational theory of human stereo vision Proceedings of the Royal Society ofLondon B 2043017328 1979 J Marroquin S Mitter andT Poggio Probabilistic solution of illposed problems in computational vision Journal of theAmerican StatisticalAssociation 8239776789 1987 J L Marroquin Design of cooperative networks Working Paper 253 A1 Lab MIT 1983 L 
Matthies R Szeliski and T Kanade Kalman lterbased algorithms for estimating depth from image sequences IJCV 32097236 1989 A Mitiche and P Bouthemy Computation and analysis of 39 39 Asynopsisoruumu Jmethod IJCV l9l29755 1996 K Muhlmann D Maier J Hesser and R Manner Cal culating dense disparity maps from color stereo images an e icient implementation In IEEE Workshop on Stereo and MultiBaseline Vision 2001 IJCV this issue J Mulligan V Isler and K Daniilidis Performance evalu ation of stereo for telepresence In ICCV volume II pages 5587565 2001 Y Nakarnura T Matsuura K Satoh andY Ohta Occlusion detectable stereo i occlusion patterns in camera matrix In CVPR pages 3717378 1996 H K Nishihara Practical realtime imaging stereo matcher Optical Engineering 2355367545 1984 Y Ohta and T Kanade Stereo by intra and inter scanline search using dynamic programming IEEE TPAMI 7213971541985 M Okutorni and T Kanade A locally adaptive window for signal matching IJCV 721437162 1992 M Okutorni and T Kanade A multiplebaseline stereo IEEE TPAMI 1543537363 1993 M Otte and HH Nagel Optical ow estimation advances and comparisons In ECCV volume 1 pages 5140 1994 87 88 E 95 96 97 98 E 3 100 101 102 103 104 105 106 107 108 T Poggio V Torre and C Koch Computational vision and regularization theory Nature 31760353147319 1985 S B Pollard J E W Mayhew and J P Frisby PMF A stereo correspondence algorithm using a disparity gradient limit Perception 144497470 1985 Prazdn Detection of binocular disparities Biological Cybernetics 52293i99 1985 L H Q Hierarchical warp stereo In Image Understanding Workshop pages 1497155 New Orleans Louisiana 1984 Science Applications International Corpo ratr S Roy Stereo without epipolar lines A maximum ow formulation IJCV 3423l477161 1999 S Roy and I J Cox A maximum ow formulation of the Ncamera stereo correspondence problem In ICCV pages 4927499 1998 T W Ryan R T Gray and B R Hunt Prediction of cor relation errors in stereopair images Optical Engineering l933l27322 
1980 H Saito and T Kanade Shape reconstruction in projective grid space fromlarge number of images In CJPR volume 2 pages 4954 1999 D Scharstein Matching images by comparing their gradient elds In ICPR volume 1 pages 5727575 1994 D Scharstein View Synthesis Using Stereo Vision vol ume 1583 ofLecture Notes in Computer Science LNCS SpringerVerlag 1999 D Scharstein and R Szeliski Stereo matching with nonlin ear diffusion IJCV 282 1557174 1998 D a L J S dense twoframe stereo correspondence algorithms Tech nical Report MSRTR200181 Microso Research 2001 D Scharstein R Szeliski and R Zabih A taxonomy and evaluation of dense twoframe stereo correspondence algo rithms In IEEE Workshop on Stereo and MultiBaseline Vision 2001 P Seitz Using local orientation information as image prim itive for robust object recognition In SPIE Visual Com munications and Image Processing IV volume 1199 pages 163071639 1989 S M Seitz and C M Dyer Photorealistic scene reconstruc tion by voxel coloring IJCV 3521723 1999 J Shade S Gortler LW He and R Szeliski Layered depth images In SIGGRAPH pages 2317242 1998 J Shah A nonlinear diffusion model for discontinuous dis parity and halfocclusion in stereo In CJPR pages 34410 1993 J Shao Combination of stereo motion and rendering for 3d footage display In IEEE Workshop on Stereo andMulti Baseline Vision 2001 IJCV this issue M Shimizu and M Okutorni Precise subpixel estimation on areabased matching In ICCV volume 1 pages 90797 01 1 1 a r 20 HY Shum and R Szeliski Stereo reconstruction from multiperspective panoramas In ICCV pages 14721 1999 E P Simoncelli E HAdelson andD J Heeger Probability distributions of optic ow In CJPR pages 3107315 1991 C Sun Rectangular subregioning and 3d maximum surface techniques for fast stereo matching In IEEE Work shop on Stereo andMultiBaseline Vision 2001 IJCV this issue 109 J Sun H Y Shum andN N Zheng Stereo matching using belief propagation InECCV 2002 Submitted 110 R Szeliski BayesianModeling ofUneertainty inLow Level 
Vision Kluwer Academic Publishers Boston MA 1989 111 R Szeliski Prediction error as a quality metric for motion and stereo In ICCV pages 7817788 1999 112 R Szeliski and J Coughlan Splinebased image registra tion IJCV 223l9972l8 1997 113 R Szeliski and P Golland Stereo matching With trans parency and matting IJCV 321454l 1999 Special Issue for Mair Prize papers 114 R Szeliski and G Hinton Solving randomdot stereograms using the heat equation In CVPR pages 2847288 1985 115 R Szeliski and D Scharstein Symmetric subpixel stereo matching In ECCV 2002 Submitted 116 R Szeliski and R Zabih An experimental comparison of stereo algorithms In International Workshop on VisionAl gorithms pages 1719 Kerkyra Greece 1999 Springer 117 H Tao H SaWhney and R Kumar A global matching framework for stereo computation InICCV volume I pages 5327539 2001 118 M Tekalp Digital Video Processing Prentice Hall Upper Saddle River NJ 1995 119 D Terzopoulos Regularization of inverse visual prob lems involving discontinuities IEEE TPAMI 844l34124 86 19 120 D Terzopoulos and K Fleischer Deformable models The Visual Computer 463067331 1988 121 D Terzopoulos and D Metaxas Dynamic 3D models With local and global deformations Deformable superquadrics IEEE TPAMI l3770377l4 1991 122 Q Tian and M N Huhns Algorithms for subpixel registra tion CVGH 352207233 1986 123 O Veksler E eient Graphbased Energy Minimization Methods in Computer Vision PhD thesis Comell Univer sity 1999 124 O Veksler Stereo matching by compact Windows via mini mum ratio cycle In ICCV volume I pages 5407547 2001 125 J Y A Wang and E H Adelson Layered representation for motion analysis In CWR pages 3617366 1993 126 A Witkin D Terzopoulos and M Kass Signal matching through scale space UCV ll33il44 1987 127 Y Yang A Yuille and J Lu Local global and multilevel stereo matching In CVPR pages 2747279 1993 128 A L Yuille and T Poggio A generalized ordering constraint for stereo correspondence AI Memo 777 Al Lab MIT 1984 129 R Zabih and J Wood 
EECS 841 Computer Vision, Brian Potetz, Fall 2008. Lecture 28: Optical Flow.

Optical Flow: the motion of the brightness pattern in the image. Ideally, optical flow = motion field.

Aperture Problem: viewed through a small aperture, only the component of motion perpendicular to an edge can be observed.

Optical Flow Constraint Equation. Let E(x, y, t) be image brightness. Velocities (u, v) give a displacement (u δt, v δt) between time t and time t + δt. Assume the brightness of the patch remains the same in both images:

  E(x + u δt, y + v δt, t + δt) = E(x, y, t)

Assume small motion, and take a first-order Taylor expansion of E:

  E(x + u δt, y + v δt, t + δt) ≈ E(x, y, t) + (∂E/∂x) u δt + (∂E/∂y) v δt + (∂E/∂t) δt

Divide by δt and take the limit δt → 0 to obtain the constraint equation:

  E_x u + E_y v + E_t = 0

Note: (u, v) must lie on a straight line in velocity space. We can compute E_x, E_y, E_t using gradient operators, but (u, v) cannot be found uniquely from this single constraint.

Optical Flow Constraint: intuitively, what does this constraint mean? The component of the flow in the gradient direction is determined; the component of the flow parallel to an edge is unknown.

Computing Optical Flow: Discrete Optical Flow Algorithm. Formulate the error in the optical flow constraint:

  e_c = ∬ (E_x u + E_y v + E_t)² dx dy

We need additional constraints. Smoothness constraint (as in shape from shading and stereo): usually the motion field varies smoothly in the image, so penalize departure from smoothness:

  e_s = ∬ (u_x² + u_y² + v_x² + v_y²) dx dy

Find (u, v) at each image point that minimizes e = e_s + λ e_c, where λ is a weighting factor. Consider image pixel (k, l). Departure from the smoothness constraint (discrete):

  s_kl = (1/4) [ (u_{k+1,l} − u_{k,l})² + (u_{k,l+1} − u_{k,l})² + (v_{k+1,l} − v_{k,l})² + (v_{k,l+1} − v_{k,l})² ]
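This energy can be minimized by a simple fixed-point iteration (essentially the Horn-Schunck scheme). A minimal numpy sketch; the choices of λ, the iteration count, and the wraparound boundary handling are illustrative, not part of the lecture's derivation:

```python
import numpy as np

def horn_schunck(E1, E2, lam=0.1, n_iters=100):
    """Minimal sketch: estimate flow (u, v) between two grayscale
    frames by iterating the update derived from e = e_s + lam * e_c."""
    E1 = np.asarray(E1, float)
    E2 = np.asarray(E2, float)
    # Spatio-temporal derivatives E_x, E_y, E_t via finite differences.
    Ex = np.gradient(E1, axis=1)
    Ey = np.gradient(E1, axis=0)
    Et = E2 - E1
    u = np.zeros_like(E1)
    v = np.zeros_like(E1)
    for _ in range(n_iters):
        # Local averages (4-neighbor mean) of the current flow field.
        # np.roll wraps at the image border -- crude but simple.
        ub = (np.roll(u, 1, 0) + np.roll(u, -1, 0) +
              np.roll(u, 1, 1) + np.roll(u, -1, 1)) / 4.0
        vb = (np.roll(v, 1, 0) + np.roll(v, -1, 0) +
              np.roll(v, 1, 1) + np.roll(v, -1, 1)) / 4.0
        num = Ex * ub + Ey * vb + Et
        den = 1.0 + lam * (Ex ** 2 + Ey ** 2)
        u = ub - lam * Ex * num / den
        v = vb - lam * Ey * num / den
    return u, v
```

On a pure horizontal ramp translated one pixel to the right, the constraint E_x u + E_t = 0 forces u toward 1 everywhere, which the iteration recovers.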
Error in the optical flow constraint equation at pixel (k, l):

  c_kl = (E_x u_kl + E_y v_kl + E_t)²

We seek the set {u_kl} and {v_kl} that minimize e = Σ_{k,l} (s_kl + λ c_kl).

Discrete Optical Flow Algorithm. Differentiating e with respect to u_kl and v_kl and setting the derivatives to zero gives:

  (1 + λ E_x²) u_kl + λ E_x E_y v_kl = ū_kl − λ E_x E_t
  λ E_x E_y u_kl + (1 + λ E_y²) v_kl = v̄_kl − λ E_y E_t

where ū and v̄ are averages of u and v around pixel (k, l). Example update rule (iterate to convergence):

  u_kl ← ū_kl − λ E_x (E_x ū_kl + E_y v̄_kl + E_t) / (1 + λ (E_x² + E_y²))
  v_kl ← v̄_kl − λ E_y (E_x ū_kl + E_y v̄_kl + E_t) / (1 + λ (E_x² + E_y²))

Optical Flow Results:
- Low-texture region: bad — gradients have small magnitude.
- Edges: so-so — aperture problem (all gradients point the same way).
- Highly textured region: good — large gradients pointing in many different directions.

Revisiting the Small Motion Assumption. Is this motion small enough? Probably not — it's much larger than one pixel, so second-order Taylor terms dominate. How might we solve this problem? Reduce the resolution.

Coarse-to-Fine Optical Flow Estimation. Build Gaussian pyramids of images H and I. Estimate flow at the coarsest level (e.g., u = 1.25 pixels there may correspond to u = 2.5, 5, and finally 10 pixels at full resolution). At each finer level, upsample the flow from the level above, then run the iterative optical flow algorithm to refine it.

Future Lectures: concepts useful to many class projects.
- Statistical methods in computer vision: statistical models of natural images; statistical inference approaches to energy minimization.
- Modern object recognition systems: SIFT features (Scale-Invariant Feature Transform).
- Segmentation: superpixel (over-segmentation) approaches.

Probabilistic Methods for Vision. Probabilistic approaches are useful in several ways:
- Setting parameters according to the statistics of real-world problems.
- Providing natural ways to decompose problems into subproblems (Bayes' rule, chain rule, conditional independence).
- Approaching optimization with techniques from statistical inference.

Typical Snake
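The coarse-to-fine scheme described above can be sketched as follows; this assumes a simple blur-and-downsample pyramid and takes any single-level flow routine as a parameter. The 1-2-1 blur, the factor-of-2 scale change, and the np.kron upsampling are illustrative stand-ins, not the lecture's exact choices:

```python
import numpy as np

def gaussian_pyramid(img, levels=3):
    """Blur-and-downsample pyramid, finest level first."""
    pyr = [np.asarray(img, float)]
    for _ in range(levels - 1):
        a = pyr[-1]
        # Separable 1-2-1 blur (wraparound boundaries for brevity).
        b = (np.roll(a, 1, 0) + 2 * a + np.roll(a, -1, 0)) / 4.0
        b = (np.roll(b, 1, 1) + 2 * b + np.roll(b, -1, 1)) / 4.0
        pyr.append(b[::2, ::2])
    return pyr

def coarse_to_fine_flow(H, I, refine, levels=3):
    """Estimate flow coarse-to-fine.  `refine(H_l, I_l, u0, v0)` is any
    single-level routine that improves an initial flow estimate."""
    pH, pI = gaussian_pyramid(H, levels), gaussian_pyramid(I, levels)
    u = np.zeros_like(pH[-1])
    v = np.zeros_like(pH[-1])
    for H_l, I_l in zip(pH[::-1], pI[::-1]):     # coarsest -> finest
        if u.shape != H_l.shape:
            # Upsample the flow field and double its magnitude,
            # since one coarse pixel spans two fine pixels.
            u = 2.0 * np.kron(u, np.ones((2, 2)))[:H_l.shape[0], :H_l.shape[1]]
            v = 2.0 * np.kron(v, np.ones((2, 2)))[:H_l.shape[0], :H_l.shape[1]]
        u, v = refine(H_l, I_l, u, v)
    return u, v
```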
Energies:
- Elasticity: E_elastic(v) = ∫₀¹ α(s) |v'(s)|² ds
- Stiffness: E_stiffness(v) = ∫₀¹ β(s) |v''(s)|² ds
- Edge proximity: E_edge(v) = −∫₀¹ |∇I(x(s), y(s))|² ds
- User interaction: E_user(v) = ∫₀¹ User(x(s), y(s)) ds

Smoothness Constraint for SFS. In nature, objects are cohesive and typically have smooth surfaces. The smoothness constraint relates surface normals of neighboring surface points: minimize E_smoothness, penalizing rapid changes in surface normals over the image.

Modern Stereo Algorithms. Often use color-based adaptive windows, and clean up results using a global energy function, i.e., minimize

  E(d) = E_data(d) + λ E_smooth(d)
  E_data(d) = Σ_{x,y} C( I_left(x, y), I_right(x − d(x, y), y) )
  E_smooth(d) = Σ_{x,y} (d(x+1, y) − d(x, y))² + (d(x, y+1) − d(x, y))²

Smoothness Priors. The quadratic (Gaussian) penalty above over-smooths depth discontinuities; a robust penalty ρ (e.g., Lorentzian) preserves them:

  E_smooth(d) = Σ_{x,y} ρ(d(x+1, y) − d(x, y)) + ρ(d(x, y+1) − d(x, y))

Every Energy Function has a Probabilistic Interpretation. Recall the energy function we minimized for stereo:

  E(d; I_left, I_right) = E_data(d; I_left, I_right) + λ E_smooth(d)

  P(d | I_left, I_right) = (1/Z) exp( −E(d; I_left, I_right) )
                         = (1/Z) exp( −E_data(d; I_left, I_right) ) exp( −λ E_smooth(d) )

Probability: A Review of the Basics. Points to cover: large joint discrete distributions; marginals; conditionals; Bayes' rule; factorizing large probability distributions.

EECS 841 Computer Vision, Brian Potetz, Fall 2008. Lecture 12: Deformable Templates.
Suggested Reading:
- Yuille, Hallinan, Cohen, "Feature Extraction from Faces Using Deformable Templates," IJCV 1992.
- Felzenszwalb, "Representation and Detection of Deformable Shapes," PAMI 2005.

Comparing the Two Techniques: Energy
Minimization allows for more flexible energy functions; A* search is guaranteed to find a globally optimal solution, so long as its conditions are met. Energy minimization can get stuck in local minima ("holes"), and energy-minimization snakes are not attracted to curves that are far away.

Facial Tracking Using Snakes.

Part-based models: pictorial structures, constellations of parts. Rigid parts arranged in a deformable configuration; each part represents local visual properties; the spatial configuration is captured by a statistical model or spring-like connections between parts.

Yuille's Deformable Eye Template (the first eye template): a parameterized model with a circle of radius r for the iris and pupil, regions for the whites with peak points p1 and p2, and two parabolic arcs bounding the eye (heights a and c, width 2b), all positioned relative to the eye center.

Deformable Template for Matching. Matching based on snakes and gradient descent requires an initial guess that is close to the real object, can get stuck in local minima, and is not attracted to distant features. Triangulated deformable templates address this: the template is a triangulated polygon that deforms to match the image.

Efficient Energy Minimization. The energy function has two parts:
- A data term that encourages outer contours to lie on image edges.
- A prior term that inhibits excessive deformation.

The total energy is a sum over triangles:

  E = Σ_T [ EdgeCost(T₁, T₂, T₃) + WarpCost(T₁, T₂, T₃) ]

Efficient Energy Minimization: Total energy
is the sum of energies at each triangle. Every planar graph has a dual, and the dual of a template should be singly connected (a tree). Dynamic programming over this tree can then be used to minimize the energy in O(nG³) time, where G is the number of possible vertex locations and n is the number of triangles.

Using Gradient Descent. Non-optimal local minima are a serious problem: a poor initial estimate leads matching astray.

Object recognition: a simple nearest-neighbor approach — find the minimum deformation necessary to transform the object into one of the stored examples (object models). Using a good shared deformation model, we get high classification accuracy with few training examples.

Discussion:
- Deformable templates give a simple and compact representation of highly variable objects.
- We can use a generic deformation model to represent large families of objects using a few examples from each class.
- Efficient matching algorithms exploit structure in classes of models, can find the optimal match of model to image quickly, and are robust to occlusion, noise, etc.

EECS 841 Computer Vision, Brian Potetz, Fall 2008. Lecture 20: Photometric Stereo & Shape-from-Shading.
Suggested Reading:
- Photometric stereo: Forsyth & Ponce, "Computer Vision: A Modern Approach," Section 5.4.
- Shape from shading: Zhang, Tsai, Cryer, and Shah, "Shape from shading: A survey," IEEE Trans. Pattern Analysis and Machine Intelligence 21(8):690-706, 1999.
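The dynamic-programming idea used above can be illustrated on a simplified problem: minimizing a chain-structured energy of unary costs plus pairwise costs. The costs here are hypothetical, and the real algorithm runs the same kind of recurrence over the template's dual tree rather than a chain:

```python
import numpy as np

def chain_dp(unary, pair):
    """Minimize sum_i unary[i, x_i] + sum_i pair[x_i, x_{i+1}] over all
    label sequences by dynamic programming.  unary is (n, G); pair is
    (G, G).  Runs in O(n G^2), versus G^n for brute force."""
    n, G = unary.shape
    cost = unary[0].copy()                  # best cost ending at each label
    back = np.zeros((n, G), dtype=int)      # backpointers for recovery
    for i in range(1, n):
        total = cost[:, None] + pair        # total[a, b]: prev=a, next=b
        back[i] = np.argmin(total, axis=0)
        cost = total[back[i], np.arange(G)] + unary[i]
    labels = [int(np.argmin(cost))]
    for i in range(n - 1, 0, -1):           # trace backpointers
        labels.append(int(back[i][labels[-1]]))
    return labels[::-1], float(cost.min())
```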
Computing Surface Normals. Suppose the points (x, y, z) of the 3D scene are described by z = f(x, y). Define the slopes of the tangent plane:

  p = ∂z/∂x,  q = ∂z/∂y

The tangent vectors (Δx, 0, p Δx) ∝ (1, 0, p) and (0, Δy, q Δy) ∝ (0, 1, q) span the tangent plane, and the surface normal is perpendicular to it:

  (1, 0, p) × (0, 1, q) = (−p, −q, 1)

Note that this is not a unit vector.

Reflectance Map. For a fixed lighting arrangement, viewing direction, and surface BRDF, the reflectance map R(p, q) describes the pixel intensity given a particular surface normal. For a surface of constant albedo and BRDF, if the illumination conditions don't vary across the surface,

  L(x, y) = R( p(x, y), q(x, y) )

where L is pixel intensity. If the albedo ρ varies across the surface but the BRDF is otherwise static, we can write

  L(x, y) = ρ(x, y) R( p(x, y), q(x, y) )

Lambertian Reflectance Map. The Lambertian equation is L = ρ cos θ_i. Writing the light direction as (−p_s, −q_s, 1)/√(1 + p_s² + q_s²), the reflectance map is

  R(p, q) = (1 + p p_s + q q_s) / ( √(1 + p² + q²) √(1 + p_s² + q_s²) )

For a Lambertian surface, the iso-brightness contours R(p, q) = c are nested conic sections in the (p, q) plane. R(p, q) is maximal at (p, q) = (p_s, q_s), and along the contour 1 + p p_s + q q_s = 0 the illumination is at grazing angle (θ_i = 90°).

Non-Lambertian Reflectance Map. For glossy surfaces (e.g., the Torrance-Sparrow reflectance model), R(p, q) = diffuse term + specular term: the diffuse peak sits at (p_s, q_s) and a specular lobe appears near the mirror direction.

Photometric Stereo. What if I have several images of the same scene, from the same viewing angle, lit by different lighting conditions? Assume the BRDF is known for every point in the scene (generally, the BRDF is constant throughout the scene), and the illumination conditions are known throughout the scene (generally this means: no shadows, known lighting directions, and all incoming light rays parallel).

For a Lambertian surface with albedo ρ, each image i gives one linear equation per pixel:

  L_i = ρ ŝ_i · n̂ = ŝ_i · g,  where g = ρ n̂

With three light sources, g = S⁻¹ L, where the rows of S are the light directions ŝ_i. Solving the equations with more than three light sources: get better results by using more lighting angles. Stack one equation per light:

  L = S g,  with L of size N×1, S of size N×3, and g of size 3×1.
Multiplying by Sᵀ gives the normal equations Sᵀ L = (Sᵀ S) g, so

  g = (Sᵀ S)⁻¹ Sᵀ L

(the Moore-Penrose pseudoinverse). Then solve for ρ and n̂ as before: ρ = |g| and n̂ = g / |g|. For color images there is one such set of equations per color channel, L_R = ρ_R S n̂, L_G = ρ_G S n̂, L_B = ρ_B S n̂; the three channels share the same normal, so they can be combined when solving for n̂.

Depth from Normals, Assuming Orthographic Projection. The recovered normals give the gradient of the depth function:

  ∇f(x, y) = ( p(x, y), q(x, y) ),  z = f(x, y) + const

Discrete Approximations of Differentiation: spatial derivatives can be taken with a derivative-of-Gaussian filter, or with finite differences:

  ∂z/∂x ≈ z(x + 1, y) − z(x, y)

Substituting the known (p, q) into these equations, each normal gives us two linear constraints on z (one for ∂z/∂x, one for ∂z/∂y). Together, all constraints form a matrix equation A z = (p; q) that is both overconstrained (more equations than unknowns) and underconstrained (it does not specify absolute depth). The Moore-Penrose pseudoinverse, (Aᵀ A)⁻¹ Aᵀ in the overconstrained case, can also solve underconstrained linear systems: among all x satisfying A x = b, it returns the x of least magnitude.

Limitations. Big problems: doesn't work for shiny or semi-translucent things; shadows; interreflections. Smaller problems: camera and lights have to be distant; calibration requirements (measure light source directions and intensities, and the camera response function).

Original Images; Results: Shape. (Figures: a shallow reconstruction showing the effect of interreflections, and an accurate reconstruction after removing interreflections.)

Results: Albedo.
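A minimal numpy sketch of the least-squares recovery above, assuming known unit light directions, a Lambertian surface, and no shadows; the synthetic setup in the usage check is illustrative:

```python
import numpy as np

def photometric_stereo(L, S):
    """Recover per-pixel albedo and unit normals.
    L: (N, H, W) stack of images under N known distant lights.
    S: (N, 3) matrix whose rows are unit light-source directions.
    Solves L = S g per pixel, where g = albedo * normal, using the
    least-squares solution g = (S^T S)^-1 S^T L."""
    N, H, W = L.shape
    # One lstsq call solves the 3-unknown system for every pixel at once.
    g, *_ = np.linalg.lstsq(S, L.reshape(N, H * W), rcond=None)
    g = g.reshape(3, H, W)
    albedo = np.linalg.norm(g, axis=0)
    normals = g / np.maximum(albedo, 1e-12)   # avoid divide-by-zero
    return albedo, normals
```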
No Shading Information (albedo only). Original Images; Results: Albedo; Results: Shape. The full pipeline: estimate light source directions; compute surface normals; compute albedo values; estimate depth from surface normals; relight the object (with the original texture, or with uniform albedo).

EECS 841 Computer Vision, Brian Potetz, Fall 2008. Lecture 34: Graph Cuts & Gibbs Sampling.
Suggested Reading:
- Graph cuts: Boykov, Veksler & Zabih, "Fast approximate energy minimization via graph cuts," ICCV 2001.
- Gibbs sampling.

Common Methods of Statistical Inference in Computer Vision:
- Gradient descent: when P(X) is smooth.
- Belief propagation: powerful, but can be slow; can fail for graphs with tight loops.
- Graph cuts: restrictive limitations on allowable potential functions.
- Sampling: can be slow; hard to detect convergence.

Gradient Descent for Computer Vision. Optimization problems in computer vision are often highly nonlinear and very large. Be aware of smarter variations of gradient descent: conjugate gradient descent, L-BFGS, genetic algorithms, simulated annealing.

Conjugate Gradient Descent. For classical gradient descent, each search direction is orthogonal to the previous one, so the search can zig-zag while running along narrow channels. Instead, conjugate each search direction so that it is orthogonal in "Mahalanobis space" (with respect to the local curvature).

Recall Newton's Method. Newton's method finds f(x) = 0; it can be used to find maxima of p(X), which occur where ∂p/∂X = 0. Utilizing the second derivative of p gives faster convergence and better results, but requires computing the D×D Hessian matrix of p(X), where D is the dimensionality
of X — intractable for large D.

L-BFGS. L-BFGS approximates the Hessian from the last M search directions (M is usually around 8-20). It does not need to explicitly construct the approximate Hessian: it can compute each search direction using only the gradient and the last M directions. Many software packages exist for L-BFGS.

Graph Cuts: using the max-flow / min-cut theorem to optimize MRFs.

Max Flow. Given a weighted directed graph with a source and a sink, assign flow values to each edge so that the total flow from source to sink is maximized, subject to: (1) at each vertex besides the source and sink, flow in = flow out; and (2) 0 ≤ flow along an edge ≤ edge capacity. Compute max flow by searching for augmenting paths (simple augmenting-path algorithms exist; modern methods are even faster).

Max-Flow / Min-Cut. Given any flow F and any cut S, the flow across the cut equals the flow into the target node t. Flow through pipes cannot exceed capacity, so Flow(F) ≤ Capacity(S); hence min cut ≥ max flow.

Now suppose flow F is maximal. There must be some cut S composed only of (1) full
pipes moving toward T and (2) empty pipes moving away from T — because otherwise I could construct an augmenting path and F would not be optimal. The capacity of such a cut S equals the flow of F, and since min cut ≥ max flow, we conclude min cut = max flow.

Graph Cuts. Build a graph whose edge weights encode: the cost of pixel p not belonging to class α (edge to one terminal), the cost of pixel p not belonging to class β (edge to the other terminal), and the cost of neighboring pixels p and q not belonging to the same class (edge between them). A minimum cut then yields a minimum-energy labeling.

Graph-Based Segmentation: Felzenszwalb & Huttenlocher, "Efficient Graph-Based Image Segmentation," IJCV 2004.

Gibbs Sampling. Goal: given a multidimensional distribution, try to draw samples from it. If I can accumulate a large database of samples, I can easily find the most likely sample, or the expected value of some variable. Scheme: guess X₁ using the marginal P(X₁); then guess X₂ using the conditional P(X₂ | x₁); then guess X₃ using the conditional P(X₃ | x₁, x₂); etc. In practice, cycle through the variables repeatedly:

1. Initialize x^(1) = (x_1^(1), ..., x_D^(1)).
2. For i = 1, ..., N: for j = 1, ..., D, sample
     x_j^(i+1) ~ P( x_j | x_1^(i+1), ..., x_{j-1}^(i+1), x_{j+1}^(i), ..., x_D^(i) )
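A minimal sketch of this loop for a toy two-variable discrete distribution; the joint table, iteration count, and burn-in length in the usage check are illustrative:

```python
import numpy as np

def gibbs_sample(joint, n_iters=20000, seed=0):
    """Gibbs-sample a two-variable discrete distribution.
    joint[a, b] = P(X1 = a, X2 = b).  Each step resamples one variable
    from its conditional given the current value of the other."""
    rng = np.random.default_rng(seed)
    x1, x2 = 0, 0                                # step 1: initialize x^(1)
    samples = []
    for _ in range(n_iters):                     # step 2: cycle variables
        p = joint[:, x2] / joint[:, x2].sum()    # P(X1 | X2 = x2)
        x1 = rng.choice(len(p), p=p)
        p = joint[x1, :] / joint[x1, :].sum()    # P(X2 | X1 = x1)
        x2 = rng.choice(len(p), p=p)
        samples.append((x1, x2))
    return np.array(samples)
```

After burn-in, the empirical frequencies of the chain approach the target joint distribution, which is exactly the property the lecture uses: a large database of samples stands in for the distribution itself.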
— sampling each variable in turn, conditioned on the most recent values of all the others.

EECS 841 Computer Vision, Brian Potetz, Fall 2008. Lecture 3: Image Processing & Filtering.
Suggested Reading:
- Forsyth & Ponce, "Computer Vision: A Modern Approach," Chapter 7.
- An excellent reference for Fourier analysis is Ronald Bracewell, "The Fourier Transform and Its Applications."

Convolution:

  h(x, y) = (f * g)(x, y) = Σ_{i,j} f(i, j) g(x − i, y − j)

(Example: blurring with a 16×16 box filter in which each value is 1/256.)

Properties of Convolution. For a fixed filter h(x, y), convolution is a linear shift-invariant system (LSIS): linear, h * (α g₁ + β g₂) = α (h * g₁) + β (h * g₂), and shift-invariant (shifting the input shifts the output). Every LSIS is a convolution operator. As a binary operator on images, convolution is commutative (f * g = g * f), associative (f * (g * h) = (f * g) * h), distributes over addition ((f + g) * h = f * h + g * h), and, where derivatives exist,

  ∂/∂x (f * g) = (∂f/∂x) * g

Fourier Transforms:

  F(ω) = ∫ f(t) e^(−2πiωt) dt

Don't forget e^(iθ) = cos θ + i sin θ, so

  F[f(t)] = ∫ f(t) cos(2πωt) dt − i ∫ f(t) sin(2πωt) dt

Equivalently, F(ω) = |F(ω)| e^(iφ(ω)): each frequency component has an amplitude and a phase.

2D Continuous Fourier Transform:

  F(u, v) = ∬ f(x, y) e^(−2πi(ux + vy)) dx dy
  f(x, y) = F⁻¹[F(u, v)] = ∬ F(u, v) e^(2πi(ux + vy)) du dv

(Figures: images f(x, y) alongside the log of their power spectra |F(u, v)|².)

Examples of 1D Fourier Transforms. (Figures: example signals and their transforms.)

Properties of Fourier Transforms. F[f(t)] = F(ω) = ∫ f(t) e^(−2πiωt) dt. Linearity:
Properties of Fourier Transforms

- Linearity: F[α f(t) + β g(t)] = α F(ω) + β G(ω)
- Similarity: F[f(at)] = (1/|a|) F(ω/a)
- Shift: F[f(t − a)] = e^{−2πiaω} F(ω)
- Transforms of real functions are Hermitian: F(−ω) = F*(ω)
- Differentiation: F[f′(t)] = 2πiω F(ω)
- Convolution: F[f * g] = F(ω) G(ω)

(Figure: a table of example 1D Fourier transform pairs: the unit impulse, box, unit step, Gaussian, etc.)

The convolution property gives a second route to filtering: instead of convolving in the spatial domain, take the Fourier transform of the image and of the filter, multiply the two transforms, and apply the inverse Fourier transform. (In the accompanying figures, what is actually shown is the log of the power spectrum.)
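That equivalence is easy to verify numerically; a sketch using NumPy's FFT (for sampled signals the theorem holds exactly for circular convolution):

```python
import numpy as np

rng = np.random.default_rng(0)
f = rng.standard_normal(64)
g = rng.standard_normal(64)

# Circular convolution evaluated directly from the definition...
direct = np.array([sum(f[(n - k) % 64] * g[k] for k in range(64)) for n in range(64)])
# ...and via the convolution theorem: transform, multiply, invert.
via_fft = np.fft.ifft(np.fft.fft(f) * np.fft.fft(g)).real

print(np.allclose(direct, via_fft))  # True
```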
Gaussian

Gaussian function:

g(x) = 1/(σ√(2π)) e^{−x²/(2σ²)}
g(x, y) = 1/(2πσ²) e^{−(x² + y²)/(2σ²)}

The Fourier transform of a Gaussian is a Gaussian:

G(u, v) = e^{−2π²σ²(u² + v²)}

The product of two Gaussians is a Gaussian, and the convolution of two Gaussians is a Gaussian.

Deconvolution

In the Fourier domain, convolution turns into multiplication, so inverse filtering can be performed by division in the Fourier domain. (Figures: the power spectrum of a Gaussian filter and its reciprocal.) But inverse filtering amplifies noise: wherever the filter's transform is nearly zero, the division blows up whatever noise is present at those frequencies.

Wiener Filtering

Model the observed output signal as

y(t) = f(t) * x(t) + n(t)

with a known convolution filter f, an unknown input signal x, and noise n.
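A minimal sketch of both behaviors: plain inverse filtering versus a Wiener-style correction that attenuates each frequency by |Y(ω)|²/(|Y(ω)|² + |N(ω)|²). The box signal, blur width, and single-frequency noise here are invented for the demo:

```python
import numpy as np

n = 256
t = np.arange(n)
x = np.where((t > 60) & (t < 120), 1.0, 0.0)      # unknown input signal (a box)
f = np.exp(-0.5 * (t - n // 2) ** 2)              # known Gaussian blur filter, sigma = 1
f /= f.sum()
F = np.fft.fft(np.roll(f, -n // 2))               # filter spectrum, centred at zero

noise = 0.01 * np.cos(2 * np.pi * 120 * t / n)    # noise at one high frequency
y = np.fft.ifft(np.fft.fft(x) * F).real + noise   # observed y = f * x + n

Y = np.fft.fft(y)
N2 = np.abs(np.fft.fft(noise)) ** 2               # assumed-known noise power |N(w)|^2

inverse = np.fft.ifft(Y / F).real                 # inverse filtering amplifies the noise
wiener = np.fft.ifft((Y / F) * np.abs(Y) ** 2
                     / (np.abs(Y) ** 2 + N2 + 1e-12)).real  # epsilon avoids 0/0

print(np.linalg.norm(wiener - x) < np.linalg.norm(inverse - x))  # True
```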
In the Fourier domain: Y(ω) = F(ω) X(ω) + N(ω). The Wiener deconvolution estimate is

X̂(ω) = (Y(ω) / F(ω)) · |Y(ω)|² / (|Y(ω)|² + |N(ω)|²)

i.e. divide as in inverse filtering, but attenuate each frequency in proportion to how noise-dominated the observation is there. Assume the noise is small; |N(ω)|² is often assumed constant. (Figures: inverse deconvolution amplifies the noise, Wiener deconvolution suppresses it; also shown, a sharpening example and a schedule reminder.)

EECS 841 Computer Vision Brian Potetz Fall 2008 Lecture 36 SIFT Features

Suggested Reading: David Lowe, "Object Recognition from Local Scale-Invariant Features", 1999.

Common Methods of Statistical Inference in Computer Vision

- Gradient descent: works when p(x) is smooth
- Belief propagation: powerful, but can be slow and can fail for graphs with tight loops
- Graph cuts: restrictive limitations on allowable potential functions
- Sampling: can be slow; hard to detect convergence

Gibbs Sampling

Goal: given a multidimensional distribution, draw samples from it. If we can accumulate a large database of samples, we can easily find the most likely sample or the expected value of some variable.

Procedure: guess x1 using the marginal P(x1); then guess x2 using the conditional P(x2 | x1); then guess x3 using the conditional P(x3 | x1, x2); and so on, repeatedly resampling each variable from its conditional given the current values of all the others.

Gibbs sampling is especially suited to graphical models, where the conditional independencies are known: in a Bayes net, P(xi | all other x) depends only on xi's parents, children, and co-parents; in an MRF, it depends only on xi's immediate neighbors.
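A toy sketch of the Gibbs procedure on a two-variable distribution whose conditionals are known in closed form (a bivariate normal with correlation 0.8; the example is mine, not from the lecture):

```python
import numpy as np

# Gibbs sampling from a bivariate normal with correlation rho = 0.8:
# each conditional P(x_i | x_other) is a 1D normal, so it is easy to sample.
rho = 0.8
rng = np.random.default_rng(0)
x1 = x2 = 0.0
samples = []
for _ in range(20000):
    x1 = rng.normal(rho * x2, np.sqrt(1 - rho ** 2))  # sample from P(x1 | x2)
    x2 = rng.normal(rho * x1, np.sqrt(1 - rho ** 2))  # sample from P(x2 | x1)
    samples.append((x1, x2))
s = np.array(samples[1000:])                          # discard burn-in

print(round(float(np.corrcoef(s.T)[0, 1]), 1))        # ≈ 0.8
```

The accumulated samples recover the target distribution's statistics, exactly as the lecture suggests for finding expected values.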
Approaches Considered So Far

- Normalized correlation: slow; brittle; sensitive to changes in illumination, rotation, viewpoint, pose
- PCA: slow; brittle; sensitive to changes in illumination, rotation, viewpoint, pose
- Deformable templates: depend only (or primarily) on edge information alone; invariant to rotation in the image plane, but not viewpoint invariant

Bag of Words

A philosophy borrowed from Natural Language Processing: grammar is complicated and can be more trouble than it's worth. Used especially in text document indexing (i.e. Google): store the prominent, unusual words. The analog for image understanding: keep track of the prominent, unusual features of an image or region; the spatial relationships between those features are more trouble than they're worth. (Figure: image features matched against a feature dictionary.) Some features may encode negative space or conjunctions.

Let's define image features that are highly distinctive (discriminative), but also as insensitive to rotation and pose as possible.

SIFT Features

1. Locate prominent, unusual image features. Locate features in scale-space, so each feature has a location and a spatial scale (size).
2. Identify the overall orientation of each feature.
3. Describe the appearance of each feature.

The Laplacian Pyramid

(Figure: the Gaussian pyramid and the Laplacian pyramid, with their frequency compositions.) The Laplacian pyramid is a band-pass representation, vice the low-pass representation of the Gaussian pyramid.
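Step 1, finding extrema across both space and scale in a band-pass (difference-of-Gaussian) stack, can be sketched in a few lines. This simplified version keeps every level at full resolution and omits interpolation and the low-contrast filter; the blob image, sizes, and sigmas are invented for the demo:

```python
import numpy as np

def gaussian_blur(im, sigma):
    r = int(3 * sigma) + 1
    k = np.exp(-np.arange(-r, r + 1) ** 2 / (2 * sigma ** 2))
    k /= k.sum()
    out = np.apply_along_axis(lambda m: np.convolve(m, k, mode="same"), 0, im)
    return np.apply_along_axis(lambda m: np.convolve(m, k, mode="same"), 1, out)

# Synthetic image: a single bright blob centred at (20, 30)
yy, xx = np.mgrid[0:48, 0:64]
im = np.exp(-((yy - 20) ** 2 + (xx - 30) ** 2) / (2 * 3.0 ** 2))

# Difference-of-Gaussian "scale space": differences of successive blurs
sigmas = [1.0, 1.6, 2.56, 4.1]
blurred = [gaussian_blur(im, s) for s in sigmas]
dog = np.stack([blurred[i + 1] - blurred[i] for i in range(3)])

# A keypoint is an extremum over its 8 spatial neighbours and adjacent scales;
# for this single-blob image the global extremum already tells the story.
s, y, x = np.unravel_index(np.argmin(dog), dog.shape)
print((int(y), int(x)))  # (20, 30): the keypoint lands on the blob centre
```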
Difference of Gaussians (DoG)

The Laplacian of Gaussian can be approximated by the difference between two different Gaussians. (Figure 2-16: the best engineering approximation to ∇²G, shown by the continuous line, is obtained by using the difference of two Gaussians (DoG), which occurs when the ratio of the inhibitory to excitatory space constants is about 1:1.6; the DoG is shown dotted, and the two profiles are very similar. Reprinted by permission from D. Marr and E. Hildreth, "Theory of edge detection", Proc. R. Soc. Lond. B 204, pp. 301-328.)

1. Locate SIFT Features

1) Find extrema in the Laplacian (DoG) pyramid, i.e. points of greater or lesser value than their 8 spatial neighbors and their 18 neighbors at the two adjacent scales.
2) Interpolate to find the exact location.
3) Remove points with low contrast.

2. Choose Feature Orientation

1) Compute the gradient direction at each pixel within a Gaussian window.
2) Take the histogram of gradient orientations, weighting each contribution by (1) the Gaussian window and (2) the gradient magnitude.
3) Take the peak of the histogram.

SIFT Locations and Orientations are Robust

Image transformation            | Match % | Match+Ori %
A. Increase contrast by 1.2     | 89.0    | 86.6
B. Decrease intensity by 0.2    | 88.5    | 85.9
C. Rotate by 20 degrees         | 85.4    | 81.0
D. Scale by 0.7                 | 85.1    | 80.3
E. Stretch by 1.2               | 83.5    | 76.1
F. Stretch by 1.5               | 77.7    | 65.0
G. Add 10% pixel noise          | 90.3    | 88.4
H. All of A, B, C, D, E, G      | 78.6    | 71.8

Figure 2: For various image transformations applied to a sample of 20 images, this table gives the percent of keys that are found at matching locations and scales (Match %) and that also match in orientation (Match+Ori %).

EECS 841 Computer Vision Brian Potetz Fall 2008 Lecture 32 Graphical Models in Computer Vision

Typical Snake Energies

- Elasticity: E_elastic(v) = α ∫₀¹ |v′(s)|² ds
- Stiffness: E_stiffness(v) = β ∫₀¹ |v″(s)|² ds
- Edge proximity: E_edge(v) = −∫₀¹ |∇I(x(s), y(s))|² ds
- User interaction: E_user(v) = ∫₀¹ User(x(s), y(s)) ds

Snakes as a Bayes Net

Discretize the snake into control points v1, ..., vN and factor its probability as a chain, P(v1, ..., vN) = P(v1) P(v2 | v1) P(v3 | v2) ..., with P(vi+1 | vi) ∝ e^{−C(vi, vi+1)}, where

C(v1, v2) = C_edge(v1, v2) + C_user(v1, v2) + C_elastic(v1, v2)
C_edge(v1, v2) = −∫₀¹ |∇I((1 − t) v1 + t v2)|² dt
C_user(v1, v2) = ∫₀¹ User((1 − t) v1 + t v2) dt
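Because a chain of pairwise costs C(v1, v2) + C(v2, v3) + ... can be minimized exactly by folding in one minimization at a time, the optimal snake over discretized candidate positions comes out of a simple dynamic program. A sketch with made-up discrete states and random pairwise costs, checked against brute force:

```python
import numpy as np

def chain_dp(costs):
    """Minimise sum_i C_i(v_i, v_{i+1}) over a chain of discrete variables:
    sweep backwards, folding in one min at a time, then read the path forward."""
    q = np.zeros(costs[-1].shape[1])            # running Q over the last variable
    back = []
    for C in reversed(costs):                   # Q'(v_i) = min_{v_{i+1}} [C + Q(v_{i+1})]
        tot = C + q[None, :]
        back.append(tot.argmin(axis=1))         # remember the best inner choice
        q = tot.min(axis=1)
    path = [int(q.argmin())]                    # best first variable...
    for b in reversed(back):                    # ...then follow the stored argmins
        path.append(int(b[path[-1]]))
    return float(q.min()), path

rng = np.random.default_rng(2)
costs = [rng.integers(0, 10, (4, 4)).astype(float) for _ in range(3)]
best, path = chain_dp(costs)

# Check against brute force over all 4^4 assignments
brute = min(sum(costs[i][p[i], p[i + 1]] for i in range(3)) for p in np.ndindex(4, 4, 4, 4))
print(best == brute)  # True
```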
C_elastic(v1, v2) = |v2 − v1|²

(Figure: the chain of control points drawn as a Bayes net, e.g. P(v4 | v3) ∝ e^{−C(v3, v4)}.)

Solving Snakes via Dynamic Programming

C(v1, v2, v3, v4) = C(v1, v2) + C(v2, v3) + C(v3, v4)

min_{v1} min_{v2} min_{v3} min_{v4} C(v1, v2, v3, v4)
  = min_{v1} min_{v2} [ C(v1, v2) + min_{v3} [ C(v2, v3) + min_{v4} C(v3, v4) ] ]

Work from the inside out: min_{v4} C(v3, v4) is a function Q(v3); then min_{v3} [C(v2, v3) + Q(v3)] is a function Q′(v2); and so on. For each intermediate function, store both the minimal cost and the best inner variable for each value of the outer variable; then go back down the chain to read off each optimal vi. This technique works for all non-cyclic Bayes nets.

Belief Propagation

You have just derived a powerful statistical inference technique called Belief Propagation. Belief Propagation works by passing messages up a tree from the leaves and then down from the root; each message m_ij(x_j) summarizes, for node j, the best compatibility achievable in the subtree behind node i. Before we get into the details, let's generalize our models of probability.

Modern Stereo Algorithms

- Often use color-based adaptive windows
- Clean up results using a global energy function

Markov Random Fields

MRFs represent probabilities as products of potential functions over neighboring variables, P(d) ∝ ∏_{neighbors i,j} Φ(d_i, d_j). Unfortunately, learning Φ from data is no longer as trivial as taking histograms and measuring P(child | parents): training MRFs is a major topic of machine learning, and many applications hand-tune the parameters Φ.
i.e., minimize, over the disparity d as a function of (x, y),

E_data(d) + E_smooth(d)
E_data(d) = Σ_{x,y} C( I_left(x + d_{xy}, y), I_right(x, y) )
E_smooth(d) = Σ_{neighboring pixels} ρ( d_{xy} − d_{x′y′} )

Markov Random Fields

MRFs are often used as statistical priors for images or range images: stereo, shape-from-shading, optical flow, denoising, super-resolution, etc.

P(image | data) ∝ P(data | image) · P(image)
(posterior distribution ∝ data likelihood × prior probability)

EECS 841 Computer Vision Brian Potetz Fall 2008 Lecture 7 Edge Detection

Suggested Reading: Forsyth & Ponce, "Computer Vision: A Modern Approach", Chapter 8.

Derivatives Amplify Noise

(Figure: the DFT of a horizontal derivative filter; its power spectrum grows with frequency, so differentiation boosts high-frequency noise.)

What is Noise?

Unwanted image acquisition or transmission artifacts. Common types of noise:
- Film scratches & dust
- Salt & pepper noise
- Impulse noise
- Additive sensor noise (the noise at each pixel is independent)

White noise: the noise value at each pixel is generated independently, so the autocorrelation function R(Δx, Δy) = E[ n(x, y) n(x + Δx, y + Δy) ] vanishes except at (Δx, Δy) = (0, 0).

Smoothing Before Differentiating

Since derivatives amplify noise, smooth with a Gaussian before (or while) differentiating.

Localizing Edges

Thresholding |∇f|² can result in wide, poorly localized edges, especially for high σ.

Marr-Hildreth Edge Detection

Solution: instead of thresholding the first derivative, find zero-crossings of the second derivative. The common 2nd-order derivative operator is the Laplacian,

Δf = ∂²f/∂x² + ∂²f/∂y²

applied after Gaussian smoothing: the Laplacian of Gaussian.
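The Marr-Hildreth recipe (Gaussian blur, Laplacian, mark zero-crossings) can be sketched on a synthetic step edge; the blur width, test image, and neighbor-sign test below are illustrative choices:

```python
import numpy as np

def log_zero_crossings(im, sigma=2.0):
    """Marr-Hildreth sketch: Gaussian blur, discrete Laplacian, mark sign changes."""
    r = int(3 * sigma) + 1
    k = np.exp(-np.arange(-r, r + 1) ** 2 / (2 * sigma ** 2))
    k /= k.sum()
    g = np.apply_along_axis(lambda m: np.convolve(m, k, mode="same"), 0, im.astype(float))
    g = np.apply_along_axis(lambda m: np.convolve(m, k, mode="same"), 1, g)
    lap = (np.roll(g, 1, 0) + np.roll(g, -1, 0) +
           np.roll(g, 1, 1) + np.roll(g, -1, 1) - 4 * g)   # d2/dx2 + d2/dy2
    # a zero crossing: the sign flips toward the right or lower neighbour
    return ((np.sign(lap) * np.sign(np.roll(lap, -1, 1)) < 0) |
            (np.sign(lap) * np.sign(np.roll(lap, -1, 0)) < 0))

# A vertical step edge between columns 15 and 16
im = np.zeros((32, 32))
im[:, 16:] = 1.0
zc = log_zero_crossings(im)
print(np.flatnonzero(zc[16, :30]))  # the crossing sits on the step, at column 15
```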
Division in the Fourier difference between two different Gaussians domain 0 Inverse Filtering Ampli es Noise Noise Weiner Filtering OutputSignaI yt ft gtIlt rt nt Know Con olution Filter UnknownlnputSignal Yw FwXw Nw A 1 Yw2 X0 W Fltwgt Ywl2 Nwl2 Figure 2 16 The best engineering approximation to VZG shown by the contin uous line obtained by using the difference of two Gaussians DOG occurs when the ratio of the inhibitory to excitatory space constraints is about 1216 The DOG is shown here dotted The two pro les are very similar Reprinted by permission from D Marr and E Hildreth Theory of edge detection Proc R Soc Lond B 204 pp 301 528 Inverse Fourier Transform Weiner Deconvolution MarrHildreth Edge Detection Solution Instead of thresholding the rst derivative MarrHildreth Edge Detection Solution Instead of thresholding the rst derivative nd zerocrossings of the second derivative nd zerocrossings of the second derivative 82 82 82 82 Common 2nd Order Derivative Operator Laplacian Af f f Common 2nd Order Derivative Operator Laplacian Af f f 8132 8y2 8132 8y2 1 Hi3 x10 2 2 2 0 t 0 O O 2 a a 40 40 r 1 39 pkg a no 39 r do 20 39 20 2 7 20 o quoto o No Laplacian of Gaussian Laplacian of Gaussian MarrHlldreth Edge Detection MarrHlldreth Edge Detection Solution Instead of thresholding the rst derivative Solution Instead of thresholding the rst derivative nd zerocrossings of the second derivative nd zerocrossings of the second derivative 82 82 82 82 Common 2nd Order Derivative Operator Laplacian Af f f Common 2nd Order Derivative Operator Laplacian Af f f 8132 8y2 8132 8y2 2 39 39 V20 Laplacian of Gaussian Laplacian of Gaussian Marr Hildreth Edge Detection Solution Instead of thresholding the rst derivative nd zerocrossings of the second derivative 32 f 32 f C 2 dOd D 39 ti 0 t 39 2 ommon n r er erlva ve pera or Laplacan Af 312 I 3y2 Laplacian of Gaussian Marr Hildreth Edge Detection Solution Instead of thresholding the rst derivative nd zerocrossings of the second derivative 
Marr-Hildreth edges can be localized to sub-pixel accuracy, but all contours must be closed, producing "spaghetti": long, wandering closed curves.

Non-Maximum Suppression

Thresholding |∇f|² alone gives wide, poorly localized edges. Solution: for each pixel above threshold, find the local maximum of |∇f|² along the direction of the gradient ∇f. Compute the edge direction from the image gradient: tan θ = f_y / f_x. (Figure: image convolved with the derivative of a Gaussian, thresholded and colored by edge direction.)

Discrete Canny Edge Operator
1. Threshold the gradient magnitude |∇f|².
2. Discretize θ by 45° increments.
3. Keep edge pixels only if |∇f|² is greater than the 2 neighbors along the edge normal.

Continuous Canny Edge Operator
1. Threshold the gradient magnitude |∇f|².
2. Compute the derivative of |∇f|² in the direction of the edge normal:

∇(|∇f|²) · ∇f = 2 f_x² f_xx + 4 f_x f_y f_xy + 2 f_y² f_yy

3. Find zeros of ∇(|∇f|²) · ∇f where |∇f|² is above threshold.

Edge Thresholding

Standard thresholding: E(x, y) = 1 if |∇f(x, y)| > T for some threshold T, and 0 otherwise. This can only select strong edges, and does not guarantee continuity.
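The discrete non-maximum suppression step (threshold, discretize the direction to 45° sectors, keep a pixel only if it beats both neighbors along the edge normal) can be sketched directly; the test image, threshold, and sector table are invented for the demo:

```python
import numpy as np

def nonmax_suppress(im, thresh=0.1):
    gy, gx = np.gradient(im.astype(float))
    mag = gx ** 2 + gy ** 2                       # |grad f|^2
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180    # gradient (edge-normal) direction
    offs = {0: (0, 1), 45: (1, 1), 90: (1, 0), 135: (1, -1)}
    keep = np.zeros(im.shape, dtype=bool)
    for y in range(1, im.shape[0] - 1):
        for x in range(1, im.shape[1] - 1):
            if mag[y, x] <= thresh:
                continue
            # discretize the direction to the nearest 45-degree sector
            d = min(offs, key=lambda a: min(abs(ang[y, x] - a), 180 - abs(ang[y, x] - a)))
            dy, dx = offs[d]
            if mag[y, x] >= mag[y + dy, x + dx] and mag[y, x] >= mag[y - dy, x - dx]:
                keep[y, x] = True
    return keep

# A soft vertical step: two columns pass the threshold, one survives per row
im = np.tile(np.array([0, 0, 0, 0.3, 1, 1, 1, 1.0]), (8, 1))
edges = nonmax_suppress(im)
print(edges[1:-1].sum(axis=1))  # one edge pixel per interior row
```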
Thresholding with Hysteresis: use two thresholds.

- |∇f(x, y)| ≥ t_hi: definitely an edge
- t_lo ≤ |∇f(x, y)| < t_hi: maybe an edge, depending on context
- |∇f(x, y)| < t_lo: definitely not an edge

Example: for a "maybe" edge, decide it is an edge if a neighboring pixel is a strong edge.

Choice of σ Determines Features

Large σ results in coarse, large-scale features; small σ results in fine, small-scale features. (Figures: Canny with σ = 1 vs. Canny with σ = 2.)

Multi-resolution Image Pyramids

A stack of copies of the image, from high resolution at the bottom to low resolution at the top.

The Gaussian Pyramid

G_0 = image; G_{k+1} = (G_k * gaussian) subsampled by 2.

Applications of Image Pyramids

Coarse-to-fine strategies for computational efficiency:
- Search for correspondence: look at coarse scales, then refine with finer scales
- Edge tracking: a good edge at a fine scale has parents at a coarser scale
- Control of detail and computational cost in matching (e.g. finding stripes); very important in texture representation

Edge detection using pyramids: do edge detection at a higher pyramid level, and consider edges at finer scales only near the edges found at coarser scales.

Fast Template Matching

For an m × n image and a p × q template, the complexity of the 2D pattern recognition task is O(mnpq), and this gets even worse for a family of templates, e.g. to address scale and/or rotational effects. Using a pyramid instead: at the lowest-resolution pyramid level, search the entire image with the template; each subsequent (finer) search is constrained to a small neighborhood surrounding the good matches from the next coarser level.
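A two-level sketch of this coarse-to-fine search using SSD matching; the image, template size, and 3×3 refinement window are invented for the demo:

```python
import numpy as np

def best_match(im, t, candidates):
    # SSD score of template t at each candidate top-left corner
    h, w = t.shape
    return min(candidates, key=lambda p: ((im[p[0]:p[0] + h, p[1]:p[1] + w] - t) ** 2).sum())

rng = np.random.default_rng(3)
im = rng.random((64, 64))
t = im[24:40, 32:48]                          # 16x16 template cut from the image

# Coarse level: exhaustive search at half resolution
im1, t1 = im[::2, ::2], t[::2, ::2]
h1, w1 = t1.shape
coarse = [(y, x) for y in range(im1.shape[0] - h1 + 1)
                 for x in range(im1.shape[1] - w1 + 1)]
y1, x1 = best_match(im1, t1, coarse)

# Fine level: refine only in a 3x3 neighbourhood around the scaled-up coarse hit
fine = [(2 * y1 + dy, 2 * x1 + dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
y0, x0 = best_match(im, t, fine)
print((y0, x0))  # (24, 32)
```

The fine level evaluates only 9 candidates instead of thousands, which is where the speedup quoted below comes from.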
This coarse-to-fine search reduced the total search time in Matlab from 31 seconds to 0.5 seconds.

(Figures: frequency composition of the Gaussian pyramid vs. the Laplacian pyramid; the Laplacian pyramid is band-pass where the Gaussian pyramid is low-pass, and the Laplacian of Gaussian at each scale can be approximated by a difference of Gaussians.)

Image Blending and Mosaicing

Given two images A and B:
- Construct Laplacian pyramids La and Lb.
- Create a third Laplacian pyramid Lc where, for each level,
  Lc(i, j) = La(i, j) if i < width/2; (La(i, j) + Lb(i, j))/2 if i = width/2; Lb(i, j) if i > width/2.
- Sum all levels of Lc to get the blended image.

(Figures: image merging with Laplacian pyramids; the seamless "apples and oranges" blend.)

Pyramid Blending of Regions

Given two images A and B and a mask M:
- Construct Laplacian pyramids La and Lb, and a Gaussian pyramid Gm of the mask.
- Create a third Laplacian pyramid Lc where, for each level,
  Lc(i, j) = Gm(i, j) La(i, j) + (1 − Gm(i, j)) Lb(i, j).
- Sum all levels of Lc to get the blended image.

("Horror photo", prof. dmartin.)

Image Compression

Figure 10: a summary of the steps in Laplacian pyramid coding and decoding. First, the original image g0 (lower left) is used to generate Gaussian pyramid levels g1, g2, ... by repeated local averaging. Levels of the Laplacian pyramid L0, L1, ... are then computed as the differences between adjacent Gaussian levels. Laplacian pyramid elements are quantized to yield the code pyramid.
Finally, a reconstructed image r0 is generated by summing levels of the code pyramid.
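The build/collapse machinery behind both blending and coding can be sketched with a simplified pyramid (nearest-neighbour expansion and no pre-blur before subsampling; the function names are mine). Whatever expansion is used, collapsing reproduces the original exactly, because each band-pass level stores precisely what the expansion loses:

```python
import numpy as np

def up(im):
    # nearest-neighbour expansion; real pyramids use a smoother interpolant
    return np.repeat(np.repeat(im, 2, axis=0), 2, axis=1)

def laplacian_pyramid(im, levels):
    pyr, g = [], im.astype(float)
    for _ in range(levels - 1):
        small = g[::2, ::2]
        pyr.append(g - up(small))     # band-pass residual L_k = G_k - expand(G_{k+1})
        g = small
    pyr.append(g)                     # the coarsest level is a plain Gaussian level
    return pyr

def collapse(pyr):
    g = pyr[-1]
    for lap in reversed(pyr[:-1]):
        g = up(g) + lap               # invert: G_k = expand(G_{k+1}) + L_k
    return g

im = np.random.default_rng(4).random((32, 32))
lp = laplacian_pyramid(im, 4)
print(np.allclose(collapse(lp), im))  # perfect reconstruction: True
```

For region blending, the levels of two such pyramids would be combined as Lc = Gm·La + (1 − Gm)·Lb using the mask's Gaussian pyramid Gm, and the blended image recovered with collapse(Lc).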
