Virtual Environments (CS 7497) — class notes uploaded by Alayna Veum, Georgia Institute of Technology


Virtual Reality and Highly Interactive Three-Dimensional User Interfaces: A Technical Outline

By Mark Green and Christopher David Shaw
Department of Computing Science, University of Alberta, Edmonton, Alberta, Canada
GVU Center, College of Computing, Georgia Tech, Atlanta, Georgia, USA

Copyright 2001 by M. Green and C. D. Shaw

Contents

1 Introduction
  1.1 Why Interactive 3D Graphics?
  1.2 A Brief History
  1.3 Some Applications
  1.4 Where Do We Go From Here?
  1.5 What Do You Need To Know?

2 Perception
  2.1 Light
    2.1.1 Vision
    2.1.2 3D Vision
    2.1.3 Stereopsis
    2.1.4 Vision Summary
  2.2 Sound
    2.2.1 Hearing
    2.2.2 3D Hearing
    2.2.3 Environmental Effects
  2.3 Touch
  2.4 Proprioception
  2.5 Balance
    2.5.1 Simulator Sickness
  2.6 Interaction and Lag
    2.6.1 Perceptual Processing
    2.6.2 Immediate Response
    2.6.3 Unit Task
  2.7 Wrap-Up

3 Hardware
  3.1 How Does The Hardware Fit Together?
  3.2 Graphics Engines
    3.2.1 The Simple Rendering Pipeline
    3.2.2 Display Output: Refresh vs. Update, Video Signals
  3.3 Graphical Display Devices
    3.3.1 Display of Stereo Images
    3.3.2 Passive Stereoptic Displays
    3.3.3 Active Eyewear
  3.4 Head-Mounted Displays
    3.4.1 Transparent Head-Mounted Displays
    3.4.2 Examples of Head-Mounted Displays: Fiber-Optic Helmet-Mounted Display, VPL EyePhone, Virtual Research Flight Helmet, LEEP Systems CYBERFACE, LCD Pocket Televisions, LEEP Optics, Fake Space Labs BOOM, Private Eye, Head-Mounted Display Summary
  3.5 Position and Orientation Trackers
    3.5.1 Tracker System Criteria: Resolution, Accuracy, Update Rate, Range, Interference and Noise, Mass, Inertia and Encumbrance, Multiple Tracked Points, Price
    3.5.2 Tracking Technologies: Mechanical Trackers, Magnetic Tracking, Ultrasonic Tracking, Optical Tracking, Inertial Tracking, Inertial Tracking with Sensor Fusion
    3.5.3 Position and Orientation Tracker Summary
  3.6 Joysticks
    3.6.1 Force-Feedback Joysticks
    3.6.2 Force-Feedback Arms
  3.7 Hand Sensors
    3.7.1 Hand Anatomy
    3.7.2 Degrees of Freedom Measured
    3.7.3 Glove Sensors: Bend Sensor Types
    3.7.4 Exoskeletons
    3.7.5 Contact Sensors

4 System Architecture
  4.1 Environment Design

5 Interaction In 3D
  5.1 Basic Interaction Tasks
  5.2 The Nature of Hand-Based Interaction
    5.2.1 Gesture Recognition
    5.2.2 Optical Tracking of Hand Shape
    5.2.3 Two-Handed Interaction
    5.2.4 Other 3D Input Devices
  5.3 List Selection
    5.3.1 General Design Considerations: List Selection
    5.3.2 Interaction Techniques for List Selection
  5.4 Object Selection
    5.4.1 General Design Considerations: Objects
    5.4.2 Interaction Techniques for Selection
  5.5 Point Selection
    5.5.1 General Design Considerations: Points
    5.5.2 Interaction Techniques for Point Selection
  5.6 Object Manipulation
    5.6.1 General Design Considerations: Object Manipulation
    5.6.2 Interaction Techniques for Geometrical Manipulations
    5.6.3 Interaction Techniques for Non-Geometrical Manipulations
  5.7 Navigation
    5.7.1 General Design Considerations: Navigation
    5.7.2 Interaction Techniques for Navigation
  5.8 Object Combination
    5.8.1 General Design Considerations: Object Combination
    5.8.2 Interaction Techniques for Object Combination
  5.9 2D Techniques in 3D
  5.10 Interaction Technique Summary

Bibliography

List of Figures

2.1 When a surface is perpendicular to the line of sight, its texture appears uniform
2.2 A tilted chessboard at eye level, with far squares more densely packed visually than nearby squares
A diagram of a HITD system with every possible hardware device included
The five rendering stages
Canonical view volume
Optically transparent HMD
Hand Anatomy
The Cognitive Coprocessor Architecture
The Decoupled Simulation Model
The Workspace Mapping
Buttons mounted on the sensor of an electromagnetic tracker
A daisy menu
A ring menu
A sundial menu
Selection errors caused by tracker noise
Isodistance surfaces for the spotlight selection technique
Rack Widget in Use
Use of Tools

List of Tables

5.1 A simple gesture classification scheme

Chapter 1: Introduction

Over the past decade, interest in interactive 3D graphics has increased dramatically. A decade ago interactive 3D graphics was largely restricted to the research laboratory, but now it is the topic of popular books and newspaper and magazine articles. There are several reasons for this increase in interest. First, display hardware for 3D graphics is now widely available in the PC marketplace; this was definitely not the case a decade or more ago, when any form of 3D graphics hardware was very expensive. Second, the use of 2D graphics in user interfaces is now well established, so the idea of interactive 3D graphics does not seem so strange. Now every PC has a graphical user interface, but until the middle 1980s user interfaces were still based on command-line interactions. Third, the general population is now more sensitive to computer technology and the effect it can have on their lives. A significant segment of the population uses computers at work or has a computer or video game at home, so advances in computer technology can be of direct use to them.

The popular press uses terms like Virtual Reality to describe interactive 3D computer systems and futuristic applications, with a considerable amount of hype about the potential of this technology. While these articles may be fun to read, they do not accurately represent the current state of the art. In this book we present the technical details behind this technology and how to use it in real applications.

A number of terms have been used for this new technology. Our favorite term is Highly Interactive Three-Dimensional (HITD) user interfaces, since this covers a wide range of user interfaces that make use of interactive 3D graphics technology. A HITD user interface addresses a 3D application, uses 3D input and output devices, displays information in a 3D format, and uses natural 3D interactions. It provides the user with a complete 3D environment for solving his or her problem.

The term Virtual Reality (VR) is usually used to describe an interactive 3D computer system that simulates a virtual world or some aspect of the real world. The emphasis is on the creation of a 3D world that the user can explore and interact with. VR is a subset of HITD user interfaces, and most HITD user interface techniques are used to develop VR user interfaces. The term VR has been associated with a lot of hype, so some researchers in the field have used other terms, such as Virtual Worlds or Virtual Environments, to describe this type of user interface and avoid the hype associated with VR. While this may make them feel a little bit better, the question eventually arises: aren't you really doing VR?

1.1 Why Interactive 3D Graphics?

Why should we bother with interactive 3D graphics? The use of 2D graphics is already well established, and a wide range of software exists to support the development of 2D applications. Three-dimensional graphics is much more complex and requires a new set of interaction techniques and software tools. What is it about interactive 3D graphics that will make all this extra work worthwhile? We can address these concerns by examining the following four issues:

1. Some applications are naturally three dimensional.
2. People are three dimensional.
3. HITD interfaces provide extra channels of communication.
4. HITD interfaces may eventually integrate computing with the environment.

Three-dimensional applications. Two-dimensional graphical user interfaces work well for two-dimensional applications such as spreadsheets and word processing, but they are not appropriate for three-dimensional applications. There are applications which are naturally three dimensional, which involve three- or higher-dimensional data and use three-dimensional manipulations. A number of computer-aided design applications fall into this category. The designer of a complex mechanical linkage wants to work in 3D so that the fit of the different parts of the mechanism can be easily seen. Forcing the designer to work in 2D complicates the task, and the designer may easily miss design problems that would be very obvious in a 3D representation. The same thing holds for scientific visualization. Most scientific computations produce large volumes of multidimensional information, and trying to summarize this information in 2D presentations is very difficult. Consider the case of a simulation of air flow past an aircraft. In 3D this information can be presented on a 3D model of the wing, allowing the researcher to easily detect patterns over the surface of the wing; detecting these patterns can be very difficult if only 2D representations of the data are available.

For 3D applications, providing a 3D user interface simplifies the user's task. When a user wants to manipulate 3D objects in a 2D user interface, the user must first think of the manipulation in 3D and then convert it into a sequence of 2D manipulations. This involves an extra cognitive load. No matter how good the 2D interaction is, the user must still perform the mapping from 2D screen representation to 3D mental representation, and then from 3D mental representation to 2D manipulations. These extra tasks take the user's concentration away from the application at hand, and this problem can be avoided if a 3D user interface is used.

People are naturally three dimensional. We have developed all our manual and cognitive skills in a 3D environment, and that is the environment that we are used to operating in. Ecological psychology [15] is based on the thesis that people perform best in the environment that they evolved in. Our environment strongly influences how our perceptual system works. Our vision system is tuned to motion: we naturally explore new objects and environments by moving around them; we do not stand in one place and stare at them from a single viewpoint for long periods of time. Similarly, we are accustomed to using our hands to manipulate most of the objects in our environment. We pick objects up in order to move them from one place to another, and we make extensive use of tools that are almost always based on the use of our hands. Three-dimensional user interfaces take advantage of these skills. To move around in a 3D user interface we only need to walk, which is a skill that all able-bodied users have. Similarly, we already know how to use our hands to grasp or point at objects, so almost no training time is required to use these skills in a 3D user interface. A properly constructed 3D user interface is much easier to use and more natural than any 2D user interface that depends on the use of a keyboard and mouse.

Extra channels of communication. A typical 2D user interface makes very little use of the perceptual and motor bandwidth of the user. A 2D user interface is based on a single screen that only covers a small part of the user's visual space, and the same information is sent to both eyes. A screen that covers the user's complete visual space is capable of displaying much more information, and users are capable of moving their heads, so a great deal more information can be presented by matching the display system to the user's capabilities. Even more information can be displayed if effective use is made of sound, tactile, and force displays. Most users have two hands and two feet, and they are capable of moving all four limbs simultaneously. A mouse only requires one finger and the ability to push it around a table top, leaving most of the user's motor bandwidth unused. By using both hands, and the fingers on both hands, simultaneously, we are capable of entering much more information than with typical 2D devices. In addition, speech can be used as another independent communications channel.

Integrate computing with the environment. One future scenario for user interfaces is the complete disappearance of the computer from the home or office environment. Present computer systems force the user to sit in front of the computer in order to use it; the user is not free to move around the room while he or she is using the computer. For a wide range of tasks this is not very convenient, since it forces the user to completely devote all of his or her attention to the computer system. For example, using a telephone and computer system at the same time is often quite difficult. It should be possible to completely integrate the computing system with the room environment if 3D user interface techniques are used. The user need not be restricted to using a keyboard and mouse, which essentially forces the user to work in one place; instead the user interface can monitor the user's motions throughout the room and respond to his or her gestures, voice, and other actions. This could be done with video digitizing techniques. Similarly, large-screen projection video could be used to produce displays on the walls of the room, or indeed any suitable surface. The result of this vision would be to supply seamless computing resources to the user anywhere in the office, with computer display available anywhere and input available from anywhere in the room.

1.2 A Brief History

The ideas behind interactive 3D graphics and the construction of virtual environments have been around since the early 1960s. This section provides a brief overview of the history of the field, with an emphasis on technical advances. Most of the early work in this field was based on custom-built hardware, so there were few people working in the field in the early years. Once commercial hardware became available in the late 1980s, the number of researchers in the field grew rapidly.

The early work in this field was done by Ivan Sutherland when he was at Harvard University in the 1960s [35]. At that time Sutherland developed the first head-mounted computer display. This display was based on two miniature CRTs that were mounted on either side of the user's head; a mirrored optics system was used to bring the images to the user's eyes. All the graphics were wireframe, and they were produced by special-purpose display hardware. Two techniques were used for head tracking. One system was based on ultrasound and was somewhat noisy and inaccurate. The other technique was based on the use of a mechanical linkage between the user's head and the ceiling of the laboratory. The two main problems with this display were the very high voltage that was required to drive the CRTs and the very limited form of 3D graphics that was available.

A considerable amount of pioneering work has been done at the University of North Carolina at Chapel Hill (UNC), led by Fred Brooks. They have been active in the area of interactive 3D graphics since their work on interactive molecular modeling in the 1970s. The UNC team also did extensive work on the use of force feedback in user interfaces. However, Brooks observed that in the early 1970s computers were simply not fast enough for the enormous computational demands of interactive 3D graphics, so he set aside this research agenda until CPU speeds were sufficient [6]. Brooks took up the HITD agenda again in the mid 1980s, creating the first interactive architectural visualization system in 1986 [4]. In 1986 the first ACM Siggraph Symposium on Interactive 3D Graphics was held at UNC. This symposium in many ways marked the start of current research in interactive 3D graphics.

One of the first affordable VR systems was developed by Mike McGreevy and Scott Fisher at NASA Ames Research Center [11]. Their main goal was to provide computer support for the space station and the display of planetary data. This system was based on the use of a head-mounted display and the VPL DataGlove. The head-mounted display was custom built using two LCDs from portable televisions. The LCDs were mounted on a motorcycle helmet (later a bicycle helmet was used), and a wide-angle optics system was placed between the displays and the user's eyes. This produced an affordable and effective head-mounted display, and VPL Ltd, an early manufacturer of VR systems, used this design as the basis for their first head-mounted display. By today's standards the graphics and interaction in this system were quite primitive, but in many ways it marks the start of current research efforts in VR.

VPL was probably the first commercial enterprise selling VR hardware and software. Indeed, VPL's Jaron Lanier coined the term Virtual Reality and played a significant part in bringing popular attention to VR. In 1987 VPL introduced the DataGlove, which was the first device for measuring hand position, orientation, and finger bend angles with a reasonable degree of accuracy and portability. They later produced a commercial version of the NASA head-mounted display. VPL also produced software packages to support the design of virtual environments. In the late 1980s they were selling complete hardware and software systems, called Reality Built for Two (RB2), that could be used to produce prototype VR applications. The availability of the VPL hardware made it possible for a number of research groups to start exploring the software techniques required to produce good 3D user interfaces.

In 1990 Mark Green and Rob Jacob organized a workshop at Siggraph dealing with nonstandard user interface technologies. The title of this workshop was non-WIMP User Interfaces. Its main purpose was to encourage user interface designers and researchers to think about user interface styles that are different from traditional 2D Windows, Icons, Menus, and Pointing (WIMP) user interfaces. This includes 3D user interfaces and VR, in addition to pen-based interaction, wearable computers, and some forms of multimedia. The researchers at this workshop identified 5 characteristics that separate non-WIMP user interfaces from more traditional 2D user interfaces, and these characteristics are typical of the user interfaces addressed in this book. The 5 characteristics are:

1. High Bandwidth Input and Output. Non-WIMP user interfaces typically employ more than one input and output device, and these devices often provide many interactions per second. In a 3D user interface the screen is not static but is constantly changing as the user moves around the environment.

2. Many Degrees of Freedom. Both the devices and the applications have many degrees of freedom. For example, a DataGlove has 16 degrees of freedom: 3 for position, 3 for orientation, and 10 for finger bend angles. Similarly, most simulation applications allow the user to control many parameters, and in many cases several parameters can be controlled simultaneously.

3. Real-Time Response. A 3D user interface must respond quickly to user interactions. When a head-mounted display is used, the images in the display must be updated in less than 0.1 seconds after the user's head moves; if this type of response is not maintained, the user finds it difficult to interact, and longer delays worsen interaction. (A short timing sketch follows this list.) For WIMP interfaces, real-time response is required only for the mouse cursor position and for the echo of typed characters, which is significantly less computationally demanding.

4. Continuous Response and Feedback. Unlike a 2D user interface, there is no stream of tokens or events in a 3D user interface. In 3D the user interface must be continually watching the user's motions and responding to them as a continuous process. For example, the user interface must continuously track the user's head motions in order to produce the correct images for the head-mounted display, and must continuously track the hand motions to produce accurate feedback of hand actions in the environment. The user interface cannot wait for a signal from the user before it starts responding to his or her input.

5. Probabilistic Input. A traditional 2D user interface does not have to guess the user's intentions, since there is a well-defined set of commands and interactions in the user interface. In 3D user interfaces the user interface must often recognize the user's actions, and this process is not 100% accurate. When the DataGlove is used for gesture input, there is a certain probability that a given gesture will not be recognized correctly. When this happens, the user interface must be able to backtrack and correct any of the problems that have occurred due to the incorrectly recognized gesture.
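The 0.1 second bound in characteristic 3 is easy to monitor at run time. The following minimal C sketch assumes a POSIX clock (gettimeofday) and uses a stubbed render_frame() as a stand-in for the application's real drawing code; it measures each frame and warns whenever the budget is blown.

    #include <stdio.h>
    #include <sys/time.h>
    #include <unistd.h>

    /* Stand-in for the application's drawing code (here it just sleeps). */
    static void render_frame(void)
    {
        usleep(20000);                       /* pretend rendering takes 20 ms */
    }

    /* Current time in seconds, from the POSIX gettimeofday clock. */
    static double now_seconds(void)
    {
        struct timeval tv;
        gettimeofday(&tv, NULL);
        return (double)tv.tv_sec + (double)tv.tv_usec * 1.0e-6;
    }

    int main(void)
    {
        const double budget = 0.1;           /* the 0.1 s head-tracking bound */
        for (int frame = 0; frame < 100; frame++) {
            double start = now_seconds();
            render_frame();
            double elapsed = now_seconds() - start;
            if (elapsed > budget)
                fprintf(stderr, "frame %d took %.3f s, over the %.1f s budget\n",
                        frame, elapsed, budget);
        }
        return 0;
    }

In a real system the reaction to an overrun would be to reduce scene detail rather than merely print a warning, but the measurement itself is this simple.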
1.3 Some Applications

What are the successful applications of VR and highly interactive 3D? Until the late 1990s VR had great potential but few real successes. The one exception to this is 3D video games such as Mario World, Gauntlet Legends, DOOM, Quake, Half-Life, Tekken, F355, and their successors (this is not intended to be an exhaustive list). It is the popularity of these games that has created consumer demand for high-performance 3D graphics, allowing economies of scale to bring commodity 3D graphics to the PC platform. In the Action genre of 3D games (Super Mario World, Gauntlet Legends), one moves around a complex 3D space in pursuit of an overarching goal, and the player views a representation of his or her character from a view above the action. By contrast, in the First-Person Shooter genre (Doom, Quake, Half-Life), the player sees the action from the character's point of view and usually progresses by killing every enemy character in sight. The Adventure genre (Shenmue, Myst, The Longest Journey) advances the action by the solving of puzzles rather than by combat. In the Fighting genre (Tekken, Soul Calibur), one-on-one fisticuffs take place from a third-person view, with each player controlling a character through an increasingly arcane set of button hits; button combinations result in a preprogrammed sequence of punches, kicks, throws, and so on. Finally, in the Racing genre (Metropolis Street Racer, the Ridge Racer series), players race vehicles through a realistic 3D track, either from a cockpit view or from behind the vehicle. The usual interface to these games is the keyboard and the mouse, but one could add a force-feedback joystick or other exotic input devices to spice up the action and, in some cases, make the game somewhat easier to play. The key to the technical success of these games is an appropriate balance of realism and responsiveness: every animated frame must appear within 1/60th of a second, so game designers are careful to maintain graphical loads at a level appropriate to the hardware they are using.

There are a number of recreational applications that fit in the entertainment category but are outside the usual genres for 3D. For example, it is easy to simulate most racquet sports using simple VR systems, with a large-screen video projector and an instrumented racquet as the main I/O devices. Why would we want to do this? First, the VR system can provide a virtual partner to play with. This can be used when a real partner is not available or when learning a new sport (multiplayer keyboard-and-mouse-based games usually have a single-player mode for these occasions). Since the level of play of the virtual player can easily be controlled, this can provide a good learning mechanism. Second, the physics of the game can easily be controlled, and thus challenging games can be produced for players at different skill levels. For beginning or older players the ball can be slowed down so the game is not quite as difficult, while for expert players the ball can be sped up to make the game much more challenging.

A related set of applications is the use of VR and other types of interactive 3D graphics in the production of works of art [Gromala 1994]. Most of this work is centered around the production of reactive art works, that is, art works that respond to the viewer and encourage the viewer to interact with them. This is not a new idea in art, but the use of VR technology allows the artist to construct complete environments that surround the user and interact with him or her. There are many interesting technical issues raised by this type of work. For example, how do we produce software tools that are usable by artists and at the same time are capable of producing a wide range of interesting art pieces? There are also a number of social issues raised by this work. For example, how do people react to computer systems that appear to be active and that respond to them in a non-game context? How do we represent ourselves and other people, and how does this representation affect our interaction with other people? Are we satisfied with a caricature of ourselves, or do we need more accurate representations?

One important application of VR technology is VR Therapy [16], in which a patient wears a head-mounted display (HMD) to experience a therapist-controlled virtual world. VR Therapy has been successfully used to treat patients with certain phobias, such as fear of flying or fear of heights [16]. In the therapist's office the patient experiences a VR version of the fear-inspiring event, such as a virtual airplane ride. The therapist controls the intensity of the virtual environment, allowing the patient to grow accustomed to, and therefore less fearful of, that type of event in real life. Another style of VR Therapy is to allow people undergoing painful medical procedures, such as a spinal tap or skin burn abrasion, to experience an escape into a compelling virtual environment.

One other long-standing application of VR technology is scientific visualization [7, 9]. Most scientific data is at least 3-dimensional, and the ability to interactively navigate through the data or interactively steer large computations greatly increases our understanding of the phenomena under study. VR technology has been used in the visualization of fluid flow and planetary data [20]. Major oil companies use interactive 3D graphics to display geological formations during oil exploration.

Interactive 3D graphics is also very useful for information visualization and knowledge discovery in databases. Quite often it is very difficult to discover structures and patterns in large databases. For example, the organization chart of a large company does not fit on a single display screen or sheet of paper. By using 3D graphics, the entire organization chart can be placed in a 3D model, all of which can be viewed at the same time. While some parts of the chart will be hidden in such a 3D representation, the general structure of the chart will still be obvious, and the user can move around the chart to see the parts that are hidden. These techniques have also been used to study the structure of complex law cases and in the evaluation of currency exchange options [3, 10].

Collaborative work is an area where VR techniques can have an impact. With VR it is possible to construct a 3D environment that can be shared by several users. In addition, these users do not need to be in the same physical location: the 3D model can be duplicated at each location, and only the users' actions need to be sent between the locations. Using this approach, several engineers or designers at different locations can work on the same problem. Computer-aided design packages are now used in a large number of design and engineering applications, so the subjects of the design activity are in some type of machine-readable form; this information only needs to be transferred to a VR system in order for several users to cooperate on the design problem. This can result in a significant cost savings when long-distance travel would otherwise have been involved. VR also allows for a closer simulation of traditional meeting settings. In current computer and video conferencing systems, most of the social structures found in traditional meetings are not present: there is little eye contact between the participants, and there is only one communications channel and thus no possibility of side conversations. With VR techniques these problems can be solved [26], since all the users are in a simulated 3D environment and 3D sound can be used to add multiple communications channels.

VR has a wide range of applications in simulation and training. Currently, simulation is used as a means of training operators of complex and dangerous equipment. For example, pilots spend a considerable amount of time in flight simulators performing maneuvers that would be dangerous in a real plane. There are other tasks where simulation would be an effective training mechanism. One example is fire fighting [36]: a VR-based simulator can be used to simulate a burning building, and the firefighter can practice different types of tasks in this environment. The main benefit of this type of simulator is that the firefighter is not injured if he or she makes a mistake, which is definitely not the case in a real burning building. VR is an ideal simulation mechanism in fields where the trainee is not sitting in a fixed position and where cost is an important consideration.

Not all of the application areas listed above are strictly in the standard VR style, with the user wearing a Head-Mounted Display and DataGlove, hearing spatialized sound, and so on. Many of these application areas use selected HITD techniques without subscribing completely to the VR style. The VR style is most effective in application areas that require the user to be fully involved in the Virtual Environment and at most slightly aware of the real world around them. Applications like VR Therapy, simulation and training, and entertainment all benefit from the VR style because the user wants or needs to be fully involved. Typically, users of Scientific and Information Visualization or collaborative design applications need to deal rapidly with physical artifacts of the office environment, such as the telephone and paper documents, so the VR style is less successful there. Highly Interactive 3D is still useful in this context, because the benefits of rapid 3D understanding cannot be delivered any other way.

1.4 Where Do We Go From Here?

The state of the art in the design and implementation of 3D user interfaces is still a little primitive in comparison to 2D user interfaces. This is partly due to the current state of the hardware devices used in these user interfaces: they are still much more expensive than the devices used in 2D user interfaces, and not quite as reliable. But the main reason is the length of time that researchers have been seriously studying 3D user interfaces. The mouse and the basic ideas behind windows were developed in the 1960s, and most of the techniques used in 2D user interfaces were developed in the 1970s at places like the Xerox Palo Alto Research Center. Through the 1970s and early 1980s numerous research groups worked on prototypes of the software tools that are now used to generate 2D user interfaces. Thus, the techniques that are currently used in the development of 2D user interfaces were under development for about 15 years before this style of user interface became popular. In the early 1970s graphical user interfaces had many of the same characteristics that 3D user interfaces now have: they used expensive equipment (mouse and bitmapped display), they were hard to program, and the interaction was not as smooth as the textual user interfaces that were then popular.

The same evolution process is currently occurring in 3D user interfaces. Researchers are exploring the design metaphors and techniques that are required for this style of user interface, designing better interaction techniques, developing software tools, and improving interfaces. The appropriate software architectures and types of software tools required for the development of 3D user interfaces is a very active research area with a considerable amount of room for innovation. The purpose of this book is to explain how to build Virtual Reality applications and other applications that share similar techniques.

1.5 What Do You Need To Know?

A VR system is a collection of hardware and software elements that take input from the user and display output to the user in real time. How do we construct such a system? What do we need to know so that the user can work most effectively with the system?

- Human capabilities and needs. The most important knowledge is of the capabilities and needs of the user. This knowledge helps one to understand why certain things must be done a certain way. For example, a VR system must react quickly to the user. How quickly? What happens if the system is not quick enough? What happens if the reaction time varies?

- The basic VR system structure. A VR system is a collection of hardware and software elements that take input from the user and display output to the user in real time. What is the best way to organize this system?

- Input. What input modalities are available? What hardware choices are there? What is the best way to collect input from the user? How should the hardware be driven? How should the input streams be managed and integrated? What user-related issues need to be addressed?

- Output. What output modalities are available? What hardware choices are there? What is the best way to generate output to the user? How should the hardware be driven? How should the output streams be managed and integrated? What user-related issues need to be addressed?

- Integration. How should the input and output streams be tied together? What software and hardware organization makes VR possible? Easy?

- Larger systems. In building a VR system it helps to start small. How can large systems be built? What techniques can be used to make large collections of data manageable in real time?

- Applications. How do we build applications? What are the building blocks that we will need, and how do they work? How do you build applications with a minimum of effort?

These and other questions will be dealt with in the following chapters. The organizational idea of the book is to start from the bottom with the basic building blocks of HITD systems, then build and combine these subsystems into more complex systems. This bottom-up approach allows the reader interested in building a system to progress through the book in order. We will start with a quick introduction to perceptual psychology, since the purpose of HITD applications is to convince the user that the application looks and sounds just like the real world. We then start from basic hardware technology, explaining the basic principles of head-mounted displays, DataGloves, and other input and output devices and systems. We next discuss the mathematical notations and software technologies that drive 3D computer graphics. We assume that we are dealing with OpenGL written in the C language in the software code samples in this book: OpenGL has been around a while and has settled into wide acceptance, and C has been around longer and, although it is showing its age, most readers will understand it. The next chapter deals with organizational metaphors and models of system architecture; the goal in that chapter is to provide an organizational basis for the chapters that follow, and to outline the various HITD styles that have been developed. The next several chapters deal with system architecture issues, and the remaining chapters cover higher-level interaction topics.
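Since the code samples in this book assume OpenGL in C, a minimal sketch of the kind of continuously redrawing, double-buffered display loop that every HITD system is built around may help orient the reader before the perception and hardware chapters. The GLUT toolkit and the teapot placeholder below are conveniences assumed for this sketch only; the book does not prescribe a particular windowing layer, and a real system would poll trackers where the idle callback updates the angle.

    /* Minimal continuously updating OpenGL display loop (C + GLUT). */
    #include <GL/glut.h>

    static float angle = 0.0f;     /* stand-in for head/hand tracker input */

    static void display(void)
    {
        glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);
        glMatrixMode(GL_MODELVIEW);
        glLoadIdentity();
        gluLookAt(0.0, 0.0, 5.0,   /* eye position   */
                  0.0, 0.0, 0.0,   /* look-at point  */
                  0.0, 1.0, 0.0);  /* up vector      */
        glRotatef(angle, 0.0f, 1.0f, 0.0f);
        glutWireTeapot(1.0);       /* placeholder scene */
        glutSwapBuffers();         /* show the finished frame */
    }

    static void idle(void)
    {
        angle += 0.5f;             /* a real system would read trackers here */
        glutPostRedisplay();       /* request another frame immediately      */
    }

    static void reshape(int w, int h)
    {
        glViewport(0, 0, w, h);
        glMatrixMode(GL_PROJECTION);
        glLoadIdentity();
        gluPerspective(60.0, (double)w / (double)h, 0.1, 100.0);
    }

    int main(int argc, char **argv)
    {
        glutInit(&argc, argv);
        glutInitDisplayMode(GLUT_DOUBLE | GLUT_RGB | GLUT_DEPTH);
        glutCreateWindow("HITD display loop sketch");
        glEnable(GL_DEPTH_TEST);
        glutDisplayFunc(display);
        glutReshapeFunc(reshape);
        glutIdleFunc(idle);
        glutMainLoop();
        return 0;
    }

The important structural point is that the display is redrawn continuously, not in response to discrete user events — characteristic 4 of the non-WIMP list above.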
Chapter 2: Perception

HITD interfaces take advantage of the spatial and perceptual capabilities of people. HITD interfaces combine much more sensory input than non-3D interfaces, so it is important to understand what a person's perceptual capabilities are if a HITD interface is to be successful. This chapter will outline what these capabilities are and will describe the implications that these sensory and perceptual capabilities have for HITD interface design.

There are traditionally 5 human senses: vision, hearing, touch, taste, and smell. There are two other important senses that come into play in HITD interfaces: Proprioception, the sense of where one's body is in space, and Balance, the sense of which way is up. These two senses will be defined and discussed later in this chapter. Unfortunately, taste and smell are difficult to stimulate through non-chemical means, and so we will not deal with these at all.

2.1 Light

Vision is, of course, the sense that detects light. Visible light is electromagnetic energy with wavelengths between 400 nanometers (nm) and 700 nm, corresponding to the rainbow colors from violet, indigo, blue, green, yellow, and orange to red. The range of visible colors is called the visible spectrum. If you pass a beam of sunlight through a triangular glass prism, the beam will be split into a rainbow, which indicates that white light has a certain amount of light from every wavelength in the spectrum. The rainbow colors are spectrally pure because they are each a single narrow band of frequencies of light. Most colors are not spectrally pure, because they are made up of a number of wavelengths. The color you perceive is typically the dominant wavelength, the wavelength in the spectrum that has the most energy. Therefore, two colors may be made up of different spectral energy distributions but be perceived as the same color. This is true of white light as well: for example, light bulbs and sunlight are not the same color, but each is perceived as white in isolation.
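As a small concrete illustration of the dominant-wavelength idea, the C sketch below scans a sampled spectral energy distribution and reports the wavelength carrying the most energy. The spectrum values are invented for illustration; two quite different distributions that happen to peak at the same wavelength would be reported — and typically perceived — as the same hue.

    #include <stdio.h>

    #define SAMPLES 31                       /* 400..700 nm in 10 nm steps */

    /* Return the wavelength (nm) carrying the most energy. */
    static double dominant_wavelength(const double energy[SAMPLES])
    {
        int peak = 0;
        for (int i = 1; i < SAMPLES; i++)
            if (energy[i] > energy[peak])
                peak = i;
        return 400.0 + 10.0 * peak;
    }

    int main(void)
    {
        /* Illustrative (made-up) spectrum with a broad hump near 550 nm. */
        double spectrum[SAMPLES];
        for (int i = 0; i < SAMPLES; i++) {
            double nm = 400.0 + 10.0 * i;
            double d = (nm - 550.0) / 60.0;
            spectrum[i] = 1.0 / (1.0 + d * d);
        }
        printf("dominant wavelength: %.0f nm\n", dominant_wavelength(spectrum));
        return 0;
    }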
2.1.1 Vision

Each eye has an array of light-sensitive cells at its back inner surface called the retina. There are two types of retina cells: rods and cones. Cone cells sense visible light in one of three bands of color, traditionally called Red, Green, and Blue. For red and green this is a little misleading, because the peak sensitivity of the Green cones is a greenish yellow, and for the Red cones it is a reddish yellow. The sensitivity of the red and green cones is about 10 times higher than that of the blue cones, implying that variations in the brightness of blue light are harder to detect than the same variations in the red and green range. Television signals were designed to take advantage of this by allocating most color bandwidth to Red and Green information and only a small amount to blue.

Cone cells are concentrated at the fovea, which is the focal point of the image that your eye is looking at. The fovea is very densely packed with cones, with only a few rods; the fovea is therefore the area where the eye's ability to see detail is highest. Rod cells sense visible light of any color, are more sensitive than cone cells, and are distributed over a wide area of the retina. Because of their higher sensitivity, night vision is mostly provided by rods; cone cells do not get enough light under low-light conditions, which is why color is hard to perceive in low light. In contrast to the fovea, which contains mostly cone cells, the peripheral vision area surrounding the fovea contains mostly rod cells. The peripheral zone is also much less densely populated, and hence can resolve much less detail than the foveal area. The highest density is around the fovea, with progressively lower densities toward the periphery.

In terms of the temporal performance of the eye, almost everyone can detect a change in the image within 140 milliseconds (ms). This implies that to give the appearance of motion, a new image must be presented to the viewer at least every 140 ms, or at 7 images per second. Some people will not find this convincing, however, so the usual benchmark is to supply a new image every 100 ms, or 10 images per second. The first movies were projected at 16 frames per second, which seemed acceptable to most people. However, people perceived flicker (hence the nickname "flicks" for movies), because the movie projector would turn off the image while the film was being advanced, then turn on the light to expose the next image. Motion picture film was standardized at 24 frames per second with the introduction of sound, in order to reduce the flickering sensation, to reduce jerkiness of the motion, and to ensure consistent sound reproduction. What this says about vision is that there are different temporal scales at work, with the rods in the periphery operating much more quickly, and therefore detecting flicker, while the cones operate slower and sense color.

2.1.2 3D Vision

At the lowest level, the eyes receive light and the rods and cones generate nerve impulses that communicate with the rest of the visual system. Somehow, out of this array of light quantities, the brain perceives a 3D world. It is not really known how this works, but we can enumerate ways in which features of the visual input are used by the visual system to infer 3 dimensions. Traditionally, these features of the visual input are called depth cues, as follows.

Occlusion. This is the optical property that nearer opaque objects hide objects that are more distant. This is the strongest depth cue [18], and it will override other conflicting cues. Hidden-surface removal at some point in the 3D graphics rendering pipeline provides this cue.

Perspective. This is the geometric property that objects look smaller at greater distances; the closer an object gets, the bigger the field of view it takes up. Perspective projection in 3D graphics rendering delivers this cue.

Aerial Perspective. This is a property of optical media, such as air, by which objects at greater distances become more indistinct, tending to lose contrast and converge towards the color of the optical medium. One common example is fog or smog, which makes distant objects look indistinct and monochromatic. This can be simulated in the rendering pipeline by blending object color with fog color using a ratio that increases the proportion of fog at greater distances. Another name for this operation is depth cueing, which blends object color with the background color. (A short OpenGL sketch of this appears after this list of cues.)

Motion Parallax. This is the geometric property that objects that are more distant are seen to move lesser distances than closer objects moving at the same actual speed. An animated object rendered in perspective will deliver this cue.

Kinetic Depth Effect. This cue arises from the tendency of people to assume that moving objects are rigid unless there is ample evidence to the contrary. Minimal visual stimuli, such as a few moving dots, will typically be perceived as points on a moving solid object. This cue is a direct consequence of motion parallax, and can be viewed as the integration of the motion parallaxes of many points on an object.

Shadows. Given a known source of light, a shadow cast by one object upon another can show which object is farther away. A very strongly built-in visual assumption is that light comes from above (i.e., the sun). For example, object A casting a shadow on object B implies that B is underneath A. One very strong version of this cue is the appearance of indentations and outcrops on a surface: both will cast a shadow in the direction of the light, and since the light is assumed to come from above, the shadow distinguishes indentation from outcrop. Shadows are not handled automatically by the 3D graphics system and must be set up by the modeling system.

Relative size of familiar objects. This is a property of people's experience with the visual world. People have an internal conception of object sizes and use this internal conception to judge the size of unfamiliar objects. One typical example in the technical literature is to place a coin or a ruler beside the unfamiliar object being described. An example of where this is not used is in advertisements for figurines, which have the caption "actual size" on the page. Another example of how powerful this cue can be is in old science fiction movies, where tiny models such as cars and buildings are placed beside Godzilla to show how big he is. This cue is not provided directly by the rendering pipeline, but must instead be provided by the model being rendered. An important constraint is that all objects in the model should be at a consistent scale, to ensure that familiarity is available at all times.

[Figure 2.1: When a surface is perpendicular to the line of sight, its texture appears uniform.]
[Figure 2.2: A tilted chessboard at eye level, with far squares more densely packed visually than nearby squares.]

Texture gradient. This depth cue is similar to relative size, in that people assume that any given object has uniform visual texture all over its surface. When a surface is perpendicular to the line of sight, its texture appears uniform, as shown in Figure 2.1. As the plane of the surface tilts, the texture at the far end of the plane becomes denser than the texture at the near end. For example, if one looks at a chessboard at eye level, far squares are more densely packed visually than nearby squares, as in Figure 2.2; the denser packing of distant squares is due to perspective. Texture gradient is good for showing the direction of surface tilt, but it is not an effective means of communicating the slant angle, because people always underestimate slant angle when given only texture gradients as a cue. Texture mapping with the 3D graphics system can supply a texture gradient, assuming the model is created with textures.

Surface Reflection. Like texture gradient, the surface reflection depth cue is an important indicator of surface shape. Smooth, shiny surfaces such as ceramics, billiard balls, and new cars reflect light such that there is a specular highlight, or bright spot, on the object, which is typically an image of the light source. Surface reflections are governed by a few simple mathematical rules (such as angle of incidence = angle of reflection), so our visual experience with smooth surfaces enables us to understand surface shape. Surface shininess affects the size of the specular highlight, with shinier surfaces having smaller highlights. Shiny surfaces are smooth microscopically, so light is not scattered much by surface microgeometry; the rougher the microgeometry, the larger and dimmer the specular highlight. Smooth dull surfaces, such as chalk or plain paper, are not microscopically smooth, so light that hits such a surface is scattered in all directions and no specular highlight is formed.

Focal Accommodation. This is a property of the eye's focusing mechanism. Within a distance of about 10 feet or less (assuming normal vision), the eye must focus on the object being viewed; at distances greater than about 10 feet, the eye focuses at optical infinity. Traditionally, all 3D computer graphics images are in focus at all depths, because this is easier to simulate. Some rendering algorithms, such as ray tracing, can simulate focus or depth of field, but these algorithms do not operate in real time.

Convergence. This is a property of the relative view directions of the viewer's two eyes. When the brain tells both eyes to focus on an object, if the object is within approximately 10 feet, the eyes converge on the object; the closer the object, the greater the convergence. Convergence operates in tandem with focal accommodation to focus on a single nearby object.
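As promised under Aerial Perspective above, here is a minimal sketch of how that cue can be obtained from the fixed-function OpenGL pipeline assumed by this book's code samples. The particular fog density and haze color are arbitrary illustrative choices, not recommendations.

    #include <GL/gl.h>

    /* Enable simple exponential fog so that distant geometry fades toward a
     * uniform haze color, approximating aerial perspective. */
    void enable_aerial_perspective(void)
    {
        GLfloat haze[4] = {0.7f, 0.7f, 0.8f, 1.0f};     /* pale bluish gray */

        glEnable(GL_FOG);
        glFogi(GL_FOG_MODE, GL_EXP);      /* blend factor falls off exponentially */
        glFogfv(GL_FOG_COLOR, haze);
        glFogf(GL_FOG_DENSITY, 0.02f);    /* larger density = fog sets in sooner  */
        glHint(GL_FOG_HINT, GL_NICEST);   /* per-pixel fog where supported        */
        glClearColor(haze[0], haze[1], haze[2], haze[3]);  /* match background    */
    }

Using GL_LINEAR fog with explicit GL_FOG_START and GL_FOG_END distances gives the same depth-cueing effect with more direct control over where the fade begins.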
This discussion of 3D impression has concentrated on cues that encourage a 3D impression. One can also state cues which discourage a 3D impression: so-called flatness cues. Flatness cues can be regarded as the negation of depth cues, but for stylistic reasons we have stated each cue by what it is instead of what it is not. For example, lack of focal accommodation is a flatness cue. However, the following two cues are best stated as flatness cues.

The first cue to flatness is a uniform texture gradient perpendicular to the line of sight, which will give the impression that a flat surface is being viewed. An example of this in real life is the rectangular mesh screens one puts in windows to keep the insects out while letting in fresh air. The uniform square array of pixels on a typical color CRT forms such a uniform flat texture. Fortunately, this flatness cue relies on the user being able to optically resolve individual pixels, which cannot be done on most high-resolution CRTs.

This brings up the question: what is resolution? Graphical output devices quote resolution either as the number of pixels on the display, or as the number of pixels per linear measure, such as dots per inch. The measure of resolution that the user is interested in is optical resolution, or the number of pixels that are visible per degree of arc centered at the eye; the more pixels there are per degree, the better the optical resolution. Two ways of improving display resolution are to add pixels per unit display area, or to move the display farther from the eye. The highest resolution of the eye is 1 minute of arc, located at the fovea; the resolution of peripheral vision is coarser. Improving device resolution beyond 1 minute of arc is useless, because the eye does not have the ability to resolve it. This implies that to avoid the flatness cue of uniform texture gradient, the optical resolution for any graphical output device needs to be 1/2 arc minute or better, due to the Nyquist sampling theorem. However, if the number of pixels per inch is fixed, optical resolution can be increased by increasing viewer distance. If increasing viewer distance is not possible, as in the case of a head-mounted display, the pixel array can be blurred in order to make the individual pixels undetectable.
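Optical resolution is easy to estimate from a display's pixel count, physical width, and viewing distance. The short C sketch below (the display dimensions are made-up example values, and pixel spacing is treated as uniform in angle, a reasonable approximation at ordinary distances) computes pixels per degree of arc and compares the result against the roughly 60 pixels per degree that the 1-arc-minute foveal limit implies.

    #include <math.h>
    #include <stdio.h>

    #ifndef M_PI
    #define M_PI 3.14159265358979323846
    #endif

    /* Pixels per degree of visual arc for a flat display of the given physical
     * width, pixel count, and viewing distance (width and distance in the same
     * length units). */
    static double pixels_per_degree(double width, int pixels, double distance)
    {
        double angle_deg = 2.0 * atan(width / (2.0 * distance)) * 180.0 / M_PI;
        return (double)pixels / angle_deg;
    }

    int main(void)
    {
        /* Example values only: a 0.4 m wide, 1280-pixel-wide display at 0.6 m. */
        double ppd = pixels_per_degree(0.4, 1280, 0.6);
        printf("optical resolution: %.1f pixels/degree\n", ppd);
        printf("foveal limit (1 arc minute): 60 pixels/degree\n");
        printf("individual pixels resolvable: %s\n", ppd < 60.0 ? "yes" : "no");
        return 0;
    }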
range 213 Stereopsis One of the essential means of giving the viewer a good 3D impression of the rendered scene is to take advantage of binocular vision In real life each eye gets a slightly different view of world The eyes both point at an object of interest and the brain uses these two disparate views to construct an internal 3D representation of the object The use of two different views is called Stereopsis When the eyes are simultaneously pointed at the same object they are converged on that object and the perception is that the two images of the object from each eye are fused into one It is not known how this fusion works but the objects must look similar Objects that are in front of or behind the convergedupon object appear doubled although one has to ignore the convergedupon object to notice this Geometrically each eye moves so that the object falls on the foveal region of the retina and the 3D impression of the objects in front and behind is a result of optical disparity between the left and right eyes One very important point to note is that optical disparity can only be in the horizontal plane Vertical optical disparity results in binocular rivalry in which one eye dominates and seems to suppress the other eye s view in the con icting area for a moment then the other eye dominates Optical domination oscillates back and forth between the eyes Binocular rivalry also occurs if the two eyes are looking at two different things but one would not expect fusion in this case All of the cues mentioned in the previous section can successfully be picked up with one eye and all of these should be provided in a HITD interface if possible Stereopsis should also be provided because it can be quite powerful but it requires special hardware to perform Because of the different eye locations one image must be generated for each eye to deliver stereopsis These images must be rendered using different eye locations This cannot be faked by rendering an extrawide picture and giving the left eye the left portion and giving the right eye the right portion of the picture Rendering one large picture to be viewed by both eyes is called biacular display where there is no disparity in the region where the eyes views overlap In contrast binocular display has optical disparity 214 Vision Summary This section has summarized the spatial and temporal characteristics of the human visual system We have at tempted to cover the areas appropriate to giving the HITD user a strong 3D visual impression of the animated virtual environment To do this the HITD system must account for human visual capabilities that promote a 3D impression and minimize temporal or spatial problems that would destroy that impression 22 Sound Sound is created by variations in air pressure caused by moving objects A vibrating object alternately compresses and relaxes the air around it and this compression and relaxation generates a pressure wave that propagates out from the moving object at a speed that is determined by the medium of transmission In air sound travels at about 1000 feet per second Sound travels faster and farther in denser media such as water or rock The audible range of frequencies of sound for most people is from 50 Herz cycles per second to 22000 Herz 221 Hearing The hum an auditory system is made up of the outer and inner ear The outer ear or pinna the part you hang earrings from directs sound waves into the ear canal onto the eardrum which passes the sound wave along a mechanical Page 12 213 Stereopsis 16th April 2002 222 3D 
Hearing 13 linkage to the cochlea The cochlea is a seashelllike spiral whose inner surface is covered with vibrationsensitive hair cells As the sound wave passes up the cochlear canal these cells transform vibration into nerve impulses which are transmitted to the brain Human audition is signi cantly more timesensitive than vision For example if two events occur in quick succession the time between these two events needs to be more than 140 ms to be seen as visually distinct In the auditory domain this time difference is about 20 ms Also shortterm auditory memory is somewhat longer than shortterm visual memory People can remember much longer phrases when the phrases are spoken than if the same phrases are read once off the written page 8 222 3D Hearing Assume that a sound comes from a sound emitter which emits sound equally in all directions The perception of the location of a sound emitter in 3D depends on a properties of the emitter the environment and the listener This section will discuss how the listener s body modulates sound to extract the 3D direction of the sounds emitter The next section will discuss emitter and environmental properties As a sound arrives at a listener the person hearing the sound will perceive it as coming from a certain direction The human brain takes the sound input arriving at each inner ear and uses the sonic properties to generate the perceived direction The sonic spatial cues are as follows Interaural Time Dwference or lTD is the different time of arrival of a particular sound to each ear Similar to stereo vision stereo audio requires that the sound arriving in each ear be recognized as the same thing in order to have access to this 3D cue Sounds coming from the left of the observer will arrive at the left ear before the arriving at the right ear Up to a certain limit the larger the difference in arrival time the more extremely left or right the sound If the time difference is beyond this limit it will be perceived as two distinct sounds The smaller the difference the more it will sound like it is coming from in front or behind Sounds generated in a zone called the cane afcanfusian cannot be reliably distinguished as coming from in front or behind The cone of confusion extends outward from a person s face and outward from the back of the head Interaural Intensity Di erence HD or Head Shadow arises because as sounds pass from one side of the head to the other the head tends to absorb and re ect some of the sound energy Sounds are therefore much quieter in the second ear to hear the sound This cue can be provided in a pair of stereo headphones by simply panning the volume between the left and right speakers One can simulate these two main cues quite simply by placing earphones on the listener and delaying and attenuating the signal to the ear that is to receive the soun last Unfortunately this simulation is not terribly effective because the sounds appear to be coming from inside the listener s head For externalizatian the sense that the sound comes from outside the head one must supply Pinnae Filtering 2 Pinnae Filtering This cue is a complex set of signal lters that are applied by the user s body head and especially the pinnae as the sound travels to the inner ear For example as a sound re ects off the pinnae it is not simply diminished in loudness at all frequencies Some frequencies are attenuated signi cantly while others are attenuated slightly The pinnae nonuniformly attenuates increases and even shifts the frequencies of the incoming sound To 
223 Environmental Effects

The previous section dealt with how the listener's body passively filters sound in order to determine its direction. This section deals with other 3D cues to a sound's location. Many of these cues depend on a listener's experience with the sound environment.

The simplest spatial audio cue is loudness: louder means closer. If the listener has heard the sound before, the listener will probably be able to guess the emitter's distance. For example, experience tells us that an individual flying insect is not as loud as a jet airplane taking off when these two objects are the same distance from the listener, so if a listener hears a very loud insect buzzing around, it is likely to be very close to the listener's head. Like the visual depth cue of relative size of familiar objects, auditory experience tells the listener the scale of the object emitting sound.

Doppler Effect. Because sound travels at a low speed compared to light, the speed of the sound emitter relative to the observer affects the frequency that the observer hears.
If the two are moving towards each other, the frequency is higher, and if the two are diverging, the frequency is lower. One hears this most commonly from sirens blaring on an emergency vehicle as it passes: first high-pitched as it approaches, then low-pitched as it speeds away.

Echoes and Reverberation. Sound emitted from an object propagates nonuniformly into the environment. Sound bounces off the surfaces of many objects and arrives at the ear of the listener. The listener will usually get the sound directly from the source and will also receive the reflections of the sound off the various objects around the emitter and listener. The listener will hear an echo if the direct path from the emitter to the listener is somewhat shorter than the reflection path. Of course, larger rooms have longer reflection paths and therefore longer echo delays. At its simplest, auditory room dynamics can be faked by simply passing the sound through a delay of the appropriate length. Older electric instrument amplifiers often had a reverb control, which passes the incoming signal through a set of loosely stretched springs. The springs mechanically vibrate from the incoming signal, and inertia allows the springs to continue to vibrate after the signal has passed. The amplifier combines the input signal and the reverberation signal coming from the springs.

A more complex environmental audio simulation would add together the echoes caused by the major surfaces in the environment. This depends on the location of both the emitter and the listener, so as a sound is generated, the direct path to the user is computed, as are the reflection paths off the major surfaces [13]. The surface properties determine how the sound is filtered, and the path length determines the delay. The filtered and delayed samples are added to the listener's sound signal at the appropriately delayed time. The cheap version of this simply delays the signals by the appropriate time, while a more accurate simulation will spatialize each echo. For VR applications, detailed simulation of more than the 4 walls, floor, and ceiling might not be useful, but modern acoustic simulations of concert halls require a much greater level of detail. A minimal sketch of the cheap, single-reflection version appears below.
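The sketch below delays one reflected copy of the signal and adds it to the direct signal. It is a simplified illustration under stated assumptions, not the method of any particular system: the sample rate, the single reflection, and the caller-supplied path lengths and gain are all hypothetical, and the speed of sound is the rough figure quoted earlier in this chapter.

    /* Sketch: the "cheap" echo described above -- delay the source signal
       by the extra length of one reflection path, attenuate it, and add it
       to the direct signal. */

    #define SAMPLE_RATE    44100          /* samples per second (assumed)    */
    #define SPEED_OF_SOUND 1000.0f        /* feet per second, as in the text */
    #define MAX_DELAY      (2 * SAMPLE_RATE)

    static float delay_line[MAX_DELAY];   /* circular buffer of past samples */
    static int   write_pos = 0;

    /* direct_ft: length of the direct path in feet, reflect_ft: length of
       the reflection path in feet, gain: attenuation of the echo (0..1). */
    float echo_sample(float input, float direct_ft, float reflect_ft, float gain)
    {
        delay_line[write_pos] = input;

        /* Convert the extra path length into a whole number of samples. */
        float extra_sec = (reflect_ft - direct_ft) / SPEED_OF_SOUND;
        int   delay     = (int)(extra_sec * SAMPLE_RATE);
        if (delay < 0) delay = 0;
        if (delay >= MAX_DELAY) delay = MAX_DELAY - 1;

        int read_pos = (write_pos - delay + MAX_DELAY) % MAX_DELAY;
        float out = input + gain * delay_line[read_pos];

        write_pos = (write_pos + 1) % MAX_DELAY;
        return out;
    }

A more faithful simulation would run one such delay per major surface and pass each delayed copy through the spatialization filter before mixing, as the text notes.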
23 Touch

Touch is the sense of pressure and vibration on the surface of the skin. Touch-sensitive cells lie underneath the skin all over the body in varying densities. The most sensitive areas, and the most densely packed with sensors, are the tongue and the fingers, followed by the palms of the hands, the face, the soles of the feet, and then the rest of the body; the back is the least touch-sensitive area. You can test this by gently poking the skin with two points separated by some distance: on the fingertips this distance is 1 or 2 mm, while on the back it is 2 cm or more. The sensation of texture arises from the stimulation of many individual touch sensors and from the vibrations induced in the skin by rubbing the textured object with the skin. Vibrations up to 5000 Hz seem to be detectable by touch, and unlike sound, vibrations can be detected across a surface [22].

Touch is more temporally sensitive than vision but less sensitive than hearing. If you touch two points in rapid succession, and if the time difference is below a certain threshold, the two touches will be perceived as one object rapidly jumping from one point to the other. This is similar to visual animation, where two separate images are shown in rapid succession to give the impression of movement [14].

Two words often used together are Tactile and Haptic. Haptic is derived from Greek, while Tactile comes from Latin. In HITD interfaces, Tactile has come to mean touch in the small scale, such as the sense of surface texture, perhaps displayed at more than one point. Haptic refers to larger-scale forces that would be displayed using force feedback, generally at a single point. These two lie on a continuum of force magnitude and frequency: tactile becomes haptic when the forces used for display affect the body's force-sensing system beyond the single point of contact with the skin. That is, the forces of haptic display are strong enough to induce forces in the joints beyond the point of skin contact.

24 Proprioception

Touch gives people a clear spatial sense of their skin surface and, by extension, the space immediately surrounding them. However, this spatial sense is relative to the surface of the skin that is touching the object. The sense of one's body in space is called Proprioception. This is the sense that enables you to know where the various parts of your body are in relation to each other. For example, if your knee itches, your proprioception enables you to move your hand to your knee without any other sense being active. Once your hand touches your knee, the sense of touch is also involved to help you scratch accurately. Proprioception is supplied by sensors in the muscles all over the body and seems to be about equally accurate everywhere. Both touch and proprioception seem to be mostly unconscious senses, as concentration does not make them perform better, nor does conscious attention need to be paid for them to work.

Proprioception and touch work together to help form a mental map of the 3D space you occupy. For example, while driving a car, one quickly learns where to locate the various controls purely by feel; one never has to look to find the gas pedal or the brake pedal in a familiar car.

Under most circumstances proprioception works accurately, but there is a method to fool the proprioceptive sense. If an approximately 60 Hz vibration is applied to the muscle-tendon intersection, the vibration relaxes the muscle, thereby lengthening it. The proprioceptive sensors pick up the locally lengthening muscle, and the attached joint is therefore perceived to be rotating [19]. As long as the vibration is applied, the joint feels as if it is in motion. For example, if the biceps are vibrated near the elbow, the elbow feels as if it is rotating to straighten out, even if it is already straight. A more fascinating illusion occurs if you hold your nose, close your eyes, and apply biceps vibration near your elbow. The same extension feeling will occur, which conflicts with the feeling that you are holding your nose. Subjects in an experiment in which this was done reported that their noses were stretched out, or their fingers stretched super-long, to account for the disruption of body image caused by the proprioceptive illusion in the biceps [19]. The key to this illusion is that the eyes are closed. If the eyes are open, subjects feel an odd rotating sensation in their elbow: because the visual input is more believable than the proprioceptive illusion, the illusion of changed body shape is suppressed. The phenomenon of vision overriding the other senses is common; ventriloquists rely on it to let the viewer believe that the dummy's voice is coming from the dummy.

25 Balance

Another sense related to proprioception is balance, or the sense of which way is up. This sense is supplied by the vestibular system, which consists of a set of three semicircular canals located inside the skull around each inner ear.
This system tells you the direction and magnitude of gravity and other gross accelerations being applied to your head. The major function of the vestibular system is the vestibulo-ocular reflex, which steadies the eye's gaze as the head turns: the semicircular canals pick up head acceleration, and this information is used to move the eye smoothly in the opposite direction to maintain a steady retinal image.

Vision also overrides the vestibular sense. An amusement park ride that exploits this is a building that is rotated around a stable seating area. To a person sitting in the seat it looks as if the seats are turning upside down, but in fact the room is tilting and the seats are stable. People in such rides inevitably brace themselves for a fall that will not occur [17]. A milder version of this occurs on commercial air flights when the plane banks for a turn. According to the physics of flight, there is no difference in the apparent gravity when the plane is banking, because the wings are still lifting the plane with the same force. Verify this by watching your drink as the plane turns: it will not spill. However, you might feel a mild tilting sensation as you look out the window, because the earth appears tilted.

251 Simulator Sickness

A common complaint about VR systems is that they may induce simulator sickness in the user. The conflict theory of motion sickness states that conflicting information arriving from the vestibular system and the visual system is interpreted by the body as evidence of sickness. Anatomically, a nerve directly connects the vestibular system to the center in the brain that controls vomiting [32]. Simulator sickness is thought to arise from a conflict between the visual display and the user's head motion while the user is wearing a head-mounted display. Experiments to induce simulator sickness by changing various display parameters indicate that the main causes of simulator sickness are visual distortion and image lag. That is, if the 3D graphical images displayed on the HMD's displays do not match the optical properties of the display, then the user will be more likely to become simulator sick. For example, setting the perspective projection to have a wider field of view than the HMD's optics actually deliver has been found to make users sick. Thankfully, distortions of this nature are easy to fix. In the same study, image lag was not found to be a strong cause of simulator sickness.

26 Interaction and Lag

In the previous sections we have briefly outlined the human sensory and perceptual systems that the user uses to comprehend the outside world, concentrating on the steady-state aspects of perception. In this section we will briefly outline the different gradations of time that the user is aware of, and how they might affect the user's use of computer systems. The central feature of HITD interfaces is that they are interactive; that is, they react immediately and directly to the user's commands and inputs. The term interactive has also been used to refer to computer systems that do not seem very interactive at all. The reason for this is that there are three different levels of interactivity, which are related to human perceptual and cognitive time constants. In order of increasing length and decreasing interactivity, these time constants are called Perceptual Processing, Immediate Response, and Unit Task [8].

261 Perceptual Processing

Perceptual Processing is the lowest level of interaction and the one that requires the least amount of thought. Indeed, the point of perceptual processing is that it is unconscious and a person can perform it
automatically The time constant for perceptual processing is the shortest of the three levels of interactivity at about 100 ms and is directly linked to the times of the human perceptualmotor loop For example a perceptualmotor loop operates when a person decides to move the right hand to a target As the hand moves the motor nervous system causes the right arm muscles to ex in order to move the right arm and the perceptual system vision proprioception operate to determine the right hand s current position As the right hand gets closer adjustments are made to muscle exion and the hand trajectory changes as desired All trajectory adjustment happens with no conscious effort whatsoever The perceptual and motor systems operate in a loop with motor neurons causing motion and sensory neurons reading the results and feeding those results back to the motor control centers in the brain The traversal time for this is about 70ms 8 1n mechanical and computer systems direct hum an control of the system lengthens the perceptualmotor loop For example in driving a car the motor neurons move the hands that turn the steering wheel which turns the car The perceptual system reads the car s current direction and the motor control center of the brain and higher centers issue new commands for steering wheel con guration In essence the perceptualmotor loop includes the car Similarly when a user is interacting with a computer the user s perceptualm otor loop includes the computer Thus when the user types a key the computer gets the keypress and causes the corresponding letter to be displayed which is seen by the eyes When the mouse is moved computer receives the change and draws the mouse pointer in a new location The major difference between the car and the computer is that the car will react more or less Page 16 251 Simulator Sickness 16th April 2002 262 Immediate Response 17 immediately while the computer may incur some sort of processing delay before responding to the user Therefore a lot of time might pass between user action and corresponding computer response In a HITD system as the user s head turns images in the headm ounted display must be displayed for the new position within 100m s along with any cursors or other representation of the user such as a hand image for the DataGlove The time between user action and corresponding computer response is called the computer system s Lag time or often simply Lag or a similar term Latency Operations that rely directly on the hum an perceptualm otor loop must have lags less than 100 milliseconds to be usable such as the lag between head or hand motion and visual update Why is low latency important If the lag between user movement and image update is too long the user will nd that interaction is very dif cult When image lag gets too large users change their behavior39 either by slowing down their movements or by adopting a moveandwait strategy 31 Image lags of under 100 milliseconds are generally tolerable and will result in only a slight slowdown of user movements Image lags between 100 and 500 milliseconds cause users to slow their movements signi cantly Lags greater than about 500 milliseconds result in a cessation of interaction because the users must carefully plan each activity in order to minimize wasted motion One reason for this is that human visual input dominates the human proprioceptive sense which means that people rely more strongly on what their visual system tells them about body position than what their muscles tell them If the user s muscles give 
different information about body location than the user s eyes then the brain believes the data from the eyes In the following explanation the task of placing the user s hand on a target illustrates the effects of visual lag but other movements such as head or body movement will be affected the same way When the user tries to place his or her hand on a target in the 100millisecond lag situation the user will slow the hand when the hand is close to target However the visuals are late so the hand is in fact beyond target when the user sees it on target When the user eventually sees hand overshoot the user reverses course to get to the target but at a much lower speed Using high hand speeds typically results in large overshoot so the user slows down to minimize overshoot As lag increases the user must slow more and more in order to minimize overshoot When the lag reaches 500 milliseconds the overshoot minimization speed is so low that the user switches to a different strategy The user instead plans where the hand is to be placed then waits for confirmation of the new position When the user must plan hand trajectories no interaction is taking place because the user thinks carefully about the next move to make In contrast true interaction relies on lowlevel perceptualmotor capabilities bypassing the user s cognitive processes 262 Immediate Response The immediate response time constant is on the order of 1 second 23 and relates to the time needed to make an unprepared response to some stimulus For example this might be the time it takes for you to answer a question that you know the answer to In the computer context if a computer responds within one second of typing return at a commandline prompt the user concludes the computer is running When sur ng the Web the browser should indicate that it is starting to load a new page one second after a hot link is clicked In the HITD context semantic input to the application e g invoking a comm and by making a hand gesture should generate a response within one second39 otherwise the user will think that the application is dea The immediate response time constant is 10 times as long as the perceptual processing time constant This distinction here is between lowlevel human motor control and a slightly higher level of cognitive response That is the immediate response time constant refers to computer response that is somewhat richer than simply pressing keys or moving the mouse However mouse motion would be extremely dif cult to control with a 1second lag and HMD displays with 1second lag would be intolerable 263 Unit Task The unit task response time constant is between 5 and 25 seconds which is the time required to complete some elementary task In the web sur ng example this is the time one would expect a page to have completed loading This is up to 25 times longer than the immediate response time constant because something much more signi cant is happening Instead of just acknowledging that a hot link has been clicked the web browser gets the entire page One of the challenges in HITD user interfaces is building the system in such a way that it matches the user s expectations particularly these time constants This is particularly challenging in the VR case because the entire scene must be redrawn to re ect the user s new gaze within the shortest time constant of 100ms This is because 16th April 2002 262 Immediate Response Page 17 18 Chapter 2 Perception head turning to look at a new object invokes the perceptualmotor control loop which cannot tolerate lag of 
much more than 100ms without user fatigue In designing HITD systems it pays to move some tasks into the imm ediate response or unit task domain so that the perceptualm otor requirements are met 27 WrapUp In this chapter we have brie y outlined the perceptual and perceptualm otor systems of the user We enumerated the Visual Auditory TactileHaptic Proprioceptive and Balance senses with a view to their impact on HITD interfaces Developers of VR systems should bear in mind that although vision will override con icting information from other senses this emphatically does not mean that other senses are to be ignored Spatialized video and audio can work together to provide users a much more compelling VR experience than either would in isolation The review of Card Moran and Newell s three time scales of Perceptual Processing Immediate Response and Unit Task rem ind us that applications can be structured in such a way that users are satis ed even not every aspect of the application reacts within lOOms Application developers can use these observations to their advantage Page 18 27 WrapUp 16th April 2002 Chapter 3 Hardware The availability of appropriate hardware is one of the essential factors in the development of HITD user interfaces because HITD user interfaces demand realtime 3D graphics and realtime 3D position and orientation tracking Due to its enormous computational requirements realtime 3D graphics could only be delivered by ight simula tors costing several millions of dollars until the mid 1980 s 5 Threedimensional sound spatialization also has very large computational demands and systems to perform this in realtime became available in the early 1990 s Reliable realtime 3D tracking has been available since the mid 1970 s but the computational requirements are also quite large Consequently cheap 3D tracking did not become available in a small package until the early 1980 s Devices for measuring nger position did not become available until the mid 1980 s mainly because no need was seen for such devices until that time Of course the reason why we can now talk about HITD user interfaces is that VLSI technology continues to deliver faster and cheaper computers to perform 3D graphics position tracking hand tracking 3D spatial sound and so forth Advances in VLSI have also brought inexpensive display devices into existence from which safe head mounted displays can be built As consensus builds on which are the best devices to use the expanding market may result in lower prices making HITD user interfaces accessible to everyone This is certainly happening in the realm of home entertainment where very strong 3D graphics performance is available on inexpensive computer game consoles 31 How Does The Hardware Fit Together Unfortunately this technology is not yet at a stage where one can just buy the equipment plug it in and have it work A certain level of effort is required to integrate the equipment into a useful system Figure 31 shows a diagram of a HITD system with every possible hardware device included The idea is simply to visually enumerate the hardware that one may possibly use At the center of the diagram is a central computer that runs the virtual environment collecting data from the input devices and distributing data to the output devices The arrows on the lines indicate the direction of data ow For devices that can do both input and output the arrows point in both directions We will come back to this diagram in Chapter 4 when we discuss software structures for HITD interfaces The user is 
assumed to be interacting with every device, so to keep the diagram visually comprehensible the user is not included in it. The user does not really care what the hardware structure is, as long as it works.

Now that you can see where all the devices fit in the grand scheme of things, the rest of this chapter will introduce you to the major classes of hardware, focusing on the general issues and technologies available. These classes of hardware are 3D graphics engines, input devices, output devices, and sound I/O.

Figure 31 A diagram of a HITD system with every possible hardware device included

32 Graphics Engines

A graphics engine is the component of a HITD user interface that is responsible for drawing the virtual world based on the user's current viewing parameters. This drawing process, also known as rendering, translates the raw geometric data that makes up the virtual world into a picture of that world from the user's point of view. The graphics engine continually redraws, or updates, the image according to the most recent point of view. The output of the rendering pipeline is a rectangular array of pixels stored in a special memory area called the frame buffer, which must be scanned out onto the display screen for the user to see it. Typically one views the frame buffer's image on a color picture tube (CRT). This need not be the case, however, and the section on output devices will review the choices available for image output.

321 The Simple Rendering Pipeline

This section summarizes the rendering process; Chapter 2 deals with this material in significantly more depth, and other graphics textbooks [12] give a more detailed treatment. Figure 32 shows the five rendering stages. At its simplest, rendering is done in five stages. Before the rendering process starts, each of the objects in the virtual world is specified in terms of its own coordinate system. The first rendering stage geometrically transforms each object into the world coordinate system; this transformation operation is different for each object or geometric primitive. At this point the viewpoint (the user's eye) and view direction are stated in terms of a position and orientation within the virtual world.

The next stage further transforms the objects into the canonical viewing coordinates and then clips, or eliminates, any objects that are partially or completely outside the canonical view volume. The canonical viewing coordinate system has the viewpoint at the origin and the view direction down one of the axes, typically the Z axis. To generate a perspective view, the canonical view volume is a truncated square pyramid, as shown in figure 33. The canonical view volume ensures that the clipping process can be coded as efficiently as possible.

Figure 33 Canonical view volume

The clipping process determines whether the object in question is entirely inside the view volume, entirely outside the view volume, or crosses the view volume boundary. If the object is outside the view volume, it can be eliminated from further consideration for this view direction. If it crosses the view volume boundary, the outside portions of the object must be clipped off to ensure that errors do not occur in later processing stages. The clipping of an object consists of finding the intersections of each of the object's polygons with the view volume and cutting all intersecting polygons at the intersection line.

The next stage is to project the clipped 3D objects onto the canonical 2D projection rectangle, which is geometrically equivalent to the area of the screen on which the user will view the image. The purpose here is to prepare the geometry of the scene so that it can be drawn into the frame buffer using 2D drawing operations.

The final stage, rasterization, draws the 2D projected objects into the frame buffer to be viewed by the user. For each polygon, the color of each pixel within the polygon is computed. A common pixel computation approximates the effect of light being cast onto a smooth surface. Alternatively, a pixel's color may be retrieved from a sample image of a real surface, such as a brick wall, in a process called texture mapping.

To deliver the all-important depth cue of occlusion, the graphics engine must perform hidden-surface removal. All hidden-surface algorithms seek to sort the incoming polygonal data based on its depth, or distance from the viewer. The most commonly used hidden-surface method is the Z-buffer technique, which uses a rectangular array of depth values that correspond one-for-one with the pixels in the frame buffer. When drawing starts, the frame buffer is cleared to the background color and the Z buffer is cleared to the maximum depth value. When a polygon is to be drawn, for each pixel the depth of the pixel is compared against the value stored at the corresponding location in the Z buffer. If the Z buffer value is deeper than the new value, then the pixel is drawn and the new Z value is stored in the Z buffer at that location. If the new value is deeper than the stored Z value, then the current pixel value hides the new pixel and the new pixel is not drawn. This computation can take place on a per-pixel basis during the rasterization step, as in the sketch below.
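The Z-buffer test itself is only a few lines of code. The sketch below is illustrative rather than taken from any particular graphics engine: the frame buffer size, the pixel type, and the convention that larger z means farther away are assumptions, and a real rasterizer would interpolate the depth and color values across each polygon before calling anything like plot_pixel.

    /* Sketch of the per-pixel Z-buffer test described above. */

    #define WIDTH     640
    #define HEIGHT    480
    #define MAX_DEPTH 1.0f             /* far end of the canonical view volume */

    static float        zbuffer[HEIGHT][WIDTH];
    static unsigned int framebuffer[HEIGHT][WIDTH];

    /* Clear both buffers before drawing a new frame. */
    void clear_buffers(unsigned int background)
    {
        for (int y = 0; y < HEIGHT; y++)
            for (int x = 0; x < WIDTH; x++) {
                zbuffer[y][x]     = MAX_DEPTH;
                framebuffer[y][x] = background;
            }
    }

    /* Called once for every pixel a polygon covers, with that pixel's
       interpolated depth z and computed color. */
    void plot_pixel(int x, int y, float z, unsigned int color)
    {
        if (z < zbuffer[y][x]) {       /* new pixel is nearer: draw it   */
            zbuffer[y][x]     = z;
            framebuffer[y][x] = color;
        }                              /* otherwise it is hidden: skip it */
    }

The appeal of the technique is exactly this simplicity: the polygons can be sent to the rasterizer in any order, and the depth comparison resolves the occlusion per pixel.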
322 Display Output

Once an image is completely rendered into the frame buffer, it is displayed to the user. A complex model will take a nontrivial time to draw, which implies that the user should have something to look at while the screen is being updated. The standard technique is to supply a double frame buffer: frame buffer A contains the image currently being displayed, while frame buffer B has the current view being drawn into it. When drawing is finished for frame buffer B, the hardware switches roles and displays B while drawing into A. Without this double-buffer arrangement, the user sees the scene being continually cleared and redrawn, which destroys the perception of smoothly animated objects.

Refresh vs Update

There are two closely related terms that describe the process of painting an image on a display device: Refresh and Update. Refresh is the process where the image is repainted onto the CRT by scanning the electron beam across the display in a raster pattern from top to bottom. Refresh is required for CRTs because the electron beam briefly hits each phosphor dot, and the phosphor stops glowing after a short time. To maintain a stable image, the image must be repainted every 8 to 20 milliseconds, even if the actual image content does not change. A CRT's Refresh Rate is thus the rate at which this repainting occurs, usually between 50Hz and 120Hz. The computer graphics system thus scans out the contents of its visible frame buffer once per refresh cycle. While the internal mechanisms of LCDs or projectors may not actually require a continual refresh, they accept a refresh signal to maintain compatibility with CRTs.

Update is the process where a new graphics image is generated for display. In the simple rendering pipeline, all of the objects have been rendered into the frame buffer and this new image is presented for scanout. Update rate is thus the rate at which new images
are appearing and this can vary anywhere from the same as the refresh rate down to hours per image The key distinction is that Update depends on image content while refresh does not Indeed the purpose of the doublebuffered frame buffer is to allow updates to invisibly occur on the invisible buffer while refresh is happening from the visible buffer Video Signals All graphics engines will scan out the frame buffer row by row using a video signal of some sort that can be plugged into an appropriate color video monitor Frame buffers range anywhere in size from 320 rows by 240 columns to 1280 rows by 1920 columns with a video refresh rate between 60 and 120 Herz Typically the frame buffer will output individual Red Green Blue and Synchronization signals Usually the choice of the video monitor is dictated by what the graphics engine can produce However in the case of headmounted displays the reverse is true because some currently available headmounted displays accept only a standard television video such as NT SC in North America The graphics engine must therefore be able to produce NT SC video The nominal NTSC frame size is 640 by 480 pixels refreshed 60 times per second Each NTSC frame consists of two elds where the even eld contains the evennumbered pixel rows and the odd eld contains the odd rows The even eld is scanned out in one 60th of a second followed by the odd eld in the next 60th then the even eld again and so on This evenodd eld scheme is called interlaced video because the scan lines of the two elds interlace to form one frame In this case the update rate of NTSC television is really 30Hz because only half of the image is refreshed per 60Hz cycle The most common way to communicate NTSC video is by a composite video signal which combines all of the synchronization and color information onto one signal line The frame buffer must be able to either output composite NTSC directly or output RGB and Sync with NTSC timing which can then be fed into an NTSC encoding device which converts the four input signals to composite NTSC 1f NTSC compatible output is not possible higher resolution video can be fed into a scan converter which is a device that resamples the incoming video signal and outputs the same picture using a different video standard Thus a scan converter could accept 1280 by 1024 video and output the same picture in NT SC format but the resampling process would result in a loss of information Some graphics engine manufacturers will supply hardware that allows distinct subsets of the frame buffer to be scanned out to two or more separate video outputs For example a 1280 by 1024 pixel frame buffer have enough pixels to accommodate four nonoverlapping 640 by 480 frames and Silicon Graphics Inc SGT sells hardware that will scan out these four frames simultaneously using NTSC 16th April 2002 322 Display Output Page 21 22 Chapter 3 Hardware The low cost last resort for getting NTSC video from graphics hardware that cannot output NT SC is to point a video camera at the monitor displaying the graphics and feed the cam era s output into the head mounted display This approach is not ideal and the results may look quite bad if the refresh rate of the monitor is not very close to 60 Hz because the video from the camera will icker This icker is due to destructive wave interference between the 60 Hz NTSC and the refresh frequency of the graphics monitor For graphics engines that output the frame buffer using a 15pin VGA connector life is somewhat simpli ed because many HMDs will accept VGA 
input. The user must simply program the graphics engine to output video at the appropriate resolution. Typically HMDs will only accept one display resolution and will simply display nothing if the wrong image size is used. Two of the most common image sizes are 640 by 480 and 800 by 600.

33 Graphical Display Devices

Good 3D Impression

Of course, the purpose of displaying 3D geometric data is to give the user the visual impression of a 3D virtual world. One way of expressing the quality of the 3D impression is to enumerate how successfully the graphical output scores on the various 3D visual cues. The classical 3D cues were laid out in chapter 2 in section 212; here we briefly list what part of the 3D graphics process provides each perceptual cue.

Occlusion is provided by hidden-surface removal at some point in the rendering pipeline. Perspective is provided directly at the projection stage of the rendering process; also available in graphics engines is orthographic projection, in which an object will appear the same size at any distance. Aerial Perspective, or fog, can be simulated in the rendering pipeline by blending object color with fog color using a ratio that increases the proportion of fog at greater distances; another name for this operation is depth cueing, in which object color is blended with the background color. Motion Parallax and the Kinetic Depth Effect are provided by animation, that is, repeatedly drawing rigid moving objects. Light reflecting off a surface can be approximately modeled by the shading and light model facilities of the graphics engine. Even fake shadows on the ground can be an effective cue, because people assume that light comes from above unless information to the contrary is present; one must take care not to have conflicting shadow information, because the user will be confused about where the lights are supposed to be. Relative size of familiar objects is not provided directly by the rendering pipeline but must instead be provided by the model being rendered; an important constraint is that all objects in the model should be at a consistent scale to ensure that familiarity is available at all times. Texture gradient can be supplied by texture-mapping an image onto a surface; one simple method of adding texture to a flat plane is to draw tile lines on it. Focal Accommodation is typically not supplied by the graphics pipeline, because all the common displays show their image at one focal distance. Some display systems on the market allow manual focus adjustment, but to support accommodation this adjustment would have to be based on the current focal distance of the eye; conceivably a system that tracks eye focal distance could be used to adjust the focus of the display, but it is not clear that such a mechanism would be worth the effort. Convergence also cannot be supplied directly by any graphics output device, since it is a property of the user's actions; however, an eye tracker could conceivably use convergence as a method of inputting commands to the HITD system.

331 Display of Stereo Images

Because of the different eye locations, stereopsis must be delivered by presenting two separate images rendered using different eye locations. This cannot be faked by rendering an extra-wide picture and giving the left eye the left portion and the right eye the right portion of the picture, since there is no disparity in the region where the views of the eyes overlap. Recalling the simple rendering pipeline, an image for each eye must be rendered using a viewpoint that corresponds to the location of that eye. The two view directions and the two view up vectors must be parallel; converging or diverging the view directions results in a vertical disparity of corresponding points on either side of the midline, which will yield binocular rivalry if the vertical disparity is larger than 10 arcminutes.
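As a concrete sketch of this, the fragment below derives the two eye positions from a single tracked head pose. It is illustrative only: the vec3 type, the cross-product helper, and the interpupillary distance value are assumptions, and the essential point is simply that the two viewpoints are offset sideways while the view direction and up vector are shared, and therefore parallel, between the eyes.

    /* Sketch: deriving the two eye viewpoints for stereo rendering from a
       single tracked head pose. */

    typedef struct { float x, y, z; } vec3;

    static vec3 cross(vec3 a, vec3 b)
    {
        vec3 r = { a.y * b.z - a.z * b.y,
                   a.z * b.x - a.x * b.z,
                   a.x * b.y - a.y * b.x };
        return r;
    }

    #define IPD 0.064f  /* interpupillary distance in metres (assumed average) */

    /* head, dir and up come from the head tracker; dir and up are unit
       vectors giving the view direction and the view up direction. */
    void eye_viewpoints(vec3 head, vec3 dir, vec3 up,
                        vec3 *left_eye, vec3 *right_eye)
    {
        vec3  right = cross(dir, up);   /* points to the viewer's right */
        float h     = IPD / 2.0f;

        left_eye->x  = head.x - h * right.x;
        left_eye->y  = head.y - h * right.y;
        left_eye->z  = head.z - h * right.z;

        right_eye->x = head.x + h * right.x;
        right_eye->y = head.y + h * right.y;
        right_eye->z = head.z + h * right.z;

        /* Render one image from each eye position, both using dir and up. */
    }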
332 Passive Stereoptic Displays

When a single display screen is to be viewed by both eyes, some sort of apparatus must be used to get the corresponding image to each eye. All such systems rely on some sort of eyeglasses, which modulate the display's emitted light in tandem with the display. This section will cover systems that rely on passive eyewear; that is, the user wears glasses that do not change state in order to provide the stereo effect.

The system that most people are familiar with was used in the 1950s for 3D movies. This method presents the left and right views one on top of the other in noncomplementary colors, such as red and green or red and blue. This intermixed picture, called the anaglyph, is then decoded by wearing glasses with a red filter over one eye and a green or a blue filter over the other. The resultant picture is perceived as black and white stereo, and this can be used in film, in print, or on a CRT with no extra equipment; only the glasses are needed for this technique to work. Of course, a viewer not wearing 3D glasses sees a double image.

A similar system is to polarize the left and right views 90 degrees out of phase with each other and again superimpose the two pictures. The viewer wears polarizing glasses with one lens 90 degrees out of phase with the other. In contrast to anaglyph stereo, some sort of special display equipment is needed. The polarizing system is in common use for 3D films today, and it has the advantage that viewers see full color. There is no problem with putting the glasses on backwards, since the polarizing filters are at 45 degrees from vertical and 90 degrees from each other.

When a CRT is used to display the images using the polarizing scheme, there needs to be some sort of polarizing filter in front of the screen which can quickly switch between left and right polarities. The CRT rapidly switches between left and right images, so that when the left image is being displayed the left filter is on, and vice versa for the right image. The filter mechanism needs a signal from the frame buffer to indicate when to change state and what state to go to. For stereo-capable projector-based systems, an active filter in the projector selects between left and right images. One could also place a filter in front of the projector lens if the projector itself is not stereo ready. One must make sure that the projection screen does not depolarize the images, in either the front-projected or back-projected case, as this would eliminate the stereo effect.

One important issue that arises is how the frame buffer should perform the rapid switching. One method is to load the front image buffer with the left image and load the back buffer with the right, and rapidly switch between front and back. The main problem with this approach is image flicker, and if smooth animation is required, another two buffers are needed to do double-buffering. Another approach is to cut the display resolution in half and refresh the screen twice as often. In this scheme the upper half of the frame buffer contains, say, the left image and the lower half contains the right. To display this, only the top or the bottom half of the frame buffer is scanned out during 1/120 second, then the other half is scanned out in the next
1120 second Both halves of the frame buffer appear at the same location in successive refreshes because the display hardware receives a signal to start each halfframe at the top of the display Refreshing the screen at an overall rate of 120Hz also solves a potential problem with icker because each eye gets 1120 second of image followed by 1120 second of darkness This setup can be handled by most frame buffers because one half as much data is being scanned out twice as often However one does need a monitor that can handle this doubled vertical refresh frequency Another common approach is to rowinterleave the left and right images in the frame buffer in the style of NTSC video and to scan out the frame buffer in an interleaved fashion That is rst scan out all of the odd rows of the frame buffer that represents the lefteye view while the polarizer is set up for left eye viewing then scan out the even rows representing the right eye image while the polarizer is set for righteye viewing One advantage of this is that it simpli es how a stereo video signal is to be sent to the display since NTSC alread does it It also simpli es framebuffer setup because typically one large rectangular area contains both interleaved views The disadvantage is that image quality is typically not as good because draing in this style is typically done by stenciling out the left pixels while the right pixels are being drawn and vice versa This results in broken edges appearing in the image The chief advantage of the passive eyewear systems is that the passive eyewear is cheap and so many people can see stereo images from a single display device at a moderate cost One problem is that the onscreen polarizer will attenuate screen brightness somewhat and may blur the image In addition the polarizer must be large enough to accommodate the entire screen area which may incur a price disadvantage if only one user at a time uses the system 16th April 2002 332 Passive Stereoptic Displays Page 23 24 Chapter 3 Hardware 333 Active Eyewear In contrast to passive eyewear the user can wear glasses that change state based on which eye s image is visible on the screen Systems of this type simply have a shutter in front of each eye which opens when the correct image is on screen closes when the opposite image is displayed Modern shutter glasses use a large oneelement LCD for each lens One problem that must be solved with active eyewear is how to signal the glasses when to switch eyes Obviously some sort of signal wire will do the trick but some manufacturers supply a signal box with infrared LED s which signal an infrared receiver on the glasses when to switch Without a signal wire such glasses also need a battery to power the shutter and the infrared receiver As with the polarization system image icker is a problem best solved by refreshing at 120Hz The problems of how to rapidly refresh two views is solved by presenting the top half of the frame buffer during one 120Hz refresh interval and the lower half during the next Again there is a loss of 12 of the screen resolution with this scheme The advantage of the active eyewear system is that the total active area of the glasses is typically much smaller than the CRT so a pair of active glasses are cheaper to make than one screensized switching polarizing lter With the infrared signal systems each wearer must maintain a line of sight with the broadcasting box in order to pick up the switching signal so the active eyewear systems are not suitable for many viewers Cost is also a factor if 
more than a handful of viewers is contemplated since passive glasses can be made for less than a dollar Both the active and passive eyewear systems have a problem with image crosstalk because the switching time from fully left to fully right is nite and can be about two milliseconds or more In the active system the issue is how quickly the LCD can turn opaque or transparent There is a time where both LCD s are semitransparent which results in both eyes seeing both images for a short period Ideally the opaque phase of each LCD should overlap the other but this results in some ickering and a signi cant darkening of the display if the time to go transparent is long In the switched polarizing system the problem is less severe because only one element needs to switch 34 HeadMounted Displays Given that one is not required to use only one display stereopsis can be supplied by a separate display for each eye Stereoscopic viewers such as the viewmaster are of this type The optical path from each eye to its respective display is open but the path to the other display is blocked by a baf e andor some arrangement of mirrors and lenses 1n the case of the viewmaster and stereo picture viewers the image is a photograph 1n the case of a headmounted display each eye views a graphics display through a lens system that is worn on the head The rst requirement for a headmounted display is one display for each eye mounted binocularly in such a way that the whole display system can be moved by the user The display system must be lightweight enough to be carried worn or manipulated directly without the aid of motors or any other mechanical intermediary The second requirement is head tracking That is the user s view direction is continuously updated by some sort of position and orientation tracking system connected to the headm ounted display To meet these two requirements some modern headm ounted displays used LCDbased pocket televisions tracked by lightweight magnetic trackers There are three advantages to a headm ounted display 1 Stereopsis is delivered directly by the display hardware 19 The display can freely move about the room so the user can look around objects and explore the geometry of the virtual world 3 Some headmounted displays have a wide eld of view which enhances the 3D impression There are numerous variations however and the remainder of this section will explore the characteristics important to headm ounted display design The headmounted lens system should provide a focused view of the surface of the display over its entire extent and should eliminate unwanted interference from the other display In some systems the lenses will also expand the eld of view somewhat to more closely match the eld available to the eyes in real life It is quite important that the elds of view cover enough of the region between the eyes since this is where the eyes must converge to achieve binocular disparity However a wide eld of view requires a signi cant number of pixels to maintain reasonable resolution across the entire eld For example the ideal of 12 arcminute of optical resolution requires Page 24 333 Active Eyewear 16th April 2002 34 HeadMounted Displays 25 l2000 pixels over a 100degree eld of view Currently many wide eld headmounted displays have rather poor resolution simply because the displays being used do not have the available pixels Of course this brings up another problem Since the pixels are much larger than 1 arcminute they are quite visible and form a at texture perpendicular to the line of sight 
Consequently the optics in the headm ounted display must also blur the display to make the pixels less evident Up to this point we have used head mounted display as a generic term for a class of devices that provide mobile viewtracked stereoptic graphics On closer examination however there are variations in how a head mounted display is built that strongly affect how it is used The four main variables are eld of view weightbearing method view tracking method and transparency The wider the eld of view the better the 3D impression one will get of the virtual world A wide eld of view has the advantage that the one can move just one s eyes to focus on a different object instead of moving one s head Typically the narrower the eld of view the more the user must move his or her head around to nd the object of interest In addition a narrow eld of view implies that the user s peripheral vision remains essentially unused which may be important in some applications where deep concentration on a foveal view is not appropriate However a narrow eld of view is cheaper to deliver since a simpler lens system can be used and it has the advantage of higher optical resolution because the pixels are spread over a narrower ran e One constraint with these systems is that mounting two displays on a person s head requires that the displays be light The obvious reason for this is that a person can only carry so much weight on his or her head Closely related to the weight restriction is a restriction on rotational inertia which arises from limits in a person s ability to turn a headmounted mass Therefore all display weight should be placed as close as possible to the axis of head rotation if possible This means that display systems on the end of long stalks are not acceptable The upshot of these restrictions is that building a headmounted optical system is a serious engineering challenge The straightforward way of meeting this challenge is to use the lightest display and optical elements possible which more or less dictates the use of plastic lenses and small atpanel displays A way around the weight problem is to have a mechanical system of some sort bear the weight instead of the user This can be as simple as suspending the display from the ceiling with a rope and pulley arrangement with the display at one end of the rope and a counterweight at the other A more complex arrangement is to use a mechanical weightbearing linkage that also tracks position and orientation When the weight is borne by the mechanism the user s weight constraint is lifted but the rotational inertia constraint is not so manipulation of the display may require that the user s hands be used One clear bene t of a weightbearing mechanism is that heavier displays such as CRTs can successfully be used with resultant bene ts in resolution The next main variable with respect to headm ounted display design is the tracking method With respect the preceding discussion of weight the best tracking technology is that one that is the lightest given good accuracy and latency Ultrasonic magnetic and inertial trackers t the bill Another desirable feature is that there are no dead spots in the tracking volume Ideally there should be no set of points or orientations where the tracker stops returning useful tracking data Of course highly unlikely positions such as standing on one s head are less important than slightly unusual positions like bending over by 90 degrees If there are dead spots then the tracker system should be physically arranged to have its dead 
spots in the least likely positions e mechanical tracking systems are less successful for the reasons of added weight and inertia unless the mechanism bears its own weight All mechanical tracking systems suffer from the problem of dead spots which are known as gimbal locked orientations When e mechanism is gimbal locked a degree of freedom is lost and the user must unlock the mechanism by navigating out of the locked orientation This navigation process can be performed quickly but it is somewhat annoying and requires that the user disengage from the interaction to solve the mechanical problem By contrast the magnetic and ultrasonic trackers have no restriction on how the user navigates out of the dead spot Because of these problems with inertia and gimbal lock mechanical trackers are best suited to usage contexts where total freedom of movement is not required The higher inertia usually requires that the user navigate the display by using his or her hands instead of just moving the head The problem is not just that inertia is present but that inertia varies dependent on current position Variable inertia is easy to handle with the hands because people have lots of experience handling light and heavy objects Variable inertial loading of the head is somewhat more dif cult to endure and can sometimes engender motion sickness However the inertial problem can be ameliorated by moving slowly which may be appropriate in situations where the headmounted display is being used to examine a scene carefully Mechanical tracking is thus most appropriate in lowspeed small range of motion tasks such as scienti c visualization The best placement for the tracker is at or near the top or crown of the head because the rotational speed is 16th April 2002 34 HeadMounted Displays Page 25 26 Chapter 3 Hardware likely to be least there This is because most human head rotation is sidetoside or yawing motion which has its axis through the crown which means that the center of the tracker is on this axis The user spends most of his or her waking life upright so one can be reasonably sure that the crown of the head is pointing up The bene ts to the tracker are that the velocity is low and there is usually a line of sight between the top of the head and the ceiling Consequently the best place to put the source point of the tracker is to hang it from the ceiling With a mechanical system this may not be possible although the mechanism may be constructed to simulate a drop from the ceiling 341 Transparent Head Mounted Displays The last variable with respect to headm ounted display design is the transparency or opacity of the display Most commercial headm ounted displays used in the Virtual Reality context are opaque meaning that the external world cannot be seen Helmetm ounted displays used in aircraft simulators are transparent because the pilot must main tain visual contact with real world objects such as the cockpit controls and displays The previous discussion has implicitly assumed the use of an opaque display For a transparent headm ounted display the important technical decision that must be made is how to combine the virtual and real images There are two approaches The rst approach is to project the virtual image onto a beam splitter or halfsilvered mirror mounted at a 45degree angle to the line of sight The beam splitter passes the realworld image directly through it to the eye and re ects the projected image into the eye as shown in gure 34 In this optical combination method the viewer sees a more or less 
unobstructed view of the real world with a virtual image overlaid Figure 34 Optically transparent HMD The other major approach is to mount two video cameras on top of an opaque headmounted display and combine the camera image and the virtual image for each eye at some stage in the graphics engine A simple combining algorithm could clear the frame buffer with the camera image instead of the background color then draw the virtual objects on top as if the cam era image were at in nity A transparent display adds three signi cant challenges which are focus visual registration and hidden surface removal The problem of focus is simply that the virtual image must be superimposed upon the real image in such a way that the virtual image is in focus at depths comparable to the real objects The user should not have to reaccommodate each time he or she switches from focusing on real to virtual objects or vice versa With an optical combiner the virtual image is at a xed focal depth because the virtual image projector would be dif cult to refocus With a video combiner the real image is at a xed focus Visual registration simply means that the real and virtual objects should stay static relative to each other and should line up visually For example the virtual image should not lag the real image when the user s head is turned Another aspect of visual registration is that if two objects are in contact they should appear in contact not separated by some small distance This is particularly important in the XRay Vision style of interaction because the real and virtual objects are supposed to overlay exactly This has proven to be an enormous challenge over the years The last challenge is hidden surface removal between real and virtual objects For an optical combiner the real world is always visible so virtual objects cannot hide real objects The chief advantage of the video combiner is that virtual objects can hide real objects Conversely with both combination methods real objects cannot hide virtual ones unless realworld depth information is known Some sort of ranging system could be used so that for example a real wall could hide the virtual furniture behind it if the distance from the viewer to the wall is known 342 Examples of Head Mounted Displays In this section we will review some headmounted displays available commercially and a few research prototypes In the commercial world headmounted displays run the gamut from ve hundred thousand dollar highresolution transparent helmet systems to ve hundred dollar oscillating mirror systems It seems that for every magnitude of price there s a system available In the research world the emphasis is on new capabilities that are not com mercially available We will begin with commercial headmounted displays working from the highend systems towards the low en Page 26 341 Transparent HeadMounted Displays 16th April 2002 342 Examples of HeadMounted Displays 27 FiberOptic Helmet Mounted Display The rst system is the FiberOptic Helmet Mounted Display FOHMD manufactured by CAE Electronics in Montreal Quebec The display itself is part of a ight simulation system built to train pilots in such tasks as low level ying and airtoair combat CAE has therefore optimized the display hardware for use by a person strapped into a seat with limited freedom of movement so some of the display optics are not necessarily appropriate for more generalpurpose use The FOHMD has three sets of components mounted on a modi ed pilot s helmet worn by the user The components are a position and 
orientation tracking system a rate sensor system and an optical sensing system The tracking system uses a small array of infrared Light Emitting Diodes LEDs mounted on the top of the helmet which are pulsed in sequence and sensed by four overhead cameras The three rate sensors monitor the velocity of angular rotation of the head and are used to estimate future head orientation We will cover these systems in more detail in the section on 3D trackers The optical system consists of four graphics engines which display their images into the input of two beroptic cables These cables are exible bundles of optical bers that transmit an image The output of each beroptic cable is re ected off a beam splitter mounted in front of each eye and then projected onto a specialized semi transparent optical surface The beam splitter also allows the wearer to see the real image of the cockpit The three specialized system components not found in other headm ounted displays are the projection system providing the images the ber optic cables and the display surface These will be reviewed in turn For more details see Barrette 25 The image projection system consists of four graphics engines which display their output using four TALARIA lightvalve largescreen video projectors These are highresolution highbrightness devices which provide a 1000 line picture in full color and are typically used in auditoriums to project television images For each eye one graph ics engine and its projector displays a lowresolution background image and the other displays a highresolution inset that is in the foveal region of the user The background image has a hole in the region of the inset so that the user does not receive con icting information The optical resolution of the highresolution inset is 15 arcminutes per scan line while the background image has a resolution of 50 arcminutes The eld of view for the background image is 825 degrees horizontal by 66 degrees vertical while the inset is 24 degrees horizontal by 18 degrees ver tical The binocular overlap of the two displays is 38 degrees at its widest point which results in a total eld of view for both eyes of 127 degrees by 66 degrees The beroptic cables consist of approximately 4 million individual optical bers grouped as multi ber ele ments arranged into a coherent bundle The cables are six feet long and weigh 05 pounds per foot but very little of the weight is borne by the user s head The cables are reasonably exible and the manufacturers report that they allow reasonably unimpeded movement in the normal range of pilot head motion To reduce the visibility of the individual pixels each pixel of the image is spread over many bers using wavelength multiplexing The projection surface is a Pancake Window manufactured and trademarked by Farrand Optical A Pancake Window is a semitransparent holographic optical element that collim ates incoming light so that it appears focused at in nity Thus the focus challenge is answered by focusing real and virtual images at in nity The Pancake Window gives an 825 degree horizontal eld of view and combined with the beam splitter allows the user to see the cockpit controls directly In order to not wash out the real image of the cockpit with graphics the graphics engines blank out areas of the image where the cockpit of the aircraft is to appear Hiddensurface removal is easy to handle in this case because all of the virtual objects are outside the cockpit and are thus farther away than the one real object which can be precisely modeled to produce 
an accurate blanking mask The remaining challenge is visual registration which in this case means visual registration of the virtual aircraft outline with the real aircraft Part of this task is supported by the precise aircraft model and part is supported by the specialpurpose position and velocity tracking systems The position tracking system in this case must be quite accurate but needs only operate over a small range The velocity trackers are used to predict head position compensating for lag in image generation lmage lag appears as registration error when the user turns his or her head because the virtual image lags the real one There is a range of products manufactured by CAE and the least expensive systems use CRTs instead of projectors with the entire image at background resolution and the eld of view reduced to 100 degrees by 45 degrees because of a loss in brightness The most expensive versions of the FOHMD also include eye tracking in which the highresolution inset moves based on eye position 16th April 2002 342 Examples of HeadMounted Displays Page 27 28 Chapter 3 Hardware VPL EyePhone The rst commercially available colour headm ounted display was the VPL EyePhone rst manufactured by VPL Research of Foster City California in 1989 The EyePhone and its successors are opaque systems which use two color LCD pocket television displays mounted in front of a wideangle stereoscopic lens system The user s head position and orientation is tracked by a Polhemus 1sotrak VPL s rst system called the EyePhone 1 used the LEEP optics system and attached to the head with a rubber diving mask and fabric straps with a soft counterweight at the back of the head The total weight of the EyePhone 1 was 4375 pounds half of which was the counterweight The EyePhone LX is the successor to the EyePhone 1 with a redesigned headmounting system based on a rigid ring that holds the optics and display system inside which is a soft headband The stereoscopic optics for the LX is a proprietary fresnel lens system To x problems with resolution VPL have also introduced a high resolution version of the EyePhone called the EyePhone HRX which has a 640 by 480 nominal RGB resolution The HRX and the LRX have the same ringbased headmounting system and both are lighter than the EyePhone 1 weighing 25 pounds The EyePhone 1 evolved from a headmounted display rst designed at NASA Ames Research Center by Mike McGreevy and Jim Humphries The rst NASA headmounted display used LEEP optics and monochrome LCD pocket televisions mounted inside a motorcycle helmet Problems with simulator sickness and overall weight prompted the redesign of the NASA system to replace the helmet with a sturdy adjustable headband similar to those found in hardhats The display was mounted on a ring that was supported by the headband on pivots above the ears One problem with the NASA system was that the distance from the user s eyes to the LEEP optics was not identical each time the user wore the display a problem which is solved by VPL s use of a diving mask Virtual Research Flight Helmet Virtual Research of Sunnyvale California introduced their Flight Helmet in 1991 This display system also uses LEEP optics and LCD pocket televisions although the headmounting system is different As the name implies the Flight Helmet is a hard plastic shell with the optics mounted in the front and an adjustable headband similar to those found in hardhats The display weighs about 4 pounds with some of the electronics at the back of the head to balance headborne weight The 
Flight Helmet also has an onoff switch to reduce LCD television wear and tear and also comes with a pair of earphones mounted inside the helmet This system can be tracked using the Polhemus Ascension or Logitech trackers Because of its mounting system the Flight Helmet is much more rigidly xed to the head and does not slew around as much as the EyePhone 1 when the head is quickly rotated However there is no diving mask to x the viewing distance from the user s eyes and there is some light leakage from the external world which will tend to wash out the display a little in a bright room LEEP Systems CYBERFACE LEEP Systems Inc formerly known as PopOptix Labs in Waltham Massachusetts is the manufacturer of the LEEP optics system used in the EyePhone 1 and the Virtual Research Flight Helmet LEEP was also the the rst to market a commercial headmounted display the CYBERFACE 1 in March of 1989 This rst system was essentially identical to the NASA headmounted display with monochrome LCDs and was sold as part of a telepresence system with a cameraequipped dummy head The current system is called the CYBERFACE 2 which is an opaque display that uses two 4 inch diagonal color LCD televisions as the display elements This system can be tracked using the Polhemus Ascension or Logitech trackers LEEP sells two different head mounting systems the rst is the hat and counterpoise system and the other is a fairly conventional adjustable headband that can be moved quickly from one person to another The hat and counterpoise system is designed for longterm wear by one person and as such the hat is adjustable to t comfortably on the user s head The hat consists of a number of pads arranged in a ladder pattern with a detachable mounting at the front for the optics and display system and two sets of wire connections to the counterpoise at the rear left and right of the hat The wire bundles transmit video signals to the displays and audio signals to optional earphones The counterpoise is a box about one inch thick one foot long and 8 inches tall which contains the offdisplay electronics for the Cyberface 2 The counterpoise sits at on the user s chest suspended from the rear of the hat by a pair of wire bundles thus providing a counterweight for the displays This arrangement distributes most of the display weight directly over the top of the user s spine which is the ideal place to put headm ounted weight Page 28 342 Examples of HeadMounted Displays 16th April 2002 342 Examples of HeadMounted Displays 29 Because the hat and counterpoise system is detachable from the display subsystem each longterm wearer can customize his or her own hat for Cyberface use LEEP sells hats at a fraction of the full display cost for this purpose Three advantages of this mounting system are user customizability weight is distributed directly over the spine and the counterweight does NOT directly add to the rotational inertia of the system There are three disadvantages of the hat and counterpoise system the rst being that it is not very easy to don or remove Second while the counterpoise works well when the head turns left right up or down the counterpoise twists awkwardly when the head rotates about the line of sight Of course such roll motion of the head is uncommon so this problem does not arise too readily A more serious disadvantage is that when the user undergoes radical body motion such as bending forward and swinging around the counterpoise is apt to swing off the chest which can be disconcerting Clearly the hat and counterpoise 
system is optimized for the situation where a handful of users will be using the Cyberface 2 for long periods of time Optically the Cyberface 2 is very similar to the other LEEP Opticsbased headmounted displays with the important exception that the optical axes of the two lens systems diverge by 25 degrees This is accomplished by simply repackaging the lenses at a 25 degree divergence and mounting the LCDs appropriately There are two reasons for doing this the rst being that the 4 inch diagonal LCDs are 32 inches 813 mm wide and it would be dif cult to place them edgetoedge in the same plane without losing a signi cant amount of display area to the far left and far right Diverging the axes allows each LCD to be moved towards the center thus allowing more of the periphery of each LCD to fall under the visible area of the lens Second the 25 degree divergence also gives 25 degrees more peripheral range which Howlett claims is very important for creating the illusion of immersion The disadvantage of the 25 degree axis divergence is that the overlap region is 25 degrees smaller which means that stereopsis is available in a smaller region than in standard parallel axis LEEP based displays The total eld of view in the Cyberface 2 is 1095 degrees due to the divergence of the optical axes More detail on CYBERFACE weight more optics detail The two key elements of the EyePhone 1 the Virtual Research Flight Helmet the CYBERFACE are the LCD pocket televisions and the LEEP optics The following two sections will discuss issues in LCD television design and the LEEP optics system respectively LCD Pocket Televisions LCD televisions displays are identical to those found in consumer electronics products The resolution of these displays is quite low nominally 320 pixels horizontal by 240 pixels vertical for a total of 76800 pixels 1f the displays were monochrome this gure would be accurate since this is the number of individually controllable lighttransmitting cells However to create a color display each cell is overlaid with a red green or blue lter dividing the 76800 cells into three sets of 25600 pixels Since a color pixel is usually a triad consisting of a red a green and a blue cell the color resolution is in fact one third of 320 by 240 or nominally 18475 by 13856 pixels determined by dividing the vertical and the horizontal resolution by the square root of 3 Since the eld of view of each eye is 75 degrees 28 this results in an optical resolution of 24 pixels per degree or 243 arcminutes per RGB pixel Even with a monochrome LCD the optical resolution per pixel is 141 arcminutes Due to the obviousness of each pixel a diffusion screen is placed in front of the LCD to blur each pixel making the edges less sharp and therefore reducing the apparentness of the screen s texture gradient LEEP Optics The other key display component for the most headm ounted displays is the LEEP optics system which consists of three plastic lenses per eye mounted binocularly inside a plastic mount with a cutaway area for the nose The LEEP system magni es the object of interest and focuses the object at a distance between 30 cm and optical in nity depending on obj ect s distance from the lens system The LEEP optics system was designed for mounting two square stereoscopic photographs of a scene each photograph approximately 64 mm on a side It the stereo photography application the eld of view for one eye is approximately 90 degrees horizontal by 90 degrees vertical centered about the optical axis The optical axes for the two eyes 
are about 64 mm apart and are parallel. Each eye's optical system is radially symmetric about its optical axis, and the two optical systems are bilaterally symmetrical. The exit pupils of each eye's lens system are quite large, which means that the eye can range over a large area of the eyepiece and still see the object of interest. The exit pupil was designed to be large enough to allow almost all of the adult population to see objects binocularly, and as a result there is no means of adjusting the distance between the optical axes of the display. By contrast, binoculars have a fairly small exit pupil, which means that the eye must be precisely placed at the exit point in order to see out of the lens. As a result, binoculars must be adjusted by each user if they are to see using both eyes.

In order to perceive good stereo, the optical properties of the lens system and the displays should be known so that the graphics engine can draw the images correctly. One easy set of assumptions to make is that the field of view is symmetric, that the lens system is a simple magnifier, and that there is no distortion. In the case of the EyePhone 1, none of these three assumptions is true. The field of view is not symmetric in the EyePhone 1 because the LCDs cannot be placed close enough together to make the optical axis fall on the center point of the LCD. The LEEP optics purposely incorporate substantial field distortion and chromatic aberrations, which means that the LEEP system cannot be treated as a simple magnifier. The effect of the field distortion is to transform a rectangular grid, as shown in figure 2.10A, into the nonlinear grid shown in figure 2.10B. Clearly this is not ideal for head-mounted display use, and one solution is to predistort the image before it is displayed. This is not practical with a typical graphics engine because it cannot be accomplished by linearly transforming vertices.

Robinett and Rolland have investigated the optical model of the LEEP optics with the EyePhone 1, and their model is summarized here. The important point to bear in mind is that using this model is what gives correct stereopsis.

Robinett and Rolland's model for the optical performance of the LEEP system is intended to account for the nonlinear distortion of the lenses, and can therefore be used to determine the viewing parameters for the graphics engine. The distortion function D models the lens system distortion, with the domain being the real screen and the range being the virtual image of the screen. If x_s and y_s are the real screen pixel coordinates and x_v and y_v are the virtual screen coordinates, then

    (x_v, y_v) = D(x_s, y_s)

Because of radial symmetry about the optical axis, D can be simplified to a single-valued function of the radius from the optical center:

    r_v = D(r_s)

In order to make this model work, r_v and r_s must be normalized to the maximum virtual image field radius w_v and the real screen field radius w_s respectively, resulting in

    r_vn = r_v / w_v        r_sn = r_s / w_s

The conversion from real screen coordinates to virtual screen coordinates has been found by Robinett and Rolland to be well approximated by a third-degree polynomial

    r_vn = r_sn + k_vs * r_sn^3

where k_vs is the coefficient of optical distortion. (Need info on how k_sv is calculated based on d_objective and w_s.)

For the EyePhone 1 the important parameters are listed below.

    d_er = 29.4 mm     Eye relief, or distance from eye to lens
    w_s  = 28.1 mm     Real maximum screen width
    z_v  = 398.2 mm    The focal distance
    k_vs = 0.32        Distortion coefficient for D

Given these parameters and the following measurements for distances from the optical axis to each edge of the screen, we can derive the measurements for the virtual screen locations to be used by the graphics engine. The equations to use for each edge are

    r_sn = r_s / w_s
    r_vn = r_sn + k_vs * r_sn^3
    r_v  = r_vn * w_v
    phi  = arctan(r_v / z_v)

The resulting values for the EyePhone 1 for the right eye are in the following table. For the left eye, switch the values for the left and right edges. The column headed "phi raytraced" is calculated using a commercial optics ray tracing package.

    Edge          x_s (mm)   y_s (mm)   r_s (mm)   r_sn    r_vn    r_v (mm)   phi model (deg)   phi raytraced (deg)
    Right Edge    33.3       0          28.1       1.00    1.32    358.5      42.0              45.0
    Left Edge     20.9       0          20.9       0.744   0.876   237.9      30.9              30.3
    Top Edge      0          21.8       21.8       0.776   0.926   251.5      32.3              31.8
    Bottom Edge   0          18.5       18.5       0.658   0.749

r_s is listed as 28.1 mm for the right edge because the actual right edge of the screen is vignetted, or hidden from view. For an eye relief of 29.4 mm, the rightmost visible edge of the screen is at 28.1 mm. These angular measures can be used to determine the viewing frustum for the graphics engine. Chapter 4 will use the results shown here to generate a perspective matrix for the EyePhone 1.

Fake Space Labs BOOM

Fake Space Labs of Menlo Park, California manufactures a line of head-tracked displays called the BOOM systems, for Binocular Omni-Orientation Monitor. As the name suggests, the BOOM systems are tracked by a mechanical linkage which supports the weight of the displays and which tracks the position and orientation of view. In the strictest sense of the term the BOOM system is not head-mounted, because its weight is supported by the linkage and because it has no head mounting apparatus. It must be emphasized that the interaction with the BOOM is substantially different from that with a head-mounted display, because the navigation of view is done with the hands holding two handles connected to the display, not with the head.

The BOOM is an opaque display system which uses parallel-axis LEEP optics to stereoscopically view two small monochrome CRTs. The mechanical linkage is built from light aluminum tubing and uses a shaft encoder to report the angular position of each joint. A classical forward kinematics calculation is used to derive view position and orientation from the 6 joint angles. Each joint has an accuracy of 4000 counts per full joint revolution, resulting in an overall tracking resolution of 0.16 inch. The linkage bears the weight of the CRTs with two counterweights connected to the two long main links. The first long link is connected to the base by a two-degree-of-freedom joint system. The first joint rotates about the vertical axis and has no damping or counterweight. The second joint maintains the first long link in a vertical position by a counterweight and damper mounted inside the ground stand, which is attached by wire to the midpoint of the first long link. The second long link is usually horizontal and bears the weight of the CRTs, with a counterweight at the opposite end of the link. There is no damping between the first and second long links, so the user must be careful about accidentally moving the BOOM and then shifting attention away from it, because it may collide with other objects in the room, such as the user's head. The total trackable volume of the BOOM is a vertical cylinder 2.5 feet in radius and 2.5 feet high, with a dead zone 6 inches in radius at the center. Because the weight is borne by the linkage, the BOOM uses small CRTs instead of LCD pocket television displays. The LEEP optics are mounted in front of the CRTs, and they have a
wider eld of view than either the EyePhone 1 or the Flight Helmet because the CRTs are wider and taller than the LCDs and can be mounted closer together for a larger overlap region The advantage of the CRTs is that they provide much higher resolution than the LCDs with the most expensive models nominally providing 1280 by 1024 pixels per eye The BOOM systems provide two handles under the display block with a thumb button for each hand Typically these buttons are used for navigation in which one button moves the user forward along the line of sight and the other button is either a brake or moves the user backwards Because of the variable inertia of the system it is best to slowly move the BOOM while viewing the scene Otherwise the thrash on the user s head that arises from moving the BOOM s display block at high speeds can cause the user to feel mildly motion sick The sixth joint is the last on the mechanism which is the pitch axis for the display block the fth joint is the yaw axis and the fourth joint is the roll axis The BOOM experiences gimbal lock when the sixth joint and the fourth joint axes are parallel which occurs when the user is facing the center of the mechanism Forward and backward motion while facing the center usually results in the long horizontal link bumping the user in the head if the user moves too fast due to simultaneous rotation about the fourth and sixth axes When the pitch and roll axes are parallel or nearly parallel the user loses the ability to roll the view also due to gimbal lock The mechanism is built in such a way that looking straight down is very dif cult because the user s head must be located in the same place as the end of the BOOM while looking down The counterweighted design ensures that when the user lets go of the mechanism gimbal lock can be exited simply by rotating the display block about the yaw fth axis 16th April 2002 342 Examples of HeadMounted Displays Page 31 32 Chapter 3 Hardware Private Eye The least expensive headmounted display system is the Private Eye made by Re ection Technologies of Waltham Massachusetts Unlike the other systems mentioned above the Private Eye does not use LCDs or CRTs but instead uses an ingenious re ective optical system that scans a single column of pixels across the user s eye The Private Eye is sold as as a miniature monocular display suitable for viewing monochrome images such as text and 2D line drawings Its intended application area is to provide handsfree information display in an unobtrusive area of the user s eld of view allowing the user to occasionally refer to the display as the need arises Of course two Private Eyes can be used to create a headmounted stereoscopic display by moving the display moved from below the eyes at cheek level to directly in front of the user s eyes This rearrangement was rst introduced by Pausch 24 who mounted two Private Eyes on a pair of sunglasses and blocked out the remaining eld ofview with electrical tape At time of writing Re ection Technology sells the Private Eye only as a monocular display mounted on a light headband The display element itself is 12 inches by 13 inches by 35 inches weighing 225 ounces The display resolution is 720 pixels horizontal by 280 pixels vertical with one bit per pixel The eld of view is about 22 degrees and the exit pupil is quite large The display mechanism is based on a vertical linear array of 280 LEDs an adjustable magnifying lens and a scanning mirror as shown in gure 211 The linear array of LEDs is similar to those found inside laser 
printers with the column arranged in a zigzag pattern The zigzag pattern allows the LEDs to be packed together closely enough so that there are no vertical gaps between each LED Between the LED column and the mirror is a magnifying lens mounted on a track which allows the focal distance to be adjusted between 9 inches and optical in nity To guide the column of LEDs into the user s eye the LEDs are re ected off a vibrating mirror mounted at a 45 degree angle The mirror is mounted on a vertical hinge and vibrates throuin an angle of 15 degrees At any instant of time the user sees a vertical column of LEDs at a location corresponding to a column of pixels in the image As the mirror turns the LEDs are scanned horizontally across the user s eld of view The user perceives the scanned column as one image due to persistence of vision The hinge mounting is a pair of exible metal springs instead of a pin and bearing arrangement At the back of the vibrating mirror is a voice coil similar to those found in audio speakers which pushes against a springm ounted magnetic counterweight The magnetic counterweight is in fact attached to the opposite end of the same springs as the mirror and both the mirror and the counterweight vibrate at the same resonant frequency The mirror and counterweight vibrate in synchrony in a scissorlike motion which is kept at the same amplitude by input pulses applied to the voice coil Since the resonant frequency is used the the input energy applied to the voice coil is quite minimal and because the mirror and counterweight have the same inertia and move in opposite directions the net vibration of the entire system is quite small For mechanical reasons the timing of the horizontal scan in the Private Eye is dictated by the mirrorcounterweight mechanism By contrast CRTs or LCD pocket televisions are constructed to synchronize instantly to synchroniza tion pulses supplied by the raster scanout system The Private Eye therefore has a stationary light source and photosensor mounted on the counterweight and a tab mounted on the mirror which interrupts the beam of light when the mirror and counterweight are closest together The pulse generated by the photosensor is used to signal the scanning system when to start and this pulse is also applied to the voice coil in order to push the mirror and counterweight apart Two other aspects of timing must be handled by the Private Eye controller The rst issue is that the mirror moves sinusoidally with its highest speed in the center of its travel The scanout system must account for this by scanning out columns of pixels slowly at the left and right edges of the frame and fastest in the middle The other timing constraint is that the zigzag arrangement of LEDs requires that the left subcolumn of LEDs must be illuminated at a slightly different time than the right subcolumn so that the resultant image is a straight line instead ofa zigzag ine The image of the Private Eye is bright red on a black background with a contrast ratio of 701 The image is refreshed at 50 Hz and the manufacturers claim that icker is not a problem because the display is viewed by the fovea which won t notice icker as much as peripheral vision does Because of the unusual timing requirements the Private Eye can controlled by an IBM PC card into which the user feeds a bitmap of the image to be displayed One result of this is that the interface to the Private Eye is not NTSC standard sync which limits plug compatibility with standard video sources Currently this IBM PC card has 
somewhat limited memory bandwidth so display update can be rather slow Re ection Technology also sells the timing controller chip suitable for use in a custom circuit Page 32 342 Examples of HeadMounted Displays 16th April 2002 35 Position and Orientation Trackers 33 Of all the display systems outlined so far the Private Eye holds the most promise because of its compact design high resolution and high contrast Re ection Technology is working on better versions of the Private Eye which will feature higher resolution and six bits per pixel A color display is also anticipated but this requires columns of green and blue LEDs which are still a few years off in high production volumes HeadMounted Display Summary To wrap up this section there are a number of headmounted display technologies available today ranging in price and performance from highend military simulator equipment to lowcost highvolume displays The CAE FOHMD offers the best performance at certainly the highest price The total eld of view of 127 by 66 degrees driven by full color highbrightness lightvalve projectors with highresolution inset offers almost the entire wish list for headmounted displays except for freedom of movement The main drawbacks of the FOHMD are the strong limitation on range of movement and the fact that the display and projection hardware will ll a decentsized room The next step down in price are the BOOM systems which offer good resolution brightness and eld of view with a lowlatency weightbearing tracking mechanism The chief advantages of the BOOMs are that the resolution is good and that there is little time required to suit up in order to start viewing the virtual world The disadvantages of the BOOM are that it does not offer complete freedom of movement due to gimballocked orientations high inertia may cause dizziness and mild motion sickness when fast movements are used and no damping on the main horizontal link may sometimes cause the BOOM to collide with other objects in the room including inattentive users Again we wish to emphasize that the interaction with the BOOM is different that with a true headm ounted display Clearly the BOOM is best suited to situations where careful examination of a scene in detail is the main goal as would be the case in Scienti c Visualization applications The VPL EyePhone family and the Virtual Research Flight Helmet are at the next level in price with fairly low resolution low brightness displays The advantage with these systems is that they are fairly inexpensive and offer color display at NT SC standard sync rates The main disadvantage of these displays is the rather low optical performance of the LCD pocket televisions coupled with the nonlinear distortion of the LEEP optics system However one can anticipate steady improvements in the LCD TV resolution department due to market forces in the consumer electronics area Lastly the Private Eye holds the most promise for a lowcost highresolution headmounted display The main advantages of this display are its high optical resolution high brightness and high contrast ratio The main disadvantages are its 1 bit pixels lack of full color and small eld of view In their technical literature Re ection Technology states their intent to solve the rst two problems but the eld of view limitation appears not to be a priority This concludes the section on headmounted displays Except for the BOOM each headmounted display requires some sort of tracking device to report the wearer s head position and orientation to the graphics engine 
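To make concrete what the graphics engine must do with each such report, the short sketch below converts a tracked head position and orientation into a pair of per-eye viewing matrices. This is only an illustrative fragment under our own assumptions: the read_head_tracker function, the 64 mm interpupillary distance, and the matrix conventions are placeholders for illustration, not the interface of any particular tracker or head-mounted display.

    import numpy as np

    IPD = 0.064  # assumed interpupillary distance, in metres

    def view_matrix(eye_pos, right, up, forward):
        # World-to-eye matrix built from an eye position and an orthonormal
        # head basis (right, up, forward unit vectors); the eye looks down -z.
        R = np.stack([right, up, -forward])      # rows are the camera axes
        M = np.eye(4)
        M[:3, :3] = R
        M[:3, 3] = -R @ eye_pos                  # translate the eye to the origin
        return M

    def per_eye_views(head_pos, head_rot):
        # head_pos: 3-vector; head_rot: 3x3 rotation whose columns are the
        # head's right, up, and forward axes, both as reported by the tracker.
        head_pos = np.asarray(head_pos, dtype=float)
        head_rot = np.asarray(head_rot, dtype=float)
        right, up, forward = head_rot[:, 0], head_rot[:, 1], head_rot[:, 2]
        left_eye = head_pos - right * (IPD / 2.0)
        right_eye = head_pos + right * (IPD / 2.0)
        return (view_matrix(left_eye, right, up, forward),
                view_matrix(right_eye, right, up, forward))

    # Typical per-frame use, assuming a hypothetical read_head_tracker() that
    # returns (position, 3x3 rotation) in the tracker source's coordinate frame:
    #     pos, rot = read_head_tracker()
    #     left_view, right_view = per_eye_views(pos, rot)
    #     ...hand each matrix to the renderer for the corresponding eye...

In practice each view matrix would then be combined with the per-eye perspective frustum derived from the display's optical parameters, as discussed above for the EyePhone 1. All of this depends, of course, on the tracking device delivering timely and accurate position and orientation reports.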
These devices will be covered in the next section 35 Position and Orientation Trackers The central purpose of a tracker system is to report the 3D spatial position and orientation of one or more objects In a VR application the tracker is typically used to track some of the user s body parts such as the user s hands and head The tracking data is then used by the user interface to enact operations in the application according to hand location and to determine view direction from head position and orientation HITD tracking systems must report position and orientation at an update rate of at least 10 times per second Therefore we exclude for the purposes of this discussion 3D digitization systems that collect surface elevation data terrain data and the like Such digitization systems contain some sort of ranging device that is scanned over the surface of interest gathering elevation data These systems are designed to collect information about a large number of points on an immobile surface over the course of several seconds or longer Thus both the orientation and the realtime requirements are not met A realtime ranging system would be useful in transparent headm ounted display applications for determining the distance from the viewpoint to objects in the external world Abstractly a tracking system consists of three parts 0 A source or reference device is the origin of the coordinate system in which the tracker data is reported 0 A sensor device is attached to the object that is being tracked 16th April 2002 35 Position and Orientation Trackers Page 33 34 Chapter 3 Hardware 0 An electronics unit coordinates data collected from the source and uses this information to determine the position and orientation of the sensor with respect to the source In magnetic and sonic systems the source device radiates energy which is detected by the sensor and sent to the electronics unit In mechanical systems the source is the ground link of the mechanism and the sensor is the distal link or end effector of the mechanism The actual sensing mechanism is a collection of joint measurement devices distributed along the links of the arm In optical systems light can ow from the reference frame to cameras on the object or from the object to cameras about the room Nevertheless we will stick to the convention that the sensor is mounted on the object being tracked and the source is the origin 351 Tracker System Criteria The quality of a tracking system can be judged on nine criteria which are 1 Resolution 2 Accuracy 3 Lag 4 Update Rate 5 Range 6 Interference and Noise 7 Mass Inertia and Encumbrance 8 Multiple tracked points and 9 Price Not all tracking systems perform well on all criteria so when choosing a tracking system it is important to choose a tracker that best ts the requirements of the task at hand For example if high lag is not important but low mass is a mechanical system is not likely to be the best choice This set of tracker properties will allow the user to nd a tracker that best suits the demands of the task The following paragraphs will de ne these criteria Resolution Resolution is the neness with which the tracking system can distinguish individual points in space Typically resolution is the minimum amount that the sensor must be moved to report a different value For example if a given tracking system quotes resolution of lmm then the sensor can be moved within a radius of lmm of a starting position without changing the reported value If the sensor were moved more than lmm the value would change by 2mm 
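As a toy numerical illustration of this definition (our own, not a model of how any particular tracker computes its output), the following Python fragment snaps a true position onto a grid spaced at twice the resolution, so that the sensor can move within one resolution of a reported point without the reading changing.

    import numpy as np

    def reported_position(true_pos, resolution=0.001):
        # Toy quantization model of measurement resolution (units: metres).
        # Reported values lie on a grid spaced at 2 * resolution, so moving
        # within +/- one resolution of a grid point leaves the reading unchanged.
        step = 2.0 * resolution
        return step * np.round(np.asarray(true_pos, dtype=float) / step)

    # With 1 mm resolution, a 0.9 mm move from a reported point is invisible,
    # while a 1.5 mm move jumps the reading to the next grid point, 2 mm away.
    print(reported_position([0.1000, 0.0, 0.0]))   # [0.1   0.    0.   ]
    print(reported_position([0.1009, 0.0, 0.0]))   # [0.1   0.    0.   ]  unchanged
    print(reported_position([0.1015, 0.0, 0.0]))   # [0.102 0.    0.   ]  new value

Real trackers do not literally quantize space this way, but the picture is useful for separating measurement resolution from the numeric precision of the reported values, which is taken up next.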
The resolution de ned in this section is measurement resolution 7 the ability of the tracker to measure different points On the other hand a tracking system s numeric precision indicates the number of bits that have been allocated to report position and orientation Numeric precision is usually xed usually 14 to 24 bits for each of X Y and Z Measurement resolution is not constant over the entire measurement range Acc uracy In contrast with resolution accuracy measures how close the tracker s reported position is to its actual position Thus while resolution is a measure of spatial performance in the small accuracy is a measure of overall spatial performance Typically an inaccurate tracker can be made more accurate by performing some sort of calibration process which measures the difference between real space and the tracker readings The measurements are used to generate some sort of adjustment function that can be applied to tracker readings while the tracker is in use In most HITD user interfaces high resolution is more important than high accuracy because people can com pensate for the simple spatial distortions that are the manifestations of tracker inaccuracy As already mentioned however transparent headmounted displays require high accuracy since the problem of virtual image to real im age registration must somehow be solved and large tracker inaccuracies will result in large deviations between virtual and real images The term registration refers to a calibration process in which the virtual and real images in a transparent head mounted display are made to line up visually For example special cross marks on the real scene could be du plicated in the virtual scene and the user adjusts the virtual imagery until the virtual marks register with the real ones In opaque headmounted display applications not AR only moderate resolution and accuracy are needed since the user executes very tight closedloop control over head orientation based on current visuals High resolu tion is more useful for hand or nger tracking since the hand is much more capable of ne motion and the user expects ne control to be available Possibly the best use of moderate to low accuracy tracking is in the sensing of gross body motions such as upper arm torso legs and feet Essentially the resolution of the tracking system should match the movement resolution of the body part being measured since any higher resolution will go to Page 34 351 Tracker System Criteria 16th April 2002 351 Tracker System Criteria 35 waste Appropriate tracker accuracy depends on whether the user can notice inaccuracy or if it affects perfor mance The user may notice mismatches in real versus virtual images or may notice mismatches in proprioception versus virtual images Lag Lag in a tracking system is the time that passes from when a sensor rst arrives at a point to when the tracking system rst reports that the sensor is at that point Within a tracking system this lag can be broken down into two largely independent subcomponents that sum to produce the overall tracking lag The rst lag is the data generation lag which we de ne as the amount of time that the tracking system takes to gather sensor readings to calculate the tracker s output and to commence sending the rst bit of the output along the communications link to the receiving computer The second lag is transmission lag which is the amount of time that is spent in transmitting the entire tracker output over the communications link Most tracking systems overlap these two processes sending the 
current tracker readings while collecting and calculating the next Tracker manufacturers will usually quote data generation lag in their marketing literature since this is guaran teed to be less than the total tracker lag The problem with this approach is that the buyer is really interested in the time it takes for the tracker data to completely arrive at receiving computer not the time it take for data to start owing However since transmission lag amost always xed and can be calculated by multiplying the amount of data being sent by the communications rate it makes the most sense to quote only data generation lag In a HITD system tracking systems are not the only source of lag Time to pass the tracking data from the tracker to the graphics engine and the graphics engine s rendering time will also add to lag Recalling the discussion from section 26 high lag from user motion to graphical update will cause the user to slow down the interaction or to carefully plan user motion destroying the interactivity of the system Therefore tracker lag should be as small as possible Update Rate Tracker update rate is the number of tracker data samples per second that are transmitted to the receiving computer A high update rate is of course preferable although rates greater than 60 updates per second are probably most useful for highprecision applications and for situations where the tracker is controlling a mechanical system Typically 30 updates per second per sensor is good enough for most applications since human neuromuscular control of the limbs and ngers seems to run at 5 Herz The communications rate between the tracker and the receiving computer places an upper limit on the update rate Communications rate also affects lag so while it may seem suf cient to have a communications rate just large enough to handle the maximum update rate generated by the tracker higher data rates are desirable because they aid in reducing latency If a serial communications link is used the tracker system should provide data rates more than 19200 baud Some manufacturers provide faster interfaces such as an ethernet localarea network connection or a card that can be plugged into a PC Range There are two types of range to consider for a tracking system position and orientation Position range is the volume within which the tracker system reports position data with a given resolution Position range is also referred to as the working volume and at its simplest is stated as a sphere whose radius is the maximum distance from source to sensor 1f the source is within this maximum radius the position data will be accurate to a certain level This simple spherical volume usually overstates the total working volume for reasons that have to do with the particular sensing metho 1n mechanical systems the maximum radius is simply the maximum reach of the links in the mechanism The working volume will be a subset of this sphere because joint limits and mechanical stops do not allow full travel of each joint at all locations With sonic or magnetic transmission systems the maximum radius is related to signal strength However some magnetic systems accurately report position for only one hemisphere When the sensor is in the opposite hemisphere these trackers incorrectly report it at a location in the true hemisphere Position range in sonic systems is similarly limited to a single hemisphere because the source usually radiates energy in only one hemispherical direction Position range in optical systems is typically related to the eld of view 
of the cameras 16th April 2002 351 Tracker System Criteria Page 35 36 Chapter 3 Hardware Position range is related strongly to resolution If the user is interested in highresolution position measurement the user should be prepared to keep the source and sensor close to each other If the tracking system allows the user to change the numeric precision of position and orientation data the user may be able to take advantage of higher measurement resolution when the source and sensor are close This tradeoff between range and resolution also applies to mechanical tracking systems because larger range requires the use of longer links in the mechanism For a given angular change about the links joint the longer the link the larger the displacement at the link s endpoint In camerabased systems the resolution of the camera determines the range of the sensor since the optical system must somehow recognize some feature in its eld of view If the feature is too far away it might not be large enough to recognize or it may be too faint depending on the system Orientation range is the set of all sensor orientations in which the tracker system reports orientation with a given resolution When the sensor is out of orientation range the tracker will no longer report useful orientation data For convenience we de ne orientation range to be dependent on position range since position ranges are usually much easier to visualize Similar to position range orientation range can be brie y stated as the set of orientations that sweep out a sphere hemisphere or some other solid For example the magnetic trackers have a spherical orientation range since the tracking technology is able to report all orientations at each point in the working volume Sonic and optical systems require that the source has an unobstructed line of sight to the sensor and so have at most a hemispherical orientation range Thus as the sensor moves around the orientation range changes depending on position Mechanical systems have a nearly spherical orientation range at most positions but gimbal lock is usually a problem somewhere For example the BOOM system does not allow rotation about the line of sight while the user is looking towards or away from the center of the mechanism see section Depending on the number of missing degrees of freedom due to gimbal lock some orientations are not available or not all orientations are directly accessible from the locked orientation In addition some orientations may be unavailable due to joint limits and restrictions in position range Interference and Noise Interference is the action of some external phenomenon on the tracking system that causes the system s perfor mance to degrade in some way Interference arises from imperfect environmental conditions and typically mani fests itself in the tracker output as noise inaccuracies or both Noise can be de ned as a random variation in an otherwise constant reading One effect of noise is to reduce the instantaneous resolution of the tracking system since noise increases the range of readings that correspond to a single point Thus the distance between distinct points resolved by the tracker is increased by the amplitude of the noise With magnetic systems interference from other electromagnetic signals can cause the readings to be inaccurate noisy or both Sonic systems react to other sound sources and to the blockage of the line of sight from the source to the sensor Optical systems cease to function when one or more camera views are blocked and mechanical systems will 
fail to track accurately when extraneous forces are applied to the mechanism Numerous strategies can be applied to eliminate the effects of interference either by eliminating the source of the interference 7 in the case of mechanical or line of sight blockages 7 or by applying appropriate lters to eliminate or counteract expected interference For example some magnetic trackers can lter out CRT vertical retrace interference by attaching a sensor to the side of the offending CRT Typically noise is a high frequency low amplitude signal that can be eliminated by applying a low pass lter Obviously noise in sensor readings is undesirable and the source of the interference causing it should be eliminated if possible If this is not possible the appropriate lters should be used Mechanical trackers are mostly immune to noise and in any case interference is easy to eliminate The magnetic and sonic systems are more prone to interference and in the magnetic case interference is dif cult to eliminate since equipment like CRTs generate plenty of it Mass Inertia and Encumbrance This criterion refers to the mass and the inertia of the sensor and the amount by which the tracking system restricts the freedom of movement of the person or object being tracked Obviously the less the sensor weighs the better Page 36 351 Tracker System Criteria 16th April 2002 351 Tracker System Criteria 37 particularly when tracking lowmass objects like the ngers The sensor itself is not the only concern since any wires that connect the sensor to the electronics unit should not be so short as to restrict movement The ideal tracking system would eliminate wires from the sensor to the electronics box and would weigh 10 grams or less allowing the user free rein The mass of the rest of the tracking system is less of an issue since by de nition the user does not have to wear it Mechanical systems by their very nature have a signi cant amount of mass and inertia to contend with since the mechanical linkage must be strong and rigid enough to afford accurate readings The encumbrance of a mechanical system is also severe since the tracking range of the system has a hard outer limit Sonic and magnetic systems are reasonably the low mass and low inertia but the presence of wires from the sensor can be a bit annoying Sonic systems can replace the sensor wire with a radio or infrared link to the electronics unit although this means that the sensor must contain some sort of power supply Magnetic systems could similarly bene t from a radio link but the link must be designed so as not to interfere with the sensor radio frequencies Optical systems are of low to moderate mass when the sensor is a camera or a beacon worn by the user Optical systems that rely on pattern recognition are zero mass since the sensor is the object itself The sensor could also be a target pattern or ducial painted on the object to be tracked Multiple Tracked Points The nal technical criterion for a tracking system is the ability to track multiple sensors at once within the same working volume This is not simply a matter of buying two trackers and putting the sources close to each other since in all systems there is bound to be some sort of interference that must be dealt with For mechanical systems moving two arms in the same space invites mechanical interference severe enough to damage one or both of the trackers For magnetic and sonic systems some sort of multiplexing scheme must be used to allocate source signals to each sensor so that each sensor gets enough energy 
to generate a reading The two most common schemes are time multiplexing and frequency multiplexing Time multiplexing or lime slicing simply consists of allocating a sampling period for each sensor so that if N sensors are used N sample periods are used to generate one sample per sensor The advantage of time multiplexing is that it is simple to implement and can be implemented between multiple independent tracker systems using some sort of centralized time allocation signal The disadvantage is that the sampling rate for all of the sensors is reduced by the number of sensors Time multiplexing therefore trades off multiple points against update rate Frequency multiplexing consists of allocating a subset or band of the entire frequency bandwidth to each sensor When a sensor is to be used the source broadcasts the signal within the band of the sensor Multiple sensors can be used simultaneously by broadcasting within each distinct band at the same time The electronics unit applies the appropriate bandpass lter for each sensor to extract the signal intended for the sensor The advantage of this scheme is that it allows multiple sensors without decreasing update rate since all sensors are sampled in parallel The disadvantage is simply a matter of cost since multiple bandpass lters are needed and the electronics unit must be fast enough to handle N parallel position and orientation calculations In optical systems tracking of multiple points may simply be a matter of recognizing more targets or more beacons Multiple targets are visually distinct and so can nominally be sensed in the same image but the computer doing the image processing has to do the work required to detect these extra targets Ideally the tracking system should allow the possibility of multiple sensors although some applications will not nd that more than one sensor is necessary In VR applications 2 to 4 sensors are needed per person One each for the head one for each hand and perhaps one for the torso Multiple sensors should not cause a drop in update rate per sensor although if there is limited communications bandwidth update rate may be limited by data communications 1f extremely low lag is desirable high communications rates and frequency multiplexing are necessary If moderate lag is acceptable then time multiplexing can be used Price Clearly this is not a technical criterion at all although it is at least weakly related to the level of technical complex ity of the tracker system Obviously the lower the price the better but bear in mind that you get what you pay for Typically the more desirable features cost more so choosing a tracker system is a matter of getting your priorities straight and choosing a system that best meets these priorities 16th April 2002 351 Tracker System Criteria Page 37 38 Chapter 3 Hardware 352 Tracking Technologies The previous section brie y introduced the ve main types of tracking technologies which are mechanical mag netic sonic and optical This section will explain the principles of operation of these technologies in some detail Mechanical Trackers Mechanical tracking systems consist of a collection of bars that are linked together in a kinematic chain by either rotating or sliding joints Each joint allows its two neighbor links to rotate or translate by each other in one dimension only That is rotation is about one axis and sliding is along one line Each joint has a onedim ensional measuring device that tells the position of one link with respect to the other The ground link is the base of the 
mechanism and is xed to something solid like the oor of the room At the other end of the chain of links is the end e ectar or sensor of the tracking mechanism The tracking task is therefore simply a matter of generating a list of rotations and translations derived from the joint readings and link geometry As the joint values change the geometric equations are evaluated to generate a new position and orientation for the end effector These equations are typically products of trigonometric functions like sine and cosine of each joint angle Because of the relative simplicity of the mathematics and the instanta neous availability of all joint values mechanical trackers typically offer low lag and high update rates Resolution in mechanical trackers depends on the resolution of the joint angle encoders and the length of the links of the mechanism Accuracy is also quite good in general because these systems are precisely machined Distortions in the measured space can arise from improperly calibrated link lengths or from looseness in joints near the groun link which will result in errors magni ed by the total length of the mechanism Accuracy and resolution are high est at short range Interference and noise are usually minimal with mechanical systems because the sensors can usually be adequately shielded from electrical interference and mechanical interference is usually easy to x The outer range of mechanical system is strongly xed at the sum of the lengths of the links Exceeding the outer range is impossible without disengaging from the mechanism or breaking something which leads to the chief disadvantage of mechanical systems mass weight and encumbrance Mechanical systems are typically heavy and the user will usually experience inertia that depends on the current mechanical con guration At its worst rotational inertia increases astronomically if the mechanism enters gimbal lack A mechanism enters gimbal lock when an intermediate joint rotates to make an inner joint axis and an outer joint axis parallel When the inner and outer joint axes become parallel a degree of freedom is lost because rotation about the inner axis is also a rotation about the outer axis The mechanism is locked because rotations that would norm ally occur about the outer axis are no longer possible because the outer axis has effectively disappeared The solution is to rotate these inner and outer axes out of parallel which is done be rotating the intermediate axis by some amount Gimbal lock is typically avoided by designing the mechanism in such a way that such rotations are rare Some systems also uses the weight of the mechanism to help exit gimbal lock Magnetic Tracking Magnetic systems consist of a radiator of magnetic energy and one or more sensors Both the radiator and the sensor contain three mutually orthogonal coils of wire The electronics box sends an electrical current to each of the radiator s coils which induces a magnetic eld that permeates the surrounding environment The strength and orientation of the eld varies over the tracking range and this variation is used to determine sensor position The magnetic eld induces a current in each of the three sensor coils which is relayed to the electronics box Typically the radiator is somewhat larger than the sensor in order to make e icient use of the electrical power and to ensure that the sensor can pick up the eld within the speci ed tracker range The mode of operation is for the electronics box to output the signal on the radiator and collect the electrical results from 
the sensor The electronics box translates these electrical signals into position and orientation information and sends this information down the digital communications channel The electronics box is performing each step in parallel in pipelined fashion Sending sample 1 while translating sample 2 while sensing sample 3 There are two types of systems depending on the type of signal that is sent to the radiator The technical di iculty with any magnetic tracking system is that the magnetic eld of the earth must be compensated for The AC Alternating Current systems send a brief oscillating electrical signal to the radiator This signal is designed to create a nutating rotating magnetic eld that is customized to suit the location of the sensor Because the eld changes the magnetically induced AC current in the sensor can be used to nullify the earth s magnetism by apply ing a lowpass lter to eliminate the nonoscillating Earth s eld This results in lower power requirements and Page 38 352 Tracking Technologies 16th April 2002 352 Tracking Technologies 39 therefore a smaller radiator The disadvantages of the AC system are that one cannot track multiple sensors simul taneously and that the AC nature of the signal causes eddy currents around electrically conductive materials Eddy currents cause distortion in the magnetic eld and the result is tracking inaccuracy In order to use multiple sen sors the electronics unit must dedicate a unique period for each sensor in which it radiates a customized electronic signal for that sensor This time multiplexing scheme typically polls each sensor at a time in roundrobin fashion with the corresponding disadvantage that the update rate for each sensor is divided by the number of active sensors Another way for AC systems to handle multiple sensors simultaneously is to perform frequency multiplexing on the output signal By contrast pulsed DC systems create a xed magnetic eld during the sampling interval unlike the oscillating signal of an AC system To handle the Earth s magnetism DC systems typically leave a blank sensing period per sample to allow the sensor to detect the Earth The DC signal has the advantage that it is suitable for any sensor attached to the electronics box Any number of sensors could be used with a single radiator simultaneously as long as there is electronic equipment to read them The disadvantage of the DC system is that it requires a large radiator because of the large amount of power required to overcome the earth s magnetism at the outer reaches of the tracker range Manufacturers of DC systems claim that the DC system is immune to metal interference but this is not quite true Pulsed DC systems induce eddy currents just at the beginning of the sampling interval but these quickly die down and the sensor values are sampled soon after DC systems are therefore immune to interference from electrically conductive materials but they are not immune to the bulk magnetic properties of large pieces of steel or iron like ling cabinets or desks Neither system is immune to electromagnetic noise from CRTs and other electrical equipment but this is only a moderate di lculty Some magnetic trackers have a special noise detector that connects from the electronics box to a CRT reducing the severity of its effect The user can sometimes avoid this problem by either staying away from CRTs altogether or by putting the radiator closer to the CRT than the sensors will usually be Thus as the sensor gets closer to the CRT the sensor is also getting closer to the radiator 
so the interference effect is not too severe Some care should be taken not to put the radiator too close because the magnetic eld will distort the CRT s picture 33 34 21 Another disadvantage is that these systems are only moderately accurate A few researchers have tested the accuracy of these systems by putting a sensor at successive locations on a rectangular grid The output of the trackers was compared to the known sensor location to generate an error vector Their ndings were that what the tracker reports as a straight line is in fact somewhat curved with greater errors at greater distances This is probably due to magnetic eld curvature in the local environment Largescale inaccuracies of this nature may be severely problematic in some applications The major advantages with 3D magnetic trackers are that the tracker position range is a hemisphere of at least 5 feet the orientation range is a full sphere and the sensors are very light with long wires For VR applications there are no better choices for hand tracking and only one better choice for head tracking 7 inertialsensor fusion systems Ultrasonic Tracking Ultrasonic systems consist of a set of speakers to radiate ultrasonic energy and a set of microphones to pick this energy up The principle of operation is to measure the time it takes for the sound to travel from a speaker to a microphone To determine position and orientation the principle of triangulation is applied If you know the distance between two beacons you can determine the 2D position of a sensor by measuring its distance from each beacon These three distances form a triangle of known size which can be used to determine the position of the microphone To nd a position in 3D space a third beacon must be used that lies off the line between the rst two beacons For position and orientation 3 speakers and 3 microphones are needed Using only 2 microphones allows the sensing array to be rotated undetectably around the line between the two microphones Typically the source is a large rigid triangle and the sensor is a small rigid triangle Ultrasonic sound is used for three major reasons to make the sensing hardware inaudible to human ears to enhance proper detection and to enhance precision To be assured that the microphones are picking up the intended signal and to reject spurious garbage enough wavelengths of sound information must arrive so that the electronics box can verify its correctness The shorter the wavelength the faster this can be done Short wavelengths also make it easier to determine precise arrival time because the signals peaks or zero crossings are spaced closely in time 16th April 2002 352 Tracking Technologies Page 39 40 Chapter 3 Hardware Acoustic noise generated by CRTs mechanical equipment like disk drives and impact printers and jingling car keys can be a source of tracking error Echoes of the sensing signal off surfaces in the environment can cause a major problem with ultrasonic systems One means of dealing with this is to reject any copies of the expected signal that arrive after the rst one Weaker than expected signals are also rejected Usually these builtin timing and signal strength expectations serve to limit tracking range Another problem with ultrasonic sensing is simply the time it takes for the signal to travel from source to sensor Sound travels at about 1000 feet per second so for distances of 5 feet or less a single burst takes 5 ms to travel to the sensor If each speaker in sequence makes a single burst 15 milliseconds are needed with a 
corresponding effect on lag Another major problem with ultrasonic systems is that the position range is limited to the hemisphere in front of the source because almost no sound energy is propagated behind the source Similarly orientation range of the sensor is limited to the hemisphere that faces the source because the body of the sensor and possibly the user blocks the sound path to the microphones Multiple tracked points are quite achievable in ultrasonic systems if the source is the speaker array and each sensor holds microphones Any number of microphones can receive the signal from each speaker and transmit this signal to the electronics box An advantage with ultrasonic systems is that they are inexpensive to build If only position is needed then the sensor can be a single microphonespeaker with the source still being 3 speakermicrophones Also ultrasonics operate with slow enough signals that infrared or radio signalling can be used to transmit timing data instead of Wires Optical Tracking There are two broad classes of optical tracking systems One class has the user wearing cameras pointing at objects to be recognized in the environment and the other has cameras in the environment looking at recognizable points on the user In most systems cameras in known positions are pointed at the user who wears some kind of active or passive recognizable items The easiest item to recognize is a single point of light that lights up at a designated time to be instantaneously recorded by the camera The image processing system locates this point in the image and based on the image location determines the line from the cam era out into 3space along which this point must lie A second camera generates a second line and the intersection of these two lines determines the position of the point Nominally any number of points could be tracked this way but the limitation is on how fast the camera and imageprocessing system can determine the location of a point There is also the issue of identifying the same point in each camera Systems using video cameras in which one frame is scanned out every 16ms can track only a few points in real time Another alternative is to be more careful about recognition either by tracking multiple lights in a single image or by recognizing shape or other features of the scene In some less demanding applications such as tracking the user s head location while sitting in front of a CRT recognizing common facial features in a small working volume can be effective 37 Some systems use special imaging hardware such as a lateraleffect photodiode which can quickly deter mine the location of a light The lateraleffect photodiode collects photons along a lightsensitive line generating a signal that represents the centriod of all the light on this line This centroid point along the image line can be used to determine a plane in 3D that is perpendicular to the lateral effect photodiode line and which passes through the cam era and the light With enough cameras oriented appropriately the intersection of each cam era s candidate planes determines the light s 3D location Because lateraleffect photodiodes act very quickly they can be used to quickly generate each camera s candidate plane so many lights can be activated in sequence and located in 3D In fact strict sequencing is required since two lights on simultaneosly will generate a spurious average location 37 The converse scheme where the user wears the cameras requires a recognizable environment An optical headmounted tracking device designed 
at the University of North Carolina at Chapel Hill (UNC) uses a large set of infrared LED beacons mounted in the ceiling. As the user moves around, the LEDs are flashed in sequence, and the ones visible to each camera are used to locate the user. The system maintains a list of visible LEDs so that the entire ceiling does not have to be scanned. Each camera contains a lateral-effect photodiode, so each camera image yields a result very quickly, allowing many beacons to be scanned if necessary. Since multiple cameras are rigidly mounted together and are pointed towards the ceiling, position and orientation can be calculated. The disadvantages of this system are the need for the specialized beacon ceiling and the fact that only the hemisphere above the user can be tracked.

As in all optical systems, the resolution of the imaging system is affected by the field of view of the camera. In systems where the cameras converge on the user, a wide field of view for each camera yields a large tracking range, but because the imaging device has limited resolution, wider fields of view yield lower accuracy. When the user is turning quickly in the UNC system, a wider field of view means the currently visible LEDs are visible for a longer time, allowing the beacon system more time to catch up if necessary.

Interference in optical tracking systems can arise from two main sources: the first is occlusion of the item in the environment to be recognized, and the second is ambient light. Infrared LEDs are often used so that the camera will not get confused by a point source of light that is not a beacon.

The main disadvantage of optical tracking systems is the need to wear beacons for truly effective real-time 3D tracking. Beacons are needed because they are easy to recognize, and in systems that use a lateral-effect photodiode or some similar device, recognition is trivial. Beacon-based systems can therefore track the position of many points simultaneously. If orientation is needed, a rigid triangle of beacons is required. Without beacons, image-processing algorithms must operate on the entire image in some way to find the desired feature. Consequently, an enormous amount of data must be collected and processed in order to determine position and orientation, with corresponding effects on lag. However, if it can be done, the user can operate unencumbered, which is a significant advantage. If only coarse tracking accuracy is required and users are expected not to wear tracking gear, optical systems can be quite useful for approximating the gross location of the user.

Inertial Tracking

Unlike the previous technologies, which track 3D position directly, inertial tracking systems operate by sensing acceleration of the sensor. The frame or enclosure of the inertial tracker suspends a mechanically flexible sensing mechanism. As the tracking sensor enclosure is moved, the sensing mechanism's inertia causes it to tend to stay at rest, which makes it move with respect to the enclosure. This causes the flexible elements of the sensing mechanism to flex, and the amount of flexion is measured. When the acceleration that causes the motion ceases, the suspension devices pull the sensing mechanism back to equilibrium and an acceleration of 0 is reported. When the motion slows down, deceleration (negative acceleration) of the enclosure is measured in a similar way. When an inertial tracker is sitting still, it measures only the gravitational force. Force relates to acceleration by Newton's equation F = MA, where the mass M is the mass of the sensing mechanism. The position and orientation of the sensor are computed by integrating Newton's equations of motion to transform accelerations into velocities, and then velocities into positions and orientations:

V = ∫ A dt + c1        P = ∫ V dt + c2

As each new sample of acceleration A arrives, the current velocity V is evaluated and then the current position P. The constants c1 and c2 are initially zero and, at each new update of A, are set to the previous values of V and P respectively, so each integral is taken over the most recent sample interval.
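A minimal sketch of this dead-reckoning step for a single axis is given below, using simple Euler integration; the 1000 Hz sample rate and the small constant accelerometer bias are invented for illustration and do not describe any particular device. Running it shows how quickly a tiny error in the acceleration readings turns into a large position error, which is the drift problem discussed shortly.

    #include <stdio.h>

    /* Dead-reckoning integration for one axis of an inertial tracker.
       Each acceleration sample first updates the velocity estimate, and
       the new velocity then updates the position estimate; the previous
       V and P play the role of the constants c1 and c2 above. */
    typedef struct {
        double v;   /* current velocity estimate, m/s */
        double p;   /* current position estimate, m   */
    } AxisState;

    static void integrate_sample(AxisState *s, double accel, double dt)
    {
        s->v += accel * dt;   /* V = integral of A over the interval + previous V */
        s->p += s->v * dt;    /* P = integral of V over the interval + previous P */
    }

    int main(void)
    {
        AxisState s = { 0.0, 0.0 };
        double bias = 0.0098;   /* a constant 0.001 g accelerometer error, in m/s^2 */
        double dt = 0.001;      /* 1000 samples per second */
        int i;

        for (i = 0; i < 10000; i++)     /* ten seconds of sitting perfectly still */
            integrate_sample(&s, bias, dt);
        printf("reported drift after 10 s: %.2f m\n", s.p);   /* about 0.49 m */
        return 0;
    }

Real trackers integrate at high rates and in three dimensions, but the same arithmetic, and the same sensitivity to bias, applies.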
One class of inertial trackers is built with Micro-Electro-Mechanical Systems (MEMS) technology, in which microscopic mechanical systems are built from silicon wafers using the same silicon etching and deposition processes used in building silicon integrated circuits. For sensing acceleration, a structure that looks like a diving board is used, with a weight at the far end and a thin strip of conductive material along the length of the diving board. As the silicon chip is accelerated perpendicular to the plane of the silicon wafer, the diving board flexes, stretching the conductive material and increasing its resistance. When the acceleration stops, the diving board straightens out again, registering 0 acceleration. Decelerations flex the diving board in the opposite direction, decreasing the conductor's resistance. The diving board structure is built somewhat wider than it is thick, so that only accelerations perpendicular to the plane of the silicon wafer significantly affect the diving board flexion.

To build a 3D accelerometer for changes in position, one of these diving board chips can be placed on 3 faces of a cube, sensing accelerations perpendicular to each cube face. To sense rotation about an axis, one sensor S1 must be placed parallel to the axis, like a flag on a flagpole. Another sensor S2 must be placed opposite S1, as if the wind were blowing in the opposite direction. This allows rotations about the axis to be sensed, because rotations will cause accelerations of equal value but opposite sign for each of S1 and S2. If the acceleration on S1 is not equal to -S2, then some other linear acceleration is taking place, which can easily be removed from the computation. One source of linear acceleration is gravity, and since gravity on Earth is constant, the set of orientation accelerometers can detect the direction of the gravity vector. To build a 6 degree-of-freedom sensor, at least 6 simple accelerometers are used. Another type of acceleration sensor is the laser-ring gyroscope in which a laser is pointed at a target and acceleration
updated During periods of low or no acceleration gravity can be sensed in isolation Unfortunately rotations about the gravity vector cannot be corrected in this way so directional readings N orth South East West will drift more 19 Two sensors are used per degree of rotational freedom so the redundant readings can be used to limit error W When using an inertial tracker to track hum an motion one can assume an upper limit on rotational accelera tions and one can assume that most rotational accelerations will be followed by rotational decelerations For position readings using MEMS sensors drift is so signi cant that the values are accurate only for about one second This is because there is no gravity reference for position and there is no opposite sensor giving the same reading of the opposite sign Laserring gyros are signi cantly more accurate but are signi cantly more expensive One major advantage of inertial tracking is that it directly senses acceleration which can be used to predict motion 1 In the absence of prediction lag is very small because accelerations can be sensed and integrated thousands or more times per second Typically update rates are in the lOOHz range Interference and noise could come from two sources depending on the sensing hardware For MEMS sensors electrical noise in uencing the electrical ex sensors could have an effect on acceleration readings Another source of interference is bumps other momentary highvalued accelerations hitting the sensor which would cause outof range readings on the ex sensors Inertial Tracking With Sensor Fusion Because of problems with drift reasonably priced inertial position tracking is currently unavailable One successful technique of dealing with drift is to combine inertial tracking with another tracking method such as ultrasonic or magnetic tracking The technique of combining multiple sensor types to generate more accurate readings is called sensor fusion In this case the position and orientation accuracy of the magnetic or ultrasonic tracking is supplemented by acceleration information from the inertial tracker Conversely the acceleration data is used to predict the next position which can be used by the ultrasonic or magnetic tracker to more nely re ne the ultrasonic or magnetic signal The disadvantage with this is that such a system no longer has in nite range Another style of sensor fusion is to combine an inertial orientation sensor with a digital compass which pro vides an external reference orthogonal to gravity vector Thus one can build a sourceless orientation tracker that does not drift because the Earth s magnitic eld can be used to correct the buildup of error in rotation about the gravity vector 353 Position and Orientation Tracker Summary In this section we have tried to review the salient position and orientation tracking technologies and their advan tages and disadvantages The magnetic systems are used perhaps the most common because unlike all but inertial systems the orienta tion range is allattitude Similarly the position range is an easily understood radius from the source Resolution and accuracy of these systems is reasonable Lag in these systems is about 4ms or better and the update rate is 60Hz or greater Mass and inertia is negligible with these trackers but the sensor wires induce an undesirable encumbrance Depending on the budget many tracked points are possible Some systems can track 32 or more points The most signi cant drawback is magnetic interference which is generally less severe in the DC systems than the 
AC systems The magnetic interference causes either noise or inaccuracy Gross inaccuracies induced by Page 42 353 Position and Orientation Tracker Summary 16th April 2002 36 Joysticks 43 local magnetic eld effects can be compensated for by taking tracker measurements of a reference grid at regular intervals throughout the tracker volume This will generate a 3D volume of errors that can be used to generate a correction function that corrects the errors from the tracker Ultrasonic systems do not fare as well in comparison to magnetic trackers except in price Ultrasonics have lower resolution about equal accuracy higher lag lower update rate and slightly greater bulk The position range is typically lower and the orientation range is less than half of the magnetic trackers Interference arises from simply blocking the line of sight and sonic noise can disrupt some trackers However ultrasonics are immune to magnetic effects and the technology is simple and therefore cheap Mechanical systems typically have much better resolution accuracy lag and update rate However the most recent highend magnetic systems can probably compete on these grounds Where mechanical systems fall short is range mechanical interference mass multiple tracked points and price Mechanical systems can be designed to carry other equipment such as displays so the weight disadvantage may have a compensating bene t Optical tracking systems offer the opportunity of no encumbering devices whatsoever The cameras may simply recognize your hand and track it Optical systems offer good resolution and accuracy and multiple tracked points Interference can come from simply blocking the line of sight so an optical system is likely a hemispherical position and orientation range like an ultrasonic system If a specialized lowlag imaging element is used tracking can be quick yielding high update rates and low lag Current commercial systems of this type are quite expensive Conversely if a standard inexpensive video camera is used lag will be at least the image grabbing time 33ms and some object recognition must take place If only a few points are to be tracked then such a system can be inexpensive Inertial systems offer very low lag with in nite range and good accuracy for orientation alone Like magnetic trackers these systems are allattitude Encumbrance is minimal and these sourceless systems offer the possibility of many tracked points Interference may arise from very large momentary forces but otherwise is not an issue and electrical noise can be handled by surrounding the sensor with metal The major disadvantage of these systems is that they are useless for tracking position due to drift Sensor fusion systems that combine inertial and ultrasonic or magnetic tracking offer lowlag tracking from the inertial tracker without drift from the ultrasonic or magnetic tracker The two sensor types compensate for each other s drawbacks allowing better tracking overall 36 Joysticks A joystick is a mechanical tracking system that sits on the desk and is pushed by the user s hand Usually the handle is mounted on a universal joint of some kind and the user grips the handle and rotates it about this central joint Unlike the mechanical trackers described in section 352 a joystick has a very small working volume All mechanical linkages are hidden inside the joystick base and the user is presented with a single handle to manipulate If the joystick is xed to the desk the user can easily nd it without having to look There are three classes of passive joystick 
devices Isometric Spring Return and Isotonic Isometric joysticks essentially immobile when the user presses on them The word isometric means relating to equality of measure which means in this case that the joystick moves the same zero amount in all directions The spring return joystick does move when you push on it and the spring mechanism resets it back to its central zero point when released Isotonic joysticks have no springs and stay where you leave them when released An isometric joystick can be thought of as a spring return joystick with a very hard spring while an isotonic joystick lacks springs altogether While these three types of devices share comm on characteristics they are not used in the same way Isometric joysticks operate by sensing and reporting the force and torque twist that the user is exerting on it When there is no force the device reports zero Consequently the most suitable control method is to use it as a rate controller That is the user s force indicates the rate at which the associated object should move The user can imagine that the hand is pushing the joystick and this force is being applied to the object resulting in object motion To determine object position the integral of the rate should be taken over the time that the joystick is pushed An isometric joystick in some sense has in nite range because the user can continually move the object as long as the joystick is pressed Rate control also offers a certain amount of noise immunity because the integrating function serves to lter out small variations in force However the fact that the user is controlling rate induces a certain amount of lag because the user cannot stop instantaneously 16th April 2002 36 Joysticks Page 43 44 Chapter 3 Hardware Encumbrance and mass do not apply to these devices because they are typically a unit that sits at a xed location on the desk to be operated by the user s hand Like mechanical trackers only one point can be tracked One class of isometric joysticks has a rubbercoated ball as the main operating element Because this ball remains xed a button can be placed where the user may reliably nd it under the rubber skin Buttons are also supplied on a ridge in front of the ball where the user can extend a nger to press a button Another class of isometric joysticks presents an object like a hockey puck for the user to push and turn with a similar layout of buttons reached by the ngers A common problem with both types of isometric joysticks is that it is dif cult to effect just an orientation change or just a position change One way of dealing with this is to assign a button to position only mode and another to orientation only mode In fairness a lack of isolation between position and orientation is true of other tracking technologies lsotonic joysticks can be viewed as mechanical trackers with sticky joints With isotonic systems position control is the most appropriate as with all the other tracking technologies we have described With isotonic joysticks most of the issues discussed for mechanical trackers apply with the small exception that because they are intended solely for hand control the user can quite easily deal with joystick mass and inertia 361 Force FeedbackJoysticks Perhaps the strongest reason to use a mechanical tracker is to make it into a farcefee mck device With force feedback the mechanical tracker controls a cursor in the HITD scene Forces that are confronted by the cursor are fed back to the user by a set of motors built in to the tracker The simple version of this 
is the forcefeedback joystick which is a joystick that has a set of motors in the joystick base that push back on the user Force feedback systems have a much higher computational requirement than passive mechanical systems because the force that is being felt by the user must match the force that being exerted on the user If the user s force is suddenly changed the feedback system must be able to sense this Consider the example where the user suddenly lets go of a forcefeedback joystick Until the feedback system reads the lack of an opposing force the joystick motors will continue to move the joystick in the direction of the force accelerating the joystick Eventually the joystick will crash in to something and one hopes that this something is not the user s hand Thus the forcefeedback control system must quickly read and react to changes in its force environment Typically joysticks are controlled by continually sampling the current opposing force and adjusting the force exerted by its motors The sample and control rate is usually 5000 times per second or more 22 Such high sample and control rates are supplied by a dedicated processing system The control system con tinually reads and updates the input and output forces while at the same time communicating at a much lower sample rate to the application The application transmits the goal force direction to the joystick and receives the current force and position from the joystick The control system must also safeguard the mechanical integrity of the system to make sure that the user does not apply too much force and burn out the motors Because joysticks occupy a small amount of desk space and have a small working volume they are easy for the user to trust The user does not have to worry about a joystick arm reaching out and hitting the user 362 Force FeedbackArms A more expansive version of the forcefeedback joystick is the forcefeedback arm Unlike a joystick a force feedback arm has one or more mechanical linkages outside the electronics box The linkages outside the electronics box allow the arm to sweep out an area or volume with a control movement that corresponds directly to the cursor movement For example a 2D joystick rotates about its base making the end of the stick sweep out a spherical surface section A 2D arm s end effector can directly sweep out a 2D at plane Also unlike a joystick an arm can be used in a position control mode Because a joystick rotates about a center a natural interpretation for this movement is rate control Some joysticks are physical models of aircraft sticks which are rate controllers for aircraft A 3D forcefeedback arm is a mechanical linkage that sweeps out a volume of space The mechanism is designed so that force can be directed along any 3D vector while the end effector is in the working volume The user may grip a pen that is mounted on the end effector by a freerotating universal joint The pen or grip may have one or more buttons mounted on it to allow the user to trigger events Like a forcefeedback joystick the forcefeedback arm needs a dedicated controller system to continually input and output force information The computer running an application controls the arm by updating the goal force Page 44 361 ForceFeedback Joysticks 16th April 2002 37 Hand Sensors 45 and receives position and user force updates Both types of forcefeedback devices are capable of generating gross forces and ne tactile simulations As an arm is moved across a simulated surface the force system can quickly vary the force out put of 
the system which is interpreted by the user s ngers as a texture For this type of simulation very high update rates are essential 37 Hand Sensors The device that probably excited the public s imagination the most during the VR boom in the late 1980 s was the em DataGlove The idea of such a hand sensor is that the shape of the user s hand can be detected by the computer and the computer can interpret the hands shape to generate commands in the application The user makes hand gestures that are interpreted by the computer in the same way that a speaker of American Sign Language ASL uses hand gestures to communicate A complete hand sensing system is made of two parts a 3D tracker and a hand sensor The position of the user s hand in space is typically by a magnetic 3D tracker because of its allattitude range and immunity to line ofsight interference The hand sensor measures the detailed shape of the user s hand There are two classes of hand sensors 7 em Gloves and em Exoskeletons The fundamental idea behind each of these systems is to measure the amount of bend at each joint Each joint in the hand accounts for one or two degrees of freedom in the hands shape Each degree of freedom can be loosely thought of as a revolute joint where one part revolves arount the other 371 Hand Anatomy The human hand is an enormously complex collection of muscles bones ligaments tendons and nerves The hand has 3 major nerves as shown in gure 35 Each nger has 4 degrees of freedom 7 one each at the two distal outer joints and two where each nger joins the palm of the hand Each of the 3 nger joints can exextend When these are all fully exed the hand forms a st When these are all fully extended the hand is at The fourth degree of freedom for each nger is abductionadduction which is a sidetoside motion that can most easily be seen by holding the hand at and spreading out the ngers We will refer to this sidetoside motion by the term adduction The thumb also has 4 degrees of freedom 7 one each at the two distal joints and two where the thumb joins the base of the hand The two distal exionextension are easy to measure The remaining two degrees of freedom at the base of the thumb are somewhate harder to describe and measure With the hand held at one degree of freedom allows the thurnb to move in a adduction motion from beside the base of the index nger to extend at right angles from the hand The other degree of freedom takes the thurnb from a at con guration across the palm of the hand to rest on the palm near the base of the smallest nger producing an arch in the palm as it progresses There are two remaining degrees of freedom at the wrist These are adduction motion and exionextension Rotation of the hand about the axis of the forearm actually takes place along the forearm by rotating the forearm s radius and ulna bones about each other The total number of degrees of freedom in the hand is therefore 2239 4 each in the ngers and thumb and 2 in the wrist Clearly the thumb is the most dif cult to deal with because of its highly exible nature 372 Degrees of Freedom Measured The minimum useful set of degrees of freedom to measure are the whole exionextension of each digit for a total of 5 degrees of freedom Without thumb adduction the software using this data must assume that the thumb is in an opposition con guration so that as the thumb and ngers bend they will meet in a grip posture With 10 sensors a glove can measure both the inner and middle exion of the ngers and the outer two joints of the thumb The outer joints of the 
ngers are ignored in this con guration and are either measured with their respective middle joints or ignored For anatomical reasons most people cannot bend the outer joint of a nger without also bending the middle joint With 14 sensors a glove can add nger adduction With 18 sensors a glove would gain thumb adduction and palm arch and wrist exion and adduction If the magnetic tracker sensor is placed on the back of the hand the wrist sensors could be dispensed with if forearm con guration is not of interest Because of the aforementioned anatomical oddity of ngertip exion the distal nger joints are viewed as the least important joint con gurations 16th April 2002 37 Hand Sensors Page 45 46 Chapter 3 Hardware Figure 35 Hand Anatomy showing from left to right the in uence of the ulnar median and radial nerves Page 46 372 Degrees of Freedom Measured 16th April 2002 373 Glove Sensors 47 373 Glove Sensors A glove sensor is a glove worn by the user that incorporates some sensing hardware The basic sensing mechanism is a bend sensor which is some sort of exible material that changes its output as it is bent A bend sensor is placed over each joint to be measured at the outer part of the joint on the back of the hand The structure of the glove is constructed to make sure that the bend sensors bend with the nger joints For hinge joints like the 2 distal joints on each nger and thumb bend sensors are effective For the proximal nger joints and the wrist simple exionextension is accompanied by adduction which may affect the accuracy of the bend sensor readings Finger adduction can be measured by a bend sensor that is wrapped around a exible post placed between each nger This measures the angle between each nger instead of the adduction angle that a single nger makes with the hand Adduction of the index nger can be measured by a sensor on the side of the nger Wrist adduction can similarly be measured at the side of the hand nearest the small nger Both wrist sensors must be able to sense bends which have a somewhat larger radius of curvature than the nger bends Bends at the base of the thumb can be sensed by measuring palm arch which is the curvature in the palm that arises as the thumb crosses over the palm to rest on the base of the small nger Thumb adduction can be measured by an adduction sensor between the thumb and index nger Bend Sensor Types In the DataGlove a length of optical ber was looped over the joint of interest In the region where joint bend was to be sensed the surface of the ber was scratched to allow light to leak out of the ber as it was bent Both ends of the loop were brought to an electronics box where one end of the ber was attached to a LED and the other was attached to a photodetector As the joint bent the light detected by the photodetector decreased nonlinearly The major problem with this system was that the scratching treatment was nonuniform from ber to ber so some sort of calibration was required Moreover the purpose of the scratch was to create cracks in the surface of the optical bers glass As the bers were bent over and over these cracks propagated completely across the ber rendering it useless as a bend sensor A much more robust system uses electrically resistive material that is sprayed on a exible substrate such as Mylar As the substrate is bent the resistive material stretches out increasing the resistance in a predictable way As long as the bend sensor is not bent so acutely that it exceeds the material s strain limit and breaks the sensor the sensor will take a long 
time to wear out The predictable resistance allows a minimal level of calibration because the sensor will usually operate within designed limits 374 Exoskeletons An exoskeleton is an apparatus that is clamped to the user and is used to measure joint angles Unlike a glove which ts the user as a piece of clothing and embeds measurement devices in the cloth of the glove an exoskeleton places the measuring equipment on an articulated frame above the user The articulated frame is the shape of a hand and consists of a collection of links connected by somewhat complex joints Using a soft clamp each link connects to the the corresponding link on the hand such as a nger segment or the palm Each joint connecting two of these links measures one or two revolute degrees of freedom with a rotation encoding mechanism These angle encoders are accurate and precise and can be designed to return the accurate bend angle regardless of the user s hand size The advantage of an exoskeleton is that it offers high accuracy from wearer to wearer assuming it is worn correctly In addition an exoskeleton can also be used as a base for force feedback since each joint encoder could be supplemented with some sort of actuator that sets the joint angle to be a particular value An exoskeleton provides the rigid base that allows the forces to be delivered to the desired places The disavantage to an exoskeleton is that it can be cumbersome to put on and take off and it occupies a certain volume of space above the back of the hand Users may temporarily forget the exoskeleton s bulk and do an automatic motion such as putting their hand in their pocket or wiping their brow The price paid for the inconvenience of encumbrance should be worth the increase in performance Medical assessment of patient s hand exibility and range might be an example application that nds the exoskeleton s accuracy is useful For applications that require only good relative hand measurement an exoskeleton may deliver more accuracy than the user needs Some applications may simply need good measurement resolution and not accuracy A suitable calibration process may be able to make such repeatable systems accurate 16th April 2002 373 Glove Sensors Page 47 48 Chapter 3 Hardware 375 Contact Sensors One assumption built into the glove sensor idea is that nger bend angle information is suf cient to detect when the thumb touches any of the ngertips The problem with this idea is that it depends on having an accurate physical model of the user s hand including nger and thumb segment lengths To avoid this issue entirely a glove can have electrical contacts built into the tips of the ngers and thumb that detect contact between the digits of the hand These contacts are simply switches that can be directly activated by pinching a nger and thumb The advantage if these sensors is that they are direct The user knows immediately when nger and thumb touch and can rely on the application being able to correctly detect this The alternative 7 interpreting nger bend angles to determine nger contact 7 is fraught with error In general recognizing a hand gesture or posture is dif cult A disavantage of contact sensors is that the user may touch the gtertips together without intending to activate an event The problem is simply that the glove cannot be dropped for a moment while the user attends to something else A glove must be removed which takes time and mental effort Contrast this with the keyboard and mouse which can be ignored at a moment s notice while other events occupy the 
user s attention A related issue is the convenience of a glove or exoskeletion while using a keyboard or mouse A glove with open ngertips allows the user to type write and grasp objects while wearing the glove A contact sensor glove will cover much of the nger and thumb tips reducing ngertip sensitivity making it more dif cult to manipulate objects An exoskeleton has the previously mentioned problem of making the hand much larger and more fragile and increases the time to engage or disengage from the computer system Page 48 375 Contact Sensors 16th April 2002 Chapter 4 System Architecture In this chapter two high level models for HITD user interfaces are presented These models provide a general framework for the implementation of these user interfaces The next section provides a more detailed architecture of HITD user interfaces that divides an individual application into a number of components The architecture issues discussed in this section mainly deal with the support software that is required by all applications while the issues in the next section deal more with the individual application programs Put another way the typical programmer will be aware of the issues presented in the next section but not necessarily those presented in this section The Cognitive Coprocessor Architecture developed by Robertson Card and Mackinlay 27 addresses two key problems with the development of 3D user interfaces These issues are 1 Multiple Agents A HITD user interface often has several agents operating simultaneously Two of these agents are the user and the application program In a HITD user interface the user needs to continuously interact with the user interface for example continuously tracking head motion the user can t wait for the application to nish a long computation before it responds to the user s motion Similarly the application shouldn t wait for the user before it performs any of its functions Both of these processes should be executing in parallel N Animation The images presented to the user must evolve smoothly as the interaction with the user develops The display must be updated at a uniform rate of at least 10 frames per second to show uniform motion within the environment Without this uniform motion the user no longer has the feeling of interacting with a simulated env ironm ent The Cognitive Coprocessor Architecture shown in gure 41 presents a solution to these problems The main component of this architecture is the animation loop that coordinates the execution of the other components in the system When the user interacts with the user interface his or her requests are added to the task queue These requests are either for the application or other parts of the user interface such as 3D navigation or interaction objects Similarly a display queue contains the information to be displayed to the user In each cycle of the animation some or all of the information in this queue is transferred to the display so the user can view it The animation loop provides a way of time slicing the execution of the components of the system On each cycle of the animation loop each component of the system is allocated a time slice The size of this time slice depends upon the amount of work to be done Figure 41 The Cognitive Coprocessor Architecture The governor is responsible for allocating the time slices to the components and ensuring that the whole system meets its performance goals For example if we want to have 10 updates per second the governor ensures that in each second there are 10 cycles of the 
animation loop At the end of each cycle the governor examines the amount of time that was required to complete the cycle If this time was less than 01 second the governor waits until a complete 01 second interval has past If this time is greater than 01 second the governor adjusts the time slices for the system components in such a way that on the next cycle the system will be closer to the desired 01 second cycle time This may mean that the application will not have as much time to compute on each cycle or it may 49 50 Chapter 4 System Architecture take more cycles to process some of the user s requests Note that the governor doesn t guarantee that every cycle will be exactly 01 second just that on average there will be 10 cycles per secon The Cognitive Coprocessor Architecture deals with the case of a single processor that must be time sliced The decoupled simulation model extends these ideas to multiple processors Most HITD user interfaces are im plemented on multiple workstations or on operating systems that allow multiple processes In this environment a single application can be divided into a number of processes that avoids the problem of one part of the application forming a bottleneck for the rest of the application It also allows several parts of the application to compute in parallel thus reducing the cycle time In a single process it is very tempting to implement the main loop of a HITD user interface in the following way WHILE true DU BEGIN Read Input Devices Compute one time step Update Display END This implementation approach wastes a considerable amount of time For example reading a single input devices can take a considerable amount of time on the order of 005 seconds If there are several devices to read 015 seconds can easily be consumed in this part of the loop alone In the case of the tracker on a headmounted display this value isn t needed until the display is updated therefore reading its value could be overlapped with the execution of the application Similarly the reading of all the input devices could be overlapped instead of being read sequentially The best performance is achieved when as much of the application is executed in parallel as possible Figure 42 The Decoupled Simulation Model The decoupled simulation model 30 29 shown in gure 42 takes advantage of multiple processes possibly on multiple processors to increase the amount of parallelism in an application In this model the application is divided into three types of processes which are 1 Computation This type of process performs the main computational tasks in the application This type of process requires no interaction with the user and communicates with the user interface process at xed points in the computation These communications consist of current results to be displayed to the user and the recent requests from the user 2 Server Each server process is responsible for one of the input or output devices used by the application The server process handles all the time critical components of the interaction with the device In the case of input devices the server process continuously polls the device for its most current value When the user interface needs the current value of the device the server can provide it immediately without waiting for the time required to sample the device L User Interface This component is responsible for coordinating the application It receives input from the computation processes and input devices determines the user s request and the computations requirements and generates the 
output to the user. This process handles the synchronization of the other components in the user interface.

In an application there is one user interface process, typically one computation process, and several server processes, one for each device used in the application. Each of these processes executes at its own rate and has its own cycle time that is largely independent of the other parts of the application. In the case of server processes, they can continuously process the device information; they don't need to synchronize with other parts of the application. The user interface process receives data from the computation process when new data is available; it doesn't wait for the computation process to produce the data. Similarly, it requests new device values from the input servers, and since they are continuously polling the input devices, they can respond immediately with this information. Thus the user interface process doesn't wait for a significant amount of time for any of the other components of the application, allowing it to process user requests as quickly as possible. The only process that may need to wait is the computation process, when it needs new information from the user. Since the computation process is not part of the real-time interaction loop, this is not a major problem.

The main advantages of the decoupled simulation model are:

1. Maximize the Update Rate. In the user interface process there are no blocking conditions, so it can produce images as fast as possible. Since most of the operations occur in parallel, the maximum amount of computing can occur in each cycle.

2. Reduce Lag. Each of the server processes maintains the most current value of the input device it is responsible for, and it is continuously polling the device for new values. As a result, the user interface process can immediately obtain the most up-to-date value without waiting for a device sample time. In addition, since the server has its own cycle time, it is easy to implement good predictive filtering, because the samples occur at equal time intervals. Both of these conditions decrease the lag in the complete user interface.

3. The Slowest Component Doesn't Set the Rate for the Entire User Interface. Each process has its own cycle time and minimal synchronization requirements with the other processes. As a result, the slowest process doesn't slow down the other processes in the application. For example, if the computation process has a long cycle time, the user interface can still respond to the user immediately.

4. Easier to Distribute the Application. This model fits nicely into a distributed computing environment, since there are well-defined communications between the processes that don't occur frequently. Each process in an application can run on a separate workstation without a noticeable performance penalty.

5. Develop Components Independently. Since each component is a separate process, it is easy to develop and maintain the components separately. The server processes are usually provided by the support software, so the application programmer doesn't need to develop them. Only the computation and user interface processes need to be produced, and they can be produced independently of each other. Similarly, the server processes can be updated to include better algorithms without changing or recompiling the rest of the application code.
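To make the server idea concrete, the following fragment sketches a tracker server as a thread inside a single program rather than as a separate process; read_tracker_hardware and the other names are hypothetical stand-ins for whatever the device driver actually provides.

    #include <pthread.h>
    #include <string.h>

    /* Latest tracker sample, shared between the server thread that polls
       the device and the user interface thread that reads it.  The mutex
       (initialised elsewhere with pthread_mutex_init) is held only long
       enough to copy six floats, so the user interface never waits for a
       device sample time. */
    typedef struct {
        float pose[6];            /* x, y, z, yaw, pitch, roll */
        pthread_mutex_t lock;
    } SharedSample;

    /* Hypothetical blocking call into the device driver; on real hardware
       this is the slow step, on the order of tens of milliseconds. */
    extern void read_tracker_hardware(float pose[6]);

    /* Server thread: poll the device as fast as it will go and keep the
       shared copy current. */
    static void *tracker_server(void *arg)
    {
        SharedSample *s = (SharedSample *)arg;
        float fresh[6];

        for (;;) {
            read_tracker_hardware(fresh);        /* slow, but off the UI path */
            pthread_mutex_lock(&s->lock);
            memcpy(s->pose, fresh, sizeof fresh);
            pthread_mutex_unlock(&s->lock);
        }
        return NULL;
    }

    /* Called from the user interface loop: returns immediately with the
       most recent sample instead of waiting for the device. */
    static void get_latest_pose(SharedSample *s, float out[6])
    {
        pthread_mutex_lock(&s->lock);
        memcpy(out, s->pose, sizeof(float) * 6);
        pthread_mutex_unlock(&s->lock);
    }

The user interface would start the server once with pthread_create and then call get_latest_pose every time it builds a frame, always getting the freshest sample without blocking on the device. In the full decoupled simulation model each server is a separate process, possibly on another workstation, and the shared structure becomes a message or a shared-memory segment, but the principle is the same.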
One of the major problems that occurs in 3D user interfaces, but not in 2D user interfaces, is the large number of coordinate systems that are used in even simple applications. Each device in a 3D user interface has its own coordinate system. In the case of the Polhemus Isotrak, the coordinate system is centered on the source and the coordinate axes are aligned with the body of the source. If more than one Isotrak is used in the same application, they will report different values for the same position in real physical space. If one of the devices is moved to a different position or orientation, the coordinates that it produces will also change. In addition, each site will have its devices located in a different physical position with respect to the room environment. If the application itself deals with raw device coordinates, there are severe problems. The three main problems are:

1. Multiple Coordinate Systems. The application must keep track of the coordinates for each device and be able to perform transformations between these coordinate systems and the coordinate systems used in the application.

2. Device Reconfiguration. Each time the positions of the devices change, the application code must be updated to reflect the new position of the devices. This can involve a considerable amount of work and can be error prone if a large number of applications must be changed.

3. Application Distribution. It is difficult to distribute applications to different sites, since they have different device configurations. The application must be changed at each site to reflect the local device configuration. This makes it impossible to have shrink-wrapped applications that will run correctly immediately after being installed.

These problems can be solved by using a workspace mapping. The situation for a typical application is shown in figure 4.3. As can be seen from this figure, there are three types of coordinate systems used in a 3D user interface. These coordinate systems are:

1. Device Coordinates. Each device has its own coordinate system that is used for the raw values from the device. This coordinate system is called a device coordinate system. There are many device coordinate systems in an application, one for each of the devices that it uses.

2. Room Coordinates. The room coordinate system is a common coordinate system that all the device coordinates can be converted to. This coordinate system is based on the room that the devices are located in. One corner of the room is selected as the origin of the coordinate system, and the two walls and the floor that meet at that corner are used as the coordinate planes.

3. Environment Coordinates. The environment coordinate system is the coordinate system that is used in the application. The programmer is responsible for defining this coordinate system.

Figure 4.3: Workspace Mapping

Workspace mapping handles the transformations between these coordinate systems. The mapping between device coordinates and room coordinates is independent of the application and doesn't change very often. The position and orientation of each device in the room can be stored in a common file that is called the workspace file. This file is created when the system is first installed and modified each time that one of the devices is moved or a new device is added to the system. When an application starts, the support software reads the workspace file and automatically constructs the mappings between the device coordinate systems and the room coordinate system.
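As a concrete illustration, here is a minimal sketch of what the support software might do with one workspace file entry; the structure layout, the use of yaw, pitch and roll angles, and the function names are assumptions made for the example rather than the interface of any particular toolkit.

    #include <math.h>

    /* Pose of one tracker source in room coordinates, as it might be read
       from one entry of a workspace file. */
    typedef struct {
        char   name[32];
        double pos[3];             /* source origin in the room, metres  */
        double yaw, pitch, roll;   /* orientation of the source, degrees */
    } DevicePose;

    /* Build the rotation matrix R = Rz(yaw) * Ry(pitch) * Rx(roll). */
    static void build_rotation(const DevicePose *d, double R[3][3])
    {
        const double rad = 3.14159265358979323846 / 180.0;
        double ca = cos(d->yaw * rad),   sa = sin(d->yaw * rad);
        double cb = cos(d->pitch * rad), sb = sin(d->pitch * rad);
        double cc = cos(d->roll * rad),  sc = sin(d->roll * rad);

        R[0][0] = ca * cb;  R[0][1] = ca * sb * sc - sa * cc;  R[0][2] = ca * sb * cc + sa * sc;
        R[1][0] = sa * cb;  R[1][1] = sa * sb * sc + ca * cc;  R[1][2] = sa * sb * cc - ca * sc;
        R[2][0] = -sb;      R[2][1] = cb * sc;                 R[2][2] = cb * cc;
    }

    /* Map one raw sample from device coordinates into room coordinates:
       room = R * device + position.  The support software applies this to
       every sample, so the application only ever sees room coordinates. */
    void device_to_room(const DevicePose *d, const double dev[3], double room[3])
    {
        double R[3][3];
        int i;

        build_rotation(d, R);
        for (i = 0; i < 3; i++)
            room[i] = R[i][0] * dev[0] + R[i][1] * dev[1] + R[i][2] * dev[2] + d->pos[i];
    }

A real implementation might store a 4 by 4 matrix or a quaternion for each device instead of Euler angles; only the support software that builds the mapping needs to know which, and the room-to-environment mapping described next composes with this transform in the same way.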
These mappings can be used by the support software and the application needn t be aware of them None of the application code needs to be changed when one of the devices moves An example of setting up the workspace le for a sample device con guration is presented in Chapter 7 The mapping between room coordinates and environment coordinates is divided into two steps The rst step consists of mapping the origin of the room coordinate system to the origin of the environment coordinate system This can be done in two ways either by specifying the position of the origin of the environment coordinate system in the room coordinate system or by specifying the position of the origin of the room coordinate system within the environment coordinate system The second step in the mapping speci cation is to specify a rotation and scaling of the environment coordinate system with respect to the room coordinate system This scaling is speci ed in terms of how many units in the environment coordinate system correspond to a meter in the room coordinate system The orientation of the environment coordinate is determined by specifying the vectors that the X Y and Z axis of the room coordinate system map onto The examples in the following chapters show how the mapping from room coordinates to environment coordinates is determined in a number of example applications Splitting the speci cation of the room coordinate system to environment coordinate system mapping into two steps simpli es navigation through the environment While an application is running the scale of the environment rarely changes and most of the time the orientation of the environment with respect to the room doesn t change Thus this part of the mapping can be speci ed once at the start of the application The user can be moved through the environment by moving the origin of the room coordinate system within the environment coordinate system Since the user occupies a small volume within the room coordinate system by translating the origin of this system within the environment the environment position of the user can easily be changed 4 1 Environment Design There are three key issues that must be considered in the design of any HITD user interface These issues are the level of immersion in the user interface how the environment is structured and object design Each of these issues are brie y discussed in this section before presenting a more general discussion of the environment design process The level of immersion can vary from a single screen on a graphics workstation to a headmounted display with separate images for each eye The type of immersion that is used for a particular application depends upon the nature of that application Single screen approaches are well suited to applications that require high resolution or access to devices outside 0 e 3D environment A good example of this type of application is mechanical computer aided design where resolution is much more important than being immersed in a 3D environment Head tracking can effectively be used with single display environments The images that are displayed on the screen depend upon the position of the user and his gaze direction In the design application the user can easily more his or her head to get different views of the object being designed This greatly increases 3D perception and gives the designer the ability to determine the most useful view for each design operation In this level of immersion the user is typically seated in front of the display and thus has a limited range of 
motion This limited motion range can be used to our advantage since it places fewer demands on the tracker technology than a full headmounted display For example video tracking with a single camera is a possibility since the head won t move in arbitrary ways We know that the user s head will always point towards the display screen and if it isn t pointed in that direction the user has been distracted and is not currently interacting with the application This has 16th April 2002 41 Environment Design Page 53 54 Chapter 4 System Architecture often been called sh tank virtual reality since the user is positioned in front of the environment gazing into it in much the same way that a person would view the contents of a s ta Stereo can be used with the single screen approach and is particularly effective when head tracking is also used The use of stereo can increase the perception of 3D but current stereo technology can reduce the effective resolution of the display This again gives rise to a tradeoff between immersion and resolution Another modi cation of the single screen approach is to use projection video The main advantage of projection video is that it covers a wider range of the user s eld of view This is not the case with other types of single screen systems where they user can view objects that on either side of the screen The use of projection video gives the user more of a feeling of immersion since his or her entire eld of view is covered by computer generated graphics The illusion only holds while the user is facing the screen if the user s head turns to either side the real world will again be visible Some systems solve this problem by using multiple screens In this way the user is surrounded by projection video no matter which way he or she turns there will be a projection screen Head tracking can also be used with this approach to generate images that respond to the user s current head position and gaze direction Stereo can be used with some types of video projectors to increase the illusion of immersion In many ways the ultimate is immersion is the headmounted display A headm ounted display responds to the user s head position and gaze direction as the user moves the images in the headmounted display are updated to re ect the position and orientation of the user With a headmounted display the user can be completely surrounded by computer generated images and the real world can be completely blocked out While this may give the ultimate in immersion in does have some problems Currently headmounted displays don t have the same resolution as stationary CRT s making them unsuitable for applications that require high resolution Most HMD s are fairly heavy this can cause head and neck fatigue if the HMD is worn for a signi cant period of time Finally since the user s view of the real world is blocked by the HMD the user is unaware of obstacles in the real world Thus the user could possibly trip over furniture and cables in the real world The design process for a HITD user interface depends upon the application domain In our work we have encountered three main application domains which are 1 Design Applications in this domain are concerned with the production of new objects using some type of design process The typical applications in this domain are computer aided design applications where the user produces the design of a new object based on a set of primitives and operations on existing objects This application domain is characterized by one or a few key objects which are the user s 
main concern; all other objects in the application assist in the construction or manipulation of these key objects.

2. Simulation: A simulation application consists of a set of objects that interact with the user and each other. These applications are characterized by a set of objects that have their own behaviors and are capable of responding to actions in the environment. Some of the typical applications in this domain are simulation, training, games and art. When most people talk about VR, it is this application domain that they have in mind.

3. Visualization: The main thrust of this application domain is the visualization or exploration of one or more objects. These applications typically have very little interaction, with most of the interaction restricted to navigation, and typically there is little or no interaction with the objects in the environment. Typical applications in this domain are scientific and information visualization and walkthroughs. In some ways this is the easiest type of application to design, since the objects don't need to respond to the user; in other ways it is the most difficult, since the designer must produce a visual representation of the object conveying all the required information.

As can be seen from this discussion, the main component of a HITD application is a set of objects. These objects have 3D geometry and they are capable of responding to the events that occur in the environment. The design of a HITD application can be viewed as a two step process. The first step is the design of the objects that appear in the application, called object design. The second step is the construction of the environment from the set of objects that appear in it, called environment composition. The object design step is the easiest one to understand, so we will start our discussion with it.

The types of objects that are designed depend upon the type of application. For design applications the objects are the tools that are used to build the object being designed. They can be viewed as a set of widgets that the user interacts with in order to build a new object. For these types of objects we need to specify their geometry (how they will appear in the environment), how they react to the user, and how they affect the object under design. In many ways these objects can be viewed as 3D analogues of the 2D widgets that are currently used in 2D GUIs.

Object design for simulation applications is more complicated in the sense that we must design both the geometry and behavior of the objects. In most cases the objects correspond to objects that occur in the real world, therefore the design of their geometry is a nontrivial task. The object designer must produce geometry that contains all the important components of the object, looks real enough, and at the same time is not so complex that it adversely affects the update rate of the application. Since the user is free to interact with these objects in any way, they must be prepared to respond to any of the user's actions. This implies that each object must have a range of actions that it can perform. The object designer must determine what these actions are and specify how the object performs these actions.

For visualization applications the main concern in object design is producing its geometry, or how it will appear in the application. If the object corresponds to a physical object, or an object that can easily be interpreted as a physical object, the object design process is easier. The designer converts the
physical description of the object into the geometry that can be used for its visual representation The designer still has the problem of producing the geometry and ensuring that it isn t too complex to adversely effect the update rate If the information doesn t have a physical interpretation the geometrical design problem can be more complicated since the object designer must invent some way of converting the information into a geometrical form Once the objects have been designed the environment composition step can be started In this step the applica tion designer places the objects in the environment In visualization applications there are typically a small number of objects for example in a building walkthrough application there is really only one object the bulding itself The environment composition step for this type of application is typically quite simple and the designer usually doesn t think of it as a separate design step In the case of simulation applications the designer selects the objects that will be used in the application positions them within the environment and speci es how the objects will interact with each other For this type of application the designer needs a set of tools that assists with object placement and specifying their reactions For design applications the environment composition step can be more complex since the objects will be used to construct new objects In some cases it won t be possible to give initial positions to some of the objects since their position and orientation will depend upon the object being designed In other cases the support objects must be created as the object is being designed since they are responsible for modifying aspects of its design This dynamic nature of the object composition makes the environment design task more complicated 1n the following chapters we will discuss some of the techniques used to design object geometry and behavior interaction techniques that can be used in design applications and some of the tools that we have developed to assist with the application design process 16th April 2002 41 Environment Design Page 55 56 Chapter 4 System Architecture Page 56 41 Environment Design 16th April 2002 Chapter 5 Interaction In 3D A rich set of interaction techniques have been developed for 2D user interfaces These interaction techniques include menus buttons and scroll bars What are the equivalent interaction techniques for 3D user interfaces At this stage in their development there isn t a well de ned set of interaction techniques for 3D interaction We are just beginning to determine the important interaction tasks and until these tasks are well understood it is very dif cult to develop optim al interaction techniques This chapter presents a survey of 3D interaction tasks and the interaction techniques that have been developed for them The rst section of this chapter discusses 3D interaction tasks and their important properties The following sec tion examines how the hand can be used in 3D interaction techniques This section surveys the various techniques that have been used for recognizing hand postures and gestures the remaining section address the interaction techniques that have been developed for the different interaction tasks The design of these interaction techniques depends upon the available input devices and the interaction style The interaction devices can be divided into two categories Three dimensional devices that provide at least three degrees of freedom and two dimensional devices that are restricted to 
two degrees of freedom Ideally all VR applications should use three dimensional user interfaces but there are still applications that can t assume that availability of these devices and a number of interesting 3D interaction techniques have been developed for 2D devices The third section examines interaction techniques for comm and speci cation and the next section examines interaction techniques for obj ect selection These two interaction tasks also occur in 2D user interfaces providing a background for the development of these interaction techniques The fth section describes interaction techniques for navigation moving the user through the environment This is an interaction task that doesn t appear in most 2D user interfaces The nal section in this chapter deals with the use of 2D interaction techniques in 3D user interfaces There are some interaction tasks that are naturally two dimensional and the best way to handle them is to embed 2D interaction techniques in the 3D user interface 51 Basic Interaction Tasks Before interaction techniques can be described the underlying interaction tasks they support must be understood While each application area has its own set of interaction tasks there are number of interaction tasks that are com mon to a wide range of applications In this section these common interaction tasks are identi ed and described These interaction tasks form the basis for the interaction techniques described in the remaining sections of this chapter One approach to identifying comm on 3D interaction tasks is to consider the interaction tasks that occur in 2D user interfaces and see if they generalize to 3D While this won t produce a complete list of 3D interaction tasks it provides a good starting point and includes tasks that are well understood in the 2D domain There are two common interaction tasks that occur in almost all 2D user interfaces These operations are list selection where the user selects an item from a list usually a command to be performed and object selection where the user selects an object to operate on Once an object has been selected the user wants to perform operations on it This can be done by either selecting commands that operate on it or by using some form of direct manipulation Thus a third interaction task is obj ect manipulation The list selection operation is usually performed by some form of menu in 2D user interfaces Most of these 57 58 Chapter 5 Interaction In 3D menus contain commands but they could contain options or settings such as font size or style Most of the following discussion is in terms of command selection but the same basic techniques can also be used for selecting items from short lists In list selection there is a set of operations that can be performed at the current point in the interaction and the user must select one of these operations In 2D user interfaces various forms of menus are used to support this interaction task In 3D user interfaces this interaction task still involves selecting one operation from a list of possible operations but there are several additional issues that must be addressed The rst issue is where the interaction technique should appear In 2D user interfaces this is a simple issue since the user is always looking at the screen but in many 3D user interfaces the users are free to move their heads and move through the environment If the interaction technique has a xed position the user may not be close enough to it when he or she wants to interact with it or may be looking the wrong way The second 
issue is whether the interaction technique merges with the environment or appears as a separate entity in its own space In a 3D user interface the application tries to maintain an illusion of 3D so if the interaction technique appears in its own space this illusion could be broken On the other hand if the interaction technique is merged with the environment it could be hard for the user to spot it and select it for interaction The third issue is how does the user interact with the interaction technique Does the user point at the desired command grab the command or use some form of gesture The selection approach used will often depend upon the interaction style used in the user interface Object selection involves the selection of one of the objects in the environment In 3D user interfaces this interaction task is more complicated than in 2D user interfaces In 3D the object to be selected could be some distance from the user or be at least partially obscured by other objects in the environment That is in a 3D user interface the ability to select a particular object depends upon the user s current position and the other objects in the environment This usually isn t a consideration in 2D user interfaces There are two parts to the issue of distance between the user and the object to be selected The rst is whether the user can see the object to be selected since the ability to see an object often decreases with the distance to the object For example the object could be culled by a time critical rendering system or its screen area could be small enough to make identi cation dif cult Even if the user can easily detect the object to be selected selection at a distance can still be dif cult due to the small size of distant objects For example if the user must point at the object it can be quite dif cult to hit the desired object and not other ones that are close by Even if the object is close to the user it could be obscured by other objects in the environment This could take the form of being partially obscured so only a small part of the object is visible In this case it might be dif cult to accurately selected the desired object without selecting the object that is obscuring it If the object is completely obscured it will be very dif cult to select it by pointing at it The raises the question of why would the user want to select an object that isn t visible In 2D user interfaces this isn t an issue but in 3D user interfaces it can be quite important Since users is free to move through the environment they will be aware of objects that are not visible from their current position One obvious solution is to move to a position where the object is visible but this involves time and effort on the part of the user which may not produce an acceptable user interface A task related to object selection is point selection The purpose of this task is to select a 3D point This point could be used to de ne the position of a new object or be used to de ne part of its geometry In 2D this is quite a simple task the user moves the mouse to the desired point and then clicks a button This approach doesn t work well in 3D due to the problem of assigning the appropriate depth to the point In 2D its very easy to see where the point will appear by examining the position of the mouse cursor but in 3D this isn t as easy since the point s depth is quite often obvious Basically the user is trying to select a point out of thin air and their are not aids to show its relative position with respect to other objects in the scene 
The secret to this interaction task is providing the right type of feedback so the user knows exactly which point he or she is selecting Quite often cross hairs or shadows are used to provide this type of feedback Sometimes an orientation is also required which further complicates this interaction tasks One way of showing the orientation is to use a cursor with easily identi able x y and z axis The makes it easier to see which way the tracker is pointing This task can also be divided into two subtasks where the user selects the position rst and then selects the orientation Once the object has been selected there is a range of operations that can be performed on it The manipu lation operations can be divided into two groups The rst group geometric manipulations change the object s geometry These are the drag and stretch operations that are common in 2D user interfaces The second group nongeometrical manipulations deal with other objects properties such as colour mass and material that don t have an geometrical representation These properties are often controlled by property sheets in 2D user interfaces The geometric operations are often based on a direct manipulation style of interaction Common operations that can be performed in this way include moving the object changing its orientation and changing its shape or size All these operations rely on selecting on or more points on the object and dragging these point to a different Page 58 5 1 Basic Interaction Tasks 16th April 2002 51 Basic Interaction Tasks 59 position For example to move an object to a new position the user selects a point on the object and then moves this point to its new position This task suffers from some of the same complexities as the object selection task since the position the object to be moved to could be obscured or occupied by a different object In the case of the position already being occupied a decision must be made on whether the movement will be disallowed or whether other objects will be pushed out of the way by the motion Similarly the sizing operation in 3D can be more complex than its 2D version In 2D two main approaches have been taken to resizing objects One approach in to enclose the object in an axis aligned box and then drag the sides or corners of this box to change the object s size The other approach is to allow the user to directly drag the points that de ne the objects shape In the case of polygonal objects this involves dragging the object s vertices and in the case of curves dragging the control points In the case of 3D both of these approach are more dif cult since there are more degrees of freedom to deal with and the problem of the object obscuring part of the interaction In the case of the enclosing box technique the box now has 6 faces 12 edges and 8 points that can be dragged and some of these handles will be behind the object so the user won t be able to easily see and reach them Similarly for vertex based resizing there will be far more vertices to be moved and again the object will obscure some of them Many of the objects found in 3D user interfaces are more complex than the ones used in 2D ones Its not unusual to have objects with thousands or hundreds of thousands of vertices With these objects it is impossible to modify their geometry one vertex at a time Higher level interaction techniques are required that modify regions of the object so the user doesn t need to work at the vertex leve A related interaction technique is object examination With a 3D object only part of the object 
is visible at any point in time To completely examine the object the user most be able to move around it The object examination task allows the user to view the object from different points of view and then returns the object to its original location when the task is nished The user may also want to examine the interior structure of the object which may require the use of cutting planes or transparency Selecting the portion of the object to be viewed is an important part of this interaction task Nongeom etrical manipulations are slightly more dif cult since there isn t an obvious geometrical representa tion for the properties to be modi ed Property list as found in many 2D user interfaces could be used for this task but they may not mesh well with 3D objects in the user interface Another approach is to use a surrogate object that can represent the property in a geometrical form A good example of this is using colour cube to select the colour for an object The colour is represented by a position in 3D space which is then mapped onto the object s colour property For some properties it is quite easy to design a good surrogate object and for other is can be quite dif cult so this is a technique that won t always be useful There are a number of tasks that appear in 3D user interfaces that either don t appear in 2D user interfaces or are not a frequent component of these user interfaces The main task that falls into this group is navigation In a 3D user interface the user must be able to move through the 3D space to obtain different views of the environment or reach the objects that he or she wants to operate on While it can be argued that navigation also occurs in 2D user interfaces its role is not near as important There are a large number of 2D user interfaces that don t support navigation but very few VR applications would function without it The navigation task can be divided into two subtasks local navigation and global navigation In local navigation the location the user wants to move to is close by and the user can either see the location or can indicate the exact position in some other way In local navigation at least conceptually the user follows a path from the current position to the new position and passes through all the points along this path The purpose of local navigation includes viewing an object from an different perspective or moving to a currently visible object in order to perform some operation on it Global navigation deals with moving to positions that aren t currently visible or knowing ones position with the context of the application This aspect of navigation is closest to that found in 2D user interfaces For this type of movement the user must be provided with some way of determining the location to move to or conversely determining his or her current position There are several approaches to this problem including presenting the user with a map of the environment scaling the user so he or she can see a larger portion of the environment or allowing the user to wander through the environment and construct their own mental model of the environment Another task that isn t common in 2D user interfaces is object construction This task is used to construct a composite object from several simpler objects For example the user might want to combine a body head and lirnbs to forms an avatar for his or her character in a multiuser environment In order to do this the user must select the individual body parts for example from a menu and then assemble them The assembly process is 
where the problems start to appear. The different parts of the avatar must be connected so there are no gaps in the body, and at the same time they shouldn't penetrate each other. In the case of avatar construction this can be handled by a set of constraints that control how body parts are assembled. For example, the legs must be attached to the lower body, and these attachment points can be preprogrammed. This greatly simplifies the interaction, since the user only needs to drag the part to the general area where it must be attached. In the case of more general objects this approach may not work, since it may not be possible to determine all the possible combinations of parts when the user interface is designed. In this case a more general mechanism for easing the combination of parts is required. One way of doing this is to define a set of constraints for each edge and face of the objects and use these constraints to simplify the common alignments between objects. This is a task that hasn't been thoroughly explored, and there is plenty of room for new interaction techniques in this area.

In summary, the following basic interaction tasks have been identified:

1. List Selection
2. Object Selection
3. Point Selection
4. Object Manipulation
5. Navigation
6. Object Combination

The interaction techniques that can be used for these tasks are described starting at section 5.3.

5.2 The Nature of Hand Based Interaction

Most interaction in 3D user interfaces is performed using the hands in some way. This isn't always the case (for example, some navigation techniques use body motion), but it is frequent enough to warrant a special section on how the hand is used for interaction. A careful examination of hand interactions shows that the hand provides at least three types of information: position, direction and events. The position data is a 3D point in space that corresponds to some position on the hand. The exact position depends upon the type of input device used. In the case of gloves the position tracker is usually attached to the back of the wrist, while other interaction devices may be grasped by the hand, so their position corresponds to the center of the palm. Regardless of the exact position, this piece of information allows us to locate the hand within 3D space. The direction specifies the direction in which the hand is pointing. This information is important for a number of selection tasks where the user must point at the object to be selected. It is assumed that the direction provided by the interaction device is a unit vector pointing along the hand's direction. While tracking devices provide many different orientation formats, they can always be converted into a unit vector along the hand direction. Finally, most hand based interactions depend upon events generated by the hand. These events could be generated by buttons mounted on the tracking device or gestures recognized by a glove. Glove devices are capable of producing a wider range of information: they can measure some or all of the finger joints, and this information can be used in interaction techniques.

The position and orientation reported by the hand tracker is in the tracker coordinate system, and before it can be used this information must be converted to the environment coordinate system. The method for doing this is outlined in section 4.3. The method is basically the same as converting the head tracker coordinate system to the environment coordinate system. A transformation from tracker coordinates to room coordinates is first constructed, based on the position of the tracker's source. Then the transformation from room coordinates to environment coordinates is produced. This transformation is the same as the one used for converting head tracker data from room coordinates to environment coordinates. The matrices for these two transformations are multiplied together to produce the matrix for the complete coordinate transformation. This transformation is applied to both the hand tracker's position and orientation. The resulting position can now be directly used in the environment.

Most trackers produce either a quaternion or a matrix to represent the tracker's orientation. This information must be converted into a vector pointing in the direction of the user's hand. Once the tracker's orientation has been converted to environment coordinates, this is a fairly easy task. The tracker's sensor (the part attached to the user's hand) will have a coordinate system, and this coordinate system is documented in the manufacturer's literature. For example, in the case of Polhemus digitizers the x axis points in the opposite direction of the sensor's cord; that is, the sensor's cord exits the sensor along its negative x axis. Based on this, a vector pointing along the hand's direction can easily be computed. In terms of the sensor's coordinate system, the x axis points in the direction of the user's hand, and the transformed tracker orientation can be viewed as a rotation from the sensor's coordinate system to the environment coordinate system. Thus, to obtain a vector pointing along the hand direction, all that needs to be done is multiply a unit vector along the x direction by the sensor's orientation. Since this orientation is already represented by a quaternion or a rotation matrix, this is quite easy to do.
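The two steps just described can be sketched in a few lines. The fragment below is a minimal illustration, assuming column-vector 4x4 matrices, a unit quaternion for the sensor orientation already expressed in environment coordinates, and the Polhemus-style convention above in which the sensor's +x axis points along the hand; none of the types come from a real tracker library.

    // Sketch: converting hand tracker data into environment coordinates and
    // deriving a pointing direction. trackerToRoom and roomToEnvironment are
    // assumed to have been built as described in section 4.3.
    #include <cstdio>

    struct Vec3 { float x, y, z; };
    struct Quat { float w, x, y, z; };   // unit quaternion, w + xi + yj + zk
    struct Mat4 { float m[4][4]; };      // column-vector convention: p' = M * p

    Mat4 multiply(const Mat4& a, const Mat4& b)
    {
        Mat4 r{};
        for (int i = 0; i < 4; ++i)
            for (int j = 0; j < 4; ++j)
                for (int k = 0; k < 4; ++k)
                    r.m[i][j] += a.m[i][k] * b.m[k][j];
        return r;
    }

    Vec3 transformPoint(const Mat4& m, const Vec3& p)
    {
        return Vec3{ m.m[0][0]*p.x + m.m[0][1]*p.y + m.m[0][2]*p.z + m.m[0][3],
                     m.m[1][0]*p.x + m.m[1][1]*p.y + m.m[1][2]*p.z + m.m[1][3],
                     m.m[2][0]*p.x + m.m[2][1]*p.y + m.m[2][2]*p.z + m.m[2][3] };
    }

    // Rotate a vector by a unit quaternion: t = 2 (q_v x v), v' = v + w t + (q_v x t).
    Vec3 rotate(const Quat& q, const Vec3& v)
    {
        Vec3 u{ q.x, q.y, q.z };
        Vec3 t{ 2.0f*(u.y*v.z - u.z*v.y), 2.0f*(u.z*v.x - u.x*v.z), 2.0f*(u.x*v.y - u.y*v.x) };
        return Vec3{ v.x + q.w*t.x + (u.y*t.z - u.z*t.y),
                     v.y + q.w*t.y + (u.z*t.x - u.x*t.z),
                     v.z + q.w*t.z + (u.x*t.y - u.y*t.x) };
    }

    Mat4 identity()
    {
        Mat4 r{};
        for (int i = 0; i < 4; ++i) r.m[i][i] = 1.0f;
        return r;
    }

    int main()
    {
        // In a real system these matrices come from the calibration steps in
        // the text; identities are used here only so the sketch runs stand-alone.
        Mat4 trackerToRoom      = identity();
        Mat4 roomToEnvironment  = identity();
        Mat4 trackerToEnvironment = multiply(roomToEnvironment, trackerToRoom);

        Vec3 rawPosition{ 0.1f, 1.3f, -0.4f };        // position reported by the hand tracker
        Quat orientation{ 1.0f, 0.0f, 0.0f, 0.0f };   // sensor orientation in environment coordinates

        Vec3 handPosition  = transformPoint(trackerToEnvironment, rawPosition);
        Vec3 handDirection = rotate(orientation, Vec3{ 1.0f, 0.0f, 0.0f });  // sensor +x points along the hand

        std::printf("hand at (%.2f %.2f %.2f), pointing along (%.2f %.2f %.2f)\n",
                    handPosition.x, handPosition.y, handPosition.z,
                    handDirection.x, handDirection.y, handDirection.z);
        return 0;
    }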
The last piece of information that must be dealt with is the event information. If the hand tracker uses buttons to indicate events, this is no more complicated than dealing with mouse buttons: the button state is read periodically, and if the state has changed an event is generated. Figure 5.1 shows how buttons can be added to the sensor of an electromagnetic tracker. This particular button configuration allows the user to press any of the buttons without accidentally pressing one of the others. Multiple buttons can be pressed at the same time, in a chording fashion, to increase the number of events that can be generated.

Figure 5.1: Buttons mounted on the sensor of an electromagnetic tracker.
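The polling itself is straightforward; the fragment below is a minimal sketch of turning sampled button state into press and release events, with chords reported as a bitmask. The Event type, the eight-button limit and the bitmask layout are illustrative assumptions, not taken from any tracker's API.

    // Sketch: generating events from periodically sampled button state.
    #include <cstdio>

    enum class EventType { Press, Release };

    struct Event {
        EventType type;
        unsigned buttons;    // chord: bitmask of buttons held after the change
    };

    // Compare the current sample against the previous one and emit one event
    // per button whose state changed. Chords show up as a mask with more than
    // one bit set.
    int pollButtons(unsigned previous, unsigned current, Event out[], int maxEvents)
    {
        int n = 0;
        unsigned changed = previous ^ current;
        for (unsigned bit = 0; bit < 8 && n < maxEvents; ++bit) {
            unsigned mask = 1u << bit;
            if (changed & mask)
                out[n++] = Event{ (current & mask) ? EventType::Press : EventType::Release, current };
        }
        return n;
    }

    int main()
    {
        Event events[8];
        // Previous sample: button 0 held; current sample: buttons 0 and 2 held.
        int n = pollButtons(0x1u, 0x5u, events, 8);
        for (int i = 0; i < n; ++i)
            std::printf("%s, chord mask 0x%x\n",
                        events[i].type == EventType::Press ? "press" : "release",
                        events[i].buttons);
        return 0;
    }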
If a glove device is used, some form of gesture recognition must be used to convert the finger configuration into events. In order to do this, a clear definition of what a gesture is must be produced, and then algorithms that recognize gestures can be developed. Different people mean different things when they use the term gesture, and these different concepts must be clarified before the different gesture recognition algorithms can be compared.

5.2.1 Gesture Recognition

There are several different hand motions that must be considered in any gesture taxonomy. Two of the main dimensions to be considered are whether motion is considered to be part of the gesture, and how much of the hand is considered to be part of the gesture. A gesture can be viewed as either a static or a dynamic entity. In the case of a static gesture, which is often called a posture, the hand doesn't move while the gesture is made. That is, a fixed static hand configuration defines the gesture, and if the hand moves the gesture may be lost. In the case of a dynamic gesture, the gesture consists of a path through time. That is, the static configuration of the hand at any point in time doesn't define the gesture; how the hand moves through time is the important defining property. In most cases static gestures are easier to recognize, since they are based on a single snapshot of the hand configuration. The other main dimension is whether gestures are restricted to the user's fingers or whether the user's entire hand is considered to be part of the gesture. In the more general case, the position and orientation of the hand are also considered to be an important part of the gesture, and the same finger configuration can mean different things depending on the hand's current position. When these two dimensions are combined, the result is four different kinds of gestures, as shown in table 5.1. The term gesture is often used to mean any of the four entries in this table, and the more specific terms are used to identify that particular type of gesture.

              Fingers            Whole Hand
  Static      finger posture     posture
  Dynamic     finger gesture     hand gesture

Table 5.1: A simple gesture classification scheme.

The static gestures are the easiest ones to recognize, and the recognition difficulty increases as more information is added and the data becomes more dynamic. In the case of finger postures there are a number of simple recognition algorithms based on comparing the current joint angles to prestored joint limits. A finger posture gesture is defined by a particular configuration of the finger joint angles. For each joint a range of angles is defined; this can be done either in terms of a maximum and minimum angle, or a center joint angle and a spread. If each joint lies within the range defined for it in the gesture, then the gesture is recognized. The range of angles used to define the gesture must be large enough to accommodate the noise in the glove device and the differences between users.

The accuracy of the simple finger posture recognition algorithm can be improved by calibrating the glove device. The calibration process produces a mapping from the raw values provided by the device to the joint angles for an individual user. The calibrated joint angles are then used to define the gestures and as the basis of gesture recognition. In this way the gestures can be defined in a way that is independent of individual users. Simple gesture editing can be produced based on this simple gesture recognition technique. This approach to gesture editing is based on the user making the gesture a number of times and recording the joint angles each time the gesture is made. From the collected joint angles a minimum and maximum angle can be computed for each joint. The range can be expanded slightly to account for noise and individual differences, and then used to define the gesture.

This simple table based approach can be extended to hand postures in the following way. Hand posture can be divided into two parts: hand position and hand orientation. For hand position, the area around the user can be divided into a number of regions, and these regions are used to define the different hand postures. The hand regions move with the user and are not fixed within the environment. For example, one region could be around the user's head, another one at the user's side, and a third one around the shoulder, corresponding to the hand being in an outstretched position. The region that the hand lies in can be used as part of the hand posture's definition.
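The finger posture part of this table-based scheme is simple enough to sketch directly. The fragment below is a minimal illustration, assuming a glove that reports a fixed number of calibrated joint angles in degrees; NUM_JOINTS, the margin value and the sample format are arbitrary choices, not part of any particular glove's interface.

    // Sketch: table-based finger posture recognition and sample-based editing.
    #include <algorithm>
    #include <cstdio>
    #include <vector>

    const int NUM_JOINTS = 10;           // calibrated joint angles reported by the glove

    struct Posture {
        float minAngle[NUM_JOINTS];      // lower limit for each joint, in degrees
        float maxAngle[NUM_JOINTS];      // upper limit for each joint, in degrees
    };

    // A posture is recognized only if every joint lies within its range.
    bool matches(const float joints[NUM_JOINTS], const Posture& p)
    {
        for (int i = 0; i < NUM_JOINTS; ++i)
            if (joints[i] < p.minAngle[i] || joints[i] > p.maxAngle[i])
                return false;
        return true;
    }

    // "Gesture editing": build a posture from several recorded samples of the
    // user making the gesture, then widen the range slightly to absorb noise
    // and individual differences.
    Posture fromSamples(const std::vector<std::vector<float>>& samples, float margin)
    {
        Posture p;
        for (int i = 0; i < NUM_JOINTS; ++i) {
            float lo = samples[0][i], hi = samples[0][i];
            for (const auto& s : samples) {
                lo = std::min(lo, s[i]);
                hi = std::max(hi, s[i]);
            }
            p.minAngle[i] = lo - margin;
            p.maxAngle[i] = hi + margin;
        }
        return p;
    }

    int main()
    {
        // Two illustrative recordings of the same posture.
        std::vector<std::vector<float>> samples = {
            std::vector<float>(NUM_JOINTS, 5.0f),
            std::vector<float>(NUM_JOINTS, 9.0f) };
        Posture point = fromSamples(samples, 10.0f);

        float current[NUM_JOINTS];
        std::fill(current, current + NUM_JOINTS, 7.0f);   // angles read from the glove
        std::printf("posture %s\n", matches(current, point) ? "recognized" : "not recognized");
        return 0;
    }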
A hand orientation can be defined in terms of a reference orientation and a range around this orientation. The distance from the hand's current orientation to each of the reference orientations can be computed, and if one of these distances is within the orientation's range, that hand orientation is recognized. If both the reference orientations and the hand orientation are expressed as quaternions, quaternion distance can be used in this computation. A complete hand posture can then be defined in terms of a finger posture, a hand orientation and a hand position, and each of these can be recognized separately.

There are several problems with the table based approaches to gesture recognition. First, noise in the glove devices can make accurate recognition difficult. The joint angle ranges quite often need to be quite large to have reliable gesture recognition. Second, due to the large joint angle ranges there is a limited range of gestures that can be recognized using this approach. In practice only a dozen or so different gestures can be recognized in this way. Third, the table based approach can't easily be extended to dynamic gestures. In a dynamic gesture the joint angles, or hand position and orientation, change over time, and it is quite difficult to describe this with a static table.

Better gesture recognition techniques are based on using more sophisticated recognition algorithms. A number of pattern recognition techniques have been applied to the gesture recognition problem. One of the most popular ones is neural networks, and numerous researchers have applied this technique to gesture recognition. These approaches produce better recognition rates and slightly larger gesture vocabularies at the expense of extensive training time. For the best recognition rates each user needs to train the recognizer to the way they make the gestures. This can be a time consuming procedure, and it makes it difficult to try a system without investing a large amount of time in training its recognizer. This approach wouldn't be useful for occasional users who want to immediately interact with the application. For most general applications the extra expense of using these techniques doesn't seem to be justified by the marginally better recognition rates.

5.2.2 Optical Tracking of Hand Shape

Glove based input devices have a number of problems that make them less than ideal input devices. There are three main problems that contribute to their lack of popularity. First, the glove must be worn, which restricts the user's motion and produces one more cable that the user can trip over. In addition, as gloves are used they become dirty and tend to develop a smell; many users find that this makes the glove rather unpleasant to use. Since the mechanism used to measure finger movement is attached to the glove, it is usually impossible to wash them. If there are only a few users then this isn't a major problem, but it largely rules out the use of gloves for mass market entertainment. Second, all but the most expensive gloves have very poor accuracy. This makes it very difficult to develop good gesture recognition algorithms, as outlined in the previous section, which makes the glove difficult to use and restricts the range of applications. Third, gloves tend to be quite expensive. A good glove can cost over $10,000, easily making it one of the most expensive components of a VR system. Given the problems that gloves have, they don't seem to be a cost effective input device.

Most of the glove problems can be solved by using optical tracking. In
this approach the user doesn t wear anything one his or her hands and the computer system tracks the hand motion using video or some other type of optical technique The simplest optical trackers are based on a single video camera that is positioned to have the best view of the user s hand motions For example if the user is working at a table or work bench the camera is mounted above the user and points down towards the hands Each video frame captured by the cam era is analyzed to determine the current position of the arm and hand This can take the place of a 3D tracking system but with one camera the output will be a 2D position and orientation For many table top applications this may be suf cient The optical tracker can give the position of the hand over top of the table and this position can be used to determine the object currently underneath the hand Depending upon the cam era resolution it may be quite dif cult to recognize the nger and their orientation This suggests that optical tracker are better suited to larger scale motions that involve the hand and arm but limited nger motions Since the video camera captures a sequence of motions it can be easier to recognize dynamic gestures This could be done by the use of video differencing where the previous image is subtracted from the current image to produce a new image the records the change in hand position and orientation over a short period of time The use of a single video camera is a relatively cheap and robust way to track gross arm and hand motions It also has the advantage of being able to track multiple limbs without a major investment in extra hardware For example both hands can be tracked using the same video camera as long as they both stay in the camera s eld of view Similarly two or more people could interact in the same space without the need for extra cameras The only problem with multiple arms is identifying them There are several ways in which this can be done A pure software approach is based on the observation an arm only moves a small amount between captured images The velocity of the arm motion can be measured and used to predict its position in the next image this predicted position can then be used as the start of the search for the real arm position As long as the arms stay well separated the software should be able to track them and keep them separate When a new arm enters the environment it will be recognized as not belonging to any of the existing arms and at this point the system can query the users to identify who the new arm belongs to A hardware solution to the multiple arm problem is to colour code each of the arms so they can easily be recognized This can be done by attaching a small coloured patch to each arm or hand The main problem with the single camera approach is it only produces 2D data and has dif culty with ne nger motions Both of these problems can potentially be solved by introducing multiple cameras In the case of the table top interaction example 3D information can be obtained by mounted a second camera on the side of the table oriented in the horizontal direction This cam era can be used to determine the distance between the hand and arm and the table surface For more general environments where the hand motion isn t as restricted more than two cameras may be required to ensure that the hand is the view of at least two cameras As the user moves his or her body could obscure one or more camera s line of sight so we can t assume that all cameras will be able to 16th April 2002 522 Optical 
Tracking of Hand Shape Page 63 64 Chapter 5 Interaction In 3D see the hand at all times Similarly the problem of ne nger motion could be solved by using a camera that is focused on the ngers This will produce more information on the ngers that could produce a more accurate of their con guration In this case the camera must be able to track the hands motion to ensure that it remains in the center of its eld of view and occupying as much of it as possible The would probably require some mechanical mechanism to move the cam era or change its orientation The use of multiple cameras requires the integration of information from two or more sources There are several problems associated with this First the images could be taken at slightly different times resulting in measurements of the hand when it is in different positions If the hand is moving quickly even a small difference in time could introduce an unacceptable inaccuracy into the results Second there is the problem of calibration In order to combine the results of several cameras it is necessary to know their relative positions with respect to each other The combined information must be in a common coordination system so there must be an accurate way of converting the data from each camera into this common coordinate system This calibration could be done by exactly measuring the positions of each camera and then not move them for the rest of the interaction Alternatively some type of dynamic calibration based on a background pattern or image could be used to self calibrate the cam eras In either case the accuracy of the results will depend upon the accuracy of this calibration Note that this calibration must be more accurate than that used for multiple trackers since each tracker produces one consistent measurement of object position while each of the cameras produced only part of the information required to determine the obj ect s position Third there is the problem of determining corresponding points in two or more images The aim of these techniques is to produce the location of a point in space In order to do this they must be able to identify the same point in each image with a high degree of accuracy This problem isn t as severe with one camera since all the information required to produce a position comes from a single image If the point isn t detected accurately the results will still be consisted but possibly not the point that we wanted In summary optical tracking has a large of signi cant bene ts making it a very attractive technology Un fortunately there are a number of dif cult problems that must be solved before it can be used routinely in VR applications 523 Two Handed Interaction With two 3D trackers its natural to consider using both hands for 3D interaction The use of two hands is a relatively new idea in user interface design and very systems have seriously explored this interaction style This is probably due to the fact that most 2D user interfaces have a small number of input devices so there is no temptation to use more than one device at a time Most two handed interfaces are based ona assigning different roles to the two hands That is each hand is used to a different set of 3D interactions One way of doing this is based on the observation that most people favour one of their hands For example right handed people typically use their right hand for operations requiring ne motor control such as writing This implies that we should use the right hand for similar operations in 3D user interfaces and the left hand should be 
used for operations that don t require the same level of ne motor controlling Obviously this assignment of tasks to hands would be reversed for left handed people The suggestion has been made that the left hand should be used to set the context for the operation and the right should be used to perform the operation For example the left hand could be used to de ne a coordinate system or hold an object to be modi ed while the right hand is used to perform the operations This is illustrated in the systems presented in This style seems to work well for tasks that require the setting of context and the hand that performs that operation doesn t need to make many ne movements The tasks should be assigned to the hand so that the user only need to concentrate on one hand at a time otherwise there may need to be a considerable time investment in learning how to coordinate the motions of the two hands Another hand assignment which is found in nonimm ersive systems is based on using one hand for navigation and the other for object manipulation In this case the assignment of tasks to hands isn t as clear as in the previous arrangement and it may be the case that the task assignment is dynamic allowing the user to switch hands at any point in the interaction For nonimmersive VR systems there is still a considerable amount of work to be done on the optimal assignment of tasks to the hands Most of the interaction techniques described in the following sections are based on the use of a single hand thought there are some two handed techniques Any of the single handed techniques can be used in a two handed con guration and in this case there may be room to optimize the interaction Page 64 523 Two Handed Interaction 16th April 2002 524 Other 3D Input Devices 65 524 Other 3D Input Devices There are other 3D input devices that can be used for 3D interaction but they don t work as well as the devices described in the previous subsections A number of 3D joysticks have been produced but none of them provide uid 3D motion One approach to constructing a 3D joystick is to add a dial to the joystick that produces the third dimension This makes it quite dif cult to move in all three dimensions simultaneously Other approaches are based on giving the control stick more degrees of freedom This is fairly easy to do with a force sensitive joystick where the control stick doesn t need to move One of the problem with force sensitive techniques is that there can be cross talk between the dimensions When the user pushes the control stick in one direction it is easy to generate a force in one of the other directions or a torque that could change the orientation result In general it is quite dif cult to use these device to produce more than 2 or 3 degrees of freedom at a time A similar device is the ball based input devices These devices are based on a ball that is several inches in diameter and force sensors are used to detect how the user is pushing or rotating the ball Again these device suffer from a large amount of crosstalk and all 6 degrees or freedom are rarely used at the same time In addition the user doesn t get the same sense of space as he or she does with other 3D input devices With these devices the user is basically modifying the position or velocity of a 3D cursor instead of directly positioning it In other words the 3D position is produced in an indirect manner The properties of these devices often dictate a separate set of interaction techniques that overcome some of their limitations A number of the 
interaction techniques described in the following sections don t work particularly well with these devices and the application designer should be aware of this if he or she intends to use them in an application 53 List Selection The list selection task is characterized by selecting one item for a list of known items In many cases the items in this list are commands but the interaction techniques discussed in this section can be used for any nite list of items Examples of noncommand items lists are le names primitive geometrical shapes and object properties 531 General Design Considerations List Selection Any interaction technique for list selection consists of at least three components The rst component is some mechanism for informing the user of the selections that are available This usually takes the form of a display of the list of available items The second component is the mechanism used to select an item from the list The nal component is the feedback used to inform the user of the selection All three components must be considered in the design of an interaction technique for list selection One of t e important considerations in all 3D interaction techniques is how well they blend into the user s 3D environment This is a critical factor in immersive applications but must be considered in all 3D applications The basic issue relates to how the user perceives the interaction technique within the 3D space of the application The user s concentration is on the application and is thus centered on the application s 3D space If the interaction technique disrupts this concentration it will take the user some time to reassimilate the applications 3D space when the interaction is complete The time could vary from 01 second to several seconds depending upon the size of the disruption and the application s complexity If the interaction occurs infrequently such as loading and saving les this disruption won t have a major effect on user productivity In this case one of the 2D interaction techniques outlined at the end of this chapter can be used But if the action is a frequent one such as selecting the primitive shapes used in modeling operations the disruption could make the application unusable Blending the interaction technique into the application can be quite dif cult If the interaction technique is to appear in the application s coordinate space there are a number of design and implementation issues that must be dealt with First when the interaction technique appears it must be visible to the user and he or she must be able to interact with it This may be dif cult to do in a cluttered environment where there is a good chance that application objects could obscure the interaction technique if it is placed the ideal distance away from the user This problem can be partially solved by drawing the environment in wire frame or transparently In this way the interaction technique will show throuin any of the objects that might obscure it Second the interaction technique mustn t clash with the environment to the point where it disrupts the user s concentration That is its presentation style mustn t draw the user s attention to the point that he or she stops concentrating on the application and starts concentrating on its interaction style 16th April 2002 524 Other 3D Input Devices Page 65 66 Chapter 5 Interaction In 3D These design concerns raise a number of implementation concerns that aren t present in 2D user interfaces The most serious one is that the interaction techniques may need to be 
programmed in the application s coordinate space If the interaction technique is to be tightly blended with the application s space it may need to be imple mented in that space In 2D interaction technique programmers work in a coordinate space that is independent of the application so this hasn t been an issue until now In the case of 3D interaction techniques the presentation must be capable of transforming in the same way as the application graphics It must be capable of being translated scaled and rotated and will have the standard viewing and perspective transformations applied to it The important components of the presentation must remain legible through these transformations This argues for presentations that are simple and have a high geometrical content Another implementation and design concern is the layout of interaction techniques Most 2D user interfaces have a fairly static layout where most of the interaction techniques have a xed location on the screen that is determined at design time The only real exception to this rule is some types of popup menus and dialogue boxes but their positions are still largely predictable In the case of 3D interaction techniques layout is a more di lcult issue First the designer has to determine whether the interaction technique has a xed location in space or follows the user as he or she moves through the environment If an interaction technique is used with a particular object or in one part of the environment it may be a good idea to assign the interaction technique a xed location close to the objects it controls Since the interaction technique is only applicable in one part of the environment it makes little sense to have it move with the user to areas where it can t be used On the other hand there are other operations that are applicable anywhere in the environment or in at least several different places that are widely separated In this case assigning a xed location to the interaction technique may not be a good idea since the user will need to move to it each time interaction is required Common interaction techniques such as command menus are best to move with the user where they can be invoked easily regardless of the user s position within the environment In the case of xed location interaction techniques the layout problem involves placing the interaction tech nique in a location where the user can easily interact with it This can be viewed as an extension of object and environment design where the designer not only selects the objects and their positions but also the interaction techniques and their positions This could easily be done in an interactive environment editor The designer should consider how the user will typically be positioned within the current part of the environment and position and orient the interaction technique so it will be easy to interact with from this typical position In the case of interaction techniques that move with the user the layout problem can be more complex In this case the interaction techniques will usually be hidden until the user wants to interact with them If the interaction techniques are large and always displayed they will obscure the objects in the environment and possibly prevent the user from performing some interaction tasks This gives the designer three choices in the placement of interaction techniques when they aren t in use First the interaction may have a xed location with respect to the user but have no visible appearance when not being interacted with The user knows the location of 
the interaction technique and the action that must be performed to activate it Once activated the interaction technique is visible and the user can interact with it The main problem with this approach is the user has no feedback on the location of interaction techniques when they are inactive The user must rely on external documentation to locate them Second the interaction technique can be represented by a small icon within the user s eld of view In order to activate the interaction technique the user selects its icon This solves the problem of location feedback for inactive interaction techniques and if the icons aren t very large they won t occupy a signi cant amount of the user s eld of view The icons could still distract from the user s tasks but they aren t as disruptive as displaying the entire interaction technique Third the interaction technique could be placed in a position that is outside the user s normal view volume but can be accessed by a simple motion For example all the inactive interaction techniques could be arranged around the user s feet and the user only needs to look down in order to n them To activate an interaction technique the user can select it and it will move up to the user s normal view volume Again this solves the problem of having the inactive interaction techniques in front of the user s eyes and at the same time gives the user rapid access to them The main problems with this approach is laying out the inactive interaction technique so they don t obscure each other and any important objects in the environment 532 Interaction Techniques for List Selection This section examines some of the interaction techniques that have been developed for list selection These inter action techniques vary in the number of items they can handle and their degree of blending with the environment If the set of items is small the set of events generated by the input device can be used for command selection For example a glove input device can be used to select from a set of 8 or 10 commands using gesture recognition Page 66 532 Interaction Techniques for List Selection 16th April 2002 532 Interaction Techniques for List Selection 67 This approach has the advantage of not requiring any display space thus removing the problems of layout and the interaction technique being obscured by objects in the environment In return for this advantage this approach has a number of important disadvantages First the user has no feedback on the commands that are available and the actions that are required to select them Since there is no display associated with this interaction technique there is no way of informing the user within the environment of the commands that can be selected and how they can be selected The user must rely on external documentation to determine the set of gestures required to invoke different commands This places an extra cognitive load on the user that can only be justi ed in user interfaces with a very small number of commands 3 or less or where the user spends a considerable amount of time in the environment Second once the user has selected the comm and there may be no feedback on whether the selection was successful and which command was selected After making a gesture the user needs to be informed whether he or she has been successful which in the case of menus is usually done by highlighting or ashing the selected item With gestures there must be some external way of providing selection feedback Several approaches to providing feedback have been tried One approach 
is to change some aspect of the graphical representation of the hand This could involve changing the shape of the hand or adding a small object to the hand that identi es the command Nonvisual feedback can also be used such as associating a unique sound with each command and when this command is selected the sound is played The sound could be played once when the command is selected or continuously while the command is in effect In 2D menus are often used for list selection so it seems natural to extend them to 3D In order to do this some way of displaying the menu in 3D must be developed along with a way of selecting the displayed menu items One approach is to display the menu on a at surface such as a polygon and then use the hand to grab or point at a menu item This approach is similar to embedding a 2D display in the 3D environment and using standard 2D interaction techniques This approach is described further in the last section of this chapter but merits some discussion at this point The at panel display of a menu can work well in some applications but in general isn t the best approach This technique works well when the menu is attached to an object in the environment and one of the object s surfaces can be used for the menu In this way the menu becomes a natural part of the environment and blends into the environment If this isn t the case the menu display can be a major distraction since it oats in the environment obscuring other objects and possibly spoiling the illusion of 3D The menu display should blend into the environment but the panel menu takes the form of a large at object that pops up and down as the user interacts with it This is similar to a person shoving a piece of paper in your face while you are trying to do somet ing The panel menu does illustrate an important issue in 3D interaction technique design that is action at a distance versus direct interaction In the case of direct interaction the user must place his or her hand in the menu item or touch the menu item This implies that the menu is within the range of the user s hand and if it isn t the user must move to it in order to select an item In action at a distance the user points at the menu item to be selected In this case the user doesn t need to be close to the menu he or she just needs to be able to point at the menu Essentially a ray is extended from the user s hand to the menu and the item this ray intersects is the menu item that is selected Action at a distance allows the user to interact with menus that aren t within reach and gives the environment designer more freedom in menu placement within the environment The main drawback is that action at a distance usually isn t as accurate as direct interaction so the user could make more errors One approach to embedding the menu into the 3D space of the application is to give the menu a 3D shape and manipulate this shape in order to select menu items One way of doing this is shown in gure 52 where the primitive shapes in a geometrical modeler are distributed over the surface of a sphere This type of menu has been called a daisy menu The center of the sphere is attached to the sensor so as the user moves and twists the sensor the menu makes the corresponding movements In the way the user can rotate the menu to examine all the menu items Menu selection is perform ed by rotating the menu so the desired item appears in the selection cone The selection cone is always drawn facing the user so it moves with the menu but doesn t rotate when the menu rotates Thus by 
rotating the tracker in space, the user can rotate the desired menu item into the selection cone and then signal the selection. Either hand gestures or buttons can be used to control the daisy menu. If buttons are used, the menu appears when a button is pressed, and as long as the button remains pressed the user can freely rotate the menu. When the button is released, the item that is in the selection cone becomes the selected menu item. Essentially the same thing can be done with glove gestures: when the gesture is made the menu appears, and when the gesture is released the item in the selection cone is selected. Note that this mechanism doesn't provide a way of canceling the selection, which could be viewed as a drawback of this approach.

Figure 5.2 A daisy menu

While the daisy menu integrates nicely into the 3D environment, it has three main problems, most notably that selecting an item can require rotating the menu about more than one axis. Ideally, a 3D menu should only require motion about one axis for each menu selection, which suggests that some of the problems with the daisy menu can be solved by using a different menu layout. One such layout is the ring menu, which arranges the menu items in a circle around one of the tracker's axes, so a single rotation of the tracker brings the desired item into the selection position. A good example of this approach is the menus used in Deering's HoloSketch editor (Deering 1995).

Figure 5.3 A ring menu

Another layout, shown in figure 5.4, is the sundial menu.

Figure 5.4 A sundial menu

In a sundial menu the items are arranged around an axis that is parallel to the natural axis of the tracker. The tracker initially points at the center of the menu, and the user rotates or moves the tracker slightly so that it is pointing at the desired item or at a cancel operation. It is also possible to attach sub-menus to menu items, producing hierarchical menus, and the user can trace out a stroke in 3D to traverse the menu hierarchy. One way of reducing the distraction caused by these menus is to fade out the environment and fade in the menu when it is popped up; the opposite operation is performed after an item is selected. This smooths the transition between the task in the environment and the menu task, though for an experienced user the fading could be an annoyance.

5.4 Object Selection

The range of objects that can be selected by direct interaction is obviously a subset of the ones that can be viewed, since most, if not all, of the visible objects are ones the user is interested in.

5.4.1 General Design Considerations: Objects

One of the important design considerations for object selection techniques is the definition of the object to be selected. This reduces to the picking problem, which has been a long standing problem in interactive computer graphics. When the user selects an object, he or she usually selects a point or small region on the object. There is some ambiguity in this selection, in the sense that without additional information we don't know what the user is trying to select. The user may want to select the complete object that the point belongs to, or the intention could be to select just the geometrical primitive that the point lies on. If the object has a hierarchical structure, the user could be trying to select any of the subparts that the point lies on. Alternatively, if the selected point lies near a vertex, the user could be trying to select the vertex.
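The approaches to resolving this ambiguity are discussed next; one way to support all of them with a single picking routine is to have the pick return the whole interpretation, from the complete object down to the nearest vertex, and let the dialog context choose the level it needs. The sketch below shows one way this might look in C; the SceneNode type, the fixed path depth, and the pick_resolve helper are illustrative names invented for this example rather than part of any particular toolkit.

```c
#include <stddef.h>

#define MAX_PICK_DEPTH 16

typedef struct SceneNode SceneNode;   /* hypothetical scene-graph node type */

/* Everything a single pick could mean, from coarsest to finest, so the
 * dialog context can decide later which interpretation it wants. */
typedef struct {
    SceneNode *path[MAX_PICK_DEPTH];  /* path[0] = whole object, then subparts */
    size_t     depth;                 /* number of valid entries in path[]     */
    SceneNode *primitive;             /* geometric primitive the point lies on */
    int        nearest_vertex;        /* index of the closest vertex, or -1    */
    float      vertex_distance;       /* distance from the pick point to it    */
} PickResult;

typedef enum { PICK_OBJECT, PICK_SUBPART, PICK_PRIMITIVE } PickLevel;

/* Resolve a pick at the level the current dialog state expects; vertex-level
 * selection reads nearest_vertex from the result directly.  Returning NULL
 * lets the caller fall back to cycling through the other interpretations. */
static SceneNode *pick_resolve(const PickResult *r, PickLevel level)
{
    switch (level) {
    case PICK_OBJECT:    return r->depth > 0 ? r->path[0] : NULL;
    case PICK_SUBPART:   return r->depth > 1 ? r->path[r->depth - 1] : NULL;
    case PICK_PRIMITIVE: return r->primitive;
    }
    return NULL;
}
```

Because every level of the hierarchy is recorded, both the context-based approach and the cycling approach described below reduce to simple lookups into the same structure.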
There are several ways in which this ambiguity can be removed. The most common way is to use the context of the interaction. If, at the current point in the dialog, the user can only select objects, then the selection must be the object, and none of the other alternatives is valid at this point. In most cases the dialog can be structured to avoid selection ambiguity, but this isn't always the case. To see why, consider an interactive editor for hierarchical objects. In this editor a move command could be used to move the entire object, or it could be used to move one subpart of the object with respect to the rest of the object. When a selection is performed, it may not be obvious which of these two operations the user intends. There are several ways out of this problem. One is to have a different selection command for each level of the modeling hierarchy. This places an extra burden on the user, since the level of the modeling hierarchy must be determined before the selection operation can be performed. Another approach is to cycle through the possible selections after a point on the object has been selected. In this approach the complete object may be highlighted first and the user prompted to confirm the selection. If the selection isn't confirmed, the next possible selection is highlighted (one of the object's subparts) and the user is again asked for confirmation. This process continues until the user selects a subpart or the list of subparts is exhausted.

Another design consideration in object selection techniques is whether selections only occur on the surface of the object, or whether the user can also select parts of the object's internal structure. In some applications it doesn't make much sense to select the interior structure of the object, but in other applications, such as computer-aided design and medicine, it may be desirable to select features inside of the object. This adds an extra level of complexity to the selection technique and the problem of ambiguity. First, the user must be able to see inside of the object in order to make selections inside of it. This can be done in several ways. One is to make the object partially transparent, so both the internal and external structure of the object are visible at the same time. Another approach is to use cutting planes to cut away the external structure of the object and make its internal structure visible. Second, if both the internal and external structure of the object are visible, how do we know which the user is trying to select? If the user points at the object, the line from the user's hand that intersects the object will intersect both the object's surface and its internal structure. The techniques used to reduce ambiguity outlined above can also be used in this case.

5.4.2 Interaction Techniques for Selection

The interaction techniques for selection can be divided into two groups, depending upon whether direct interaction or action at a distance is used. In the case of direct interaction, the user places his or her hand in the space occupied by the object and indicates the selection either using a glove gesture or by pressing a button. This is a very natural interaction technique that closely mimics the actions people perform when grabbing objects in the real world. Due to its naturalness it's quite easy to learn this interaction technique, and users have little trouble performing it.
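As a concrete illustration, direct grabbing can be reduced to a containment test between the tracked hand position and the space each object occupies. The following sketch uses hypothetical GrabVolume and grab_select names, and uses axis-aligned bounding boxes as a stand-in for that space; a real system might test against tighter geometry.

```c
#include <stddef.h>

typedef struct { float x, y, z; } Vec3;

/* Axis-aligned bounding box used as a cheap stand-in for the space an
 * object occupies for grabbing purposes. */
typedef struct { Vec3 min, max; int id; } GrabVolume;

static int point_in_box(Vec3 p, const GrabVolume *b)
{
    return p.x >= b->min.x && p.x <= b->max.x &&
           p.y >= b->min.y && p.y <= b->max.y &&
           p.z >= b->min.z && p.z <= b->max.z;
}

/* Direct grabbing: when the "grab" event arrives (glove gesture or button),
 * select the first object whose volume contains the hand position.
 * Returns the object's id, or -1 if the hand is in empty space. */
static int grab_select(Vec3 hand_pos, int grab_event,
                       const GrabVolume *objects, size_t count)
{
    if (!grab_event)
        return -1;               /* no selection until the user signals one */
    for (size_t i = 0; i < count; i++)
        if (point_in_box(hand_pos, &objects[i]))
            return objects[i].id;
    return -1;
}
```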
The grabbing interaction technique has a number of disadvantages. First, like all direct interaction techniques, the user can only interact with objects that are within arm's reach. If an object isn't close to the user, the user must first move to the object before it can be selected. Second, the user's hand tends to hide the object that he or she is trying to select. As the user moves their hand towards the object, the representation of the hand drawn in the environment will obscure all or part of the object to be selected. If the object is small, the user may have trouble finding the object so it can be selected. Third, this selection technique is not very accurate, in the sense that the hand can be large in comparison to the object or feature to be selected. This selection technique really doesn't produce a point, but rather a volume corresponding to the volume occupied by the hand. In the case of large objects this usually isn't a problem, but with small objects, or when the user attempts to select smaller features, this can be a significant problem.

One way of solving the problem of grabbing objects that are out of reach is to scale the environment so the user can easily grab them. One technique that uses this approach is Worlds in Miniature (WIM). In this approach the user holds a scaled down copy of the environment in one hand and uses the other hand to manipulate it. The hand that is holding the environment can be used to position and orient the environment so the object is easy to select; it serves as the frame of reference for the selection task. The other hand is used for the fine scale manipulation of the object. The user can see both the scaled version and the original version of the environment at the same time, so it is relatively easy to see whether the correct object has been selected.

While the WIM approach solves some of the major problems with object grabbing, it also introduces two new problems. First, it only works with environments that are of a limited size. In order for WIM to work, it must be possible to scale the environment so it fits within the hand, and at the same time the objects of interest must not be scaled to a size where they can no longer be seen or selected. The ideal environment size seems to be on the order of a typical room. If the environment is much larger, the scaled down version will be too hard to work with. Second, in the scaled down version the hand or cursor used to grab the object will be large compared to the objects being selected. This could cause problems with accurately selecting objects, since the hand could cover several objects, and the hand could obscure important parts of the environment, making it hard to find objects. For small objects, a level of fine motor control that may be hard to achieve with current tracker technology may be required.

Most of the action at a distance interaction techniques are based on some form of pointing. The simplest approach to pointing is based on shooting a ray from the user's hand into the environment. The start point of this ray is the hand's current position, and its direction is given by the direction in which the hand is pointing. The first object intersected by this ray is the selected object. The techniques that have been developed for ray tracing can be used to efficiently implement this interaction technique. This includes the techniques that have been developed for intersecting rays with objects and the techniques for improving the efficiency of ray tracing.

There are several advantages to the ray based approach to selection. First, like the grabbing technique, this technique is familiar to most users and they
can quickly learn how to use it. Second, since its implementation can be based on ray tracing, this interaction technique can easily be implemented by taking advantage of existing ray tracing code. This greatly simplifies the design and implementation process. Third, the user can select any object that is visible from the hand position, regardless of its distance from the hand. This can be a considerable time saving if the user must interact with objects that are distributed over a large environment.

The main disadvantages of this selection technique are the result of the ray having an infinitely small cross section. That is, the ray is viewed as a line which must exactly intersect an object in order to select it. This isn't a problem if the object is large and close to the user, since it will cover a large number of pixels and it will be easy to point at. On the other hand, for small or distant objects, their display will only cover a few pixels on the screen, and the user could have problems exactly hitting these pixels. In other words, the difficulty of selecting an object depends upon the number of pixels it covers on the screen: the larger this area, the easier it is to select the object. Tracker noise can further complicate the selection process. This noise mainly affects the direction of the ray used to select objects. If the object is close to the user, this noise doesn't cause a major problem, since the displacement caused by the noise is quite small. But if the object is at a considerable distance, the displacement can be quite large, causing the user to completely miss the object no matter how hard he or she tries to select it. This problem is illustrated in figure 5.5. In this figure the solid line shows the ray required to select the two objects in the figure. Tracker noise causes the ray direction to be displaced by dθ, producing the ray represented by the dashed line. In the case of the closer object the displacement caused by this noise isn't large enough to cause a selection problem, but for the further object the displacement is far too large. If the tracker noise causes a difference of dθ in tracker orientation, and the object to be selected is a distance D away from the tracker, then the error displacement at the object, E, is given by

E = D sin(dθ)

Thus the size of the error displacement is proportional to both the distance of the object from the tracker and the magnitude of the tracker noise.

Figure 5.5 Selection errors caused by tracker noise

Figure 5.5 suggests one way of solving the problems with the ray based selection techniques. This solution is based on replacing the ray by a cone, with the radius of the cone determined by dθ, the magnitude of the tracker noise. Any object that falls within this cone is selected. With this approach the cone masks the tracker noise, since any displacement caused by tracker noise won't cause the object to move outside of the selection zone defined by the cone. The use of the cone also makes it easier to select distant and small objects. These objects cover a small number of pixels on the screen, but since the cone defines a selection region instead of a single point, the user doesn't need to be as accurate in positioning the cone.

The use of a cone introduces a new problem. The cone will potentially intersect many objects, but the user only wants one of these objects. The problem is determining the object that the user intended to select. In the case of ray based selection the ray could intersect multiple objects, but this problem could be solved by always using the closest object.
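A minimal version of this ray-based selection can be written directly against bounding volumes. The sketch below assumes each object is approximated by a bounding sphere, that hand_dir has already been normalized, and that the Vec3 and Object types are defined for this example only; an implementation built on an existing ray tracer would substitute its own intersection routines.

```c
#include <math.h>
#include <stddef.h>

typedef struct { float x, y, z; } Vec3;

/* An object approximated by a bounding sphere for selection purposes. */
typedef struct {
    Vec3  center;
    float radius;
    int   id;
} Object;

static float dot(Vec3 a, Vec3 b) { return a.x*b.x + a.y*b.y + a.z*b.z; }

/* Distance along the ray (origin o, unit direction d) to the sphere
 * (center c, radius r), or -1.0f if the ray misses the sphere. */
static float ray_sphere(Vec3 o, Vec3 d, Vec3 c, float r)
{
    Vec3  oc = { c.x - o.x, c.y - o.y, c.z - o.z };
    float t  = dot(oc, d);               /* projection of the center on the ray */
    float m2 = dot(oc, oc) - t * t;      /* squared distance from ray to center */
    if (m2 > r * r) return -1.0f;
    float dt = sqrtf(r * r - m2);
    if (t - dt >= 0.0f) return t - dt;   /* nearest hit in front of the origin  */
    if (t + dt >= 0.0f) return t + dt;   /* origin is inside the sphere         */
    return -1.0f;
}

/* Ray-based selection: shoot a ray from the hand position along the hand's
 * pointing direction and return the closest object it hits, or NULL. */
static const Object *select_by_ray(Vec3 hand_pos, Vec3 hand_dir,
                                   const Object *objects, size_t count)
{
    const Object *best  = NULL;
    float         bestt = INFINITY;
    for (size_t i = 0; i < count; i++) {
        float t = ray_sphere(hand_pos, hand_dir, objects[i].center,
                             objects[i].radius);
        if (t >= 0.0f && t < bestt) { bestt = t; best = &objects[i]; }
    }
    return best;
}
```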
In the case of cone based selection this technique can't be used, since there might not be a unique closest object. In other words, the cone could intersect several objects that are approximately the same distance from the tracker. In this case distance can't be used to select the desired object. For cone based selection a better selection metric is required. This selection metric is based on two factors: the distance from the tracker to the object, and the distance of the object from the center line of the cone. The first factor takes into account the fact that the user is usually interested in the first object intersecting the cone. The second factor accounts for the fact that an object that is close to the center of the cone is more likely to be selected than one that is at the edge of the cone. The function used to combine these two factors depends upon the application and the emphasis that is placed on the closeness of the object versus the extent of its intersection with the cone.

There are many metrics that can be used for cone based selection; we have used one that is easy to compute and produces good results. This metric is based on a point A that lies on the object and is close to the center line of the cone. This point can be determined by examining all of the vertices of the object and selecting the one that is closest to the center line. The point A is transformed into the coordinate system of the tracker to produce a new point B = (x, y, z). This transformation is constructed by translating the tracker position to the origin and then rotating the coordinate system so the x-axis points along the tracker's orientation. The following equation is used to compute the goodness measure G for our metric:

G = |B| (1 + K (β / γ))

where γ = sin(α), α is the spread angle of the cone, β = (y² + z²)^(1/2), and K is a constant that controls the relative weight of the two factors. A value of G is computed for each object that intersects the cone, and the object with the smallest value is used as the selection. As can be seen, this metric places higher weight on those points that are closer to the origin of the ray and closer to the center line of the cone. This metric can be viewed as defining a set of isosurfaces, with the points on each isosurface having an equal probability of selection. A two dimensional projection of these isosurfaces is shown in figure 5.6. From this figure, point P would be selected over point Q, since it is quite close to the center line, even though point Q is slightly closer to the cone origin.

Figure 5.6 Isodistance surfaces for the spotlight selection technique

Visual feedback for this interaction technique is easy on any workstation that supports lighting. A cone can be drawn to represent the selection area, and it's a good idea to use some form of transparency when drawing it so it doesn't obscure objects in the environment. In addition, a spot light can be placed at the tracker position with an angle equal to the size of the cone. This spot light will illuminate all the objects that fall within the cone's selection region. If a distinct colour is used for the spot light, the user can easily determine the objects that are likely to be selected, and this doesn't require any computation on the part of the application.

There are numerous refinements that can be made to this object selection technique. One is to use different selection metrics that are better suited to the current application. Since the selection metric is only used to rank the objects that fall within the selection cone, it can be made a parameter to the interaction technique and varied from one application to another.
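Packaged as a parameter in this way, the ranking step might look like the following sketch. It implements the goodness measure in the form given above, which is one reasonable reading of the metric; the Candidate type, the constant K, and the assumption that each candidate's vertices have already been transformed into tracker coordinates and pre-filtered against the cone are all choices made for this example.

```c
#include <math.h>
#include <stddef.h>

typedef struct { float x, y, z; } Vec3;

/* A candidate object that has already been found to intersect the cone,
 * with its vertices expressed in tracker coordinates (origin at the
 * tracker, x-axis along the tracker's orientation). */
typedef struct {
    const Vec3 *verts;
    size_t      nverts;
    int         id;
} Candidate;

/* Goodness measure for one object: find the vertex closest to the cone's
 * center line (the x-axis), then compute G = |B| * (1 + K * (beta / gamma)),
 * where beta is that vertex's distance from the axis and gamma = sin(alpha).
 * Smaller G is better. */
static float goodness(const Candidate *obj, float gamma, float K)
{
    if (obj->nverts == 0) return INFINITY;
    Vec3  b = obj->verts[0];
    float best_beta = sqrtf(b.y * b.y + b.z * b.z);
    for (size_t i = 1; i < obj->nverts; i++) {
        Vec3  v    = obj->verts[i];
        float beta = sqrtf(v.y * v.y + v.z * v.z);
        if (beta < best_beta) { best_beta = beta; b = v; }
    }
    float dist = sqrtf(b.x * b.x + b.y * b.y + b.z * b.z);   /* |B| */
    return dist * (1.0f + K * (best_beta / gamma));
}

/* Rank the candidates and return the id of the one with the smallest
 * goodness value, or -1 if there are no candidates.  alpha must be > 0. */
static int spotlight_select(const Candidate *cands, size_t count,
                            float alpha, float K)
{
    float gamma   = sinf(alpha);
    int   best_id = -1;
    float best_g  = INFINITY;
    for (size_t i = 0; i < count; i++) {
        float g = goodness(&cands[i], gamma, K);
        if (g < best_g) { best_g = g; best_id = cands[i].id; }
    }
    return best_id;
}
```

Swapping in a different metric only requires replacing the goodness function, which is what makes it a convenient per-application parameter.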
Other modifications are based on reducing the number of physical actions that must be performed to select an object. There are several ways in which this can be done. One is to assume that the user is already looking at the object to be selected. The initial axis of the cone is then aligned with the user's line of sight, so minimal motion is required to select any object that is likely to be within the user's focus. The initial axis could also be set to lie along the ray from the user's eye through the current tracker position. This will produce an initial set of selected objects that are underneath the tracker cursor, in the same way that mouse clicks are used in 2D user interfaces. These different variations must be tried in individual applications to determine the one that works best in each case.

5.5 Point Selection

The point selection task is used to select a point that isn't associated with an object in the environment. That is, it selects a point that is located in empty space. This point could be used as the location of a new object, or as one of the control points that define an object's shape. It could also be used as the target position in a navigation operation. The main problem that must be solved by all the techniques in this section is clearly identifying the point to be selected and providing the feedback required to confirm the correct choice of location.

5.5.1 General Design Considerations: Points

Since the user is selecting a point in empty space, feedback is the most important consideration in point selection interaction techniques. The point doesn't lie on a landmark that makes it easy to determine if the user has selected the desired position. Due to the perspective projection used in most virtual environments, the visual display may not be a good indication of the point's location unless artifacts are added that give the user more information.

There are several artifacts that can assist with point selection. One of the more obvious ones is the use of cross hairs. Cross hairs are lines that are drawn through the point, parallel to the coordinate axes in eye coordinates. Conceptually these lines extend infinitely in both directions, so the user can determine the relative position of the point with respect to other objects in the environment. For example, to determine the depth of a point the user can follow the cross hairs that are parallel to the projection plane to see which objects they intersect. The point will be at the same depth as the objects the cross hairs intersect. This is a relatively easy technique to implement, requires little computation and display time, and adds little clutter to the display. One of the main problems with this approach is that it may be difficult to find the cross hairs in a crowded scene.

A technique related to cross hairs is shadows. For this technique to work, the user must be working in an environment that has walls that are easily visible. Ideally the walls of the environment are aligned with the coordinate axes, but this isn't necessary. Instead of drawing cross hairs, the point casts shadows on the environment's walls. These shadows give an indication of where the point is located. In addition, some or all of the objects in the environment can also cast shadows to give the user an indication of the relative position of the point. In many ways shadows give better visual feedback than cross hairs, since the user has only a small number of places to look for the feedback. In the
case of cross hairs the user must follow the lines through the environment while with shadows only the walls need to be examined There are several problems with the use of shadows First the environment must have walls to project the shadows onto which may not always be the case Second with a large number of object casting shadows it may be quite dif cult to determine the point s position and how it interacts with the other objects With only a few objects its easy to determine each obj ect s shadow but as the number of objects increase this rapidly becomes a very dif cult task Third producing the shadows requires more computation than the cross airs Another approach to feedback is to display an object at the point s location For example if point selection is used as part of a navigation task a simple representation of a person could be drawn as a cursor This larger object gives a better feel for the point s location particularly its distance from the user Similarly if point selection is used object location use the object to be positioned instead of a cursor This type of feedback requires extra display time but gives the user a better feel for the location being selected A nal feedback technique is to print the coordinates of the point next to the cursor At rst this may not seem to be a very good technique but it is useful when the user has some idea of the coordinates of other objects in the environment For example in building design the user has a good idea of the location of major landmarks in the building Displaying the cursor s coordinates well inform the user whether the point is in the correct general area 16th April 2002 55 Point Selection Page 73 74 Chapter 5 Interaction In 3D Several of the feedback techniques can be used together to provide a complete solution In this way the strong points of one techniques can be used to overcome the weak points of another 552 Interaction Techniques for Point Selection The interaction techniques for point selection can be divided into two groups depending on whether they use direction interaction or action at a distance The direct interaction techniques are based on directly the positioning the cursor using the hand at the point to be selected and then indicating the selection with an event The main issue with these techniques is preventing the hand from blocking the part of the environment where the desired position is located One way of handling this is to use a small cursor to echo the hands position instead of using some representation of the hand itself The one main advantage of direct interaction in this case is that the user has a much better feel for the 3D space since his or her hand motions directly correspond to motions in the 3D space The user can move the cursor to some of the objects in the environment to get a feels for its relative size and determine the approximate are where the point should be located This intuitive feel for the 3D space is missing in the action at a distance techniques The main drawback of this approach is that the user can only select points that are within reach Again this can be solved by scaling the environment so the desired point is within reach but the user will need to have control over this scaling operation if the environment is quite large In this case it might be better to navigate to the neighbourhood of the desired point Scaling the environment could negate some of the advantage of the intuitive feel for locations within the environment Action at a distance techniques for point selection are more 
dif cult The main dif culty is determining the distance between the user and the selected point With a ray casting approach its easy to specify the direction to the desired point but how can be specify the distance along this ray One way of doing this is to display a cursor at some point along the ray and use two buttons to move it along the ray One button is used to move the cursor further along the ray and the other is use to move it back towards the user A third button can be used to signal that the cursor is at the desired position This technique works reasonably well if the cursor is the object to be positioned or a representation of the user in the case of navigation tasks In general action at a distance doesn t work as well for this task as direct interaction 56 Object Manipulation The previous interaction tasks are largely application independent but this isn t the case with many object ma nipulation operations There are object manipulations that are application independent such as positioning and stretching objects but there are also many operations that are application dependent such as special purpose shap ing operators for a speci c type of application object This section concentrates on the manipulation operations that are application independent but does provide advice on constructing interaction techniques for application speci c operations An object manipulation operation is an operation that changes one or more properties of an object These oper ations assume that the object has already been selected and the interaction technique must specify the operation to be performed and any parameters to this modi cation As outlined in section refBasic Interaction Tasks these op erations can be divided into two main groups geometrical 391 39 and n n m t39ic 391 39 This division is based on the fact that it is much easier to represent changes to an object s shape than it other properties This can be seen in 2D user interface where the size or shape of an object can easily be changed by dragging but other properties must be changed through interactions with menus or property sheets Because of this difference between geometrical and nongeom etrical manipulations the interaction techniques for them are described in two different subsections 561 General Design Considerations Object Manipulation One of the main problems with designing object manipulation interaction techniques is the large number of degrees of freedom involved In 2D it is fairly easy to resize an object by enclosing it in a bounding box and then dragging the corners or sides of this box When this approach is transferred to 3D the operations becomes much more complicated The 3D equivalent of a 2D box has 8 vertices 12 edges and 6 faces Handles need to be placed on each of these features to show the user that they can be dragged This adds a considerable amount of clutter to the Page 74 552 Interaction Techniques for Point Selection 16th April 2002 562 Interaction Techniques for Geometricalquot 391 39 75 display and assumes that the object occupies enough screen space so that the handles can easily be selected Even worse at any point in time approximately half of the handles will be obscured by the object itself The means that the user would need to rotate the object in order to access some of the handles required to resize the object The problem becomes even more complex if the user wants to change the object by dragging the vertices that de ne it Even simple objects have on the order of hundreds of vertices so this approach 
could easily lead to a high clutter level and considerable problems in selecting vertices.

There is also a wider range of operations that can be applied to objects, and these operations have more degrees of freedom. Consider the case of modifying the shape of an object by moving one of its vertices. In 2D this is relatively easy, since there are only two degrees of freedom and shapes tend to be fairly simple. In the case of 3D there is the problem of moving in three directions, and this motion must leave the object in a consistent state. In the case of a polygon with more than 3 vertices, it's possible to drag a vertex so it is no longer on the polygon's plane. What happens in this situation? One approach is to project the new position back onto the plane and use it as the point's new position. This is a reasonable approach, since most users will find it very difficult to move a vertex in a plane, so constraining its new position to lie on the plane seems like a good way of assisting the user. On the other hand, the user may want to pull the vertex off the plane and have several new polygons constructed. Both of these are valid possibilities, and both may be required in a particular application. These could be viewed as two different operations, and the user may need to specify the intended operation.

These problems force the designer to think in terms of higher level operations. Since changing the shape of an object one vertex at a time involves a large amount of interaction, we can't reasonably expect a user to interact in that way. We must provide operations that allow the user to operate at a higher level. This can be called the degrees of freedom problem in 3D user interfaces: taking the obvious approach to user interaction produces a system that is unusable, due to the large number of operations that must be performed to accomplish many tasks. This problem can only be solved by replacing these small interactions with more global ones that operate on many parts of the object simultaneously. Designing these global operations is one of the main challenges of 3D user interface design.

5.6.2 Interaction Techniques for Geometrical Manipulations

The most basic geometric operation doesn't change the geometry of the object, but moves it to another position or orientation in the environment. The simplest approach to performing this task is to grab the object and move or rotate it to the desired position or orientation. But even with this simple operation the degrees of freedom problem strikes. Most 3D trackers provide 6 degrees of freedom: three for translation and three for rotation. If we use all 6 of these degrees of freedom, the final object state may not be what the user wanted. For example, the user may only want to translate the object and not rotate it. Without constraining the data coming from the tracker, a rotation will likely be performed as well, since most users can't move the tracker without rotating it a small amount. Similarly, the user may want to rotate the object, but again a translation is likely to occur as well. In other cases the user may only want to move the object in a plane or along one axis. In all of these operations we want to reduce the degrees of freedom provided by the tracker: we don't want to use all the information it provides, just the subset that accomplishes the user's task.

How can we package these ideas into an interaction technique? Restricting motion to only rotation or translation is fairly easy. We can use a button or similar event to signal that a restricted motion is to be performed. For example, one button can be used to indicate translation only and another to indicate rotation only.
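A sketch of how the tracker data can be reduced to the subset of degrees of freedom the user wants is shown below. The Pose and MotionMode types and the constrain_motion function are names invented for this example; the plane and axis modes anticipate the constraint sources discussed next, where the constraining plane or axis is usually taken from another object in the environment.

```c
#include <stddef.h>

typedef struct { float x, y, z; } Vec3;

/* One 6-DOF tracker sample: a position plus an orientation (stored here as
 * a quaternion).  The quaternion is only copied or kept, never interpreted,
 * so its exact convention doesn't matter for this sketch. */
typedef struct { Vec3 pos; float quat[4]; } Pose;

typedef enum {
    MOTION_FREE,            /* use all six degrees of freedom             */
    MOTION_TRANSLATE_ONLY,  /* ignore the rotational part of the motion   */
    MOTION_ROTATE_ONLY,     /* ignore the translational part              */
    MOTION_PLANE,           /* translate only within the constraint plane */
    MOTION_AXIS             /* translate only along the constraint axis   */
} MotionMode;

static float dot(Vec3 a, Vec3 b) { return a.x*b.x + a.y*b.y + a.z*b.z; }

/* Reduce the raw motion between the previously applied pose and the new
 * tracker sample to the degrees of freedom selected by 'mode'.  'n' is the
 * unit normal of the constraint plane or the unit direction of the
 * constraint axis; for the other modes it is ignored. */
static Pose constrain_motion(Pose prev, Pose curr, MotionMode mode, Vec3 n)
{
    Pose  out = curr;
    Vec3  d   = { curr.pos.x - prev.pos.x,
                  curr.pos.y - prev.pos.y,
                  curr.pos.z - prev.pos.z };     /* raw translation this step */
    float along = dot(d, n);                     /* component along n         */

    if (mode == MOTION_ROTATE_ONLY) {
        out.pos = prev.pos;                      /* no translation at all     */
    } else if (mode == MOTION_PLANE) {
        out.pos.x = prev.pos.x + d.x - along * n.x;  /* drop normal component */
        out.pos.y = prev.pos.y + d.y - along * n.y;
        out.pos.z = prev.pos.z + d.z - along * n.z;
    } else if (mode == MOTION_AXIS) {
        out.pos.x = prev.pos.x + along * n.x;        /* keep only that axis   */
        out.pos.y = prev.pos.y + along * n.y;
        out.pos.z = prev.pos.z + along * n.z;
    }

    if (mode != MOTION_FREE && mode != MOTION_ROTATE_ONLY) {
        for (int i = 0; i < 4; i++)
            out.quat[i] = prev.quat[i];          /* suppress the rotation     */
    }
    return out;
}
```

Calling this filter once per tracker sample, with prev set to the pose actually applied on the previous frame, keeps the object on the constraint plane or axis even when the raw tracker data drifts away from it.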
How do we restrict the motion to a plane or axis? At first it appears that a button can be used for this as well, but the problem isn't quite as simple as it looks. We need to know the plane or axis the motion is to be constrained to, and the user must be able to specify this information in some way. It's not likely that the user will want to move in one of the coordinate planes or along one of the coordinate axes; the plane or line of motion is more likely to be defined by another object in the environment. Consider a box sitting on top of a table. The user will most likely want to move the box in the plane defined by the table top, which likely won't correspond to one of the coordinate planes. One way of doing this is to select two objects, with one object used to define the constraints on the other object's motion. How can we set up these constraints for an arbitrary object? Any large planes in the object are obvious candidates for constraint sources. Each of these planes defines a plane constraint, plus an axis constraint based on the normal to the plane. Most of these constraints must be defined by hand when the object is defined.

Several approaches have been taken to the design of interaction techniques for modifying an object's shape. As noted earlier, the handles used in 2D user interfaces cause a considerable amount of clutter in 3D user interfaces. This problem can be solved by not displaying the handles and using the cursor shape to indicate the type of operation that can be performed at each point in space. As the cursor approaches an area that would normally be drawn as a handle, it changes shape to reflect the operation that would occur if the handle was grabbed and dragged. In the case of simple bounding box handles, there would be distinct cursor shapes for dragging a vertex, edge, or face of the box, and possibly another shape for dragging the whole box. This solves the clutter problem, since the handles aren't drawn, but the user now needs to search for the handle locations. For an experienced user this won't be a problem, but it could cause some learning problems. The problem of rotating the object to reach a handle that needs to be dragged still exists with this solution.

Researchers at Brown University have suggested that a set of widgets could be used to modify object shape. This notion of widgets is similar to that used in 2D user interfaces, where a widget is viewed as a tool used to manipulate an object in the user interface. A widget is attached to the object to be manipulated, and when the user performs operations on the widget they are reflected in the attached object. The widget's geometry is much simpler than the object being manipulated, and it has a small number of obvious handles that control the interaction, so many of the problems associated with directly manipulating objects don't occur when these operations are applied to the widget. In this sense the widget is a surrogate object that reduces the number of degrees of freedom in the interaction. One example of this type of widget is the rack, which is shown in figure 5.7. As can be seen from this figure, the widget has very simple geometry and obvious handles. The user can scale one of the handles to perform a scaling operation on the object. This operation acts like a taper, where the scale factor is interpolated along the length of the object. Similarly, one of the handles can be rotated to apply a
twist to the object One of the end of the object is held stationary and the other end is rotated with the handle The rest of the object between the two handles is then twisted to maintain the obj ect s continuity Both of these operations are at a high level since they effect most of the vertices that de ne the object Figure 57 Rack Widget in Use An idea similar to widget is the use of tools A widget is a device that is attached to and object and then the user manipulates the object through the widget A tool replaces the cursor and changes the operation that is performed when the user interacts with the object This is similar to the notion of tools in the real world The user has a collection of tools called a tool bench that he or she can select from To start a manipulation operation the user rst moves the cursor to the tool bench and selects the desired tool The user then moves the tool to the target object and signals when the operation is to start and nish The tool metaphor can be used with a wide range of manipulations and since this framework is extensible it is easy to add new operations without changing the general structure of the user interface A good example of a tool is a deformation operation When this tools is brought close to an object it repels the vertices that are within the tool s radius The user can control the size of the radius which can be used to vary this from a local to a global modi cation operator This tool can be reversed to produce a stretching operator that pulls part of the object s surface to follow the user s hand motions Another type of tools can be based on cutting operations and used to remove part of the object The tools metaphor is well suited to global operations that effect an area of a single object This approach doesn t work as well when several operands are required for the operation or the user must perform several actions to accomplish the task Figure 58 Use of Tools Hierarchical meshes is one way of controlling the locality of shape modi cation operations When modifying an object s shape users often want to work at several levels of details For some operations only one or a small number of vertices need to be changed but other operations require more global changes potentially changing a signi cant fraction of the object s vertices Smoothly moving between the different levels of detail is a challenge to user interface designers In the case of a hierarchical mesh the vertices in an object are organized into a hierar chy of meshes with the lowest level mesh containing individual vertices and the higher level meshes combining neighbouring vertices to produce a higher level structure When the user operates at the lowest mesh level indi vidual vertices can be moved and there is a high degree for control over the shape of individual polygons As the user moves up the hierarchy the number of vertices effected by a given operation increases and the user works with higher level object features The relative positions of the low level vertices are maintained when higher level operations are performed so the detailed shape doesn t change when the high level structure is modi ed Page 76 562 Interaction Techniques for Geometrical Manipulations 16th April 2002 563 Interaction Techniques for NonGeom etrical Manipulations 77 The main advantage of this approach is it allows the user to easily operate at several levels of detail and use the same set of operations for both local and global shape modi cation The main problem is that it is restricted to objects that 
have a well de ned mesh structure This technique can easily be applied to objects constructed from polygonal meshes or surface patches the control points are used to de ne the mesh but doesn t work well with objects constructed in other ways 563 Interaction Techniques for Non Geometrical Manipulations The standard approach to nongeom etrical properties in 2D user interfaces is to use some form of property sheet This works reasonably well in 2D as a general solution but has additional problems when used in 3D user in terfaces The most serious problems are the positioning and presentation of the property sheet itself In 2D user interfaces property sheets often have a xed screen location or are sometimes displayed beside the object they modify This placement strategy quite often doesn t work in 3D Fixed placement forces the user to move to the property sheet in order to modify the object and at the new location the object controlled by the property sheet may no longer be visible The property sheet could be at a xed location relative to the user but in this case there is a good chance that it could hide the object being modi ed Placing the property sheet beside the object leads to a similar set of problems The property sheet could end up being positioned in way that the user can t see it or may have considerable di lculty interacting with it This will force the user to move in order to perform the interaction There is also the question of when the property sheet should be displayed If they are continuously displayed they will add a considerable amount of clutter to the environment and if they are only displayed when required there needs to be some way of indicating when they should be displayed The appearance of a property sheet is also a problem The rst problem is that they are typically at and thus can destroy the illusion of 3D but this is only a minor problem compared to the next two How big should a property sheet be The graphics for the property sheet are generated in application space so the designer can t use one xed size that will work in all applications or in different parts of the same application The property sheet must be sized so that it is legible the user can interact with it and at the same time it doesn t occupy to much of the screen space A property sheet that is so large that only part of it can be displayed at a time is very dif cult to interact with One way of approaching this problem is to base the sizing of the property sheet on the viewing pyramid The depth within the viewing pyramid is rst determ ined and then it can be scaled so it occupies a certain percentage of the space at that depth This has the advantage of producing property sheets that are all close to the same size when displayed on the screen but has the drawback of extra computation time to relayout the property sheet each time its displayed Objects with a large number of properties cause layout problems for 3D property sheets In 2D this problem is solved by scrolling the property sheet but this can be quite dif cult in 3D The designer needs to provide a 3D scrolling mechanism that is relatively easy to use Just attaching a scroll bar to the property list and having the user move it with the hand may be considerably more dif cult than it rst appears Mechanisms that automatically scroll the property sheet when the user gets close to its top or bottom could be much better choices One thought is to scale the items so they all t on the property sheet This may work if there are only one or two items to be 
squeezed onto the property sheet but otherwise it is probably a bad design choice The smaller the items are the harder it is to interact with them or in some cases even read them This will result in an interaction technique that quickly becomes impossible to use Even though property sheets have a number of problems when used in 3D user interfaces they are often one of the few choices available to the designer If the number and types of properties aren t known at design time the property sheet provides a nice general mechanism for modifying their values It can also be dif cult to design interaction techniques for some properties so directly entering the property value through a property sheet is one of the only alternatives Surrogate objects are another way of modifying nongeometrical properties but this approach requires some design effort In this approach another object called the surrogate object is used to represent the property to be modi ed Typically some geometrical property of the surrogate object is changed and this change is mapped onto the nongeometrical property of the target object A good example of this is the use of a colour cube to modify the colour of another object In this case the colour cube is the surrogate object and the user selects points within this cube to change the target object s colour Another example is the use of a three variable length lines forming the shape of a set of coordinate axis to adjust the moments of inertia for an object The user drags the line along one of the axis to change the moment of inertia for that axis In both of these cases there is a clear mapping from a geometrical property of the surrogate object onto a nongeometrical property of the target object These mappings 16th April 2002 563 Interaction Techniques for NonGeom etrical Manipulations Page 77 78 Chapter 5 Interaction In 3D tend to be property speci c and a consider amount of effort may be required to construct good ones T e main advantage of surrogate objects is that they provide an intuitive and easy to use way of modifying a nongeometrical property This can greatly improve the usability of a 3D user interface The main problem is that a new surrogate object must be designed for each property This can take a considerable amount of time and may not also be possible There is also the problem of positioning the surrogate object with respect to the target object The two objects should be reasonably close together so the user can see how the target object changes as he or she interacts with the surrogate But the two objects shouldn t be so close that they obscure each other Since eachs surrogate object is used to modify one property there could be a considerable amount of clutter if all the surrogates for an object are displayed at the same time For this reason the user must rst select the property to be modi ed and then the surrogate object is displayed while the user is modifying that property and when the modi cation is complete it disappears again 57 Navigation Navigation is used to move through the virtual environment This is task is more important in 3D user interfaces than it is in 2D user interfaces since objects in 3D obscure each other parts of themselves In order to completely view an object the user must be able to either move around the object or rotate the object In the case of large objects rotating the object might not be a viable alternative so navigation is necessary Virtual environments can also be quite large In the case of architectural walkthroughs the environment 
will be as large as a building so the user will need some way of moving through the environment Thus for many applications it isn t possible to display the entire environment at one time so the user must be able to move through the environment Navigation is also important in many visualization applications It provides the user with an easy way of controlling the level of detail in a visualization When the user is far from the visualization an overview of the visualization is presented and as he or she moves closer more details become available Navigation is a facility that must be provided in almost all 3D user interfaces so interaction techniques for this task are very important 571 General Design Considerations Navigation As stated previous there are two variations of the navigation task which are called local and global navigation While these task can be quite different they do have some comm on characteristics that will be reviewed rst before the subtasks are addressed individually Two closely related considerations for all navigation techniques are feedback and orientation Basically the navigation technique must provide enough feedback so the user is sure of the destination and when e user arrives there they must be able to quickly orient them selves We want to prevent the user from getting lost as they move through the environment This problem becomes worse if the navigation technique allows the user to cover a large distance in a short period of time since the user can then easily loose their orientation When the user starts to navigate the interaction technique must provide enough feedback to allow the user to control their motions If the motion is over a short distance the feedback can be limited to direction of movement and speed but as the distance become larger the user must be given con rm ation that they are headed in the right way In the case of local navigation the user can usually see the intended destination or at least its general location The user should be able to directly indicate the destination or the direction to travel and received some indication that he or she will arrive in the correct location This can be done by animating the motion That is move the user at a controlled velocity towards the destination showing that path the is being followed At any point in time the user can stop the motion if it isn t leading to the correct destination This approach assists with orienting the user once the destination is reached since there are no discontinuous changes in the user s location For global navigation feedback can be more dif cult Since the user can t directly see the destination there must be some way of selecting it and then con rming that the selection is correct The interaction technique must provide enough information on each possible destination for the user to select the one he or she is interested in This could vary from a list of location names to static or dynamic images of the destination In addition an easily invoked go back operation can make this type of navigation friendlier If the user ends up at the wrong destination they can easily return to their original location and select another one The orientation problem is easier to deal with in local navigation than it is in global navigation In local navigation as long as the path followed by the user is displayed there shouldn t be major orientation problems when the destination is reached The user will see the destination as its approached and will be able to assimilate Page 78 57 Navigation 16th 
April 2002 572 Interaction Techniques for Navigation 79 its surroundings For global navigation the user will just appear at a new location without the bene t of viewing it from the navigation path If the user is just dropped at the destination it could take some time to determine where things are in the new location or even the direction the user is facing There are several ways in which this can be solved First if the user has been to this location before they could be dropped at a familiar place for example the position they were end when they last left the destination Or they could have a standard position in each possible destination where they will always be placed when they go there Second the motion to the destination could take a short period of time and an overview of the destination presented during this time This would give the user an opportunity to survey the lay of the land instead of just being dropped in Third a map of the destination could be provided so the user has some way of nding major land marks and the locations of important objects when they rst arrive These approaches will help the user become oriented quicker so they can get on with their real task In the case of local navigation techniques the main concerns are providing a quick way of moving through the environment maintaining the user s orientation and ensuring that the user is always in control of the motion Local navigation is a frequent task in many 3D user interfaces since the user often needs to move through the environment to view objects from different directions or interact with other objects As a result the navigation technique must be quickly activated and required minimal interaction to operate Invocating navigation from a menu that might not always be available or visible can be a disaster since the user can get to parts of the environment that he she might not be able to get out of Due to the frequency of navigation operations if many interactions are required for even simple motions the user will quickly tire and fatigue will set in In the real world where 3D motion is freely available we rarely take advantage of the full 6 degrees of freedom available to us Our normal walking motion is basically a 2D motion where we fall the ground level essentially eliminating the third translational degree of freedom In addition when walking we rarely change the orientation of our head except to rotate it about the body s vertical axis Thus our normal walking motion involves only three of the available six degrees of freedom The same is true of most vehicles except those that move through the air or under water but only a small proportion of users are familiar with operating these vehicles On the other hand the devices that are often used to control local navigation have six degrees of freedom If all size degrees of freedom are used there is a good chance that the user will loose control of the interaction and quickly become disoriented We need to restrict the motion so the user can intuitively control their motion and not end up following a path that would never occur in the real world In addition to controlling the degrees of freedom we must be careful about the relationship between the line of sight and the direction of motion Most of the time in the real world are motion is either along the line of sight or close to at We rarely walk for extended periods of time without looking in the direction that we are traveling Similarly car drivers almost always look in the direction that the car is heading Local 
navigation techniques should take advantage of this observation to reduce user disorientation and increase the level of user control If a navigation technique allows the user to move in a direction that isn t close to the line of site the user will usually have considerable problems controlling it unless the motion is relative to a landmark in the environment For global navigation the main concerns are in selecting the destination and quickly orienting the user once the destination is reached In the case of global navigation the user can t directly see the destination so some other mechanism must be provided for selecting it The selection technique must provide enough information that the user can quickly select the intended destination The user must also be able to quickly back out of the navigation operation if it leads to the wrong destination The amount of information that can be presented on each available destination depends upon the number of destinations and the display bandwidth that is available If there are a large number of destinations possibly only the name of the destination or a small picture of it can be displayed If there are few destinations then larger images or possibly animation of the destinations can be provided Another approach is to provide objects in the environment that are associated with possible destinations and user interactions with these operations will lead to the associated destination When the user arrives at the destination there is the problem of disorientation The user may not know the structure of the new location and his or her position there It may take several seconds or more to become ac customed to the new location which can be quite disturbing to many users As outlined above there are several ways of approaching this problem including animation into the new location providing the user with maps and returning the user to a familiar position 572 Interaction Techniques for Navigation This section starts with a discussion of the interaction techniques for local navigation and then moves onto the ones that have been developed for global navigation 16th April 2002 572 Interaction Techniques for Navigation Page 79 80 Chapter 5 Interaction In 3D One of the rst techniques proposed for local navigation is ying which is really a collection of related interac tion techniques The basic idea behind this collection of interaction techniques is to point a tracker in the direction of travel and then press a button to start the movement The user controls the direction of motion by moving the tracker and stops the motion by releasing the button While this technique works it can be very dif cult to control if some constraints aren t applied to tracker values The original implementations of this technique were based on a DataGlove or a tracker held in the user s hand In this case the user was free to point in any direction and there wasn t any relationship between the direction of motion and the user s line of sight User s frequently became lost or disoriented and had considerable dif culty with accurate positioning Another problem with this technique is that there is no control over the speed of motion The user always moved at the same rate which was too fast for accurate positioning and too slow for traveling long distances ere are several ways of improving the ying navigation technique One obvious way is to restrict the motion to the user s line of sight In this case the user is less likely to become disoriented since he or she is always looking in the 
direction of motion Another approach is to restrict the direction in which the user can move For example the user must follow the ground level removing the freedom to move up and down This reduction in the degrees of freedom facilitates the navigation tasks by simplifying its control The addition of a speed control mechanism further increases the usability of these navigation techniques A simple speed control mechanism is to have the speed proportional the time that the button is pressed The longer the user holds the button the faster he or she will go If the user is only moving a small distance the button press will have a short duration but if the movement is further the press will be longer By increasing the speed while the button is pressed distant locations will be reached faster To slow down the user can release the button and then press it again If there are two trackers the user can use one tracker to specify the direction of motion and the other one to specify the speed If the speed control tracker points up the speed increases and if it points down the speed decreases The use of a speed control facilitates the use of the same interaction technique for both precise motions and rapid travel over long distances Another early local navigation technique is based on the use of virtual vehicles In this approach the user controls a vehicle that moves them through the environment The user is presented with the controls for this vehicle that he or she interacts with to produce the motion through the environment For example there could be one control for steering in the xy plane and another control for the speed of the motion This can provide a more intuitive model for motion in the virtual environment In some cases a real vehicle such as a bicycle has been used with this approach An exercise bicycle can be instruments to measure the direction of the handle bars and the rate of peddling the control both the direction and speed of motion A force can be applied to the wheels to simulate the terrain that the bicycle is moving over to give the user more of a feel for the environment Another technique is to point at a location near the destination to select the general direction of motion and then user the tracker to control the speed at which the destination is approach This is an interaction at a distance approach to navigation The user res a ray from a tracker and this ray is used to select an object close to the intended destination The user then move along a line to the destination This removes the steering problem from navigation as long as the user knows where they want to go If the user is exploring the environment they may not know where they want to go They want to move through the environment and observe its structure as they move On the other hand the user may want to interact with a particular object so in this case its more ef cient to move directly to the object of interest The direction interaction equivalent of the above approach is based on Worlds in Miniature or a similar scaling approach The user holds a scaled down version of the environment in one hand and uses the other hand to move a miniature version of the user within the environment When the user is nished selecting the new position the miniature version of the environment is scaled up with the user at the new position The user is then holding a new version of the miniature environment that can be used for further interaction This approach gives the user a good idea of where they will end up after the navigation 
The direct interaction equivalent of this interaction-at-a-distance approach is based on Worlds in Miniature, or a similar scaling approach. The user holds a scaled down version of the environment in one hand and uses the other hand to move a miniature version of himself or herself within it. When the user has finished selecting the new position, the miniature version of the environment is scaled up with the user at the new position. The user is then holding a new version of the miniature environment that can be used for further interaction. This approach gives the user a good idea of where he or she will end up after the navigation operation, since all the objects in its neighbourhood are easily visible. The use of a miniature version of the environment restricts the size of environment that can be used with this approach. If the environment is too large, it is difficult to select the user's representation in it and position it at the desired location. In addition, when the environment is scaled up at the end of the navigation operation, the user could be moved through objects, which might result in some disorientation.

In the case of immersive environments the user can perform some local navigation by physically walking or moving in real space. The range of this navigation technique is controlled by the range of the head tracker. For most tracking technologies the user has a working range on the order of one or two meters. A clutching mechanism could be used to increase the range of motion, but this would be quite awkward for traveling more than a few meters.

There are several techniques for global navigation, and the most important component of each of these techniques is the mechanism used to select the destination. If the environment has a well defined structure, a map can be used for global navigation. This map could take the form of a floor plan that shows all the major areas in the environment, or it could take the form of a road or tourist map. The user moves to a new location by clicking on the map at the desired location. This approach is quite natural and reduces disorientation, since the user has a clear vision of where he or she is going. If the environment doesn't have a structure that can easily be converted to a 2D map, this isn't a particularly good technique.
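The only computation behind the map-based technique just described is a change of coordinates from the 2D map to the 3D environment. A minimal sketch, assuming an axis-aligned rectangular map and a fixed eye height (both illustrative assumptions):

struct Vec3 { float x, y, z; };

// World-space rectangle covered by the 2D map (assumed axis aligned).
struct MapExtent { float minX, minZ, maxX, maxZ; };

// Convert a click at (u, v), with u and v in [0, 1] across the map, into a
// world-space destination for the teleport.
Vec3 mapClickToDestination(float u, float v, MapExtent m, float eyeHeight)
{
    Vec3 dest;
    dest.x = m.minX + u * (m.maxX - m.minX);
    dest.z = m.minZ + v * (m.maxZ - m.minZ);
    dest.y = eyeHeight;   // keep the viewpoint at a comfortable height
    return dest;
}

Everything else, such as animating the transition or snapping to a known viewpoint at the destination, is a policy decision layered on top of this mapping.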
Another approach is to provide the user with a list of potential destinations. This list can take many forms. It could be as simple as a list of destination names; in this case the user must know the correspondence between destination name and location. A set of icons that represent the destinations provides the user with more information. The icon could be a picture of the destination, so the user has some idea of where he or she will land. Another approach is to construct a door at the current location for each potential destination. The door can be labeled with the name or a picture of the destination, and when the user goes through the door he or she arrives at the new location. This could be arranged hierarchically, where a door leads to a hallway that has many doors leading from it.

Destinations can be associated with objects in the environment. Quite often these objects have been called portals, and they present a view of the destination. This view could be a miniature view of the destination showing the current activities there. This provides a very good indication of where the portal leads and what the user will find there, and it could reduce disorientation when the user arrives at the new location. The object in the current environment could also be something associated with the destination, such as a representation of the task that can be performed there or the types of objects manipulated there. For example, if the destination provided facilities for modeling cars, it could be represented by a miniature car in the current location.

In summary, there are a number of general purpose local navigation techniques, and each application should support at least one of them. There is still room for innovation in this area, and for studies of the effectiveness of local navigation techniques. Global navigation techniques tend to be more application dependent, focusing on effective ways of representing the destination so the user can easily find the correct location. There is plenty of room for the development of new global navigation techniques.

5.8 Object Combination

Many design applications require the user to combine several existing objects to produce a new object. This is most obvious in geometrical modeling, where the user starts with a collection of primitive objects and combines them using placement or CSG operations to produce more complex objects. It also occurs in more abstract settings, such as network modeling, where the user combines several icons representing application objects to produce a model of a network or similar object. This section describes the interaction techniques that have been developed to precisely position objects so they can easily be combined.

5.8.1 General Design Considerations: Object Combination

The main activity in object combination tasks is to precisely place one object with respect to another. This positioning operation is required before the two objects can be combined. The positioning becomes more complex if three or more objects are involved, since there may be several positioning constraints that must be met. In general it is not good enough to use visual inspection to determine whether the objects are correctly placed. Many algorithms, such as CSG, require the objects to be precisely placed or the algorithms will fail. In other applications the objects may look correct at one scale, but when the user zooms in on them the inaccuracies in their placement quickly become apparent.

The positioning accuracy required depends upon the objects involved in the operation. For many geometrical operations, vertices, edges or faces of the two objects must exactly match, so these object features are the ones that must be considered. In other applications, such as diagram or network construction, the match doesn't need to be as precise; as long as the two objects touch, everything is okay. In many ways the particular technique used depends upon the application. The following section concentrates on object combination in geometrical modeling, with the observation that since other applications need less precision, these techniques can be applied there as well.

5.8.2 Interaction Techniques for Object Combination

One of the standard techniques used in 2D user interfaces is grids. In 2D, a two dimensional grid is constructed over the application's space and objects are forced to lie on the grid intersections. This can be done quite easily using integer division: the object's coordinates are divided by the grid spacing to determine the intersection point they should be moved to. Rounding can be used to get the closest grid point, or truncation used to move the object left and down. The same approach can also be used in 3D: a 3D grid is constructed and integer division is used to compute the grid intersection point, as the short sketch below shows. Aligning objects to grids is quite easy and efficient, but it only accomplishes the object combination task some of the time. If the size of the objects is a multiple of the grid spacing, then their faces will lie on the grid lines and this approach can be used to line up the faces. This assumes that the objects are square or rectangular in shape; if this isn't the case, the use of grids doesn't help very much.
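A minimal sketch of 3D grid snapping follows. The rounding variant moves a coordinate to the nearest grid intersection; the truncating variant moves it down toward the previous intersection. The grid spacing is whatever the application chooses.

#include <cmath>

struct Vec3 { float x, y, z; };

// Snap one coordinate to the nearest grid intersection (rounding).
float snapRound(float value, float spacing)
{
    return std::floor(value / spacing + 0.5f) * spacing;
}

// Snap one coordinate to the intersection below it (truncation toward
// negative infinity, i.e. "left and down" along each axis).
float snapTruncate(float value, float spacing)
{
    return std::floor(value / spacing) * spacing;
}

// Snap an object position to the 3D grid.
Vec3 snapToGrid(Vec3 p, float spacing)
{
    return { snapRound(p.x, spacing),
             snapRound(p.y, spacing),
             snapRound(p.z, spacing) };
}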
One of the most powerful ways of combining objects is to use constraints. Each object defines a set of constraints, usually based on its faces and edges. When objects are moved close together, these constraints are used to join them without the need for precise positioning on the part of the user. To see how this technique works, consider the constraints that could be defined for a cube. Each of the cube's faces produces a constraint that tries to align the face of another object to that face. So if another object, such as a tetrahedron, is brought close to a cube face, the closest tetrahedron face will be pulled to the closest cube face and the two faces will be joined. Similarly, an axis constraint can be defined for each face. This constraint is a line that starts at the center of the cube and passes through the center of a face. One of the important features of a cylinder is its axis, which runs perpendicular to its circular bases. A cylinder can be aligned with a cube by forcing its axis to lie along one of the cube's axis constraints. A plane constraint can then be applied to the cylinder to pull it to the cube and attach one of its bases to the cube face. Alternatively, the cylinder could be moved along the constraint axis so that it passes through the cube; it could then be used in a subtraction operation to drill a hole through the cube.

Each object defines its own set of constraints. In the case of the cube, each face generated a plane constraint and an axis constraint. Edges and vertices could also be used to define constraints; for example, each object edge could define an axis constraint. In a geometrical modeler, constraints can be defined for each of the primitive objects, and these constraints can then be used to accurately position the primitive objects.

There are three important questions associated with the use of constraints: how do we select the active constraints, how many constraints can be active at one time, and what constraints are used for complex objects that are constructed from several primitive objects?

Constraint selection can be based on the proximity of the two objects or, alternatively, on how close they are to satisfying one of the constraints. For example, when two cubes are brought close together so that a pair of faces are facing each other, the relevant plane constraint is activated to bring these two faces together. Each of the constraints between a pair of close objects can be evaluated to determine the constraint that is closest to being satisfied, and that is the constraint that is selected. This implies that each constraint has a function that determines how close it is to being satisfied. This function could return a value between 0 and 1, with 1 indicating complete satisfaction and 0 indicating that the constraint is not satisfied at all. The user interface searches through the set of constraints, finding the one that produces the highest value less than 1 (we don't need to process a constraint that is already satisfied), as the sketch below shows.
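This selection rule is simple to express once a satisfaction score is available. The sketch below assumes a hypothetical Constraint interface; a real modeler would compute the score from face distances and orientations, and apply() would actually move the object.

#include <memory>
#include <vector>

// Hypothetical constraint interface: each constraint reports how close it is
// to being satisfied, from 0 (not satisfied at all) to 1 (fully satisfied).
struct Constraint {
    virtual ~Constraint() = default;
    virtual float satisfaction() const = 0;
    virtual void apply() = 0;    // move the object so the constraint holds
};

// Select the constraint that is closest to being satisfied but not already
// satisfied, i.e. the one with the highest score strictly less than 1.
Constraint* selectConstraint(const std::vector<std::unique_ptr<Constraint>>& cs)
{
    Constraint* best = nullptr;
    float bestScore = 0.0f;
    for (const auto& c : cs) {
        float s = c->satisfaction();
        if (s < 1.0f && s > bestScore) {
            bestScore = s;
            best = c.get();
        }
    }
    return best;   // null when no constraint is close to being satisfied
}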
The question of how many constraints can be active is a bit more difficult. In the case of positioning the cylinder with respect to the cube, two constraints were involved: one that aligned the cylinder axis with an axis constraint and another that aligned its base with the cube face. In this case we want to use all the applicable constraints, but this might not always be the case. It could be that the set of constraints is contradictory and there is no possible way to satisfy all of them. In this case the user interface must either give up or select some subset of the constraints to satisfy. This condition is fairly easy to detect, and the user can be warned about it. Where the problems start occurring is when one of the active constraints pulls the object in a direction that the user doesn't want. In order to activate the constraint the user wants, the user may need to position the object so that the unwanted constraint is also activated. In this case the user needs some way of turning off the unwanted constraint.

One solution to the constraint activation problem is to give the user some control over the constraints that can be selected. Since there will typically be a large number of potential constraints, it can be quite difficult to directly select the constraint to be used. For example, the user could select from a menu whether plane or axis constraints should be used. This reduces the set of potentially active constraints and thus the probability that an active constraint will move the object away from the desired position.

When two or more objects are combined, there is the problem of determining the set of constraints for the new object. One approach is to combine all the constraints from the primitive objects it is constructed from. This is easy to implement and produces reasonable behavior most of the time. One potential problem with this approach is that primitive object features that are now inside the composite object can still generate constraints. Thus there could be active constraints that don't correspond to any object feature that is visible to the user. This could produce some unexpected behavior if the user isn't aware of the internal structures. On the other hand, the user may want to take advantage of the constraints defined by internal features. In CSG modeling, the internal features of an intermediate object may be important in defining the final object; they could be used to align another object that will be used in a cut operation that exposes these internal features. When several primitive objects are combined, there is always the possibility that conflicting sets of constraints could be produced. This should be detected, and it may be necessary to deactivate some of the constraints involved. It may not be possible to completely delete these constraints, since a subsequent CSG operation could remove the problem.

Constraints are a powerful technique for object combination and should be seriously considered for any modeling application. There are still problems in defining a good user interface to this technique, and more work needs to be done on general constraint systems for 3D user interfaces.

5.9 2D Techniques in 3D

There is a need for one and two dimensional operations in 3D user interfaces. This is the same as in 2D user interfaces, where many 1D operations are supported. One of the best examples of a 1D task embedded in a 2D user interface is scrolling: the thumb of a scroll bar controls the linear position within a document, and it has one degree of freedom. This interaction technique is embedded so seamlessly into the 2D user interface that we don't think of it as a 1D operation. We need to be able to do the same thing with 1 and 2 dimensional operations in 3D user interfaces. The interaction techniques must fit in so well that the user isn't aware of performing lower dimensional operations.

To incorporate lower dimensional interaction techniques in 3D user interfaces we must solve both the display and input problems. The display problem deals with how the interaction technique is displayed in the 3D environment, with a visual representation that suggests the type of operation that can be performed.
In the case of scroll bars in 2D user interfaces, the scroll bar is given a 2D appearance even though the thumb can only move in one dimension. The scroll bar could be displayed as a line one pixel thick and still convey the same amount of information, but this visual wouldn't blend in with the other 2D interaction techniques. We need to take the same approach in 3D: flesh out the display so that its visual representation fits in with other 3D interaction techniques, while still suggesting the lower dimensional operation. Similarly, user actions need to be mapped onto the interaction technique's values. This often involves restricting the number of degrees of freedom involved in the interaction. In the scroll bar example, only one of the mouse's two dimensions is used to control the position of the thumb, and over a small range of values the other dimension is ignored. Again, in 3D we need to restrict the degrees of freedom supplied by the input device without giving the user the impression that they are being restricted.

One obvious way of adding 1 and 2 dimensional operations to a 3D user interface is to embed the 3D user interface in a standard GUI environment. The 3D part of the user interface is a separate window within the application, and the other windows can supply all the standard 2D interaction techniques. While this provides the functionality that we need, it doesn't integrate into the 3D user interface. If an immersive environment is used, it doesn't work at all, since the user can't easily access the 2D input devices required to use the GUI and may not be able to view the non-3D part of the application. Even in desktop applications this is rarely an acceptable solution. If the 2D interactions are rarely used and don't deal with individual objects in the 3D environment, it could be workable. For example, a 2D GUI could be used to load and save files, since these operations don't occur very often, typically only at the start and end of a session; in addition, most file operations refer to the entire environment, so there is no need to associate them with individual objects. The use of a surrounding GUI has problems when the 2D operations must refer to 3D objects. In this case 2D and 3D interaction techniques must be combined. For example, the user must first use a 3D interaction technique to select an object in the virtual environment and then switch devices to interact with it through the 2D GUI. This complicates the interaction and makes it very difficult to develop smooth interaction sequences. In addition, the combination of 2D and 3D graphics removes some of the sense of immersion that occurs in a purely 3D environment.

The opposite approach is to embed the 2D GUI in the 3D environment. The 2D GUI is drawn on a 2D plane, and all interactions with the GUI occur in this plane. There is no reason why the GUI couldn't be drawn on any flat surface in the environment. This approach can be viewed as placing a set of control panels in the environment, so this interaction technique is sometimes called panels. There are at least two ways in which this can be done. First, a 2D graphics API, similar to that used in a standard 2D GUI, could be developed to display the graphics on any plane surface in the environment. The GUI can then be coded in this API in the same way that a standard 2D GUI is coded now. This involves some extra programming effort to produce the API and rewrite the interaction techniques, but the resulting interaction techniques are quite efficient. A sketch of the coordinate mapping that such an API is built on is given below.
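The mapping itself is straightforward: the panel is described by a corner and two in-plane axis vectors, and every 2D drawing coordinate is converted to a 3D point before rendering. The Panel structure and the normalized panel coordinates are illustrative assumptions, not a description of any existing toolkit.

struct Vec3 { float x, y, z; };
Vec3 add(Vec3 a, Vec3 b)    { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
Vec3 scale(Vec3 v, float s) { return {v.x * s, v.y * s, v.z * s}; }

// A rectangular panel embedded in the 3D environment.  uAxis and vAxis are
// unit vectors along the panel's width and height.
struct Panel {
    Vec3 origin;           // world position of the panel's lower left corner
    Vec3 uAxis, vAxis;     // in-plane directions
    float width, height;   // panel size in world units
};

// Map a point given in normalized panel coordinates (0..1 in each axis) to
// its position in the 3D environment.  A 2D drawing call at (u, v) would
// place its output at this world-space point.
Vec3 panelToWorld(const Panel& p, float u, float v)
{
    return add(p.origin,
               add(scale(p.uAxis, u * p.width),
                   scale(p.vAxis, v * p.height)));
}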
Second, the GUI can be displayed in an off-screen bitmap, and this bitmap can then be texture mapped onto one of the surfaces in the 3D environment. This involves less coding if the original API allows for writing into off-screen memory, but it could have a severe performance hit due to the use of a texture map that must be dynamically updated. The choice between these two approaches depends upon the available programming and hardware resources.

Interacting with a panel is based on a cursor that is displayed on its surface and controlled by one of the 3D input devices. One way of doing this is to cast a ray from the tracker; if it intersects the panel, the intersection point defines the cursor position (a sketch of this intersection is given below). If the panel is close, this works reasonably well, but if it is far away, noise in the tracker may make it difficult to control the cursor. Another approach is to first select the panel, possibly using a ray based technique, and then use some other tracker motion to control the cursor. For example, rotating the tracker about two of its axes can be used to control the two degrees of freedom in the cursor's motion. To account for tracker noise, there can be a constant mapping between angular deflection and the amount of cursor motion. If a joystick or similar input device is used, two axes of the joystick can be mapped to the panel's surface and used to control the cursor. The control mechanism largely depends upon the available input devices and how this control can be merged with the 3D interaction that is already used in the environment. If the panel is within reach, the user can directly interact with it by placing his or her hand over the desired interaction technique and using the hand as a cursor.
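The ray-cast cursor amounts to a ray-plane intersection followed by a change to panel coordinates. A minimal sketch, using a Panel structure like the one in the earlier sketch; all names are again illustrative.

#include <cmath>

struct Vec3 { float x, y, z; };
Vec3  sub(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
Vec3  cross(Vec3 a, Vec3 b) {
    return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}

struct Panel {
    Vec3 origin, uAxis, vAxis;   // corner and unit in-plane directions
    float width, height;
};

// Intersect the tracker ray with the panel and return the cursor position in
// normalized panel coordinates.  Returns false if the ray misses the panel.
bool panelCursor(Vec3 rayOrigin, Vec3 rayDir, const Panel& p, float& u, float& v)
{
    Vec3 n = cross(p.uAxis, p.vAxis);             // panel normal
    float denom = dot(n, rayDir);
    if (std::fabs(denom) < 1e-6f) return false;   // ray parallel to the panel
    float t = dot(n, sub(p.origin, rayOrigin)) / denom;
    if (t < 0.0f) return false;                   // panel is behind the tracker
    Vec3 hit = {rayOrigin.x + t * rayDir.x,
                rayOrigin.y + t * rayDir.y,
                rayOrigin.z + t * rayDir.z};
    Vec3 local = sub(hit, p.origin);
    u = dot(local, p.uAxis) / p.width;            // 0..1 across the panel
    v = dot(local, p.vAxis) / p.height;
    return u >= 0.0f && u <= 1.0f && v >= 0.0f && v <= 1.0f;
}

The same function can also serve as the panel selection test mentioned above: a button press while the ray hits the panel selects it.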
There are several alternatives for placing panels within the environment, depending upon the objects they are related to. If all the interaction techniques on a panel are related to one object, then the logical placement for the panel is either on the object itself or just next to it. There could also be a button on the object that, when pressed, pops up the panel so the user can interact with it. If the panel contains a general set of interaction techniques that can be applied to a wide range of objects, then it might make more sense to have the panel follow the user. This could either be a full version of the panel or a small button that can be selected when the user wants to interact with the panel. Panels can also be hidden above or below the user, who can grab them and drag them into place when an interaction technique is required.

Panels that follow the user can cause placement problems. If the panels are large, they can obscure a significant portion of the environment, making it difficult for the user to interact with the objects in it. This is one of the main reasons for collapsing panels when the user isn't interacting with them. Similarly, when a panel is popped up, we don't want objects in the environment to obscure the user's view of it. One way of doing this is to place the panel between the user and the other objects in the environment, but this could result in a panel that is too close to the user to interact with easily. Another approach is to make the objects between the user and the panel transparent, so that it is visible through them and the user can easily interact with it. The use of transparency has an impact on display time, but the usability advantage is well worthwhile.

5.10 Interaction Technique Summary

This chapter presents an overview of interaction techniques for 3D user interfaces. There is still a considerable amount of work that needs to be done in at least the following three areas. First, there is a need for more and better 3D interaction techniques. Very few 3D interaction techniques have been developed; we need to develop more techniques, explore the design space, and provide the user interface designer with a wider range of alternatives. Second, there needs to be more experience with building 3D user interfaces, in order to determine which interaction techniques work well together and how to best assemble interaction techniques into a usable interface. Third, there need to be experimental evaluations of the various interaction techniques and of complete 3D user interfaces. Experimental evaluations will give us some idea of how the different interaction techniques perform with respect to each other, and the strong and weak points of each interaction technique. In addition, we need to evaluate complete user interfaces to determine which combinations of interaction techniques work the best and which aspects of user interface design have the most effect on usability.