The task of having computers understand their environments through direct imaging has proved to be formidable. With its beginnings about 30 years ago, the field of computer vision has grown into a major part of the pursuit of artificial intelligence. Most elements of this pursuit – language understanding, reasoning and planning, speech – are very difficult challenges, but vision, with its high dimensionality of space, time, scale, color, dynamics, and so forth, may be the most challenging. Early attempts to develop computer vision focused on restricted situations in which it was feasible to provide the computer with fairly complete descriptions of what it would encounter. In such cases, single images provided the sensory information for analysis. As the domains of application grew, the requirements for more competent descriptions of the world increased. Dealing with three-dimensional (3D) dynamic structures (the real world) from 3D dynamic platforms (we humans) calls for greater capabilities on both the analysis and synthesis sides of the issue. The analysis side is the processing of sensory data for such tasks as recognition and navigation, and a number of techniques are discussed here for dealing with these two-, three-, and higher-dimensional data. The synthesis side is the construction of "internal" descriptions of the world that may subsequently be used for the above tasks. This latter issue is the underlying theme we pose in this paper – developing representations from vision that will later enable effective automated operation in our 3D dynamic environments.