A Storage Representation For Efficient Access To Large, Multidimensional Arrays

Citation

Quam, L. H. (1980). A storage representation for efficient access to large multidimensional arrays. SRI International.

Abstract

This paper addresses problems associated with accessing elements of large multidimensional arrays when the order of access is either unpredictable or orthogonal to the conventional order of array storage. Large arrays are defined as arrays that are larger than the physical memory immediately available to store them. Such arrays must be accessed either through the virtual memory system of the computer and operating system, or by direct input and output of blocks of the array to a file system. In either case, an inappropriate order of reference to the elements of the array directly causes very time-consuming movement of data between levels of the memory hierarchy, often degrading algorithm performance by as much as three orders of magnitude. Access to elements of large arrays is decomposed into three steps: transforming the subscript values of an n-dimensional array into the element number in a one-dimensional virtual array, mapping the virtual array position to a physical memory position, and accessing the array element in physical memory. The virtual-to-physical mapping step is unnecessary on computer systems with sufficiently large virtual address spaces. This paper is primarily concerned with the first step. A subscript transformation is proposed that solves many of the order-of-access problems associated with conventional array storage. This transformation decomposes the calculation of an element's number in the array additively, into a sum of integer functions, each applied to one subscript, as follows:

$$\text{element-number}(i, j, \ldots) = f_i(i) + f_j(j) + \cdots$$
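As a sketch of how this additive form can be evaluated with tables, the Python fragment below precomputes $f_i$ and $f_j$ as lists, so every access costs two lookups and one addition whatever the layout. Conventional row-major storage is the special case $f_i(i) = i \cdot n_{cols}$, $f_j(j) = j$. The second layout (square b-by-b tiles stored in row-major order, with elements also row-major inside each tile) is an illustrative assumption chosen because it, too, splits into a row term plus a column term; the function names and the tiling are hypothetical and are not taken from the report.

```python
# Sketch (assumed layout and hypothetical names): table-driven additive
# subscript transformation, element_number(i, j) = f_i[i] + f_j[j].


def row_major_tables(nrows, ncols):
    """Conventional row-major storage: f_i(i) = i * ncols, f_j(j) = j."""
    f_i = [i * ncols for i in range(nrows)]
    f_j = list(range(ncols))
    return f_i, f_j


def tiled_tables(nrows, ncols, b):
    """Hypothetical tiled layout: b-by-b tiles stored in row-major order,
    with the elements of each tile also stored in row-major order.
    Assumes both dimensions are multiples of b."""
    if nrows % b or ncols % b:
        raise ValueError("array dimensions must be multiples of the tile size")
    tiles_per_row = ncols // b
    tile_size = b * b
    # Row subscript selects the tile row and the row inside the tile.
    f_i = [(i // b) * tiles_per_row * tile_size + (i % b) * b for i in range(nrows)]
    # Column subscript selects the tile column and the column inside the tile.
    f_j = [(j // b) * tile_size + (j % b) for j in range(ncols)]
    return f_i, f_j


def element_number(tables, i, j):
    """Two table lookups and one addition, independent of the layout."""
    f_i, f_j = tables
    return f_i[i] + f_j[j]


if __name__ == "__main__":
    conventional = row_major_tables(8, 8)
    tiled = tiled_tables(8, 8, 4)
    print(element_number(conventional, 1, 0))  # 8
    print(element_number(tiled, 1, 0))         # 4
    print(element_number(tiled, 0, 4))         # 16
```

Changing the storage layout changes only the table contents; the accessing code stays the same. In the tiled layout, vertically adjacent elements such as (0, 0) and (1, 0) are b elements apart rather than a full row apart, which is what makes column-order access cheaper.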

Choices for the transformation functions that minimize access time to the array elements depend on the characteristics of the computer system's memory hierarchy and on the order of accesses to the array elements. It is conjectured that, given appropriate models of the system's and the algorithm's access characteristics, a pragmatically optimal choice can be made for the subscript transformation functions. In general, these models must be stochastic, but in certain cases deterministic models are possible. Evaluating the functions $f_i$ and $f_j$ by table lookup makes the implementation very efficient on conventional computers. When the array is accessed in an order inappropriate to conventional storage order, this scheme requires far less time than conventional array-accessing schemes; otherwise, access times are comparable. The semantics of a set of procedures for array access, array creation, and the association of arrays with file names are defined. For computer systems with insufficient virtual memory, such as the PDP-10, a software virtual-to-physical mapping scheme is given in Appendix C. Implementations for accessing pixels of large images stored as two-dimensional arrays of n bits per element are also given in the appendix for the VAX and PDP-10 series computers.
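To make the claim about access order concrete, the following Python sketch simulates a software virtual-to-physical mapping: a fixed number of resident page frames, where every miss stands for transferring a page from the file. It counts page faults when an n-by-n array is visited row by row and column by column, under conventional row-major storage and under a hypothetical tiled layout expressed with the same additive tables. The page size, frame count, least-recently-used replacement policy, tile size, and all names are assumptions for illustration; the mapping scheme in Appendix C may differ.

```python
from collections import OrderedDict


def row_major_tables(n):
    """Conventional storage of an n-by-n array: f_i(i) = i * n, f_j(j) = j."""
    return [i * n for i in range(n)], list(range(n))


def tiled_tables(n, b):
    """Hypothetical b-by-b tiled layout of an n-by-n array (n a multiple of b)."""
    if n % b:
        raise ValueError("n must be a multiple of the tile size")
    tiles_per_row, tile_size = n // b, b * b
    f_i = [(i // b) * tiles_per_row * tile_size + (i % b) * b for i in range(n)]
    f_j = [(j // b) * tile_size + (j % b) for j in range(n)]
    return f_i, f_j


class PageCache:
    """Assumed software paging model: num_frames resident pages of page_size
    elements each, replaced in least-recently-used order."""

    def __init__(self, page_size, num_frames):
        self.page_size = page_size
        self.num_frames = num_frames
        self.resident = OrderedDict()  # virtual page number -> None, LRU first
        self.faults = 0

    def touch(self, element):
        page = element // self.page_size
        if page in self.resident:
            self.resident.move_to_end(page)  # now the most recently used page
            return
        self.faults += 1  # stands for reading the page from the file
        if len(self.resident) >= self.num_frames:
            self.resident.popitem(last=False)  # evict the least recently used page
        self.resident[page] = None


def count_faults(tables, n, page_size, num_frames, column_order):
    """Visit every element of an n-by-n array and count page faults."""
    f_i, f_j = tables
    cache = PageCache(page_size, num_frames)
    for outer in range(n):
        for inner in range(n):
            i, j = (inner, outer) if column_order else (outer, inner)
            cache.touch(f_i[i] + f_j[j])
    return cache.faults


if __name__ == "__main__":
    N, B, PAGE, FRAMES = 256, 16, 256, 16  # illustrative parameters only
    for name, tables in (("row-major", row_major_tables(N)),
                         ("tiled", tiled_tables(N, B))):
        by_row = count_faults(tables, N, PAGE, FRAMES, column_order=False)
        by_col = count_faults(tables, N, PAGE, FRAMES, column_order=True)
        print(f"{name}: row order {by_row} faults, column order {by_col} faults")
```

With these parameters each page holds exactly one 16-by-16 tile. The sketch reports 256 faults for both layouts when the array is visited row by row, but 65536 faults for row-major storage versus 256 for tiled storage when it is visited column by column, because each column of the row-major array spans 256 pages while only 16 frames are resident. These figures come from the toy model, not from measurements in the report.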
