David Zhang, Gooitzen van der Wal, Saurabh Farkya, Thomas Senko, Aswin Raghavan, Michael Isnardi, Michael Piacentino (2022) CVPR 2022 / ECV 2022, New Orleans, LA, June 19-20, 2022
We present a scalable in-pixel processing architecture that can reduce data throughput by 10× and consume less than 30 mW per megapixel at the imager frontend. Unlike state-of-the-art (SOA) analog process-in-pixel (PIP) designs, which modulate the exposure time of the photosensors to perform matrix-vector multiplications, we use switched capacitors and pulse width modulation (PWM). This non-destructive approach decouples sensor exposure from computation, providing processing parallelism and high data fidelity. Our design minimizes computational complexity and chip density by leveraging patch-based feature extraction, which can perform as well as a CNN. We further reduce data by using partial observation of the attended objects, which performs close to full-frame observation. We have been studying the reduction of output features as a function of accuracy, chip density, and power consumption, with a transformer-based backend model for object classification and detection.
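As a rough software illustration of the patch-based matrix-vector multiplication described above, the following minimal sketch projects each non-overlapping pixel patch onto a small set of weight vectors. It is not the authors' circuit. The function names, the 10 × 10 patch size, the 10 features per patch, the 16 pulse-width levels, and the idea that each weight magnitude is encoded as one of a fixed number of PWM duty cycles are all assumptions made for illustration. The data-reduction figure also assumes that each output feature costs as many bits as one input pixel.

```python
import random


def pwm_quantize(weight, levels=16):
    """Snap a weight in [-1, 1] to one of `levels` duty cycles, keeping its sign.

    Hypothetical model: the magnitude stands for a pulse width in [0, 1].
    """
    duty = round(abs(weight) * (levels - 1)) / (levels - 1)
    return duty if weight >= 0 else -duty


def extract_patch_features(frame, weights, patch, levels=16):
    """Return one feature vector per non-overlapping patch of `frame`.

    frame   : list of rows of pixel values (read only, never modified)
    weights : list of weight vectors, each of length patch * patch
    """
    rows, cols = len(frame), len(frame[0])
    if rows % patch or cols % patch:
        raise ValueError("frame dimensions must be multiples of the patch size")
    quantized = [[pwm_quantize(w, levels) for w in vec] for vec in weights]
    features = []
    for top in range(0, rows, patch):
        for left in range(0, cols, patch):
            # Flatten the patch in row-major order.
            pixels = [frame[top + r][left + c]
                      for r in range(patch) for c in range(patch)]
            # Matrix-vector product: one dot product per weight vector.
            features.append([sum(p * w for p, w in zip(pixels, vec))
                             for vec in quantized])
    return features


random.seed(0)
frame = [[random.random() for _ in range(160)] for _ in range(120)]
weights = [[random.uniform(-1, 1) for _ in range(100)] for _ in range(10)]

features = extract_patch_features(frame, weights, patch=10)
n_in = len(frame) * len(frame[0])
n_out = len(features) * len(features[0])
print(n_in / n_out)  # 10.0
```

With these assumed sizes, every 100-pixel patch is replaced by 10 features, so the output carries one tenth as many values as the input frame. Because the frame is only read, the same exposure can be reused for further weight vectors, which is the software analogue of the non-destructive readout the abstract describes.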