We articulate a framework for using computational modeling to coherently integrate the design of science and engineering curricular experiences. We describe how this framework informs the design of the Water Runoff Challenge (WRC), a multi-week curriculum unit and modeling environment that integrates Earth science, engineering, and computational modeling for upper elementary and lower middle school students. In the WRC, students develop conceptual and computational models of surface water runoff, then use simulations incorporating their models to develop, test, and optimize solutions to the runoff problem. We conducted a classroom pilot study where we collected students’ learning artifacts and data logged from their use of the computational environment. We illustrate opportunities students had to integrate science, engineering, and computational thinking during the unit in a pair of contrasting vignettes.
Exact Inference for Relational Graphical Models with Interpreted Functions: Lifted Probabilistic Inference Modulo Theories
Probabilistic Inference Modulo Theories (PIMT) is a recent framework that expands exact inference on graphical models to use richer languages that include arithmetic, equalities, and inequalities on both integers and real numbers. In this paper, we expand PIMT to a lifted version that also processes random functions and relations. This enhancement is achieved by adapting Inversion, a method from Lifted First-Order Probabilistic Inference literature, to also be modulo theories. This results in the first algorithm for exact probabilistic inference that efficiently and simultaneously exploits random relations and functions, arithmetic, equalities and inequalities.
We present SGDPLL(T ), an algorithm that solves (among many other problems) probabilistic inference modulo theories, that is, inference problems over probabilistic models defined via a logic theory provided as a parameter (currently, propositional, equalities on discrete sorts, and inequalities, more specifically difference arithmetic, on bounded integers). While many solutions to probabilistic inference over logic representations have been proposed, SGDPLL(T ) is simultaneously (1) lifted, (2) exact and (3) modulo theories, that is, parameterized by a background logic theory. This offers a foundation for extending it to rich logic languages such as data structures and relational data. By lifted, we mean algorithms with constant complexity in the domain size (the number of values that variables can take). We also detail a solver for summations with difference arithmetic and show experimental results from a scenario in which SGDPLL(T ) is much faster than a state-of- the-art probabilistic solver.
There is increasing interest in broadening participation in computational thinking (CT) by integrating CT into precollege STEM curricula and instruction. Science, in particular, is emerging as an important discipline to support integrated learning. This highlights the need for carefully designed assessments targeting the integration of science and CT to help teachers and researchers gauge students’ proficiency with integrating the disciplines. We describe a principled design process to develop assessment tasks and rubrics that integrate concepts and practices across science, CT, and computational modeling. We conducted a pilot study with 10 high school students who responded to integrative assessment tasks as part of a physics-based computational modeling unit. Our findings indicate that the tasks and rubrics successfully elicit both Physics and CT constructs while distinguishing important aspects of proficiency related to the two disciplines. This work illustrates the promise of using such assessments formatively in integrated STEM and computing learning contexts.
Robust Speaker Recognition from Distant Speech under Real Reverberant Environments Using Speaker Embeddings
This article focuses on speaker recognition using speech acquired using a single distant or far-field microphone in an indoors environment. This study differs from the majority of speaker recognition research, which focuses on speech acquisition over short distances, such as when using a telephone handset or mobile device or far-field microphone arrays, for which beamforming can enhance distant speech signals. We use two large-scale corpora collected by retransmitting speech data in reverberant environments with multiple microphones placed at different distances. We first characterize three different speaker recognition systems ranging from a traditional universal background model (UBM) i-vector system to a state-of-the-art deep neural network (DNN) speaker embedding system with a probabilistic linear discriminant analysis (PLDA) back-end. We then assess the impact of microphone distance and placement, background noise, and loudspeaker orientation on the performance of speaker recognition system for distant speech data. We observe that the recently introduced DNN speaker embedding based systems are far more robust compared to i-vector based systems, providing a significant relative improvement of up to 54% over the baseline UBM i-vector system, and 45.5% over prior DNN-based speaker recognition technology.
Deep neural network (DNN)-based speaker embeddings have resulted in new, state-of-the-art text-independent speaker recognition technology. However, very limited effort has been made to understand DNN speaker embeddings. In this study, our aim is analyzing the behavior of the speaker recognition systems based on speaker embeddings toward different front-end features, including the standard Mel frequency cepstral coefficients (MFCC), as well as power normalized cepstral coefficients (PNCC), and perceptual linear prediction (PLP). Using a speaker recognition system based on DNN speaker embeddings and probabilistic linear discriminant analysis (PLDA), we compared different approaches to leveraging complementary information using score-, embeddings-, and feature-level combination. We report our results for Speakers in the Wild (SITW) and NIST SRE 2016 datasets. We found that first and second embeddings layers are complementary in nature. By applying score and embedding-level fusion we demonstrate relative improvements in equal error rate of 17% on NIST SRE 2016 and 10% on SITW over the baseline system.
With the recent introduction of speaker embeddings for text-independent speaker recognition, many fundamental questions require addressing in order to fast-track the development of this new era of technology. Of particular interest is the ability of the speaker embeddings network to leverage artificially degraded data at a far greater rate beyond prior technologies, even in the evaluation of naturally
degraded data. In this study, we aim to explore some of the fundamental requirements for building a good speaker embeddings extractor. We analyze the impact of voice activity detection, types of degradation,
the amount of degraded data, and number of speakers required for a good network. These aspects are analyzed over a large set of 11 conditions from 7 evaluation datasets. We lay out a set of recommendations for training the network based on the observed trends. By applying these recommendations to enhance the default recipe provided in the Kaldi toolkit, a significant gain of 13-21% on the Speakers in the Wild and NIST SRE’16 datasets is achieved.
This paper introduces the Voices Obscured in Complex Environmental Settings (VOiCES) corpus, a freely available dataset under Creative Commons BY 4.0. This dataset will promote speech and signal processing research of speech recorded by far-field microphones in noisy room conditions. Publicly available speech corpora are mostly composed of isolated speech at close-range microphony. A typical approach to better represent realistic scenarios, is to convolve clean speech with noise and simulated room response for model training. Despite these efforts, model performance degrades when tested against uncurated speech in natural conditions. For this corpus, audio was recorded in furnished rooms with background noise played in conjunction with foreground speech selected from the LibriSpeech corpus. Multiple sessions were recorded in each room to accommodate for all foreground speech-background noise combinations. Audio was recorded using twelve microphones placed throughout the room, resulting in 120 hours of audio per microphone. This work is a multi-organizational effort led by SRI International and Lab41 with the intent to push forward state-of-the-art distant microphone approaches in signal processing and speech recognition.
Multi-domain language recognition involves the application of a language identification (LID) system to identify languages in more than one domain. This problem was the focus of the recent NIST LRE 2017, and this article presents the findings from the SRI team during system development for the evaluation. Approaches found to provide robustness in multi-domain LID include a domain-and-language-weighted Gaussian backend classifier, duration-aware calibration, and a source normalized multi-resolution neural network backend. The recently developed speaker embeddings technology is also applied to the task of language recognition, showing great potential for future LID research.