The EcoCyc model-organism database collects and summarizes experimental data for Escherichia coli K-12. EcoCyc is regularly updated by the manual curation of individual database entries, such as genes, proteins, and metabolic pathways, and by the programmatic addition of results from select high-throughput analyses. Updates to the Pathway Tools software that supports EcoCyc and to the web interface that enables user access have continuously improved its usability and expanded its functionality. This article highlights recent improvements to the curated data in the areas of metabolism, transport, DNA repair, and regulation of gene expression. New and revised data analysis and visualization tools include an interactive metabolic network explorer, a circular genome viewer, and various improvements to the speed and usability of existing tools.
Information & computer science publications
Background: The Metabolic Network Explorer is a new addition to the BioCyc.org website and the Pathway Tools software suite that supports the interactive exploration of metabolic networks. Any metabolic network visualization tool must by necessity show only a subset of all possible metabolite connections, or the results will be visually overwhelming. Existing tools, even those that purport to show an organism’s full metabolic network, limit the set of displayed connections based on predefined pathways or other preselected criteria. We sought instead to provide a tool that would give the user dynamic control over which connections to follow.
Results: The Metabolic Network Explorer is an easy-to-use, web-based software tool that allows the user to specify a starting metabolite of interest and interactively explore its immediate metabolic neighborhood in either or both directions to any desired depth, letting the user select from the full set of connected reactions. Although, as for other tools, only a small portion of the metabolic network is visible at a time, that portion is selected by the user, based on the full reaction complement, and it is easy to switch among alternate paths of interest. The display is intuitive, customizable, and provides copious links to more detailed information pages.
Conclusions: The Metabolic Network Explorer fills a gap in the set of metabolic network visualization tools and complements other modes of exploration. Its primary strengths are its ease of use, diagrams that are intuitive to biologists, and its integration with the broader corpus of data provided by a BioCyc Pathway/Genome Database.
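The idea of exploring a metabolite's neighborhood in either direction to a chosen depth can be illustrated with a breadth-first traversal. The following is a minimal sketch, not the Metabolic Network Explorer's implementation: the reaction table `REACTIONS`, the function `neighborhood`, and its parameters are hypothetical names introduced here, and the sketch expands every connected reaction, whereas the tool lets the user choose which reactions to follow.

```python
from collections import deque

# Hypothetical toy network (an assumption for illustration):
# each reaction maps a list of substrates to a list of products.
REACTIONS = {
    "R1": (["glucose"], ["glucose-6-phosphate"]),
    "R2": (["glucose-6-phosphate"], ["fructose-6-phosphate"]),
    "R3": (["glucose-6-phosphate"], ["6-phosphogluconolactone"]),
    "R4": (["fructose-6-phosphate"], ["fructose-1,6-bisphosphate"]),
}


def neighborhood(start, depth, direction="forward"):
    """Return {metabolite: steps} for metabolites within `depth` reactions of `start`.

    direction="forward" follows substrate -> product; "backward" follows
    product -> substrate.
    """
    seen = {start: 0}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if seen[current] == depth:
            continue  # do not expand beyond the requested depth
        for substrates, products in REACTIONS.values():
            if direction == "forward":
                sources, targets = substrates, products
            else:
                sources, targets = products, substrates
            if current in sources:
                for metabolite in targets:
                    if metabolite not in seen:
                        seen[metabolite] = seen[current] + 1
                        queue.append(metabolite)
    return seen
```

Calling `neighborhood("glucose", 2)` on this toy table reaches the two metabolites one step beyond glucose-6-phosphate, and `direction="backward"` walks the same reactions upstream.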
Background: The diagnosis of posttraumatic stress disorder (PTSD) is usually based on clinical interviews or self‐report measures. Both approaches are subject to under- and over‐reporting of symptoms. An objective test is lacking. We have developed a classifier of PTSD based on objective speech‐marker features that discriminate PTSD cases from controls.
Methods: Speech samples were obtained from warzone‐exposed veterans, 52 cases with PTSD and 77 controls, assessed with the Clinician‐Administered PTSD Scale. Individuals with major depressive disorder (MDD) were excluded. Audio recordings of clinical interviews were used to obtain 40,526 speech features which were input to a random forest (RF) algorithm.
Results: The selected RF used 18 speech features and the receiver operating characteristic curve had an area under the curve (AUC) of 0.954. At a probability of PTSD cut point of 0.423, Youden’s index was 0.787, and overall correct classification rate was 89.1%. The probability of PTSD was higher for markers that indicated slower, more monotonous speech, less change in tonality, and less activation. Depression symptoms, alcohol use disorder, and TBI did not meet statistical tests to be considered confounders.
Conclusions: This study demonstrates that a speech‐based algorithm can objectively differentiate PTSD cases from controls. The RF classifier had a high AUC. Further validation in an independent sample and appraisal of the classifier to identify those with MDD only compared with those with PTSD comorbid with MDD is required.
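Youden's index at a probability cut point is sensitivity plus specificity minus one. The sketch below shows how such an index and the cut point that maximizes it could be computed from predicted probabilities. It is illustrative only: the labels and scores are invented toy values, not data from the study, and the function names `youden_index` and `best_cut` are assumptions introduced here.

```python
# Toy data (invented for illustration, NOT from the study):
# 1 = case, 0 = control; scores are predicted probabilities of being a case.
TOY_LABELS = [1, 1, 1, 0, 0, 0, 0]
TOY_SCORES = [0.9, 0.7, 0.4, 0.5, 0.3, 0.2, 0.1]


def youden_index(labels, scores, cut):
    """Youden's J = sensitivity + specificity - 1, classifying score >= cut as a case."""
    pairs = list(zip(labels, scores))
    tp = sum(1 for y, s in pairs if y == 1 and s >= cut)
    fn = sum(1 for y, s in pairs if y == 1 and s < cut)
    tn = sum(1 for y, s in pairs if y == 0 and s < cut)
    fp = sum(1 for y, s in pairs if y == 0 and s >= cut)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity + specificity - 1


def best_cut(labels, scores):
    """Return the observed score that maximizes Youden's J when used as the cut point."""
    candidates = sorted(set(scores))
    return max(candidates, key=lambda c: youden_index(labels, scores, c))
```

On the toy values, the cut point 0.4 classifies all three cases and three of four controls correctly, giving J = 0.75.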
We present SGDPLL(T), an algorithm that solves (among many other problems) probabilistic inference modulo theories, that is, inference problems over probabilistic models defined via a logic theory provided as a parameter (currently, propositional, equalities on discrete sorts, and inequalities, more specifically difference arithmetic, on bounded integers). While many solutions to probabilistic inference over logic representations have been proposed, SGDPLL(T) is simultaneously (1) lifted, (2) exact and (3) modulo theories, that is, parameterized by a background logic theory. This offers a foundation for extending it to rich logic languages such as data structures and relational data. By lifted, we mean algorithms with constant complexity in the domain size (the number of values that variables can take). We also detail a solver for summations with difference arithmetic and show experimental results from a scenario in which SGDPLL(T) is much faster than a state-of-the-art probabilistic solver.
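To make the notion of "lifted" concrete, the sketch below sums a two-valued weight over a bounded integer variable under a single difference-arithmetic constraint, x > y + c, without enumerating the values of x. It is a toy illustration of constant cost in the domain size, not the SGDPLL(T) algorithm or its solver; the function names and the weights `w_true` and `w_false` are assumptions introduced here.

```python
def lifted_sum(lo, hi, y, c, w_true, w_false):
    """Sum over integers x in [lo, hi] of (w_true if x > y + c else w_false).

    The satisfying values are counted symbolically, so the cost does not
    grow with the size of the domain [lo, hi].
    """
    total = hi - lo + 1
    if total <= 0:
        return 0  # empty domain
    first_true = max(lo, y + c + 1)          # smallest x with x > y + c
    n_true = max(0, hi - first_true + 1)     # how many x satisfy the constraint
    return n_true * w_true + (total - n_true) * w_false


def brute_force_sum(lo, hi, y, c, w_true, w_false):
    """Reference implementation that enumerates every value of x."""
    return sum(w_true if x > y + c else w_false for x in range(lo, hi + 1))
```

For x in [1, 10] with y = 3 and c = 2, five values satisfy x > 5, so with weights 5 and 1 both functions return 30; only the brute-force version slows down as the domain grows.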
Viral proteins evade host immune function by molecular mimicry, often achieved by short linear motifs (SLiMs) of three to ten consecutive amino acids (AAs). Motif mimicry tolerates mutations, evolves quickly to modify interactions with the host, and enables modular interactions with protein complexes. Host cells cannot easily coordinate changes to conserved motif recognition and binding interfaces under selective pressure to maintain critical signaling pathways. SLiMs offer potential for use in synthetic biology, such as better immunogens and therapies, but may also present biosecurity challenges. We survey viral uses of SLiMs to mimic host proteins, and information resources available for motif discovery. As the number of examples continues to grow, knowledge management tools are essential to help organize and compare new findings.
Interpreting changes in metabolite abundance in response to experimental treatments or disease states remains a major challenge in metabolomics. Pathway Covering is a new algorithm that takes a list of metabolites (compounds) and determines a minimum-cost set of metabolic pathways in an organism that includes (covers) all the metabolites in the list. We used five functions for assigning costs to pathways, including assigning a constant for all pathways, which yields a solution with the smallest pathway count; two methods that penalize large pathways; one that prefers pathways based on the pathway’s assigned function, and one that loosely corresponds to metabolic flux. The pathway covering set computed by the algorithm can be displayed as a multi-pathway diagram (“pathway collage”) that highlights the covered metabolites. We investigated the pathway covering algorithm by using several datasets from the Metabolomics Workbench. The algorithm is best applied to a list of metabolites with significant statistics and fold-changes with a specified direction of change for each metabolite. The pathway covering algorithm is now available within the Pathway Tools software and BioCyc website.
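Finding a minimum-cost set of pathways that covers a metabolite list is an instance of weighted set cover. The sketch below uses the standard greedy heuristic for that problem, which is a swapped-in approximation and not necessarily the method used by Pathway Covering; the pathway table, the function `greedy_pathway_cover`, and the cost callbacks are hypothetical names introduced for illustration.

```python
# Hypothetical toy pathway table (an assumption for illustration).
TOY_PATHWAYS = {
    "glycolysis": {"glucose", "pyruvate", "phosphoenolpyruvate"},
    "TCA cycle": {"citrate", "succinate", "malate"},
    "mixed acid fermentation": {"pyruvate", "succinate", "lactate"},
}


def constant_cost(name, members):
    """Every pathway costs the same, which favors a small pathway count."""
    return 1.0


def size_cost(name, members):
    """One possible way to penalize large pathways: cost grows with pathway size."""
    return float(len(members))


def greedy_pathway_cover(metabolites, pathways, cost):
    """Greedy weighted set cover.

    Repeatedly picks the pathway with the lowest cost per newly covered
    metabolite. Returns (chosen pathway names, metabolites no pathway covers).
    """
    uncovered = set(metabolites)
    chosen = []
    while uncovered:
        best, best_ratio = None, None
        for name, members in pathways.items():
            gain = len(uncovered & members)
            if gain == 0:
                continue
            ratio = cost(name, members) / gain
            if best_ratio is None or ratio < best_ratio:
                best, best_ratio = name, ratio
        if best is None:
            break  # the remaining metabolites occur in no pathway
        chosen.append(best)
        uncovered -= pathways[best]
    return chosen, uncovered
```

Swapping `constant_cost` for `size_cost` changes which pathways the heuristic prefers, mirroring the role of the different cost functions described above; the greedy result is an approximation and may cost more than a true minimum-cost cover.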
The Omics Dashboard is a software tool for interactive exploration and analysis of gene-expression datasets. The Omics Dashboard is organized as a hierarchy of cellular systems. At the highest level of the hierarchy the Dashboard contains graphical panels depicting systems such as biosynthesis, energy metabolism, regulation and central dogma. Each of those panels contains a series of X–Y plots depicting expression levels of subsystems of that panel, e.g. subsystems within the central dogma panel include transcription, translation and protein maturation and folding. The Dashboard presents a visual read-out of the expression status of cellular systems to facilitate a rapid top-down user survey of how all cellular systems are responding to a given stimulus, and to enable the user to quickly view the responses of genes within specific systems of interest. Although the Dashboard is complementary to traditional statistical methods for analysis of gene-expression data, we show how it can detect changes in gene expression that statistical techniques may overlook. We present the capabilities of the Dashboard using two case studies: the analysis of lipid production for the marine alga Thalassiosira pseudonana, and an investigation of a shift from anaerobic to aerobic growth for the bacterium Escherichia coli.
The IoT can become ubiquitous worldwide—if the pursuit of systemic trustworthiness can overcome the potential risks.
The Internet of Things (IoT) is already part of our daily lives, and will become even more so in the near future. The many characteristics that make the IoT different from traditional networked computing, such as its close interaction with the physical world, also require us to pay particular attention to making such systems safe and secure. This document is part of a series that has previously addressed how to build more secure medical devices, connected vehicles, and electric power systems. In this document, we focus on the challenges associated with composing systems, rather than building individual programs or devices. We use the concept of smart cities to illustrate how design for safety, security, and privacy must consider emergent properties, and how a system or technology designed for this domain must account for how it might be integrated, reused, or composed with other technologies and systems.