ACM-BCB 2018

Accepted Tutorials

Analysis of sequencing data

Authors: Tamar Sofer[1][2], Mariaelisa (Misa) Graff[3], [1] Program in Sleep Medicine Epidemiology, Brigham and Women’s Hospital, [2]Harvard Medical School, [3]University of North Carolina at Chapel Hill, Department of Epidemiology

Abstract: Large Whole Exome and Whole Genome Sequence (WES, WGS) data are becoming available to the biomedical community through NHLBI and NHGRI studies such as TOPMed and CCDG. These studies have each collected WGS data from more than 100,000 individuals, contributed from multiple existing epidemiological studies and representing genetically and environmentally diverse populations. Because the goal of WGS studies is to identify rare-variant associations with health outcomes, these heterogeneous data have to be analyzed together. This presents new challenges due to their inherent heterogeneity, which did not arise earlier, when genetic association studies analyzed diverse populations separately and later combined them in meta-analysis.
In this workshop, we will introduce the challenges and pitfalls in analyzing large sequencing data sets, and analysis approaches that address these challenges. At the end of the workshop, attendees will be familiar with these challenges and with a few software packages and how they address them, and will know how to use the scientific literature to form a specific plan to analyze their own data, based on its specific structure and their goals.

Workshop url: https://scholar.harvard.edu/tsofer/analysis-sequencing-data-tutorial

Modeling Macromolecular Structures and Motions: Computational Methods for Sampling and Analysis of Energy Landscapes

Authors: Kevin Molloy[1], Nasrin Akhter[1], Amarda Shehu[1], [1] Department of Computer Science, George Mason University
Abstract: With biomolecular structure recognized as central to understanding mechanisms in the cell, computational chemists and biophysicists have spent significant efforts on modeling and analyzing structure and dynamics. While significant advances have been made, particularly in the design of sophisticated energetic models and molecular representations, such efforts are experiencing diminishing returns. One of the culprits is the low exploration capability of Molecular Dynamics- and Monte Carlo-based exploration algorithms. The impasse has attracted AI researchers bringing complementary tools, such as randomized search and stochastic optimization. The objective of this tutorial is three-fold. First, the tutorial will introduce students and researchers who attend ACM-BCB to stochastic optimization treatments and methodologies for understanding and elucidating the role of structure and dynamics in the function of biomolecules. Second, the tutorial will allow attendees to connect structures, motions, and function via analysis tools that take an energy landscape view of the relationship between biomolecular structure, dynamics, and function. Third, the presentation will be enhanced via open-source software that permits hands-on exercises. One such software package is developed in the Shehu Computational Biology laboratory and allows researchers both to integrate themselves into a new research domain and to drive further research via plug-and-play capabilities. The hands-on approach in the tutorial will be beneficial to students and senior researchers keen to make their own contributions.

Agile Clinical Decision Support Development and Implementation

Authors: Mujeeb A. Basit, MD, MMSc[1], Vaishnavi Kannan, MS[1]; Duwayne L. Willett, MD, MS[1], [1]University of Texas Southwestern Medical Center, Dallas, Texas
Abstract: Designing effective Clinical Decision Support (CDS) tools in an Electronic Health Record (EHR) can prove challenging, due to complex real-world scenarios and newly-discovered requirements. Deploying new CDS tools shares much in common with new product development, where “agile” principles and practices consistently prove effective. Agile methods can thus prove helpful on CDS projects, including time-boxed “sprints” and lightweight requirements gathering with User Stories. Modeling CDS behavior promotes unambiguous shared understanding of desired behavior, but risks analysis paralysis: an Agile Modeling approach can foster effective rapid-cycle CDS design and optimization. The agile practice of automated testing for test-driven design and regression testing can be applied to CDS development using open-source tools. Ongoing monitoring of CDS behavior once released to production can identify anomalies and prompt rapid-cycle redesign to further enhance CDS effectiveness. The workshop participant will learn about these topics in interactive didactic sessions, with time for practicing the techniques taught.

Making Deep Learning Understandable for Analyzing Sequential Data about Gene Regulation

Authors: Yanjun (Jane) Qi, Ph.D.[1], [1]Department of Computer Science, School of Engineering and Applied Science, University of Virginia
Abstract: The past decade has seen a revolution in genomic technologies that enable a flood of genome-wide profiling of molecular elements on human genomes across many different tissue types. This massive-scale molecular data provides researchers with an unprecedented opportunity to understand gene regulation, which can enable new insights into principles of life, the study of diseases, and the development of treatments and drugs. Computational challenges are the major bottlenecks for comprehensive genome-wide data analysis of gene regulation. Such data sets are complex, often poorly understood, and growing at an unprecedented scale. Problems of this nature may be particularly well suited to deep learning techniques, which have recently shown impressive results across a variety of domains. This tutorial aims to provide an extensive literature review of state-of-the-art techniques in deep learning, to examine how deep learning is changing the analysis of datasets about gene regulation, and to foresee the potential of deep learning to transform several areas of biology and medicine.

Using BioDepot-workflow-Builder to create and execute reproducible bioinformatics workflows

Authors: Ka Yee Yeung[1], Ling-Hong Hung[1], Wes Lloyd[1], [1]Institute of Technology, University of Washington, Tacoma, WA, USA
Abstract: Reproducibility is essential for the verification and advancement of scientific research. To reproduce the results of computational analyses, it is often necessary to recreate not just the code but also the software and hardware environment. Software containers such as Docker, which distribute the entire computing environment, are rapidly gaining popularity in bioinformatics. Docker not only allows for the reproducible deployment of bioinformatics workflows, but also facilitates mix-and-match of components from different workflows that have complex and possibly conflicting software requirements. However, configuration and deployment of Docker, a command-line tool, can be exceedingly challenging for biomedical researchers with limited training in programming and technical skills.

Rapidly identifying disease-associated rare variants using annotation and machine learning at whole-genome scale online

Authors: Alex Kotlar[1], Thomas S. Wingo[2], [1]Department of Human Genetics, Emory University, [2]Department of Neurology, Emory University, Division of Neurology, Atlanta VA Medical Center
Abstract: Accurately selecting disease-associated alleles from large sequencing experiments remains technically challenging. During this tutorial, participants will learn how to use a new variant annotation and classification tool called Bystro (https://bystro.io/) to perform rare-variant association studies at the scale of whole-genome and whole-exome experiments. Bystro is the first online, cloud-based application that makes variant annotation and filtering accessible to all researchers for terabyte-sized whole-genome experiments containing thousands of samples. Its key innovation is a general-purpose, natural-language search engine that enables users to identify and export alleles and samples of interest in milliseconds. Participants will be shown how to 1) annotate Variant Call Format (VCF) data online using Bystro, 2) remove low-quality samples based on transition/transversion, silent/replacement, theta, and other key quality control metrics, 3) use Bystro’s search engine to remove low-quality alleles as well as sites unlikely to be associated with disease, 4) apply machine learning methods including support vector machines (SVM) and classification trees (LightGBM) to classify disease-associated alleles, and 5) perform rare-variant association tests (R/SKAT) on the filtered and quality-controlled alleles to identify disease-associated genes. Most steps will be performed online in Bystro.

Interpretable Machine Learning in Healthcare

Authors: Muhammad Aurangzeb Ahmad[1,2], Dr. Carly Eckert, M.D[1,3], Ankur Teredesai[1,2], [1]KenSci Inc. Seattle, Washington; [2]Department of Computer Science, Center for Data Science, University of Washington – Tacoma; [3]Department of Epidemiology, University of Washington
Abstract: This tutorial extensively covers the definitions, nuances, challenges, and requirements for the design of interpretable and explainable machine learning models and systems in healthcare. We discuss many use cases in which interpretable machine learning models are needed in healthcare and how they should be deployed. Additionally, we explore the landscape of recent advances that address the challenges of model interpretability in healthcare, and describe how one would go about choosing the right interpretable machine learning algorithm for a given problem in healthcare.

Important Dates

Call for	Submission Deadline	Notification of Acceptance
Papers	May 20	June 11
Workshops	March 31	April 7
Tutorials	March 31	April 7
Highlights	June 1	June 11
Posters	June 13	June 20

Hotel Reservation cutoff date August 10

News

Event photos are online

September 9, 2018

Dr. Joshua C. Denny, today's keynote speaker, slides here

August 31, 2018

There are several room changes; refer to the updated program

August 31, 2018

NIH Grant Writing Workshop slides here

August 31, 2018

Travel Grant Information posted

August 29, 2018

IMLH rescheduled to August 30th at 12PM

August 29, 2018

Accepted Papers

August 27, 2018

Program Brochure

August 24, 2018

Join Funding Panel

August 24, 2018

ACM-BCB room rate at JW Marriott is available until August 10th

August 6, 2018

Accepted Posters list

July 27, 2018

Full schedule posted

June 24, 2018

Day 1 schedule posted

June 8, 2018

Updated Camera Ready Deadline- June 30

June 8, 2018

Program Committee listed

June 4, 2018

Accepted Workshops' links added

May 22, 2018

Registration Open

May 21, 2018

Revised Highlights deadline- June 1

May 11, 2018

Accepted Workshops are listed

April 29, 2018

Accepted Tutorials are listed

April 29, 2018

Updated Paper submission deadline- May 20

April 22, 2018

Sponsorship Benefits information available

April 18, 2018

Revised Workshop proposal deadline- March 31

March 22, 2018

Revised Tutorial submission deadline- March 31

March 22, 2018

Venue information is available

February 18, 2018

Call for Papers, Workshops, Posters, Tutorials, Highlights

February 12, 2018