[Sextonlabmeeting] Fwd: Fall 2021 Syllabus for QSB 282: Bioinformatics

Jason Sexton jsexton2 at ucmerced.edu
Tue Aug 24 09:31:09 PDT 2021


---------- Forwarded message ---------
From: David Ardell <dardell at ucmerced.edu>
Date: Tue, Aug 24, 2021 at 9:03 AM
Subject: Fall 2021 Syllabus for QSB 282: Bioinformatics
To: qsb-grads <qsb-grads at ucmerced.edu>,
qsb-faculty <qsb-faculty at ucmerced.edu>


Dear QSB Faculty and Students,
Hello! Please find below the Fall 2021 syllabus for QSB 282: Bioinformatics.

At the end of the syllabus, you will find a curated list of links to books,
available freely online or through the library, that I hope you might find
useful.

Please consider registering for the course, or having your students
register for this course. If you have any questions about it, please
contact me. Welcome!

Thanks, and have a great semester.

Best wishes,
Dave Ardell
QSB 282: Bioinformatics
David H. Ardell
Fall 2021
1 Syllabus Version

This is Version 1.
2 Prerequisite

Graduate Standing.
3 Course Goals and Objectives

Welcome to Bioinformatics! This course aims to provide transferable
life-skills in computer literacy including scripting and programming,
statistics, data science and modeling useful for any biology student. In
it, we learn elements of *literacy in scientific computing* such as
theoretical descriptions of code, data and machines, theory and practice of
programming, compiling, installing and using free open-source software,
libraries, package managers, and network security specifically in the
professional UNIX-like command-prompt paradigm. The class is now centered
on the functional and interpreted language R, as well as the UNIX/Tidyverse
“pipes and filters” paradigm. It also touches on the powerful scripting
language Perl and the object-oriented language Python. The course starts
with a *science in society* component that explores the *ethical and
scientific imperatives for the publication of reproducible computational
workflows using Free, Open-Source Software (FOSS) as part of open science.*
After emphasizing the disadvantages of commercial closed-source software
for all users, especially scientists and engineers, we train in the use of
FOSS alternatives including RStudio/RMarkdown, Overleaf, LaTeX, Git and GNU
tools. We learn elements of introductory data science, statistics and
machine learning including methods to analyze, integrate, visualize, model
and simulate larger volumes of text, numerical and biological data. We
learn fundamental topics in modern statistics including analysis of
frequencies and location, experimental design, regression and
classification, goodness of fit, false discovery rates, and machine
learning concepts, emphasizing non-parametric, likelihood-based and
Bayesian approaches, with applications to functional genomics, proteomics,
phylogeny and other subjects. We learn elements of bioinformatics and
molecular evolutionary theory as part of learning how to rigorously and
reproducibly apply open-source implementations of bioinformatics algorithms
to conduct original computational analyses and syntheses of these data.
After this course you will have an advanced introductory ability to apply
the theoretical and practical foundations of bioinformatics and critically
evaluate analyses done by others. *At the end of this course, you will be
qualified to profess that you can program reproducible bioinformatic
workflows and data science projects in Git, Overleaf, LaTeX,
R/RStudio/RMarkdown, UNIX and Tidyverse Pipelines, Perl one-liners and
Regular Expressions, object-oriented programming against bioinformatics
APIs in Python, and that you are conversant in bioinformatics theory, software and
methods.* Graduate Bioinformatics QSB 282 includes *discussions on original
bioinformatics and statistics literature readings* and an original *final
project applying course concepts and methods.*
4 Theoretical Topics

Theoretical topics covered in QSB 282 encompass

   - Elements of computer science theory including:
      - fundamentals of data, descriptions and machines,
      - programming paradigms (declarative, functional, pipelined, compiled
      vs. interpreted, object-oriented),
      - regular expressions, formal languages, automata
   - Elements of molecular evolution, bioinformatics and statistics theory
   including:
      - homology and similarity
      - evolutionary distances and models of sequence evolution
      (p-distance, Poisson correction, substitution matrices, DNA evolutionary
      models),
      - bioinformatic scores and likelihoods
      - alignment by dynamic programming
      - theory and practice of pairwise and multiple sequence alignment,
      greedy algorithms such as CLUSTALW
      - distance-based methods in phylogeny including neighbor-joining and
      unsupervised machine learning methods
      - BLAST theory and practice including p-values and E-values
      - profile models of motif and other sequence families
   - Elements of statistics theory including:
      - statistical analysis of frequencies, compositions and location,
      including parametric, non-parametric and Bayesian approaches
      - genome-wide analysis of differential gene expression and proteomics
      - elements of experimental design including batch effects
      - empirical Bayes moderated statistics
      - multiple test corrections including Family Wise Error Rates and
      False Discovery Rates
      - elements of machine learning including classifiers and statistics
      for their performance including Receiver Operating Characteristic (ROC)
      curves and Area Under the Curve (AUC)
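
As a small illustration of two of the evolutionary distances named above,
the sketch below computes the p-distance (the proportion of aligned sites
that differ) and the Poisson correction d = -ln(1 - p) for a pair of
already-aligned sequences. It is not course material: the function names,
the choice to skip sites with a gap in either sequence, and the example
sequences are all assumptions made for illustration.

```python
import math


def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of compared sites that differ between two aligned sequences.

    Assumption: sites with a gap ('-') in either sequence are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue  # ignore gapped sites (illustrative choice)
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no ungapped sites to compare")
    return differing / compared


def poisson_distance(p: float) -> float:
    """Poisson-corrected distance d = -ln(1 - p), defined for 0 <= p < 1."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must satisfy 0 <= p < 1")
    return -math.log(1.0 - p)


# Hypothetical aligned peptide fragments differing at 2 of 10 sites.
p = p_distance("MKTAYIAKQR", "MKSAYLAKQR")
d = poisson_distance(p)
print(p, d)
```

The corrected distance is always at least as large as the p-distance,
because it accounts for multiple substitutions at the same site.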

5 Practical Skills

Students are guided interactively with partial solutions to gain experience
using RMarkdown/RStudio, Overleaf and Git, the UNIX command-line and UNIX
pipelines, regular expressions, elements of data science including data
cleaning and control, data integration and data visualization in UNIX and
R, package managers, compiling open-source software, Perl one-liners and
Bash scripting, bioinformatics with open-source software such as CodonW,
BLAST, PSI-BLAST, CLUSTALW, PUZZLE, BioNJ and HMMer, parsing with
Biopython, and analysis of gene expression data in R using Voom, limma and
R-based implementations of GSEA.
6 Student Learning Outcomes

At the conclusion of QSB 282, we aim to provide students with the ability
to:

   1. *use UNIX-like command-line computing environments* to manipulate
   files, SSH/SCP to securely transfer data between servers and computers over
   the internet, as well as to understand and apply basic concepts about
   computer architecture, networks and security
   2. *compile and install free, open-source software (FOSS)* and libraries
   in UNIX and *understand why open-source software and data formats
   benefit the scientific enterprise*
   3. *creatively manipulate and integrate text, numerical, and biological
   data* using UNIX command-line tools in novel combinations including Perl
   one-liners, Python object-oriented APIs, and R.
   4. acquire biological data from public databases and rigorously and
   *reproducibly apply open-source implementations of bioinformatics algorithms
   to conduct original computational analyses and syntheses* of these data,
   including alignment, homology search, phylogeny, and differential gene
   expression analysis.
   5. understand the theoretical bases, assumptions, and quantities of
   different bioinformatics algorithms, concepts, and statistics and apply
   them in bioinformatic treatment of data and *critically evaluate
   analyses done by others*
   6. critically read and evaluate bioinformatics and statistics primary
   literature, and *write publication-quality bioinformatics and statistical
   methods, results and conclusions*
   7. conduct an *original directed but independent research project* that
   applies and synthesizes the concepts and techniques of the course.

7 Required Materials

   1. A personal computer. *Your computer should have at least 10 GB free on
   its hard drive, be updated with the latest operating system updates, and be
   backed-up to an external hard-drive in the event of data loss.*
   2. Internet access.
   3. A VPN Client <https://it.ucmerced.edu/VPN_Changeover> to access
   primary and secondary scientific literature available through the library,
   while off-campus.

8 Required and Supplemental Course Readings

Discussion write-ups and lab assignments will require students to access
primary and secondary scientific literature as well as eBooks available
either freely on the internet or through the library (see above about
installing a VPN client to access readings from off-campus).

All students are expected to supplement lecture slides with *independent
reading* from the excellent *O’Reilly Online Learning* resource available
for free through the library. Suggested titles and specific assigned
readings are listed below. The value of this resource for independent
learning and reference cannot be overemphasized!

*To access O’Reilly Online Learning readings, sign in through its special
portal at https://www.oreilly.com/library/view/temporary-access/
<https://www.oreilly.com/library/view/temporary-access/>.*
9 Course Website

The course website for QSB 282 is available through UC Merced CatCourses
<https://catcourses.ucmerced.edu/courses/22061>
10 Course Policies
10.1 Assignment and Lateness Policy

All assignments (including lab assignments, discussion write-ups and the
final course project proposal, write-up and oral presentation) must be
completed by end of term in order to achieve a passing grade of B- or
better in QSB 282. Late assignments receive full credit.
10.2 Attendance Policy

*Showing up is 80 percent of life* – Woody Allen, via Marshall Brickman
<http://quoteinvestigator.com/2013/06/10/showing-up/#note-6553-1>

*Attendance is mandatory in labs and discussions.* A roll call will be
taken and contribute to your final grade in the class. I recognize that the
experience students bring to this class is highly variable, and that at
least some parts of lab assignments may be easy to complete and not require
any help. Nonetheless, I ask you to please help each other in lab to
complete the lab assignments, because we humans have evolved to best learn
new techniques and new languages socially, in groups. If you find yourself
often in the helper role, then please consider this class to be part of
your graduate pedagogical training, and thank you in advance.

Absences from lab and discussion will be excused with full credit as
follows:

   1. You may have a planned excused absence (or arranged permanent excuse
   for lateness or leaving early) related to professional obligations (such
   as conference attendance), course conflicts or state- or
   federally-accepted religious observances. Please contact me by email in
   advance to be excused.
   2. You may have an unplanned absence due to a serious illness or crisis.
   Please contact me by email to be excused.

Unexcused late attendance will be worth 80% of on-time attendance.
10.3 COVID Policy

We are all humans stuck in the middle of a pandemic. I will maintain
reasonable flexibility in the face of the ongoing challenges posed by this,
in accordance with campus policies. Please contact me if you need help.
10.4 Grading Policy

Graduate students may take QSB 282 for either a letter grade or on an S/U
grading basis. To achieve an S grade they must perform at the level of at
least a “B” in the grading scale below and complete all assignments and the
independent research project to a professional standard of scholarship and
thoroughness.
Percentage of Final Grade    Assessment
*45%*                        lab assignments
*10%*                        discussion write-ups
*10%*                        attendance and participation in lab and discussion
*35%*                        final project proposal, final report and oral presentation

This table shows the minimum course points achieved that will guarantee the
corresponding grade. Information on grade appeals, incompletes, etc. can be
found in the UC Merced Grading Policy available from the Registrar.
Grade        % Total Points Achieved
A-, A, A+    85%, 90%, 95%
B-, B, B+    70%, 75%, 80%
C-, C, C+    60%, 63%, 67%
D-, D, D+    50%, 53%, 57%
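
To make the arithmetic concrete, here is a minimal sketch of how the
weights and minimum cutoffs above could be combined into a guaranteed
letter grade. It is an illustration only, not the official grade
calculation: the function and variable names are hypothetical, the
component scores are assumed to be percentages from 0 to 100, and the "F"
returned below 50% is an assumption, since the table does not list a grade
below D-.

```python
# Weights in percent of the final grade, taken from the assessment table.
WEIGHTS = {"lab": 45, "discussion": 10, "attendance": 10, "project": 35}

# Minimum total percentage that guarantees each grade, highest first.
CUTOFFS = [
    (95, "A+"), (90, "A"), (85, "A-"),
    (80, "B+"), (75, "B"), (70, "B-"),
    (67, "C+"), (63, "C"), (60, "C-"),
    (57, "D+"), (53, "D"), (50, "D-"),
]


def final_percentage(scores: dict) -> float:
    """Weighted total, given each component score as a percentage (0-100)."""
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing component scores: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100


def guaranteed_grade(percentage: float) -> str:
    """Letter grade guaranteed by the table for a given total percentage."""
    for cutoff, letter in CUTOFFS:
        if percentage >= cutoff:
            return letter
    return "F"  # assumption: the syllabus table stops at D-


# Hypothetical student scores, one per assessment category.
example = {"lab": 90, "discussion": 80, "attendance": 100, "project": 85}
total = final_percentage(example)
print(total, guaranteed_grade(total))  # 88.25 A-
```

Because the table gives guaranteed minimums, a lookup like this yields a
lower bound on the letter grade rather than a prediction of it.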
10.5 Student Disability Services

UC Merced is committed to making our courses accessible to all students,
including students with limited mobility, impaired hearing or vision, and
learning disabilities. Any student who feels they may need an accommodation
based on the impact of a disability should contact me privately to discuss
specific needs, and please also contact Disability Services at (209)
228-6996 as soon as possible to become registered and ensure that such
accommodations are implemented in a timely fashion.
10.6 Academic Integrity/Cheating

You may not share your worked assignments directly with other students. You
may not copy homework or lab assignments from other students. Your work
must be original and fulfill the professional scientific standards of
publication and academic ethical standards commonly applied at UC Merced.
You may not make up data or source references, you may not make false
excuses to get extensions of time or recuse yourself from assignments, and
you must comply with exam and assignment instructions. *You may help others on
assignments but you may not provide a copy of your original work to another
student in whole or in part.* If you violate these rules you risk losing
good standing in QSB.
11 Class Schedule
[image: Class Calendar for QSB 282: Bioinformatics]

11.1 Week 01, 08/23 - 08/27: Data Science I: Workspace for Reproducible
Scientific Computing
11.1.1 Lectures: Course Intro., Intro. to RMarkdown and other Markup
Languages
11.1.2 Lab: Start — Install UNIX, R/RStudio, LaTeX and Git
11.1.3 Discussion: Reproducibility in Scientific Computing
11.2 Week 02, 08/30 - 09/03: Data Science II: Communicating and
Collaborating with RStudio, Overleaf and Git
11.3 Week 03, 09/06 - 09/10: Data Science III: Installing and Compiling FOSS
11.4 Week 04, 09/13 - 09/17: Data Science IV: Computational Pipelines in
UNIX and R
11.5 Week 05, 09/20 - 09/24: Data Science V: Data Integration and
Visualization in UNIX and R
11.6 Week 06, 09/27 - 10/01: Bioinformatics I: Analysis of Frequencies and
Compositions
11.7 Week 07, 10/04 - 10/08: Bioinformatics II: Homology, Similarity and
Evolutionary Distance

Final Project Proposals Due
11.8 Week 08, 10/11 - 10/15: Bioinformatics III: Substitution and Score
Matrices

Individual Meetings to Discuss Final Projects
11.9 Week 09, 10/18 - 10/22: Bioinformatics IV: Pairwise and Multiple
Alignment
11.10 Week 10, 10/25 - 10/29: Bioinformatics V: BLAST Theory
11.11 Week 11, 11/01 - 11/05: Bioinformatics VI: Automating Genome-Wide
Analysis
11.12 Week 12, 11/08 - 11/12: Machine Learning I: Classifiers and Motif
Analysis
11.13 Week 13, 11/15 - 11/19: Machine Learning II: Distance-based
Clustering, Phylogeny and Dimensionality Reduction
11.14 Week 14, 11/22 - 11/26: Machine Learning III: Analysis of Location
11.15 Week 15, 11/29 - 12/03: Machine Learning IV: Linear Models and
Experimental Designs in Gene Expression Analysis
11.16 Week 16, 12/06 - 12/10: Machine Learning V: Pathway Analysis

Final Project Oral Presentations on Saturday Morning
12 Recommended and Supplemental References

Please use these references to support your learning in this course.
References from O’Reilly require you to sign in through its special portal
at https://www.oreilly.com/library/view/temporary-access/. References
through “EBookCentral” may require you to sign in by VPN to access through
the library while off-campus, see
https://it.ucmerced.edu/VPN_Changeover. *Through
these portals, all references with URLs given below are freely available.*
12.1 Computer Science and Programming

   1. *UNIX and Perl to the Rescue!: A Field Guide for the Life Sciences
   (and Other Data-rich Pursuits)* by Drs Keith Bradnam, Michelle Gill and
   Ian Korf. (CUP ISBN 978-0521169820).
   http://korflab.ucdavis.edu/Unix_and_Perl/current.pdf and other resources
   available at https://rescuedbycode.com
   2. *Command-Line Essentials Playlist* by Daniel J. Barrett
   https://learning.oreilly.com/playlists/6b0ba469-d706-45a0-ae95-05560a7ef529/
   3. *Unix Power Tools*, 3rd Edition by Jerry Peek, Shelley Powers, Tim
   O’Reilly, Mike Loukides
   https://learning.oreilly.com/library/view/unix-power-tools/0596003307/
   4. *Learning R* by Richard Cotton (O’Reilly Media,
   Inc. 978-1-4493-5710-8).
   https://learning.oreilly.com/library/view/learning-r/9781449357160/
   5. *Dynamic Documents with R and knitr*, 2nd Edition by Yihui Xie
   https://learning.oreilly.com/library/view/dynamic-documents-with/9781315360706/
   6. *Regular Expressions Cookbook*, 2nd Edition by Jan Goyvaerts, Steven
   Levithan
   https://learning.oreilly.com/library/view/regular-expressions-cookbook/9781449327453/
   7. *Understanding Computation*, by Tom Stuart
   https://learning.oreilly.com/library/view/understanding-computation/9781449330071/
   8. *Three Ways To Learn Python Playlist* by Tim O’Reilly
   https://learning.oreilly.com/playlists/2c6094f3-06c1-413f-aacd-a4d8c437d826/
   9. *Fluent Python*, 2nd Edition by Luciano Ramalho
   https://learning.oreilly.com/library/view/fluent-python-2nd/9781492056348/

12.2 Statistics

   1. *Modern Statistics for Modern Biology* by Susan Holmes and Wolfgang
   Huber (FREE BOOK AVAILABLE ONLINE)
   http://web.stanford.edu/class/bios221/book/
   2. *Statistical Rethinking : A Bayesian Course with Examples in R and
   STAN*, 2nd Ed. by Richard McElreath
   https://ebookcentral.proquest.com/lib/ucm/reader.action?docID=6133700
   3. *Large-Scale Inference : Empirical Bayes Methods for Estimation,
   Testing, and Prediction* by Bradley Efron
   https://ebookcentral.proquest.com/lib/ucm/detail.action?pq-origsite=primo&docID=585354
   4. *Bayesian Data Analysis* by Gelman et al. 3rd Edition. Available as
   e-book from
   https://ebookcentral.proquest.com/lib/ucm/detail.action?docID=1438153
   5. *Nonparametric Statistical Methods* 2nd Ed. M. Hollander, Douglas A.
   Wolfe and Eric Chicken.
   https://ebookcentral.proquest.com/lib/ucm/detail.action?docID=1550549
   6. *The Analysis of Biological Data* by Whitlock and Schluter, 2nd. Ed.
   The modern recommended commercial textbook
   7. *Biometry* by Sokal and Rohlf, 3rd Ed. (1995) or 4th Ed. (2012). New
   York. W.H. Freeman. The old-school classic.
   8. *Biostatistical Analysis* by J.H. Zar, 5th Ed. Another classic
   with unique coverage of circular statistics and other topics.
   9. *An Introduction to the Bootstrap* by Efron and Tibshirani (1993)
   Chapman and Hall. Brad Efron is the inventor of the bootstrap and innovator
   of empirical Bayes methods, see above. This is a great learning text
   covering basic concepts of statistics from a fresh angle.

12.3 Data Science and Machine Learning

   1. *Data Science at the Command Line*, 2nd Ed by Jeroen Janssens
   https://learning.oreilly.com/library/view/data-science-at/9781492087908/
   2. *Hands-On Machine Learning with Scikit-Learn and TensorFlow* by
   Aurélien Géron
   https://learning.oreilly.com/library/view/hands-on-machine-learning/9781491962282/
   3. *An Introduction to Statistical Learning with Applications in R* by
   James, Witten, Hastie, and Tibshirani, Springer,
   https://www.springer.com/us/book/9781461471370
   4. *The Elements of Statistical Learning: Data Mining, Inference, and
   Prediction* by Hastie, Tibshirani, and Friedman, 2nd. Ed. Springer,
   available for free at http://statweb.stanford.edu/~tibs/ElemStatLearn/
   5. *R for Data Science* by Hadley Wickham; Garrett Grolemund (O’Reilly
   Media, Inc. 978-1-4919-1039-9)
   https://learning.oreilly.com/library/view/r-for-data/9781491910382/
   6. *Fundamentals of Data Visualization* by Claus Wilke (O’Reilly Media,
   Inc., 978-1-4920-3108-6)
   https://learning.oreilly.com/library/view/fundamentals-of-data/9781492031079/

12.4 Bioinformatics and Molecular Evolution

   1. *Sequence - Evolution - Function: Computational Approaches in
   Comparative Genomics* by Koonin and Galperin (2003) —
   https://www.ncbi.nlm.nih.gov/books/NBK20259/
   2. Sections from online help resources of NCBI such as the NCBI
   Handbook, Tutorials, and Howtos available at
   http://www.ncbi.nlm.nih.gov/guide/training-tutorials/
   3. *Bioinformatics Data Skills* by Vince Buffalo
   https://learning.oreilly.com/library/view/bioinformatics-data-skills/9781449367480/
   4. *BLAST* by Ian Korf, Mark Yandell, and Joseph Bedell —
   https://learning.oreilly.com/library/view/blast/0596002998/
   5. *R Bioinformatics Cookbook* by Dan MacLean
   https://learning.oreilly.com/library/view/r-bioinformatics-cookbook/9781789950694/
   6. *Mastering Python for Bioinformatics* by Ken Youens-Clark
   https://learning.oreilly.com/library/view/mastering-python-for/9781098100872/
   7. *Bioinformatics and Molecular Evolution* by Higgs and Attwood
   (Blackwell; ISBN978140510683)
   8. *Biological Sequence Analysis* by Durbin et al. (Cambridge;
   ISBN9780521629713)
   9. *An Introduction to Bioinformatics Algorithms* by Jones and Pevzner
   (MIT Press; ISBN9780262101066)



-- 
Jason (Jay) Sexton
(he/him/his)
Associate Professor
Department of Life and Environmental Sciences
University of California, Merced
231 Science and Engineering Building 1
jsexton2 at ucmerced.edu
http://sextonlab.ucmerced.edu/

---Nature does not hurry, yet everything is accomplished ~ Lao Tzu---

---We are quite literally air, water, soil, energy and other living
creatures ~ David Suzuki---