basics
A kind reader pointed out that I frequently mention algorithms, but that I haven't defined them in the basics posts. To me, they're so fundamental to the stuff I do for a living that I completely forgot that they're an interesting idea.
It's really pretty simple: an algorithm is a description of a mechanical procedure for performing a mathematical task. But that's actually a fairly sophisticated idea - the general concept of things that describe a procedure, and which can be discussed, described, and reasoned about as entities in their own right is something quite different from nearly…
How do microbiologists determine which microbe caused a disease?
As Tara has eloquently described (I, II), we are covered with bacteria and other microbes. A reasonable question, then, is: when we get sick, how do we know which little devil deserves the blame?
In many cases, pathogens (disease-causing organisms) are identified by a common series of steps, known as Koch's postulates. Robert Koch described these steps in 1876 when he used them to prove that Bacillus anthracis was the cause of anthrax. During the past century, his steps have been used successfully many times.
Koch's steps are…
Yet another term that we frequently hear, but which is often not properly understood, is the concept of optimization. What is optimization? And how does it work?
The idea of optimization is quite simple. You have some complex situation, where
some variable of interest (called the target) is based on a complex
relationship with some other variables. Optimization is the process of trying to find
an assignment of values to the other variables (called parameters) that produces a maximum or minimum value of the target variable, called
the optimum or optimal value.
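To make that definition concrete, here is a minimal sketch of the crudest optimization method there is: try a grid of parameter values and keep the assignment that gives the largest target. The `target` function, the parameter range, and the name `grid_search` are all made up for illustration; real optimization problems rarely let you get away with brute force like this.

```python
# Hypothetical target: an illustrative function chosen for this sketch.
def target(x: float, y: float) -> float:
    """Target variable as a function of two parameters; largest at x=1, y=-2."""
    return -(x - 1.0) ** 2 - (y + 2.0) ** 2 + 5.0


def grid_search(lo: float, hi: float, steps: int):
    """Evaluate the target on an evenly spaced grid and keep the best point."""
    best_params, best_value = None, float("-inf")
    for i in range(steps + 1):
        x = lo + (hi - lo) * i / steps
        for j in range(steps + 1):
            y = lo + (hi - lo) * j / steps
            value = target(x, y)
            if value > best_value:
                # A better assignment of parameter values was found.
                best_params, best_value = (x, y), value
    return best_params, best_value


best_params, best_value = grid_search(-5.0, 5.0, 100)
print("parameters:", best_params, "optimum:", best_value)
```

Here `best_params` is the assignment of values to the parameters, and `best_value` is the optimum (a maximum, in this case). To look for a minimum instead, flip the comparison.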
The practice of optimization…
In yesterday's basics post, I alluded to the second kind of calculus - the thing that computer scientists like me call a calculus. Multiple people have asked me to explain what our kind of calculus is.
In the worlds of computer science and logic, calculus isn't a particular thing:
it's a kind of thing. A calculus is a sort of a logician's automaton: a purely
symbolic system where there is a set of rules about how to perform transformations of
any valid string of symbols. The classic example is
lambda calculus,
which I've…
Calculus is one of the things that's considered terrifying by most people. In fact, I'm sure a lot of people will consider me insane for trying to write a "basics" post about something like calculus. But I'm not going to try to teach you calculus - I'm just going to try to explain very roughly what it means and what it's for.
There are actually two different things that we call calculus - but most people are only aware of one of them. There's the standard pairing of differential and integral calculus; and then there's what we computer science geeks call a calculus. In this post, I'm only…
One of the fundamental branches of modern math - differential and integral calculus - is based on the concept of limits. In some ways, limits are a very intuitive concept - but the formalism of limits can be extremely confusing to many people.
Limits are basically a tool that allows us to get a handle on certain kinds
of equations or series that involve some kind of infinity, or some kind of value that is almost defined. The informal idea is very simple; the formalism is also pretty simple, but it's often obscured by so much jargon that it's hard to relate it to the intuition.
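As a rough numeric illustration of a value that is "almost defined", take sin(x)/x. It has no value at x = 0 (you'd be computing 0/0), but the values it takes near 0 settle down toward 1. The choice of this particular function, and the names `f` and `approach`, are mine, just for the sketch.

```python
import math


def f(x: float) -> float:
    """sin(x)/x: fine everywhere except x = 0, where it is 0/0."""
    return math.sin(x) / x


# Evaluate f at points that get closer and closer to 0 without ever reaching it.
approach = [(10.0 ** -k, f(10.0 ** -k)) for k in range(1, 7)]
for x, value in approach:
    print(f"x = {x:.0e}   f(x) = {value:.15f}")

# At the point itself there is nothing to compute.
try:
    f(0.0)
except ZeroDivisionError:
    print("f(0) is undefined")
```

The printed values creep toward 1 as x shrinks. A table like this is only evidence, not a proof; the formalism of limits is what turns "creeps toward" into a precise statement.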
The use of…
Basics: Algebra
While I was writing the vectors post, when I commented about how math geeks always build algebras around things, I realized that I hadn't yet written a basics post explaining what we mean by algebra. And since it isn't really what most people think it is, it's definitely worth taking the time to look at.
Algebra is the mathematical study of a particular kind of structure: a structure created by taking a set of (usually numeric) values, and combining it with some operations that operate on values of the set.
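Here's a small sketch of that idea: a set of values, an operation on them, and a check that the operation never leaves the set (closure, which is one of the things structures like groups require). The set {0, 1, 2, 3, 4}, the operation `add_mod5`, and the helper `is_closed` are all hypothetical choices for illustration.

```python
from itertools import product

# The set of values: a hypothetical example, the integers 0 through 4.
values = set(range(5))


def add_mod5(a: int, b: int) -> int:
    """An operation on the set: addition that wraps around at 5."""
    return (a + b) % 5


def is_closed(values, op) -> bool:
    """True if applying op to any two values of the set gives a value of the set."""
    return all(op(a, b) in values for a, b in product(values, repeat=2))


closed_mod5 = is_closed(values, add_mod5)
closed_plain = is_closed(values, lambda a, b: a + b)
print("addition mod 5 stays in the set:", closed_mod5)
print("ordinary addition stays in the set:", closed_plain)
```

The same set behaves very differently depending on which operation you pair it with: wrap-around addition stays inside the set, while ordinary addition escapes it as soon as you add 4 and 4. That pairing of set and operation is the structure being studied.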
One of the simplest examples of a kind of algebra is a simple group. A…
There's another way of working with number-like things that have multiple dimensions in math, which is very different from the complex number family: vectors. Vectors are much more intuitive to most people than the complex numbers, which are built using the problematic number i.
A vector is a simple thing: it's a number with a direction. A car can be going 20mph north - 20mph north is a vector quantity. A 1 kilogram object experiences a force of 9.8 newtons straight down - 9.8 N down is a vector quantity.
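A quick sketch of "a number with a direction" in code, using the 20mph-north car. The `Vector2` class, its east/north components, and the compass-bearing convention (degrees clockwise from north) are assumptions I'm making for the example; the second vector is invented to show that direction matters when you combine them.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Vector2:
    """A 2D vector stored as east and north components (hypothetical layout)."""
    east: float
    north: float

    @classmethod
    def from_bearing(cls, magnitude: float, bearing_deg: float) -> "Vector2":
        """Build a vector from a size and a compass bearing (clockwise from north)."""
        theta = math.radians(bearing_deg)
        return cls(magnitude * math.sin(theta), magnitude * math.cos(theta))

    def magnitude(self) -> float:
        """The size of the vector, ignoring its direction."""
        return math.hypot(self.east, self.north)

    def __add__(self, other: "Vector2") -> "Vector2":
        """Vectors add component by component."""
        return Vector2(self.east + other.east, self.north + other.north)


car = Vector2.from_bearing(20.0, 0.0)     # 20 mph north
drift = Vector2.from_bearing(20.0, 90.0)  # a made-up 20 mph push to the east
combined = car + drift
print(car, combined.magnitude())
```

Two 20mph vectors at right angles combine into something a bit over 28mph, not 40mph. That is the difference between adding vectors and adding plain numbers.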
To be precise about the definition, a vector is a quantity qualitatively…
"Beware the Jabberwock, my son!
The jaws that bite, the claws that catch!
Beware the Jubjub bird, and shun
The frumious Bandersnatch!"
- from Jabberwocky, by Lewis Carroll
I'm certain that if we ever sequenced DNA from the frumious Bandersnatch it would match hypothetical and putative proteins.
Why?
Because we always (well, almost always) get matches to hypothetical and putative proteins when we do a database search with a protein sequence.
Why?
Because many of the protein sequences in GenBank (at the NCBI) are a result of conceptual translations.
What? !!
A conceptual translation…
When we think of numbers, our intuitive sense is to think of them in terms of
quantity: counting, measuring, or comparing quantities. And that's a good intuition for real numbers. But when you start working with more advanced math,
you find out that those numbers - the real numbers - are just a part of the picture. There's more to numbers than just quantity.
As soon as you start doing things like algebra, you start to realize that
there's more to numbers than just the reals. The reals are limited - they exist
in one dimension. And that just isn't enough.
In terms of algebra - we know that…
Many people would probably say that things like computability and the halting
problem aren't basics. But I disagree: many of our basic intuitions about numbers and
the things that we can do with them are actually deeply connected with the limits of
computation. This connection of intuition with computation is an extremely important
one, and so I think people should have at least a passing familiarity with it.
In addition to that, one of the recent trends in crappy arguments from creationists is to try to invoke ideas about computation in misleading ways - but if you're familiar with what…
How to win the X PRIZE in genomics
In October, 2006, the X PRIZE foundation announced that the second X PRIZE would focus on genomics. The first team to successfully sequence 100 human genomes in 10 days will win $10 million.
And I would venture to guess that the winning team would also win in the IP (intellectual property) game and the genetic testing market, since they will gain an unprecedented look at genetic variation.
But when is done really done?
The first trick is defining what it means to be done. My husband says that "a sequencing project is done when the people who are…
What are the real numbers?
Before I go into detail, I need to say up front that I hate the term
real number. It implies that other kinds of numbers are not real,
which is silly, annoying, and frustrating. But we're pretty much stuck with it.
There are a few ways of describing the real numbers. I'm going to take you through three of them: first, an informal intuitive description; then an axiomatic definition; and finally, a constructive definition.
The Reals, Informally
The informal, intuitive description is the basic number line. Think about
a line, that goes on forever in…
As long as I'm doing all of these basics posts, I thought it would be worth
explaining just what a Turing machine is. I frequently talk about things
being Turing equivalent, and about effective computing systems, and similar things, which all assume you have some clue of what a Turing machine is. And as a bonus, I'm also going to give you a nifty little piece of Haskell source code that's a very basic Turing machine interpreter. (It's for a future entry in the Haskell posts, and it's not entirely finished, but it does work!)
The Turing machine is a very simple kind of theoretical computing…
Vizzini: He didn't fall? Inconceivable!
Inigo: You keep using that word. I do not think it means what you think it means.
- William Goldman, The Princess Bride
Excuse me while I temporarily interrupt the genome sequencing series to define a word.
Artifacts in the classroom
It's disorienting. You learn a word in a certain context. You're sure of its meaning and then you end up in a situation where people use the word in a completely unexpected way and no one else seems bothered by this!
I had this happen once with the word "artifact." I had organized a conference and some workshop…
To the ancient Greeks, a chimera was a kind of monster, with the body of a goat, the tail of a dragon, and a lion's head. To geneticists, a chimera can be an animal that's derived from two embryos, such as a transgenic mouse. Or if the organism is a plant, it can be a plant with a graft. We have a chimeric cherry tree in our back yard with branches from Rainier cherries, Bing cherries, and Van cherries. And you should see the chimeras that hang out at evolgen.
Naturally, the DNA cloning and sequencing world has its chimeras, too. There are two main kinds that I know. Sometimes chimeras…
Sets are truly amazing things. In the history of mathematics, they're
a remarkably recent invention - and yet, they're now considered to be the
fundamental basis on which virtually all of mathematics is built. From simple things (like the natural numbers), to the most abstract and esoteric things (like algebras, or topologies, or categories), in modern math, they're pretty much all understood
in terms of sets.
So what is a set? A set is really just an abstract way of talking about a
collection of distinct things. Really, in the simplest version of set theory,
that's it. Such a simple…
The general steps in genome sequencing were presented in the earlier installments (there are links at the bottom of the page), but it's worth repeating them since each of the earlier steps has a bearing on the outcome of those that come later.
These are:
Break the genome into lots of small pieces at random positions.
Determine the sequence of each small piece of DNA.
Use an assembly program to figure out which pieces fit together.
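The three steps above can be sketched as a toy program. This is only an illustration under heavy assumptions: the reads are error-free, all the same length, and the "assembly program" is a naive greedy merge of the best-overlapping pair, which I've swapped in because it's short, not because real assemblers work this way. The genome string, the function names, and the parameters are all hypothetical.

```python
import random


def shotgun_reads(genome: str, read_len: int, n_reads: int, rng: random.Random):
    """Steps 1 and 2, idealised: take reads starting at random positions."""
    starts = [rng.randrange(len(genome) - read_len + 1) for _ in range(n_reads)]
    return [genome[s:s + read_len] for s in starts]


def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of a that is a prefix of b (0 if below min_len)."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def drop_contained(contigs):
    """Remove duplicates and any piece that sits wholly inside another piece."""
    unique = list(dict.fromkeys(contigs))
    return [c for c in unique if not any(c != d and c in d for d in unique)]


def assemble(reads, min_len: int = 3):
    """Step 3: repeatedly merge the pair of pieces with the longest overlap."""
    contigs = drop_contained(reads)
    while True:
        best_n, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:
            return contigs  # nothing left that fits together
        merged = contigs[best_i] + contigs[best_j][best_n:]
        rest = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs = drop_contained(rest + [merged])


genome = "ATGCGTACCTGA"  # made-up sequence for the demo
reads = shotgun_reads(genome, read_len=6, n_reads=20, rng=random.Random(0))
print(assemble(reads))
```

Depending on where the random breakpoints land, the result may be the whole sequence or several disconnected pieces. If no read happens to span a region, nothing downstream can recover it.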
That first step, making a collection of DNA fragments (a library), with breakpoints at random positions is of critical importance to the success of later…
"How much do I love you?
I'll tell you no lie.
How deep is the ocean?
How high is the sky?"
- Irving Berlin
The other installments are here:
Part I: Introduction
Part II: Sequencing strategies
Part III: Reads and chromats
Part V: checking out the library
We all know that sequencing a genome must be a lot of work. But unlike love, it is something we can measure. In fact, an important part of genome sequencing is estimating just how much work needs to be done. This is especially important if you're the one paying for it or the one writing the grant proposal.
Coverage depth: or why do we…
Another great basics topic, which came up in the comments from last Friday's "logic" post, is the
difference between syntax and semantics. This is an important distinction, made in logic, math, and
computer science.
The short version of it is: syntax is what a language looks like; semantics is what
a language means. It's basically the distinction between numerals (syntax) and
numbers (semantics).
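The numeral/number distinction is easy to see in code. In this small sketch, four different strings of symbols (syntax) are read under four different bases and all turn out to denote the same number (semantics). The particular numerals are just examples I picked.

```python
# Each entry is a numeral (a string of symbols) plus the base it is written in.
numerals = [("16", 10), ("10", 16), ("10000", 2), ("20", 8)]

# int(text, base) maps the syntax to the number it means.
numbers = [int(text, base) for text, base in numerals]

for (text, base), number in zip(numerals, numbers):
    print(f"numeral {text!r} in base {base} means the number {number}")

same_meaning = len(set(numbers)) == 1
different_syntax = len({text for text, _ in numerals}) == len(numerals)
print(same_meaning, different_syntax)
```

The strings are all different, but the number they mean is the same. Note also that "10" on its own has no meaning until you say which base's rules you're reading it with.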
In terms of logic, the syntax is a description of what a valid statement looks like: what the pieces of a statement are, and all of the different ways that the pieces can get put together to
form…