Digital Biology Friday: What was that gene anyway?

By sporte on July 21, 2006.

Welcome back!

If you've just joined us, we're in the middle of a quest to find the identity of an unknown nucleotide sequence. To summarize our results so far: we used this sequence as the query in a blastn search of GenBank, with all the default settings at the NCBI. You can see the beginning of the project here.

And we had some rather curious results.

Our sequence appeared to match sequences from very diverse organisms, such as Dengue virus, E. coli, and simian immunodeficiency virus. Very strange!

There was also a curious word that appeared in the descriptions of all the results.

That word was VECTOR. "Vector" is a word that I imagine Sherlock Holmes would have used if he wanted to interrogate a scientist or mathematician and find out what they did without having them realize that he was trying to do so.

To a mathematician or a physicist, a vector is a quantity with both magnitude and direction, often drawn as an arrow. To a public health official, a vector is a rat, mouse, louse, or insect: anything capable of carrying a disease.

And, to a molecular biologist, a vector can be a plasmid, phage, or eukaryotic virus that is used to move genes from place to place. This information can help us make some good guesses about the function of our unknown bit of DNA, because vectors have been engineered to share some common features. Some of these are special DNA sequences, such as origins of replication, that allow plasmids to be copied. Others are genes that encode enzymes that make bacteria resistant to different antibiotics. If a bacterial cell contains a plasmid with one of these antibiotic resistance genes, it produces a protein that allows it to live in the presence of the antibiotic. These features are helpful for biologists because we can grow bacteria in the presence of a drug: the cells carrying the plasmid survive, and all the rest are killed.

Okay, where were we?

Back to our results:

Here is our list of matching sequences from the blastn search. We had some good guesses last week about the answer, and one was right, but it involved far too much work.

I think it's far easier to look at the data.

Here's how.

We click the link to the alignment score.

This shows us where our sequences match each other. Pay attention to the positions in the subject sequence that match our query! We need to remember them. Our sequence starts matching at position 44,246 and stops matching at position 44,665.

Then we click the link to the matching sequence and scroll down the page.

Eventually, we reach numbers. These numbers represent positions in the DNA sequence.
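In a GenBank flat file, positions show up both in the feature table (as ranges such as `start..end`) and in the ORIGIN section at the end of the record, where each line of sequence begins with the number of its first base, followed by the bases in groups of ten. As a minimal sketch, assuming you've saved just the ORIGIN section as text, this Python pulls out the bases and slices them using GenBank's 1-based, inclusive numbering. The function names and the toy sequence are made up for illustration.

```python
def parse_origin(origin_text: str) -> str:
    """Join the sequence lines of a GenBank ORIGIN section into one string.

    Each sequence line starts with the 1-based position of its first base,
    followed by the bases in blocks of 10; the numbers and spaces are dropped.
    """
    parts = []
    for line in origin_text.splitlines():
        fields = line.split()
        # Skip the "ORIGIN" header, the "//" terminator, and blank lines.
        if not fields or not fields[0].isdigit():
            continue
        parts.append("".join(fields[1:]))
    return "".join(parts).upper()


def region(seq: str, start: int, end: int) -> str:
    """Return bases start..end in GenBank's 1-based, inclusive numbering."""
    return seq[start - 1:end]


# Toy ORIGIN section (made-up bases; real lines hold 60 bases each).
toy_origin = """ORIGIN
        1 aaaaaccccc gggggttttt
       21 acgtacgtac
//"""

seq = parse_origin(toy_origin)
print(seq)                 # AAAAACCCCCGGGGGTTTTTACGTACGTAC
print(region(seq, 6, 10))  # CCCCC
```

The `start - 1` in the slice converts from 1-based, inclusive coordinates to Python's 0-based, half-open slices. On the full record, `region(seq, 44246, 44665)` would return the 420 bases (44,665 - 44,246 + 1) that lined up with our query.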

Here's the region where our sequence matches:

And our answer is the beta-lactamase gene. This gene encodes an enzyme that breaks open the beta-lactam ring, disabling antibiotics such as penicillin.
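The lookup we just did by eye, checking which annotated feature spans positions 44,246 to 44,665, is an interval-overlap test. Here's a minimal Python sketch, assuming the features have already been read out of the record as (name, start, end) with 1-based, inclusive coordinates. The feature names and coordinates below are invented for illustration and are not taken from the actual GenBank entry.

```python
from typing import List, NamedTuple, Tuple


class Feature(NamedTuple):
    name: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive


def overlapping(features: List[Feature], hit_start: int,
                hit_end: int) -> List[Tuple[Feature, int]]:
    """Return (feature, number of shared bases) for each feature the hit touches."""
    # BLAST lists subject coordinates in descending order for minus-strand
    # matches, so normalize the hit to low..high first.
    lo, hi = min(hit_start, hit_end), max(hit_start, hit_end)
    hits = []
    for f in features:
        shared = min(hi, f.end) - max(lo, f.start) + 1
        if shared > 0:
            hits.append((f, shared))
    return hits


# Invented feature table for illustration only; NOT the real record's coordinates.
toy_features = [
    Feature("upstream_feature", 43000, 44100),
    Feature("bla", 44000, 44900),
    Feature("downstream_feature", 45500, 46100),
]

for feature, shared in overlapping(toy_features, 44246, 44665):
    print(feature.name, shared)  # bla 420
```

Because of the `min`/`max` normalization, passing the coordinates in either order gives the same answer.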



hmmmm....
Well, it is easier, but you still don't know exactly which part of your DNA sequence matches the annotated protein.

To know that, it is much better to do a BLAST search against a protein database. Then you will also have information about the conservation of your sequence, which can be useful.

And after that, you can use PFAM to make sure that the protein has a "functional" conserved domain.

As you have it, this is only the very first step; then you have to carry on and verify your initial findings using more specific tools.
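Searching a protein database with a DNA query is what blastx does: it translates the nucleotide query in all six reading frames, three on each strand, and compares each translation with the protein sequences. Below is a minimal Python sketch of just the translation step, assuming the standard genetic code (NCBI translation table 1). It doesn't do any searching, and the `+1` to `-3` frame labels are simply one common convention.

```python
# Standard genetic code (NCBI translation table 1). Codons are enumerated with
# bases in the order T, C, A, G: TTT, TTC, TTA, TTG, TCT, ... GGG.
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W"
               "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR"
               "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


def translate(seq: str) -> str:
    """Translate whole codons from the start of seq; '*' is a stop, 'X' unknown."""
    seq = seq.upper()
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(0, len(seq) - 2, 3))


def six_frames(seq: str) -> dict:
    """Translations of the three forward and three reverse-complement frames."""
    forward, reverse = seq.upper(), reverse_complement(seq)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(forward[offset:])
        frames[f"-{offset + 1}"] = translate(reverse[offset:])
    return frames


for frame, protein in six_frames("ATGAAATAG").items():
    print(frame, protein)
```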

Hi Diego,

Actually, you can look at the GenBank record and see how the DNA sequence corresponds to the encoded protein. I show it here.
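That correspondence comes from the CDS feature, whose location (`start..end`, or `complement(start..end)` on the minus strand) says which nucleotides encode the protein. Here's a minimal Python sketch of the bookkeeping for turning a hit's coordinates into the range of amino acid residues it covers, assuming a single-segment CDS with no `codon_start` offset. The function names and the CDS coordinates in the example are hypothetical.

```python
from typing import Optional, Tuple


def residue_at(pos: int, cds_start: int, cds_end: int, strand: int = 1) -> int:
    """1-based number of the codon (residue) that contains nucleotide `pos`.

    Assumes a single-segment CDS (no join()) whose first codon starts at
    cds_start on the plus strand, or at cds_end on the minus strand.
    """
    if not cds_start <= pos <= cds_end:
        raise ValueError("position lies outside the CDS")
    offset = pos - cds_start if strand == 1 else cds_end - pos
    return offset // 3 + 1


def residues_covered(hit_start: int, hit_end: int, cds_start: int,
                     cds_end: int, strand: int = 1) -> Optional[Tuple[int, int]]:
    """First and last residue touched by a hit, or None if it misses the CDS."""
    lo = max(min(hit_start, hit_end), cds_start)
    hi = min(max(hit_start, hit_end), cds_end)
    if lo > hi:
        return None
    a = residue_at(lo, cds_start, cds_end, strand)
    b = residue_at(hi, cds_start, cds_end, strand)
    return min(a, b), max(a, b)


# Hypothetical CDS coordinates, for illustration only.
print(residues_covered(44246, 44665, 44001, 44900))             # (82, 222)
print(residues_covered(44246, 44665, 44001, 44900, strand=-1))  # (79, 219)
```

On the minus strand the numbering runs from `cds_end` downward, which is why the same hit maps to different residues in the two calls.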

I agree, PFAM is helpful if you're trying to understand the function of a truly unknown protein, or if your match isn't as good as it was in this case (100%). I also really like the Conserved Domain Database.

Hey, thanks for the really informative posts. I've been trying to get a handle on this stuff for a while, and seeing these tasks done in context just made it all click.
