IIIE 14. Using Bioinformatics to Research Shrimp Disease

Background

In the 21st century, biology is undergoing a transformation from being a purely lab-based science to being an information-based science. Advances in DNA-based technologies, such as genome sequencing, have led to an explosion of genetic information being generated by the scientific community. Biology, computer science, and information technology have merged into the new field of bioinformatics to solve the problems associated with storing, organizing, indexing, viewing, and analyzing the mind-boggling volume of data. It is routine now for scientists to search databases and perform extensive Web-based searches as they analyze results, formulate specific hypotheses, and design new experiments.

By helping scientists in many different disciplines advance their understanding of basic biological processes, the rapidly emerging field of bioinformatics is leading to advances in the diagnosis, treatment, and prevention of many diseases affecting humans and other organisms. Scientists studying marine pathogens use the same bioinformatics tools as scientists studying human diseases.

Aquaculture products such as shrimp are a significant source of protein in many countries. Some of these products carry pathogens and parasites that are harmless to humans but that, in a global marketplace, can spread quickly and infect aquaculture and wild stocks around the world, with devastating consequences. White spot syndrome, caused by white spot syndrome virus (WSSV), is one disease that threatens the global shrimp industry. WSSV can infect cultured shrimp and kill up to 100 percent of a population within 3 to 10 days. The virus has been found to be a threat not only to all shrimp species, but also to some freshwater crustaceans such as crabs and crayfish. The possibility that the virus will spread from farms into coastal waters and infect other crustaceans is of great concern.

Using bioinformatics tools, what kind of genetic sequence information can be gathered on white spot syndrome virus? Discover how the sequences can open doors to more information about this virus and the species of shrimp it infects. This activity will teach students to locate a DNA sequence in a database and learn a little about how genetic information is posted for other scientists to use. Using discoveries from other scientists around the world, they can begin to understand how the viral DNA compares to DNA sequences in other organisms. While it sounds challenging, there are no wrong answers here—only information to be gathered and analyzed.

BLAST (Basic Local Alignment Search Tool) is a computer-based system that allows scientists to enter DNA sequences and compare them to other sequences stored in a large database. Results of a BLAST search may tell scientists what species the DNA came from, what other genes encode proteins similar to the ones they entered, or a wealth of other information.

To conduct a BLAST search, two inputs are required: a query sequence and a sequence database. BLAST finds subsequences of the query that are similar to subsequences in the database. The query sequence, which may be a thousand nucleotides long, is compared to a database that contains several billion nucleotides. The BLAST Web server is hosted by the National Center for Biotechnology Information (NCBI). Anyone with a Web browser can perform such searches against constantly updated databases of protein and DNA sequences. Sequences from newly sequenced organisms are continually added to the database. This activity will guide students through an actual BLAST search.
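To make the idea of a "local alignment" concrete, here is a minimal Python sketch that finds the best-matching stretch shared by two short sequences. It uses the classic Smith-Waterman dynamic-programming method, not the faster heuristic that BLAST itself runs, and the scoring values (match, mismatch, gap) and both example sequences are illustrative assumptions rather than NCBI settings or real data.

```python
def local_align(query, subject, match=2, mismatch=-1, gap=-2):
    """Return (score, aligned_query, aligned_subject) for the best local alignment.

    Smith-Waterman sketch; scoring values are illustrative assumptions.
    """
    rows, cols = len(query) + 1, len(subject) + 1
    score = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    # Fill the score table; a cell never drops below zero, which is what
    # lets the alignment start and stop anywhere in either sequence.
    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if query[i - 1] == subject[j - 1] else mismatch
            score[i][j] = max(
                0,
                score[i - 1][j - 1] + pair,  # align the two letters
                score[i - 1][j] + gap,       # gap in the subject
                score[i][j - 1] + gap,       # gap in the query
            )
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)

    # Trace back from the best cell until the score reaches zero.
    i, j = best_pos
    aligned_q, aligned_s = [], []
    while i > 0 and j > 0 and score[i][j] > 0:
        pair = match if query[i - 1] == subject[j - 1] else mismatch
        if score[i][j] == score[i - 1][j - 1] + pair:
            aligned_q.append(query[i - 1])
            aligned_s.append(subject[j - 1])
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            aligned_q.append(query[i - 1])
            aligned_s.append("-")
            i -= 1
        else:
            aligned_q.append("-")
            aligned_s.append(subject[j - 1])
            j -= 1
    return best, "".join(reversed(aligned_q)), "".join(reversed(aligned_s))


# Made-up sequences that share the stretch ACGTAC.
print(local_align("GGGACGTACGGG", "TTACGTACTT"))
```

The two sequences differ at both ends, yet the shared middle stretch is still found. That is the sense in which the search is "local": it looks for similar regions rather than requiring the whole sequences to match.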

Students will first locate a WSSV sequence in the Entrez function of the NCBI Web site and then run a BLAST search using that sequence.
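For classes that want to see how the same Entrez lookup can be expressed without clicking through the site, the sketch below builds request addresses for NCBI's public E-utilities service. It only constructs the URLs and does not contact NCBI. The helper names `esearch_url` and `efetch_url` are hypothetical, and the parameter choices (`db="nucleotide"`, `retmax=5`, `rettype="fasta"`) are assumptions chosen to mirror this activity; check the current E-utilities documentation before relying on them.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities service.
EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esearch_url(term, db="nucleotide", retmax=5):
    """Build a search URL that asks for the first `retmax` record IDs matching `term`."""
    params = {"db": db, "term": term, "retmax": retmax}
    return f"{EUTILS}/esearch.fcgi?" + urlencode(params)


def efetch_url(record_id, db="nucleotide", rettype="fasta"):
    """Build a URL that requests one record as plain text in the given format."""
    params = {"db": db, "id": record_id, "rettype": rettype, "retmode": "text"}
    return f"{EUTILS}/efetch.fcgi?" + urlencode(params)


# Search the nucleotide database for WSSV, as in Part 1 of the procedure.
print(esearch_url("WSSV"))
# Request one of the records listed in the teacher key by its accession number.
print(efetch_url("GQ328029"))
```

Pasting either printed address into a browser should return the raw search result or sequence record, which students can compare with what they see on the Web pages.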


Focus Questions

What information on specific gene sequences can be gathered from the NCBI BLAST Web site?

Objectives

Students will explore the BLAST section of the NCBI Web site to learn about resources that can be gathered on a specific gene sequence.

Materials

Computer with Internet access
Student worksheet and activity instructions included here.

Teaching Time

One class period

Procedure

Part 1: Obtaining a Gene Sequence from the Entrez Function of NCBI Web Site

1. Go to the NCBI Web site http://www.ncbi.nlm.nih.gov/.

2. First you will use the Entrez function of the site to search for a WSSV DNA sequence. To search for a sequence, select “Entrez Home,” located on the right side of the page under “Hot Spots.”

3. In the Entrez window, click on “GenBank” on the top bar to open a nucleotide databank that provides access to many sequences. How many bases are stored in GenBank (the first paragraph indicates how many)?

4. Change the pulldown search menu to "Nucleotide." Enter the letters “WSSV” in the search box and click on GO.

5. The page that opens will display a list of numbers. Each number corresponds to a scientific paper about the shrimp virus. What are the first five papers dealing with WSSV? Open one paper and list one author. In which country was the study conducted?

6. The genetic sequence for a part of the WSSV genome will be visible at the bottom of the page. Copy and paste the sequence information into a Word document. Accuracy is critical, as this sequence will be used in the next part of the activity.
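The sequence shown at the bottom of a GenBank record is printed in numbered blocks of ten letters. The short Python sketch below shows one way to check a pasted copy: it strips the position numbers and spacing and confirms that only nucleotide letters remain. The function names and the pasted fragment are made-up placeholders, not a real WSSV sequence, and the sketch assumes that only the sequence lines were copied.

```python
import re

# IUPAC nucleotide codes: the four bases plus ambiguity letters such as N.
NUCLEOTIDE_CODES = set("ACGTUNRYKMSWBDHV")


def clean_sequence(raw):
    """Remove position numbers, spaces, and line breaks; return uppercase letters."""
    return re.sub(r"[^A-Za-z]", "", raw).upper()


def looks_like_dna(seq):
    """Return True if seq is non-empty and contains only nucleotide codes."""
    return bool(seq) and set(seq) <= NUCLEOTIDE_CODES


# Placeholder text laid out the way GenBank prints sequence lines.
pasted = """
        1 acgtacgtac ttgacctgaa
       21 ggcatt
"""

sequence = clean_sequence(pasted)
print(sequence, len(sequence), looks_like_dna(sequence))
```

If a heading or other stray text was copied along with the sequence, `looks_like_dna` will usually return False because of the extra letters, which is a cue to copy the sequence again.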

Part 2: Performing a BLAST Search

1. Go to the NCBI Web site, http://www.ncbi.nlm.nih.gov/.

2. Click on “BLAST,” which appears on the menu across the top of the page.

3. The BLAST section of interest is nucleotides only. Select the "nucleotide blast" program under "Basic BLAST."

4. The new BLAST window will have an “Enter Query Sequence” box near the top. Copy and paste the WSSV sequence you found earlier into the search window. It is critical that the WSSV genetic sequence be copied correctly. To ensure accuracy, copy and paste the sequence, with the numbers, into the search box.

5. Below the SEARCH box there are many changes that can be made to search parameters and other options, but those can be explored at another time. Only the database selection will be changed at this time. Look carefully at the menu. Use the drop-down database menu in the “Search Set” section to change “Human genomic plus transcript (Human G + T)” to “Non-human, non-mouse ESTs (est_others).” In NCBI database names, “nr” stands for “non-redundant.” “EST” stands for expressed sequence tag, a short sequenced fragment of an expressed gene whose full sequence may not have been identified yet, and “others” refers to organisms other than human and mouse.

6. Finally, scroll to the bottom to “BLAST” for results. Hit the BLAST button. The results page may take a minute or two. It depends on how many other researchers are chasing after any number of the millions of sequences on the NCBI site at the same time this search is being conducted.

7. A Nucleotide Sequence page will appear. View a page that shows several other sequences from the data bank that have a high similarity to the WSSV sequence. If your result reads “No significant similarities found,” copy a second WSSV sequence and try again.

8. The Query ID number in the upper left side of the page denotes the search identification number. Use the mouse to move the cursor over the graph. The graph displays how numerically close the sequence came to others in the databank. Each line can be selected, and the following information on that sequence will appear:

• The sequence alignment comparison
• The percent of how closely the sequences align
• The number of gaps and misalignments in the sequence
• The organism source of the sequence information. This includes genus and species as well as what the organ source is.
• The sequence accession number (the “gi” number), which allows location of the sequence at a later date.
• The genetic source of the sequence such as RNA or DNA
Scroll down and view the sequence as compared to others. The red A, C, T, etc., show the differences in the sequences. Just these small deviations can differentiate species or indicate different functions of the two genes.
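The percent match, gap count, and mismatch count listed above can all be worked out by comparing an alignment one column at a time. The Python sketch below does this for a pair of aligned strings in which a dash marks a gap. The function name and both example strings are illustrative assumptions, and percent identity is computed here as identical columns divided by total alignment length.

```python
def alignment_stats(aligned_query, aligned_subject):
    """Count identities, mismatches, and gaps in two equal-length aligned strings."""
    if len(aligned_query) != len(aligned_subject):
        raise ValueError("aligned sequences must be the same length")

    identities = mismatches = gaps = 0
    for q, s in zip(aligned_query, aligned_subject):
        if q == "-" or s == "-":
            gaps += 1          # one sequence has no letter at this column
        elif q.upper() == s.upper():
            identities += 1    # same letter in both sequences
        else:
            mismatches += 1    # different letters (shown in red on the results page)

    length = len(aligned_query)
    percent = 100.0 * identities / length if length else 0.0
    return {
        "length": length,
        "identities": identities,
        "mismatches": mismatches,
        "gaps": gaps,
        "percent_identity": percent,
    }


# Made-up alignment with one gap and one mismatch.
print(alignment_stats("ACGT-ACGTA", "ACGTTACCTA"))
```

In this example, 8 of the 10 columns are identical, so the two sequences are 80 percent identical over the aligned region.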

Student Work

Questions

1. Why is it important that a database of all genetic sequences be created?

2. What kinds of information can researchers get from being able to search and align sequences?

3. What does “est_others” mean? How about “est_mouse”? What can we learn from incomplete sequences?

4. How has the ability to sequence genes and blend computer technology with life sciences affected scientists’ ability to conduct research?

5. How many bases are stored in GenBank?

6. What are the first five articles listed that deal with white spot syndrome virus? Give only the title of the article.

1.
2.
3.
4.
5.

7. List one author of the first paper.

8. In what country was the study conducted?

After running the sequence through the BLAST search, look for the following features and answer the questions.

9. What is the Query ID number?
Locate the sequence alignment comparison (in top box of the graph).

10. Locate the graph of the comparison (pass cursor over the box to the left) to be sure all information appears. What is the Line One information?

Click on the line to show nucleotide alignments (this feature shows the genes that the query sequence is similar to and how they are aligned). Find the red letter beneath the gene sequence line that shows differences in the sequence.

11. How many gaps and misalignments are there in the sequence?

12. What is the genus or species of the organism?

13. What is the sequence accession number (the “gi” number) that allows location of the sequence at a later date?

14. What is the genetic source of the information? [ ] DNA or [ ] RNA (check one)

Teacher Key

Answers

1. Why is it important that a database of all genetic sequences be created?

A database of all genetic sequences allows scientists to look at the degree of similarity among disparate genetic information from huge numbers of different organisms. This information is important in many types of research, including examining the genetic basis of disease and disease resistance. This database also allows researchers to look at the “relatedness” of different populations of the same organism and allows scientists to post discoveries in a place where they can be easily accessed by any researcher in the world. This allows scientific advancement to happen very quickly.

2. What kinds of information can researchers get from being able to search and align sequences?

They can see how closely related one gene in an organism is to another. This allows scientists to look at how certain genes may have evolved over time. It also allows scientists to compare genetic sequences of one organism with another in a user-friendly program.

3. What does “est_others” mean? How about “est_mouse”? What can we learn from incomplete sequences?

The term “est_others” means expressed sequence tags (small, incomplete sequences) of newly discovered genes in organisms other than humans or mice. The term “est_mouse” means the expressed sequence tags of all of the mouse genes that have been discovered so far. Newly discovered sequences can be compared to all mouse sequences to look for differences. A partial sequence may be completed by another researcher’s lab in another part of the world.

4. How has the ability to sequence genes and blend computer technology with life sciences affected scientists’ ability to conduct research?

Unprecedented speed and enhanced collaboration are facilitated by computers. We cannot even begin to grasp the impact computers have had on scientific research. Research has been able to move at lightning speed, and projects such as the Human Genome Project are living proof. Without computer capabilities, the Human Genome Project would have taken many, many years. Only through the development of computer programs that can rapidly analyze and organize sequence data was the blueprint of human life able to be determined.

BLAST Search Questions:

5. How many bases are stored in GenBank?

As of August 2009, about 89 billion bases in GenBank and 108 billion in the WGS division. The figure shown on the site at the time of the activity will be larger.

6. What are the first five articles listed that deal with white spot syndrome virus? Give only the title of the article.

Paper titles will look similar to these:

1. GQ328029 Shrimp white spot syndrome virus isolate 03 VP28 gene, complete cds 

2. GQ328028 Shrimp white spot syndrome virus isolate 03 VP19 gene, complete cds 

3. Shrimp white spot syndrome virus isolate SDDL18/04 envelope protein VP19 (VP19) gene, complete cds

4. Shrimp white spot syndrome virus isolate SDDL18/04 envelope protein VP28 (VP28) gene, complete cds

5. Shrimp white spot syndrome virus unknown mRNA

7. List one author of the first paper.

8. In what country was the study conducted?

After running the sequence through the BLAST search, look for the following features and answer the questions.

Answers will vary, depending on results of each search.

9. What is the Query ID number?
Locate the sequence alignment comparison (in top box of the graph).

10. Locate the graph of the comparison (pass cursor over the box to the left) to be sure all information appears. What is the Line One information?

Click on the line to show nucleotide alignments (this feature shows the genes that the query sequence is similar to and how they are aligned). Find the red letter beneath the gene sequence line that shows differences in the sequence.

11. How many gaps and misalignments are there in the sequence?

12. What is the genus or species of the organism?

13. What is the sequence accession number (the “gi” number) that allows location of the sequence at a later date?

14. What is the genetic source of the information? [ ] DNA or [ ] RNA (check one)
