PLNT4610/PLNT7690 Bioinformatics
Lecture 12, part 1 of 3

November 24, 2011

MICROARRAYS



MICROARRAY LINKS
Bibliography on Microarray Data Analysis [http://www.nslij-genetics.org/microarray/]  


A. Microarray Technology

1. How do microarrays work?
2. Types of experiments
3. Types of data
4. What are we trying to learn from microarrays?

B. Experimental design and normalization

1. Sources of experimental variation
2. Normalization

C. Grouping genes with similar expression patterns

1. Cluster analysis
2. Self-organizing maps


A. Microarray Technology

1. How do microarrays work?

In microarray hybridization, oligonucleotides 60 - 70 bases in length are synthesized directly on a glass slide. Each array on the slide may contain 10,000 or more oligonucleotide spots, each representing a gene. Slides are hybridized with labeled cDNA made from an mRNA population, so the labeled cDNA molecules represent the original mRNA population. The amount of hybridization to a given spot reflects the amount of mRNA present for the corresponding gene.

Microarray technology:   measures mRNA levels for thousands of genes in


Microarrays consist of thousands of oligonucleotides spotted onto microscope slides. Sequences are chosen from EST collections or genomic sequences, so the sequences, and usually the identities, of the genes on the array are known.

In a microarray experiment, an mRNA population is isolated from cells. The population is labeled by synthesizing complementary cDNAs using reverse transcriptase and labeled nucleotides. The resulting cDNA population is then hybridized to the array.


gene x - strongly expressed; high abundance transcript

gene y - moderately expressed; medium abundance transcript

gene z - weakly expressed; low abundance transcript

Each transcript base pairs with the complementary DNA for its corresponding gene on the array.

Signal strength is proportional to the abundance of each mRNA


WARNING! Each one of these steps contributes to experimental variation.

a. Microarrays

Each gene on an array is represented as a single-stranded oligonucleotide ranging in size from 25 to 70 nt. Shorter oligonucleotides tend to bind probe less efficiently, but they have greater specificity. Beyond about 70 nt, there is little increase in signal. Oligonucleotides are synthesized de novo, so there is no chance of contamination from other sequences, as there is with cDNAs.


b. Labeled cDNA
This is probably the biggest single source of experimental variation. Microarray experiments typically attempt to compare gene expression levels in different tissues or conditions, or at different times after a treatment. RNA is extracted from each tissue, condition, or treatment, and RNA samples are diluted so that each sample contains the same concentration of RNA. To create a single-stranded probe, RNA is added to a reaction mix containing oligo dT primers (which can base pair with the polyA tail on mRNA), reverse transcriptase (an RNA-dependent DNA polymerase), and labeled nucleotides. Commonly, labeled nucleotides are tagged either with fluorescent labels such as Cy3 and Cy5, or with digoxigenin (DIG), which can be detected by chemiluminescence. In principle, for every mRNA molecule in the original RNA population, a single-stranded labeled cDNA will be produced, complementary to the mRNA. The higher the concentration of a particular mRNA, the more of the corresponding cDNA will be present.

c. Hybridization and washing

Incorporation of label into each probe is quantified, and probes are diluted so that all are at an equal concentration. Usually, a duplicate filter or microarray is prepared for each probe to be assayed. cDNAs are hybridized separately with each array. Filter arrays are incubated with labeled cDNA and washed in much the same way as is done for Southern or Northern blotting. For glass microarrays, hybridization is done under a coverslip, and slides are washed by dipping into wash solutions. Commercially-produced arrays come in cassettes, in which hybridization, washing, and detection are done.

d. Data acquisition
Hybridized probe is detected by UV fluorescence in a slide reader using confocal laser microscopy. The raw intensity of each spot is measured by a CCD camera, and the data are acquired as a TIFF image.
 

2. Types of experiments

Single label experiment

The simplest type of microarray experiment is the single label experiment. Duplicate arrays are hybridized with probes made using a single label. To allow comparison between treatments, controls must be included in the probes and on the arrays to act as hybridization standards.
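The role of hybridization standards can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the spot names, the intensities, and the choice of dividing by the mean of the control spots are all hypothetical, and real experiments may use more elaborate normalization.

```python
def normalize_to_control(intensities, control_ids):
    """Scale one array so that its control spots average to 1.0."""
    control_mean = sum(intensities[c] for c in control_ids) / len(control_ids)
    if control_mean <= 0:
        raise ValueError("control spots must have a positive mean intensity")
    return {spot: value / control_mean for spot, value in intensities.items()}


# Hypothetical raw intensities from two single-label arrays.
# The second array is uniformly twice as bright (e.g. better labeling).
array_minus_hs = {"ctrl1": 1000.0, "ctrl2": 1200.0, "geneA": 550.0, "geneB": 2200.0}
array_plus_hs = {"ctrl1": 2000.0, "ctrl2": 2400.0, "geneA": 4400.0, "geneB": 4400.0}
controls = ["ctrl1", "ctrl2"]

norm_minus = normalize_to_control(array_minus_hs, controls)
norm_plus = normalize_to_control(array_plus_hs, controls)

# Fold change (+heat shock / -heat shock) after normalization.
fold_change = {g: norm_plus[g] / norm_minus[g] for g in ("geneA", "geneB")}
print(fold_change)
```

Without the controls, geneB would appear to double between arrays. After scaling by the standards, only geneA shows a change.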

From Schena M, Shalon D, Heller R, Chai A, Brown PO, Davis RW (1996) Parallel human genome analysis: microarray-based expression monitoring of 1000 genes. Proc. Natl. Acad. Sci. USA 93(20):10614-10619.

Expression of human genes was measured in RNA populations from cells grown at 37°C (-Heat shock) or 43°C (+Heat shock). White boxes: genes whose expression changes with heat shock. Red boxes: genes activated by heat shock. Green boxes: genes suppressed by heat shock.

Double label experiments

Another approach to comparing expression between two conditions is the double label experiment. For example, in work from Patrick Brown's lab at Stanford, cDNA probes were made from yeast cells grown in the presence of either galactose or glucose. To distinguish between signals from the two probes, a different fluorescently tagged nucleotide, either Cy3 or Cy5, was added to each during reverse transcription. Cy3 has emission maxima at 565 and 615 nm, while Cy5 has an emission peak at 670 nm. Replicate experiments were done in which the dyes were switched. By scanning the arrays twice, once for Cy3 and once for Cy5, a composite image can be generated in which the ratio of the two dyes, and hence the ratio of transcripts in the two growth conditions, can be measured. In pseudocolor images, spots representing genes that are more strongly expressed in the presence of galactose are shown in green, and spots representing genes more strongly expressed in the presence of glucose are shown in red.
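The ratio measurement and the reason for switching dyes can be illustrated with a short Python sketch. The intensities are invented, and the assumption that galactose cDNA carries Cy3 in the first hybridization and Cy5 in the dye-swapped one is made only for the example. Averaging the two log ratios cancels a dye bias that multiplies one channel by a constant factor.

```python
import math


def dye_swap_log_ratio(forward, swapped):
    """Average log2(galactose / glucose) over a dye-swap pair.

    forward: (cy3, cy5) intensities with galactose cDNA labeled with Cy3
    swapped: (cy3, cy5) intensities with galactose cDNA labeled with Cy5
    """
    m_forward = math.log2(forward[0] / forward[1])  # galactose is Cy3 here
    m_swapped = math.log2(swapped[1] / swapped[0])  # galactose is Cy5 here
    return (m_forward + m_swapped) / 2


# Hypothetical gene: truly 4-fold higher in galactose (4000 vs 1000),
# but the Cy5 channel reads twice as bright as it should.
forward = (4000.0, 2000.0)  # Cy3 = galactose, Cy5 = glucose x 2
swapped = (1000.0, 8000.0)  # Cy3 = glucose,  Cy5 = galactose x 2

m = dye_swap_log_ratio(forward, swapped)
print(m, 2 ** m)  # average log2 ratio and the corresponding fold change
```

Taken alone, the forward array suggests a 2-fold difference and the swapped array an 8-fold difference. The average of the log ratios recovers the 4-fold value used to build the example.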

http://www.pnas.org/cgi/content/full/94/24/13057/F1


3. Types of data

Microarray studies tend to generate two different types of data. Studies in which two or more conditions are compared at a single time generate discrete state data. Often, however, it is critical to follow the expression of a gene over time after a treatment. In timecourse experiments, the expression of each gene in response to two or more treatments is measured over time. For example, in the timecourse at right, the solid blue and red dashed curves might represent the expression levels for a gene in response to two different drugs.



There is a whole family of problems in normalization of data and controlling for components of experimental variation.

To put things into perspective, if the experiment was repeated 4 times, the timecourse above represents

2 treatments x 6 timepoints x 4 replicates = 48 labeled cDNA populations hybridized to 48 duplicate arrays

to generate the data.
Although the data for each replicate are averaged, there is often a great deal of variation in the results, which can potentially negate any meaning. Therefore, extraordinary measures must be taken at each step in the procedure to minimize the overall experimental variation.
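The size of such a design is easy to enumerate. The sketch below simply lists every treatment x timepoint x replicate combination. The treatment names and the sampling times are placeholders, since only the counts (2, 6 and 4) come from the example above.

```python
from itertools import product

treatments = ["drug_A", "drug_B"]      # placeholder names, 2 treatments
timepoints = [0, 1, 2, 4, 8, 16]       # placeholder times, 6 timepoints
replicates = [1, 2, 3, 4]              # 4 replicates

# One labeled cDNA population (and one array) per combination.
samples = list(product(treatments, timepoints, replicates))

print(len(samples))  # 48
print(samples[0])    # ('drug_A', 0, 1)
```

Adding a third treatment or a seventh timepoint multiplies the number of arrays accordingly, which is why the design is worth writing out before any RNA is extracted.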

4. What are we trying to learn from microarrays?

The primary goal of microarray experiments is to generate expression information for every gene in the array, under some set of conditions. Expression may be studied in

The kind of results that are sought in microarray experiments can be illustrated as follows:

In the example, timecourse data are generated for each gene in an array. The raw data consist of a series of expression curves for timecourses, or histograms where other types of treatments are being compared. The goal is usually to find which groups of genes have the most similar expression patterns. In the example, two genes in the array (hatched background) show a gradual induction over the period of the timecourse. Two other genes (shaded background) show a biphasic response with two distinct periods of strong expression.
 

Key questions:
  • Which genes are differentially expressed between condition A and condition B?
  • How can genes be grouped according to similarities in expression patterns?
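One simple way to make "similarity in expression pattern" concrete is the Pearson correlation between two timecourses. The sketch below uses invented profiles that mimic the example (two genes with gradual induction, two with a biphasic response). Correlation is only one possible similarity measure and is chosen here as an assumption for illustration; the grouping methods themselves are the subject of section C.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


# Hypothetical expression levels at six timepoints.
profiles = {
    "gene1": [1, 2, 3, 4, 5, 6],     # gradual induction
    "gene2": [2, 4, 6, 8, 10, 12],   # gradual induction, higher level
    "gene3": [1, 5, 1, 1, 5, 1],     # biphasic
    "gene4": [2, 9, 2, 2, 9, 2],     # biphasic, higher level
}


def most_similar(gene):
    """Return the other gene whose profile correlates best with `gene`."""
    others = [g for g in profiles if g != gene]
    return max(others, key=lambda g: pearson(profiles[gene], profiles[g]))


for gene in profiles:
    print(gene, "->", most_similar(gene))
```

Because correlation ignores the absolute expression level, gene1 pairs with gene2 and gene3 with gene4, even though the members of each pair differ in magnitude.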

B. Experimental design and normalization

It is critical to realize that every experimental step in a procedure contributes to the final experimental error. Therefore, one should conceptualize the data as a set of observations, each with a measurable amount of variation. In the figure, error bars represent the standard error of each measurement. The goal can then be restated as that of setting up the experiment in such a way as to minimize the final standard error in the observations. For timepoints at which there is little true difference, a difference can only be detected when the standard error for both treatments is small. For timepoints at which the differences are large, higher standard errors will still allow the detection of the difference between two treatments.
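The relationship between the size of a difference and the standard errors can be sketched numerically. The replicate values below are hypothetical, and expressing the difference in units of its combined standard error (a Welch-type t statistic) is an assumption of this sketch rather than the method used in the lecture.

```python
import math
from statistics import mean, stdev


def standard_error(values):
    """Standard error of the mean: sample standard deviation / sqrt(n)."""
    return stdev(values) / math.sqrt(len(values))


def difference_in_se_units(a, b):
    """Difference between two means divided by its combined standard error."""
    se_diff = math.sqrt(standard_error(a) ** 2 + standard_error(b) ** 2)
    return (mean(a) - mean(b)) / se_diff


# Hypothetical expression values, 4 replicates per treatment and timepoint.
treatment_1 = [10.0, 12.0, 11.0, 13.0]
treatment_2_small = [10.5, 12.5, 11.5, 13.5]  # true difference is small
treatment_2_large = [20.0, 22.0, 21.0, 23.0]  # true difference is large

t_small = difference_in_se_units(treatment_1, treatment_2_small)
t_large = difference_in_se_units(treatment_1, treatment_2_large)
print(round(t_small, 2), round(t_large, 2))
```

With the same replicate scatter, the small difference is well under one standard error and cannot be distinguished from noise, while the large difference is many standard errors wide.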

1. Sources of experimental variation

Making a list of factors that contribute to experimental error is essentially the same as making a list of steps in the microarray experiment. However, several points are worth highlighting.
 

BIOLOGICAL REPLICATES ARE THE SINGLE MOST EFFECTIVE WAY TO GET GOOD GENE EXPRESSION RESULTS!

In the next section we will see that there is an almost endless list of ways to massage the data. The most heroic analytical methods are no substitute for the simple step of doing several biological replicates.
  • In each biological replicate, the entire experiment, such as different treatments of a batch of cells, plants or animals, sampling of different tissues from different conditions, followed by extraction of RNA, is repeated.
  • The RNA samples from different biological replicates are NOT mixed for a single hybridization. Rather, a separate labeling and hybridization is carried out for EACH REPLICATE.
  • Technical replicates, in which the same RNA sample is labeled and hybridized more than once, only control for differences in handling. Biological replicates include all sources of biological and experimental variation. Therefore, they are more realistic.
  • As the number of biological replicates increases, the uncertainty (standard error) of the averaged expression measurements decreases.
Gene chips are getting cheaper all the time, often less than $100 per chip. The excuse that you can't do biological replicates because it is too expensive no longer obtains.
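How quickly replication pays off follows from the usual formula for the standard error of a mean, SE = s / sqrt(n). The sketch below assumes, purely for illustration, a fixed replicate-to-replicate standard deviation of 1.0 in arbitrary units.

```python
import math


def standard_error_of_mean(sd, n):
    """Standard error of the mean of n independent replicates."""
    return sd / math.sqrt(n)


sd = 1.0  # assumed replicate-to-replicate standard deviation
for n in (1, 2, 4, 8, 16):
    print(n, round(standard_error_of_mean(sd, n), 3))
```

The standard error falls with the square root of the number of replicates, so halving it takes four times as many biological replicates.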

Estimated sample size requirements for example data set


                FDR = 0.10    FDR = 0.05    FDR = 0.01
Power = 0.5     3 / 3         3 / 3         5 / 5
Power = 0.6     3 / 3         3 / 4         7 / 6
Power = 0.7     3 / 4         5 / 5         10 / 9
Power = 0.8     4 / 6         9 / 8         20 / 14
Power = 0.9     13 / 11       30 / 16       75 / 27

Power is the fraction of true positives detected. FDR is the false discovery rate, i.e. the fraction of false positives among the genes called differentially expressed. The numbers on either side of the slash are sample-size estimates (i.e. numbers of biological replicates) made using the sample-size estimation methods described in Ref. [8] and Ref. [10], respectively.
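The two quantities in the table can be computed from the outcome of a hypothetical experiment in which the truth is known. The counts below are invented for illustration. The function names are likewise just labels for this sketch.

```python
def power(true_positives, false_negatives):
    """Fraction of truly differentially expressed genes that were detected."""
    return true_positives / (true_positives + false_negatives)


def false_discovery_rate(false_positives, true_positives):
    """Fraction of genes called significant that are actually false positives."""
    called = false_positives + true_positives
    return false_positives / called if called else 0.0


# Hypothetical outcome: 100 genes truly change; 80 of them are detected,
# and 20 unchanged genes are wrongly called significant.
tp, fn, fp = 80, 20, 20

print(power(tp, fn))                 # 0.8
print(false_discovery_rate(fp, tp))  # 0.2
```

Read against the table, a power of 0.8 at an FDR far stricter than this example (0.05 or 0.01) is what drives the required number of biological replicates upward.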

Agilent - 10 Pitfalls of Microarray Analysis

Jorstad TS, Langaas M, Bones AM (2007) Understanding sample size: what determines the required number of microarrays for an experiment? Trends in Plant Science 12:46-50. DOI: 10.1016/j.tplants.2007.01.001

Knapen D, Vergauwen L, Laukens K, Blust R (2009) Best practices for hybridization design in two-color microarray analysis Trends in Biotechnology 27:406-414

Simon, S. Myths & Truths About Microarray Expression Profiling




Unless otherwise cited or referenced, all content on this page is licensed under the Creative Commons License Attribution Share-Alike 2.5 Canada

