How do we find genes related to traits? A review of Bulked Sample Analysis

Here at Legume Laboratory we have written a number of posts about research that overexpresses a particular gene linked with a particular trait, such as resistance to drought or to pest damage. The idea behind such research is usually to see whether an increase in the amount that gene is transcribed results in a linear change in the trait, which would be evidence that the overexpressed gene controls the trait.

But how are those candidate genes initially identified?

The Plant Biotechnology Journal recently published a review of the process and current state of the technology used to design and sample populations of plants, isolate particular differences in phenotype, and identify the genetic differences that cause the phenotypes of related plants to diverge. In particular, the review focuses on bulked sample analysis, a shortcut that avoids the more time-consuming and costly approach of genotyping and mapping every individual in a population. The end result can be the identification of one or a couple of genes controlling the trait of interest, but more usually it is a number of regions of the genome that differ between the phenotypes and are associated with the trait. Such regions are called Quantitative Trait Loci (QTL).


How we find genes related to traits

Bulked segregants and variants

To discover which genes are involved in a particular trait within a certain species of plant, we first need to pull together a population of the plant that shows variation in the trait of interest. There are two ways of creating this population. One uses a controlled population created through a specific breeding strategy (a segregating population). The other assembles plants that show phenotypic variation in the trait of interest from any population of that species, i.e. the population isn't raised through controlled breeding (a variant population). The idea behind both strategies is to obtain a population of plants which, in the next step, can be phenotyped for the trait of interest, with particular attention paid to the most extreme variation, e.g. plants showing significant drought tolerance versus plants most adversely affected by water deficit.

Sampling and phenotyping

After a method of developing a population has been chosen the plants are grown and a method established of scoring or classifying the different phenotypes being examined. An example may be the number of lesions formed on plant leaves as a result of a fungus or grain yield under varying water supplies. In the case of segregating populations the phenotyping may be carried through a number of generations of plants with individuals at the phenotypic extremes being selected for crossing to create the following generation, segregating the trait and, theoretically, the genetic underpinnings of the trait.

In establishing these populations care must be taken to ensure that only the trait of interest is being selected for. The authors of the review emphasise the importance of improving the signal-to-noise ratio (that is, reducing noise in the phenotype measurements) and mention the development and implementation of precision phenotyping techniques and technology.

Where a particular type of stress is being selected for, the contrasting environments (one with high stress, one with less or none of that stress) need to be established and tested concurrently.

Once the population for phenotyping has been developed under the required testing conditions, the plants are sampled. In most cases the sampling takes place by applying the phenotyping criteria to each plant, the end result being a spectrum of phenotypes that will usually distribute normally with the extreme phenotypes being at the tail ends of the distribution curve.
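As a rough illustration of the selection step, the sketch below ranks plants by a phenotype score and takes the lowest and highest scorers as the two bulks. The plant IDs, the scores and the function name `select_tails` are all made up for this example, and the review does not prescribe any particular code.

```python
def select_tails(scores, tail_fraction):
    """Return (lower_bulk, upper_bulk) as lists of plant IDs.

    scores: dict mapping a plant ID to its phenotype score
            (hypothetical data layout, not from the review).
    tail_fraction: share of the population placed in each bulk,
                   e.g. 0.2 for 20%.
    """
    if not 0 < tail_fraction <= 0.5:
        raise ValueError("tail_fraction must be greater than 0 and at most 0.5")
    # Plant IDs ordered from the lowest phenotype score to the highest.
    ranked = sorted(scores, key=scores.get)
    # At least one plant per bulk, even for very small populations.
    n_tail = max(1, round(len(ranked) * tail_fraction))
    return ranked[:n_tail], ranked[-n_tail:]


# Ten invented plants scored for, say, grain weight in grams.
scores = {
    "p01": 31.0, "p02": 18.5, "p03": 25.2, "p04": 40.1, "p05": 22.7,
    "p06": 35.4, "p07": 15.9, "p08": 28.3, "p09": 38.8, "p10": 26.6,
}

lower_bulk, upper_bulk = select_tails(scores, 0.2)
print("lower bulk:", lower_bulk)
print("upper bulk:", upper_bulk)
```

In a real experiment the scores would come from the phenotyping protocol, and the choice of tail fraction would depend on population size and effect size.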

Obtaining statistically significant results relies on the population size and the number of plants at either end of the distribution curve. The sample sizes required depend heavily on factors such as the distance between genes related to the trait (and therefore the frequency of recombination), the number of genes related to the trait, and the effect size of a particular gene or genes on the trait.


Figure 2 from article. Four types of bulked sample analysis (BSA). (a) BSA for qualitative traits such as disease resistance with two distinct phenotypes (R, resistance; S, susceptible). (b) BSA for quantitative traits with normal distribution, among which samples from two tails (L: lower; U, upper) are selected and bulked. (c) BSA for multiple parallel bulks with individuals selected independently from the two tails of a normal distribution. (d) BSA with only one bulk available for the target trait, while the other tail was killed by lethal genes or due to severe stresses, when compared with individuals randomly selected from a control population under no stress with normal allele frequencies for the target trait; CK: plants from the control population, R: plants selected from the stressed environment.

For a population of between 200 and 500 plants, the optimum tail size would be 20% to 30% of the population. As the total population being sampled increases, the proportion of plants selected for each tail decreases. Large variations in phenotype can reduce the tail size to 10% of a small total population (200 individuals), while QTLs associated with a small phenotypic effect require a much larger population (3000 to 5000 individuals), with 100 plants selected from each tail.
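To make those figures concrete, here is a small arithmetic sketch that converts the percentages quoted above into plant counts and, for the large populations, converts 100 plants per tail back into a percentage. The function names are invented for this example; the numbers are the ones given in the text.

```python
def plants_per_tail(population_size, tail_fraction):
    """Number of plants in one tail for a given population size and tail fraction."""
    return round(population_size * tail_fraction)


def tail_share(plants_in_tail, population_size):
    """Fraction of the population represented by one tail."""
    return plants_in_tail / population_size


# Small populations: 20% to 30% tails, or 10% when phenotypes vary widely.
for size, fraction in [(200, 0.20), (200, 0.30), (500, 0.20), (500, 0.30), (200, 0.10)]:
    count = plants_per_tail(size, fraction)
    print(f"{size} plants, {fraction:.0%} tail: {count} plants per tail")

# Large populations for small-effect QTL: 100 plants from each tail.
for size in (3000, 5000):
    print(f"{size} plants, 100 per tail: {tail_share(100, size):.1%} of the population")
```

The second loop shows why the selected proportion shrinks as the population grows: 100 plants is only a few percent of a population of several thousand.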

Figure 2 above shows different methods of bulking samples for analysis. Where a trait can be classified as resistant or susceptible to a particular stress, the most resistant and most susceptible individuals are used, while populations showing a quantitative change in phenotype (such as grain weight) can be sampled from the extreme tails in one or multiple bulks from each end of the distribution. Where, for example, one treatment group fails to survive the treatment process, leaving only one tail, that tail can be compared to a selection of control plants (Figure 2d above).

Molecular analysis

Once the selected samples of the population are bulked, they can be analysed by various methods to detect differences or changes in the genome, gene transcription or protein expression.

DNA analysis is the predominant form of molecular analysis. For many crops a set of DNA markers has been created from analysis of the plant genome, based on genetic landmarks such as simple sequence repeats (SSRs) and single nucleotide polymorphisms (SNPs), along with other PCR-based markers. Using the markers as the basis for PCR amplification, to take the most common example, differences between the genotypes of the two phenotype bulks can be identified and mapped back to the genome. The result is demonstrated in Figure 2 above, with its depiction of DNA bands or DNA expression levels and the connection between variation in plant phenotype and genotype.
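When the bulks are genotyped in a way that yields allele counts per marker (for example, sequencing reads), one common way to compare them is to compute the frequency of a given allele in each bulk and look for markers where the two frequencies differ sharply. The sketch below assumes that setup. The marker names, the counts and the 0.5 cut-off are invented for illustration, and this is not presented as the method used in the review.

```python
def allele_frequency(allele_count, total_count):
    """Frequency of the tracked allele among all observations at a marker."""
    if total_count == 0:
        raise ValueError("marker has no observations")
    return allele_count / total_count


def compare_bulks(counts, min_delta=0.5):
    """Compare allele frequencies between a lower and an upper bulk.

    counts: dict mapping marker name ->
            ((allele_count_lower, total_lower), (allele_count_upper, total_upper)).
            This layout is an assumption made for the example.
    min_delta: arbitrary illustrative cut-off on the absolute frequency difference.
    Returns (deltas, flagged), where flagged lists markers at or above the
    cut-off, largest difference first.
    """
    deltas = {}
    for marker, (lower, upper) in counts.items():
        deltas[marker] = allele_frequency(*upper) - allele_frequency(*lower)
    ordered = sorted(deltas, key=lambda m: abs(deltas[m]), reverse=True)
    flagged = [m for m in ordered if abs(deltas[m]) >= min_delta]
    return deltas, flagged


# Invented counts: (allele count, total) in the lower bulk, then the upper bulk.
counts = {
    "marker_A": ((48, 100), (52, 100)),  # nearly identical in both bulks
    "marker_B": ((10, 100), (90, 100)),  # strongly different
    "marker_C": ((30, 100), (85, 100)),  # clearly different
    "marker_D": ((60, 100), (40, 100)),  # modest difference
}

deltas, flagged = compare_bulks(counts)
for marker in counts:
    print(f"{marker}: difference in allele frequency = {deltas[marker]:+.2f}")
print("markers worth a closer look:", flagged)
```

Markers unlinked to the trait should show similar frequencies in both bulks, so a large difference suggests the marker sits near a gene affecting the trait. A real analysis would also need a statistical test that accounts for bulk size and sampling noise.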

DNA microarrays are increasingly being used in a similar manner for a faster and cheaper analysis.

Linkage maps can then be created which show how closely linked the DNA marker is to the gene or genes within the identified QTL.

Analysis of the transcribed DNA via RNA sequencing methods can give a greater insight into the variation in gene transcription between phenotypes, although it cannot assess the effect of any non-transcribed DNA or of events after transcription, such as translation.

Protein analysis is a little more difficult to perform and borrows from immunology methods that use labelled antibodies to detect proteins within the bulked samples. However, mass spectrometry and Edman degradation are two methods being used to determine the primary sequence of proteins present within the samples with greater precision, and without needing a range of antibodies able to detect the majority of proteins in the samples.


Figure from article – the BSA pipeline, from population selection to application.

Applications of Bulked Sample Analysis and the Future

Bulked sample analysis, particularly bulked segregant analysis, is repeatedly used to detect the genetic underpinnings of particular traits and is widely used in agriculture-related science. When performed under tightly controlled conditions it helps researchers isolate a particular trait from other variation in phenotype, which provides the basis for identifying QTL.

And the depth of interrogation of the genetic basis of important traits is increasing as sequencing technology develops. The ability to effectively barcode segments of DNA before sequencing it in a large pool of DNA, allowing subsequent identification of the starting DNA, will hasten data gathering.

More important is the falling cost of sequencing DNA. At the point where using markers and PCR or microarrays hardly differs in price from whole-genome sequencing, the amount of data generated for analysis (and the number of computer programs developed to assist with the task) will explode. It may be then that complex traits weakly controlled by a number of QTL will be identified with comparative ease.

In theory, these methods assist agriculture by identifying genes that control particular traits, which can then be used for screening and selecting crop breeding stock. As the library of QTL grows, the ability to select seeds for particular conditions will support food production, particularly in the more trying growing conditions.


