Quantitative genetics is concerned with the inheritance of biological traits that show continuous (quantitative) phenotypic variation. Quantitative traits are common and have been studied extensively in evolutionary biology, genetics, and plant and animal breeding. They are usually controlled by multiple genes with various kinds of genetic effect, and their phenotypes are readily modified by environmental variation.
Crop genetic improvement has catalysed population growth, which in turn has increased the pressure on food security. We need to produce 70% more food to meet the demands of 9.5 billion people by 2050. Climate change poses challenges for the global food supply, while the narrow genetic base of elite crop cultivars further limits our capacity to increase genetic gain through conventional breeding. Effective use of the genetic resources held in germplasm collections is therefore crucial to increasing genetic gain and meeting the challenges facing the global food supply. Genomic selection (GS) uses genome-wide markers and phenotypes from an observed (training) population to establish marker-trait associations, and then uses genome-wide markers to predict the phenotypic values of a test population. Characterizing an extensive germplasm collection can serve a dual purpose in GS: it provides a reference population for training prediction models and a source of desirable genetic variants for incorporation into elite cultivars. New technologies, such as high-throughput genotyping and phenotyping, machine learning, and gene editing, have great potential to contribute to genome-assisted breeding. Breeding programmes that integrate germplasm characterization, GS and emerging technologies offer promise for accelerating the development of cultivars with higher yield and enhanced resistance and tolerance to biotic and abiotic stresses. Finally, scientifically informed regulation of new breeding technologies, and increased sharing of genetic resources, genomic data and bioinformatics expertise between developed and developing economies, will be key to meeting the challenges of a rapidly changing climate and growing demand for food.
Construction of accurate, high-density linkage maps is a key research area in genetics. We investigated the efficiency of genetic map construction (MAP) using modifications of the k-Optimal (k-Opt) algorithm for solving the traveling-salesman problem (TSP). For the TSP, different initial routes led to different locally optimal solutions, and the best solution could be found only by using as many initial routes as possible. For MAP, by contrast, a large number of initial routes converged to a single optimal order. k-Opt using the open route length as the criterion gave a slightly higher proportion of correct orders than the method of adding one virtual marker and using the closed route length. Recombination frequency (REC) and the logarithm of odds (LOD) score gave similar proportions of correct orders, both higher than that given by genetic distance. Missing markers and genotyping errors both reduced ordering accuracy, but the best order was still obtained with high probability by comparing the optimal orders from multiple initial routes. Computation time increased rapidly with marker number, and 2-Opt took much less time than 3-Opt. The 2-Opt algorithm was also compared with the ordering methods used in two other software packages. The best method was 2-Opt, using open route length as the criterion for identifying the optimal order and REC or LOD as the measure of distance between markers. We describe a unified software interface for applying k-Opt to high-density linkage map construction in a wide range of genetic populations.
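The preferred combination above (2-Opt, open route length, REC as the distance) can be sketched as follows. This is a minimal illustration, not the authors' software: the recombination-frequency matrix is hypothetical, the loop uses simple first-improvement segment reversals, and LOD scores would first need converting into a dissimilarity (larger LOD meaning shorter distance) before being used in the same way.

```python
import random


def route_length(order, dist):
    """Open-route length: sum of distances between adjacent markers, with no return leg."""
    return sum(dist[order[k]][order[k + 1]] for k in range(len(order) - 1))


def two_opt_open(order, dist):
    """Apply 2-Opt segment reversals until no reversal shortens the open route."""
    order = list(order)
    n = len(order)
    improved = True
    while improved:
        improved = False
        for i in range(n - 1):
            for j in range(i + 1, n):
                # Reverse order[i..j]; the full length is recomputed for clarity, not speed.
                candidate = order[:i] + order[i:j + 1][::-1] + order[j + 1:]
                if route_length(candidate, dist) < route_length(order, dist) - 1e-12:
                    order = candidate
                    improved = True
    return order


def best_order(dist, n_starts=20, seed=1):
    """Run 2-Opt from several random initial routes and keep the shortest result."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_starts):
        start = list(range(len(dist)))
        rng.shuffle(start)
        candidate = two_opt_open(start, dist)
        if best is None or route_length(candidate, dist) < route_length(best, dist):
            best = candidate
    return best


# Hypothetical REC matrix for five markers whose true order is 0-1-2-3-4.
rec = [[0.05 * abs(i - j) for j in range(5)] for i in range(5)]
print(two_opt_open([0, 2, 1, 3, 4], rec))   # [0, 1, 2, 3, 4]
print(best_order(rec))
```

Using the open route length means the two end markers are not joined, so no virtual marker is needed to turn the closed TSP tour into a linear map order.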
Genomic prediction (GP) has become a valuable tool for predicting the performance of selection candidates for the next breeding cycle. Most of the statistical linear models on which GP is based assume that the residuals, and therefore the response variable itself, are normally distributed. In this study, we propose using Bayesian regularized quantile regression (BRQR), a model that has been used successfully in other research areas, for GP. We evaluated the prediction ability of the proposed model and compared it with that of Bayesian ridge regression (BRR; equivalent to the genomic best linear unbiased predictor, GBLUP). In addition, BLUP can be used with pedigree information derived from the coefficient of coancestry (ABLUP). We found that the prediction ability of BRQR is comparable to that of BRR and in some cases better, and that BRQR has the potential to deal efficiently with outliers. A program written in the R statistical package is available as Supplementary material.
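Bayesian quantile regression is commonly built on an asymmetric Laplace likelihood whose exponent is the check (pinball) loss, with shrinkage priors on marker effects supplying the regularization. The sketch below shows only that loss and why it resists outliers: minimizing it over a constant gives a sample quantile rather than the mean. The data are hypothetical, and this is not the BRQR sampler itself.

```python
def check_loss(u, tau):
    """Quantile (check) loss: rho_tau(u) = u * (tau - 1[u < 0])."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def best_constant(y, tau):
    """Among the observed values, return the one minimizing the summed check loss (a sample tau-quantile)."""
    return min(y, key=lambda c: sum(check_loss(yi - c, tau) for yi in y))


y = [1.0, 2.0, 3.0, 4.0, 100.0]   # hypothetical phenotypes with one outlier
print(best_constant(y, 0.5))      # 3.0, the median
print(sum(y) / len(y))            # 22.0, the mean, pulled toward the outlier
```

Replacing the squared error of ridge regression with this loss is what lets a quantile model fit the bulk of the data without being dragged by a few extreme records.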
Because our multi-locus approaches for genome-wide association studies and linkage analysis offer high power, high accuracy and a low false-positive rate, they have attracted considerable attention in plant and animal genetics. In large mapping populations, however, fast multi-locus random-SNP-effect efficient mixed-model association (FASTmrEMMA) and genome-wide composite interval mapping (GCIM) take a relatively long time to run. To address this issue, we propose improved FASTmrEMMA and GCIM algorithms that use matrix identities such as the Woodbury matrix identity. When scanning each marker across the genome, the improved algorithms replace the expensive eigendecompositions required for (restricted) maximum likelihood estimation in the original algorithms with two (one) updated inner products and one updated vector-matrix-vector multiplication. Analyses of simulated and real data showed that computational efficiency increased sharply in large mapping populations, while the mapping results of the original and improved algorithms were identical. The related software packages (mrMLM.GUI and QTL.gCIMapping.GUI) can be downloaded from the R and BioCode websites.
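The Woodbury identity, (A + UCV)^-1 = A^-1 - A^-1 U (C^-1 + V A^-1 U)^-1 V A^-1, reduces for a single marker to the Sherman-Morrison formula, which updates an existing inverse after a rank-one change without refactorizing. The sketch below assumes, for illustration only, that one marker's genotype vector z enters the covariance as lam * z z^T; the matrices and values are hypothetical and this is not the mrMLM code.

```python
def mat_vec(a, x):
    """Matrix-vector product a @ x for lists of lists."""
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]


def mat_mul(a, b):
    """Matrix product a @ b for lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def sherman_morrison(a_inv, u, v):
    """Return (A + u v^T)^-1 from A^-1 without refactorizing A."""
    n = len(a_inv)
    a_inv_u = mat_vec(a_inv, u)                                              # A^-1 u
    v_a_inv = [sum(v[k] * a_inv[k][j] for k in range(n)) for j in range(n)]  # v^T A^-1
    denom = 1.0 + sum(vk * wk for vk, wk in zip(v, a_inv_u))                  # 1 + v^T A^-1 u
    return [[a_inv[i][j] - a_inv_u[i] * v_a_inv[j] / denom for j in range(n)] for i in range(n)]


# Hypothetical example: A is diagonal, and one marker vector z enters as lam * z z^T.
a = [[2.0, 0.0, 0.0], [0.0, 4.0, 0.0], [0.0, 0.0, 5.0]]
a_inv = [[0.5, 0.0, 0.0], [0.0, 0.25, 0.0], [0.0, 0.0, 0.2]]
z, lam = [1.0, 0.0, 2.0], 0.3
updated_inv = sherman_morrison(a_inv, [lam * zi for zi in z], z)
a_updated = [[a[i][j] + lam * z[i] * z[j] for j in range(3)] for i in range(3)]
check = mat_mul(a_updated, updated_inv)   # should be close to the identity matrix
```

Each update costs a few inner products rather than a new decomposition, which is the kind of saving that matters when the scan visits every marker in the genome.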
Genome-wide association study (GWAS) has become a standard approach for discovering the genetic determinants underlying complex traits. A major challenge in GWAS is how to improve analytical power, uncover complex genetic correlations, and reveal gene-gene and gene-environment interactions through integrated analysis of multiple genetically related traits. To address these challenges, we propose a mixed-linear-model-based joint association analysis method for multiple traits that includes epistasis and gene-environment interaction in the mapping model and uses within-trait variances and between-trait covariances simultaneously. An F statistic based on Wilks' lambda is used to test the significance of each SNP and each pair of interacting SNPs, and the genetic effects of each quantitative trait SNP (QTS) are estimated and tested by Markov chain Monte Carlo (MCMC) under a full QTS model. Simulations showed that the multi-trait GWAS method increased power to detect pleiotropic loci affecting more than one trait and gave unbiased estimates of QTS effects. To demonstrate the performance of the proposed method, we analyzed four blood lipid traits in the Multi-Ethnic Study of Atherosclerosis (MESA) cohort and two yield-related traits in a rice immortalized F2 dataset. A software package was developed for the proposed method.
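The core of a joint test can be sketched in its simplest setting: one SNP, two traits, fixed effects only, and no polygenic background, epistasis, environment terms or MCMC. These simplifications are assumptions made for illustration. Wilks' lambda is |E| / |E + H|, where E and H are the error and hypothesis sums-of-squares-and-cross-products matrices; with one hypothesis degree of freedom, F = (1 - lambda) / lambda * (df_e - p + 1) / p is exact, with (p, df_e - p + 1) degrees of freedom.

```python
def wilks_f_two_traits(x, y1, y2):
    """Test one SNP against two traits jointly: Wilks' lambda and its exact F transform (1 hypothesis df)."""
    n = len(x)
    xm = sum(x) / n
    sxx = sum((xi - xm) ** 2 for xi in x)
    centered, resid = [], []
    for y in (y1, y2):
        ym = sum(y) / n
        b = sum((xi - xm) * (yi - ym) for xi, yi in zip(x, y)) / sxx   # OLS slope on genotype
        centered.append([yi - ym for yi in y])
        resid.append([yi - ym - b * (xi - xm) for xi, yi in zip(x, y)])

    def sscp(cols):
        # 2 x 2 sums of squares and cross-products matrix
        return [[sum(p * q for p, q in zip(cols[j], cols[k])) for k in range(2)] for j in range(2)]

    def det2(m):
        return m[0][0] * m[1][1] - m[0][1] * m[1][0]

    lam = det2(sscp(resid)) / det2(sscp(centered))   # |E| / |E + H|
    p, df_e = 2, n - 2                                # two traits; intercept and SNP use 2 df
    f = (1.0 - lam) / lam * (df_e - p + 1) / p
    return lam, f, (p, df_e - p + 1)


# Hypothetical genotypes (0/1/2) and two traits; y1 depends strongly on the SNP.
x = [0, 0, 1, 1, 2, 2, 0, 1, 2, 1]
y1 = [1.0, 1.2, 2.1, 1.9, 3.2, 2.8, 0.9, 2.0, 3.1, 2.2]
y2 = [0.5, 0.1, 0.3, 0.9, 0.2, 0.6, 0.4, 0.8, 0.7, 0.0]
print(wilks_f_two_traits(x, y1, y2))
```

Because the between-trait covariance enters through the off-diagonal terms of E and E + H, a SNP with modest effects on both traits can give a smaller lambda than either single-trait test would suggest.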
META-R (multi-environment trial analysis in R) is a suite of R scripts linked by a graphical user interface (GUI) written in Java. The objective of META-R is to analyze multi-environment plant breeding trials (METs) accurately by fitting mixed and fixed linear models to data from experimental designs such as the randomized complete block design (RCBD) and alpha-lattice or lattice designs. META-R simultaneously estimates best linear unbiased estimators (BLUEs) and best linear unbiased predictors (BLUPs). It also computes the variance-covariance parameters and several statistical and genetic parameters, such as the least significant difference (LSD) at the 5% significance level, the coefficient of variation in percent (CV), the genetic variance, and the broad-sense heritability. These parameters are important for selecting top-performing genotypes in plant breeding. META-R also computes the phenotypic and genetic correlations among environments and between traits, together with their statistical significance. Genetic correlations between environments or traits can be visualized as a biplot or a tree diagram (dendrogram). Genetic correlations are important for identifying environments with similar behavior, for indirect selection, and for identifying the most strongly associated traits. META-R performs multi-environment analyses by the residual maximum likelihood (REML) method; analyses can be done by environment, across environments within groups defined by a factor (stress condition, nitrogen content, etc.), and across all environments, and the across-environment analyses can be restricted to environments that reach a pre-defined heritability.
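Entry-mean broad-sense heritability across environments is commonly computed as H2 = var_g / (var_g + var_ge / e + var_e / (r * e)), with e environments and r replicates; we assume META-R follows this standard form, which should be checked against its documentation. The sketch also shows a heritability filter of the kind described for across-environment analyses. All variance components, environment names and the threshold are hypothetical.

```python
def broad_sense_h2(var_g, var_ge, var_e, n_env, n_rep):
    """Entry-mean broad-sense heritability across environments (standard formula, assumed)."""
    return var_g / (var_g + var_ge / n_env + var_e / (n_env * n_rep))


def environments_above(h2_by_env, threshold):
    """Keep environments whose single-environment heritability reaches the threshold."""
    return [env for env, h2 in h2_by_env.items() if h2 >= threshold]


# Hypothetical variance components and heritabilities.
print(broad_sense_h2(var_g=1.0, var_ge=0.5, var_e=2.0, n_env=2, n_rep=2))   # 1 / 1.75, about 0.571
print(environments_above({"E1": 0.6, "E2": 0.2, "E3": 0.45}, 0.3))          # ['E1', 'E3']
```

For a single environment the same function applies with n_env=1 and var_ge=0, giving var_g / (var_g + var_e / n_rep).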
Grain shape and color strongly influence the yield and quality of durum wheat. Identifying QTL for these traits is essential for transferring favorable alleles according to selection strategies and breeding objectives. In the present study, 192 Ethiopian durum wheat accessions comprising 167 landraces and 25 cultivars were genotyped with the high-density Illumina iSelect 90K single-nucleotide polymorphism (SNP) wheat array to conduct a genome-wide association analysis of grain width (GW), grain length (GL), and the CIE (Commission Internationale de l'Éclairage) color traits L* (brightness), a* (redness), and b* (yellowness). The accessions were planted at Sinana Agricultural Research Center, Ethiopia, in the 2015/2016 cropping season in a randomized complete block design with three replications. Twenty homogeneous, healthy seeds per replicate were used for trait measurement, and phenotypic data were generated by digital image analysis of the seeds with the GrainScan software package. Analysis of variance revealed highly significant differences among accessions for all traits. A total of 46 quantitative trait loci (QTL) were identified for all traits across all chromosomes. One novel major candidate QTL (−lg P ≥ 4) with pleiotropic effects on grain CIE L* (brightness) and CIE a* (redness) was identified on the long arm of chromosome 2A. Eighteen nominal QTL (−lg P ≥ 3) and 26 suggestive QTL (−lg P ≥ 2.5) were also identified. Some QTL had pleiotropic effects on both grain shape and color.
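The significance classes above are defined on −lg P, that is, -log10(P). A minimal sketch of that classification, using the thresholds reported for this study and hypothetical p-values:

```python
import math


def classify_qtl(neg_log10_p):
    """Assign a marker-trait association to the significance classes reported for this study."""
    if neg_log10_p >= 4.0:
        return "major candidate"
    if neg_log10_p >= 3.0:
        return "nominal"
    if neg_log10_p >= 2.5:
        return "suggestive"
    return "not significant"


# Hypothetical p-values from a GWAS scan.
for p in (5e-5, 6e-4, 2e-3, 0.05):
    print(p, round(-math.log10(p), 2), classify_qtl(-math.log10(p)))
```

A p-value of 5e-5, for example, corresponds to −lg P of about 4.30 and falls in the major class, whereas 2e-3 (about 2.70) is only suggestive.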
Panicle traits directly associated with yield are targets of selection in rice breeding. Although many QTL for panicle traits have been identified, little is known about the genetic basis of panicle traits in japonica super rice (JSR) cultivars. In this study, we identified QTL for panicle traits in three environments using a population of recombinant inbred lines (RILs) derived from the JSR cultivar Liaoxing 1. The 197 RILs were genotyped with 285 polymorphic SNP markers. Phenotypic data and best linear unbiased prediction (BLUP) values of primary branch number (BNP), secondary branch number (BNS), grain number on primary branches (GNP), grain number on secondary branches (GNS), grain number per panicle (GN), panicle length (PL) and grain density (GD) were used for QTL mapping. A total of 105 QTL for the seven panicle traits were detected from single-environment phenotypes and BLUP values. Individual QTL explained 0.51%-52.22% of the phenotypic variation. Of these 105 QTL, 49 were also detected by joint multi-environment analysis. Five stable QTL, qGD9, qPL9, qGNP9, qGN6 and qBNS6.2, were identified in multiple environments. qGD9, qGNP9 and qPL9, which co-localize on chromosome 9, likely correspond to the known gene DEP1. Importantly, qGN6 and qBNS6.2, which co-localize in one region, were identified as novel QTL, and their Liaoxing 1 alleles had positive effects. Several RILs carrying the QTL allele combinations qGD9/qPL9/qGNP9 and qGN6/qBNS6.2 showed greater GN. Further investigation of the putative gene underlying qGN6/qBNS6.2 would shed light on the molecular basis of panicle traits in JSR.
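BLUP values are genotype means shrunk toward the population mean according to how much of their spread is genetic. The sketch below assumes a balanced one-way random model with known variance components, treating each environment as a replicate so that var_e lumps residual and genotype-by-environment variance; the RIL means and variances are hypothetical, and real analyses estimate the components by REML.

```python
def genotype_blups(genotype_means, var_g, var_e, n_env):
    """BLUPs of genotype effects: deviations from the grand mean shrunk by var_g / (var_g + var_e / n_env)."""
    grand_mean = sum(genotype_means) / len(genotype_means)
    shrink = var_g / (var_g + var_e / n_env)
    return [shrink * (m - grand_mean) for m in genotype_means]


# Hypothetical: three RILs, each observed in three environments.
print(genotype_blups([10.0, 12.0, 14.0], var_g=1.0, var_e=1.0, n_env=3))   # about [-1.5, 0.0, 1.5]
```

Adding the grand mean back to each effect gives BLUP values on the trait scale; lines measured in more environments are shrunk less.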
Soil flooding stress, including seed flooding, is a key issue in soybean production in high-rainfall and poorly drained areas. A nested association mapping (NAM) population of 230 lines, comprising two recombinant inbred line (RIL) populations with a common parent, was established and tested for seed-flooding tolerance in two environments, with relative seedling length as the indicator. The population was genotyped by RAD-seq (restriction site-associated DNA sequencing), generating 6137 SNPLDB (SNP linkage disequilibrium block) markers. Using RTM-GWAS (restricted two-stage multi-locus multi-allele genome-wide association study), we identified 33 QTL with 78 alleles in total (including 12 dual-effect alleles): 26 main-effect QTL with 63 alleles and 12 QEI (QTL × environment interaction) QTL with 27 alleles, explaining 50.95% and 14.79% of the phenotypic variation, respectively. The QTL alleles were organized into main-effect and QEI matrices to show the genetic architecture of seed-flooding tolerance in the three parents and the NAM population. From the main-effect matrix, the best possible genotype was predicted to have a genotypic value of 1.924, compared with the parental range of 0.652-1.069, and 33 candidate genes involved in six biological processes were identified and confirmed by χ2 test. The results may support a breeding-by-design strategy.
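Predicting the best genotype from a QTL-allele matrix amounts to choosing the highest-effect allele at each QTL and summing the effects. The sketch assumes purely additive allele effects across QTL around an intercept, ignores linkage between QTL, and uses hypothetical QTL names, alleles and effects rather than the study's estimates.

```python
def design_best_genotype(intercept, allele_effects):
    """Pick the allele with the largest effect at each QTL and predict the resulting genotypic value.

    allele_effects maps QTL name -> {allele name: effect}; effects are assumed additive across QTL.
    """
    best = {qtl: max(alleles, key=alleles.get) for qtl, alleles in allele_effects.items()}
    value = intercept + sum(allele_effects[qtl][allele] for qtl, allele in best.items())
    return best, value


# Hypothetical main-effect matrix (not the study's estimates).
effects = {
    "qA": {"a1": 0.05, "a2": -0.03},
    "qB": {"a1": -0.02, "a2": 0.04, "a3": 0.01},
}
print(design_best_genotype(0.8, effects))
```

Comparing this predicted optimum with the parental values indicates how much room remains for recombining favorable alleles in a breeding-by-design programme.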
Branch number (BN) is an important agronomic trait related to the plant architecture, adaptability and yield of soybean. To date, few studies have investigated its genetic basis. We aimed to locate genetic factors affecting BN using segregating populations derived from the high-branching cultivar ‘Kennong24’ (KN24) and the low-branching cultivar ‘Kenfeng19’ (KF19). Composite interval mapping in an F2 population detected a QTL (qBN-1) on chromosome 6 between the SSR markers BARCSOYSSR_06_0993 and BARCSOYSSR_06_1070. To fine-map qBN-1, a RIL population was developed and genotyped with 14 SSR markers located in the QTL region. qBN-1 was localized to a 115.67-kb interval flanked by markers BARCSOYSSR_06_1048 and BARCSOYSSR_06_1053. The QTL was further confirmed in backcross populations of 1305 (BC2F2, KN24 as the recurrent parent) and 1712 (BC3F2, KF19 as the recurrent parent) individuals. The fine-mapped region of qBN-1 contained only two candidate genes, Glyma.06G208800 and Glyma.06G208900, whose expression patterns were investigated by qRT-PCR. Of the two genes, Glyma.06G208900 showed the higher expression, differed significantly in expression between high- and low-branching genotypes in either the axillary meristem or the shoot apical meristem, and showed opposite expression patterns in the two tissues at the V4 and R1 stages. These results identify Glyma.06G208900 as a novel candidate gene controlling BN. Taken together, they provide a foundation for cloning and functional analysis of the qBN-1 gene and for improving BN by marker-assisted selection in soybean breeding.
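Relative expression from qRT-PCR is often computed with the 2^-ddCt method. The abstract does not state which quantification method was used, so this is an assumed, commonly used approach; it also assumes close to 100% amplification efficiency for both target and reference genes. The Ct values are hypothetical.

```python
def relative_expression(ct_target_sample, ct_ref_sample, ct_target_control, ct_ref_control):
    """Fold change by the 2^-ddCt method, assuming ~100% amplification efficiency for both genes."""
    delta_sample = ct_target_sample - ct_ref_sample      # normalize the sample to the reference gene
    delta_control = ct_target_control - ct_ref_control   # normalize the calibrator the same way
    return 2.0 ** -(delta_sample - delta_control)


# Hypothetical Ct values: high-branching sample vs low-branching calibrator.
print(relative_expression(20.0, 18.0, 22.0, 18.0))   # 4.0
```

Here the target gene crosses the threshold two cycles earlier in the sample after normalization, which corresponds to a fourfold higher expression.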
Soybean is a source of edible oil for humans and provides a third of the vegetable oil consumed worldwide. Increasing seed oil content is thus a key objective in soybean breeding. In the present study, oil-content phenotypes were collected from a four-way recombinant inbred line (FW-RIL) population of 144 lines planted in 10 environments and from a germplasm panel of 455 accessions planted in two environments. First, 59 quantitative trait loci (QTL) were detected in the FW-RIL population by inclusive composite interval mapping on a linkage map of 2232 single-nucleotide polymorphism (SNP) markers. Also in the FW-RILs, 44 quantitative trait nucleotides (QTNs) were detected by association analysis using 109,676 SNP markers and five multi-locus genome-wide association study methods. Second, 77 QTNs were detected by association analysis of the germplasm panel using 63,306 markers. Comparison of the QTL and QTNs suggested four QTNs controlling oil content. Pathway analysis of genes within the linkage disequilibrium decay regions around these four QTNs identified two candidate genes involved in the synthesis or metabolism of soybean oil. These findings provide useful information about the genetics of oil content and may contribute to its genetic improvement by marker-assisted selection.
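Collecting candidate genes around a QTN can be sketched as an interval-overlap search, assuming the candidate region is a window of plus or minus the linkage disequilibrium decay distance around each QTN. The chromosome names, gene IDs, coordinates and decay distance below are hypothetical placeholders.

```python
def genes_near_qtn(qtn, genes, ld_decay_bp):
    """Return IDs of genes overlapping a window of +/- ld_decay_bp around any QTN.

    qtn: list of (chromosome, position); genes: list of (gene_id, chromosome, start, end).
    """
    hits = set()
    for chrom, pos in qtn:
        lo, hi = pos - ld_decay_bp, pos + ld_decay_bp
        for gene_id, g_chrom, start, end in genes:
            if g_chrom == chrom and start <= hi and end >= lo:   # interval overlap test
                hits.add(gene_id)
    return sorted(hits)


# Hypothetical annotation and QTN; gene IDs are placeholders.
genes = [("gene1", "Gm05", 950_000, 960_000),
         ("gene2", "Gm05", 1_200_000, 1_210_000),
         ("gene3", "Gm06", 1_000_000, 1_001_000)]
print(genes_near_qtn([("Gm05", 1_000_000)], genes, ld_decay_bp=100_000))   # ['gene1']
```

The resulting gene list is what a pathway analysis would then screen for functions related to oil synthesis or metabolism.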
Alfalfa (Medicago sativa L.) is the most widely grown forage legume crop worldwide. Yield and plant height are important agronomic traits influenced by genetic and environmental factors. The objective of this study was to identify quantitative trait loci (QTL) and molecular markers associated with alfalfa yield and plant height. To understand the genetic basis of these traits, a full-sib F1 population of 392 individuals was developed by crossing a low-yielding, precocious alfalfa genotype (male parent) with a high-yielding, late-maturing alfalfa cultivar (female parent). Linkage maps were constructed with 3818 single-nucleotide polymorphism (SNP) markers on 64 linkage groups. QTL for yield and plant height were mapped using phenotypic data from three years. Sixteen QTL associated with yield and plant height were identified on chromosomes 1 to 8. Six QTL each explained more than 10% of the phenotypic variation, representing major loci controlling yield and plant height. One locus on chromosome 1 controlling yield had not been identified in previous studies. Three pairs of QTL co-localized: qyield-1 with qheight-7, qheight-5 with qyield-4, and qheight-6 with qyield-6. With further validation, markers closely linked to these QTL may be used for marker-assisted selection in breeding new high-yielding alfalfa varieties.
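The "percentage of phenotypic variation explained" (PVE) and the LOD score are linked through the residual sums of squares with and without the QTL. The sketch uses single-marker analysis with the marker as a factor, a simplification of the interval mapping used in practice; the genotype classes and phenotypes are hypothetical.

```python
import math


def single_marker_lod(genotypes, phenotypes):
    """LOD and PVE for one marker: compare the null model (grand mean) with genotype-class means."""
    n = len(phenotypes)
    y_mean = sum(phenotypes) / n
    rss0 = sum((y - y_mean) ** 2 for y in phenotypes)
    groups = {}
    for g, y in zip(genotypes, phenotypes):
        groups.setdefault(g, []).append(y)
    class_means = {g: sum(v) / len(v) for g, v in groups.items()}
    rss1 = sum((y - class_means[g]) ** 2 for g, y in zip(genotypes, phenotypes))
    lod = n / 2 * math.log10(rss0 / rss1)   # likelihood-ratio statistic on the log10 scale
    pve = 1.0 - rss1 / rss0                 # proportion of phenotypic variance explained
    return lod, pve


# Hypothetical marker classes and plant heights.
lod, pve = single_marker_lod([0, 0, 0, 1, 1, 1], [10.0, 11.0, 12.0, 14.0, 15.0, 16.0])
print(round(lod, 3), round(100 * pve, 1))   # 2.535 85.7
```

Since LOD = -(n/2) log10(1 - PVE) in this model, a QTL with a PVE above 10% needs a larger LOD in small populations than in large ones to be detected.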
Evaluation of general combining ability (GCA) is crucial to hybrid breeding in maize. Although a complete diallel cross design can provide efficient estimates, a sparse partial diallel cross (SPDC) is more flexible in breeding practice. Using real and simulated data sets of partial diallel crosses among 266 maize inbred lines, this study investigated the performance of SPDC designs for estimating GCA. With different distributions of the parental lines involved in crossing (random, balanced and unbalanced sampling), different numbers of hybrids were sampled as training sets to estimate the GCA of the 266 inbred lines. Three statistical approaches were applied: one obtained estimates by ordinary least squares (OLS), and the other two used genomic prediction (GP) to estimate GCA. The coefficient of determination of each approach was always higher than the heritability of the target trait, showing that the GCA of maize inbred lines could be accurately predicted with SPDC designs. Both GP approaches were more accurate than OLS, particularly for a low-heritability trait with a small sample size. Prediction results also showed that larger samples of hybrids could greatly improve accuracy. Random sampling of parental lines had little influence on average accuracy; however, predictions for lines that were never or seldom involved in crossing could be much less accurate.
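The OLS approach can be sketched with the model y_ij = mu + g_i + g_j + e for a single cross between lines i and j. The reparameterization a_i = mu / 2 + g_i, with GCA constrained to sum to zero, is our own assumption to keep the normal equations solvable; the design, effects and noise-free hybrid values are hypothetical, and the GP approaches (which add marker-based relationships) are not shown.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def estimate_gca(n_lines, crosses, y):
    """OLS GCA from y_ij = mu + g_i + g_j + e, fitted as y_ij = a_i + a_j with a_i = mu / 2 + g_i."""
    xtx = [[0.0] * n_lines for _ in range(n_lines)]
    xty = [0.0] * n_lines
    for (i, j), yij in zip(crosses, y):
        for p in (i, j):
            xty[p] += yij
            for q in (i, j):
                xtx[p][q] += 1.0
    a = solve(xtx, xty)
    a_bar = sum(a) / n_lines
    return 2.0 * a_bar, [ai - a_bar for ai in a]   # mu, and GCA constrained to sum to zero


# Hypothetical sparse design: 4 lines, 4 single crosses, noise-free hybrid values.
true_mu, true_g = 10.0, [1.0, -1.0, 0.5, -0.5]
crosses = [(0, 1), (1, 2), (0, 2), (2, 3)]
y = [true_mu + true_g[i] + true_g[j] for i, j in crosses]
mu_hat, g_hat = estimate_gca(4, crosses, y)
print(mu_hat, g_hat)
```

A line that appears in no cross contributes an all-zero row and column to X'X, so OLS cannot estimate its GCA at all, and a line seen in only one cross is estimated from very little information; this is consistent with the poorer accuracy reported for seldom-crossed lines.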
Genome-wide prediction is a promising approach to boost selection gain in hybrid breeding. Our main objective was to evaluate the potential and limits of genome-wide prediction for identifying superior hybrid combinations adapted to Northwest China. A total of 490 hybrids derived from crosses among 119 inbred lines from the Shaan A and Shaan B heterotic pattern were used for genome-wide prediction of ten agronomic traits. We tested eight statistical prediction models considering additive (A) effects and additionally evaluated the impact of dominance (D) and epistasis (E) on prediction ability. In five-fold cross-validation, average prediction ability ranged from 0.386 to 0.794 across traits and models. Six parametric methods (ridge regression, LASSO, Elastic Net, Bayes B, Bayes C and the reproducing kernel Hilbert space (RKHS) approach) showed very similar prediction ability for each trait, whereas two non-parametric methods (random forest and support vector machine) performed better for rind penetrometer resistance of the third internode above ground (RPR_TIAG). The A + D RKHS and A + D + E RKHS models were slightly better for predicting traits with relatively high non-additive variance. Integrating trait-specific markers into the A + D RKHS model improved prediction ability for grain yield by 0.03, from 0.528 to 0.558. Of all 6328 potential hybrids, selecting the top 44 was predicted to increase grain yield by 6% relative to Zhengdan 958, a commercially successful hybrid variety. In conclusion, our results substantiate the value of genome-wide prediction for hybrid breeding and suggest dozens of promising single crosses for developing high-yielding hybrids for Northwest China.
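Prediction ability in k-fold cross-validation is usually the correlation between predicted and observed values in each held-out fold, averaged over folds. The sketch pairs that procedure with RKHS regression in its kernel ridge form, using a single Gaussian kernel on marker genotypes. This is an assumed simplification: there are no separate additive, dominance or epistatic kernels, the bandwidth theta and penalty lam are arbitrary, and the data are simulated.

```python
import math
import random


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def gaussian_kernel(x1, x2, theta):
    """Gaussian kernel on marker vectors: exp(-squared distance / theta)."""
    return math.exp(-sum((u - v) ** 2 for u, v in zip(x1, x2)) / theta)


def rkhs_predict(x_train, y_train, x_test, theta=1.0, lam=1.0):
    """Kernel ridge (RKHS) regression: solve (K + lam I) alpha = y - mean, predict mean + k(x)' alpha."""
    y_mean = sum(y_train) / len(y_train)
    k = [[gaussian_kernel(a, b, theta) + (lam if i == j else 0.0)
          for j, b in enumerate(x_train)] for i, a in enumerate(x_train)]
    alpha = solve(k, [yi - y_mean for yi in y_train])
    return [y_mean + sum(gaussian_kernel(x, b, theta) * al for b, al in zip(x_train, alpha))
            for x in x_test]


def pearson(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    return cov / math.sqrt(sum((u - ma) ** 2 for u in a) * sum((v - mb) ** 2 for v in b))


def cv_prediction_ability(x, y, k_folds=5, seed=1, **model_args):
    """Mean Pearson correlation between predicted and observed values over k folds."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::k_folds] for f in range(k_folds)]
    abilities = []
    for test in folds:
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        pred = rkhs_predict([x[i] for i in train], [y[i] for i in train],
                            [x[i] for i in test], **model_args)
        abilities.append(pearson(pred, [y[i] for i in test]))
    return sum(abilities) / k_folds


# Hypothetical data: 40 hybrids, 10 markers coded 0/1/2, trait driven by the first three markers.
rng = random.Random(7)
x = [[rng.randint(0, 2) for _ in range(10)] for _ in range(40)]
y = [sum(row[:3]) + rng.gauss(0.0, 0.5) for row in x]
print(cv_prediction_ability(x, y, theta=10.0, lam=0.5))
```

Because the Gaussian kernel measures overall genomic similarity rather than additive marker effects, RKHS can capture some non-additive signal, which fits the slight advantage reported for traits with higher non-additive variance.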
The detailed characterization of genomes offered by modern genotyping platforms has opened the way to accurately predicting the genotype-by-environment interaction (GE) effects of untested genotypes in different environmental conditions. Existing statistical models have shown the advantages of including the GE component in prediction using molecular markers, pedigree, or both. To leverage the family information of highly structured populations when pedigree data are not available, we developed a model that uses family membership instead. The proposed model extends the reaction norm model by including the interaction between families and environments (FE). A representative subset of a soybean nested association mapping population (16,187 grain yield records), comprising 38 bi-parental families (1358 genotypes) observed in 18 environments (2011, 2012 and 2013), was used to compare the proposed model with three conventional prediction models. Two cross-validation scenarios, prediction of tested (CV2) and untested (CV1) genotypes, with a twofold design (50% of the data for training and 50% for testing), were used to mimic prediction situations that breeders face in the field. Results showed that the family-by-environment interaction explains a sizable proportion of the phenotypic variability. This improved predictive ability relative to the main-effects model (GBLUP) by about 41% (CV2) and 49% (CV1), and by about 17% relative to the conventional reaction norm model. Including the FE term not only improved the overall results but also significantly increased prediction accuracy in those environments where the conventional models performed very poorly. These results show the importance of accounting for the family structure of breeding programs to improve selection strategies in multi-parental populations.
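The two cross-validation scenarios differ mainly in how records are split. A minimal sketch, assuming records are (genotype, environment) pairs: CV1 assigns folds by genotype, so predicted genotypes have no records in training, while CV2 assigns folds by record, so a genotype can be trained in some environments and predicted in others. The record-level random split for CV2 is a simplification, and the genotype and environment labels are hypothetical.

```python
import random


def assign_folds(records, scheme, n_folds=2, seed=1):
    """Return a fold index per (genotype, environment) record for the CV1 or CV2 scheme."""
    rng = random.Random(seed)
    if scheme == "CV1":
        # Untested genotypes: all records of a genotype fall in the same fold.
        genotypes = sorted({g for g, _ in records})
        rng.shuffle(genotypes)
        fold_of = {g: k % n_folds for k, g in enumerate(genotypes)}
        return [fold_of[g] for g, _ in records]
    if scheme == "CV2":
        # Tested genotypes: records are split individually across folds.
        order = list(range(len(records)))
        rng.shuffle(order)
        folds = [0] * len(records)
        for k, i in enumerate(order):
            folds[i] = k % n_folds
        return folds
    raise ValueError(f"unknown scheme: {scheme}")


records = [(g, e) for g in "ABCDEF" for e in ("E1", "E2", "E3")]
cv1 = assign_folds(records, "CV1")
cv2 = assign_folds(records, "CV2")
print(cv1)
print(cv2)
```

Under CV1 a genotype-specific term has nothing to borrow from, whereas a family-by-environment term can still draw on relatives from the same family tested in that environment, which is one reason family information can help most for untested genotypes.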
The theory and associated selection methods of classical quantitative genetics are based on the multifactorial, or polygene, hypothesis. Modern quantitative genetics, based on a “major gene plus polygenes” genetic system, has paid much attention to major genes or quantitative trait loci (QTL). However, although polygene theory has sustained genetic improvement in plants and animals for more than a hundred years, it remains unclear how the numerous minor genes act. In the present study, we identified a novel minor gene, BnSOT-like1 (BnaA09g53490D), a sulfotransferase (SOT) gene that catalyzes the formation of the core glucosinolate (GSL) structure in Brassica napus. This gene was found incidentally during an investigation of plant height-related genes, but had not been identified by QTL mapping because of its small effect on GSL content. Overexpression of BnSOT-like1 up-regulated the expression of aliphatic GSL-associated genes and led to high seed aliphatic GSL content, whereas overexpression of the allele Bnsot-like1 did not increase seed GSL content. These findings suggest that the SOT gene has a marked effect on a quantitative trait from a reverse-genetics standpoint, but only a minor effect on the trait in its natural biological state. Because GSL biosynthetic genes are redundant in the allotetraploid B. napus, mutation of a single functional gene in the pathway does not cause a significant phenotypic change. Genes in biosynthetic pathways, such as BnSOT-like1 in our study, therefore have minor effects and may be called polygenes, in contrast to the three reported regulatory genes (BnHAG1s) that strongly affect GSL content in B. napus. The present study sheds light on how a minor gene contributes to a quantitative trait.
Recurrent selection is an important breeding method for improving populations and for selecting elite inbreds or fixed lines from the improved germplasm. A computer simulation tool called QuMARS has recently been developed that allows the simulation and optimization of various recurrent selection strategies. Our main objective in this study was to use QuMARS to compare phenotypic recurrent selection, marker-assisted recurrent selection and genomic selection (PS, MARS and GS, respectively) in both short- and long-term breeding. For MARS, two marker selection models were considered: stepwise regression (Rstep) and forward regression (Forward). For GS, three prediction models were considered: genomic best linear unbiased prediction (GBLUP), ridge regression (Ridge), and regression using the Moore-Penrose generalized inverse (InverseMP). To generate genotypes and phenotypes for individuals during simulation, one additive and two epistatic genetic models were considered at three levels of heritability. Selection responses from GBLUP-based GS and MARS (Forward) were consistently greater than those from PS under the additive model, particularly in early selection cycles. In contrast, PS consistently outperformed MARS and GS under the epistatic models. Under the two epistatic models, total genetic variance and the additive variance component increased in some cases after selection. From these simulations, we concluded that GS and PS are effective recurrent selection methods for improving targeted traits controlled by additive and epistatic quantitative trait loci (QTL). QuMARS gives breeders an opportunity to compare, optimize and integrate new technologies into their conventional breeding programs.
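The cycle that such simulations repeat (generate phenotypes, select the best, intermate, and track genetic gain) can be sketched for phenotypic selection under a purely additive model. This is a toy illustration, not QuMARS: loci are unlinked and biallelic with a favorable allele at frequency 0.5, and the population size, selected fraction and environmental variance are hypothetical. MARS or GS would replace the phenotype ranking with marker-based predictions.

```python
import random


def simulate_phenotypic_selection(n_ind=200, n_loci=10, n_selected=40, n_cycles=3, var_e=5.0, seed=1):
    """Return the mean genetic value before selection and after each cycle of phenotypic selection."""
    rng = random.Random(seed)
    # Each individual is a list of (allele, allele) pairs; allele 1 is favourable, frequency 0.5 at start.
    pop = [[(rng.randint(0, 1), rng.randint(0, 1)) for _ in range(n_loci)] for _ in range(n_ind)]

    def genetic_value(ind):
        return sum(a + b for a, b in ind)   # purely additive model

    means = [sum(map(genetic_value, pop)) / n_ind]
    for _ in range(n_cycles):
        phenos = [genetic_value(ind) + rng.gauss(0.0, var_e ** 0.5) for ind in pop]
        ranked = sorted(range(n_ind), key=lambda i: phenos[i], reverse=True)
        parents = [pop[i] for i in ranked[:n_selected]]
        pop = []
        for _ in range(n_ind):
            p1, p2 = rng.sample(parents, 2)
            # Unlinked loci: each parent passes one randomly chosen allele per locus.
            pop.append([(rng.choice(l1), rng.choice(l2)) for l1, l2 in zip(p1, p2)])
        means.append(sum(map(genetic_value, pop)) / n_ind)
    return means


print(simulate_phenotypic_selection())
```

With these settings the genetic variance is 5 and the heritability about 0.5, so the mean should rise over the three cycles; adding epistatic terms to genetic_value would be the natural next step for comparing strategies as the study does.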