Authors: Jian-Cheng Wang, Jin Hu, Ning-Ning Liu, Hai-Ming Xu and Sheng Zhang
Keywords: cluster, core subset, genotypic value, mixed linear model, molecular marker information
Abstract: This study proposes a strategy for constructing plant core subsets by cluster analysis, based on a combination of continuous data (genotypic values) and discrete data (molecular marker information). A mixed linear model approach was used to predict genotypic values, thereby eliminating environmental effects. A “mixed genetic distance” was designed to solve the problem of combining continuous and discrete data when constructing a core subset by clustering. Four commonly used genetic distances for continuous data (Euclidean distance, standardized Euclidean distance, city block distance, and Mahalanobis distance) were used to assess the validity of the continuous-data part of the mixed genetic distance; three commonly used genetic distances for discrete data (cosine distance, correlation distance, and Jaccard distance) were used to assess the validity of the discrete-data part. A rice germplasm group with eight quantitative traits and information for 60 molecular markers was used to evaluate the validity of the new strategy. The results suggest that the validity of both parts of the mixed genetic distance is equal to or higher than that of the common genetic distances. The core subset constructed from the combination of genotypic values and molecular marker information was more representative than those constructed from genotypic values or molecular marker information alone. Moreover, the combined-data strategy was able to handle dominant marker information and could combine any other continuous and discrete data for clustering to construct a plant core subset.
Author for correspondence. Tel: +86 (0)571 8697 1119; Fax: +86 (0)571 8697 1117; E-mail: jhu@dial.zju.edu.cn
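The abstract does not give the formula of the mixed genetic distance, so the following is only a minimal sketch of the general idea of combining a continuous-data distance with a discrete-data distance. It is not the authors' definition. The weighted sum, the scaling of the continuous part by its largest pairwise value, the function names, and the toy data are all assumptions made for illustration; standardized Euclidean distance and Jaccard distance are used because both are named in the abstract.

```python
import math

def standardized_euclidean(x, y, variances):
    """Euclidean distance with each trait difference scaled by that trait's variance."""
    return math.sqrt(sum((a - b) ** 2 / v for a, b, v in zip(x, y, variances)))

def jaccard_distance(u, v):
    """Jaccard distance between two 0/1 marker vectors (1 = band present)."""
    both = sum(1 for a, b in zip(u, v) if a and b)
    either = sum(1 for a, b in zip(u, v) if a or b)
    return 0.0 if either == 0 else 1.0 - both / either

def mixed_distance_matrix(continuous, discrete, weight=0.5):
    """Hypothetical combination: weight * scaled continuous + (1 - weight) * Jaccard.

    ASSUMPTION: the weighted sum and the max-scaling are illustrative choices,
    not the formula used in the paper.
    """
    n = len(continuous)
    n_traits = len(continuous[0])
    means = [sum(row[j] for row in continuous) / n for j in range(n_traits)]
    # Sample variance per trait; fall back to 1.0 for a constant trait.
    variances = []
    for j in range(n_traits):
        var = sum((row[j] - means[j]) ** 2 for row in continuous) / (n - 1)
        variances.append(var if var > 0 else 1.0)
    d_cont = [[standardized_euclidean(continuous[i], continuous[k], variances)
               for k in range(n)] for i in range(n)]
    # Scale the continuous part into [0, 1] so it is comparable to Jaccard.
    max_cont = max(max(row) for row in d_cont) or 1.0
    return [[weight * d_cont[i][k] / max_cont
             + (1.0 - weight) * jaccard_distance(discrete[i], discrete[k])
             for k in range(n)] for i in range(n)]

# Toy data (made up): 3 accessions, 2 predicted genotypic values, 4 dominant markers.
continuous = [[1.0, 10.0], [2.0, 12.0], [9.0, 30.0]]
discrete = [[1, 0, 1, 1], [1, 0, 1, 0], [0, 1, 0, 1]]
D = mixed_distance_matrix(continuous, discrete)
```

The resulting matrix `D` could then be passed to any hierarchical clustering routine, with one or more accessions sampled from each cluster to form the core subset. Scaling both parts to the same range before summing keeps one data type from dominating the other, which is the practical difficulty the abstract points to.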