In regression analysis, the information matrix X'X plays a central role because the regression coefficients are obtained as b = (X'X)⁻¹X'Y. If the determinant of X'X, det(X'X), is close to zero, the inverse (X'X)⁻¹ becomes extremely inflated, the error mean squares of the regression coefficients increase greatly, and the fitted regression loses robustness and precision. A matrix X'X with det(X'X) ≈ 0 is therefore called an "ill-conditioned matrix". This paper proposes the determinant of the correlation matrix R of the x variables, det(R), as a composite index of ill-conditioning: when det(R) is nonzero and lies in the interval [-0.01, 0.01] or [-0.0001, 0.0001], the corresponding X'X can be regarded as ill-conditioned or seriously ill-conditioned, respectively. The ill-conditioning of X'X results from a high degree of linear dependency among the columns of the matrix X. Three diagnostic criteria can measure the degree of this column dependency: the simple (linear) correlation coefficient, the multiple determination coefficient, and the condition index. To reduce or eliminate the ill-conditioning of X'X, four methods are suggested: (1) simplifying the original regression model, (2) collecting new data, (3) imposing restrictions on the regression coefficients, and (4) adopting a non-conventional regression procedure such as ridge regression or generalized inverse regression. The importance of diagnosing ill-conditioning and the evaluation of the resulting improvement are also discussed briefly.
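The det(R) index and the multiple determination coefficient described in the abstract can be illustrated with a minimal sketch, written here in plain Python under several assumptions. The function names (`correlation_matrix`, `det`, `classify`, `multiple_r2`), the label strings, and the simulated data are all hypothetical and are not taken from the paper. The thresholds are the two intervals quoted above, applied to the absolute value of det(R). The multiple determination coefficient of x_j on the remaining x variables is computed from the standard identity R_j² = 1 - det(R) / det(R with row and column j removed).

```python
import math
import random

def correlation_matrix(columns):
    """Pearson correlation matrix of a list of equal-length data columns."""
    n = len(columns[0])
    centered = []
    for col in columns:
        mean = sum(col) / n
        centered.append([v - mean for v in col])
    norms = [math.sqrt(sum(v * v for v in c)) for c in centered]
    p = len(columns)
    return [[sum(a * b for a, b in zip(centered[i], centered[j]))
             / (norms[i] * norms[j]) for j in range(p)] for i in range(p)]

def det(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    n = len(a)
    result = 1.0
    for k in range(n):
        pivot = max(range(k, n), key=lambda r: abs(a[r][k]))
        if a[pivot][k] == 0.0:
            return 0.0
        if pivot != k:
            a[k], a[pivot] = a[pivot], a[k]
            result = -result  # a row swap flips the sign
        result *= a[k][k]
        for r in range(k + 1, n):
            factor = a[r][k] / a[k][k]
            for c in range(k, n):
                a[r][c] -= factor * a[k][c]
    return result

def classify(det_r, ill=1e-2, severe=1e-4):
    """Label det(R) using the two intervals quoted in the abstract."""
    size = abs(det_r)
    if size == 0.0:
        return "singular"  # the abstract excludes the exactly-zero case
    if size <= severe:
        return "seriously ill-conditioned"
    if size <= ill:
        return "ill-conditioned"
    return "not flagged"

def multiple_r2(R, j):
    """R_j^2 of x_j on the other x variables via a ratio of determinants."""
    idx = [k for k in range(len(R)) if k != j]
    minor = [[R[r][c] for c in idx] for r in idx]
    return 1.0 - det(R) / det(minor)

# Simulated data (an assumption for illustration): x2 is almost a copy of x1.
random.seed(0)
x1 = [random.gauss(0, 1) for _ in range(200)]
x2 = [v + random.gauss(0, 0.001) for v in x1]
x3 = [random.gauss(0, 1) for _ in range(200)]

R = correlation_matrix([x1, x2, x3])
det_r = det(R)
label = classify(det_r)
print(det_r, label, [round(multiple_r2(R, j), 4) for j in range(3)])
```

In this sketch the near-duplicate column drives det(R) toward zero and pushes the multiple determination coefficients of x1 and x2 toward 1, while x3 stays largely unaffected. The condition index, the third criterion named in the abstract, needs an eigenvalue or singular value decomposition and is left out here to keep the example dependency-free.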
Full text: