## Thursday, July 1, 2010

### models

(...) The value of a model is that often it suggests a simple summary of the data in terms of the major systematic effects together with a summary of the nature and magnitude of the unexplained or random variation. (...)

Thus the problem of looking intelligently at data demands the formulation of patterns that are thought capable of describing succinctly not only the systematic variation in the data under study, but also patterns in similar data that might be collected by another investigator at another time and in another place.

(...) Thus the very simple model
$y = \alpha x + \beta ,$
connecting two quantities y and x via the parameter pair (α,β), defines a straight-line relationship between y and x. (...) Clearly, if we know α and β we can reconstruct the values of y exactly from those of x (...). In practice, of course, we never measure the ys exactly, so that the relationship between y and x is only approximately linear. (...)

The fitting of a simple linear relationship between the ys and the xs requires us to choose from the set of all possible pairs of parameter values a particular pair (a, b) that makes the patterned set $\hat{y}_1,\ldots,\hat{y}_n$ closest to the observed data. In order to make this statement precise we need a measure of 'closeness' or, alternatively, of distance or discrepancy between the observed ys and the fitted $\hat{y}$s. Examples of such discrepancy functions include the $L_1$-norm

$S_1(y,\hat{y}) = \sum | y_i - \hat{y}_i |$
and the $L_\infty$-norm
$S_\infty(y,\hat{y}) = \max_i | y_i - \hat{y}_i | .$

Classical least squares, however, chooses the more convenient $L_2$-norm or sum of squared deviations
$S_2(y,\hat{y}) = \sum ( y_i - \hat{y}_i )^2$
as the measure of discrepancy. These discrepancy formulae have two implications. First, the straightforward summation of individual deviations, either $| y_i - \hat{y}_i |$ or $( y_i - \hat{y}_i )^2$, each depending on only one observation, implies that the observations are all made on the same physical scale and suggests that the observations are independent, or at least that they are in some sense exchangeable, so justifying an even-handed treatment of the components. Second, the use of arithmetic differences $y_i - \hat{y}_i$ implies that a given deviation carries the same weight irrespective of the value of $\hat{y}$. In statistical terminology, the appropriateness of $L_p$-norms as measures of discrepancy depends on stochastic independence and also on the assumption that the variance of each observation is independent of its mean value. Such assumptions, while common and often reasonable in practice, are by no means universally applicable.

(Generalized Linear Models, P. McCullagh and J.A. Nelder)
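As a small sketch of the ideas in the excerpt (not from the book itself), the closed-form least-squares fit of the straight line $y = \alpha x + \beta$ and the three discrepancy functions $S_1$, $S_2$ and $S_\infty$ can be computed in a few lines of plain Python. The data values below are made up purely for illustration: a roughly linear sequence with small measurement error.

```python
def fit_line(x, y):
    """Closed-form least-squares fit of y = slope*x + intercept,
    minimizing the L2 discrepancy S2."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    slope = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
             / sum((xi - mx) ** 2 for xi in x))
    intercept = my - slope * mx
    return slope, intercept

def discrepancies(y, yhat):
    """Return (S1, S2, Sinf): the L1, L2 and L-infinity discrepancies
    between observed ys and fitted yhats."""
    r = [yi - yh for yi, yh in zip(y, yhat)]
    s1 = sum(abs(e) for e in r)          # sum of absolute deviations
    s2 = sum(e * e for e in r)           # sum of squared deviations
    sinf = max(abs(e) for e in r)        # largest absolute deviation
    return s1, s2, sinf

# Hypothetical data, approximately y = 2x + 1 with noise.
x = [0, 1, 2, 3, 4]
y = [1.0, 2.9, 5.2, 6.8, 9.1]

slope, intercept = fit_line(x, y)
yhat = [slope * xi + intercept for xi in x]
s1, s2, sinf = discrepancies(y, yhat)
```

Note that only $S_2$ is minimized by this fit; the $L_1$ and $L_\infty$ discrepancies of the same fitted line are reported here just to compare the three measures, and minimizing them directly would generally give a different (a, b).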