leolca's blog: July 2010

Sunday, July 18, 2010

Combination

Here is a simple function for Octave/MATLAB that I wrote to create the combinations of the numbers in a vector. Suppose you want all combinations of the numbers [1 2 3 4] taken 3 at a time, which gives [1 2 3], [1 2 4], [1 3 4] and [2 3 4]. Just call the function below as X = combinations([1 2 3 4],3), and X will be a matrix with one combination per row.


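function X = combinations(x,k)
%
%  X = combinations(x,k)
%  Create all combinations (without repetition) of the
%  elements in x arranged in groups of k items.
%  example:
%  x=[1 2 3 4];
%  X = combinations(x,3)
%  X =
%     1   2   3
%     1   2   4
%     1   3   4
%     2   3   4
%

% work with a row vector
if (size(x,1) > size(x,2)), x = x'; end
n = length(x);
X = [];
if (k == 1)
  % base case: every single element is a combination (one per row)
  X = x';
else
  for l = 1 : n-k+1
    % number of (k-1)-combinations of the elements that follow x(l)
    C = nchoosek(n-l, k-1);
    % drop x(1..l), keeping only the elements after x(l)
    xtemp = x;
    xtemp(1:l) = [];
    % prefix x(l) to every (k-1)-combination of the remaining elements
    X = [X; [repmat(x(l), C, 1) combinations(xtemp, k-1)]];
  end
end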
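
Why does the recursion produce exactly the right number of rows? For the leading element x(l), the loop appends C = nchoosek(n-l,k-1) rows, one for each (k-1)-combination of the elements that follow x(l), and summing these counts over l = 1, ..., n-k+1 gives nchoosek(n,k) (the hockey-stick identity). The short Octave sketch below checks this for n = 4 and k = 3. It uses only the built-in nchoosek, so it runs without combinations.m on the path; the names counts and C are just illustrative. Both Octave and MATLAB's nchoosek also accept a vector as the first argument and return one combination per row, which makes a convenient cross-check for the function above.

```octave
% Row-count check for the recursion in combinations(x,k):
% leading element x(l) contributes nchoosek(n-l, k-1) rows.
n = 4;
k = 3;
counts = arrayfun(@(l) nchoosek(n - l, k - 1), 1:n-k+1);  % here [3 1]
fprintf('rows per leading element: %s\n', mat2str(counts));
% The per-element counts add up to the total number of combinations.
fprintf('total rows: %d, nchoosek(n,k): %d\n', sum(counts), nchoosek(n, k));

% Built-in alternative: with a vector as first argument, nchoosek
% returns every k-element combination of it, one per row.
C = nchoosek([1 2 3 4], 3)
```

Comparing sortrows(combinations(v,k)) with sortrows(nchoosek(v,k)) for a few vectors v is an easy way to test the function.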

Thursday, July 15, 2010

Holiday

(taken from www.phdcomics.com)

Wednesday, July 14, 2010

Subtitles on PS3

Unfortunately, the only way I have found so far to play downloaded videos with subtitles on my PS3 is a tool called AVIAddXSubs. This tool creates a DivX video from a source video (.avi, .mpg, etc.) and its subtitle file (.srt). The subtitle is added to the DivX container (the DivX Media Format (DMF) supports multiple subtitles, multiple audio tracks and multiple video streams, among other things, just like Matroska). Although AVIAddXSubs is a Windows program, it may also run on Linux thanks to Wine. I have just tried it and it worked! I got my video playing on my PS3 with subtitles. :)

Thursday, July 1, 2010

Models

(...) The value of a model is that often it suggests a simple summary of the data in terms of the major systematic effects together with a summary of the nature and magnitude of the unexplained or random variation. (...)

Thus the problem of looking intelligently at data demands the formulation of patterns that are thought capable of describing succinctly not only the systematic variation in the data under study, but also for describing patterns in similar data that might be collected by another investigator at another time and in another place.

(...) Thus the very simple model
\[ y = \alpha x + \beta ,\]
connecting two quantities y and x via the parameter pair (α,β), defines a straight-line relationship between y and x. (...) Clearly, if we know α and β we can reconstruct the values of y exactly from those of x (...). In practice, of course, we never measure the ys exactly, so that the relationship between y and x is only approximately linear. (...)

The fitting of a simple linear relationship between the ys and the xs requires us to choose from the set of all possible pairs of parameter values a particular pair (a, b) that makes the patterned set $\hat{y}_1,\ldots,\hat{y}_n$ closest to the observed data. In order to make this statement precise we need a measure of 'closeness' or, alternatively, of distance or discrepancy between the observed ys and the fitted $\hat{y}$s. Examples of such discrepancy functions include the $L_1$-norm

\[S_1(y,\hat{y}) = \sum | y_i - \hat{y}_i | \]
and the $L_\infty$-norm
\[S_\infty(y,\hat{y}) = \max_i | y_i - \hat{y}_i | .\]

Classical least squares, however, chooses the more convenient $L_2$-norm or sum of squared deviations
\[S_2(y,\hat{y}) = \sum ( y_i - \hat{y}_i )^2 \]
as the measure of discrepancy. These discrepancy formulae have two implications. First, the straightforward summation of individual deviations, either $| y_i - \hat{y}_i |$ or $( y_i - \hat{y}_i )^2$, each depending on only one observation, implies that the observations are all made on the same physical scale and suggests that the observations are independent, or at least that they are in some sense exchangeable, so justifying an even-handed treatment of the components. Second, the use of arithmetic differences $y_i - \hat{y}_i$ implies that a given deviation carries the same weight irrespective of the value of $\hat{y}$. In statistical terminology, the appropriateness of $L_p$-norms as measures of discrepancy depends on stochastic independence and also on the assumption that the variance of each observation is independent of its mean value. Such assumptions, while common and often reasonable in practice, are by no means universally applicable.

(Generalized Linear Models, P. McCullagh and J.A. Nelder)
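
To make the three discrepancy measures in the quote concrete, here is a small Octave sketch (it should also run in MATLAB) that fits the straight line by least squares and then evaluates $S_1$, $S_2$ and $S_\infty$ at the fitted values. The data are made up purely for illustration, and the variable names (noise, yhat, S1, S2, Sinf, S2at) are my own; polyfit(x, y, 1) returns the slope a and intercept b of the least-squares line, matching the pair (a, b) in the text.

```octave
% Made-up data for illustration: a straight line plus small deviations
x = (0:9)';
noise = [0.3 -0.2 0.1 -0.4 0.2 0.0 -0.1 0.3 -0.3 0.1]';
y = 2*x + 1 + noise;

% Least-squares fit (minimizes S2): p = [slope intercept]
p = polyfit(x, y, 1);
a = p(1);
b = p(2);
yhat = a*x + b;

% The three discrepancy measures from the text
r = y - yhat;
S1   = sum(abs(r));   % L1-norm
S2   = sum(r.^2);     % sum of squared deviations (L2)
Sinf = max(abs(r));   % L-infinity norm
fprintf('a = %.4f, b = %.4f\n', a, b);
fprintf('S1 = %.4f, S2 = %.4f, Sinf = %.4f\n', S1, S2, Sinf);

% Nudging the fitted slope away from a can only increase S2
S2at = @(aa, bb) sum((y - (aa*x + bb)).^2);
fprintf('S2 with slope a + 0.01: %.4f\n', S2at(a + 0.01, b));
```

Since least squares minimizes $S_2$ only, a different pair (a, b) might give a smaller $S_1$ or $S_\infty$; minimizing those norms needs other methods (for instance linear programming).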