Monday, January 22, 2007

Lies, damned lies, and statistics

There is practically no field of Biology in which Statistics is not used. However, the development of the discipline, and of the computer programs that accompany it, has produced a field that is at once extremely vast, extremely complex, and within reach of any idiot with a mouse.

This text by Clay Helberg, Pitfalls of Data Analysis (or How to Avoid Lies and Damned Lies), summarizes the main problems of a careless approach to Statistics and closes with a series of good pieces of advice:

  • Be sure your sample is representative of the population in which you're interested.
  • Be sure you understand the assumptions of your statistical procedures, and be sure they are satisfied. In particular, beware of hierarchically organized (non-independent) data; use techniques designed to deal with them.
  • Be sure you have the right amount of power--not too little, not too much.
  • Be sure to use the best measurement tools available. If your measures have error, take that fact into account.
  • Beware of multiple comparisons. If you must do a lot of tests, try to replicate or use cross-validation to verify your results.
  • Keep clear in your mind what you're trying to discover--don't be seduced by stars in your tables; look at magnitudes rather than p-values.
  • Use numerical notation in a rational way--don't confuse precision with accuracy (and don't let the consumers of your work do so, either).
  • Be sure you understand the conditions for causal inference. If you need to make causal inference, try to use random assignment. If that's not possible, you'll have to devote a lot of effort to uncovering causal relationships with a variety of approaches to the question.
  • Be sure your graphs are accurate and reflect the data variation clearly.
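To make the warning about multiple comparisons concrete, here is a minimal sketch of how quickly false positives pile up. It is not taken from Helberg's text: it assumes the tests are independent and all null hypotheses are true, and the function names and the 0.05 significance level are illustrative choices. The Bonferroni correction shown is one common and conservative remedy, not the only one.

```python
def familywise_error_rate(alpha: float, n_tests: int) -> float:
    """Chance of at least one false positive across n_tests tests.

    Assumes the tests are independent and every null hypothesis is true.
    """
    return 1.0 - (1.0 - alpha) ** n_tests


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance level that keeps the familywise rate at or below alpha."""
    return alpha / n_tests


if __name__ == "__main__":
    alpha = 0.05  # illustrative significance level
    for n_tests in (1, 5, 20, 100):
        fwer = familywise_error_rate(alpha, n_tests)
        per_test = bonferroni_threshold(alpha, n_tests)
        print(f"{n_tests:>3} tests: P(at least one false positive) = {fwer:.2f}, "
              f"Bonferroni per-test threshold = {per_test:.4f}")
```

With 20 tests at the 0.05 level, the chance of at least one spurious "significant" result is already well above one half, which is why the advice to replicate or cross-validate matters.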
