The Second Version

13/07/12

To Be Precise

The blog Caravella.eu writes:
Istat, the family is increasingly in crisis: in 2010, separations rising by 2.6%


No offense to the authors of La Caravella, but this way of putting things bothers me. Let me explain why.

"Rising" denotes a process under way in the present; if we measure our variable Y at T0 and then at Tp (the present), we will find that Y(Tp) > Y(T0). We can then compute the derivative with respect to time, dY/dT, which is the rate of change of Y, expressed in (units of Y)/(units of time).

dY/dT can also be expressed as a percentage, but in my opinion that causes needless complications*.

The article's headline, however, gives a 2.6% increase without specifying a unit of time or any time reference: in 2010, rising relative to what? From January to December, or from some other instant T0? Then, in the body, we read:
In 2010 there were 88,191 separations and 54,160 divorces; compared with the previous year, separations recorded an increase of 2.6% while divorces recorded a decrease of 0.5%.


Ah, so separations are not "rising": they have already risen, in 2010 relative to 2009 (a time reference, at last), and by an amount we can measure exactly from the data in the relevant registers.
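
The distinction can be made concrete with a few lines of Python. This is only an illustrative sketch: the function names are made up for the occasion, and the 2009 figure is not quoted in the article but back-calculated from the 2010 count and the stated 2.6%, so it should be read as approximate.

```python
def percent_change(previous, current):
    """Relative change between two completed periods, in percent."""
    return (current - previous) / previous * 100.0


def implied_previous(current, pct):
    """Back out the earlier value from the later one and the percent change."""
    return current / (1.0 + pct / 100.0)


separations_2010 = 88_191                                   # from the quoted text
separations_2009 = implied_previous(separations_2010, 2.6)  # assumption: derived, not quoted

print(round(separations_2009))
print(round(percent_change(separations_2009, separations_2010), 1))
```

The result compares two completed yearly totals; it says nothing about a process still under way.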

In short, not a howler, but a lapse in precision of language that caught my eye. Because mathematics is nothing if not extremely precise in its language.

*But sometimes it helps: with bank accounts, for example, it is more convenient to quote interest as a percentage than to write "X euro/(euro*year)".


24/11/08

Reaction Rates and Randomness

In a thread over at Misha's there is this significant snippet:
Cortillaen,

The mathematical chances of a simple 200 chain (all left or all right handed) amino acid is .1 in 10 to the minus 40,000. In plain english that number would be expressed as ZERO.

Reference: Evolution; A Theory in Crisis, by Dr. Michael Denton
I suspect that the probability quoted there is calculated as if it were the probability of obtaining a certain string from randomly picked letters (the explanation is not at all clear), which is the wrong way to look at the issue.

Chemistry is definitely not random, even leaving aside the specific properties of biological evolution.

Let's start with a simple model system, one in which a molecule A can react with either a molecule B or a similar but not identical molecule C (say, acid-catalyzed aromatic alkylation where A is benzene, B ethylene and C propylene). The two reactions, and their respective rates, are:

A + B --> X r1
A + C --> Y r2

What happens in this world is that, except for very rare cases, r1 and r2 are different. The difference can be tiny or significant, but it is nearly inescapable.

The study of reaction rates and the factors that influence them is the realm of chemical kinetics, and a great deal of effort has gone into it. Many clever and competent scientists spent a lot of time using sophisticated instruments to investigate reaction mechanisms and rates - and they obtained many good results.

It turns out that the main driver of reaction rates is molecular orbitals - the shape and electron distribution of molecules. Thinking in terms of molecular orbitals is not the easiest task, however, and in many cases it is enough to refer to a subset of molecular properties such as charge distribution and molecular size and shape - for example, a small molecule can gain access to a reactive centre while a bigger one cannot.

The second factor is temperature, which in turn alters the collision rates and energy distribution of molecules (especially in the fluid phase) - molecules must not only collide in order to react, but they have to do so within a defined energy range.

This is only an extremely brief overview of what is in fact a vast field; there are many subtle distinctions, lesser effects and particular cases. The main point is that the structure of molecules determines their reactivity.

Going back to our model system, the difference between r1 and r2 results in different quantities (and generally concentrations) of X and Y. The difference can be tiny initially but - and this is another important point - if the supply of reactants is enough to let the reaction proceed for a long time, eventually the difference in quantities will become significant. In some cases the initial ratios of reactants will also influence the final state of the system.
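
To make the point tangible, here is a minimal numerical sketch in Python. It is not taken from any real kinetic study: the rate constants k1 and k2, the initial concentrations and the time step are all made-up values, the rate laws are assumed to be simple second-order (rate = k * [A] * [B]), and the second reaction is taken to be A + C --> Y. A plain Euler integration is used for brevity.

```python
def simulate(k1=1.0, k2=1.2, a0=1.0, b0=1.0, c0=1.0, dt=0.001, steps=20_000):
    """Euler integration of A + B -> X (k1) and A + C -> Y (k2).

    All numbers are hypothetical; units are arbitrary.
    Returns a list of (time, X, Y) samples.
    """
    a, b, c, x, y = a0, b0, c0, 0.0, 0.0
    history = []
    for step in range(steps + 1):
        if step % 2_000 == 0:
            history.append((step * dt, x, y))
        r1 = k1 * a * b   # rate of formation of X
        r2 = k2 * a * c   # rate of formation of Y
        a -= (r1 + r2) * dt
        b -= r1 * dt
        c -= r2 * dt
        x += r1 * dt
        y += r2 * dt
    return history


for t, x, y in simulate():
    print(f"t={t:5.1f}  X={x:.4f}  Y={y:.4f}  Y-X={y - x:.4f}")
```

With these assumed values both products start at zero, and the gap between them widens as long as the reaction keeps going: nothing random, nothing designed, just two unequal rate constants.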

In any case, we will end up with a certain degree of order which did not happen by random chance, but neither was it deliberately designed.

Knowledgeable readers will notice that I have left out three aspects - namely, reaction thermodynamics, the exact form of rate expressions, and the difference between reaction rates and rate constants. The first two are not so important in the context of this discussion, and this is just a blog post, not a thesis. The third is in fact more important, but I prefer not to overload a post which is already quite technical.

If the reactants in our model system were three α-amino acids (pick the ones you like the most), things would become fantastically more complicated: amino acids can react with each other and thus form all possible products (AA, BB, CC, AB, AC, BC), each with its own reaction rate; the dipeptides from this first stage can react further, producing longer chains, which can in turn react again and become longer (or eventually be hydrolyzed at some point), and so on.

A template effect can also kick in, where the peptides already present influence the formation of new ones - to the point of self-replication, as has been observed.

Even considering only the first reaction stage (dipeptides), in order to have a random distribution of the six products, the six reaction rates would have to be the same - but we have seen that the circumstance has only a near-zero probability of occurring: the distribution of products will not be random.
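
A small Python sketch of this first stage may help. It follows the simplification used here of counting AB and BA as a single product, and every rate constant and concentration in it is invented purely for illustration; only the bookkeeping is the point.

```python
from itertools import combinations_with_replacement

amino_acids = "ABC"
products = ["".join(p) for p in combinations_with_replacement(amino_acids, 2)]
# -> AA, AB, AC, BB, BC, CC

# Hypothetical rate constants (arbitrary units), deliberately unequal.
k = {"AA": 1.0, "AB": 1.3, "AC": 0.7, "BB": 0.9, "BC": 1.6, "CC": 0.5}
conc = {"A": 1.0, "B": 1.0, "C": 1.0}   # assumed equal starting concentrations


def initial_shares(k, conc):
    """Share of each dipeptide in the initial formation rate."""
    rates = {p: k[p] * conc[p[0]] * conc[p[1]] for p in k}
    total = sum(rates.values())
    return {p: r / total for p, r in rates.items()}


shares = initial_shares(k, conc)
uniform = 1 / len(products)
for p in products:
    print(f"{p}: {shares[p]:.3f} (uniform would be {uniform:.3f})")
```

Even with equal starting concentrations, unequal rate constants are enough to pull the product shares away from the uniform value.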

As for what happens next: even a random (normal) distribution of reaction rates in the later stages should be unable to erase the non-randomness that originates from the uneven concentrations of the reactants.

But even if the longer peptides had a completely random sequence of amino acids, they would not all have the same structure: some sequences would be able to fold or coil (as in the omnipresent alpha helix), acquiring different properties regarding both thermodynamics and kinetics.

Some structures may fold into conformations that leave them more exposed to hydrolysis or other degradation, and consequently be destroyed.

The first structure able to self-replicate, even imperfectly, would then gain an advantage and tend to grow in concentration at the expense of others.


11/10/08

The Data Grinder

Blogger Jeff Id is an aeronautical engineer who did some investigation into the statistical methods used by Mann to produce his Hockey Stick temperature graph.

His work has uncovered what is probably a fundamental flaw, not only of Mann's temperature reconstruction but of paleoclimatology at large.

Past temperatures are measured through proxies such as tree ring width, which is (or at least is supposed to be) correlated with the temperature of the environment where the tree grew.

These time series, however, are highly noisy and uncalibrated against temperature. Calibration is done by calculating the correlation of each proxy series with the instrumental temperature record (which covers only recent times); Mann's method then rejects series whose correlation falls below a certain threshold. And here is where the problems begin.

The chosen data series are then scaled to match the slope of temperature in the calibration period (or its mean/standard deviation), and all these manipulations have a curious effect: they introduce a rising temperature signal in recent times (obviously, because only series in agreement with this assumption are retained), and compress and offset the signal in the past. Jeff obtained these findings using random noise superimposed on a temperature signal that is flat except for a hump in the past.
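
A toy version of the screening step can be sketched in Python. To be clear about what this is: it is neither Jeff Id's code nor Mann's actual method, only an assumed, simplified setup in which the "proxies" are pure white noise, the "instrumental record" is a straight rising line, and the series length, calibration window, threshold and seed are arbitrary choices. The scaling step is left out entirely.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def ols_slope(ys):
    """Least-squares slope of ys against its index 0, 1, 2, ..."""
    n = len(ys)
    mx, my = (n - 1) / 2, sum(ys) / n
    num = sum((x - mx) * (y - my) for x, y in enumerate(ys))
    return num / sum((x - mx) ** 2 for x in range(n))


def screened_composite(n_series=1000, length=150, calib=30, threshold=0.3, seed=0):
    """Average only the noise series that correlate with a rising trend."""
    rng = random.Random(seed)
    target = list(range(calib))          # hypothetical rising "instrumental" record
    kept = []
    for _ in range(n_series):
        series = [rng.gauss(0.0, 1.0) for _ in range(length)]   # signal-free proxy
        if pearson(series[-calib:], target) > threshold:
            kept.append(series)
    composite = [sum(s[i] for s in kept) / len(kept) for i in range(length)]
    return composite, len(kept)


composite, n_kept = screened_composite()
print("series kept:", n_kept)
print("slope in calibration window:", ols_slope(composite[-30:]))
print("slope before calibration window:", ols_slope(composite[:-30]))
```

Every retained series correlates positively with the rising line, so the composite necessarily rises in the calibration window even though no series contains any signal at all.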

He then used Mann's actual data, and not only confirmed his findings, but also found that the scaling process gives a disproportionate weight, in the reconstruction, to the series with the lowest correlation to instrumental temperature.

But the best comes last. The high noise of proxy series allows for a lot of room in correlating them to a temperature trend, so Jeff was able to fit the same proxies used by Mann to all sorts of temperature trends: linear rise, linear fall, cyclic variations... in a final twist of irony, the reconstruction getting the best score is one with a falling temperature trend in recent times.

Mann's method of temperature reconstruction is fatally flawed: it is useful only for finding what one is looking for. Paleoclimatology must find better methods for the task of reconstructing past temperatures.


11/07/08

Application of Central Limit Theorem

This is a bit of a specialist thing; I'd like to hear the opinion of someone with relevant knowledge.

I've never been very mathematically inclined, and now I am seriously rusty, so I can't find my way out of this problem.

Suppose I am measuring a quantity using an instrument for which the manufacturer declares +/- 10% (or any other value) uncertainty: I take a series of measurements, calculate the standard deviation and realize that, in relative terms, it is less than 10%. Standard error of the mean will be even smaller.

So my question is, which uncertainty is the correct one to report: the manufacturer's 10% or the one computed from the sample? If the central limit theorem applies, the SEM for the sample should be the correct value.

In any case, back when I did metrology, relative uncertainty was calculated as the square root of the sum of the squares of all contributions, among them the SEM for the sample and the instrumental uncertainty. That is a very conservative method.
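
That combination can be sketched as follows in Python, assuming the contributions are added in quadrature (root sum of squares), as is usual in metrology. The readings are invented, and treating the manufacturer's +/- 10% directly as a relative standard uncertainty is an assumption of this sketch; a proper treatment would first decide what distribution that figure describes.

```python
import math
import statistics


def combined_relative_uncertainty(readings, instrument_rel):
    """Return (relative SEM, combined relative uncertainty)."""
    mean = statistics.mean(readings)
    sd = statistics.stdev(readings)            # sample standard deviation
    sem = sd / math.sqrt(len(readings))        # standard error of the mean
    sem_rel = sem / abs(mean)
    combined = math.sqrt(sem_rel ** 2 + instrument_rel ** 2)
    return sem_rel, combined


readings = [9.8, 10.1, 10.0, 9.9, 10.2]        # hypothetical measurements
sem_rel, combined = combined_relative_uncertainty(readings, 0.10)
print(f"relative SEM: {sem_rel:.4f}")
print(f"combined relative uncertainty: {combined:.4f}")
```

In this sketch the combined value can never fall below the instrumental contribution, however many readings are taken, which is what makes the method conservative.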


29/05/08

Set Your Mind

Set Theory is a relatively simple (compared to matrix algebra, for example) branch of mathematics, but it's a very important one. It is also useful in practice, but not in the same way that differential equations are.

I began studying an appropriately dumbed down version of set theory in elementary school, and my reaction was more or less "Hey, that's obvious. Why do I need to study something that's just obvious?". The reason is that set theory deals mainly with logic, and logic is something we have hardwired in us (well, most of us...)

Later on, studying set theory in more depth, I realized it's not so trivial - but it remains quite intuitive. Its biggest utility is that, first of all, it teaches logic, and logic is fundamental to making cogent arguments and understanding them. Second, the idea of sets - their properties and operations - is a great thought-organizing tool.
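
The link between set operations and logic is easy to see in a language with built-in sets. The following Python snippet is just an illustration with arbitrarily chosen sets (the digits 0 to 9, the even ones, the prime ones):

```python
universe = set(range(10))
evens = {n for n in universe if n % 2 == 0}
primes = {2, 3, 5, 7}

print(evens & primes)   # intersection: "even AND prime"
print(evens | primes)   # union: "even OR prime"
print(evens - primes)   # difference: "even AND NOT prime"

# De Morgan's law: NOT (A OR B) is the same as (NOT A) AND (NOT B)
neither = universe - (evens | primes)
print(neither == (universe - evens) & (universe - primes))
```

Each operation on sets corresponds to a logical connective on the statements "n is even" and "n is prime".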

And the lack of logic and thought organization is a severe problem of this age.


17/11/06

Dangers of Extrapolation

Extrapolating is something we do very frequently, even at a rather subconscious level - it is related to inductive reasoning too. But there are dangers in extrapolation that one should be aware of.

In mathematical terms, extrapolation can be defined as follows. We have a function y=f(x) whose expression is known for a given interval of x, from x1 to x2. Extrapolation means assuming that the expression of f(x) will be the same also for x outside the considered interval. In this way, for an x3 outside the interval we will simply have y3 = f(x3).

The obvious risk is that the expression of f(x) may not be the same out of the interval where it is known.

For example, if we did experimental measurements of some property in certain conditions and found a mathematical correlation, there is no guarantee that in different conditions said correlation will still be valid. In fact, in the scientific and engineering literature there are many examples of equations valid only in a given range of variables, or which require different coefficients for different ranges.

This means that results obtained through extrapolation should be taken with a grain of salt, because they can be widely different from reality. To give a pedestrian example, many savings accounts offer different interest rates for different amounts of money. Extrapolating the interest calculated for 1000€ to 10 000€ would thus give the wrong result.
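
The savings account example takes only a few lines of Python. The two rates and the 5000€ threshold are made up, and the scheme (one rate applied to the whole balance, depending on its size) is just one assumed way such accounts can work.

```python
def annual_interest(balance):
    """Interest for one year under a hypothetical two-tier scheme."""
    rate = 0.01 if balance < 5_000 else 0.02   # made-up rates and threshold
    return balance * rate


known = annual_interest(1_000)              # what we actually observed
extrapolated = known * (10_000 / 1_000)     # assumes the same rate still holds
actual = annual_interest(10_000)            # what the bank really pays

print(known, extrapolated, actual)
```

The extrapolated figure is off by a factor of two, simply because the rule changed outside the range where it was observed.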

Sure, not all extrapolations are the same: some can be taken with more confidence than others. However, the recommendation is not to extrapolate unless there is no other way.

An interesting twist is when the variable x is time. Extrapolating here means assuming that the time-trend of some quantity will remain the same. If the trend changes, the results of extrapolation will be wrong. Hence, infamous fiascos such as The Population Bomb.

Another complication is that, in the case of natural phenomena, the function f(x) is often obtained using regression or fitting techniques, which introduce errors of their own. Using these functions for extrapolation then piles uncertainty on top of error.
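
As a rough illustration in Python, suppose the "real" law is y = x^2 (a hypothetical choice, unknown to the experimenter), that it is sampled only between 0 and 1, and that a straight line is fitted to those points by ordinary least squares. The fit is tolerable inside the interval and badly wrong outside it.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    slope = num / den
    return slope, my - slope * mx


def true_f(x):
    return x ** 2   # hypothetical underlying law


xs = [i / 10 for i in range(11)]    # sampled interval: 0 to 1
ys = [true_f(x) for x in xs]
slope, intercept = fit_line(xs, ys)


def predict(x):
    return slope * x + intercept


worst_inside = max(abs(predict(x) - true_f(x)) for x in xs)
error_outside = abs(predict(3.0) - true_f(3.0))
print(f"worst error inside [0, 1]: {worst_inside:.2f}")
print(f"error when extrapolating to x = 3: {error_outside:.2f}")
```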

Extrapolation is a useful technique, but it is saddled with considerable inherent uncertainty.
