The Measurement Challenge

I’m reintroducing the Measurement Challenge for the blog.  I ran it for a couple of years on the old site and had some very interesting posts. 

Use this thread to post comments about the most difficult – or even apparently “impossible” – measurements you can imagine.  I am looking for truly difficult problems that might take more than a couple of rounds of query/response to resolve.  Give it your best shot!

Doug Hubbard

58 Responses to The Measurement Challenge

  • bcalcott says:

    Hi Doug,

    I thought the book was excellent and I am already making practical use of it in the area of project portfolio management, so many thanks.

    One comment I would make concerns the passage in the book describing the thoughts of Stephen J. Gould on the subject of IQ tests. As you probably know, Gould’s argument in “The Mismeasure of Man” was that tests for intelligence were primarily a tool to prove the superiority of one culture or race over another. I think his scepticism with regard to IQ tests was not that they fail as a basis for measurement per se, but rather that they fail as a measurement of intelligence. Intelligence as a concept is ambiguous at best and, as you suggested at the start of the book, needs to be unpacked (including the motives for requiring measurement in the first place) before it can be meaningful. In the case of an IQ test, what you would be measuring would be the ability to solve a particular set of problems. In the case of measuring potential brain damage this would be useful. It would not, however, tell you if a particular subject was less or more intelligent after the incident.

    Anyhow, fantastic book and more power to your elbow.

    Regards

    Brian Calcott

  • dwhubbard says:

    Brian,

    Thanks for your input. But I was refuting Gould specifically regarding his claim that IQ tests are not a measure of intelligence. He stated that the IQ score is nothing more than an “artifact” of the mathematical method used to compute it. This alone is an easily testable and falsifiable claim. If it really were nothing other than an arbitrary artifact of a calculation, we should not see any correlations between this score and other measurable phenomena like income, incarceration, welfare, and so on. And yet we do see correlations between IQ scores and behaviours we would generally associate with high or low IQs.

    IQ tests do, however, have lots of error for all of the reasons Gould lists. But the complete lack of error was never the criterion for a measurement. Even a test that has 90% confidence errors of +/-20 points two thirds of the time and then gives a completely spurious result a third of the time is still a reduction in uncertainty. I think this is where IQ test skeptics miss the point. They seem to be saying that if an IQ test even *can* give a wrong answer then it is measuring nothing. This is a misunderstanding of how the term measurement is used in the sciences.

    What they are not considering is the previous state of uncertainty before a test. Suppose you have persons A and B. You know nothing about them prior to a test. You could only say that there is equal likelihood for either to have a higher intelligence than the other. Then they take an IQ test. Person A scores 145 and person B scores 95. Now would Gould say that this result would not even justify a probability of 70% that A has a higher intelligence? How about just 60%? How about if the test results were 165 and 80 respectively? Would the chance that A has higher intelligence than B still only be 50%? If we repeated this test on different pairs of people many times, and you had to place real-money bets on who would perform better at other tasks associated with intelligence (however it is defined) like college grades, income, or ability to write, I’m sure you would prefer to keep betting on the person with the higher tested score, especially if the differences were 30 points or more.

    Again, it is important to separate the claim that a measure has a lot of noise and little signal from the claim that it is only noise and no signal. Gould’s statement that IQ is merely an artifact of the method used to calculate it would imply he believes the latter. This would mean that no matter how far apart two people were on their IQ test scores, and no matter how many trial pairs of people we use, we would never have any reason to believe one person is even slightly more likely to be more intelligent than another. When you think it through, this is an extraordinary claim.

    Of course, we have no basis for being certain that a person who scores 125 is actually more intelligent than one who scores 105. Maybe we couldn’t even be certain if the difference were twice as wide. But measurement isn’t about achieving certainty. It’s about reduced uncertainty. For example, would you really bet your own money that pairs of people whose test scores were 40 points apart are indistinguishable from those with the same score? How about 60 points apart? How about 100? Is there seriously no point at which the probability that one person in the pair is the more intelligent ever budges from 50%? I know where I would bet my money on repeated tests.

    Regarding your example, why would it necessarily be the case that IQ tests before and after a brain injury must tell us absolutely nothing? Are you saying that even if the person consistently scored above 150 before an injury and cannot score more than 100 after an injury that we can still conclude nothing at all about a change in intelligence? Are you saying that even if the difference was twice as much (a 100 point difference) we could still learn nothing? That also seems like an extraordinary claim to make.

    If IQ tests reduce uncertainty even slightly, even just some small percentage of the time, they meet the criteria of a measurement in its strictest mathematical sense. And when it comes to important issues of public policy (such as the example I used where methyl mercury is correlated to reduced IQ points of children), a slight uncertainty reduction is preferable (perhaps the equivalent of millions of dollars preferable) to no uncertainty reduction at all.

    Thanks,
    Doug Hubbard
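
The betting argument in the reply above can be checked with a quick simulation. The sketch below is only an illustration under assumptions that are not in the thread: a latent “true” score that is normally distributed with mean 100 and standard deviation 15, a normal test error whose 90% interval is +/-20 points, and a “spurious” result modeled as an unrelated draw from the same population one third of the time. The function name `simulate_pair_bets` and its parameters are hypothetical.

```python
import random


def simulate_pair_bets(n_pairs=50_000, min_gap=0.0, seed=1):
    """Fraction of bets won by always backing the higher tested score.

    Assumed model (not from the thread): latent score ~ N(100, 15);
    the test is noisy 2/3 of the time and pure noise 1/3 of the time.
    """
    rng = random.Random(seed)
    noise_sd = 20 / 1.645  # a normal error with a 90% interval of +/-20

    def tested_score(true_value):
        if rng.random() < 1 / 3:
            # Completely spurious result: unrelated to the true value.
            return rng.gauss(100, 15)
        return true_value + rng.gauss(0, noise_sd)

    wins = bets = 0
    for _ in range(n_pairs):
        true_a, true_b = rng.gauss(100, 15), rng.gauss(100, 15)
        score_a, score_b = tested_score(true_a), tested_score(true_b)
        if abs(score_a - score_b) < min_gap:
            continue  # only bet when the tested gap is large enough
        bets += 1
        # The bet wins when the higher scorer also has the higher true value.
        if (score_a > score_b) == (true_a > true_b):
            wins += 1
    return wins / bets if bets else float("nan")


if __name__ == "__main__":
    print("win rate, any gap:      ", round(simulate_pair_bets(), 3))
    print("win rate, gap >= 30 pts:", round(simulate_pair_bets(min_gap=30), 3))
```

Under these assumptions the bettor who always backs the higher tested score wins well over half the time, and more often still when bets are placed only on large gaps. A different error model would change the numbers, but a win rate of exactly 50% would require the score to carry no signal at all.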

  • Model_Math says:

    Potential for organizational change.
    I am thinking of measuring the level of resistance to accepting change — what’s it going to take to blast an unproductive culture out of behaviors and methods that they freely admit are not working? (This is not academic or frivolous.)

  • sgordon says:

    A followup on the analysis of IQ as a measurement of intelligence (and then onto a more general point):

    It seems to me that M reducing uncertainty in evaluating V is a rather low hurdle for calling M a measure of V. By this criterion, wouldn’t Income be a measure of Intelligence?

    Clearly, to say Income is a measure of anything other than Income makes the concept of measure unnecessarily confusing and much more prone to be misapplied. In this light, I would say IQ is only a measure of what has been defined as IQ. The question then becomes how well does IQ correlate with what we believe Intelligence to be, which seems to me to be a much more clear debate than whether IQ is a measure of Intelligence.

    The main policy problem with substituting IQ for Intelligence is not so much that the correlation is not 100%, but rather that the correlation seems to vary quite a bit among different populations. In so many cases, the method for deriving a concrete measure that correlates well with an intangible concept is very context sensitive. For example, a concrete measure that correlates well with intelligence in western populations may not correlate well in African populations. A concrete measure that correlates well with what we think of as Productivity on a factory floor in company A may not correlate well on factory floors in company B or in software development offices in either company. To call the measure Productivity instead of what it directly measures makes it too easy to overlook this contextual factor, and start applying a Productivity measurement where it does not apply.

    This issue is easier to understand if our language for measurement makes it clear that:
    1. We can only directly measure concrete measurements (like IQ, throughput, income, MPG, age, …),
    2. We cannot directly measure intangible concepts (like Intelligence and Productivity),
    3. The book is about how to derive concrete measurements that correlate highly to intangible concepts in given contexts, not how to directly measure intangible concepts (and especially not in a context-independent way).

    Steven Gordon, PhD

  • dwhubbard says:

    Steven,

    Thanks for your input. I would argue that whether a measurement is “direct” or not is an ambiguous distinction mathematically, semantically and at the level of fundamental epistemology. This distinction adds no clarity whatsoever because most – or perhaps all – measures of properties in physical sciences are not “direct”. They rely on readings of an instrument (e.g. a digital scale, ohmmeter or photometer) and rely on inferences from related observations (e.g. observing the deflection of a particle in a magnetic field or the movement of a planet to measure mass). I would even argue that we only perceive reality indirectly in the first place. All we can do is make observations to make inferences that ultimately have some utility to practical decisions.

    In the definition I propose for measurement (which is, in fact, consistent with information theory, decision theory and measurement theory) income really is a measure of IQ which is, in turn, a measure of intelligence. Of course, this would be a measure with a huge amount of error but, on average, estimates based on income would be slightly more accurate than estimates based on no information at all. If you doubt this, let me propose a game that would test whether you really believe your position. Let’s identify 100 people who have had IQ tests representing a wide range of IQ’s. We will know nothing about them and we will not be told their IQ’s, except that I will be told their income from last year and you will not. We will sort them into random pairs and we will bet which person is smarter. You should be indifferent because you would have no information about whether one person has a higher IQ. But since I know their incomes, I will not be indifferent. I will, for each pair, choose that the person with the higher income also has the higher IQ. Each time I’m wrong I pay you $100 and each time I win you pay me $100. I would love to play this game for a million people if we could. If I’m right only slightly more often than I’m wrong, then I would make a lot of money off of you. I would even be willing to pay you $500 to play this game with me for 100 persons. Now would you prefer to be the one of us who had the income information? Would you even be willing to pay me if you had it? You might say you are not a betting person, but I think betting is simply the ultimate test of whether a person really believes some position they have. You might say this is impractical but I think we could find a way to make it happen.

    The fact is that even when there is a small correlation between X and Y, knowledge of X slightly reduces your uncertainty about Y. And I would say that any definition of measurement that you propose that ignores the reduction in uncertainty would be far more confusing. You would have to resort to a definition that merely describes a procedure regardless of whether the result is informative. And if you said that the procedure only resulted in a measurement if it was informative while rejecting that the threshold is any uncertainty reduction, then you would have to define how much uncertainty reduction is required. Presumably, you would have to define this arbitrary threshold differently for every possible kind of measurement since error in measurements vary widely among fields.

    Taking your claim “Clearly, to say Income is a measure of anything other than Income makes the concept of measure unnecessarily confusing and much more prone to be misapplied.” to its logical conclusion, you would have to reject the validity of most measures in the physical sciences since, as I said, most are inferences based on indirect observations (nobody has ever “directly” measured the mass of an electron or a star or the age of a rock or even time itself). Your statement that “IQ is only a measure of IQ” appears to say that if X is correlated to Y, X is still only a measure of X and it reveals nothing about the possible values of Y. If this is the case, then you must also hold that the glow of a hot body is a measure only of its glow, not its temperature. You must hold that the depth of a rock formation is only a measure of its depth, not its age. You must hold that a credit score is only a measure of a credit score and says absolutely nothing about whether someone is an acceptable lending risk – meaning that 1000 people with credit scores of under 600 are just as good a credit risk as 1000 people with scores over 750. There will be exceptions, of course, but given a portfolio of 1000 people, do you really want to charge the same interest for the under-600 score group as the over-750 score group?

    You stated “The main policy problem with substituting IQ for Intelligence is not so much that the correlation is not 100%, but rather that the correlation seems to vary quite a bit among different populations. In so many cases, the method for deriving a concrete measure that correlates well with an intangible concept is very context sensitive. For example, deriving a concrete measure that correlates well with intelligence in western populations may not correlate well in African populations.” How is this more uniquely “context specific” than any other measure I just mentioned? As before, if this context-specificity you speak of were any real obstacle at all to measurement, then again, most of what we know from science could not be possible. There is no complexity you can think of that does not have a measurement solution. Many of the procedures in scientific method are specifically to control for such issues.

    Or perhaps you believe that it is not the “directness” but the mere existence of error that undermines any measurement. There is a common fallacy that the existence of noise means a lack of signal. This would force you to defend yet another position that would be impossible to reconcile with all scientific knowledge. Or perhaps you believe that the existence of noise (i.e. error) in a measurement means that any signal that does exist must not have any utility. This would be impossible to reconcile with both decision theory and common sense. If you knew a coin favored heads on a flip just 52% of the time, that knowledge would be worth a lot of money over a large number of flips.

    On the other hand, I don’t deny that people misuse measures. Regarding your comment about productivity measures, if someone actually equates a measure of productivity on a factory floor to productivity in creating software, I could only respond that this would be a straw man argument. I make no claim that two obviously uncorrelated factors say anything about each other. I also point out in How to Measure Anything that most organizations measure the wrong things (because they are not computing the value of uncertainty reduction). If they applied this method consistently, they would measure what matters. And, as a side note, don’t confuse measures used for decisions like project approval with incentive programs. They are two different issues. Good incentives are based on good measures but that doesn’t mean any measurement should be part of some incentive. I clearly argue against this in the book.

    And don’t forget that you always are comparing a decision analysis method based on measurements to some *other* method – presumably your intuition. That also has a measurable performance and research shows it is often not hard to beat with even simple quantitative methods.
    Again, if you really believe what you believe, then let’s start recruiting some people for our bet.

    Thanks for your contribution to the conversation and I look forward to your response.

    Doug Hubbard
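
The expected payoff of the income game and of the 52% coin can be worked out in a few lines. This is a sketch under stated assumptions, not a result from the book: the income/IQ correlation of 0.3 is a placeholder chosen purely for illustration, both variables are assumed to be jointly normal, and the helper names `p_correct_pick` and `expected_net` are hypothetical. For jointly normal variables with correlation rho, the chance that the higher-income member of a random pair also has the higher IQ is 1/2 + arcsin(rho)/pi.

```python
import math


def p_correct_pick(rho):
    """Chance that the member of a random pair with the higher X also has
    the higher Y, assuming X and Y are jointly normal with correlation rho."""
    return 0.5 + math.asin(rho) / math.pi


def expected_net(p_win, stake, n_bets, entry_fee=0.0):
    """Expected winnings for even-money bets: win `stake` with probability
    p_win, lose `stake` otherwise, minus any fee paid up front to play."""
    return n_bets * stake * (2 * p_win - 1) - entry_fee


ASSUMED_RHO = 0.3  # placeholder correlation, NOT a figure from the thread

# 100 people sorted into random pairs gives 50 bets of $100 each,
# with $500 paid up front for the right to play.
p_income = p_correct_pick(ASSUMED_RHO)
income_game = expected_net(p_income, stake=100, n_bets=50, entry_fee=500)

# A coin known to land heads 52% of the time, $1 per flip, 10,000 flips.
coin_game = expected_net(0.52, stake=1, n_bets=10_000)

if __name__ == "__main__":
    print(f"chance of picking the higher IQ: {p_income:.3f}")
    print(f"expected net from the income game: ${income_game:,.0f}")
    print(f"expected net from the biased coin: ${coin_game:,.0f}")
```

With a correlation of zero, `p_correct_pick` returns exactly 0.5 and the game is worth nothing before the fee. Any positive correlation tilts the odds, and the edge grows linearly with the number of bets.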

  • stefano.palestini says:

    Douglas,
    We are improving a new procedure for financial planning. Its target is to narrow the range of forecast error from the current range of +/- 20 million with 30% likelihood (the greatest difference between actual and forecast net financial position) and +/- 5 million with 70% likelihood (the narrowest difference between actual and forecast) to a new interval of +/- 15 million with 10% likelihood and +/- 5 million with 80% likelihood. The cost of the investment is 500K euro.
    Is it possible and meaningful to measure the value of the better information available from the new financial planning procedure using EVPI, as described in the chapter “Measuring Value of Information” in your book How to Measure Anything (section “the value of information for ranges”)?
    If not, what would you suggest to frame the value-of-information problem better?
    Thank you in advance for your answer.

    stefano palestini
    internal auditing & risk management

  • Douglas Hubbard says:

    Stefano,

    This sounds like a perfectly viable problem for the value of information. However, I would need clarification on something you said. It seems you are saying that the current accuracy of forecasts to actuals has a 30% chance of being within 20 million and a 70% chance of being within 5 million, right? That seems backward to me. The wider range should have the higher probability unless I misunderstand how you are stating this. The same would seem to hold for the target accuracy.

    But, that aside, improving the value of forecasts is certainly something the information value pertains to. You have a cost of overestimating and/or a cost of underestimating, right? How much would you lose for every million you over or under estimate? The product of this “loss function” and your probability distribution for the forecast is the expected opportunity loss (EOL). You have an EOL for the current state and for the desired target accuracy. The difference between the two EOLs is the value of information. Technically, you would only be computing EVPI if you were comparing the value of your current accuracy to perfect forecasts, which I’m sure is not what you mean.

    Thanks for your comment,
    Doug Hubbard
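
The EOL comparison described in the reply above can be sketched as follows. Everything numeric here is an assumption for illustration: the loss of 0.05 million per million of forecast error is a placeholder (the thread gives no loss function), the loss is assumed linear and symmetric, each range is collapsed to a single error size, and the target distribution is a placeholder because the target likelihoods quoted in the thread do not sum to 100%. The function name `expected_opportunity_loss` is hypothetical.

```python
def expected_opportunity_loss(error_distribution, loss_per_million):
    """Sum of probability x loss over discrete forecast-error scenarios.

    error_distribution: list of (error_in_millions, probability) pairs.
    loss_per_million:   assumed cost, in millions, of each million of
                        over- or under-estimation (linear, symmetric).
    """
    total_p = sum(p for _, p in error_distribution)
    if abs(total_p - 1.0) > 1e-9:
        raise ValueError("scenario probabilities must sum to 1")
    return sum(p * abs(err) * loss_per_million for err, p in error_distribution)


# Crude reading of the current accuracy: error is about 20M in 30% of
# quarters and about 5M in 70% of quarters.
current = [(20.0, 0.30), (5.0, 0.70)]

# Placeholder target accuracy (illustrative only).
target = [(15.0, 0.10), (5.0, 0.90)]

LOSS_PER_MILLION = 0.05  # hypothetical loss function slope

eol_current = expected_opportunity_loss(current, LOSS_PER_MILLION)
eol_target = expected_opportunity_loss(target, LOSS_PER_MILLION)

# Value of the improved forecast, per forecast period, in millions.
value_of_improvement = eol_current - eol_target

if __name__ == "__main__":
    print(f"EOL now:    {eol_current:.3f} million per forecast")
    print(f"EOL target: {eol_target:.3f} million per forecast")
    print(f"Value of the improvement: {value_of_improvement:.3f} million per forecast")
```

The per-forecast value would then be multiplied by the number of forecasts the new procedure is expected to support before comparing it with the cost of the investment. A real analysis would replace the point scenarios with a full probability distribution and use the organization’s own loss function.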

  • stefano.palestini says:

    The likelihood comes from observing the differences between the quarterly forecast financial position and the actual one. Over 3 years (12 observations) I’ve seen 9 cases (70%) where the difference was inside the range +/- 5 million, and in the 3 other cases it was +/- 20 million.

    Thank you in advance for your answer.

    stefano palestini