I’m reintroducing the Measurement Challenge for the blog.  I ran it for a couple of years on the old site and had some very interesting posts. 

Use this thread to post comments about the most difficult – or even apparently “impossible” – measurements you can imagine.  I am looking for truly difficult problems that might take more than a couple of rounds of query/response to resolve.  Give it your best shot!

Doug Hubbard

64 comments

  1. How can we measure improvement in problem-solving after attending training in critical thinking?

  2. dwhubbard

    Testing critical thinking skills is actually a well-developed area of psychology. One way is to simply give different tests before and after the training. Different people can take the tests in a different order so that any change in results isn’t just due to the second test being easier. There are several standardized tests for this. (A Google search on “critical thinking skills test” produces quite a few good hits.)

    Another way is to look into what the critical thinking skills are supposed to do for your firm. Why do you care that their critical thinking skills improve? Is it because they regularly make decisions that affect the performance of their part of the firm? Then perhaps that is what you need to measure. But then the question is how do you know that the training was the reason for the performance improvement? In this case you might try training different people at different times and see if there are correlations between measured performance and when they took the training.

    In either case, if you have at least a couple of dozen people being trained and tested, you may have the basis for finding the correlation you are looking for.
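    As a rough sketch of the before/after comparison, assuming each trainee takes one test form before training and the other form after (with the order of forms counterbalanced across people), the paired differences can be summarized as below. The function name and all scores are hypothetical, made up for illustration only.

```python
import math
import statistics


def paired_improvement(before, after):
    """Mean change, its standard error, and a paired t statistic.

    before[i] and after[i] are the scores of the same person i.
    """
    if len(before) != len(after) or len(before) < 2:
        raise ValueError("need two equal-length score lists with at least 2 people")
    diffs = [a - b for b, a in zip(before, after)]
    mean_diff = statistics.mean(diffs)
    std_err = statistics.stdev(diffs) / math.sqrt(len(diffs))
    if std_err == 0:
        raise ValueError("all changes are identical; the t statistic is undefined")
    return mean_diff, std_err, mean_diff / std_err


# Hypothetical scores for 24 trainees (illustrative numbers, not real data).
before = [50 + (i * 7) % 15 for i in range(24)]
after = [b + 1 + i % 5 for i, b in enumerate(before)]

mean_diff, std_err, t_stat = paired_improvement(before, after)
print(f"mean change {mean_diff:.2f} points, SE {std_err:.2f}, t = {t_stat:.1f}")
```

    A t statistic well above about 2 suggests the average change is unlikely to be chance alone. It does not by itself show the training caused the change, which is why staggering when different people are trained is useful.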

    Thanks for your participation in the blog,
    Doug Hubbard

  3. janet 67

    How can we measure ‘fun’ as a benefit from a creative arts project? (in evaluations it is identified by participants as their most valued benefit from engagement with the process).

  4. Natalie7

    Okay, here are three things for you to try:

    1) My partner was at Cambridge, and they discussed one day how to measure the amount of chaos in a system – how do you measure this? I don’t know what answer they came up with.

    2) Progress of society, in itself over time or to compare with other societies?

    3) ‘Rightness’ of a judicial decision?

    Challenging enough? Also, very interested in the answer to Janet 67’s question.

  5. Natalie7

    sorry, one more (more practical this time, rather than trying to come up with a ‘clever’ challenge):

    I work as a school nurse and I’m interested in whether there is any way a person can measure the impact of sex education on reducing teenage pregnancies.

    How can you measure something like that, with all the variables involved, the secrecy/taboo factors, the uncertainty around motives (people who want to get pregnant anyway), how life happens by chance and you just end up in that situation, whether you understood it anyway, and so on.

    And presumably even if there is some statistic that you could use for a population (which I cannot even imagine), you could never make a prediction for any individual’s ‘improved’ position in respect of having the education.

    Look forward to your thoughts!

  6. dwhubbard

    I’m back from my long European trip. I thought I answered Janet67’s question at one point but apparently not. Anyway, I remember what I was going to say.

    As with all measurement problems we have to start with the definition. I will apply these following questions to each of the several measurement challenges I’ve been given including from jdconsultant and from Natalie7. These questions are:

    1) What do you mean by “X” (fun, innovation, chaos, etc.)? In other words, what are the observable consequences of this? If you knew it existed or if you knew you had a reason to measure it, you must at least have conceived of some possible observations of it. Defining the problem is the beginning of all scientific epiphanies.
    2) Why do you care about it? Defining what you will do with the information tells us something about what you are really trying to observe. Are you trying to predict something? Are you going to make a decision based on this?
    3) What do you know about it now? It is unlikely that anything you really want to measure has bounds of negative infinity to positive infinity. Once you defined

    If we can’t answer these questions about a measurement challenge, then that’s no different than asking “How many Whazinkles can finakvil a bunch of huxoopers?” In other words, they only seem immeasurable because you haven’t really figured out what you mean by them.

    So how do you observe “fun”? You must have observed it before, otherwise it is unlikely you would even have proposed that it is something that might be an “amount”. In an example in my book, I show how the quality of performances at the Cleveland Orchestra was evaluated by measuring the number and duration of standing ovations. Of course, the board could have simply used a survey, but the standing ovation measure might even be more revealing (and much less of a bother to patrons).

    But let’s consider both options. One is a “revealed” opinion, the other “stated”. That is, one measure is based on what people do, the other on what people say. You could use a survey of parents and/or children. But I think you would also find revealed feedback is possible. What other observations do you make that tell you kids are having fun? Are they noisier? Are they more active? Do they laugh? Do they smile? A video analyzed by impartial judges armed with timers and notepads is a great place to start.

    Now ask: why do you care? Isn’t it obvious they are having fun? Or are you saying that some decision will rely on this measure and it may come down to which activities are more fun than others?

    I’ll await your answers before I finish.

    Thanks,
    Doug Hubbard

  7. dwhubbard

    I’m always curious when people ask how to measure something that is actually the subject of well-developed areas of mathematics. Chaos is one such area. Feigenbaum and Mandelbrot had measured something they later called chaos (although it was not always a well defined concept). Chaos theory is all about expressing the concept in some mathematical rigor. Feigenbaum’s Constant is a measure of chaos.

    If you mean “randomness” (which is not strictly the same as chaos, but as far as I know that might be what you mean), then there is another well-developed method for that. Claude Shannon showed that the randomness of a signal could be measured by how small the signal could be compressed. If the signal was a repeating pattern of 01010101, then it would not take much to state it even if the signal went on for a million digits. A truly random signal, however, would be difficult to compress by very much at all.
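    A quick way to see the compression idea in practice is to compare how well a general-purpose compressor shrinks a repeating pattern versus random data. This is only a sketch: zlib is an arbitrary stand-in for an ideal compressor, so the ratio is a rough upper bound on information content, not Shannon's entropy itself.

```python
import os
import zlib


def compression_ratio(data: bytes) -> float:
    """Compressed size divided by original size.

    Close to 0: highly patterned (little information per symbol).
    Close to 1 or slightly above: essentially incompressible, i.e. random-looking.
    """
    if not data:
        raise ValueError("data must be non-empty")
    return len(zlib.compress(data, 9)) / len(data)


patterned = b"01" * 500_000            # "010101..." for a million characters
random_data = os.urandom(1_000_000)    # a million bytes from the OS random source

print(f"repeating pattern: {compression_ratio(patterned):.4f}")
print(f"random bytes:      {compression_ratio(random_data):.4f}")
```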

    But let’s go to another one of the basic questions I like to ask: Why do you care? What was the decision that would be based on measuring the chaos of a system? It is OK if you were just curious. In that case it might suffice to just state it unambiguously for the purpose of academic publication. But was some business decision based on this? Why? That would tell me a bit about what you intend to measure.

    I’ll await your response on this item but I’ll proceed with just one more of your questions before I turn in tonight (the flight back from London was long).

    Regarding measuring progress, again, the first question, as always, is “what do you mean” followed by “why do you care?” The answer to the second actually helps with the answer to the first if it is not apparent. Do you mean technological progress? Do you mean progress in terms of laws, democracy, scientific exploration? Regarding the “why do you care?” question, is it because you are thinking that people in societies with more progress are happier?

    So that ball is in your court to respond to those questions. I’ll respond more this week.

    Thanks,
    Doug Hubbard

  8. Natalie7

    Hope you had an interesting trip!

    Okay, to try and answer your questions.

    Maybe ‘chaos’ was the wrong word. Maybe my partner was interested in that academically, I must ask, probably. But from my perspective I was thinking of something Goldratt said, about the degrees of freedom of a system. This is probably a different thing altogether, so perhaps I am using language sloppily.

    If you look at this video, you might get a better idea of what I am asking:

    http://www.youtube.com/watch?v=tWvMODJ9cVc

    Goldratt talks about how complex a system is, and then he goes on to say something about how you can describe a system in four sentences versus a thousand pages, the latter being the more complex.

    Maybe this is the measurement after all. It sounds interesting, but I’m not sure how it helps a decision. Perhaps I am being short-sighted, but I can’t see how it helps in itself.

    Then he goes on to talk about the degrees of freedom of a system, which seems to be the thing that matters in practical terms. So I think what I want to know is how to count the degrees of freedom.

    The progress question is a little different, but I think it is inspired by the whole Barack excitement and the economic recession stuff.

    If Barack does a good job, presumably he moves society on. Or is that just an assumption? But there are so many things you could measure to determine that, as you point out. How do you arrive at some kind of ‘net’ or aggregate value or whatever that says ‘yes, things moved on’ or got better? Surely that is what we all want to know when we vote in an election (we have one coming up in the next few weeks here).

    But presumably you can’t because all of that is so value-laden.

    Lord Ashcroft over in the UK, where I am, said that he felt that once the recession was over there would be a new world order, and that he felt the UK would be lower in that order than pre-recession. I know he is probably just talking about economic order, but I want to get at what that ‘order’ is fundamentally across all things.

    The decision it might support is where I live or where I want my children to grow up! Maybe that is about happiness.

    Maybe it begs the question, what is the measure of the value or quality of life? Surely you can’t answer that one!

  9. Natalie7

    Sorry, in case you need to look him up, I meant, Lord Ashdown – (Paddy), former leader of the Liberal Democrats. Too many Lords over here!

  10. dwhubbard

    To Natalie7,

    Before I head off to work for the day, let me answer your most recent question first. Of course you can measure the value and quality of a life. And, just like in all of your other questions, it is already being done.

    In the US, UK and any country that has government agencies concerned with the allocation of resources to public health and safety, these decisions must be made. And in many cases, they are made with the help of a measurement. Both in the US and many other countries there is a method called the Value of a Statistical Life (VSL). This is based on the idea that people are only willing to pay up to a certain amount to reduce their OWN chance of mortality by a given increment. Would you pay an extra $10,000 for a car that was proven to reduce your chance of death on the road from 1% per year to 0.5% per year? Some people would, some would not. However you change the numbers, there is a limit to what you would pay. If you found this payment just barely acceptable (you would pay just $10k for a 0.5% reduction in chance of death), then you presumably value your life at about $2 million.
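    The arithmetic behind that $2 million figure is just the payment divided by the reduction in the probability of death. A minimal sketch (the function name is hypothetical):

```python
def implied_value_of_life(payment: float, risk_reduction: float) -> float:
    """Value of a statistical life implied by paying `payment` for a
    reduction of `risk_reduction` in the probability of death."""
    if not 0 < risk_reduction <= 1:
        raise ValueError("risk_reduction must be a probability in (0, 1]")
    return payment / risk_reduction


# $10,000 for cutting annual road-death risk from 1% to 0.5%
vsl = implied_value_of_life(10_000, 0.01 - 0.005)
print(f"${vsl:,.0f}")  # $2,000,000
```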

    And you make several other decisions like that throughout the course of your life. Some governments (like the US) collect this as a “revealed preference” indication of how much you value your own life. That is, instead of asking people how much they are willing to pay to save their own life, they simply record what they *actually* spend. In the US, people appear to act as if they value their own lives at somewhere between $2 million and $20 million.

    Quality adjusted value of life is, of course, just standard operating procedure for any country that has to allocate limited health care resources.

    But your issue here is hidden in your statement that something cannot be measured because “it is so value-laden.” Why is that an obstacle? When someone agrees to pay $200 for a painting, but would have refused it at $300, they told you something about how they value it. There is in fact a huge industry dedicated to measuring opinions and values using surveys and market research. This is no different.

    I have a bit more time, so I will also address one other comment you made in an earlier post before I leave for the day. You indicated that something would be difficult to measure because of “all the variables involved.” Are you under the impression that all or even most variables must be known before one particular variable can be correlated to an outcome? If that were the case, then all clinical drug trials would be impossible. But this is a common fallacy. In order to determine whether one drug reduces, say, ulcers, you don’t need to first measure or even know all the other variables. We are not concerned with whether the drug fixed one particular subject’s ulcers. What we want to know is whether the test group did significantly better than the placebo group. Since the drug vs. placebo was the only systematic difference between the groups (all individuals were randomly assigned to the placebo or test groups), once the observed difference is large enough it becomes unlikely that it could have been due to some other factors.
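    To illustrate why randomization lets us ignore the other variables, here is a sketch of a permutation test on a hypothetical two-arm trial. The healing counts are invented for illustration, and a permutation test is just one of several standard ways to do this (a two-proportion z-test would be another). It asks how often a random reshuffling of the same patients into two groups produces a difference at least as large as the one observed.

```python
import random


def permutation_p_value(treated, placebo, n_perm=10_000, seed=0):
    """One-sided permutation test on 0/1 outcomes.

    Returns (observed difference in success rate, p-value), where the p-value
    is the share of random relabellings whose treated-minus-placebo difference
    is at least as large as the observed one.
    """
    rng = random.Random(seed)
    n_t = len(treated)
    observed = sum(treated) / n_t - sum(placebo) / len(placebo)
    pooled = list(treated) + list(placebo)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # pretend group labels were assigned at random again
        diff = sum(pooled[:n_t]) / n_t - sum(pooled[n_t:]) / (len(pooled) - n_t)
        if diff >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical trial: 1 = ulcer healed, 0 = not healed
treated = [1] * 42 + [0] * 18   # 70% healed
placebo = [1] * 27 + [0] * 33   # 45% healed
diff, p = permutation_p_value(treated, placebo)
print(f"difference in healing rate: {diff:.2f}, one-sided p = {p:.4f}")
```

    No patient-level variable other than the group label enters the calculation; random assignment is what justifies comparing against reshuffled labels.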

    I’ll try to get to the other questions this week. But one item I noticed right away is that each of these questions is something that someone already measures on a regular basis. Always assume it’s been measured before. My response will be little more than telling you how other people already do it (a little Google searching would reveal the answer in each of these questions).

    Finally, I will note that almost everything everyone has mentioned so far is not only already a measurement problem solved by others before me, but many similar examples were already mentioned in my book, including the VSL, measuring subjective value, and isolating the effect of variables (in addition to my advice of assuming it’s all been measured before and to define the measurement problem in terms of observables). Have you seen the book?

    Thanks,
    Doug Hubbard

  11. Natalie7

    I have seen but not read the book yet. It is on my recommended reading list along with this website.

    I am in general quite convinced by most of your comments, but I’m not entirely sure that a £ or $ value is the right measure of the value of life, just for instance. Is this really a useful measure, unless you are a politician, policy maker or planning a healthcare budget?

    I have seen QALY (quality adjusted life years) used before in healthcare over here, and it does offer something and maybe is the only way to do things. But I am not sure that that is meaningful. Do you know what I mean?

    How does that help a frontline clinician – who is bound first and foremost by a professional duty to the patient in front of them – in their decision-making?

    I really do need to read the book, don’t I? :)

  12. Okay Doug, I have one for you! The obvious one I think…

    How can I measure the value your book will bring me over the course of my life and how can I measure the certainty you are right in advance?

  13. dwhubbard

    There are as many different measurement methods for a life as there are decision objectives. Your challenge was originally that the value of a life was not measurable at all. I gave one example where it was for a given objective (such as a policy maker who allocates limited resources).

    If you had some other specific objective for a particular person in mind, that is part of the clarification process of any measurement. Are you asking about a triage situation where a health-care professional has to decide who to save after a big disaster? Are you asking about how a hospital decides whether to keep a dying person on expensive life-prolonging measures? If you are not confronted with allocating limited resources to save a life, then what is the actual dilemma?

    QALY may not be a meaningful measure depending on the decision it must support. And duty-bound is only one constraint – the other constraint is limited resources. You are duty bound to help as many as you can after a major disaster. That itself offers a measurement problem in terms of triage. Of course, this would be something that requires immediate decisions, but that, again, is where a simple equation is often better than a human judge.

    More to come…

  14. dwhubbard

    Picorna,
    Do you make big decisions? How often? How often are they right? You can measure your uncertainty about each of these (and your uncertainty about my claims) using the method called “calibrated probability assessment” I describe in the book. Assessing your own uncertainty subjectively but quantitatively is a skill you can learn (it turns out that bookies are quite good at putting odds on events – they are right 80% of the time they say they are 80% confident and so on).

    Describe your uncertainty about the size, frequency and % correct of decisions you make in a given year by applying a calibrated confidence interval (a range that represents your uncertainty). Also apply a range to how many years you have left to live and a range to the reduction in decision errors from reading my book (the book itself offers estimates of reductions in decision errors based on previous research). Decision errors due to your inconsistency, overconfidence, and tendency to be concerned with the wrong variables are all significant factors. Even just a slight reduction in any of them would be a big payoff. The payoff is itself measured as a range (the book describes how to compute an output as a range instead of an unrealistic point value). Unless, of course, you have no risky decisions of any kind in your life, have no chance of being wrong, or expect to die very soon.
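    A minimal Monte Carlo sketch of that payoff calculation, assuming each 90% confidence interval is treated as a normal distribution (mean at the midpoint, with the interval spanning about 3.29 standard deviations). The model structure and every range below are hypothetical placeholders for your own calibrated estimates, not figures from the book.

```python
import random
import statistics

Z_90 = 3.29  # a 90% interval of a normal distribution spans about 3.29 standard deviations


def draw_from_90ci(rng, lower, upper):
    """Sample a normal distribution whose 90% confidence interval is [lower, upper]."""
    mean = (lower + upper) / 2
    sd = (upper - lower) / Z_90
    return rng.gauss(mean, sd)


def simulate_payoff(n_trials=20_000, seed=1):
    """Monte Carlo estimate of the lifetime payoff from fewer decision errors.

    Hypothetical model: payoff = decisions/year * $ at stake per decision
    * fraction of decisions turned from wrong to right * years remaining.
    """
    rng = random.Random(seed)
    payoffs = []
    for _ in range(n_trials):
        # Negative draws are clipped to zero because none of these can be negative.
        decisions = max(0.0, draw_from_90ci(rng, 5, 20))
        at_stake = max(0.0, draw_from_90ci(rng, 10_000, 100_000))
        fixed_share = max(0.0, draw_from_90ci(rng, 0.01, 0.05))
        years = max(0.0, draw_from_90ci(rng, 20, 40))
        payoffs.append(decisions * at_stake * fixed_share * years)
    return payoffs


payoffs = simulate_payoff()
cuts = statistics.quantiles(payoffs, n=20)  # 19 cut points: 5th, 10th, ..., 95th percentile
print(f"90% interval: ${cuts[0]:,.0f} to ${cuts[-1]:,.0f}; median ${statistics.median(payoffs):,.0f}")
```

    The output is a range (here the 5th to 95th percentile) rather than a single point value. Clipping negative draws at zero is a crude simplification; a lognormal is a common alternative for quantities that cannot go negative.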

    Thanks,
    Doug Hubbard

  15. Mr. Hubbard, I am just finishing your book and have greatly enjoyed it. I do have a question about measurement. You cite work that analytic measures (even simple regression models) work as well or better than experts (even if they slightly underperform, you could determine the value of the better prediction and if the added experts are worth the cost). To play devil’s advocate, when no measurement model is present a model based on ‘uninfluenced’ behavior may be a good predictor. However, once the model criteria are known you have now created ‘measurement influenced’ behavior (a type of Hawthorne Effect). Examples are “teaching to the test” and the influence of university admissions criteria on high school students and parents. Are the (now measurement influenced) students that have checked all the right boxes for college admissions still the best predictor of success? Is the answer to continually measure and revise the criteria or keep the criteria/measurement device hidden to avoid influencing behavior? It seems that those evaluated would continually try to game the measurement system.

    A recent related article on NCLB:

    http://www.slate.com/blogs/blogs/thewrongstuff/archive/2010/05/17/diane-ravitch-on-being-wrong.aspx

  16. dwhubbard

    It doesn’t appear that we necessarily disagree about your main point at all (and you even mention the Hawthorne effect, which I also discuss in the book). I actually talk about the specific problem of unproductive incentives from measurements in the second edition of my book, in regard to “The Houston Miracle”.

    First, we have to separate the issue of uncertainty reduction for management vs. incentives for everyone else. There are many ways to reduce uncertainty and for each of those ways there are multiple possible incentive structures. You can have a great set of measures combined with poor incentive structures in such a way that you create more problems than you solve. But there are incentive structures that outperform other incentive structures. The fact that someone stumbles across one incentive structure based on a measurement which turns out to produce undesirable outcomes does not mean that all measurements should be abandoned. It just means they did it wrong.

    When we evaluate the performance of any method for anything we have to ask “compared to what?” If the comparison is to decision making based on unaided intuition, then all we need to do is outperform intuition by enough to justify the cost of the analysis method.

    Also, there are actually incentive systems called “proper” systems that are mathematically impossible to game. The “Brier Score” is one I mention in the second edition. It is an incentive system for making forecasts where the only way to maximize the score is to make better forecasts at the appropriate level of confidence. The Oakland A’s developed a set of baseball statistics that better correlate to the outcomes of games. The only way to “game” the system is to win more games, which is what management wants.
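    To see why the Brier score rewards honest forecasts, consider a single event that truly happens with probability p. A sketch of the expected score for each possible stated forecast shows that it is minimized exactly when the forecast equals p (lower Brier scores are better). The 0.7 probability is an arbitrary example.

```python
def brier_score(forecasts, outcomes):
    """Mean squared error between probability forecasts and 0/1 outcomes (lower is better)."""
    if len(forecasts) != len(outcomes) or not forecasts:
        raise ValueError("need equal-length, non-empty inputs")
    return sum((f - o) ** 2 for f, o in zip(forecasts, outcomes)) / len(forecasts)


def expected_brier(forecast, true_prob):
    """Expected Brier score for one event that happens with probability true_prob."""
    return true_prob * (1 - forecast) ** 2 + (1 - true_prob) * forecast ** 2


true_prob = 0.7
grid = [i / 100 for i in range(101)]  # every forecast from 0.00 to 1.00
best = min(grid, key=lambda f: expected_brier(f, true_prob))
print(best)  # 0.7
```

    Overstating confidence (say, forecasting 0.9 when the true chance is 0.7) raises the expected score, so exaggeration is penalized rather than rewarded.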

    The flaw in most incentive systems is the assumption that if X correlates to Y, then we must incentivize X, which changes the relationship to Y. But all we need to do is compute the “contribution to Y” regardless of X.

    Finally, we shouldn’t lose sight of the basic benefits of measurement just because it is possible to identify anecdotes where incentives based on measurements turned out to produce unproductive results. How often is this really the case? Should we be paralyzed into inaction because this happens 10% of the time? How about 1% of the time? The fact is, prior to implementing the measurement and incentive program, we don’t really know. But that just means the incentive program itself needs to be measured.

    Thanks for your input,
    Doug Hubbard

  17. No, I didn’t think we disagreed, but I wanted to hear your opinion on the influence of measurement on behavior. Your distinction of measurement and incentives based on measurement clarifies the issue. It sounds like I just need to finish the book and these issues will be addressed. Thanks for your response. Between your book, Sam Savage’s book, and Stephen Powell’s writings, I have greatly changed my view of how business analytics should be presented to our students.

  18. Richard Watson

    You have a considerable enthusiasm for Monte Carlo simulations, but have you ever compared these to Fuzzy Logic solution development, as used by the UK RiskAid product?

    regards
    Richard Watson

  19. dwhubbard

    Please excuse my delayed response. I have a question, first. How large are these investments? If they are in the range of several million dollars and higher, then the full risk/return analysis I describe would be appropriate. If these are product decisions that are each a few hundred thousand dollars or less, then a simple approximation with the Z-score or perhaps the Lens method (which I also discuss in the book) is probably an improvement on unaided intuition.
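    As a sketch of the z-score approach, assuming it refers to Dawes-style equal weighting: standardize each factor across the candidate projects and add up the z-scores. The factor names and values are hypothetical, and the code assumes higher is better for every factor (flip the sign of any factor where lower is better). The Lens method would instead fit weights by regressing experts' past judgments on the same factors; that is not shown here.

```python
import statistics


def dawes_scores(rows):
    """Equal-weight linear model: sum of per-factor z-scores for each candidate.

    rows: one list of factor values per candidate (at least 2 candidates);
    higher is assumed better for every factor.
    """
    columns = list(zip(*rows))
    col_stats = [(statistics.mean(c), statistics.stdev(c)) for c in columns]
    scores = []
    for row in rows:
        total = 0.0
        for value, (mean, sd) in zip(row, col_stats):
            if sd > 0:  # a factor that never varies cannot discriminate, so skip it
                total += (value - mean) / sd
        scores.append(total)
    return scores


# Hypothetical projects: (expected revenue $M, market size $M, team experience in years)
projects = {
    "A": [2.0, 50, 3],
    "B": [1.0, 80, 10],
    "C": [3.0, 20, 5],
}
scores = dawes_scores(list(projects.values()))
for name, score in sorted(zip(projects, scores), key=lambda p: p[1], reverse=True):
    print(f"{name}: {score:+.2f}")
```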

    I don’t believe some subjective score is always avoidable, but you are correct to be cautious. The first assumption I try to apply in any such situation is that the subjective score must be missing some more fundamental point. In this case, why is innovation important? I like the Madison Avenue quote “If it doesn’t sell, it wasn’t creative.” Evaluating whether something is innovative or not is beside the point. Don’t you really just care whether it might be commercially successful or have some other major benefits? How is being innovative a benefit by itself if the innovation doesn’t produce other observable benefits? If it doesn’t produce other observable benefits, how innovative was it, anyway?

    For those reasons, I would avoid innovation altogether. It is simply not a benefit on its own. You are ultimately concerned about the effects of the alleged innovation, not the innovation itself.

    If you do have other factors that may be modeled by ordinal scales, I would first consider the possibility that they, too, are really hiding some underlying observable result which is the real focus of your objectives. If you still find it necessary, your approach makes sense, but show the result as the result of a survey of several people. The factor then really becomes “The percentage of experts who judge this project to be X.”

    I would consider trying a Lens model as well. It does show a measurable improvement over both intuition and the Dawes Z-score and it avoids the arbitrary choice of weights.

    Thanks for your input,
    Doug Hubbard

  20. dwhubbard

    Yes, I’m also a big fan of Fuzzy Logic. But note that prior to the 1980s, fuzzy logic was called “Monte Carlo”. Just kidding. There are some differences, but it is true that the fuzzy logic movement seems to be a repackaging of existing stochastic methods. Fuzzy logic was all about analyzing situations without certainty. That’s exactly what most of the decision sciences were always about. I find when I talk to fuzzy logic experts they tend to be simply unfamiliar with work in the area of decisions under uncertainty and, therefore, come to the conclusion that what they are doing is “new”.

    But there is one difference between Fuzzy Logic and previous methods of analyzing uncertain systems. Proponents of fuzzy logic also attempt to apply it to situations that are not just uncertain (which is already addressed with other stochastic methods) but also ambiguous. For example, I hear fuzzy logic proponents use “baldness” or “warm” as examples of fuzzy concepts they can model. They state that there is no exact point where removing one more hair makes a person bald, or where raising the temperature one tenth of a degree makes it warm.

    But this is the area where I find the methods fuzzy logic uses to be unnecessary for a different reason. Uncertainty is not avoidable but ambiguity is. We can simply define our terms better. When we look at specific applications of fuzzy logic, I tend to find that some other unambiguous language was possible and makes the fuzzy logic application moot. For example, if we are assessing “baldness” to determine how big a man’s toupee needs to be, what we really end up asking is simply the area that needs to be covered. The arbitrary binary point where the person is officially “bald” is not relevant to this problem. Likewise, if we are trying to assess the temperature at which people find themselves most comfortable, we find that functions relating objective inputs to scales of subjective perception (such as Weber functions) already serve that purpose and are very useful. I think these are probably the reasons why there has been a steady decline in the use of the term “fuzzy” in the literature (an analysis of hits on the JSTOR database by year will show this). The part that works isn’t new and the part that’s new isn’t necessary.
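    As a sketch of the kind of function meant here, the Weber-Fechner relationship (one classic example, chosen here as an assumption) maps a physical stimulus to perceived magnitude as k * ln(S / S0), with no binary cutoff anywhere. The threshold and scaling constant are hypothetical and would have to be fitted to real data.

```python
import math


def perceived_magnitude(stimulus, threshold, k=1.0):
    """Fechner's law: perceived intensity grows with log(stimulus / threshold).

    threshold is the smallest detectable stimulus; k is a hypothetical
    scaling constant that would have to be fitted to real data.
    """
    if stimulus <= 0 or threshold <= 0:
        raise ValueError("stimulus and threshold must be positive")
    return k * math.log(stimulus / threshold)


# Doubling the stimulus adds the same perceived step (k * ln 2) at any level,
# so the scale is continuous: there is no point where "warm" suddenly starts.
for s in (10, 20, 100, 200):
    print(s, round(perceived_magnitude(s, threshold=1.0), 3))
```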

    Thanks for your input,
    Doug Hubbard

  21. Richard Watson

    Thank you for your reply

    Richard Watson

  22. Mr. Hubbard,
    I’m a Software Developer from Italy with a passion for the ‘uncertainty sciences’ that the last century gave us in such abundance. I’ve read both your books and I am about to order the second edition of HTMA. In fact, I am so intrigued by the AIE methodology that I convinced some quantitatively skilled colleagues to set up a workgroup to apply AIE to some problems relevant to us.
    I want to say it’s an honor to confront you with a measurement challenge. Let’s start:

    The Swiss bank UBS published an article in its ‘UBS investor’s guide’, special edition April 2010, predicting the outcome of the FIFA 2010 Soccer World Cup. http://www.ubs.com/1/e/bank_for_banks/news/topical_stories/edition_10.html
    You will agree this is a relevant problem, as the ‘uncertainty reduction’ on the game’s outcome will give an advantage in sports-betting.
    With hindsight, their prediction failed miserably. They claimed:
    (1) Brazil is most probable winner – didn’t reach the semis
    (2) Germany and Italy likely to go far – true for Germany (3rd place) but Italy didn’t survive the first round.
    (3) “Spain – favored by many – will likely not do well, and could exit before the semi-final stage” – Spain won the World Cup.
    UBS now has an inglorious record of 1 success in 3 attempts – World Cup 2006 went well, but European Championship 2008 and World Cup 2010 failed.
    I am inclined to argue that you can’t predict the outcome of the game a priori.

    1. UBS likely has built a state-of-the-art econometric model, but the conclusive verdict about the rightness of the model can only be “it works”. This shows that you can make a sound argument about how you measure it and still fail miserably.

    2. But you cannot know if your model is right or you were lucky. This is because the experiment is not easily repeatable. The basic dilemma of the social sciences: social systems are complex and adaptive. In model terms, the stochastic process is itself complex, if not random. When we cope with induction we can only believe in the stable nature of the stochastic generator. What UBS’ case tells me is that there is anecdotal evidence that the underlying principles of “who wins” are not stable. You cannot say whether it will work for the next FIFA World Cup or not, which makes it useless.

    3. But even if you knew the exogenous factors that influence the game, I suspect the endogenous factors in the system are much more important, making any reasonable forecast before the games started futile.

    Mr. Hubbard: can you measure it?

    Sincere Regards,
    Roland Kofler

  23. dwhubbard

    Roland,

    Thanks for your interest and contribution. I can’t tell from the link you provided but did Swiss Bank UBS produce an actual probability? Or did they just say this would “probably” happen? That would be the first big question in evaluating their prediction. Only a prediction that was certain can be wrong based on one observation. If the prediction was a stated probability, and if the probability was – say – 60%, then a single failure is not a “failure”. I’ll explain this in more detail by responding to each of your three points individually.
    1) We have to state what we mean by “it works”. As I argued in the books, a model works if it reliably predicts outcomes. In physics, a sophisticated and elegant theory has failed if it didn’t predict outcomes. But in probabilistic statements, we have to look at a larger number of examples. Now, you say their track record is 1 in 3. Surely, there are hundreds of individual matches to draw data from, not just three. Remember Assumption #2 from Chapter 2 – you have more data than you think. The link you provided says “Back in 2006, UBS Wealth Management Research (WMR) made waves when it not only correctly picked Italy to win that year’s World Cup, but also correctly picked 50% of the semi-finalists, 75% of the final eight and 81% of the final 16.” This indicates a larger number of individual predictions. But even this information is not the most enlightening about the bank’s real success. I would rather see how often they were right on individual matches and what their confidence was in each match. That brings me to the next point.
    2) True, you cannot know from a single event whether a probability statement is absolutely wrong or right (which is why I’m not sure you can conclude the Swiss Bank model failed until we see more details). This is why I explain in my books that you have to look at a number of trials to determine if probabilities are realistic. If the bank’s model produces specific probabilities, and it predicted 100 events with 90% confidence, another 100 events with 80% confidence and so on, it should get about 90/100 events of the first group right, about 80/100 of the second, and so on. On a related note, “stability” of the individual games is not a requirement. (This is a key fallacy promoted by followers of W. E. Deming.) Well-calibrated probabilistic forecasts are applied to many “unstable” systems, like the weather. All you have to do is look at a large number of your predictions and see if the percentage you got right was about the same as your stated confidence. Look at the calibration questions I ask in the books. They are all from completely different topics. Is that “stable”? No, but you can still determine if the expected number of correct predictions is close to the actual number of correct predictions.
    3) I don’t think which factors are exogenous or endogenous is the key issue. Whether the factors are internal or external, the fact is that we are uncertain about their influence and our uncertainty can be stated.
    Finally, to your question “can you measure it?”, I will, of course, say yes. I’ll invoke the first “measurement assumption” I mention in Chapter 2: it’s been done before. Yes, this can be measured, and I know it can because well-calibrated methods already exist for other sporting events. I cited research showing that when “prediction markets” are applied to American football and we look at all of the games where the market predicted an 80% chance of a specific team winning, that team won about 80% of the time. (If one were to argue that the reason for this is that American football is “more stable”, I would like to see the math behind that claim.)
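The calibration check described in points 2) and 3) can be sketched in a few lines. This is a minimal illustration, assuming each prediction is recorded as a hypothetical (stated_confidence, was_correct) pair; the function name `calibration_table` and the sample track record are invented for the example, not taken from the bank's model or the books.

```python
from collections import defaultdict

def calibration_table(predictions, bin_width=0.1):
    """Group (stated_confidence, was_correct) pairs by confidence and compare
    the average stated confidence in each group with the observed hit rate."""
    groups = defaultdict(list)
    for confidence, correct in predictions:
        # Assign each prediction to the nearest multiple of bin_width (0.68 -> 0.7)
        groups[round(confidence / bin_width)].append((confidence, correct))
    table = []
    for key in sorted(groups):
        group = groups[key]
        mean_confidence = sum(c for c, _ in group) / len(group)
        hit_rate = sum(1 for _, ok in group if ok) / len(group)
        table.append((mean_confidence, hit_rate, len(group)))
    return table

# Hypothetical track record: ten calls at 90% confidence (9 right)
# and ten calls at 60% confidence (6 right).
record = [(0.9, i < 9) for i in range(10)] + [(0.6, i < 6) for i in range(10)]
for mean_conf, hit_rate, n in calibration_table(record):
    print(f"stated {mean_conf:.0%}, observed {hit_rate:.0%} over {n} calls")
```

A well-calibrated forecaster shows observed hit rates close to stated confidence in every group; with only a handful of calls per group, some gap is expected by chance alone.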
    So, in order to evaluate the bank’s model, we have to ask the following:
    1) Did the bank quantify its uncertainty (e.g., “Brazil is 55% likely to win”)?
    2) How many total predictions are part of the bank’s model? (If it has been used for even a few years, it must surely include a large number of individual soccer games.)
    3) When predictions of similar confidence are grouped together, did the predicted outcomes happen about as often as expected (e.g., was the prediction right about 75% of the time when the stated confidence was between 70% and 80%)?
    4) What are the odds that an uncalibrated method could have produced the same results? For example, the odds of this unfortunate result being bad luck would be very low if they regularly made their predictions with 98% confidence. But if they were merely 70% confident in outcomes, perhaps this result is not so unlikely.
    5) What would the success rate, over a large number of matches, have been for the average unaided sports fan compared to the bank’s model? Remember, the definition of measurement is to reduce uncertainty about a quantity based on observation. The model doesn’t have to be right very often if unaided intuition is even worse. In some situations, this improvement over the unaided intuition of experts can be worth a lot of money. Would a survey of sports fans, sports “experts” or astrologers have done just as well as the bank’s model? If the model was even a slight improvement on unaided intuition, by more than chance alone can explain, then it worked.
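Point 4) can be made concrete with a binomial calculation. The sketch below assumes, purely for illustration, that the "1 in 3" track record means one correct call out of three independent calls, each made at the same stated confidence; `prob_at_most` is a hypothetical helper name.

```python
from math import comb

def prob_at_most(successes, trials, confidence):
    """Binomial probability of getting at most `successes` calls right out of
    `trials`, if every call is independently right with probability `confidence`."""
    return sum(comb(trials, k) * confidence**k * (1 - confidence)**(trials - k)
               for k in range(successes + 1))

# Hypothetical: one correct call out of three, under two different stated confidences.
for conf in (0.98, 0.70):
    print(f"confidence {conf:.0%}: P(1 or fewer right out of 3) = {prob_at_most(1, 3, conf):.4f}")
```

At 98% confidence, a result this poor has roughly a 0.1% chance of being bad luck; at 70% confidence, the chance is about 22%, which is not unusual at all.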
    Thanks again for your input,
    Doug Hubbard

  24. Touché, and a detailed answer will follow next weekend. There are epistemological problems, no doubt, and I have mixed feelings. Having spent most of my life in the city of Brunswick (you taught me about this Viennese mastermind; he is totally forgotten), and with Wittgenstein (you operate on language) and Critical Rationalism (thinking about science was ‘invented’ in Vienna) in mind, I want to argue this in more detail. I dare to challenge you on the ‘epistemic’.
    Thank you for your work!
    RK

  25. Just out of curiosity, as I want to know whether such a prediction is credible at all:

    UBS disclosed the likelihood of each team reaching the next round. You can see these on page 16 of the UBS Investors Guide: http://www.ubs.com/2/e/medlib/wmr/IGWM_spez_2010_en.pdf
    In fact, the article on page 14 explains some of the model. To quote: “As in our previous studies, we rely exclusively on three factors to estimate the different winning probabilities:
    1) past performance;
    2) whether or not a team is a host nation; and
    3) an objective quantitative measure that assesses the strength of each team three months before the start of the World Cup. Socioeconomic factors like population size or GDP growth have been proven to have no explanatory power when it comes to forecasting the performance of a specific team.”

  26. dwhubbard

    Roland,

    Why would it necessarily have no credibility? Remember, the real test is whether it outperforms your intuition. If you are both equally well calibrated but they turn out to be wrong 30% of the time in a large number of trials, while you turn out to be wrong 40% of the time, then it’s an improvement. Heed Voltaire when he says the perfect is the enemy of the good. Measurement is about uncertainty reduction, and if a model, with all its flaws, is right more often than your previous model (intuition), then it was a measurement.

    Is your question about whether such a model could conceivably outperform the intuition of the average sports expert? Why do you think it couldn’t? It’s all about results, and if the results are extremely unlikely to be due to chance alone, then the results are informative.
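Whether a 30% versus 40% error rate is "extremely unlikely to be due to chance alone" depends on how many trials are behind each rate. One standard way to check is a two-proportion z-test, sketched below; the trial counts are hypothetical and the function name is invented for the example.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_p_value(errors_a, n_a, errors_b, n_b):
    """Two-sided p-value for the difference between two error rates,
    using the pooled normal approximation (two-proportion z-test)."""
    rate_a, rate_b = errors_a / n_a, errors_b / n_b
    pooled = (errors_a + errors_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Hypothetical: the model is wrong 30% of the time, intuition 40% of the time.
print(two_proportion_p_value(60, 200, 80, 200))  # 200 trials each
print(two_proportion_p_value(6, 20, 8, 20))      # only 20 trials each
```

With 200 trials each, the p-value is about 0.04, so chance is an unlikely explanation; with only 20 trials each, it is about 0.5 and the same gap tells us very little.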

    Doug

  27. Doug, today I tried drafting several arguments for why immeasurables exist, but none of them seems right.
    The only thing I could say is that a false reliance on induction could harm you, because the future might change. But this is already covered in the TFoRM book.
    (By the way, I think it would be an interesting experiment if you tried to convince us that immeasurables exist.)

    Beyond keeping up with the challenge, I would eventually like to send you some models that our little measurement study group here in Italy comes up with. Next week is our second meeting; I will calibrate the guys and then we will identify some problems to work on.
    Many thanks, Roland

  28. dwhubbard

    Roland,

    So then we must compare the harm of false reliance on induction to the harm of false rejection of induction. I can never say this enough: What was the rate and magnitude of error before using a particular method and the rate and magnitude of errors after using it? All of the empirical evidence says that unaided human intuition is easily outperformed by even simple modeling.

    Remember, all models are wrong. They all have error. The question is whether your previous model (intuition) really had less error. The evidence says no.

    Yes, feel free to tell me more about your measurement study.

    Doug Hubbard

  29. In fact, I already did a simple experiment, translating the Chicago Piano Tuner problem into a Viennese Hair Salon Monte Carlo simulation:

    http://objektorient.blogspot.com/2010/08/freelibre-and-open-source-aie-models.html

    I hope it’s okay that I copyleft this even though the inspiration came from you, as did the bin-slicing Excel formula.
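For readers who can't follow the link, here is a minimal sketch of the general idea: a Fermi estimate where each input is a 90% confidence interval, turned into a normal distribution (mean at the midpoint, standard deviation = width / 3.29) and combined by simulation. Every range below is a hypothetical placeholder, not a figure from the linked post, and the model structure (demand divided by salon capacity) is an assumption.

```python
import random
import statistics

def draw_positive(rng, low, high):
    """Sample a normal whose 90% CI is [low, high] (mean = midpoint,
    sd = width / 3.29), redrawing the rare non-positive values."""
    mean, sd = (low + high) / 2, (high - low) / 3.29
    while True:
        value = rng.gauss(mean, sd)
        if value > 0:
            return value

def simulate_salons(trials=10_000, seed=1):
    rng = random.Random(seed)
    results = []
    for _ in range(trials):
        population = draw_positive(rng, 1.6e6, 1.9e6)    # hypothetical city population
        cuts_per_person = draw_positive(rng, 4, 8)        # haircuts per person per year
        cuts_per_day = draw_positive(rng, 6, 12)          # haircuts per stylist per day
        working_days = draw_positive(rng, 200, 250)       # working days per stylist per year
        stylists_per_salon = draw_positive(rng, 2, 5)
        demand = population * cuts_per_person
        capacity_per_salon = cuts_per_day * working_days * stylists_per_salon
        results.append(demand / capacity_per_salon)
    return results

salons = simulate_salons()
cuts = statistics.quantiles(salons, n=20)  # 19 cut points: 5%, 10%, ..., 95%
print(f"median ~{statistics.median(salons):.0f}, 90% CI ~{cuts[0]:.0f} to {cuts[-1]:.0f}")
```

The output is a distribution rather than a single number, which is the main advantage over a point Fermi estimate.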

  30. gchesterton

    Doug:
    Here’s one for you, since you mention capture-recapture in HTMA. What I’d like to do is estimate the true number of events of type t in a system. I have two databases, A and B, neither of which has complete reporting of these events. I review records from database A and count reports of type t (I suspect they’re under-reported). Let’s say there are 193. I then review database B and count reports of type t (this database is also subject to under-reporting). There are 69. There are fields in the records that allow the analyst to identify events of type t from database A that are the same events as those found in database B. There are 20 that are common to the two counts. Can I use capture-recapture? Any caveats to its use? I come up with an estimate of 665, with a 95% CI of [476, 929].
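The point estimate in the question can be checked with a short sketch. I'm assuming the formula meant is the textbook Lincoln-Petersen estimator (193 × 69 / 20 ≈ 665.85). Chapman's variant, a common small-overlap correction with a standard variance formula, is swapped in as an extra for comparison. The resulting intervals depend on the method chosen, so they won't necessarily reproduce the [476, 929] quoted above.

```python
from math import sqrt

def lincoln_petersen(n_a, n_b, overlap):
    """Classic two-source capture-recapture estimate: n_a * n_b / overlap."""
    return n_a * n_b / overlap

def chapman(n_a, n_b, overlap):
    """Chapman's less biased variant, (n_a+1)(n_b+1)/(overlap+1) - 1,
    together with its commonly used variance estimate."""
    estimate = (n_a + 1) * (n_b + 1) / (overlap + 1) - 1
    variance = ((n_a + 1) * (n_b + 1) * (n_a - overlap) * (n_b - overlap)
                / ((overlap + 1) ** 2 * (overlap + 2)))
    return estimate, variance

print(f"Lincoln-Petersen: {lincoln_petersen(193, 69, 20):.1f}")
est, var = chapman(193, 69, 20)
print(f"Chapman: {est:.1f} (sd ~{sqrt(var):.0f})")
```

Both estimators assume the two sources capture events independently, which is exactly the caveat that matters here.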

  31. You certainly can use capture/recapture. That is a perfect example. But, yes, you are also correct in asking about caveats. The formula I refer to works if there is no relationship at all between the events A captures and the events B captures. But if they both tend to miss the same kinds of events, then this method will underestimate the total events. Likewise, if A and B are more likely to be sensitive to different kinds of events, then this method will overestimate the population of events. The good news is that this kind of error is often one people already know about. You may already have an idea about whether A and B have a strong negative or positive correlation and, if so, you can at least put a “direction” on the error. If you know you are underestimating, for example, and the computed CI is 476 to 929, then you know that a value of 300 is even less likely than before, and if that is your “decision threshold”, then you have made a slam-dunk argument for the case you are making.
    Doug

  32. gchesterton

    Let’s say your Monte Carlo model/simulation produces an output metric with some shape and skewness and a 90% interval estimate of, say, Prob{event A} = [0.005, 0.015]. Now a colleague discovers an independent study that estimated Prob{A} = [0.001, 0.020], with no information about the shape of the estimate. Would you attempt to combine this information in some way? Perhaps use the first analysis as a prior with the second study as new information? Or maybe convolve the two distributions somehow? Or weight them in some way?
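One simple version of the "prior plus new information" idea is a precision-weighted (inverse-variance) combination, which is what a normal prior updated by normal evidence works out to. The sketch below assumes both intervals are independent and can be treated as symmetric normals, which is a strong simplification given that the first output is skewed and both probabilities sit near zero; a beta distribution or a simulation-based update may fit better.

```python
from math import sqrt

Z90 = 1.645  # z-score for a two-sided 90% interval

def interval_to_normal(low, high):
    """Treat a 90% CI as a normal: midpoint mean, sd = width / 3.29."""
    return (low + high) / 2, (high - low) / (2 * Z90)

def combine(estimates):
    """Inverse-variance combination of independent normal estimates
    (equivalent to a normal prior updated by normal evidence);
    returns the combined 90% interval."""
    weights = [1 / sd ** 2 for _, sd in estimates]
    mean = sum(w * m for w, (m, _) in zip(weights, estimates)) / sum(weights)
    sd = sqrt(1 / sum(weights))
    return mean - Z90 * sd, mean + Z90 * sd

low, high = combine([interval_to_normal(0.005, 0.015), interval_to_normal(0.001, 0.020)])
print(f"combined 90% interval: [{low:.4f}, {high:.4f}]")
```

Because the second study is much wider, it receives far less weight, and the combined interval ends up only slightly narrower than the first one.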

  33. I recently read and deeply appreciated your book. As I set out to apply the lessons, one of the luminaries of viral marketing, Seth Godin, posted a blog entry titled “On buying unmeasurable media”. I would be thrilled to see you intellectually spar with him, since so much of his work is on softer topics without much data to support it.

    Here’s the blog link: http://www.feedblitz.com/t2.asp?/198516/5620782/3910073/http://feedproxy.google.com/~r/typepad/sethsmainblog/~3/WKYoGnEzan4/on-buying-unmeasurable-media.html

  34. I would be interested in connecting with Seth. I’ve been swamped for a month trying to finish my next manuscript but now that I’m done, I’ll be getting back into the blogs.

    Thanks for your interest.

    Doug

  35. Hmm, here’s a measurement challenge I haven’t cracked yet at work!

    A lot of businesses pay external agencies to do link-building to help their search engine rankings, and from that generate more visits and money. The problem is that search engine rankings change organically anyway, simply from things like pages staying the same. Plus, there can be a lag between something changing and the search engine reflecting that change in its rankings.

    So how can you measure how much of the natural search uplift is due to your SEO agency’s work?

    I think this one may literally be an immeasurable, at least without access to Google’s ranking algorithms :)

  36. One of the great problems of our time is how to measure education. In Texas, we use the Texas Assessment of Knowledge and Skills (TAKS) test, which is highly controversial. First of all, I suspect that reliance on a single measurement method is problematic. Secondly, the method requires everyone to take the test multiple times instead of using a simpler, less costly sampling methodology. Maybe complete coverage is required to reduce errors sufficiently? I see a problem in measuring something (education) in which the result (“better” education) manifests itself at some future time (years or maybe decades later). Do you have any thoughts on how to measure “better” education? I assume you will include such measurements as: 1) the number of students who graduate from high school, 2) the number who graduate from college, 3) the number of scholarships awarded to graduates, and 4) the number of graduates drawing unemployment, or some similar set of measurements.

    I have your book HTMA2 (as well as TFORM) and am now devoted to applying AIE.

    Thank you.

  37. Proger, that’s roughly where I am going. How do you
    1) measure educational effectiveness and, by extension,
    2) choose between competing educational alternatives?

    I am on a research junket on number 2 above. I have barely begun reading HTMA on Amazon Kindle.

    I hope you keep me posted about your discoveries, with Doug’s help, of course.

    Thanks a million.

    mf

  38. Doug:

    I am reading HTMA via Kindle on Amazon. I can’t “see” the exhibits I have come across so far (5.1 and 5.2).

    Are they disabled on the Kindle, or is it something else? I am holding off on reading further until I am done with the tests.

    Thanks

    mf

  39. Thanks for your observation about the Kindle. I have heard this from someone else so I’m sure you are not the only person having this problem. I am having an assistant gather and post all of the exhibits to make them available for anyone having trouble reading the exhibits with their Kindle.

    I’m sure other authors have heard this, too. Kindle should provide “exhibit testing” for authors so that we can see what our charts look like on the Kindle.

    Doug Hubbard

  40. If it has any observable effect at all, then it is measurable. In fact, even if it has no observable effect, you simply measured it to be zero!

    Remember the general rules: 1) it has been measured before 2) you have more data than you think and 3) you need less data than you think. Regarding the first rule, I’m sure you would find many SEO measurement methods with a little more Googling. But let’s see if we can devise something on our own using the second two rules. Specific knowledge of Google’s ranking algorithm is probably not required – especially if you are highly uncertain about this. But Google Analytics does give you a lot of tools for this.

    First, a little setup. Why do you want to measure this, and how much do you know about it now? Let’s say only 20% of your traffic can be attributed to your SEO efforts. Does this mean you give up on it or keep going with it? What if it were only 5%? At some point, there is a “threshold” where some action would be taken. Define what that threshold is. Now state how much you know about it now. Is your current estimate a wide range of 10% to 80%? 0% to 90%? If it really were on the upper end of this range, you would probably have seen a dramatic effect as soon as you started the SEO effort. Let’s say it’s 0% to 50% for now.

    Given a range that wide, you probably don’t need much data to make the range narrower. Remember, any reduction in uncertainty counts as a measurement! Do these visitors have to register? Have you considered sampling those who register and asking them how they found your site? If you use Google Analytics and have been tracking which top search terms lead to your site, are some of these searches based on terms that you only recently added to your site due to SEO efforts? Are some of the backlinks that have been added since the SEO effort began among the most productive referring sites according to Google Analytics? If so, then there is at least some traffic that you know to be new since the SEO effort started. Your range is narrower.

    Once the range is narrowed, take a look at your threshold. Did the uncertainty change enough that you are confidently on one side or the other of the threshold you identified? Then you have measured it enough to make your decision about whether SEO is worthwhile or whether you need to change the SEO strategy.
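The "sample registrants and ask how they found you" step can be turned into a narrower range with a standard interval for a proportion. The sketch below uses the Wilson score interval (a common textbook choice, swapped in here as an illustration rather than a method from the books); the sample of 40 registrants, the 12 SEO-attributed answers, and the 10% threshold are all hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def wilson_interval(hits, n, confidence=0.90):
    """Wilson score interval for a proportion observed as `hits` out of `n`."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    p = hits / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half

# Hypothetical survey: 12 of 40 sampled registrants arrived via SEO-driven terms or links.
low, high = wilson_interval(12, 40)
threshold = 0.10  # hypothetical decision threshold
print(f"90% CI for SEO share: {low:.0%} to {high:.0%}")
print("confidently above threshold" if low > threshold else "not yet conclusive")
```

Self-reported answers carry their own bias, so the interval describes what registrants say rather than the true attribution.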

    Keep me posted on how that is working for you. I’m sure others would like to know.

    Doug Hubbard

  41. Yes, this is extremely important. But we have to ask the same kinds of fundamental questions for all measurement problems. That is, what decisions could you make differently if you knew the answer, at what point would the quantity make a difference (i.e. what is the “threshold”) and how much do you know now? Are you trying to measure each child or simply the overall effectiveness of the curriculum in a school district? Are you trying to measure this in order to decide teacher bonuses? You have to figure out why you are measuring this and define the specific decisions, first. You may want to answer “all of the above” but, for now, let’s pick one to focus on.

    How you answer that will determine what you need to measure and how to measure it. And whatever that turns out to be, you will need to determine how much you know now about it. If you wish to measure the effectiveness of a new curriculum, I seriously doubt you need to measure each student several times in order to significantly reduce your uncertainty. Remember, contrary to popular belief, if you have a lot of uncertainty, you don’t need much data to significantly reduce it. And if that uncertainty reduction shows that you are very likely over some decision threshold (e.g. the point at which you need to change the curriculum because it is proving to be ineffective), then you have measured it enough to make a decision. I suspect that if your decision objective for the measurement is at that level, then the TAKS is probably overkill.

    Have you thought about these kinds of questions? It is the first place to start.

    But I have some thoughts about what methods you may use when you do answer these questions. As I have said before in this blog and in the books, you should assume this has been measured before, that you have more data than you think and that you need less data than you think. It is correct that education may manifest itself much later in life, but that has been studied before, and you can use that research to tell you about conditions in the present that correlate with the future. I would be surprised if simple test results of children now didn’t actually correlate with the four future measures you present, and I would also be surprised if someone, somewhere, hasn’t already written a dissertation on that. If you are using this to measure not individual students but the overall effectiveness of a program, and if your current uncertainty is high, then giving a small fraction of the students very short tests of randomly selected questions would probably reduce your uncertainty.
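Testing only a random fraction of students can be sketched as estimating a program-level mean from a sample. This is a minimal illustration, assuming hypothetical score data and a normal approximation (a t-value would give a slightly wider interval for small samples); `sample_mean_interval` is an invented name.

```python
import random
import statistics

def sample_mean_interval(scores, sample_size, seed=None, z=1.645):
    """Test a random subset of students and return (low, mean, high):
    the sample mean with an approximate 90% CI (normal approximation)."""
    rng = random.Random(seed)
    sample = rng.sample(scores, sample_size)
    mean = statistics.fmean(sample)
    se = statistics.stdev(sample) / sample_size ** 0.5
    return mean - z * se, mean, mean + z * se

# Hypothetical district of 2,000 students; only 40 are tested.
gen = random.Random(42)
district_scores = [min(100, max(0, gen.gauss(70, 12))) for _ in range(2000)]
low, mean, high = sample_mean_interval(district_scores, 40, seed=7)
print(f"estimated mean {mean:.1f}, 90% CI {low:.1f} to {high:.1f}")
```

Testing 2% of the students already pins the district mean down to within a few points, which may be enough for a curriculum-level decision.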

    Let me know your thoughts about the objective of the measurement and we can talk about how to design the best measurement approach for this critical problem.

    Thanks for your interest in my books and thanks for contributing to the blog.

    Doug Hubbard

  42. Thanks for your question. See my response to Proger below. Is there a high degree of uncertainty at this point between education alternatives? Is there literally no research already on the topic? If so, then surprisingly little data can be a significant reduction in uncertainty. As with all measurement problems, we should assume it has been measured before (and we should do the research to find what was done before), we should assume we have more data than we think, and we should assume that we need less data than we think. These kinds of assumptions turn out to be much more productive when searching for measurement methods. The opposite assumptions (i.e. that it was never measured before and that we have insufficient data) invariably lead to feeling stumped by the measurement problem and are also invariably wrong.

    You can see my responses to Proger, but I’ll ask you another. What do you mean by “educational effectiveness”? Proger offered some ideas about what that means, but I wanted to get your input. If you can figure out what you mean, why you care (i.e. what decisions could be different), at what point that quantity affects the decision (i.e. the threshold), how much you know about it now (your current calibrated estimate) and what observations would correlate in any way to what is being measured, then you are well on the way to measuring it.

    I look forward to your comments.
    Doug Hubbard

  43. It’s been quite a while since the “challenge” was posted, but I do hope that you get time to reply:

    How would you peg a value to ‘decision analysis’?

    I mean in the context of making a decision “NOW” vs. educating the stakeholders that it is more valuable to make an informed decision with supporting evidence, etc.

    Now, how would you present a ‘business case’ for performing decision analysis? (Note this is ‘internal’ to the organization, i.e., a consultant is not hired to do this; let’s assume there is someone competent in the company to ‘perform’ this activity.) In this case, time would be an important variable (both the analyst’s time and that of all the people he/she intends to interview), but there is no direct cost of hiring a consultant.

    The thing that I’m fumbling with is this: let’s assume you can see the future and you could make decisions accordingly, i.e., Perfect Information. Now how can you calculate the EVPI of this? How would you distinguish a good decision from a better decision? (There is a ‘delta’, so it is somehow measurable.) What is the probability that the decision supported by decision analysis would indeed be better than the one made “Now, in the heat of the moment”, and how would you quantify/justify that it’s worth waiting and conducting the decision analysis to better understand what the decision entails (capturing this uncertainty)?

    I’m a bit inclined towards Multi-attribute utility theory (MAUT), and I am unable to think of how to present a business case for conducting Decision Analysis using MAUT.

    The comment often made is “It’ll just take more time and MAUT won’t give us a perfect decision since the future is unknown. We’d rather take a decision today and move on with it – we could correct our course later”

    The case for decision analysis seems measurable but the “how” has made me scratch my head, quite a bit!

    Any ideas? :)

  44. Thanks for your question. I routinely compute the value of decision analysis itself since I compute the value of all of my analysis as part of the standard deliverable. We work out the value of the uncertainty reduction as I describe in chapter 7 of HTMA. If we reduce uncertainty about a decision, we change the odds of choosing an economically inferior strategy. Roughly speaking, the cost of being wrong times the chance of being wrong (i.e. the Expected Opportunity Loss or EOL) is reduced with the analysis. Since the chance of unfavorable outcomes is computed with methods I discuss in chapters 5 to 7 of HTMA, we can compute the change in EOL. There is also quite a lot of sound scientific research about how methods like this cause estimates and decisions to improve.
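For the EVPI part of the question, the simplest case is a two-option decision (approve or reject) where the net benefit is uncertain and treated as normally distributed, with breakeven at zero. The sketch below uses the standard normal-loss formula for that case; it is an illustrative assumption, not the exact spreadsheet method from HTMA. For such a decision, the EOL of the better default choice equals the EVPI, since perfect information would eliminate that expected loss entirely. The 90% CI is hypothetical.

```python
from statistics import NormalDist

def evpi_two_option(mean_net_benefit, sd_net_benefit):
    """EOL of the better default choice (approve if mean > 0, else reject)
    for a normally distributed net benefit with breakeven at zero.
    For a two-option decision this equals the EVPI."""
    mu, sigma = mean_net_benefit, sd_net_benefit
    z = abs(mu) / sigma
    std = NormalDist()
    # E[loss] = sigma * pdf(|z|) - |mu| * cdf(-|z|)
    return sigma * std.pdf(z) - abs(mu) * std.cdf(-z)

# Hypothetical 90% CI for net benefit: -$0.5M to +$2.5M.
lo, hi = -0.5, 2.5
mean, sd = (lo + hi) / 2, (hi - lo) / 3.29
print(f"EOL of approving now (= EVPI) ~ ${evpi_two_option(mean, sd):.3f}M")
```

If the analyst's time costs more than this figure, taking the decision now is defensible; if it costs much less, the analysis is likely worth it.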

    But the value of a method like MAUT is trickier for two reasons. First, it doesn’t actually forecast anything objective and, therefore, it is not clear how to verify, even in retrospect, whether the outcome of a MAUT analysis is right or wrong. The theory is simply that if your own preferences are logically consistent, you will be more satisfied. Unfortunately, research in decision psychology has generated findings that complicate this. Our preferences change for random, unrelated reasons that we are unaware of. For example, you are more risk tolerant if you are around smiling faces, if you are angry, or if your testosterone levels are higher than usual. You even tend to reengineer your memories of preferences based on immediate conditions. Our subjective utilities appear to represent only very temporary conditions of our emotional states; they change frequently, and for reasons that should have nothing to do with the decision at hand. But we do know that some form of decision analysis appears to increase confidence in decisions even when the particular method being used made forecasts and decisions measurably worse. Much of the perceived benefit of tools like MAUT may be a kind of placebo effect. I cite sources for all of this research in the second edition of HTMA.

    The second reason that making the case for MAUT is difficult is because – unlike some other methods – there is actually no empirical research showing that decisions are even any better in a measurable way. Some research has shown that decision makers feel slightly more satisfied with some methods than others but that doesn’t mean the decisions are actually better. Ideally, we would like to have a large controlled experiment like a kind of tournament for forecasting and decision making. But research has rarely gone beyond showing that users of many decision analysis methods simply feel more confident (which we know they would feel even if it didn’t work, because of the placebo effect).

    Fortunately, there are methods that show a measurable improvement in estimates, forecasts and decisions. MAUT just isn’t one of them, yet (but it may be if anyone manages to actually conduct a real-world experiment with sufficient data points). The methods that have been shown to work with overwhelming empirical evidence in controlled studies are the following:
    1) calibration training to teach people how to provide better probabilistic estimates
    2) quantitative historical models to forecast objectively observable outcomes
    3) the use of quantitative modeling methods with simulations
    4) the use of the Lens method for removing expert inconsistency
    5) models based on empirical measurements (my method helps to identify and prioritize which uncertain variables justify empirical measurement efforts)

    I cite sources for these as well in HTMA. The difference between this and MAUT is that almost all of the inputs and outputs are objective values. They may be initially subjective estimates of objectively observable values but the original subjective estimate can at least be eventually evaluated as correct, incorrect, or close enough. There is no way to determine if your stated utility curves were the “correct” ones since they state nothing except the preferences you had at the moment you defined them (which we now know changes for arbitrary reasons the decision maker is not aware of).

    There is one unavoidable tradeoff that is a subjective utility curve, and all of my models have at least this. We have to subjectively trade off acceptable risk vs. return. Given the problems with preference statements discovered just in the last few years with experimental psychology studies, it’s a good idea to minimize the use of purely subjective tradeoffs in a model, although some will be unavoidable. But most really important decisions involve at least some forecasts of objectively observable outcomes. It’s not all just a matter of utility curves. If a DA method works, we should be able to show that estimates, forecasts, and decisions were, after some significant number of trials, a measurable improvement on alternatives like pure intuition. The Lens method, for example, can show measurable reductions in error in a variety of forecasts like business failures, cancer prognosis, and the success of graduate students.

    Good luck with the business case. If you find any research that shows that MAUT has any positive, measurable effect (other than simply the confidence of the users) over a large number of trials, then let me know. My hypothesis is that the real answer may be small or zero. Given the importance of many decisions made with this method, I hope someone tests it soon.

    Thanks for your input,
    Doug Hubbard

  45. Hi Doug,

    I feel genuinely obligated to start off by saying “what a great book!” I have purchased innumerable statistics books and always have been stuck at the front door. Your book is the comfortable foyer that I needed to get a proper introduction to and understanding of the guests inside those other books. Thanks!

    OK, now for a crass and shameless challenge:

    I am an investor and I develop trading systems for investing my retirement funds in the financial markets. The systems are by their nature prone to Data Mining and Data Snooping. I want to rank my inventory of systems and select the best system(s) to trade, and I don’t want it (them) to underperform after selection. How do I pick the system (measure = historical performance rank) that is most likely to perform going forward as it has in the past (measure = reliability) and is the system that will likely perform better than the other systems in my inventory (measure ==> best rank = best forecasted return = best return)?

    My many issues with this question:
    a) Have I asked the right question so that it lends itself to analysis (I think I have)?
    b) Do I need “high level” statistics to deal with the Data Mining/Snooping issues (they sound scary and somewhat intractable, and were discussed briefly in a stats book by David Aronson)?
    c) Is there a better decomposition that I am missing?
    d) How do I iterate using easy measures first?

    And yes, as you can imagine, the value of perfect information on this question is, well, huge!

    Thanks in advance, – Carl

  46. Carl,
    These are good questions. In fact, these are exactly the questions that matter. You need to know if your system is working. But you might want to be prepared that the measurement might show marginal or no benefit for the systems. Zero is a possible answer. I only say that because I’m generally skeptical of methods that claim to beat the market consistently. But I am certainly keeping an open mind.

    I don’t think the methods need to be all that advanced, but I would recommend the use of a concept called the “p-value”, which expresses the statistical significance of a measurement. You want to ask the chance that two unrelated, random variables could show a given correlation or higher as a random fluke. If you look at large numbers of data points, say hundreds or thousands, a correlation of .9 or better would be extremely unlikely with two random variables. So the key question you need to answer is not just whether one is performing better than the others, but what is the chance that this could be a fluke? In my second book, I talk about what I called the “Red Baron” effect. Two electrical engineering professors wrote an article asking the question “Was the Red Baron good or lucky?” They considered the fact that he had 80 kills in WWI in the context that there were over a couple of thousand German fighter pilots in WWI and that the average chance of a victory in any encounter was 0.85 or higher for the average fighter pilot. Taking these values into account, it appears that even if all German fighter pilots were equally good, there is a 0.3 chance that at least one of them would have had 80 kills.

    So the best question for you to ask is not just whether one system outperforms others, but what is the chance that this performance is a random fluke? If you have a coin flipping contest with 1000 entrants, to see who can flip the most heads in a row, someone is going to be the winner. But that tells us nothing about whether the individual really has a special skill at flipping coins. If the winner had flipped, say, about 10 heads, we shouldn’t be impressed. That’s about what we would expect for the best performer out of 1000. But if the winner flipped 40 heads in a row, now we should start considering the possibility that the winner actually has some kind of skill or is cheating. Even out of 1000 contestants, there is only about a one in a billion chance of seeing one person flip 40 heads in a row.
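The coin-flipping numbers can be checked directly. The sketch below assumes every entrant flips a fair coin with zero skill and asks how likely it is that at least one of them opens with a given streak of heads; the function name is invented for the example.

```python
from math import expm1, log1p

def chance_someone_gets_streak(streak, entrants, p_heads=0.5):
    """Probability that at least one of `entrants` zero-skill flippers
    opens with `streak` heads in a row."""
    p_one = p_heads ** streak
    # 1 - (1 - p_one) ** entrants, computed stably when p_one is tiny
    return -expm1(entrants * log1p(-p_one))

print(f"10 heads: {chance_someone_gets_streak(10, 1000):.2f}")
print(f"40 heads: {chance_someone_gets_streak(40, 1000):.2e}")
```

A 10-head streak shows up somewhere in the field about 62% of the time, while a 40-head streak has roughly a 9 × 10^-10 chance, about one in a billion.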

    I think this might be a trap for many financial analysts but it is not limited to them. Scientific publications have shown a “publication bias”, meaning that scientists are more likely to publish an interesting result. That means if they throw away 5 studies for every one they publish, the calculation of a p-value will be off. The chance that some observed correlation would be due to chance is actually higher than they indicate in the study because now they are taking the best of several.
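The "best of several" problem has a simple first-order correction. If k independent attempts (studies, or trading systems in Carl's inventory) are tried and only the best is kept, the chance that the best one looks at least this good by luck is 1 − (1 − p)^k, known as the Šidák adjustment. The sketch assumes the attempts are independent; trading systems built on the same data are usually correlated, so treat the result as a rough guide.

```python
def best_of_k_p_value(p_value, k):
    """Chance that the best of k independent tries looks at least this
    good by luck alone: 1 - (1 - p)**k (Sidak adjustment)."""
    return 1 - (1 - p_value) ** k

# One published study out of six attempts, reported at p = 0.05:
print(f"{best_of_k_p_value(0.05, 6):.3f}")
# Hypothetical: best of 200 trading systems, individually significant at p = 0.01:
print(f"{best_of_k_p_value(0.01, 200):.3f}")
```

A nominal p = 0.05 becomes about 0.26 once the five discarded studies are counted, and a "1% fluke" among 200 systems is closer to 87%.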

    Let me know if that is sufficient to get you started. You will have to do a little research on this point but it is fairly basic. It may show that observed ranks are just random flukes. But if you find that one is so much better than the others that the chance that it is a fluke is extremely unlikely, then you have a very interesting and powerful finding.

    Doug Hubbard

  47. Point Being

    Mr Hubbard:

    As a clinically-oriented anesthesiologist I am impressed by the writing (clear, concise, not too jargon-ny) and encouraged by the message.

    We’ve given a lot of thought to how (and why) we measure patients’ perception of acute pain (chronic is a completely different game). We use the Visual Analog Scale developed many years ago. I wonder what your thoughts are about that scale and how we might improve our measurements. This gets at the issue of efficacy, which is a big “quality” indicator in our post-hospitalization surveys that get sent out randomly to our patients. Your ideas have stimulated my ever-inquisitive mind and for that, and that alone, this book was well worth the time and money. I have recommended it to several people in the business already. Any thoughts you have will be much appreciated.

  48. dwhubbard

    Thanks for your interest.

    It is interesting that you mention pain scales. We have a family friend who is a doctor at a pain clinic. I once mentioned to him that I would like to interview him about the pain scale his clinic uses (I think he said it is some kind of 1-to-10 scale). I thought it was an interesting measurement problem that could merit some space in the second edition of the book, but I didn’t get around to discussing it with him in detail. Is the Visual Analog Scale the one that uses the “smiley face” to “frowny face” spectrum of responses?

    I think it is an important issue because it gets at the heart of something that seems as if it must ultimately be subjective. So one might wonder how such a scale would be validated. I suppose pain has a bearing on activities, so a pain scale should have some predictive power about individuals’ activities. In my next book, Pulse: The New Science of Harnessing Internet Buzz to Track Threats and Opportunities, I discuss, among many other things, research that uses Twitter, Facebook, and Google to track economic, social, and even health trends. I also discuss research that uses accelerometers in mobile phones to track movement and predict illness. I suspect that a similar method could be used to infer pain from changes in movement.

    I think that people who are experiencing certain sorts of pain probably move differently in both subtle and obvious ways, and it would be hard to deliberately fake this behavior over a long period of time. Detailed activity tracking might be a much more objective measure of pain. Sleep patterns and eating patterns are also likely indicators of pain, though perhaps not as captured in self-reported surveys. I am referring to small, cheap tracking instruments, many of which can probably already be approximated with our mobile phones. This takes the Skinnerian approach to pain: we study only what we can actually detect objectively, and we avoid the problem of comparing subjective experiences.
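
    One way to sketch the validation idea: if a pain scale carries real information, patients’ scores should track an objective activity measure such as daily step counts from a phone’s accelerometer, presumably with higher pain going along with less movement. The minimal Python sketch below just computes a Pearson correlation between the two series; the pairing of pain scores with step counts, the function name, and the sample values are hypothetical illustrations, not data from any study.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two series of equal length (at least 2 points)")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    # Covariance and variance sums (the 1/n factors cancel in the ratio)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    # Made-up illustration values, NOT real patient data:
    # one patient's daily pain scores and phone-recorded step counts.
    pain_scores = [7, 6, 6, 4, 3, 2]
    daily_steps = [1200, 1800, 1500, 3500, 4200, 5000]
    print(f"r = {pearson(pain_scores, daily_steps):.2f}")
```

    A consistently negative correlation across many patients would be one piece of evidence that the scale tracks something real; a handful of days from one patient only shows the mechanics.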

    I will read a bit more about the research behind the Visual Analog scale. Feel free to stay in touch.

    Doug Hubbard

  49. Hi Mr. Hubbard — I’m enjoying the book a lot. As a former director of quality assurance for social services agencies I’ve often taken the position that anything can be measured. One of the hardest things to measure is progress of preschool children. They are not yet literate or able to do complex math but are developing important pre-literacy and pre-numeracy (sorry) skills. Generally I’ve settled on accomplishment of developmental milestones within fairly wide ranges as an outcome measure. I have also looked at fidelity to a preschool educational model. These are both somewhat unsatisfying and I wondered if you had any other ideas. Thank you. AB

  50. Sorry, on re-reading I realize I left out an important point: we’re trying to judge the quality of pre-school education. Thanks for your thoughts.
