Dispelling the myths about our new Index methodology

RP Data, along with our strategic partner Rismark International, released two world firsts this week:

  1. the world’s first genuine “daily” house price index suite, which will cover all the major cities and the national market.
  2. the world’s first house price indices that track the change in the value of the overall asset class rather than simply tracking changes in the performance of just those properties that transact.

This is a new way to measure housing markets, so naturally there are many questions about how we can produce such a high-frequency, low-volatility measure that runs just one day in arrears.  The points below explain how the index is calculated and how we, together with Rismark International, produce the most timely home value index ever.

  • The methodology used is an improvement on our existing hedonic methodology.  It provides a more accurate measure of the housing market by utilising what is known as a hedonic imputation methodology (the previous methodology was an adjacent-period hedonic calculation).  Although both measures are calculated using a hedonic regression methodology, there are fundamental differences in how the results are calculated.
  • The hedonic method allows us to understand the value associated with each attribute of a property (ie, land area, living area, bedrooms, bathrooms, location, view, car parks, etc.) that we observe selling. We can then ‘impute’ the value of dwellings with a given set of known characteristics, but no recent sale price, by observing the characteristics and sale prices of other dwellings that have recently transacted. It then follows that changes in the market value of all residential property comprising an index can be accurately tracked through time.
  • The most important difference is that the new methodology is calculated across the entire housing market, not just those properties that transacted over the reporting period.  Previously we would bundle all the transaction data collected over a month and calculate the index using only those homes that had transacted in the period, applying the hedonic method to separate price changes associated with differences in the composition of the properties trading month to month from changes in the underlying residential property market value. The new method, by contrast, ‘imputes’ the value of all properties using all known data up to the date of calculation.
  • The annual volatility of the new imputation-based index is about the same as that of the previous adjacent-period measure.  Thanks to the daily frequency, for the first time ever we can now see intra-month movements in dwelling values.
  • The ‘new’ index imputes the value of every Australian home each day, taking into consideration every data point we knew about the housing market at the time of calculation.  Factors such as lot size, the number of bedrooms and bathrooms, car spaces and whether the home has a swimming pool or a view are some of the hedonic attributes factored into the analysis.  Based on a flow of around 1,400 new transactions received each day, as well as a constant flow of new attribute data, our most accurate view of the imputed value of the property market is updated daily.
  • Unlike the previous methodology, the new imputation methodology does not get revised.  Because the index is calculated using every data point available on the day, each daily index value incorporates the full set of attribute and transaction data available.  Time variables are applied so that less recent sale information is benchmarked up to today using our knowledge of the most recent sales. While the most recent sales have the greatest influence on the change in the index, all of the information received on a given day (including sales we have just learned about which may have occurred a few weeks ago) is important in calculating the daily value.  An important requirement of the design of the index, and of its subsequent audit, was that it be an accurate reflection of current market conditions.
  • The daily index being published is not seasonally adjusted.  Given that the index has been designed to be tradeable (more on this below), the absolute change in dwelling values is the preferred measure. We are currently reviewing the seasonal adjustment method used. A seasonally adjusted series will be available to our subscribers, and we will publish a seasonally adjusted analytical series within our monthly summary in the coming months once the review is complete.
  • The index is designed for high frequency publication to facilitate trading liquidity.  The Australian Securities Exchange (ASX) is a partner in this project and their intention is to build tradeable products from this Index.  Note that these products are not yet available and will be subject to regulatory approval.  The daily updates are available at both the ASX web site and at www.rpdata.com.
  • The daily index can only be produced because of RP Data’s enormous investment in collecting and aggregating data, together with a significant investment in processing infrastructure.  RP Data spends more than $10 million each year on data collection.  Producing the indices each day takes approximately eight hours of overnight processing on very fast servers.  Imputing a value for every Australian property and calculating the change in those values overnight simply wasn’t possible several years ago.
  • Unlike any other commercially available house price indices, our Index has been independently audited by both Alex Frino and KPMG.  Additionally, the technical papers can be viewed here.
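To make the imputation approach described above concrete, here is a minimal sketch in Python. All figures, attributes and coefficients are invented for illustration; this is a toy linear hedonic model, not RP Data's or Rismark's actual specification:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative housing stock: columns are [intercept, bedrooms, land (100s of sqm)].
n_stock = 10_000
stock = np.column_stack([
    np.ones(n_stock),
    rng.integers(1, 6, n_stock).astype(float),   # bedrooms
    rng.uniform(2.0, 10.0, n_stock),             # land area, hundreds of sqm
])

# "True" attribute prices, used only to simulate observed sales.
true_beta = np.array([150_000.0, 60_000.0, 20_000.0])

# Only a small fraction of the stock transacts; we observe noisy prices for those.
sold = rng.choice(n_stock, size=500, replace=False)
sale_prices = stock[sold] @ true_beta + rng.normal(0.0, 25_000.0, 500)

# Hedonic regression: estimate the implicit price of each attribute from sales.
beta_hat, *_ = np.linalg.lstsq(stock[sold], sale_prices, rcond=None)

# Imputation: value every dwelling in the stock, sold or not, from its attributes.
imputed_values = stock @ beta_hat

# The index level tracks the average imputed value of the entire portfolio.
index_level = imputed_values.mean()
print(f"portfolio value per dwelling: ~${index_level:,.0f}")
```

The key point is the last step: the index follows the average imputed value of the whole stock, so dwellings that did not transact still contribute to the index level.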

Please post any further questions you have and we will endeavour to answer them.

About Cameron Kusher

Cameron Kusher is Head of Research at CoreLogic, specialising in primary and secondary data analysis, property market commentary and consultancy. Cameron has a thorough understanding of the fundamentals such as demographics, trends, economics and spatial analysis and is a regular keynote speaker for property-related groups, regulated industry bodies, corporations and the government sectors. Follow Cameron on Twitter @cmkusher

26 Responses to Dispelling the myths about our new Index methodology

  1. wildebeest March 3, 2012 at 7:21 am #

    What are the magnitude of the statistical errors associated with the published numbers?

    • Tim Lawless March 5, 2012 at 1:04 pm #

      Hi Wildebeest, Each city index tracks the value of a synthetic market portfolio of hundreds of thousands of properties. Because of the very large number of properties in each index, the estimation errors mostly cancel when averaged to get the index value. Based on observable tests and statistical theory, the standard error in the index value is a few basis points for the larger cities of Sydney and Melbourne and around 10 basis points for Perth, which is the most difficult city to estimate. The error in the Aus composite is about half the error in the Sydney and Melbourne indices.
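Tim's point that per-property errors largely cancel in a large portfolio is standard statistics: the standard error of an average of n independent errors shrinks like 1/sqrt(n). A quick illustrative sketch, assuming a 5% per-property error (in practice imputation errors are partly correlated, so real standard errors will be larger than this idealised model suggests):

```python
import numpy as np

rng = np.random.default_rng(1)

# Assumed (illustrative) per-property imputation error: 5% standard deviation,
# independent across properties.
per_property_sd = 0.05

# Analytic: the standard error of the portfolio mean falls as 1/sqrt(n).
for n in (1_000, 100_000, 400_000):
    se_bp = per_property_sd / np.sqrt(n) * 10_000   # in basis points
    print(f"{n:>7,} properties -> ~{se_bp:4.1f} bp standard error")

# Empirical sanity check at n = 10,000: simulate 500 portfolio averages.
sims = rng.normal(0.0, per_property_sd, size=(500, 10_000)).mean(axis=1)
print(f"simulated SE at n=10,000: {sims.std() * 10_000:.1f} bp (theory: 5.0 bp)")
```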

      • wildebeest March 5, 2012 at 1:20 pm #

        Thanks for your reply Tim.

        I confess to being confused about your reference to a large number of properties. We know what typical daily volumes of house sales are in each capital city and regional market. I’m surprised that these volumes would be considered large. Would you agree that if this index is to be tradeable your estimate of these errors — and the methodology of the error estimate — needs to be quantified and explained and published?

      • Tim Lawless March 5, 2012 at 1:32 pm #

        The Index is calculated across all properties in the marketplace, not just those that have transacted – so I think the term ‘large number of properties’ is very much relevant. Look at it this way: every day when the index is calculated it is imputing the value of every residential property based on every data element we know of at that point in time. That includes both recent and historical sales. Recent sales have a higher level of importance, but historical sales are also used in conjunction with time variables (just like a valuer doesn’t rely on only those property sales that settled over the past couple of weeks to determine the market value of a home).

        With regards to the second part of your question, any tradable products would be the domain of the ASX. Clearly these instruments would be subject to the same rules of disclosure and diligence as any other tradable index.

        Thanks for your comments, Tim

      • wildebeest March 6, 2012 at 7:35 am #

        Tim could I get some further clarification please. You say all properties in the market place. The index uses transacted properties to update the entire housing stock in Australia (unless I’m mistaken). Is that what you mean? The errors introduced in the calculations arise from the transacted properties since this is the data that changes from day to day. This is a small number. Using some sort of time weighting increases the number but it remains small.

      • Tim Lawless March 6, 2012 at 9:47 am #

        Wildebeest, yes, you are correct on the first point: the index uses transacted properties (as well as attribute information) to update the value of all housing stock across Australia. The errors aren’t introduced from the transacted properties, rather the errors arise from imputing the value of each property based on all the data points we are aware of. There will be a level of error associated with each imputed value that largely cancels out across the portfolio.

      • wildebeest March 7, 2012 at 7:55 am #

        Tim, the changes to the imputed value are due to daily sales (small volumes), plus, from what you are saying, a time weighting of recent sales (also small volumes). It is this small sample of data that is affecting the imputed values of your large sample.

  2. Gavin R. Putland March 3, 2012 at 5:22 pm #

    Christopher Joye states: “RP Data has very detailed information on 99 per cent of all Australian homes, including the exact address, … property type, historical sales prices paid for the property…” (“A new way to bet the house”, Business Spectator, March 1.)

    I have quoted only a selection of the “information”. Is that selection available to economic researchers for the purpose of calculating aggregates? (For that purpose, one would not need to know the exact address of each property; the local government area would suffice.)

    It would be even more useful if each sale record were associated with the site value (land value or unimproved value) at the time of sale. There is undoubtedly enough information in the database to value any particular site at any particular time, and this is surely done (at least implicitly) as part of the portfolio valuation. Are these site values available?

    • Tim Lawless March 5, 2012 at 11:10 am #

      Hi Gavin, We spend a lot of money each year on acquiring, cleaning and improving our data holdings. As a data company, our data is our primary asset and we are very protective of distributing bulk releases outside of our secure operating environment. RP Data subscribers have on-line access to our data holdings and we also provide custom data extracts on a fee basis. We have provided data to universities for academic purposes in the past.

      • wildebeest March 5, 2012 at 1:22 pm #

        Just following on from Gavin’s comment: given the breadth of your data it would seem you would be perfectly placed to develop a Case-Shiller type housing index based on sales of the same properties. Would you consider that?

  3. Gavin R. Putland March 3, 2012 at 5:26 pm #

    P.S.: To what extent does RP Data collect similar information on rural/commercial/industrial property, and to what extent is that information available?

    • Tim Lawless March 5, 2012 at 11:11 am #

      Hi Gavin, we collect data for all property types, both residential and non-residential. That data is available to the same extent as all our other data (as explained in my previous response).

      • Cameron Kusher March 5, 2012 at 1:34 pm #

        Hi wildebeest
        The Case-Shiller Index is a repeat sales type index and we already produce such an index; however, it is actually not a very good measure of the true performance of the housing market for a few main reasons:
        1. It only looks at those properties which are transacting (which is around 5% to 7% of the total housing market in any given year).
        2. More importantly, it only looks at those properties which have previously sold as it needs to match the current sale to a previous sale. Given this it excludes all brand new homes.
        3. Finally, it does not take into consideration material changes to a home: although it compares the previous sale of a house to the current sale, it does not consider whether the home has been renovated or extended in between.

        Although repeat sales indices are widely used in the US we believe that they are certainly not the best measure of housing market performance. You’ll note that hedonic indices such as those produced by RP Data and Rismark are widely used in Europe.

        As mentioned, we do calculate a repeat sales index however, it is not our flagship nor is it the best measure of market performance in our opinion so we tend not to focus on the results of that index.
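The exclusions Cameron describes can be seen in a tiny repeat-sales sketch: only properties with at least two recorded sales ever enter the pairing step, so new homes and single-sale properties are dropped. The records below are invented for illustration:

```python
from collections import defaultdict

# Invented sales records: (property_id, period, price). Only properties selling
# at least twice can enter a repeat-sales index; new homes and single-sale
# properties never appear, which is the exclusion described above.
sales = [
    ("A", 0, 400_000), ("A", 4, 480_000),   # repeat sale: usable pair
    ("B", 1, 350_000), ("B", 5, 385_000),   # repeat sale: usable pair
    ("C", 2, 500_000),                      # single sale: dropped
    ("D", 3, 600_000),                      # brand-new home, one sale: dropped
]

by_property = defaultdict(list)
for pid, period, price in sales:
    by_property[pid].append((period, price))

# Pair consecutive sales of the same property: (id, holding period, growth).
pairs = []
for pid, history in sorted(by_property.items()):
    history.sort()
    for (t0, p0), (t1, p1) in zip(history, history[1:]):
        pairs.append((pid, t1 - t0, p1 / p0 - 1))

print(pairs)  # only A and B contribute; C and D never enter the index
```

Note also that the growth figure for each pair attributes the entire price change to the market, which is why unobserved renovations (point 3) bias this type of index.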

  4. Mav March 4, 2012 at 7:33 am #

    Given that we know what the price of a median house in Sydney is, I am curious to know what are the attributes of a median house in Sydney – i.e. land area, living area, bedrooms, bathroom, location, view, car parks, etc?

    I would be surprised if the median house had more than 2 bedrooms. And I predict that over time, the attributes of the median house have deteriorated while its price climbed exponentially.

    • wildebeest March 4, 2012 at 7:37 am #

      I asked what the statistical errors were but the question was either deleted or removed by the moderator. Interesting. Let’s see if this survives.

      • Mav March 4, 2012 at 9:15 am #

        Hugo, Is that you? 🙂

        I doubt RP Data has the answers for that query. AFAIK, I think RP Data does the data collection and Rismark does the super-slow 8 hours of “calculations” on their “super fast” servers.

        So maybe you are barking up the wrong tree. But I sympathise with you; there is no way of putting these questions to Chris Joye of Rismark, is there?

        For the record, none of my comments so far have been censored by RP Data and I have received prompt responses to all my queries.

      • Tim Lawless March 5, 2012 at 8:59 am #

        Thanks for your endorsement Mav, and yes we are following this up with Rismark. The computing time of 8 hours is pretty good in my view (and all the computation is done on secure servers at RP Data using the methodology designed by Rismark). For each of the 8 million plus homes included in the index calculation, for each one of those properties we need to undertake a spatial search to identify the comparable sales, apply proximity weightings and time variables to all the relevant data and calculate the value differential – that’s a very CPU intensive task.
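A heavily simplified sketch of the per-property step Tim describes, with invented locations, radius, kernel and half-life (the real methodology is Rismark's and far more involved): a spatial search for comparable sales, followed by proximity and recency weighting:

```python
import numpy as np

rng = np.random.default_rng(2)

# Invented comparable-sales data: locations on a 50 km x 50 km grid, sale ages
# in days, and sale prices.
n_sales = 5_000
sale_xy = rng.uniform(0.0, 50.0, size=(n_sales, 2))   # sale locations (km)
sale_age = rng.uniform(0.0, 365.0, n_sales)           # days since each sale
sale_price = rng.normal(500_000.0, 80_000.0, n_sales)

def impute(target_xy, radius_km=5.0, halflife_days=90.0):
    """Proximity- and recency-weighted average of comparable sale prices."""
    dist = np.linalg.norm(sale_xy - target_xy, axis=1)
    nearby = dist < radius_km                         # the spatial search
    w = np.exp(-dist[nearby] / radius_km)             # proximity weighting
    w *= 0.5 ** (sale_age[nearby] / halflife_days)    # time (recency) weighting
    return float(np.average(sale_price[nearby], weights=w))

value = impute(np.array([25.0, 25.0]))
print(f"imputed value near the centre of the grid: ~${value:,.0f}")
```

Repeating a search like this for 8 million-plus properties each night makes it easy to see why the computation is CPU intensive.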

      • Mike Salway March 5, 2012 at 6:53 am #

        Wildebeest, I can see your comment fine. Perhaps it just wasn’t approved at the time you looked again. All new commenters’ comments are held in moderation until approved for the first time.
        After that, new comments go straight through.

    • Tim Lawless March 5, 2012 at 10:59 am #

      Mav, keep in mind we provide the median price only for the sake of relativity in pricing. The median price has no bearing at all on the index values or capital growth/decline figures we are reporting in our Home Value Index.

      Out of interest… the median house price in Sydney was $516,000 in February and there were three houses that sold at exactly that price. The first was a 3bed/1bath house at Merrylands in Holroyd on 556sqm of land. The second was a 3bed/1bath house at Guildford in Parramatta on 929sqm of land. The third was a 3bed/2bath house with a granny flat at Forresters Beach in Gosford on 1,410sqm of land.

      • Mav March 6, 2012 at 11:19 am #

        Thanks for the info. The samples across the median are much better than I had predicted.

        Personally, I doubt I would go for a median priced house in Merrylands and have my garage door shot up in an accidental drive-by 🙂

  5. wildebeest March 5, 2012 at 7:12 am #

    @Mike Salway

    When I checked back after I first commented I noticed other comments that had been posted up to a day after mine were there but not mine. So the moderator had obviously been moderating and approving during that period.

    Now that the comment is up I look forward to some information on statistical errors.

    If you are going to use this thing as a trading instrument the errors need to be known and disclosed. For example, if I short housing with a strike of -1.0% and the published data is -0.9%, I lose. But what if it is -0.9 ± 0.1%? It seems to me that a bet needs to fall outside the statistical error to be considered a losing bet.
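Wildebeest's settlement example can be made quantitative under a simple, assumed normal error model: with a print of -0.9% and a standard error of 0.1 percentage points, there is a meaningful probability that the true move was in fact below the -1.0% strike:

```python
import math

# Wildebeest's example: a short at a -1.0% strike settles against a published
# index move of -0.9% whose (assumed) standard error is 0.1 percentage points.
published_move = -0.9   # % change printed by the index
standard_error = 0.1    # % (one standard deviation, assumed normal)
strike = -1.0           # % strike of the hypothetical short position

# Probability that the *true* move was below the strike, i.e. that the short
# "really" won despite the published print.
z = (strike - published_move) / standard_error
p_true_below_strike = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
print(f"P(true move < strike) = {p_true_below_strike:.3f}")   # about 0.159
```

This illustrates the statistical point only; how any actual ASX product would settle is a contract design question.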

    • Tim Lawless March 5, 2012 at 8:50 am #

      Hi Wildebeest, we will come back with a response on your questions with regards to the level of statistical error in the daily index. This is a technical question I have referred to Rismark and expect a response today.

    • Mike Salway March 5, 2012 at 9:20 am #

      @Wildebeest, the other comments were probably made by people who had previously made comments on our blog before – which is why their comments go straight through.

      • wildebeest March 5, 2012 at 1:23 pm #

        thanks for the explanation

  6. wildebeest March 6, 2012 at 7:39 am #

    @Cameron Kusher

    there was no reply button under your comment.

    I was suggesting a Case-Shiller type index as a complement to existing information, not as a replacement. Regarding point 3: given the data you claim to have, could you not correct for material changes? Do you do that correction in the repeat sales index that you said you calculate? Is that a public index?


    • Cameron Kusher March 6, 2012 at 8:20 am #

      Hi wildebeest

      As mentioned, RP Data and Rismark already produce a Case-Shiller style repeat sales index, which is available via subscription. As I have pointed out, there are a number of weaknesses with this type of index, which is why we prefer the much more accurate hedonic methodology.

      We could correct for material changes to homes in a repeat sales index; however, it would still be nowhere near as accurate as a hedonic index. Given this, it is our preference to invest time, effort and capital in the much more accurate hedonic index rather than in an inferior repeat sales index. The repeat sales index we currently produce does not correct for material changes to homes; it is a replication of the Case-Shiller model and is available by subscription.

