Every local authority housing officer needs
to be able to target the worst areas. Robert Flynn explains
how techniques developed by the Building Research Establishment
can assist local authorities undertaking local house condition
surveys and add value to existing data.
Part of the process of designing house condition survey contracts
is explaining to councillors and senior officers that it is impossible
to provide the ward data they covet with any accuracy unless
huge and economically unjustifiable samples are drawn. It is
more typical for authorities to use smaller, less expensive samples
and then divide the area into three or four groups of wards for
reporting purposes. While this might be sounder statistically, it
is, and remains, of little value for targeting purposes. Occasionally,
areas where problems are suspected can be surveyed using larger
samples, but this is rarely a complete solution. Because surveys
are so expensive, they cover several purposes, and a sample that
is suitable for one purpose, eg unfitness, may be unsuitable for
another, such as fuel poverty.
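To see why ward estimates need such large samples, consider the usual confidence interval for a proportion. The sketch below is illustrative only. It assumes simple random sampling within a ward (real survey designs usually have design effects that widen the interval), ignores the finite population correction and uses a hypothetical ward unfitness rate of 10 per cent. The function name `ci_half_width` is hypothetical.

```python
import math


def ci_half_width(p: float, n: int, z: float = 1.96) -> float:
    """Approximate 95% confidence half-width for a proportion p estimated
    from a simple random sample of n dwellings (normal approximation,
    no design effect, no finite population correction)."""
    return z * math.sqrt(p * (1 - p) / n)


# Hypothetical ward in which 10 per cent of dwellings are unfit.
p = 0.10
for n in (25, 100, 400, 1600):
    print(f"sample of {n:4d} dwellings: {p:.0%} +/- {ci_half_width(p, n):.1%}")
```

Under these assumptions a sample of 100 dwellings gives roughly 10 per cent plus or minus 6 percentage points. Because the margin shrinks only with the square root of the sample size, halving it needs four times the sample and quartering it sixteen times. Repeated for every ward in a borough, the cost quickly becomes unjustifiable.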
Recognising these problems, Chris Jarvis at the Greater London
Authority, who advises London boroughs on local stock condition
data gathering, and Simon Nicol, director of the Building Research
Establishment's Housing Centre, had considered using English House
Condition Survey (EHCS) data, which the BRE analyses for the Office
of the Deputy Prime Minister, to produce a mathematical model to
predict local housing conditions. In 2002, the BRE won a grant
from the Foundation for the Built Environment (FBE) to fund the
development of a local housing stock model.
Eighteen months on, the BRE has designed a series of models which
combine EHCS 2001 data with other data sources of national coverage
(such as the census) to produce detailed maps of key indicators,
notably non decent homes. The BRE's initial development partner,
the London Borough of Richmond, provided invaluable feedback on the
ward data supplied by the preliminary models.
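Figure 1 (map): non decent homes, London Borough of Richmond
Figure 2 (map): unfitness, London Borough of Redbridge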
This paved the way for the next stage in model development, which
produced maps showing typical model outputs for a local authority
area. Figures 1 and 2 show the percentage of non decent homes in the
London Borough of Richmond and of unfit homes in the London Borough
of Redbridge respectively, by census output area and statistical
ward. The box below describes census output areas and statistical
wards.
CENSUS GEOGRAPHY
Census output areas: these are now the smallest geographical
unit the census reports on. They were designed to have similar
population sizes, to be as socially homogeneous as possible
(based on household tenure and dwelling type) and to avoid
urban/rural mixes and irregular shapes; they also tend to be
bounded by obvious features such as major roads. The specified
minimum size was 40 households, to protect the confidentiality
of data, while the recommended size was above 125 households.
Statistical wards: census output areas in England and Wales
have been designed to fit into 2003 statistical wards, which
reflect administrative boundaries promulgated (laid down in
statute) by 31 December 2002. Most 2001 census outputs, however,
use Census Area Statistics (CAS) wards: these are based on 2003
statistical wards, with particularly small wards merged to protect
data confidentiality.
So far, the BRE has produced models for:
dwellings which fail the decent homes standard;
dwellings which fail the decent homes standard due to unfitness,
inadequate thermal comfort, disrepair and non-modern facilities
and services;
non decent homes occupied by a vulnerable household;
households in fuel poverty; and
dwellings with a SAP (Standard Assessment Procedure) energy rating below 30.
Each model produces predictions of the percentages for each variable
at the level of the local authority, the statistical ward and the
census output area. The BRE was able to do this by using a method
called CHAID (Chi-squared Automatic Interaction Detector). As the
BRE used it, the method can be broken down into six stages:
Select the dependent variable, ie the EHCS statistic that is to
be modelled down to local level, eg non decent homes.
Prepare national small area databases, eg the census, and establish
links between them and the EHCS data.
Select the independent variables: for non decent homes, any that
may be related to the level of non decent homes, such as dwelling
type from the census. One of the advantages of CHAID is that it
can accept large numbers of independent variables.
Carry out the CHAID analysis. This is done using a computer
program called AnswerTree, which repeatedly splits the database
into groups, at each step choosing the independent variable whose
categories differ most significantly (on a chi-squared test) in
the dependent variable. This results in a diagram in the shape of
an inverted tree, with splits occurring at nodes. The independent
variables that have the greatest influence on the dependent
variable are found at the top of the diagram. Once the differences
become too small to be useful, the program stops splitting the
database at what is known as a terminal node.
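Each terminal node gives a predicted percentage for the dependent
variable, eg non decent homes. Because the independent variables
are available for every census output area in England, each output
area can be assigned to a terminal node and given its predicted
percentage.
The output area percentages, weighted by the number of dwellings
in each area, can then be combined to provide results at statistical
ward and local authority level.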
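The six stages above can be sketched in a few dozen lines. This is a much-simplified, CHAID-style illustration, not AnswerTree's algorithm and not the BRE's model. It picks each split by the smallest chi-squared p-value and splits multiway on the chosen variable's categories, but it omits CHAID's merging of similar categories and its Bonferroni adjustment, and it never reuses a variable once split on. The training records, the predictors `tenure` and `dwelling_type`, the output areas and all numbers are invented, and the final step assumes that results are aggregated as a dwelling-weighted average of output area percentages.

```python
import math
from collections import Counter, defaultdict


def chi2_sf(x, df):
    """Upper-tail probability of a chi-squared distribution (integer df)."""
    if x <= 0:
        return 1.0
    h = x / 2.0
    if df % 2 == 0:  # closed form for even df
        term = total = 1.0
        for i in range(1, df // 2):
            term *= h / i
            total += term
        return math.exp(-h) * total
    total = math.erfc(math.sqrt(h))  # odd df: erfc term plus a finite series
    for i in range(1, (df - 1) // 2 + 1):
        total += math.exp(-h) * h ** (i - 0.5) / math.gamma(i + 0.5)
    return total


def chi2_test(records, predictor, target):
    """Pearson chi-squared test of a categorical predictor against a 0/1
    target. Returns (p_value, categories)."""
    counts = defaultdict(Counter)  # category -> counts of target values
    for r in records:
        counts[r[predictor]][r[target]] += 1
    cats = sorted(counts)
    totals = Counter(r[target] for r in records)
    if len(cats) < 2 or len(totals) < 2:
        return 1.0, cats  # nothing to test
    n, stat = len(records), 0.0
    for c in cats:
        row = sum(counts[c].values())
        for y in (0, 1):
            expected = row * totals[y] / n
            stat += (counts[c][y] - expected) ** 2 / expected
    return chi2_sf(stat, len(cats) - 1), cats


def grow(records, predictors, target, alpha=0.05, min_size=20):
    """Split on the predictor with the smallest p-value; stop when no split
    is significant at alpha or the node is too small (terminal node)."""
    node = {"rate": sum(r[target] for r in records) / len(records)}
    if len(records) < min_size or not predictors:
        return node
    (p, cats), best = min(((chi2_test(records, v, target), v) for v in predictors),
                          key=lambda t: t[0][0])
    if p > alpha:
        return node
    rest = [v for v in predictors if v != best]
    node["split"] = best
    node["children"] = {c: grow([r for r in records if r[best] == c],
                                rest, target, alpha, min_size) for c in cats}
    return node


def predict(node, profile):
    """Follow an output area's characteristics down to a terminal node."""
    while "split" in node and profile[node["split"]] in node["children"]:
        node = node["children"][profile[node["split"]]]
    return node["rate"]


def ward_rates(tree, output_areas):
    """Dwelling-weighted average of output area predictions per ward."""
    num, den = defaultdict(float), defaultdict(float)
    for oa in output_areas:
        num[oa["ward"]] += predict(tree, oa) * oa["dwellings"]
        den[oa["ward"]] += oa["dwellings"]
    return {w: num[w] / den[w] for w in num}


# Invented records standing in for EHCS cases linked to census variables.
BASE_FAILS = {"terraced": 20, "flat": 10, "detached_semi": 4}  # per 40 cases
training = []
for tenure in ("owner", "private_rent", "social"):
    for dtype, base in BASE_FAILS.items():
        n_fail = base + (8 if tenure == "private_rent" else 0)
        training += [{"tenure": tenure, "dwelling_type": dtype,
                      "non_decent": int(i < n_fail)} for i in range(40)]

tree = grow(training, ["tenure", "dwelling_type"], "non_decent")

output_areas = [  # hypothetical output areas, one characteristic set each
    {"ward": "Ward A", "tenure": "private_rent", "dwelling_type": "terraced", "dwellings": 150},
    {"ward": "Ward A", "tenure": "owner", "dwelling_type": "detached_semi", "dwellings": 130},
    {"ward": "Ward B", "tenure": "social", "dwelling_type": "flat", "dwellings": 125},
]
print(ward_rates(tree, output_areas))
```

With these invented data the root split is on dwelling type, and only the detached/semi-detached branch is split again, by tenure. That mirrors the point that the most influential variables sit at the top of the tree, while weaker ones appear, if at all, further down.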
These six stages describe the process for the non decent homes
model but each of the models was developed in a similar way. While
this may sound reasonably straightforward, the process of preparing
the data was a long and tortuous one of matching datasets that were
difficult to reconcile. Until the 2001 census data and other related
datasets arrived in August 2003, the BRE found it difficult to assemble
the data in a usable format, which made it hard to establish whether
the technique was yielding results of any real value. Now that the
data has been assembled and the models produce results, how
reliable are they?
Before answering this question, it ought to be said that CHAID
is considered to be a stage towards developing models rather than
a tool that produces a fully developed model. However, the results
appear to be sufficiently encouraging to start making use of them.
The BRE has now supplied the models to two London authorities,
Richmond and Redbridge, and their initial reactions have been very
positive. Nevertheless, positive reactions do not prove that a model
works. The big problem has been, and remains, how the models can be
proved when there is so little evidence to test them against. And
what should be regarded as proof? To verify the models, the BRE
has followed four approaches:
Comparison with private sector house condition surveys. The results
have been encouraging with survey data showing good agreement with
the unfitness and disrepair models. The comparisons have, however,
been limited to highly targeted surveys that yielded data for a
few wards.
Comparison with self-completion energy questionnaires. The problem
of small samples is bound to dog any attempt at verification, so
the large numbers of responses achieved by self-completion energy
questionnaires offered a means of verifying the SAP and fuel poverty
models. However, in the one case where the BRE attempted this, it
found that problems of data quality and bias precluded their use at
ward level or below.
Comparison with data supplied by an energy company. The BRE used
this information to develop a surrogate fuel poverty indicator,
which had a correlation coefficient of 0.4 with the BRE's fuel poverty
model. Given the complex mix of social and physical data that make
up the measure of fuel poverty, the BRE felt that this was a good
result at the level of the census output area. Certainly the fuel
poverty model would have been of assistance in targeting the fuel
poor in this local authority.
Comparison with data supplied by specific local authority partners.
The BRE was fortunate to be provided with good quality information
by both local authority partners. As the data supplied by the London
Borough of Redbridge provides the most recent comparison, it is
important to consider this case in some detail.
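Before turning to the Redbridge case, it may help to show what a correlation coefficient such as the 0.4 in the energy company comparison measures. The article does not say which coefficient was used; the sketch below assumes Pearson's r computed across census output areas, and the figures in it are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical output areas: modelled fuel poverty (%) and a surrogate
# indicator derived from energy company data (arbitrary units).
modelled = [12.0, 18.5, 9.0, 22.0, 15.5, 11.0]
surrogate = [0.8, 1.1, 0.9, 1.6, 0.7, 1.0]
r = pearson_r(modelled, surrogate)
print(f"r = {r:.2f}, r squared = {r * r:.2f}")
```

A coefficient of 0.4 corresponds to an r squared of 0.16. Only about 16 per cent of the variation in one measure is linearly associated with the other, so the figure needs the context given above (the complex mix of data behind fuel poverty) to read as a good result.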
The main statistic the London Borough of Redbridge provided was
unfitness. Officers from the borough carried out high-quality fieldwork
for a 1996 survey, which was analysed by a team from the then London
Research Centre. The wards were put into three groups for reporting
purposes. The modelled data and the 1996 survey data are included
in the table (see page 23). There are four important points that
should be made at this stage:
there was a 3 per cent decline in unfitness nationally between
1996 and 2001. If a similar decline took place in the London Borough
of Redbridge, then the modelled data is very close to the survey
data;
modelling tends to smooth the data and therefore produces narrower
ranges, so rank order matters more than the actual values predicted
when making comparisons;
the grouping imposed by the 1996 survey may mask real differences
between wards within the groups; and
the model includes all tenures, whereas the 1996 survey excluded
the social rented sector.
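The first two points above interact. A decline that is the same everywhere, whether read as 3 percentage points off every ward or as a 3 per cent proportional fall, moves the survey levels without changing which wards come out worst, so rank comparisons are unaffected by it. A minimal sketch with hypothetical ward figures (not the Redbridge data); the function name `worst_first_ranks` is hypothetical.

```python
def worst_first_ranks(values):
    """Rank positions (1 = worst, ie highest value) for each item."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    ranks = [0] * len(values)
    for position, i in enumerate(order, start=1):
        ranks[i] = position
    return ranks


# Hypothetical 1996 ward unfitness percentages.
survey_1996 = [14.0, 9.5, 11.2, 6.8]
minus_points = [v - 3.0 for v in survey_1996]  # 3 percentage points lower
minus_share = [v * 0.97 for v in survey_1996]  # 3 per cent lower

print(worst_first_ranks(survey_1996))
print(worst_first_ranks(minus_points))
print(worst_first_ranks(minus_share))
```

All three calls give the same ranking. The levels, by contrast, do move, which is why the size of the national decline matters when the modelled percentages are compared directly with the 1996 figures.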
There are two very obvious areas of agreement between the model
and the survey:
two out of the three worst wards from the 1996 survey were predicted
to be the worst and second worst wards by the model; and
four out of the six wards in the second worst group of wards
from the 1996 data were placed by the model in the next six places.
This indicates fairly good agreement between the model and the
survey data at ward level, although differences do exist.
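The comparison above matches the survey's groups of wards against blocks of the model's ranking: the survey's worst group of three against the model's top three places, and its next group of six against places four to nine. A minimal sketch of that count, with hypothetical ward codes and ranks chosen only to illustrate the arithmetic (they are not the Redbridge results); `band_agreement` is a hypothetical name.

```python
def band_agreement(survey_groups, model_rank):
    """For each survey group (worst group first), count how many of its
    wards the model places in the matching block of rank positions."""
    results = []
    start = 1
    for group in survey_groups:
        end = start + len(group) - 1
        hits = sum(1 for ward in group if start <= model_rank[ward] <= end)
        results.append((hits, len(group)))
        start = end + 1
    return results


# Hypothetical survey groups of ward codes, worst group first.
survey_groups = [
    ["W01", "W02", "W03"],
    ["W04", "W05", "W06", "W07", "W08", "W09"],
]
# Hypothetical model ranks (1 = worst) for the same wards.
model_rank = {"W01": 1, "W02": 2, "W03": 7, "W04": 3, "W05": 4,
              "W06": 5, "W07": 6, "W08": 11, "W09": 12}
print(band_agreement(survey_groups, model_rank))
```

Because the survey reports only group membership, a count of this kind is about as far as a ward-level comparison can go; real differences between wards within each group stay hidden.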
The only other comparison that could be made with the survey
data was between substantial disrepair in the 1996 survey and the
disrepair component of decent homes. While there are differences
between these two measures, they are sufficiently similar that
good agreement on rank order would be expected. In fact, the
disrepair model performed slightly better than the unfitness model,
and there was close agreement with the survey data.
So far, users of the models have been less interested in reports
of the BRE's attempts at verification than in their own gut reactions
to the data provided for them. The maps in particular provide an
easily assimilated impression of the predicted conditions, which
prompts a rapid and usually positive response.
As the end of the FBE-funded project nears, the BRE is considering
how the knowledge gained might best be taken forward. The options
include:
seeking funding to develop the CHAID analysis into more formalised
models using techniques such as logistic regression;
seeking partners to provide good quality data to help verify
the models;
providing a service to authorities using the existing
models; and
developing models for other EHCS variables such as the housing
health and safety rating.
The techniques described are primarily intended for use before
a local survey has been carried out. This supports ODPM guidance (see Reference),
which promotes the ideal of using all available data sources to
understand the local housing stock before commissioning a survey.
Most of the BRE's efforts have therefore concentrated on developing
a tool that will be of use before commissioning a survey. However,
the question that is inevitably asked is whether the models can be
used instead of a survey. That depends on the purpose of the survey.
If all that is required is a strategic overview of conditions in
an authority to inform a housing strategy, then the modelled data will
probably suffice in the short term. The results could then be used to stratify
future survey samples to target areas of interest in a much more
focused way, possibly in the form of a rolling programme. Authority-wide
results could still be achieved by including small samples in the
areas of lesser interest.
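Stratifying on the modelled results in this way means sampling areas of interest more heavily than the rest, so an authority-wide figure has to weight each stratum back to its share of the stock. A minimal sketch of the standard stratified estimator, assuming simple random sampling within each stratum and using invented figures; `stratified_proportion` is a hypothetical name.

```python
def stratified_proportion(strata):
    """Authority-wide proportion from a stratified sample.

    Each stratum is (dwellings_in_stratum, dwellings_sampled, failures_found).
    Each sampled dwelling carries a design weight of N_h / n_h.
    """
    total = sum(N for N, _, _ in strata)
    return sum(N / n * f for N, n, f in strata) / total


# Invented example: a small area of interest sampled heavily, the rest lightly.
strata = [
    (8_000, 600, 120),   # areas the model flags as worst: 20% fail in sample
    (42_000, 200, 10),   # areas of lesser interest: 5% fail in sample
]
weighted = stratified_proportion(strata)
unweighted = sum(f for _, _, f in strata) / sum(n for _, n, _ in strata)
print(f"weighted {weighted:.1%}, unweighted {unweighted:.1%}")
```

With these figures the weighted estimate is 7.4 per cent, while simply pooling the sample would give about 16 per cent, because the heavily sampled worst areas would be over-represented.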
This is not to say that the outputs of the models should be accepted
uncritically; but then, neither should local data sources or the
received wisdom on local conditions. The models simply add to a
cocktail of information that needs to be considered in developing
a strategy.
There is one other area that may prove to have as much potential
as those already mentioned. Many good local surveys are undertaken
that are prevented from reporting small area statistics only by their
small sample sizes. Where the authority has confidence in its data,
there is no reason why it cannot be used in place of the EHCS data
to model down to census output area level. This is more time-consuming
than using the national model, as the CHAID analysis has to be repeated,
but if the local data quality is good it should provide a more
accurate picture of local conditions. This means that these techniques
are of potential interest not only to authorities about to embark
on surveys but also to those looking to gain added value from their
existing data.
Reference
Collecting, managing and using housing stock information: a
good practice guide, ODPM 2000.
The work reported in this article was supported by the Foundation
for the Built Environment. The author also gratefully acknowledges
the part played by officers at the London Borough of Richmond and
the London Borough of Redbridge in the development of the work.
The author would also like to thank Kevin White and Alan O'Dell at the BRE
for their advice and assistance.
Robert Flynn is a principal consultant in the Housing Centre
at BRE. He can be contacted at flynnr@bre.co.uk