We are in a data-rich world; businesses in all sectors are embracing the opportunities to uncover previously hidden insights by analysing their data in new ways, made possible through advances in computing power.
It is the blending of domain knowledge – be it engineering, scientific or sociological – with data science that is now enabling companies to develop data-led asset health strategies. These new approaches bring greater clarity to the relationships between asset condition, asset health and system resilience. In so doing, we uncover deeper insights into the possible root causes of asset failure, sense-checked against engineering principles to ensure that we are not investing in our assets ‘blind’ – or worse, trusting poor-quality data to do so.
And that’s the rub: despite massive advances in data engineering and science, the old adage of ‘garbage in, garbage out’ still holds true. As datasets grow in size, and as the number of datasets that need to be linked grows, data acceptance and quality assessment become more important, and potentially more unwieldy. Unfortunately, this data wrangling and acceptance phase holds little glamour for those outside data engineering, least of all for time-pressed and budget-constrained asset managers who just need an answer – and preferably a definitive one – to their questions.
We’re all human: we like to be wowed and impressed, and we’re not always entirely logical in our reasoning. If we go out to buy a television, we like to see the picture quality and compare it with others before we buy; one might be better on paper, but we prefer the look of another. Likewise, if I am planning to procure a way to become a more proactive asset manager, I need to know with some certainty what will be delivered, and to see how it will work for me, before I invest. This is where the hackathon, now increasingly common in the water sector, has proved so powerful. The process typically involves giving a group of like-minded individuals from allied disciplines access to sufficient data and challenging them to come up with new insights that the client can readily visualise, within a few hours or days.
Perhaps the most visible adoption of new data science techniques in our sector is in the data-rich area of water infrastructure. Monitoring water networks is no longer just about watching SCADA screens or performing retrospective night-flow analysis; it is about seeking patterns of data behaviour and digital signatures in complex hybrid datasets from multiple sources, in a bid to understand, and pre-empt, undesirable asset health states. The problem with a purist data science approach is that you may have to reduce the available data to the lowest common spatio-temporal resolution and, in so doing, be forced to set aside other valuable data. One way through this compromise is to use hybrid modelling methods such as Bayesian Belief Networks, which can add real value where digital data quality is partly compromised: combining machine learning techniques with encoded expert engineering knowledge can overcome the problem of variable data quality.
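To make the hybrid idea concrete, here is a minimal sketch of a three-node Bayesian Belief Network in plain Python. It is an illustration only, not a WRc model: the structure (soil corrosivity -> pipe condition -> burst) and every probability are hypothetical placeholders. The first two tables stand in for encoded expert judgement and the third for a probability that might be estimated from failure records. Exact inference by enumeration also shows how a partly compromised record is handled: when the soil class is missing, the network sums over it rather than discarding the pipe.

```python
from typing import Optional

# Hypothetical network: SoilCorrosive -> PoorCondition -> Burst.
# The prior and the first CPT stand in for expert-elicited engineering knowledge.
P_SOIL_CORROSIVE = 0.3                          # P(SoilCorrosive=True)
P_POOR_GIVEN_SOIL = {True: 0.6, False: 0.2}     # P(PoorCondition=True | soil)
# This CPT stands in for a probability estimated from failure records.
P_BURST_GIVEN_POOR = {True: 0.15, False: 0.02}  # P(Burst=True | condition)


def bernoulli(p_true: float, value: bool) -> float:
    """Probability that a binary variable takes `value`, given P(True)."""
    return p_true if value else 1.0 - p_true


def joint(soil: bool, poor: bool, burst: bool) -> float:
    """Joint probability, factorised along the network's edges."""
    return (bernoulli(P_SOIL_CORROSIVE, soil)
            * bernoulli(P_POOR_GIVEN_SOIL[soil], poor)
            * bernoulli(P_BURST_GIVEN_POOR[poor], burst))


def p_poor_condition(burst: bool, soil: Optional[bool] = None) -> float:
    """P(PoorCondition=True | evidence) by exact enumeration.

    soil=None means the soil record is missing, so it is summed out
    instead of the pipe being dropped from the analysis.
    """
    soils = [True, False] if soil is None else [soil]
    numerator = sum(joint(s, True, burst) for s in soils)
    denominator = sum(joint(s, c, burst) for s in soils for c in (True, False))
    return numerator / denominator


if __name__ == "__main__":
    print(p_poor_condition(burst=True))             # soil record missing
    print(p_poor_condition(burst=True, soil=True))  # soil known to be corrosive
```

Observing a burst raises the belief that the pipe is in poor condition, and knowing that the soil is corrosive raises it further; with the soil record missing, the answer still uses the expert prior instead of failing.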
One example of how these methods are evolving internationally is a commission for a major utility manager in Italy, in which WRc is helping to define optimal water loss control strategies and to forecast asset maintenance using newer methods of data analysis, processing and management. Like much work in this field, it started with an experimental data sample and has since grown to support a fuller data quality review of our client’s extensive range of data sources, databases, GIS information and data processing activities. Whilst the work revealed a number of areas where our client can readily implement new modelling and move towards predictive asset management, the principal immediate benefit for our client has been third-party validation of key weaknesses in their existing data processes.
By implementing a robust data acceptance procedure that tests not only individual datasets but also their readiness for integration with wider data sources and systems, we identified quick wins for maximising value. The process also revealed where two versions of the truth are stored in separate locations, where different systems require updating with the same data, and where data links are broken or need to be created. Importantly, this stage of work has given both client and contractor confidence that it is feasible and worthwhile to progress to more advanced predictive models of the asset base, even though the current view of asset health is imperfect.
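Checks of this kind can be expressed as a handful of simple rules. The sketch below is a minimal illustration, not WRc’s acceptance procedure: it assumes two hypothetical sources (a GIS layer and an asset register, each a dictionary keyed by asset ID) plus a list of repair records, and the field names are invented for the example. It tests each dataset on its own (missing mandatory attributes) and then its readiness for integration: conflicting values for the same asset (two versions of the truth), assets held in only one system, and repairs whose link to an asset is broken.

```python
from dataclasses import dataclass, field

REQUIRED = ("material", "diameter_mm", "year_laid")  # hypothetical mandatory fields


def _blank(value) -> bool:
    return value is None or value == ""


@dataclass
class AcceptanceReport:
    missing_fields: list = field(default_factory=list)   # (source, asset_id, field)
    conflicts: list = field(default_factory=list)        # (asset_id, field, gis, register)
    unlinked_assets: list = field(default_factory=list)  # IDs held in only one system
    orphan_repairs: list = field(default_factory=list)   # repairs with no known asset


def accept(gis: dict, register: dict, repairs: list) -> AcceptanceReport:
    report = AcceptanceReport()
    # 1. Individual dataset tests: are mandatory attributes populated?
    for source_name, source in (("gis", gis), ("register", register)):
        for asset_id, attrs in source.items():
            for name in REQUIRED:
                if _blank(attrs.get(name)):
                    report.missing_fields.append((source_name, asset_id, name))
    # 2. Integration readiness: two versions of the truth for the same asset?
    for asset_id in sorted(gis.keys() & register.keys()):
        for name in REQUIRED:
            g, r = gis[asset_id].get(name), register[asset_id].get(name)
            if not _blank(g) and not _blank(r) and g != r:
                report.conflicts.append((asset_id, name, g, r))
    # 3. Links: assets missing from one system, repairs pointing nowhere
    report.unlinked_assets = sorted(gis.keys() ^ register.keys())
    known = gis.keys() | register.keys()
    report.orphan_repairs = [rep["repair_id"] for rep in repairs
                             if rep["asset_id"] not in known]
    return report


# Hypothetical example data
gis = {
    "P1": {"material": "CI", "diameter_mm": 100, "year_laid": 1935},
    "P2": {"material": "PVC", "diameter_mm": 150, "year_laid": 1978},
    "P3": {"material": "AC", "diameter_mm": None, "year_laid": 1962},
}
register = {
    "P1": {"material": "CI", "diameter_mm": 100, "year_laid": 1935},
    "P2": {"material": "PE", "diameter_mm": 150, "year_laid": 1978},
    "P4": {"material": "DI", "diameter_mm": 200, "year_laid": 1990},
}
repairs = [{"repair_id": "R1", "asset_id": "P1"},
           {"repair_id": "R2", "asset_id": "P9"}]
print(accept(gis, register, repairs))
```

Here the report flags P3’s missing diameter, the PVC/PE disagreement on P2, P3 and P4 each existing in only one system, and repair R2 pointing at an asset neither system knows. Each of these is a concrete, fixable item rather than a vague concern about “data quality”.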
The latest approaches to infrastructure asset health modelling typically include a geospatial representation of extrinsic factors, such as soil corrosivity and traffic loading, as well as factors intrinsic to the asset. Indicator variables such as surface type, soil type and road class can be useful proxies that improve the predictive ability of the model. At WRc, we combine spatial and statistical modelling with engineering know-how to assess failure probability and deterioration for each pipe cohort. Cohorts are formed by grouping assets by, for example, material, diameter, era laid and soil class. The characteristics found to have a significant impact on repair rates are used to define these groupings for each asset class.
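As a minimal sketch of the cohort step, assuming hypothetical banding rules and field names (real groupings would be chosen from the characteristics shown to drive repair rates), the code below assigns each pipe to a cohort by material, diameter band, era laid and soil class, then pools length and repairs to give a repair rate per km per year for each cohort.

```python
from collections import defaultdict


# Hypothetical banding rules, for illustration only.
def diameter_band(diameter_mm: float) -> str:
    return "<=150mm" if diameter_mm <= 150 else ">150mm"


def era_band(year_laid: int) -> str:
    if year_laid < 1950:
        return "pre-1950"
    return "1950-1979" if year_laid < 1980 else "1980 on"


def cohort_key(pipe: dict) -> tuple:
    """Cohort = material x diameter band x era laid x soil class."""
    return (pipe["material"], diameter_band(pipe["diameter_mm"]),
            era_band(pipe["year_laid"]), pipe["soil_class"])


def cohort_repair_rates(pipes: list, repairs: list, years_observed: float) -> dict:
    """Repairs per km per year for each cohort over the observation window."""
    length_km = defaultdict(float)
    repair_count = defaultdict(int)
    cohort_of = {}
    for pipe in pipes:
        key = cohort_key(pipe)
        cohort_of[pipe["id"]] = key
        length_km[key] += pipe["length_m"] / 1000.0
    for repair in repairs:
        key = cohort_of.get(repair["pipe_id"])
        if key is not None:  # repairs on unknown pipes are skipped here
            repair_count[key] += 1
    return {key: repair_count[key] / (km * years_observed)
            for key, km in length_km.items()}


# Hypothetical example: two cohorts, five years of repair records
pipes = [
    {"id": "A", "material": "CI", "diameter_mm": 100, "year_laid": 1935,
     "soil_class": "corrosive", "length_m": 2000},
    {"id": "B", "material": "CI", "diameter_mm": 125, "year_laid": 1940,
     "soil_class": "corrosive", "length_m": 3000},
    {"id": "C", "material": "PVC", "diameter_mm": 150, "year_laid": 1975,
     "soil_class": "non-corrosive", "length_m": 5000},
]
repairs = [{"pipe_id": p} for p in ["A", "A", "A", "B", "B", "C"]]
rates = cohort_repair_rates(pipes, repairs, years_observed=5)
for cohort, rate in rates.items():
    print(cohort, round(rate, 3))
```

Both cast-iron pipes land in the same cohort, so their 5 repairs over 5 km and 5 years pool to 0.2 repairs per km per year, while the PVC cohort comes out at 0.04. In practice, cohorts with too little length might be merged to keep these rates stable.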
By way of example, WRc has supported one UK water company at each of the last four water price controls by developing and refining asset failure models for their infrastructure assets, to inform alternative mains renewal strategies for the following AMP. Our approach combines pipeline cohort modelling with hotspot analysis and Bayesian regression in a single modelling framework, which allows us to model the failure probability of a given pipeline with greater predictive ability than cohort modelling alone. The approach has also proven readily scalable: we have demonstrated its use in both smaller and larger-scale networks.
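The framework itself is not specified here, but one common way to let a pipe’s own history refine its cohort estimate is an empirical-Bayes Poisson-gamma update. The sketch below uses that technique as a stand-in; it is not WRc’s model. It assumes failures on a pipe are Poisson given its rate, that the cohort repair rate supplies the prior mean, and that `prior_km_years` (a hypothetical tuning parameter) sets how much exposure the prior is worth. The hotspot threshold is likewise illustrative.

```python
def posterior_params(cohort_rate: float, prior_km_years: float,
                     failures: int, length_km: float, years: float):
    """Gamma(shape, rate) posterior for one pipe's failures per km per year.

    Prior: Gamma(cohort_rate * prior_km_years, prior_km_years), whose mean
    is the cohort rate. Likelihood: Poisson(rate * exposure) for the pipe's
    observed failure count over its exposure (km x years).
    """
    exposure = length_km * years
    shape = cohort_rate * prior_km_years + failures
    rate = prior_km_years + exposure
    return shape, rate


def posterior_mean_rate(shape: float, rate: float) -> float:
    return shape / rate


def prob_failure_next_year(shape: float, rate: float, length_km: float) -> float:
    """P(at least one failure in the next year) under the gamma-Poisson
    predictive (negative binomial): 1 - (rate / (rate + exposure)) ** shape."""
    exposure = length_km * 1.0
    return 1.0 - (rate / (rate + exposure)) ** shape


def is_hotspot(shape: float, rate: float, cohort_rate: float,
               factor: float = 1.5) -> bool:
    """Flag pipes whose posterior mean exceeds the cohort rate by `factor`
    (a hypothetical threshold)."""
    return posterior_mean_rate(shape, rate) > factor * cohort_rate


# Hypothetical example: three 2 km pipes in one cohort, 5 years of records
cohort_rate = 0.2  # repairs per km per year, e.g. from a cohort model
for pipe_id, failures in (("A", 3), ("B", 0), ("C", 8)):
    shape, rate = posterior_params(cohort_rate, prior_km_years=10,
                                   failures=failures, length_km=2.0, years=5)
    print(pipe_id,
          round(posterior_mean_rate(shape, rate), 3),
          round(prob_failure_next_year(shape, rate, 2.0), 3),
          is_hotspot(shape, rate, cohort_rate))
```

Pipe C’s raw rate of 0.8 per km per year is pulled back towards the cohort rate, to 0.5, yet it is still flagged as a hotspot; pipe B, with no recorded failures, is not treated as risk-free. This shrinkage is the sense in which pipe-level evidence and cohort evidence can be blended.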
Of course, with any modelling of this type, the sound application of data science principles is essential. Newer techniques, such as regression tree-based approaches, capture non-linear relationships better, but they also make it very easy to produce an over-fitted or poorly validated model that has limited predictive power. The fine-tuning of such models must give end-users confidence that important historical features have been well captured, that engineering or scientific principles can explain the relationships seen and, crucially, that the model is well validated against observations.
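A basic guard against over-fitting is to validate on data the model has not seen, ideally from a later period than the data used to fit it. The sketch below applies a temporal split to two deliberately simple stand-ins rather than a regression tree: one model memorises each pipe’s own average yearly failure count (very flexible) and one pools the whole cohort (very stable). The failure counts are hypothetical toy data for four pipes assumed to belong to a single cohort.

```python
from collections import defaultdict
from statistics import fmean


def temporal_split(records, cutoff_year):
    """Fit on years before the cutoff; validate on the cutoff year onwards."""
    train = [r for r in records if r["year"] < cutoff_year]
    test = [r for r in records if r["year"] >= cutoff_year]
    return train, test


def mse(model, records):
    """Mean squared error of predicted vs observed yearly failures."""
    return fmean((model(r) - r["failures"]) ** 2 for r in records)


def fit_per_pipe(train):
    """Memorises each pipe's own history: flexible, prone to over-fitting."""
    totals, counts = defaultdict(int), defaultdict(int)
    for r in train:
        totals[r["pipe_id"]] += r["failures"]
        counts[r["pipe_id"]] += 1

    def predict(r):
        n = counts.get(r["pipe_id"], 0)
        return totals.get(r["pipe_id"], 0) / n if n else 0.0
    return predict


def fit_cohort_mean(train):
    """Pools the whole (single, assumed) cohort: less flexible, more stable."""
    mean = fmean(r["failures"] for r in train)
    return lambda r: mean


def validate(fit, records, cutoff_year):
    train, test = temporal_split(records, cutoff_year)
    model = fit(train)
    return mse(model, train), mse(model, test)


# Hypothetical yearly failure counts, 2015 to 2020
history = {
    "P1": [2, 0, 0, 0, 0, 0],
    "P2": [0, 0, 0, 0, 1, 0],
    "P3": [0, 1, 0, 0, 0, 0],
    "P4": [0, 0, 0, 1, 0, 1],
}
records = [{"pipe_id": p, "year": 2015 + i, "failures": f}
           for p, counts in history.items() for i, f in enumerate(counts)]
print("per-pipe (train, validation):", validate(fit_per_pipe, records, 2019))
print("cohort   (train, validation):", validate(fit_cohort_mean, records, 2019))
```

On this toy data the per-pipe model has the lower training error but the higher validation error, which is the signature of over-fitting that the paragraph warns about. The same temporal split can be applied unchanged to tree-based or other flexible models.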
In conclusion, we still have some way to go as a sector to fully implement data-driven asset health strategies. We clearly need better leading indicators to quantify and manage asset health over the longer term, as well as a better understanding of the role of regular asset inspection in measuring and modelling asset health. If we can successfully draw upon all of these indicators and data sources, we will improve our knowledge of both asset health and system resilience, and have greater confidence in future investment strategies to provide a safe, reliable supply of water to customers.
Carmen Snowdon & Mark Kowalski,
Principal Consultants, Asset Management Solutions, WRc