5th International Symposium on Machine Learning & Big Data in Geoscience (5ISMLG)

10-13 May 2026, Hong Kong

Redefining “Sites” using Tailored Lego Clustering

Speaker:

Kok-Kwang Phoon, Singapore University of Technology and Design

Abstract:

One central feature of geotechnical engineering is site uniqueness, or site-specificity. However, there is no data-driven method to quantify site uniqueness. The corollary is that it is not possible to automatically identify “similar” sites from big indirect data (BID), a site database with large geographic coverage, and there is no method to combine sparse site-specific data with BID to produce a quasi-local transformation model that is less biased than a generic model and less imprecise than a site-specific model. This “site recognition” challenge is difficult because site-specific data is MUSIC-X-G (Multivariate, Uncertain and Unique, Sparse, Incomplete, and potentially Corrupted, with “X” and “G” denoting spatial variability and geologic uncertainty, respectively). Tailored clustering has been shown to be more effective than classical clustering (the reference solution) in identifying “similar” sites from BID. There are other data-driven site characterization (DDSC) methods, but all of them share a common assumption that a site is identical to a project site. For illustration, why not combine two adjacent project sites into a single “site” for DDSC? Would a “similar” site be even more similar to a target site if only records within one or more depth intervals were extracted? In this lecture, this fundamental question, “what is a site?”, is studied using a large geotechnical site whose 3D subsurface volume can be subdivided into unit blocks (“Lego bricks”). It will be shown that “similar” sites, each defined by a different assembly of such Lego bricks, can produce a better quasi-local model. The lecture further explains that a target site itself can be divided into two or more target Lego assemblies, and that the quasi-local model for each assembly can be distinct. In short, the general question is how to look for similar Lego assemblies in BID to match a particular target Lego assembly, which is a subpart of a target site.
“Data-driven site characterization” refers to any site characterization methodology that relies solely on measured data, both site-specific data collected for the current project and existing data of any type collected from past stages of the same project or past projects at the same site, neighboring sites, or beyond. This definition remains valid, but the term “site” should be interpreted as any ground volume without the “project site” qualifier. With this generalization, it is possible to imagine “precision site characterization” when more data are made available in the future.

Biography:

Kok-Kwang Phoon is President of the Singapore University of Technology and Design (SUTD) and Cheng Tsang Man Chair Professor. Concurrently, he serves as the Deputy Executive Chair (Research) of AI Singapore and as a member of the Committee of Government Scientific Advisors. He has also served as the Deputy Chief Scientific Advisor (DCSA) to the National Research Foundation, Prime Minister’s Office, Singapore. He has been elected to serve on the board of the International Council of Academies of Engineering and Technological Sciences (CAETS) for 2026-2027. Prof Phoon is a world leader in the development of reliability and data-centric geotechnics. He was awarded the ASCE Norman Medal twice, in 2005 and 2020, the Humboldt Research Award in 2017, the Harry Poulos Award in 2023, and the Alfredo Ang Award in 2024, among other accolades. Prof Phoon is the Founding Editor of Georisk and Founding Editor-in-Chief of Geodata and AI.
