Data Preparation

How raw inputs from many sources are cleaned, standardized, and vetted before any analysis.

What happens to a raw figure before it is used?

Every raw figure passes through a preparation stage before it enters any assessment. In this stage the value is cleaned, converted to a standard format, validated against expected ranges, and screened for anomalies. Only figures that clear all of these steps move forward, so downstream calculations rest on a consistent, vetted foundation. Preparation is the first half of the transformation stage; the second half, normalization to a common basis, is covered on the Comparability and Normalization page. Along this collection-and-integration path, each figure is carried from its raw source through extraction, transformation, and loading into the data warehouse that feeds every assessment:

Data collection and integration: raw source data is extracted, transformed, and loaded into the data warehouse.
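The gating rule described above, where a figure moves forward only if it clears every preparation step, can be sketched in Python. This is a minimal illustration under stated assumptions, not the production pipeline: the stage functions, the USD-per-tonne source unit, the expected range, and the reference-deviation anomaly rule are all hypothetical placeholders chosen to make the control flow concrete.

```python
class Rejected(Exception):
    """Raised by a stage when a figure fails that step."""


def clean(raw):
    # Strip whitespace and parse the text; unparseable entries are rejected.
    try:
        return float(str(raw).strip())
    except ValueError:
        raise Rejected(f"not a number: {raw!r}") from None


def standardize(price_per_tonne):
    # Hypothetical: the source quotes USD per tonne; store USD per kilogram.
    return price_per_tonne / 1000.0


def validate(price_per_kg, lo=0.0, hi=5.0):
    # Hypothetical expected range, in USD per kilogram.
    if not lo < price_per_kg <= hi:
        raise Rejected(f"{price_per_kg} outside expected range ({lo}, {hi}]")
    return price_per_kg


def screen(price_per_kg, reference=0.5, tolerance=0.5):
    # Hypothetical anomaly rule: reject values more than 50% away from a reference level.
    if abs(price_per_kg - reference) > tolerance * reference:
        raise Rejected(f"{price_per_kg} deviates too far from reference {reference}")
    return price_per_kg


STAGES = [("clean", clean), ("standardize", standardize),
          ("validate", validate), ("screen", screen)]


def run_preparation(raw_figures, stages=STAGES):
    """Return (accepted, rejected); a figure is accepted only if every stage passes."""
    accepted, rejected = [], []
    for raw in raw_figures:
        value = raw
        for name, stage in stages:
            try:
                value = stage(value)
            except Rejected as exc:
                rejected.append((raw, name, str(exc)))
                break  # stop at the first failing stage
        else:
            accepted.append(value)  # cleared every stage
    return accepted, rejected


accepted, rejected = run_preparation([" 480 ", "n/a", "520", "9000", "1200", "-5"])
print(accepted)                                    # [0.48, 0.52]
print([(raw, stage) for raw, stage, _ in rejected])
# [('n/a', 'clean'), ('9000', 'validate'), ('1200', 'screen'), ('-5', 'validate')]
```

Recording which stage rejected each figure keeps the vetting auditable: a value that never reaches the anomaly screen because it failed range validation is reported as such, rather than silently disappearing.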

How are formats checked and standardized?

Source data arrives in many file and number formats, because each provider follows its own conventions. All incoming files and number formats are converted into a single internal convention, units are standardized to the International System of Units (SI) using fixed conversion factors (detailed on the Units page), and each value is validated against expected ranges before anomaly screening begins. These consistency checks ensure that differences observed later between figures reflect the market rather than formatting artifacts.
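The three checks in this paragraph, number-format conversion, fixed-factor unit conversion, and range validation, can be sketched as follows. The provider names and their separator conventions are hypothetical, and kilograms stand in for whatever SI unit a given figure uses. The mass factors shown are exact by definition (the international pound is 0.45359237 kg), but the actual factor table lives on the Units page and may differ in scope.

```python
# Hypothetical provider conventions: (thousands separator, decimal mark).
NUMBER_FORMATS = {
    "provider_a": (",", "."),  # writes twelve thousand and a half as "12,000.5"
    "provider_b": (".", ","),  # writes the same figure as "12.000,5"
}

# Exact defined factors to kilograms, the SI base unit of mass.
TO_KG = {
    "kg": 1.0,
    "t": 1000.0,             # metric tonne
    "lb": 0.45359237,        # international avoirdupois pound
    "short_ton": 907.18474,  # 2000 lb
}


def parse_number(text, provider):
    """Convert a provider-formatted numeric string to a float."""
    thousands, decimal = NUMBER_FORMATS[provider]
    # Drop grouping separators first, then map the decimal mark to ".".
    normalized = text.strip().replace(thousands, "").replace(decimal, ".")
    return float(normalized)


def to_kg(value, unit):
    """Convert a mass to kilograms with a fixed factor."""
    if unit not in TO_KG:
        raise ValueError(f"unknown unit {unit!r}")
    return value * TO_KG[unit]


def standardize_and_validate(text, provider, unit, lo_kg, hi_kg):
    """Parse, convert to SI, then check the value against an expected range."""
    value_kg = to_kg(parse_number(text, provider), unit)
    if not lo_kg <= value_kg <= hi_kg:
        raise ValueError(f"{value_kg} kg outside expected range [{lo_kg}, {hi_kg}]")
    return value_kg


# The same string means different quantities under different conventions.
print(parse_number("1.500", "provider_a"))  # 1.5
print(parse_number("1.500", "provider_b"))  # 1500.0
```

The last two lines show why the conversion must run per provider before any comparison: without it, the string "1.500" would yield a thousandfold difference that is purely a formatting artifact.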

How are trades of different sizes and grades handled?

Official trade statistics aggregate transactions that can differ widely in volume, specification, and delivery conditions. Mixing them indiscriminately would distort the assessed price level, so trades are filtered and clustered before use. Filtering and clustering apply criteria in four categories:

  • Minimum-volume thresholds — trades too small to represent the market level are excluded.
  • Specification and grade — only trades matching the assessed specification and grade are grouped together.
  • Location basis — trades are grouped by the location basis they reflect.
  • Trade size and delivery terms — clustering prevents distortions from mixing trades of different sizes or delivery conditions.

The filter criteria categories are disclosed; the specific numeric thresholds and the statistical techniques applied within them are proprietary.
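Because the real thresholds and techniques are proprietary, the sketch below uses placeholder values purely to illustrate how the four disclosed categories fit together. The minimum volume, size bands, grade labels, and location names are all hypothetical. Clustering here is plain grouping on an exact composite key (location basis, size band, delivery terms); this is a simple stand-in, not the statistical technique actually applied.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Trade:
    volume_t: float      # traded volume, tonnes
    grade: str           # specification / grade label
    location: str        # location basis the price reflects
    delivery_terms: str  # e.g. "FOB" or "CIF"
    price: float


# Illustrative placeholders only: the real thresholds are not disclosed.
MIN_VOLUME_T = 500.0
SIZE_BANDS = [(500.0, 5_000.0, "small"), (5_000.0, float("inf"), "large")]


def size_band(volume_t):
    for lo, hi, label in SIZE_BANDS:
        if lo <= volume_t < hi:
            return label
    return "unbanded"


def filter_and_cluster(trades, assessed_grade):
    """Drop small and off-spec trades, then group the rest by
    (location basis, size band, delivery terms)."""
    clusters = defaultdict(list)
    for trade in trades:
        if trade.volume_t < MIN_VOLUME_T:
            continue  # too small to represent the market level
        if trade.grade != assessed_grade:
            continue  # does not match the assessed specification
        key = (trade.location, size_band(trade.volume_t), trade.delivery_terms)
        clusters[key].append(trade)
    return dict(clusters)


trades = [
    Trade(100, "A", "Port X", "FOB", 480.0),   # excluded: below minimum volume
    Trade(1000, "B", "Port X", "FOB", 470.0),  # excluded: wrong grade
    Trade(1000, "A", "Port X", "FOB", 500.0),
    Trade(2000, "A", "Port X", "FOB", 505.0),
    Trade(8000, "A", "Port X", "FOB", 490.0),
    Trade(1500, "A", "Port Y", "CIF", 530.0),
]
for key, group in filter_and_cluster(trades, "A").items():
    print(key, [t.price for t in group])
# ('Port X', 'small', 'FOB') [500.0, 505.0]
# ('Port X', 'large', 'FOB') [490.0]
# ('Port Y', 'small', 'CIF') [530.0]
```

Keeping each cluster separate means a later price calculation never averages a large FOB cargo with a small CIF parcel, which is exactly the distortion the size and delivery-terms criterion is meant to prevent.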