In this R-tutorial, we demonstrate the usage of Kriging prior Regression (KpR). This tutorial accompanies the paper “Kriging prior Regression: A Case for Kriging-Based Spatial Features with TabPFN in Soil Mapping” by Schmidinger et al. (2025). Please note that we do not reproduce the full pipeline used in the paper (available in this GitHub repository). Instead, this tutorial focuses on providing a single simple and intuitive introduction to KpR. For users who are mainly interested in applying KpR to their own datasets, a generic KpR function is included at the end of this file.

Geostatistics vs Machine Learning

In this tutorial, we illustrate KpR by predicting soil pH in the G.150 dataset from LimeSoDa (Precision Liming Soil Datasets).

library(LimeSoDa) # Contains G.150 dataset
library(ggplot2)
library(sf)

G.150_dataset_coordinates <- cbind(G.150$Dataset,G.150$Coordinates) # We bind the dataset with coordinates
G.150_dataset_coordinates.sf <- st_as_sf(x = G.150_dataset_coordinates,
                                      coords = c("x_32722", "y_32722"),
                                      crs = 32722) #Make it spatial

#Plot
ggplot(G.150_dataset_coordinates.sf) +
  geom_sf(aes(color = pH_target),size=3) +
  scale_color_gradientn(colors = c("darkred","red", "orange", "yellow", "green", "cyan", "blue", "purple"),
                        limits = c(5.7, 7.7))+
  theme_bw()+
  theme(panel.grid.major = element_blank(),
        axis.title = element_blank(),
        strip.text = element_text(size = 16,color = "black"))+
  scale_y_continuous(limits = c(8178100,8179300)) +
  scale_x_continuous(limits = c(713600,714700))

In digital soil mapping, two main quantitative approaches are commonly used to create soil maps: (1) geostatistics, which exploits the spatial structure of known observations (e.g., pH) for spatial interpolation, and (2) machine learning, which leverages the relationship between soil properties and high-resolution sensing data as predictor features. To illustrate the practical differences between these approaches, we compare the predictions of (1) Ordinary Kriging (OK) and a (2) non-tuned Random Forest (RF) model in a 10-fold cross-validation (CV). Let us now briefly examine their performance and resulting predictions:

library(randomForest)
library(automap)
library(purrr)

G.150_ML <- G.150_dataset_coordinates[-c(1,3,ncol(G.150_dataset_coordinates)-1,ncol(G.150_dataset_coordinates))] # We need to get rid of the other targets (Clay and SOC) and in a non-spatial ML also coordinates for our ML task.
G.150_Folds <- G.150$Folds

#Store predictions and test values
RF_pH_prediction <- c()
OK_pH_prediction <- c()
pH_test <- c()

set.seed(2025)

# Run non-spatial RF model and OK in 10 fold-CV
for(i in 1:10){
  G.150_train_ML <- G.150_ML[G.150_Folds != i,]
  G.150_test_ML <- G.150_ML[G.150_Folds == i,]
  
  pH_test <- c(pH_test, G.150_test_ML$pH_target)  

  
  RF_model <- randomForest(
    pH_target ~ .,
    data  = G.150_train_ML)

  RF_pH_foldwise <- predict(RF_model, newdata = G.150_test_ML)
  RF_pH_prediction <- c(RF_pH_prediction, RF_pH_foldwise)


  G.150_train_OK <- G.150_dataset_coordinates.sf[G.150_Folds != i,]
  G.150_test_OK <- G.150_dataset_coordinates.sf[G.150_Folds == i,]
  
  
  OK_pH_foldwise <- quietly(autoKrige)(
  pH_target ~ 1, input_data = G.150_train_OK, new_data = G.150_test_OK)$result
  
  
  
  OK_pH_prediction <- c(OK_pH_prediction, OK_pH_foldwise$krige_output$var1.pred)
  
}


RFvsOK <- data.frame(
  RF = RF_pH_prediction,
  OK = OK_pH_prediction)

# Evaluate Correlation
ggplot(RFvsOK, aes(x = OK, y = RF)) +
  geom_point(color = "steelblue", size = 3, alpha = 0.7) +
  geom_abline(intercept = 0, slope = 1, color = "black", linetype = "dashed") +
  labs(x = "OK predicted pH",y = "RF predicted pH") +
  theme_bw() +
  xlim(6, 7.3) +
  ylim(6, 7.3)+
  annotate("text", x = 6.1, y = 7.2, label = paste0("R = ", round(cor(RFvsOK$OK, RFvsOK$RF, method = "pearson"), 2)),size = 5)

# Evaluate performance
cat(
  "RF R2:", round(1 - sum((pH_test - RF_pH_prediction)^2) / sum((pH_test - mean(pH_test))^2), 3),
  "\nOK R2:", round(1 - sum((pH_test - OK_pH_prediction)^2) / sum((pH_test - mean(pH_test))^2), 3)
)
## RF R2: 0.392 
## OK R2: 0.456

As this example shows, OK clearly outperformed RF. This result is not unexpected, since soil pH is not easily detectable by the sensors used in this study (electrical conductivity, optical remote sensing, and terrain attributes). In such cases, relying on the spatial structure of pH (as OK does) proved more powerful than RF. Naturally, we would prefer OK over RF here. However, an alternative would be to enhance RF by incorporating spatial information.

Adding spatial-context through Kriging prior Regression (KpR)

In our paper, we explored several spatial techniques designed to add spatial context to ML. Among these, feature engineering based on Interpolation prior Regression (IpR), often referred to as the spatial lag, proved particularly effective, outperforming the more common Residual/Regression Kriging. The principle of IpR is straightforward: we interpolate from the training data to unsampled locations (here, the test set), but instead of treating the interpolated values as final predictions, we use them as additional features in a ML model. Traditionally, this interpolation has been based on simple methods such as K-nearest neighbors (KNN) averaging or inverse distance weighting (IDW). In contrast, we propose Kriging prior Regression (KpR), which uses OK for IpR. This has several advantages: (1) OK has known theoretical advantages over IDW or KNN. (2) KpR allows us to include the kriging variance as an additional feature for more spatial context. (3) KpR does not need tuning of extra parameters such as the number of neighbour for KNN and IDW.

In the following, we do a nested leave-one-out (LOO) scheme, to create KpR features for our training sets, and use the observations from our training set to create KpR features for our test set. This avoids data-leakage in the outer and inner loop.

library(gstat)

#Store training and test sets with KpR for the 10 fold-CV in these lists
list_of_training_data <- list()
list_of_test_data <- list()

# Create KpR features
for(i in 1:10){
  training_data <- G.150_dataset_coordinates[G.150_Folds != i,]
  test_data <- G.150_dataset_coordinates[G.150_Folds == i,]
    
  training_data.sf <- st_as_sf(training_data, coords = c("x_32722", "y_32722"), crs = 32722)
  test_data.sf <- st_as_sf(test_data, coords = c("x_32722", "y_32722"), crs = 32722)

  KpR_pH <-c()
  KpR_pH_var <-c()
  variogram <- autofitVariogram(pH_target~1,training_data.sf) # Careful! For the calculation of the inner-variogram, we used also the LOO-inner test sample, to have a fixed variogram, preventing potential considerable changes in the variogram structure. However, it may be better to calculate the variogram without the "leaky" LOO-inner test sample, to prevent overfitting on "artificially boosted" KpR in the outer fold. Nonetheless, we assume the effect is neglectable.
    
  for (j in 1:nrow(training_data)){
    OK_KpR_pH <- quietly(krige)(pH_target~1, training_data.sf[-j,], training_data.sf[j,], model = variogram$var_model)$result
    KpR_pH<-c(KpR_pH, OK_KpR_pH$var1.pred)
    KpR_pH_var<-c(KpR_pH_var, OK_KpR_pH$var1.var)
  }


  training_data_KpR <- cbind(training_data, data.frame("pH.KpR"=KpR_pH,"var.KpR"=KpR_pH_var))
  OK_KpR_pH_test <- quietly(krige)(pH_target~1, training_data.sf, test_data.sf, model = variogram$var_model)$result

  test_data_KpR<- cbind(test_data, data.frame("pH.KpR"=OK_KpR_pH_test$var1.pred,"var.KpR"=OK_KpR_pH_test$var1.var))

  list_of_training_data[[i]] <- training_data_KpR
  list_of_test_data[[i]] <- test_data_KpR
}

Data_list_KpR <- list(training = list_of_training_data, test = list_of_test_data) 

# Evaluate KpR features
head(Data_list_KpR$training[[1]][(ncol(Data_list_KpR$training[[1]])-1):ncol(Data_list_KpR$training[[1]])])
##     pH.KpR    var.KpR
## 2 6.611482 0.05061700
## 3 6.762569 0.04629221
## 4 6.792359 0.04506271
## 5 6.867678 0.04671923
## 6 6.908595 0.05154544
## 7 7.012967 0.04854950

Having created the KpR features, we first take a closer look at their spatial structure and correlation before running RF with KpR to compare performance. For simplicity, we illustrate this using only the first iteration of the subsequent 10-fold CV.

library(egg)
library(corrplot)

# Analyse the KpR features
G.150_KpR_training_fold1 <- st_as_sf(x = Data_list_KpR$training[[1]],
                                      coords = c("x_32722", "y_32722"),
                                      crs = 32722)

G.150_KpR_test_fold1 <- st_as_sf(x = Data_list_KpR$test[[1]],
                                      coords = c("x_32722", "y_32722"),
                                      crs = 32722)

G.150_KpR_training_fold1$title <- "Train KpR: pH"
p_train_ph <- ggplot(G.150_KpR_training_fold1) +
  geom_sf(aes(color = pH.KpR), size = 3) +
  scale_color_gradientn(
    colors = c("darkred","red","orange","yellow","green","cyan","blue","purple"),
    limits = c(5.7, 7.7)
  ) +
  theme_bw() +
  theme(panel.grid.major = element_blank(),
        axis.title = element_blank(),
        axis.text = element_blank(),
        strip.text = element_text(size = 16, color = "black"),
        legend.position = "none")+
  scale_y_continuous(limits = c(8178100,8179300)) +
  scale_x_continuous(limits = c(713600,714700))+
  facet_wrap(~ title)

G.150_KpR_training_fold1$title <- "Train KpR: var"
p_train_var <- ggplot(G.150_KpR_training_fold1) +
  geom_sf(aes(color = var.KpR), size = 3) +
  scale_color_gradientn(
    colors = c("lightgrey","darkblue"),
    limits = c(0.04, 0.065)
  ) +
  theme_bw() +
  theme(panel.grid.major = element_blank(),
        axis.title = element_blank(),
        axis.text = element_blank(),
        strip.text = element_text(size = 16, color = "black"),
        legend.position = "none")+
  scale_y_continuous(limits = c(8178100,8179300)) +
  scale_x_continuous(limits = c(713600,714700))+
  facet_wrap(~ title)

G.150_KpR_test_fold1$title <- "Test KpR: pH"
p_test_ph <- ggplot(G.150_KpR_test_fold1) +
  geom_sf(aes(color = pH.KpR), size = 3) +
  scale_color_gradientn(
    colors = c("darkred","red","orange","yellow","green","cyan","blue","purple"),
    limits = c(5.7, 7.7),
    name = "pH KpR") +
  theme_bw() +
  theme(panel.grid.major = element_blank(),
        axis.title = element_blank(),
        axis.text = element_blank(),
        strip.text = element_text(size = 16, color = "black")) +
  scale_y_continuous(limits = c(8178100,8179300)) +
  scale_x_continuous(limits = c(713600,714700))+
  facet_wrap(~ title)

G.150_KpR_test_fold1$title <- "Test KpR: var"
p_test_var <- ggplot(G.150_KpR_test_fold1) +
  geom_sf(aes(color = var.KpR), size = 3) +
  scale_color_gradientn(
    colors = c("lightgrey","darkblue"),
    limits = c(0.04, 0.065),
    name = expression(sigma^2 * " KpR")) +
  theme_bw() +
  theme(panel.grid.major = element_blank(),
        axis.title = element_blank(),
        axis.text = element_blank(),
        strip.text = element_text(size = 16, color = "black"))+
  scale_y_continuous(limits = c(8178100,8179300)) +
  scale_x_continuous(limits = c(713600,714700))+
  facet_wrap(~ title)

ggarrange(p_train_ph, p_test_ph,
          p_train_var, p_test_var,
          ncol = 2, nrow = 2)

cor_mat <- cor(Data_list_KpR$training[[1]][-c(1,3,ncol(Data_list_KpR$training[[1]])-3,ncol(Data_list_KpR$training[[1]])-2)][sapply(Data_list_KpR$training[[1]][-c(1,3,ncol(Data_list_KpR$training[[1]])-3,ncol(Data_list_KpR$training[[1]])-2)], is.numeric)], use = "pairwise.complete.obs")

corrplot(cor_mat, method = "circle", type = "upper",
         tl.col = "black", tl.srt = 45)

As a reminder, the KpR features are generated using a LOO-scheme for the training set, while the test set KpR features are derived solely from the training data. We emphasize this again because spatial lag features are sometimes incorrectly criticized for causing data leakage. In our setup, however, the test set remains fully independent of the training set.

In this grid-sampling dataset, the additional feature KpR.var plays only a minor role, as the kriging variance is nearly constant except at the field boundaries. Nevertheless, in the case of spatially clustered data, KpR.var could potentially become much more informative. While we have not explored this aspect in depth, it represents an interesting direction for future research looking at the effect of the sampling on the success of spatial techniques.

set.seed(2025)

# Run RF model with KpR in 10 fold-CV
RF.KpR_pH_prediction <- c()
for(i in 1:10){

  
  RF.KpR_model <- randomForest(
    pH_target ~ .,
    data  = Data_list_KpR$training[[i]][-c(1,3,ncol(Data_list_KpR$training[[i]])-3,ncol(Data_list_KpR$training[[i]])-2)])

  RF.KpR_pH_foldwise <- predict(RF.KpR_model, 
                            newdata =Data_list_KpR$test[[i]][-c(1,3,ncol(Data_list_KpR$test[[i]])-3,ncol(Data_list_KpR$test[[i]])-2)])
  RF.KpR_pH_prediction <- c(RF.KpR_pH_prediction, RF.KpR_pH_foldwise)
}

# Evaluate performance
cat(
  "RF R2:", round(1 - sum((pH_test - RF_pH_prediction)^2) / sum((pH_test - mean(pH_test))^2), 3),
  "\nOK R2:", round(1 - sum((pH_test - OK_pH_prediction)^2) / sum((pH_test - mean(pH_test))^2), 3),
  "\nRF+KpR R2:", round(1 - sum((pH_test - RF.KpR_pH_prediction)^2) / sum((pH_test - mean(pH_test))^2), 3)
)
## RF R2: 0.392 
## OK R2: 0.456 
## RF+KpR R2: 0.494

As shown, incorporating KpR features led to the highest accuracy by enriching the ML model (here RF) with spatial context derived from geostatistics. In this way, we combine the strengths of both approaches and exploit their complementary information.

Correlation_predictions <- data.frame(
  RF = RF_pH_prediction,
  OK = OK_pH_prediction,
  RF.KpR = RF.KpR_pH_prediction)

# Evaluate Correlation
corr_mat <- cor(Correlation_predictions)

corrplot(corr_mat, 
         method = "number", 
         type = "lower", 
         diag = FALSE,
         tl.col = "black", 
         tl.srt = 45)

RF.KpR is basically a hybrid between non-spatial RF and OK. This is reflected in its strong correlation with both OK and RF.

For a more detailed discussion, see Schmidinger et al. (2025), where we show under which conditions KpR is most effective and when it does not provide additional benefits to a ML model.

A generic Kriging prior Regression function

If you would like to apply KpR to your own dataset, you can use the function provided below. Simply pass in your training and test datasets, specify the coordinate reference system (CRS) number, and define the target_property along with the XY-coordinate vector.

# "sf", "automap" and "krige" are needed to run this code

# KpR_dataset_list generates the KpR features and returns a list with two elements: the augmented training set and the augmented test set, each enriched with the KpR features.

# Inputs:
# training_data: Input the training set as a dataframe for which we create the KpR features through a LOO-scheme.

# test_data: Input the test set as a dataframe for which we create the KpR features based on our testing set.

# coordinates_crs: Input the crs of the coordinate system to make the training and testing set spatial for kriging.

# target_property: Input the column name of the target property.

# xy_vector: Input the vector with the names of the x and y coordinates as a vector with two entries.

KpR_dataset_list <- function(training_data, test_data, coordinates_crs, target_property, xy_vector) {

  training_data.sf <- st_as_sf(training_data, coords = xy_vector, crs = coordinates_crs)
  test_data.sf     <- st_as_sf(test_data, coords = xy_vector, crs = coordinates_crs)

  KpR_pred <- c()
  KpR_pred_var <- c()

  variogram <- autofitVariogram(as.formula(paste(target_property, "~1")), training_data.sf) # Note: For the calculation of the inner-variogram, we used also the LOO-inner test sample, to have a fixed variogram, preventing potential considerable changes in the variogram structure. However, it may be better to calculate the variogram without the "leaky" LOO-inner test sample, to prevent overfitting on "artificially boosted" KpR in the outer fold. Nonetheless, we assume the effect is neglectable. In the worst case, we slightly harm our performance (no data-leak in the outer CV!).

  for (j in 1:nrow(training_data)) {
    OK_KpR_pred <- krige(as.formula(paste(target_property, "~1")), # Before we forced kriging to suppress printing "[using ordinary kriging]" messages through quietly. Here we left it out, to rely on less packages. Hence, "krige" will spam a lot of prints.
                         training_data.sf[-j,], training_data.sf[j,],
                         model = variogram$var_model)
    KpR_pred    <- c(KpR_pred,    OK_KpR_pred$var1.pred)
    KpR_pred_var<- c(KpR_pred_var,OK_KpR_pred$var1.var)
  }

  training_data_KpR <- cbind(training_data,
                             data.frame("Pred.KpR" = KpR_pred,
                                        "Var.KpR"  = KpR_pred_var))

  OK_KpR_test <- krige(as.formula(paste(target_property, "~1")), # Before we forced kriging to suppress printing "[using ordinary kriging]" messages through quietly. Here we left it out, to rely on less packages. Hence, "krige" will spam a lot of prints.
                       training_data.sf, test_data.sf,
                       model = variogram$var_model)

  test_data_KpR <- cbind(test_data,
                         data.frame("Pred.KpR" = OK_KpR_test$var1.pred,
                                    "Var.KpR"  = OK_KpR_test$var1.var))

  return(list(training = training_data_KpR, test = test_data_KpR))
}

# Test run with this function:
# KpR_test <- KpR_dataset_list(G.150_dataset_coordinates[G.150_Folds != 1,],G.150_dataset_coordinates[G.150_Folds == 1,],32722,"pH_target",c("x_32722", "y_32722"))
# KpR_test$training
# KpR_test$test

Cite & Contact

If you have questions or suggestions feel free to contact me at or connect with me through LinkedIN.

When you use KpR; please cite the associated paper:

Schmidinger, J., Barkov, V., Vogel, S., Atzmueller, M., & Heuvelink, G. B. M. (2025): Kriging prior Regression: A Case for Kriging-Based Spatial Features with TabPFN in Soil Mapping. arXiv:2509.09408