Last Updated: December 30, 2023

Introduction: In collaboration with aNumak & Company, a consulting firm specializing in data-driven decision-making, we explore the process of predicting concrete compressive strength to enhance construction practices. This case study employs advanced analytics to understand the factors influencing concrete strength and build a predictive model.

Exploratory Data Analysis (EDA): aNumak & Company initiates the analysis by performing Exploratory Data Analysis (EDA) to comprehend feature interactions. Correlational analysis reveals key factors influencing concrete strength:

  1. Age and Water Content: Strength increases with age and decreases with higher water content.
  2. Cement and Superplasticizer: Positive correlations with strength.

Further graphical analysis illustrates relationships between concrete strength and the amount of cement and water.

Causal Discovery Analysis: Causal relationships are explored to identify direct influences between variables. The analysis reveals relationships such as the expected connection between water and superplasticizer, where an increase in superplasticizer correlates with a decrease in water content.

Regression Analysis: aNumak & Company employs regression analysis to predict concrete compressive strength. Key steps include:

  1. Model Configuration: Selection of predictors (e.g., age, cement, water) and outcome variable (compressive strength).
  2. Hyperparameter Tuning: Fine-tuning model parameters for optimal performance.
  3. Explain Predictions Option: Enabling Shapley values to understand each variable’s impact on predictions.

Model Performance Evaluation: The trained model exhibits high performance:

  • Root Mean Squared Error, Mean Absolute Error, and Median Absolute Error close to optimal values.
  • R2 (coefficient of determination) close to the optimal value of 1.0, which the analysis summarizes as approximately 8% relative error.

Feature Importance Analysis: The model identifies age, cement, and water as crucial features affecting concrete compressive strength. This aligns with earlier correlational analysis findings.

Shapley Values and Prediction Analysis: aNumak & Company examines individual predictions, Shapley values, and raw values to understand how each variable contributes to predicted concrete strength. Granular insights enable adjustments to achieve desired strength levels.

Partial Dependence Plots (PDP) and Individual Conditional Expectation (ICE): PDP and ICE plots provide nuanced insights into the impact of variables like age and cement on concrete strength. These visualizations aid in determining optimal conditions for strength attainment.

Model Leaderboard and Export: The platform generates a leaderboard showcasing metrics for all trained models. Detailed information, including hyperparameters and training time, aids in selecting the most suitable model. Export options allow seamless integration into existing applications or Python environments.

Live Model Deployment: The trained model can be deployed in real-time using the ‘Live Model’ tab. It also offers API integration for incorporation into web or mobile applications.

Conclusion: aNumak & Company leverages advanced analytics to enhance construction practices by predicting concrete compressive strength. The insights gained empower stakeholders to make informed decisions on material quantities and aging processes, ensuring optimal strength outcomes. The platform’s user-friendly interface and robust functionalities facilitate a comprehensive understanding of the data and efficient model deployment for real-world applications.

For more information on aNumak & Company’s data-driven consulting services, please email contact@anumak.com

Settings for the correlational analysis

The variable of interest, namely the concrete strength, is specified in the ‘correlation target’ field. Meanwhile, any features for which the correlation needs to be measured with the target are specified in the ‘compared factors’ field. Other options are also available, such as the number of factors to display and whether values should be shown on the bar chart.

After clicking the ‘Run’ button and waiting for a few moments, the results are generated and displayed to the user:

Bar chart showing the results of the correlational analysis

As can be observed, the two most influential features appear to be age and water. Specifically, the strength tends to increase with age and with decreasing water content. Other features, such as the amount of cement and superplasticizer (the ‘superplastic’ feature), also correlate positively with the strength.
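To make the idea behind the bar chart concrete, the sketch below computes a correlation between each factor and the target by hand. It is only an illustration: it assumes the Pearson coefficient as the measure (the platform's exact measure is not stated here), and the five rows of data are made up for the example, not taken from the concrete data set.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical toy values (NOT the real data set), one list per column.
columns = {
    "age": [3, 7, 14, 28, 56],
    "water": [200, 190, 185, 170, 160],
    "strength": [15.0, 22.0, 30.0, 41.0, 48.0],
}

target = "strength"  # plays the role of the 'correlation target' field
correlations = {
    name: pearson(values, columns[target])
    for name, values in columns.items()
    if name != target  # the remaining columns are the 'compared factors'
}

# Rank factors by the absolute size of their correlation, as a bar chart would.
for name, r in sorted(correlations.items(), key=lambda kv: abs(kv[1]), reverse=True):
    print(f"{name}: {r:+.3f}")
```

A positive value means the factor and the strength rise together, while a negative value means the strength falls as the factor rises.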

Several other graphs are also generated, which can also be used to deduce the relationship among features:

Relationship between strength and the amount of cement

Relationship between strength and the amount of water

The causal discovery analytic can also be used to discover causal relationships between variables. A causal relationship exists when one variable in a data set has a direct influence on another variable. Thus, one event triggers the occurrence of another event. 

The causal discovery analytic has several options, as follows:

Features to be used in the causal discovery analytic

First of all, the features to be considered need to be specified in the ‘Selected Features’ field. Hence, the interactions between all of these features will be determined. The model to be used, its options, and any constraints and causal variables can be specified in the ‘Causal Graph’ tab:

Specification of constraints and causal variables

Specification of model to be used and its options

Once satisfied with the options (default values should suffice in most cases), the ‘Run’ button can be clicked and a graph showing the causal relationships will be displayed after a few moments:

Identified causal relationships using the causal discovery analytic

As can be observed, there are quite a few relationships among our features. Values representing the strength of each relationship are also provided. For example, there is a clear relationship between ‘water’ and ‘superplastic’. This is expected, since superplasticizers enable a reduction in water content. Furthermore, the negative value of the relationship (as indicated in red) signifies that as the amount of superplasticizer increases, the amount of water decreases, and vice versa.

Having gained a better understanding of our data, we can proceed to training a model that can predict the concrete compressive strength. The regression analytic can be chosen, with the following options selected:

Options for the regression analytic

Similar to the correlational analysis, the outcome that should be predicted (concrete compressive strength) should be specified in the ‘predicted target’ field, while any other features that should be used to predict the target are specified in the ‘predictors’ field. 

Several other options can also be specified, including:

Options for the regression analytic

In this case, the ‘explain predictions’ option has been selected. This will enable the generation of what are known as Shapley values that can help us understand to what extent each variable has increased or decreased the prediction. 

More advanced options can also be specified, such as the models to be trained and their hyperparameters. While the default settings generally work well, you might want to specify certain values to your liking or try to tune them to improve performance. Actable AI will then leverage state-of-the-art AutoML techniques to train several models with different hyperparameters automatically and select the one achieving the best performance. 

Model and hyperparameter options

The metric used for optimization can also be specified:

Advanced options

Once we are satisfied with the settings, the ‘Run’ button can be clicked to start the model training process. When it is completed, a number of results are displayed:

Performance metrics of the best model

We can first analyze the model’s performance using several metrics. Each of these compares the ground-truth values of the concrete compressive strength with those predicted by the best model. As can be observed, the results in this case are very good, with the Root Mean Squared Error, Mean Absolute Error, and Median Absolute Error being relatively close to 0 (the optimal value), and R2 being close to the optimal value of 1.0 (in this case, it can be said that the model has approximately 8% relative error). 

These metrics indicate that the model would perform well when used on real-world unseen data (data that is not used by the model when training it).
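For readers who want to see how these four metrics are defined, the following sketch computes them from scratch. The function name `regression_metrics` and the four sample values are hypothetical and chosen only to keep the arithmetic easy to follow; they are not results from the trained model.

```python
import math
from statistics import median


def regression_metrics(y_true, y_pred):
    """Return RMSE, MAE, median absolute error and R2 for paired values."""
    errors = [t - p for t, p in zip(y_true, y_pred)]
    n = len(errors)
    ss_res = sum(e * e for e in errors)  # residual sum of squares
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    return {
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(e) for e in errors) / n,
        "median_ae": median([abs(e) for e in errors]),
        "r2": 1.0 - ss_res / ss_tot,
    }


# Hypothetical ground-truth and predicted strengths (illustrative only).
y_true = [10.0, 20.0, 30.0, 40.0]
y_pred = [12.0, 18.0, 33.0, 39.0]

metrics = regression_metrics(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

The three error metrics are expressed in the same units as the target, so they should be read relative to the typical range of strength values, whereas R2 is unitless.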

We can then observe which features are deemed to be necessary by the model:

Feature importance of the best model

It is clear that age, cement, and water are all very useful features for the trained model, which is unsurprising given the results in the correlational analysis discussed earlier. Hence, these features affect the concrete compressive strength the most.
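How the platform computes feature importance is not described here. One common technique is permutation importance, sketched below as an assumption rather than as the platform's method: a feature's values are shuffled across rows, and the resulting increase in error indicates how much the model relies on it. The model, the rows, and the `batch_id` column are all hypothetical.

```python
import random


def mse(model, rows, targets):
    """Mean squared error of `model` over the given rows."""
    return sum((model(r) - t) ** 2 for r, t in zip(rows, targets)) / len(rows)


def permutation_importance(model, rows, targets, feature, repeats=5, seed=0):
    """Average increase in MSE after shuffling one feature across rows."""
    rng = random.Random(seed)
    baseline = mse(model, rows, targets)
    increases = []
    for _ in range(repeats):
        shuffled = [r[feature] for r in rows]
        rng.shuffle(shuffled)
        permuted = [{**r, feature: v} for r, v in zip(rows, shuffled)]
        increases.append(mse(model, permuted, targets) - baseline)
    return sum(increases) / repeats


def toy_model(x):
    # Hypothetical stand-in for a trained regressor; it ignores 'batch_id'.
    return 0.1 * x["cement"] - 0.1 * x["water"] + 0.3 * x["age"]


# Hypothetical rows (illustrative only); targets match the toy model exactly.
rows = [
    {"age": 3, "cement": 300, "water": 190, "batch_id": 1},
    {"age": 7, "cement": 320, "water": 185, "batch_id": 2},
    {"age": 14, "cement": 280, "water": 200, "batch_id": 3},
    {"age": 28, "cement": 350, "water": 175, "batch_id": 4},
    {"age": 56, "cement": 310, "water": 180, "batch_id": 5},
    {"age": 90, "cement": 340, "water": 170, "batch_id": 6},
]
targets = [toy_model(r) for r in rows]

importance = {
    feature: permutation_importance(toy_model, rows, targets, feature)
    for feature in rows[0]
}
for feature, score in sorted(importance.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{feature}: {score:.3f}")
```

A feature the model never uses, such as the made-up `batch_id`, scores exactly zero, while features the model depends on score higher.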

Next, we can check out the raw values of the predictions and the Shapley values mentioned earlier:

Predicted values, ground-truth values, and Shapley values

Comparing the ground-truth values (column ‘strength’) with the predicted values (‘strength_predicted’), it is clear that the predicted values are indeed very close to the actual values. Moreover, the extent to which each variable affects the outcome is also given in red or green; red values indicate that the variable has decreased the predicted outcome (i.e. the concrete compressive strength), while green values indicate that the variable has increased the prediction. These values are generated for each specific sample, enabling highly granular analysis of the model and how each variable affects the outcome. This also helps determine how the concrete strength can be adjusted to the desired values.
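To illustrate what a Shapley value measures, the sketch below computes exact Shapley values for a single sample by averaging each feature's marginal contribution over every possible ordering of the features. This is the textbook definition measured against one baseline sample; the approximation the platform actually uses is not described here, and the model, sample, and baseline are hypothetical.

```python
from itertools import permutations


def shapley_values(model, instance, baseline):
    """Exact Shapley values of `instance` relative to `baseline`.

    Features are switched from their baseline value to the instance value
    one at a time, in every possible order, and the resulting change in the
    prediction is credited to the feature that was switched.
    """
    names = list(instance)
    totals = {name: 0.0 for name in names}
    orderings = list(permutations(names))
    for order in orderings:
        current = dict(baseline)
        previous = model(current)
        for name in order:
            current[name] = instance[name]
            value = model(current)
            totals[name] += value - previous
            previous = value
    return {name: total / len(orderings) for name, total in totals.items()}


def toy_model(x):
    # Hypothetical stand-in for a trained regressor, with one interaction term.
    return (0.08 * x["cement"] - 0.1 * x["water"] + 0.3 * x["age"]
            + 0.001 * x["cement"] * x["age"])


# Hypothetical sample to explain and reference sample (illustrative only).
instance = {"age": 28, "cement": 350, "water": 200}
baseline = {"age": 14, "cement": 280, "water": 180}

phi = shapley_values(toy_model, instance, baseline)
for name, value in phi.items():
    direction = "increases" if value > 0 else "decreases"
    print(f"{name}: {value:+.3f} ({direction} the prediction)")
```

The contributions add up to the difference between the prediction for the sample and the prediction for the baseline, which is what makes the red and green values in the table directly comparable.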

How the model predictions vary across different values of the variables can be examined further in the recently introduced PDP and ICE plots:

PDP/ICE for age

PDP/ICE for cement

An ICE plot shows the effect of a feature on the outcome by freezing all the values of a sample except for the feature being investigated. The average across all samples yields the PD plot (PDP). In the above images, it is again evident that greater age and cement quantity tend to increase the strength.

However, in the case of age, there appears to be a point where additional age does not yield further gains in strength, both in terms of the PDP (average) and in terms of the individual samples selected in the ICE. This helps us determine the amount of time required to attain the desired strength, without wasting any time for minimal to no gains.
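The construction of these plots can be sketched in a few lines. In the example below, the model is a hypothetical stand-in whose age effect levels off by design, and the three samples are made up; the point is only to show how an ICE curve is produced per sample and how the PDP is their average.

```python
import math


def ice_curve(model, sample, feature, grid):
    """Predictions for one sample as `feature` sweeps `grid`, all else frozen."""
    return [model({**sample, feature: value}) for value in grid]


def partial_dependence(model, samples, feature, grid):
    """Average of the ICE curves over all samples, one value per grid point."""
    curves = [ice_curve(model, s, feature, grid) for s in samples]
    return [sum(column) / len(column) for column in zip(*curves)]


def toy_model(x):
    # Hypothetical regressor: the age effect saturates as age grows.
    return (0.1 * x["cement"] - 0.1 * x["water"]
            + 30.0 * (1.0 - math.exp(-x["age"] / 25.0)))


# Hypothetical samples (illustrative only).
samples = [
    {"age": 7, "cement": 300, "water": 190},
    {"age": 28, "cement": 350, "water": 175},
    {"age": 56, "cement": 280, "water": 200},
]
age_grid = [3, 7, 14, 28, 56, 90, 180]

ice = [ice_curve(toy_model, s, "age", age_grid) for s in samples]
pdp = partial_dependence(toy_model, samples, "age", age_grid)

for age, value in zip(age_grid, pdp):
    print(f"age={age:>3}: average predicted strength {value:.2f}")
```

Comparing the gain between early grid points with the gain between late ones shows where the curve flattens, which is the kind of plateau discussed above.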

More information on the best model and the other models that have been trained can also be viewed in the ‘leaderboard’ tab:

Metrics of all models trained

Apart from the chosen evaluation metric, the time required to train the model and perform the predictions is given. This helps us determine if the amount of time required for the model to work will be sufficient for the given application. Note that the desired inference time can also be specified in the ‘Advanced’ tab. The hyperparameters and variables that have been used by the model are also shown, allowing us to gain a better insight into the model composition. 

Once we are satisfied with the trained model, it can be used with new data by selecting the ‘Live Model’ tab, where predictions can be generated with a new data set. Predictor values can also be input interactively in a form, and predictions are generated on the fly:

‘Live Model’ tab

An API can also be used to integrate the model into your existing application (web app, mobile app, etc.). Click on the ‘Live API’ tab and all the details of the API are shown:

‘Live API’ tab

Finally, the trained model can also be exported and used directly within Python by following the instructions in the ‘Export Model’ tab:

‘Export model’ tab
