Briefing: Fine-Scale Predictive Modeling for Dengue Risk in Malaysia
Source: Dom, N.C., Abdullah, N.A.M.H., Dapari, R. et al. Fine-scale predictive modeling of Aedes mosquito abundance and dengue risk indicators using machine learning algorithms with microclimatic variables. Sci Rep 15, 37017 (2025).
https://doi.org/10.1038/s41598-025-17191-y
Date: Received - 01 February 2025 | Accepted - 21 August 2025 | Published - 23 October 2025
Executive Summary
This briefing document synthesizes the findings of a study on the use of machine learning (ML) for fine-scale prediction of Aedes mosquito abundance and dengue risk in Kuala Selangor, Malaysia. Faced with a doubling of dengue cases in 2023, the study addresses the limitations of coarse, regional forecasting models by incorporating daily microclimatic data (temperature, relative humidity, rainfall) to improve predictive accuracy at the neighborhood level.
Key Takeaways:
- Variable Model Performance: No single machine learning algorithm—Artificial Neural Network (ANN), Random Forest (RF), or Support Vector Machine (SVM)—was universally superior. Performance was highly dependent on the specific mosquito species (Ae. aegypti vs. Ae. albopictus), the risk indicator being predicted (Aedes Index vs. Dengue Positive Trap Index), and the combination of microclimatic inputs. For instance, ANN excelled at predicting the Ae. aegypti Aedes Index, while SVM was most effective for predicting the Ae. albopictus Dengue Positive Trap Index.
- Impact of Predictor Complexity: Models incorporating multiple microclimatic variables (dual or triple combinations) generally yielded lower error metrics than single-variable models. However, increasing model complexity did not always improve accuracy and, in some cases, led to overfitting and higher prediction errors, particularly for ANN models. This highlights a critical trade-off between model complexity and predictive power.
- Moderate and Time-Lagged Climatic Influence: While statistically significant, the correlations between microclimatic variables and mosquito indices were weak to moderate (correlation coefficients ranged from -0.30 to 0.32). This indicates that microclimate alone is insufficient to fully explain mosquito population dynamics and that other unmodeled factors, such as breeding site density, vegetation, and human activity, play a crucial role. The analysis also revealed significant time lags of up to 91 days, suggesting cumulative or delayed environmental effects on mosquito life cycles.
- Species-Specific Ecological Responses: The study identified distinct ecological sensitivities between the primary dengue vectors. Aedes albopictus demonstrated a quicker response to rainfall for dengue risk (a lag of -28 days) compared to Aedes aegypti (-63 days), which aligns with its known preference for more transient breeding habitats.
Conclusion: The research validates the potential of fine-scale, microclimate-driven ML models as a valuable tool for creating proactive and targeted dengue control strategies. However, it underscores that effective implementation requires careful model selection tailored to specific species and local conditions. Future predictive systems would benefit from integrating a broader range of ecological and anthropogenic data to enhance accuracy and operational value.
--------------------------------------------------------------------------------
1. Background and Rationale
Dengue fever remains a significant and escalating public health threat in Malaysia. The Ministry of Health reported over 123,000 cases in 2023, a twofold increase from 2021, with the state of Selangor bearing the highest burden. This trend suggests that existing vector control strategies, public awareness campaigns, and regulatory enforcement face significant limitations, particularly in densely populated urban areas.
The proliferation of Aedes mosquitoes, the primary vectors for dengue, is heavily influenced by environmental conditions, especially microclimatic variables like temperature, humidity, and rainfall. Previous predictive models have often relied on coarse-resolution data from regional weather stations or satellites. This approach fails to capture the localized microclimatic variations critical to mosquito breeding at the neighborhood or household level, thereby limiting the models' utility for guiding timely and targeted interventions.
This study aimed to bridge this gap by developing and evaluating fine-scale predictive models for Aedes mosquito abundance and dengue risk indicators in Kuala Selangor, a known dengue hotspot. The core objective was to leverage machine learning algorithms to analyze daily, localized microclimatic data, thereby improving forecasting accuracy for more effective, data-driven vector control.
2. Methodological Framework
The study was conducted over 26 weeks, from February 6 to August 6, 2023, in urban and suburban districts of Kuala Selangor, a region with a tropical climate conducive to mosquito breeding.
2.1. Data Collection and Key Indicators
- Microclimatic Data: Daily mean, minimum, and maximum temperature and relative humidity, together with daily rainfall, were recorded using calibrated weather sensors.
- Entomological Data: A total of 60 Gravitrap-Outdoor Sentinel (GOS) traps were deployed in shaded, sheltered outdoor locations to capture adult female Aedes mosquitoes. Traps were serviced weekly.
- Outcome Variables (Risk Indicators):
  - Aedes Index (AI): The proportion of traps containing at least one adult female Aedes mosquito. This serves as an indicator of mosquito abundance.
  - Dengue Positive Trap Index (DPTI): The percentage of traps with at least one female Aedes mosquito testing positive for the dengue virus NS1 antigen, indicating active virus transmission risk.
- Species Analyzed: Predictions were generated for Aedes aegypti, Aedes albopictus, and the combined "Total Aedes" population.
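Both indicators reduce to simple counts over the traps serviced in a given week. The sketch below is a minimal illustration, assuming a hypothetical per-trap weekly record (`TrapRecord`, holding a female Aedes count and an NS1 result flag) and assuming the denominator is every trap serviced that week. Species-specific indices would be computed the same way from species-specific counts. Following the definitions above, AI is returned as a proportion and DPTI as a percentage.

```python
from dataclasses import dataclass


@dataclass
class TrapRecord:
    """One weekly inspection of one Gravitrap (hypothetical record layout)."""
    trap_id: str
    female_aedes: int   # adult female Aedes caught this week
    ns1_positive: bool  # True if any caught female tested NS1-positive


def aedes_index(records: list[TrapRecord]) -> float:
    """Proportion of serviced traps holding at least one adult female Aedes."""
    if not records:
        raise ValueError("no trap records")
    positive = sum(1 for r in records if r.female_aedes > 0)
    return positive / len(records)


def dengue_positive_trap_index(records: list[TrapRecord]) -> float:
    """Percentage of serviced traps with at least one NS1-positive female Aedes."""
    if not records:
        raise ValueError("no trap records")
    positive = sum(1 for r in records if r.female_aedes > 0 and r.ns1_positive)
    return 100.0 * positive / len(records)


# Usage: four traps, two with females, one of those NS1-positive.
week = [
    TrapRecord("T01", female_aedes=3, ns1_positive=True),
    TrapRecord("T02", female_aedes=0, ns1_positive=False),
    TrapRecord("T03", female_aedes=1, ns1_positive=False),
    TrapRecord("T04", female_aedes=0, ns1_positive=False),
]
print(aedes_index(week), dengue_positive_trap_index(week))  # 0.5 25.0
```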
2.2. Machine Learning Approach
- Algorithms: Three ML algorithms were selected for their strengths in modeling complex, nonlinear relationships:
  - Artificial Neural Networks (ANN): Adept at capturing subtle patterns in high-dimensional data.
  - Random Forest (RF): Robust in handling feature interactions and noisy data.
  - Support Vector Machines (SVM): Performs well with limited datasets and resists overfitting.
- Predictor Combinations: Models were trained using single-variable (e.g., temperature alone), dual-variable (e.g., temperature + rainfall), and triple-variable (all three factors) inputs to assess individual and synergistic effects.
- Data Processing:
  - Time Lags: Cross-correlation analysis was used to identify the most significant time lag (up to 91 days) between each microclimatic variable and the mosquito indices.
  - Data Standardization: Predictor variables were standardized using z-score transformation to ensure uniform scaling.
  - Data Split: The dataset was split chronologically into a 70% training set (first 18 weeks) and a 30% test set (final 8 weeks) to simulate real-world forecasting conditions.
- Model Training and Evaluation:
  - Models were trained using 10-fold cross-validation, and hyperparameters were optimized via a grid search strategy.
  - Performance was evaluated on the independent test set using two standard regression metrics: Mean Absolute Error (MAE) and Root Mean Square Error (RMSE).
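The predictor combinations, chronological split, standardization, and error metrics described above can be sketched in plain Python. This is a minimal illustration, not the authors' pipeline: the predictor names are hypothetical, and fitting the z-score mean and standard deviation on the training weeks only (so that test-period information does not leak into training) is an assumption, since the briefing does not say which portion was used. With 26 weekly rows, a 70% chronological cut yields the 18/8 week split described above.

```python
import math
from itertools import combinations
from statistics import mean, pstdev

# Hypothetical names standing in for the study's three microclimatic variables.
PREDICTORS = ("temperature", "humidity", "rainfall")


def predictor_sets():
    """All single, dual, and triple combinations of the predictors (7 in total)."""
    return [combo for size in (1, 2, 3) for combo in combinations(PREDICTORS, size)]


def chronological_split(rows, train_fraction=0.7):
    """Split time-ordered rows without shuffling: earliest rows train, latest test."""
    cut = round(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]


def fit_zscore(train_values):
    """Mean and population SD (ddof=0), computed on the training portion only."""
    mu, sigma = mean(train_values), pstdev(train_values)
    if sigma == 0:
        raise ValueError("a constant predictor cannot be standardized")
    return mu, sigma


def apply_zscore(values, mu, sigma):
    """Standardize any portion (train or test) with the training statistics."""
    return [(v - mu) / sigma for v in values]


def mae(observed, predicted):
    """Mean absolute error."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("need equal-length, non-empty sequences")
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def rmse(observed, predicted):
    """Root mean square error."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("need equal-length, non-empty sequences")
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed))


# Usage with 26 weekly rows, matching the study period.
weeks = list(range(1, 27))
train, test = chronological_split(weeks)
print(len(train), len(test))  # 18 8
```

RMSE squares each error before averaging, so it is never smaller than MAE and weights occasional large misses more heavily; a wide gap between the two indicates uneven errors.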
3. Key Findings: Microclimatic Influence and Time Lags
Cross-correlation analysis revealed statistically significant but weak to moderate associations between microclimatic variables and the mosquito indices. This suggests that while climate is an important driver, other factors not included in the models (e.g., vegetation, breeding site availability, human activity) also substantially influence mosquito populations.
- Correlation Strength: Correlation coefficients (r) ranged from -0.30 to 0.32, indicating weak to moderate explanatory power. The strongest DPTI relationship, between rainfall and Ae. aegypti DPTI (r = 0.31), explains only about 9.6% of the variance (r² ≈ 0.096); even the largest coefficient overall (r = 0.32, rainfall and Ae. aegypti AI) corresponds to only about 10%.
- Time Lag Patterns: Environmental changes demonstrated delayed effects on mosquito indices, with significant lags extending up to 91 days. These long lags may reflect cumulative environmental influences or delayed biological responses, such as the persistence of larval habitats.
- Species-Specific Responses: Ae. albopictus DPTI showed a shorter lag in response to rainfall (-28 days) than Ae. aegypti DPTI (-63 days), suggesting a quicker ecological response to rainfall events, likely due to its preference for transient breeding sites.
- Anomalies: Some correlations, such as a -1 day lag for maximum temperature, lacked a plausible biological explanation and were considered likely artifacts or transient fluctuations.
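Lag selection by cross-correlation can be sketched as a search over candidate lags that keeps the one with the largest absolute Pearson r. This is a hedged illustration: it assumes the climate series and the index are already aligned on a common daily time base (the entomological indices were collected weekly, so some resampling would be needed in practice), and it omits the significance testing the study applied. The helper names are hypothetical. Squaring r gives the share of variance explained, which is how r = 0.31 translates to roughly 9.6%.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two equal-length sequences of at least 3 values")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant series carries no correlation signal
    return sxy / math.sqrt(sxx * syy)


def best_lag(climate, index, max_lag=91):
    """Return (lag, r) with the largest |r| over lags 0, -1, ..., -max_lag days.

    A lag of -k pairs the climate value on day t - k with the index on day t,
    matching the table's convention that negative lags mean past conditions.
    """
    if len(climate) != len(index):
        raise ValueError("series must be aligned and of equal length")
    best = (0, 0.0)
    for k in range(max_lag + 1):
        x = climate[: len(climate) - k]  # climate on days 0 .. n-1-k
        y = index[k:]                    # index on days k .. n-1
        if len(x) < 3:
            break
        r = pearson_r(x, y)
        if abs(r) > abs(best[1]):
            best = (-k, r)
    return best


# Usage: a synthetic index that echoes the climate series 5 days later.
climate = [(i * i * 37 + 11 * i) % 101 for i in range(80)]
index = [0.0] * 5 + climate[:75]
print(best_lag(climate, index, max_lag=20))  # lag -5, r very close to 1
```

Because every lag up to `max_lag` is tried, some maxima will arise by chance, which is one reason an isolated result such as the -1 day maximum-temperature lag warrants caution.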
Summary of Microclimatic Associations
Mosquito category | Microclimate variable | AI: lag (days) / r | DPTI: lag (days) / r
Total Aedes | Temperature, mean (°C) | -52 / 0.21 | -51 / 0.24
Total Aedes | Temperature, min (°C) | -44 / 0.23 | -51 / 0.22
Total Aedes | Temperature, max (°C) | -40 / 0.16 | -1 / -0.27
Total Aedes | Relative humidity, mean (%) | -29 / -0.21 | -80 / -0.30
Total Aedes | Relative humidity, min (%) | -36 / -0.24 | -82 / -0.26
Total Aedes | Relative humidity, max (%) | -44 / 0.16 | -91 / 0.21
Total Aedes | Rainfall (mm) | -56 / 0.24 | -63 / 0.29
Ae. aegypti | Temperature, mean (°C) | -52 / 0.21 | -47 / 0.23
Ae. aegypti | Temperature, min (°C) | -44 / 0.23 | -72 / 0.19
Ae. aegypti | Temperature, max (°C) | -40 / 0.20 | -40 / 0.20
Ae. aegypti | Relative humidity, mean (%) | -29 / -0.24 | -73 / -0.27
Ae. aegypti | Relative humidity, min (%) | -35 / -0.28 | -82 / -0.26
Ae. aegypti | Relative humidity, max (%) | -38 / 0.19 | -91 / 0.21
Ae. aegypti | Rainfall (mm) | -56 / 0.32 | -63 / 0.31
Ae. albopictus | Temperature, mean (°C) | -59 / 0.19 | -53 / 0.27
Ae. albopictus | Temperature, min (°C) | -44 / 0.20 | -51 / 0.22
Ae. albopictus | Temperature, max (°C) | n.s. | -19 / -0.17
Ae. albopictus | Relative humidity, mean (%) | -49 / -0.18 | -80 / -0.25
Ae. albopictus | Relative humidity, min (%) | -49 / -0.22 | -21 / -0.19
Ae. albopictus | Relative humidity, max (%) | n.s. | -23 / 0.16
Ae. albopictus | Rainfall (mm) | -63 / 0.20 | -28 / 0.20
Note: A negative time lag indicates the influence of past conditions on current predictions. "n.s." denotes non-significant correlations.
4. Key Findings: Predictive Model Performance
The performance of the ML models varied significantly across species, risk indices, and predictor sets, challenging the notion that a single algorithm is consistently superior.
4.1. Aedes Index (AI) Prediction
For predicting mosquito abundance (AI), the best-performing model differed for each species category when using all three microclimatic variables.
- Total Aedes: The Random Forest (RF) model performed best with the triple-variable combination (MAE = 0.454, RMSE = 0.685).
- Ae. aegypti: The Artificial Neural Network (ANN) model achieved the lowest error with the triple combination (MAE = 0.175, RMSE = 0.248).
- Ae. albopictus: The Support Vector Machine (SVM) model was the most accurate with the triple combination (MAE = 0.242, RMSE = 0.344).
It was noted that for Ae. aegypti, some dual-variable ANN models produced higher errors than single-variable models, suggesting a risk of overfitting when adding complexity without sufficient data or regularization.
Performance Metrics for Aedes Index (AI) Prediction
Mosquito category | Microclimate predictors | ANN (MAE / RMSE) | RF (MAE / RMSE) | SVM (MAE / RMSE)
Total Aedes | Triple (ABC) | 0.458 / 0.688 | 0.454 / 0.685 | 0.474 / 0.763
Ae. aegypti | Triple (ABC) | 0.175 / 0.248 | 0.182 / 0.255 | 0.186 / 0.283
Ae. albopictus | Triple (ABC) | 0.273 / 0.349 | 0.285 / 0.367 | 0.242 / 0.344
Note: ABC = Temperature + Relative Humidity + Rainfall. Lower MAE and RMSE values indicate better performance.
4.2. Dengue Positive Trap Index (DPTI) Prediction
For predicting dengue virus presence (DPTI), SVM and RF models often outperformed ANN, particularly for Total Aedes and Ae. albopictus.
- Total Aedes: The SVM model delivered the best performance with the triple-variable combination (MAE = 3.118, RMSE = 4.164).
- Ae. aegypti: Results were mixed for the triple combination: RF achieved the lowest RMSE (MAE = 2.624, RMSE = 3.441), while SVM achieved the lowest MAE (MAE = 2.351, RMSE = 3.581).
- Ae. albopictus: The SVM model consistently outperformed other algorithms, achieving its lowest error with the triple combination (MAE = 1.723, RMSE = 2.446).
Performance Metrics for Dengue Positive Trap Index (DPTI) Prediction
Mosquito category | Microclimate predictors | ANN (MAE / RMSE) | RF (MAE / RMSE) | SVM (MAE / RMSE)
Total Aedes | Triple (ABC) | 4.553 / 6.053 | 4.365 / 5.057 | 3.118 / 4.164
Ae. aegypti | Triple (ABC) | 2.924 / 3.792 | 2.624 / 3.441 | 2.351 / 3.581
Ae. albopictus | Triple (ABC) | 2.904 / 3.460 | 2.410 / 2.896 | 1.723 / 2.446
Note: ABC = Temperature + Relative Humidity + Rainfall. Lower MAE and RMSE values indicate better performance.
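Because MAE and RMSE can rank algorithms differently, a short script can check which algorithm has the lowest error under each metric. The values below are copied from the two triple-combination tables above; `RESULTS` and `best_by_metric` are hypothetical names used only for this illustration, not part of the study.

```python
# Triple-combination (ABC) test-set errors from the tables above, as (MAE, RMSE).
RESULTS = {
    ("AI", "Total Aedes"): {"ANN": (0.458, 0.688), "RF": (0.454, 0.685), "SVM": (0.474, 0.763)},
    ("AI", "Ae. aegypti"): {"ANN": (0.175, 0.248), "RF": (0.182, 0.255), "SVM": (0.186, 0.283)},
    ("AI", "Ae. albopictus"): {"ANN": (0.273, 0.349), "RF": (0.285, 0.367), "SVM": (0.242, 0.344)},
    ("DPTI", "Total Aedes"): {"ANN": (4.553, 6.053), "RF": (4.365, 5.057), "SVM": (3.118, 4.164)},
    ("DPTI", "Ae. aegypti"): {"ANN": (2.924, 3.792), "RF": (2.624, 3.441), "SVM": (2.351, 3.581)},
    ("DPTI", "Ae. albopictus"): {"ANN": (2.904, 3.460), "RF": (2.410, 2.896), "SVM": (1.723, 2.446)},
}


def best_by_metric(scores):
    """Return (algorithm with lowest MAE, algorithm with lowest RMSE)."""
    best_mae = min(scores, key=lambda algo: scores[algo][0])
    best_rmse = min(scores, key=lambda algo: scores[algo][1])
    return best_mae, best_rmse


for (index, species), scores in RESULTS.items():
    best_mae, best_rmse = best_by_metric(scores)
    if best_mae == best_rmse:
        verdict = best_mae
    else:
        verdict = f"lowest MAE: {best_mae}, lowest RMSE: {best_rmse}"
    print(f"{index:<5}{species:<16}{verdict}")
```

For the Ae. aegypti DPTI row the two metrics disagree (SVM has the lower MAE, RF the lower RMSE), which suggests SVM's errors there are smaller on average but include some larger misses.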
5. Synthesis and Overall Model Comparison
The study's findings provide a nuanced view of ML model application for dengue risk forecasting.
- No Universal Best Model: The results indicate that the optimal model choice depends on the specific prediction target (species and risk index). Claims of general superiority for any single algorithm, such as ANN, are not supported by the evidence.
- Complexity vs. Accuracy: While multi-variable models often enhanced predictive accuracy, the improvements were sometimes marginal and came with the risk of overfitting. In certain cases, a simpler single-variable model performed better than a more complex dual-variable one.
- Role of Rainfall: Rainfall emerged as a highly influential single predictor in many scenarios, but it was not universally the most predictive variable.
- Model Reliability: Visual analysis of predicted versus observed values showed that ANN and RF models had tighter error distributions, suggesting more consistent performance. SVM predictions were more dispersed, indicating greater variability and potential for under- or overestimation in certain conditions.
6. Discussion, Limitations, and Future Directions
This study successfully demonstrates that fine-scale ML models using microclimatic data can enhance the prediction of Aedes mosquito abundance and dengue risk. The results underscore the importance of species-specific modeling and aligning model complexity with data characteristics to avoid overfitting.
Limitations Identified:
- Unmodeled Variables: The moderate predictive power of the models indicates that crucial ecological and anthropogenic factors—such as vegetation structure, breeding container density, human mobility, and vector control interventions—were not included and are necessary for a more complete picture.
- Model Convergence: Random Forest models using only rainfall as a predictor failed to converge, leading to their exclusion from the results and highlighting a potential limitation of the algorithm under certain data conditions.
- Hyperparameter Optimization: While a grid search was used for tuning, performance variability suggests that more advanced optimization strategies (e.g., Bayesian optimization) could improve model consistency.
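For reference, the grid search with k-fold cross-validation described in Section 2.2 amounts to scoring every candidate hyperparameter value on held-out folds and keeping the best. The sketch below is a minimal illustration under stated assumptions: k-nearest-neighbour regression is a hypothetical stand-in for the ANN, RF, and SVM models, and contiguous, unshuffled folds are an assumption, since the briefing does not describe how folds were formed. Bayesian optimization would replace the exhaustive loop in `grid_search` with a sequential search that picks each next candidate based on the scores seen so far.

```python
import math


def knn_predict(train_X, train_y, x, k):
    """Mean target of the k training points nearest to x (Euclidean distance)."""
    order = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))
    nearest = order[:k]
    return sum(train_y[i] for i in nearest) / len(nearest)


def kfold_indices(n, n_folds):
    """Yield (train, validation) index lists for contiguous, unshuffled folds."""
    base, extra = divmod(n, n_folds)
    start = 0
    for fold in range(n_folds):
        size = base + (1 if fold < extra else 0)
        val = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, val
        start += size


def cv_mae(X, y, k, n_folds):
    """Cross-validated MAE of k-NN regression for one hyperparameter value."""
    errors = []
    for train_idx, val_idx in kfold_indices(len(X), n_folds):
        train_X = [X[j] for j in train_idx]
        train_y = [y[j] for j in train_idx]
        for i in val_idx:
            errors.append(abs(y[i] - knn_predict(train_X, train_y, X[i], k)))
    return sum(errors) / len(errors)


def grid_search(X, y, grid, n_folds=10):
    """Score every candidate in the grid; return the best one and all scores."""
    scores = {k: cv_mae(X, y, k, n_folds) for k in grid}
    best = min(scores, key=scores.get)
    return best, scores


# Usage on toy two-predictor data (illustrative values, not study data).
X = [(t / 10, (t * 7 % 13) / 13) for t in range(40)]
y = [2 * a + b for a, b in X]
best_k, scores = grid_search(X, y, grid=[1, 3, 5, 7])
print(best_k, {k: round(s, 3) for k, s in scores.items()})
```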
Future Directions:
The study concludes that while these models provide a strong data-driven basis for proactive dengue control, future research should focus on integrating additional predictors to improve their practical value. Incorporating data on land cover, human population density, and ongoing control efforts would likely enhance model accuracy and support more effective integrated vector management (IVM) strategies.