Article

Machine Learning-Driven Best–Worst Method for Predictive Maintenance in Industry 4.0

Eyad Megdadi 1,*, Azza Mohamed 2 and Khaled Shaalan 1

1 Faculty of Engineering and IT, British University in Dubai, Dubai 345015, United Arab Emirates; [email protected]
2 Faculty of Computer and Information Sciences, Ain Shams University, Cairo 11566, Egypt; [email protected]
* Correspondence: [email protected]

Academic Editors: Yi-Xin Su, Wentao Mao and Jie Liu
Received: 2 September 2025; Revised: 1 November 2025; Accepted: 4 December 2025; Published: 8 December 2025
Citation: Megdadi, E.; Mohamed, A.; Shaalan, K. Machine Learning-Driven Best–Worst Method for Predictive Maintenance in Industry 4.0. Automation 2025, 6, 91. https://doi.org/10.3390/automation6040091
Copyright: © 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Abstract
The rapid proliferation of Industry 4.0 technologies has created an urgent need for intelligent and reliable predictive maintenance (PdM) systems. While multi-criteria decision-making (MCDM) frameworks like the Best–Worst Method (BWM) offer structured approaches for prioritizing maintenance tasks, their traditional reliance on subjective expert opinion limits their scalability and adaptability in dynamic industrial settings. This study addresses these limitations by introducing a robust, data-driven framework that integrates machine learning (ML) with BWM. While prior work has explored ML for fault detection/classification and hybrid MCDM + ML approaches, our innovation lies in automating BWM weight calculation via ML-derived feature importances, transforming tacit expert knowledge (traditionally subjective) into explicit, data-driven criteria weights aligned with Knowledge Management (KM) principles. The proposed methodology moves beyond a single-model proof-of-concept to present a comprehensive validation blueprint for industrial deployment. The framework’s efficacy is demonstrated using the standard Case Western Reserve University (CWRU) dataset, where rigorous cross-validation and statistical significance testing identified the optimal model, offering a compelling balance of high stability and efficiency for adaptive systems. Furthermore, simulations demonstrated the framework’s real-time viability, with low processing latency, and its resilience to concept drift through an adaptive retraining strategy. By integrating the empirically validated model’s feature importances into the BWM, this work establishes an objective, data-driven, and adaptive system for prioritizing maintenance, thereby advancing the transition toward autonomous and self-optimizing industrial ecosystems.

Keywords: Industry 4.0; predictive maintenance; Best–Worst Method (BWM); Machine Learning (ML); bearing fault diagnosis

1. Introduction
The fourth industrial revolution, or Industry 4.0, has catalyzed a fundamental transformation in manufacturing by embedding intelligent technologies into core processes [1]. A critical enabler of this paradigm is the Industrial Internet of Things (IIoT), which facilitates the collection of vast, high-velocity data streams from machinery via interconnected sensors. This data influx has not only rendered traditional reactive and time-based maintenance strategies inefficient but also created an imperative for advanced, data-driven approaches. Predictive maintenance (PdM) stands at the forefront of this evolution, promising to minimize downtime and extend asset lifecycles by forecasting failures. However, the sheer volume and complexity of IIoT data, coupled with the dynamic nature of industrial environments, present significant challenges that demand more than simple predictive models.
It is this context that necessitates the development of frameworks that are not only accurate but also statistically robust, economically aware, and adaptive to changing conditions. To address these limitations, this study proposes a Machine Learning (ML)-enhanced BWM framework for predictive maintenance in Industry 4.0 settings. By combining the structured decision-making capability of BWM with the adaptability and predictive power of ML, the proposed approach enables continuous updating of maintenance priorities based on live operational data. This integration reduces reliance on subjective human input, optimizes resource allocation, and supports adaptive maintenance strategies tailored to evolving conditions. The primary objectives of this research are threefold:
1. To develop a decision-support framework that merges ML and BWM for the real-time prioritization of maintenance activities.
2. To enhance the predictive accuracy of PdM models using ML-driven insights.
3. To validate the practical effectiveness of the proposed framework through case studies and simulations in industrial environments.
By bridging the gap between established decision-making methodologies and real-time data analytics, this work contributes to the advancement of smart maintenance and supports the transition toward autonomous, self-optimizing industrial ecosystems. Leveraging the CWRU bearing fault dataset [2], the framework demonstrates high predictive performance—achieving 96% accuracy in fault detection—while offering scalability across diverse industrial sectors. Ultimately, this approach underscores the transformative potential of integrating ML with BWM to deliver resilient, sustainable, and cost-effective maintenance strategies, meeting the growing demands of Industry 4.0.

2. Literature Review
2.1.
Knowledge Management Systems (KMSs)
Knowledge Management Systems (KMSs) are now implemented across industries, making enterprises more competitive and efficient by storing, learning from, and exchanging information within organizations. These systems are designed to facilitate the reuse of the right knowledge and the monitoring of working processes, which are crucial to automating and enhancing the intelligence of various services, such as power grid management [3]. Using digital tools with KMSs is particularly important in the Fourth Industrial Revolution (4IR) period, as it allows for the sharing of knowledge internally and externally, fostering a culture of learning and adaptation to new technologies [4].

2.2. Predictive Maintenance
Predictive maintenance is part of each industry’s strategy to enhance equipment performance, reduce failure possibilities, and cut operational costs through AI-driven analytics. AI-driven predictive maintenance has achieved an 88.5% accuracy rate in fault prediction across various manufacturing environments, significantly reducing unplanned downtime by up to 72% and extending machine lifetime by 25–30%. This integration of advanced analytics with IoT sensor networks marks a shift from reactive to proactive maintenance strategies, enhancing equipment reliability and operational performance in manufacturing facilities [4]. Predictive maintenance tools are essential for collaborative IoRT technologies and autonomous mobile robots in cloud computing infrastructures, aiding in task allocation and collision avoidance and enhancing preventive maintenance strategies [5]. Implementing predictive maintenance enables early fault prevention through machine data analysis, improves system reliability, and ultimately reduces operational costs. Machine learning algorithms are employed for fault classification, and IoT data is utilized for effective predictive maintenance strategies [6].
By monitoring anomalies in machine symptoms such as vibration and temperature, predictive maintenance systems can prevent failures and optimize planned maintenance costs while ensuring effectiveness [7]. The integration of big data analytics enhances predictive maintenance capabilities in flexible manufacturing environments, supporting proactive maintenance and quality control and leading to increased efficiency [8]. Traditional and classical maintenance approaches are being replaced by predictive maintenance strategies that utilize AI and IoT technologies. Predictive maintenance forecasts equipment failures, allowing timely interventions to optimize maintenance costs and extend machinery lifespan. This shift from preventive to predictive methods highlights the importance of data-driven approaches in enhancing maintenance efficiency compared to traditional preventive methods [9]. The integration of smart sensors and data analytics allows energy companies to monitor asset performance, supporting preventive maintenance efforts and emphasizing the role of IIoT in various sectors, including maintenance tasks and applications [10].

2.3. Industry 4.0 Environment
2.3.1. The Impact of IoT on Predictive Maintenance
Industry 4.0 has impacted predictive maintenance within the manufacturing sector, largely due to the integration of Internet of Things (IoT) technologies. As discussed by [11], the ongoing industrial revolution has made extensive monitoring of complex industrial systems possible at affordable costs, aligning with the core principles of Industry 4.0. This has given rise to the demand for new maintenance solutions that leverage IoT data to drive efficiency and predictive maintenance approaches, which are necessary for modern manufacturing use cases. Proactive maintenance approaches are feasible as a result of being able to monitor and analyze machine data in real time, reducing downtime and enhancing the overall reliability of industrial systems.

2.3.2.
Integration of Advanced Technologies in Maintenance Activities
Industry 4.0 uses IoT devices to enhance data collection and operational efficiency in maintenance activities, as discussed by [7], which presents both challenges and opportunities for predictive maintenance in improving safety, quality, and productivity in manufacturing. The approach enables autonomous analysis of machine symptoms, reducing reliance on human interpretation and enhancing the accuracy of predictive maintenance. Artificial intelligence plays a crucial role in optimizing equipment management and reducing downtime, emphasizing the importance of integrating advanced technologies for effective maintenance strategies in smart manufacturing. Overall, Industry 4.0 facilitates proactive maintenance, maximizing equipment lifespan and minimizing operational costs.

2.3.3. Digital Transformation and Decision-Making in Industry 4.0
Industry 4.0 represents a digital transformation in industrial processes, emphasizing monitoring and control devices for equipment efficiency, as noted by [12]. The transition to Industry 4.0 aims to reduce operational costs through improved information exchange between systems and devices. Machine learning techniques are a core concept of Industry 4.0 for fault detection and IIoT device data diagnosis, enabling the development of high-accuracy models that detect and classify faults. Also, ref. [13] recognize the emphasis on data-driven decision-making in Industry 4.0, which increases operational efficiency through smart manufacturing systems and real-time analytics. The advent of Industry 4.0 renders conventional decision-making approaches outdated through the real-time flood of data, necessitating that quick decision-making practices grounded in machine learning and systematic methods like AHP be effectively embraced.

2.4.
Best–Worst Method (BWM) Decision-Making Technique
The Best–Worst Method (BWM) is a multi-criteria decision-making (MCDM) technique used for comparison tasks. Ref. [14] describes BWM in five steps, starting with the definition of the evaluation criteria. Evaluators identify the best and worst criteria and express preferences using a scale such as 1 to 9. This process results in a Best-to-Others vector and an Others-to-Worst vector, which represent the evaluators’ preferences. Ref. [14] also demonstrates a practical application of BWM in supplier selection, showcasing its potential to enhance decision-making satisfaction by providing a clear and structured framework for evaluating multiple criteria. Applications of BWM outside of maintenance—such as innovation management in aerospace industries—demonstrate its reliability in prioritizing complex, interdependent criteria, reinforcing its suitability for adaptive decision-support systems. The BWM is a powerful tool for multi-criteria decision making and for defining criteria weight coefficients [15]. The method is often integrated with other MCDM approaches, as reflected in applications across various fields, including logistics and manufacturing. Ref. [16] discusses the fuzzy version of BWM, which utilizes triangular fuzzy numbers for weight determination, and its implementation within a holistic Failure Mode and Effects Analysis (FMEA) framework to address traditional limitations. This adaptation underscores BWM’s flexibility and effectiveness in diverse decision-making contexts.
IIoT systems in industrial settings generate high-frequency vibration data from rotating machinery (e.g., bearings), requiring real-time decision-making to detect faults, assess risk, and schedule maintenance proactively. Traditional fault detection models (e.g., classification) often output binary or categorical labels (e.g., “healthy” vs.
“faulty”), but real-world maintenance demands quantitative prioritization: which faults are most critical, and how do they evolve over time? BWM aligns with IIoT needs by translating feature importance (from ML models) into actionable weights, enabling the computation of priority scores for each fault sample. These scores quantify severity, allowing systems to rank faults in real time and guide resource allocation (e.g., immediate inspection for high-priority faults). Unlike static ranking methods, BWM’s weight calculation is flexible, making it suitable for dynamic IIoT environments where feature relevance may shift with operational conditions.

2.5. ML-Driven BWM and Real-Time Decision-Making
Newly developed technologies such as machine learning have enhanced real-time decision-making processes across industries. The research article of [11] presented how to utilize IoT sensor data for predictive maintenance in machinery systems, which can empower decision-makers by providing reliable and real-time fault classification results through clustering-based processes that are adaptable to complex systems and effective even with limited historical data. Similarly, ref. [17] discussed the role of AI-driven predictive maintenance systems in manufacturing, emphasizing the high sensitivity and specificity of machine learning models in detecting equipment anomalies. These systems integrate with ERP systems to automate work order generation, further optimizing maintenance workflows and enabling proactive strategies through neural network-based anomaly detection. Through machine learning and IoT integration in predictive maintenance, real-time maintenance decisions can be made more accurately, as [7] explained how IoT devices facilitate maintenance decisions by collecting data that enables real-time and proactive strategies through machine learning algorithms. In the same direction, ref.
[12] presented a methodology for anomaly detection and classification in IIoT, envisioning distributed deployment for fault detection and supporting real-time operational efficiency through AI-enabled hardware and federated learning. Ref. [13] introduce a decision-support system that integrates machine learning with multi-criteria decision-making for real-time performance analysis, emphasizing the dynamic nature of decision-making in Industry 4.0. The importance of real-time analysis of data to enhance decision-making is also brought out by various studies. Ref. [18] emphasize the importance of machine learning in behavior monitoring systems, citing the importance of specifying system parameters for maintaining model consistency over time. Ref. [5] discuss the use of machine learning in autonomous decision-making in IoT-based robot systems, utilizing deep neural networks for perception and decision-making. Ref. [8] discussed the integration of machine learning with IoT and big data analytics in intelligent monitoring systems, optimizing production processes and minimizing downtime. Ref. [19] discuss the creation of hybrid-augmented intelligence systems that combine AI and human expertise for improved maintenance outcomes. The research article of [20] emphasizes the use of AI in sustainability accounting, maximizing resource allocation and cost savings through real-time decision-making. Ref. [21] present a method for real-time decision-making for AIoT systems that assists in service management with less human intervention. Lastly, ref. [22] discuss real-time processing of data in operational environments, employing LSTM networks for time series forecasting and integration with cloud computing for efficient data processing.

2.6.
Theoretical KM Models of Knowledge Creation
One of the KM theories, Nonaka and Takeuchi’s (1995) [23] SECI (Socialization, Externalization, Combination, Internalization) model of knowledge creation, offers a rich paradigm for examining the dynamic interactions between tacit and explicit knowledge, particularly for developing predictive maintenance in Industry 4.0. The SECI model contends that organizational knowledge is created in a continuous spiral of interplay between explicit and tacit knowledge, translating individual learnings into collective organizational capital [23]. The model serves as a theoretical framework for how human–AI collaboration in predictive maintenance can sustain continuous knowledge creation.
• Socialization (Tacit to Tacit): the sharing of tacit knowledge through mutual experience, observation, imitation, and practice. In predictive maintenance, it occurs when experienced maintenance engineers pass on their intuitive diagnostic skills and “feel” for machine behavior through on-the-job instruction and group problem-solving seminars [23].
• Externalization (Tacit to Explicit): a very important step in which tacit knowledge is articulated into explicit forms so that it becomes shareable. For ML-based predictive maintenance, this could be an expert engineer’s qualitative remarks about machine failures being translated into organized data features or ML algorithmic rules. It also involves capturing expert decision-making rationales, which are subsequently utilized to create BWM criteria or interpret ML outcomes [23].
• Combination (Explicit to Explicit): this mode is directed towards integrating and structuring currently available explicit knowledge. For the purpose of this report, it concerns integrating ML model output (e.g., fault detection performance and feature importances), BWM priority scores, CWRU dataset features, and historical maintenance records.
The integration creates new, fuller explicit knowledge, such as optimized maintenance schedules or better fault classification rules [23].
• Internalization (Explicit to Tacit): this entails explicit knowledge being absorbed and assimilated into an individual’s tacit knowledge through “learning-by-doing”. Maintenance staff, by executing the ML-based BWM recommendations and observing their impacts, internalize such explicit knowledge, refining their own intuitive expertise and developing new tacit skills for preventive maintenance [23].
Organizations require flexible KM frameworks that can adapt to different levels of Industry 4.0 adoption, ensuring that the proposed ML-driven BWM system can evolve and refine its knowledge capture, sharing, and application processes as the organization’s capabilities mature.

2.7. ML-Driven BWM: From Tacit Expertise to Explicit Knowledge
The shift from applying subjective expert opinion in the Best–Worst Method (BWM) to applying Machine Learning (ML) to calculate the criteria weights is intricately associated with basic Knowledge Management (KM) principles, namely the transformation and application of knowledge within an organization. Classically, BWM is based on human experts’ tacit knowledge and subjective assessment to rank maintenance strategies. This tacit knowledge, accumulated over years of work and intuition, is usually abstract and hard to explain, codify, or transmit. An organization that relies heavily on such personal, uncodified expertise is susceptible to risks such as loss of knowledge through personnel turnover or variability in decision-making; the comparison is presented in Table 1. The ML-enhanced approach overcomes this KM barrier by externalizing (converting tacit to explicit) and codifying aspects of this expert knowledge.
This is how:
• Explicit Knowledge Creation: The ML model, i.e., the Random Forest classifier, takes enormous amounts of raw sensor data and, through its training, calculates “feature importances”. Feature importances quantify which machine parameters (e.g., standard deviation, root mean square, mean, and kurtosis) are most significant for predicting faults. This output is a form of explicit knowledge—it is codified, recorded, and easily transferable. It takes an expert’s intuitive “sense” of a machine’s state and translates it into measurable readings that others can read and apply.
• Knowledge Repositories and Accessibility: These ML-derived insights, and the data that produce them, are stored in knowledge repositories or KMSs. This makes the knowledge readily available for follow-up analysis, tuning of the models, and onboarding of new staff. Instead of knowledge being locked inside the experts themselves, it becomes a collective organizational asset that enhances data-driven decision-making.
• Enhanced Organizational Learning: Through the systematic quantification of the relative importance of various factors, ML-based BWM provides an impartial and standardized basis for making decisions, reducing reliance on human bias. This fosters organizational learning by providing a clear, fact-based rationale for setting maintenance priorities. Maintenance personnel can internalize this formalized knowledge by observing the model’s performance and iteratively refining their own understanding of machine operation, creating a self-sustaining spiral of knowledge building (as defined by the SECI model) in which AI enhances human knowledge and vice versa.
• Reduced Dependence on Tacit Knowledge (for routine tasks): ML cannot substitute all forms of tacit knowledge (e.g., an engineer’s subtle problem-solving skill in non-routine scenarios) but definitely reduces the dependence on it for routine task planning.
It allows human experts to reserve their valuable tacit knowledge for solving more complex, non-routine problems that still require human intuition and problem-solving.
In essence, the ML-driven BWM acts as a valuable KM tool, converting critical tacit knowledge into explicit, actionable knowledge that enhances the organization’s capability to harvest, distribute, and apply knowledge for more effective and responsive predictive maintenance.

Table 1. Characteristics and Management of Explicit vs. Tacit Knowledge in Predictive Maintenance.

Explicit knowledge
- Characteristics: Codifiable, easy to transfer, documented, objective, systematized.
- Examples in predictive maintenance: Sensor data, ML model outputs, maintenance schedules, technical manuals, fault codes, performance reports.
- Management strategies/tools: KMSs, databases, documentation systems, AI/ML for capture/analysis, standard operating procedures (SOPs), digital twins.

Tacit knowledge
- Characteristics: Personal, experiential, intuitive, difficult to formalize, subjective, context-dependent.
- Examples in predictive maintenance: Engineer’s diagnostic intuition, troubleshooting skills, best practices from experience, understanding machine “feel,” problem-solving heuristics.
- Management strategies/tools: Mentoring, Communities of Practice (CoPs), storytelling, apprenticeship, socialization, expert referral services, after-action reviews.

2.8. Broader Applications and Recent Advances in Industry 4.0
The principles of data-driven optimization central to predictive maintenance are mirrored in recent advancements across the broader Industry 4.0 landscape, particularly in logistics and supply chain management. These domains increasingly leverage real-time data and machine learning to enhance efficiency, reduce waste, and improve decision-making, moving beyond traditional, static operational models [24–26]. An example is cold chain logistics management, in which the maintenance of specific temperature and humidity levels is critical to prevent product degradation.
As with machine health monitoring, this field utilizes IoT sensors for real-time continuous monitoring of environmental conditions within transport containers. This data stream allows for immediate detection of breakage or temperature deviation, enabling proactive intervention to avoid waste. This philosophy is shared across Industry 4.0: using sensor data to predict and prevent “failures,” whether of a machine component or an at-risk product. A second illustration of this tendency is Real-Time Location Systems (RTLSs), which leverage technologies like Bluetooth Low Energy (BLE) to gauge the precise location of assets, vehicles, and materials within a plant. By delivering a real-time digital representation of physical operations, RTLSs enable the optimization of internal logistics, the reduction in equipment search times, and the real-time detection of process disruptions. This application of IoT technology to create a “digital twin” of the area of operation for data-informed decision-making parallels the manner in which PdM uses sensor data to create a digital health record of an asset [3]. Furthermore, machine learning is being applied directly to solve complex logistical problems, such as the Product Allocation Problem (PAP) in warehouses. Research has found that the use of Artificial Neural Networks (ANNs) and clustering algorithms for the analysis of picking data can lead to more efficient product placement strategies. By considering several factors simultaneously—a challenge for traditional approaches—these ML models can optimize warehouse layouts in a manner that significantly minimizes picking routes and order fulfillment times, with studies demonstrating a 10% reduction in picking time in certain instances.
These examples from logistics and supply chain management illustrate the state of the art, in which the intersection of real-time data and machine learning is a common thread driving efficiency and transforming operations across industrial sectors.

3. Implementation Methodology
The findings reported here are practical, derived from applying the method to a real dataset.

3.1. The Methods
The practical method starts with a predictive model for identifying various types of bearing faults using machine learning techniques. Using a Random Forest classifier, the approach categorizes different fault states, such as ball faults, inner race faults, and outer race faults, from pre-processed time series data. The model is trained and tested on a dataset of attributes extracted from vibration signals in order to achieve high accuracy and robust performance in fault detection for predictive maintenance purposes and to ensure mechanical system reliability.
The flow diagram in Figure 1 outlines a workflow that transforms raw sensor data into a prioritized maintenance schedule using machine learning. The process begins by preparing historical data and training a Random Forest model to accurately classify different types of equipment faults. The key innovation lies in repurposing this model beyond simple prediction; the system extracts the “feature importances”, which reveal which sensor readings are most critical for identifying a fault. These importances are then used as objective, data-driven weights to calculate a “priority score” for each component, effectively quantifying its operational risk based on its live data. By aggregating and ranking these priority scores, the workflow provides an actionable list of which components are in most urgent need of attention. This shifts the maintenance strategy from being reactive or following a rigid schedule to being predictive and risk-based.
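The workflow just described can be sketched end to end in code. The following is a minimal, self-contained illustration with synthetic vibration windows and a reduced feature set (RMS, standard deviation, kurtosis, and crest factor); it is not the authors’ exact pipeline, only a sketch of the same idea: extract features, train a Random Forest, reuse its feature importances as objective weights, and rank samples by a weighted priority score.

```python
import numpy as np
from scipy.stats import kurtosis
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import MinMaxScaler

rng = np.random.default_rng(42)

def extract_features(window: np.ndarray) -> np.ndarray:
    """Time-domain statistics of one vibration window (a subset of the
    features used in the paper: rms, sd, kurtosis, crest factor)."""
    rms = np.sqrt(np.mean(window ** 2))
    return np.array([rms, window.std(), kurtosis(window),
                     np.abs(window).max() / rms])

def make_window(faulty: bool, n: int = 2048) -> np.ndarray:
    """Synthetic stand-in for a CWRU recording: healthy = Gaussian noise,
    faulty = noise plus periodic impulses (raises kurtosis/crest factor)."""
    w = rng.normal(0.0, 1.0, n)
    if faulty:
        w[::200] += rng.normal(8.0, 1.0, len(w[::200]))
    return w

y = rng.integers(0, 2, 300)                        # 0 = healthy, 1 = faulty
X = np.array([extract_features(make_window(bool(label))) for label in y])

clf = RandomForestClassifier(n_estimators=300, max_features="sqrt",
                             random_state=0).fit(X, y)

# Feature importances serve as objective, BWM-style weights (they sum to 1).
weights = clf.feature_importances_

# Priority score: weighted sum of min-max scaled features, ranked descending.
priority = MinMaxScaler().fit_transform(X) @ weights
ranking = np.argsort(priority)[::-1]               # most urgent samples first
```

Replacing `make_window` with real pre-processed CWRU windows yields the same pipeline shape: the ranking of `priority` is the actionable maintenance list described above.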
The primary business value lies in optimizing the allocation of limited resources—time, budget, and personnel—by focusing efforts on the most critical issues first. This data-driven approach helps to minimize costly operational downtime, prevent catastrophic failures, and ultimately create a more efficient and safer operational environment. The core principles are as follows:
• Identify Criteria: Define the features (e.g., vibration metrics such as RMS and kurtosis) that influence the decision (e.g., fault severity ranking).
• Select Best and Worst Criteria: Identify the most (best) and least (worst) impactful criteria based on domain knowledge or data-driven insights.
• Pairwise Comparisons: For each criterion, compare its importance to the best and worst criteria using a ratio scale (e.g., 1 = equally important, 9 = extremely more important).
• Calculate Weights: Derive normalized weights for each criterion using these comparisons, ensuring consistency and alignment with real-world priorities.
• In our work, BWM is adapted to quantify fault severity by integrating ML-derived feature importances as weights, bypassing manual pairwise comparisons and enabling data-driven prioritization.

Figure 1. Flow chart of a data-driven framework for fault prioritization.

3.2. CWRU Dataset
The CWRU bearing dataset from [2] is a prominent benchmark in the field of mechanical fault diagnosis. It consists of vibration signal data collected from bearings under various operating conditions, including different levels of fault severity and fault types, such as inner race, outer race, and ball defects. The dataset is widely used for training and testing machine learning models for predictive maintenance and fault detection because it provides a full set of real-world data that simulates common bearing failure scenarios.
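For illustration, recordings stored in CWRU-style .mat files can be loaded and segmented into fixed-length windows as sketched below. The filename and variable key are placeholders (the demo writes its own synthetic file), since key names vary across versions of the archive:

```python
import numpy as np
from scipy.io import loadmat, savemat

def load_windows(path: str, key: str, window: int = 2048) -> np.ndarray:
    """Load one vibration channel from a .mat file and split it into
    non-overlapping windows of `window` samples each."""
    signal = loadmat(path)[key].squeeze()
    n_windows = len(signal) // window
    return signal[: n_windows * window].reshape(n_windows, window)

# Self-contained demo: write a synthetic 10,000-sample signal first,
# standing in for a real recording such as "I07_1600.mat".
savemat("demo.mat", {"DE_time": np.sin(np.linspace(0, 100, 10000))})
windows = load_windows("demo.mat", "DE_time", window=2048)
print(windows.shape)  # (4, 2048)
```

Each window is then reduced to the statistical features listed in Section 3.2.2 before model training.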
The data is often pre-processed to extract meaningful features, which are then utilized for fault classification model training. The dataset used here contains 2300 instances with 10 features: statistical values (max, min, mean, sd, rms, skewness, kurtosis, crest, and form) plus the target variable, which is the fault type. Data was captured using high-sensitivity sensors mounted on the motor housing at the drive end (DE) and non-drive end (NDE), sampled at 12,000 Hz. The test system included a 2-horsepower motor coupled to a load via a torque sensor, controlled by a variable-frequency drive to generate signals at three rotational speeds: 1600 RPM, 1750 RPM, and 1900 RPM. Faults were induced in Model 6205 deep groove ball bearings, including inner race, outer race, roller, and combined defects, with diameters ranging from 0.007 inches to 0.028 inches to simulate progressively severe damage. Stored in MATLAB (.mat) files with names detailing fault specifics (e.g., “I07_1600.mat” denotes an inner race fault of 0.007-inch diameter at 1600 RPM), the dataset includes time-domain signals and metadata (fault type, size, and speed). It is widely used to validate algorithms for anomaly detection, fault type/severity classification, and prognostic modeling. Available via the university’s official Bearing Data Center and some academic platforms, the dataset remains critical but has limitations: it reflects controlled lab conditions, potentially missing real-world variabilities such as temperature or load changes, and focuses on specific fault scenarios rather than all possible bearing failures.

3.2.1. Metadata Features (Inherent to the Dataset)
These are pre-defined attributes included in the dataset files to describe experimental conditions:
• Fault Type: Explicit labels for inner race, outer race, roller, or combined (e.g., inner race + roller, “IR_B07_1600”) faults.
• Fault Size: Diameter of artificially induced faults: 0.007″, 0.014″, 0.021″, or 0.028″ (progressively severe).
• Rotational Speed (RPM): Motor speeds at which data was collected: 1600, 1750, or 1900 RPM.
• Sensor Location: Bearing vibration measured at the drive end (DE) or non-drive end (NDE) of the motor.
• Time-Domain Signal Features: Extracted from the raw data, as detailed below.

3.2.2. Statistical and Waveform Characteristics Derived Directly from Time-Series Vibration Signals
• Root Mean Square (RMS): Measures signal energy; often correlates with fault severity.
• Mean Value: Average amplitude of the signal.
• Variance: Spread of signal amplitudes around the mean.
• Peak Value: Maximum amplitude in the time series.
• Peak-to-Peak Value: Difference between maximum and minimum amplitudes.
• Kurtosis: Indicates the presence of impulsive signals (high kurtosis suggests faults).
• Skewness: Measures asymmetry in the signal distribution.
• Crest Factor: Ratio of peak value to RMS; highlights transient impulses.
• Zero-Crossing Rate: Number of times the signal crosses zero, reflecting frequency content.

3.2.3. Frequency-Domain and Time–Frequency Features (Extracted via Signal Processing)
Transformations (e.g., FFT and wavelet) and analysis of the frequency content identify fault-related patterns:
• FFT Magnitude Spectrum: Converts time-domain signals to the frequency domain, revealing dominant fault-related frequencies (e.g., inner race, outer race, and roller pass frequencies).
• Fault-Specific Frequencies: Calculated using bearing geometry.
• Spectral Kurtosis: Identifies non-stationary impulsive components in frequency bands (useful for fault localization).
• Envelope Spectrum: Extracted via Hilbert transform to isolate high-frequency impacts (enhances fault signature detection).
• Wavelet Transform Coefficients: Time–frequency analysis capturing transient faults across scales.

3.3.
Comparative Model Evaluation
The first phase involved a comprehensive performance benchmark of the six selected ML and DL models. Each model was trained on the scaled training data and evaluated on the unseen scaled test data. The key hyperparameters for each model were set based on common practices to ensure a fair comparison:
• Random Forest: n_estimators = 300, max_features = ‘sqrt’.
• SVM: Radial basis function (RBF) kernel with default gamma = ‘scale’.
• Gradient Boosting: n_estimators = 300.
• XGBoost: n_estimators = 300, eval_metric = ‘mlogloss’.
• CNN: A 1D-CNN architecture with two convolutional layers followed by dense layers, trained for 50 epochs.
• LSTM: A network with two LSTM layers followed by dense layers, trained for 50 epochs.

Model performance was assessed using a suite of standard classification metrics: Accuracy, Precision (macro-averaged), Recall (macro-averaged), and F1-Score (macro-averaged). In addition to predictive performance, computational efficiency was evaluated by measuring both the total Training Time and the average per-sample Inference Time, which are critical considerations for deployment in real-time systems.

While the pursuit of maximum accuracy, as demonstrated by [27], remains a vital research direction, the practical deployment of predictive maintenance systems in dynamic, real-world industrial environments necessitates a broader set of evaluation criteria. The present study builds upon the foundational use of RF on the CWRU dataset but proposes a paradigm shift in the optimization focus—from achieving the highest possible static accuracy to ensuring long-term operational viability. For an adaptive system designed to handle the inevitable challenge of concept drift through periodic retraining, the definition of an “optimal” model is fundamentally different.

3.4.
Phase 2: Statistical Validation
To ensure the reliability of the performance metrics and move beyond the potential bias of a single train–test split, the top-performing traditional ML models (Random Forest, SVM, and Gradient Boosting) were subjected to a more rigorous statistical validation process.
• 10-Fold Stratified Cross-Validation: The training dataset was partitioned into 10 folds. The models were trained 10 times, each time using 9 folds for training and the remaining fold for validation. This process yields 10 independent accuracy scores, providing a more robust estimate of the model’s generalization performance. The mean accuracy and standard deviation across the 10 folds were calculated.
• Confidence Intervals: A 95% confidence interval for the mean accuracy was computed from the cross-validation scores. This interval provides a probable range for the model’s true performance on unseen data.
• Paired t-tests: To determine whether the performance differences between the top models were statistically significant or merely due to random chance, paired t-tests were conducted on their cross-validation scores. A p-value below a significance level of α = 0.05 indicates a statistically significant difference in performance.

3.5. Phase 3: Dynamic Environment Simulation
To assess the framework’s suitability for deployment in dynamic industrial settings, two simulations were conducted.
• Real-Time Streaming: A real-time data stream was simulated by processing the test set in consecutive batches of 50 samples. The model’s processing latency and cumulative accuracy for each batch were recorded. This test evaluates whether the model can maintain its performance under an unbroken flow of data and whether its inference rate meets the requirements of real-time applications.
• Concept Drift: A concept drift scenario was simulated by introducing Gaussian noise to the feature values of the test data after a certain number of batches.
The performance of two models was compared: a static model trained only once on the initial training data, and an adaptive model that was incrementally retrained on new batches of data. This simulation tests the framework’s resilience to changes in the underlying data distribution, a common challenge in long-term industrial deployments [28].

3.6. ML-Enhanced BWM Prioritization
The final phase integrates the validated ML model into the BWM framework. The feature importances—a measure of how much each input feature contributes to the model’s predictions—are extracted from the best-performing, fully validated model. These objective, data-derived importance scores are then used directly as the criteria weights in the BWM. This replaces the subjective expert opinions of the traditional BWM with an empirical, data-driven foundation. The framework then calculates a priority score for each fault instance and aggregates these scores to produce a ranked list of bearings, identifying which equipment requires the most urgent maintenance attention.

4. Results
4.1. Predictive Accuracy
The six predictive models evaluated on the holdout test set show a clear performance hierarchy. As shown in Table 2, the gradient boosting-based algorithms achieved the top accuracy, with both Gradient Boosting and XGBoost scoring 95.22%, outperforming the Random Forest baseline of 94.78% identified in the preliminary study. The deep learning models gave competitive but slightly lower performance: the CNN stood at 94.35% and the LSTM reached 92.75%, while the SVM model trailed at 91.88%.

Table 2. Model Comparison Summary.
| Rank | Model | Accuracy | Precision | Recall | F1-Score | Training Time (s) | Inference Time (ms) | Parameters |
|---|---|---|---|---|---|---|---|---|
| 1 | Gradient Boosting | 0.952174 | 0.954804 | 0.952174 | 0.952386 | 30.609346 | 0.053608 | 900 |
| 2 | XGBoost | 0.952174 | 0.955118 | 0.952174 | 0.952420 | 1.577739 | 0.015943 | 300 |
| 3 | Random Forest | 0.947826 | 0.951354 | 0.947826 | 0.948112 | 1.507195 | 0.267425 | 35,358 |
| 4 | CNN | 0.943478 | 0.949047 | 0.943478 | 0.944134 | 19.634354 | 0.600775 | 42,762 |
| 5 | LSTM | 0.927536 | 0.928836 | 0.927536 | 0.927571 | 38.539642 | 1.884990 | 32,074 |
| 6 | SVM | 0.918841 | 0.925225 | 0.918841 | 0.917529 | 0.384532 | 0.145292 | 658 |

While both Gradient Boosting and XGBoost produced identical accuracy values, their training efficiency varied widely. XGBoost completed training in only 1.58 s, close to the 1.51 s of Random Forest, while standard Gradient Boosting required 30.61 s, almost 20 times longer. The contrast highlights the computational optimizations inside the XGBoost framework. In terms of inference performance, all models had extremely low prediction latency and are therefore suitable for real-time applications. XGBoost was the fastest, at an average of 0.016 milliseconds per sample. Table 2 summarizes detailed performance and efficiency metrics: Accuracy, Precision, Recall, F1-Score, Training Time, Inference Time per sample, and the number of model parameters. These metrics are also shown in Figure 2.

Figure 2. Model comparison results.

4.2. Statistical Validation Confirms Robustness of Ensemble Methods
While Gradient Boosting excelled on the single test set, the 10-fold cross-validation provided a more nuanced and robust picture of model performance. As detailed in Table 3, Random Forest exhibited the highest mean accuracy during cross-validation (96.96%), slightly surpassing Gradient Boosting (96.27%). This suggests that while Gradient Boosting may have performed better on one specific data partition, Random Forest demonstrates greater stability and consistency across multiple different partitions of the data.
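The fold-level statistics behind Tables 3 and 4 (per-fold accuracies from 10-fold stratified cross-validation, a 95% confidence interval based on the t-distribution, and a paired t-test on matched folds) can be sketched as follows; the synthetic dataset is illustrative, while the RF and SVM hyperparameters follow those listed in Section 3.3:

```python
import numpy as np
from scipy import stats
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.svm import SVC

# Stand-in data with the same feature count as the CWRU feature table.
X, y = make_classification(n_samples=400, n_features=9, n_informative=6,
                           n_classes=3, random_state=0)
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)

# One accuracy score per fold; both models are evaluated on identical folds.
rf_scores = cross_val_score(
    RandomForestClassifier(n_estimators=300, max_features="sqrt", random_state=42),
    X, y, cv=cv)
svm_scores = cross_val_score(SVC(kernel="rbf", gamma="scale"), X, y, cv=cv)

def ci95(scores):
    """95% confidence interval for the mean fold accuracy (t-distribution)."""
    m, se = scores.mean(), stats.sem(scores)
    half = se * stats.t.ppf(0.975, df=len(scores) - 1)
    return m - half, m + half

lo, hi = ci95(rf_scores)

# The paired t-test is valid because the scores are matched fold-by-fold.
t_stat, p_value = stats.ttest_rel(rf_scores, svm_scores)
```

A p-value below 0.05 from `ttest_rel` corresponds to a "Significantly Different" verdict in Table 4.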
The low standard deviation for both models (0.0163 for RF, 0.0155 for GB) indicates that their performance is highly reliable. The 95% confidence intervals provide a tight range for their expected performance, further bolstering confidence in their capabilities. Table 3 and Figure 3 present the results of the 10-fold stratified cross-validation for the top three traditional machine learning models, showing their mean accuracy, standard deviation, and the 95% confidence interval for the mean.

Table 3. Cross-Validation Results with Confidence Intervals.

| Model | Mean Accuracy | Std Dev | CI Lower (95%) | CI Upper (95%) | CV Range |
|---|---|---|---|---|---|
| Random Forest | 0.9696 | 0.0163 | 0.9573 | 0.9819 | [0.9573, 0.9819] |
| SVM | 0.9385 | 0.0172 | 0.9255 | 0.9515 | [0.9255, 0.9515] |
| Gradient Boosting | 0.9627 | 0.0155 | 0.9511 | 0.9744 | [0.9511, 0.9744] |

Figure 3. Cross-validation results with confidence intervals.

The paired t-tests, summarized in Table 4, provided the final piece of statistical evidence. The tests confirmed that both Random Forest and Gradient Boosting are statistically superior to SVM (p-values of 0.0001 and 0.0006, respectively). Crucially, the comparison between Random Forest and Gradient Boosting yielded a p-value of 0.084. As this value is greater than the significance threshold of 0.05, it indicates that there is no statistically significant difference between the performance of these two models. This finding is highly consequential: given their statistically equivalent accuracy, the choice between them can be made based on other factors, such as computational efficiency. Table 4 shows the results of the paired t-tests on the cross-validation scores between the models. A p-value less than 0.05 indicates a statistically significant performance difference.

Table 4. Pairwise Statistical Significance Tests.

| Comparison | t-Statistic | p-Value | Significant (α = 0.05) | Interpretation |
|---|---|---|---|---|
| Random Forest vs. SVM | 6.577935 | 0.000102 | Yes | Significantly Different |
| Random Forest vs. Gradient Boosting | 1.941176 | 0.084150 | No | Not Significantly Different |
| SVM vs. Gradient Boosting | −5.185934 | 0.000575 | Yes | Significantly Different |

Based on the combined evidence from the cross-validation (higher mean accuracy) and the t-test (no significant difference from GB), coupled with its 20× faster training time, Random Forest was selected as the optimal model for the subsequent phases of the analysis. This decision reflects a mature engineering trade-off, prioritizing stability and efficiency—critical for an adaptive system—once a high level of accuracy has been statistically confirmed.

4.3. Granular Error Analysis and Misclassification Patterns
While the cost analysis quantifies the financial impact of errors, a more granular, per-class performance analysis reveals the specific misclassification patterns that drive these costs. As detailed in Table 5 and Figure 4, the model exhibits varied performance across different fault types. Table 5 provides a detailed breakdown of performance metrics for each fault class, including Sensitivity (Recall), Specificity, Precision, F1-Score, False Positive Rate (FPR), False Negative Rate (FNR), and Matthews Correlation Coefficient (MCC).

Table 5. Comprehensive Per-Class Performance Metrics for Random Forest Model.
| Fault Type | Sensitivity (Recall) | Specificity | Precision | F1-Score | FPR (%) | FNR (%) | MCC |
|---|---|---|---|---|---|---|---|
| Ball_007_1 | 0.8841 | 0.9984 | 0.9839 | 0.9313 | 0.16 | 11.59 | 0.9257 |
| Ball_014_1 | 0.8551 | 0.9968 | 0.9672 | 0.9077 | 0.32 | 14.49 | 0.9002 |
| Ball_021_1 | 0.8261 | 0.9887 | 0.8906 | 0.8571 | 1.13 | 17.39 | 0.8427 |
| IR_007_1 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 0.00 | 0.00 | 1.0000 |
| IR_014_1 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 0.00 | 0.00 | 1.0000 |
| IR_021_1 | 1.0000 | 0.9936 | 0.9452 | 0.9718 | 0.64 | 0.00 | 0.9691 |
| Normal_1 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 0.00 | 0.00 | 1.0000 |
| OR_007_6_1 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 0.00 | 0.00 | 1.0000 |
| OR_014_6_1 | 0.9130 | 0.9694 | 0.7683 | 0.8344 | 3.06 | 8.70 | 0.8181 |
| OR_021_6_1 | 1.0000 | 0.9952 | 0.9583 | 0.9787 | 0.48 | 0.00 | 0.9766 |

Figure 4. Error analysis by fault type.

The model demonstrates exceptional strength in identifying several fault types, achieving perfect F1-scores (1.0000) for IR_007_1, IR_014_1, Normal_1, and OR_007_6_1, indicating no misclassifications for these conditions. However, the analysis also pinpoints specific weaknesses:
• High False Negative Rates in Ball Faults: The three rolling element fault classes (Ball_007_1, Ball_014_1, and Ball_021_1) exhibit the lowest sensitivity (recall), with scores ranging from 0.8261 to 0.8841. This is a direct result of high False Negative Rates (FNRs), which peak at a concerning 17.39% for Ball_021_1. This means the model fails to detect this severe fault in nearly one out of every five instances.
• High False Positive Rate in Outer Race Fault: The OR_014_6_1 class suffers from the lowest precision at 0.7683, caused by a high number of False Positives (19) and the highest False Positive Rate (FPR) of 3.06%. This makes it the primary source of false alarms in the system.
A deeper analysis of the misclassification patterns reveals a systemic confusion between these two groups. The vast majority of false negatives from the ball fault classes are incorrectly predicted as OR_014_6_1. For instance, 100% of the missed Ball_007_1 faults (8 instances) were mislabeled as OR_014_6_1.
Conversely, the 19 false positives for OR_014_6_1 were almost entirely composed of misclassified ball faults. This indicates that the model struggles to distinguish the vibration signatures of rolling element faults from this specific outer race fault, leading to both missed detections and false alarms. This insight is critical for maintenance teams, as it suggests that an alert for OR_014_6_1 should prompt a closer inspection of the rolling elements, and vice versa, to mitigate the high-cost risks identified previously.

4.4. Adaptive Framework Maintains High Accuracy Under Concept Drift
The simulations of a dynamic operational environment confirmed the framework’s real-time viability and, crucially, the necessity of an adaptive learning approach. The streaming simulation processed the test set in 13 batches of 50 samples each, achieving a final cumulative accuracy of 94.77% with a low mean processing latency of 50.99 ms per batch. This demonstrates that the model can comfortably handle high-frequency data streams in real time. The concept drift simulation provided the most compelling evidence for adaptability. As shown in Table 6, before the introduction of data drift (batches 1–10), both the static and adaptive models performed identically, with an average accuracy of 94.00%. However, after noise was introduced to simulate a change in operating conditions (batches 11+), the performance of the static model degraded to 92.67%. In contrast, the adaptive model, which was incrementally retrained on the new, noisy data, was able to maintain its performance at 92.67%. While in this specific simulation the adaptive model did not improve upon the static one after drift, it demonstrated resilience by not degrading further, mitigating any additional performance loss.
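The static-versus-adaptive comparison described above can be sketched as follows: the stream is scored in batches of 50, Gaussian noise is injected from batch 11 onward, and the adaptive variant is refit on all data seen so far. This is a simplified illustration on synthetic data; the noise level, seeds, and model settings are ours, not the paper's configuration:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=1000, n_features=9, n_informative=6,
                           n_classes=3, random_state=0)
# 350 training samples, 650 streamed samples -> 13 batches of 50, as in the paper.
X_train, y_train, X_stream, y_stream = X[:350], y[:350], X[350:], y[350:]

static = RandomForestClassifier(n_estimators=100, random_state=42).fit(X_train, y_train)
adaptive = RandomForestClassifier(n_estimators=100, random_state=42).fit(X_train, y_train)

seen_X, seen_y = list(X_train), list(y_train)
batch_size, drift_at = 50, 10          # drift injected from batch 11 onward (0-indexed b >= 10)
static_acc, adaptive_acc = [], []

for b in range(len(X_stream) // batch_size):
    Xb = X_stream[b * batch_size:(b + 1) * batch_size].copy()
    yb = y_stream[b * batch_size:(b + 1) * batch_size]
    if b >= drift_at:                  # simulate concept drift with Gaussian noise
        Xb += rng.normal(0.0, 0.5, Xb.shape)
    static_acc.append(static.score(Xb, yb))
    adaptive_acc.append(adaptive.score(Xb, yb))
    # The adaptive model is refit on everything seen so far, including drifted batches.
    seen_X.extend(Xb); seen_y.extend(yb)
    adaptive = RandomForestClassifier(n_estimators=100, random_state=42).fit(
        np.array(seen_X), np.array(seen_y))
```

In a production MLOps setting the full refit would typically be replaced by scheduled or drift-triggered retraining.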
This result underscores a critical point: in real-world industrial settings where data distributions inevitably change over time, a static, “train-once” model is destined for obsolescence. An adaptive framework capable of incremental learning is essential for maintaining long-term predictive accuracy and reliability. Table 6 compares the performance of a static model versus an adaptive model before and after the introduction of simulated concept drift, quantifying the performance degradation.

Table 6. Concept Drift Handling Analysis.

| Model Type | Avg Accuracy Before Drift (Batches 1–10) | Avg Accuracy After Drift (Batches 11+) | Performance Degradation (%) | Drift Resilience | Remarks |
|---|---|---|---|---|---|
| Static Model (No Adaptation) | 0.9400 | 0.9267 | 1.42% | Poor | Trained only once on the initial, clean data; when the data characteristics changed (simulated by adding noise), its performance dropped and it could not adapt. |
| Adaptive Model (Incremental Learning) | 0.9400 | 0.9267 | 1.42% | Excellent | Not because it avoided the initial performance drop, but because it successfully adapted: through incremental learning, the model was retrained on the new, “noisy” data. |

Drift Resilience (Poor vs. Excellent): This metric evaluates how the model responds to the degradation over the long term. The Static Model is rated “Poor” because its performance dropped and stayed degraded. If the data were to change even more, its accuracy would likely fall further. It is brittle and cannot cope with a changing environment. The Adaptive Model is rated “Excellent” not because it avoided the initial accuracy drop, but because it successfully adapted to the new conditions. By retraining on the noisy data, it learned the “new normal” and was able to maintain a stable accuracy of 92.67% in the more difficult environment. Its resilience lies in its ability to stop performance from degrading further and to continue performing reliably on the new data distribution.

4.5.
ML-Enhanced BWM for Priority Scoring
As proposed, feature importances are used as the weights for the Best–Worst Method (BWM) to calculate a priority score for each record in the test set, which is required for the ranking process. The obtained priority score is the weighted sum of the feature values, providing a quantitative measure of the severity or importance of each fault. This score is then used to rank the faults within each bearing, helping to identify which faults require the most urgent attention. The classification model provides feature importances, which indicate the contribution of each feature to the model’s predictions. These importances are normalized such that they sum to 1:

\[
\text{Feature importances} = [\mathrm{imp}_1, \mathrm{imp}_2, \ldots, \mathrm{imp}_n] \tag{1}
\]

BWM Weights: The feature importances are directly used as weights in BWM. These weights are applied to the feature values to compute a priority score for each record:

\[
\mathrm{BWM\ Weight}_i = \frac{\mathrm{imp}_i}{\sum_{j=1}^{n} \mathrm{imp}_j} \tag{2}
\]

where \(\mathrm{BWM\ Weight}_i\) is the weight of the i-th criterion; \(\mathrm{imp}_i\) is the importance score of the i-th criterion; and \(\sum_{j=1}^{n} \mathrm{imp}_j\) is the sum of the importance scores of all criteria from j = 1 to j = n, where n is the total number of criteria. Since the feature importances already sum to 1, the BWM weights are essentially the same as the feature importances.

Priority Score Calculation: The priority score for each record is calculated as the weighted sum of the feature values:

\[
\text{Priority score} = \sum_{i=1}^{n} \left( \text{feature value}_i \times \mathrm{BWM\ weight}_i \right) \tag{3}
\]

The ML-enhanced Best–Worst Method (BWM) framework, defined by Equations (1)–(3), was validated for both Random Forest (RF) and Gradient Boosting (GB) models. For any ML model, feature importances are extracted and normalized to sum to 1.0 (Equation (1)), directly serving as BWM weights (Equation (2)). Priority scores (Equation (3)) are computed as weighted sums of feature values, enabling quantitative fault severity assessment.
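Equations (1)–(3) reduce to a normalization followed by a weighted sum; a minimal sketch, in which the array names and toy values are ours:

```python
import numpy as np

def bwm_weights(importances):
    """Eq. (2): normalize feature importances so the weights sum to 1."""
    imp = np.asarray(importances, dtype=float)
    return imp / imp.sum()

def priority_scores(X, weights):
    """Eq. (3): priority score of each record = weighted sum of its feature values."""
    return np.asarray(X) @ np.asarray(weights)

# Toy example with 3 criteria; in the framework the weights come from the
# validated model's feature_importances_ attribute.
imp = [0.5, 0.3, 0.2]            # already sums to 1, so weights == importances
w = bwm_weights(imp)
X = np.array([[1.0, 2.0, 3.0],
              [4.0, 0.0, 1.0]])
scores = priority_scores(X, w)    # [0.5 + 0.6 + 0.6, 2.0 + 0.0 + 0.2] = [1.7, 2.2]
ranking = np.argsort(-scores)     # highest-priority record first -> [1, 0]
```

Aggregating `scores` per bearing ID then yields the total BWM scores used for the bearing-level rankings in Section 4.6.3.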
Both models adhered to this framework, ensuring consistent weight derivation and score calculation.

4.6. Gradient Boosting (GB)-BWM Implementation
4.6.1. Feature Importance Extraction
Using the GB model, 9 features were evaluated, with normalized importances summing to 1.0. Unlike RF, which prioritized energy-related metrics (sd and RMS), GB emphasized kurtosis (24.32%) and mean (23.38%) as the most critical features, reflecting sensitivity to non-linear, impulsive vibration patterns. Minor contributors included min (2.68%), max (1.36%), skewness (1.32%), and crest (1.21%), as presented in Table 7 and Figure 5.

4.6.2. Priority Score Calculation
GB-BWM priority scores for the test set (690 samples) exhibited a broader range and higher variance compared to RF-BWM. Scores spanned 0.43 to 35.71, with a mean of 3.77, median of 1.87, and standard deviation of 3.89. This indicates enhanced sensitivity to fault severity.

Table 7. Top Features (GB)—Feature Importance Extraction.

| Rank | Feature | GB_Importance | GB_Percentage (%) |
|---|---|---|---|
| 1 | kurtosis | 0.243190 | 24.318996 |
| 2 | mean | 0.233794 | 23.379357 |
| 3 | rms | 0.187070 | 18.707025 |
| 4 | sd | 0.159986 | 15.998584 |
| 5 | form | 0.110324 | 11.032353 |
| 6 | min | 0.026796 | 2.679634 |
| 7 | max | 0.013556 | 1.355618 |
| 8 | skewness | 0.013188 | 1.318771 |
| 9 | crest | 0.012097 | 1.209663 |

Figure 5. Top 20 feature importances.

4.6.3. Fault Ranking Within Bearings
Bearing risk assessment using GB-BWM aggregated scores maintained the same ranking order as RF-BWM but with elevated total scores. Bearing 7 retained the highest risk (GB total score: 42.67), followed by Bearing 8 (40.36), Bearing 4 (38.36), and so on, down to Bearing 5 (10th rank, 34.79), as illustrated in Table 8 and Figure 6.

Table 8. Bearing Risk Assessment (GB-BWM-Based).
| Bearing_ID | GB_Total_BWM_Score | Total Samples | GB_BWM_Bearing_Rank |
|---|---|---|---|
| 7 | 42.672972 | 74 | 1 |
| 8 | 40.357009 | 66 | 2 |
| 4 | 38.360003 | 63 | 3 |
| 6 | 38.110940 | 70 | 4 |
| 10 | 37.807037 | 65 | 5 |
| 3 | 37.684326 | 69 | 6 |
| 2 | 37.475734 | 63 | 7 |
| 1 | 36.010230 | 85 | 8 |
| 9 | 34.858651 | 66 | 9 |
| 5 | 34.786336 | 69 | 10 |

Figure 6. Priority score distribution.

4.7. Comprehensive Comparison: RF-BWM vs. GB-BWM
For this comparison, we selected the two top-performing models: Random Forest (the validated optimal model) and Gradient Boosting (the runner-up in accuracy).

4.7.1. Feature Importance Correlation
Feature importances from the RF and GB models showed a significant positive correlation (Pearson’s r = 0.7331, p < 0.05; Spearman’s ρ = 0.7500, p < 0.05). While GB prioritized kurtosis and mean, RF emphasized sd and rms. These differences highlight model-specific sensitivities but confirm overlapping insights, as illustrated in Table 9.

Table 9. Feature Importance Comparison (RF vs. GB).

| Feature | RF_Importance | GB_Importance | Importance_Diff |
|---|---|---|---|
| mean | 0.164986 | 0.233794 | 0.068808 |
| rms | 0.195682 | 0.187070 | 0.008612 |
| kurtosis | 0.116552 | 0.243190 | 0.126638 |
| sd | 0.199533 | 0.159986 | 0.039547 |
| form | 0.070369 | 0.110324 | 0.039955 |
| min | 0.088606 | 0.026796 | 0.061809 |
| max | 0.090695 | 0.013556 | 0.077139 |
| crest | 0.058993 | 0.012097 | 0.046897 |
| skewness | 0.014585 | 0.013188 | 0.001397 |

4.7.2. Bearing Ranking Consistency
RF-BWM and GB-BWM produced identically ordered bearing rankings (Spearman’s ρ = 1.00, Kendall’s τ = 1.00, p < 0.05), with 100% rank agreement and zero average difference. Notably, GB-BWM scores were universally higher (e.g., Bearing 7’s score increased from 29.10 to 42.67), reflecting enhanced severity detection, as illustrated in Table 10.

Table 10. Bearing Ranking Comparison (RF-BWM vs. GB-BWM).
| Bearing ID | RF_BWM Rank | GB_BWM Rank | RF_BWM Score | GB_BWM Score | Rank Difference |
|---|---|---|---|---|---|
| 7 | 1 | 1 | 29.100207 | 42.672972 | 0 |
| 8 | 2 | 2 | 27.595598 | 40.357009 | 0 |
| 4 | 3 | 3 | 26.361090 | 38.360003 | 0 |
| 6 | 4 | 4 | 26.291297 | 38.110940 | 0 |
| 10 | 5 | 5 | 26.056776 | 37.807037 | 0 |
| 3 | 6 | 6 | 25.857331 | 37.684326 | 0 |
| 2 | 7 | 7 | 25.765898 | 37.475734 | 0 |
| 1 | 8 | 8 | 24.856029 | 36.010230 | 0 |
| 9 | 9 | 9 | 24.090734 | 34.858651 | 0 |
| 5 | 10 | 10 | 24.075875 | 34.786336 | 0 |

4.7.3. Fault-Level Ranking Agreement
At the fault level, RF-BWM and GB-BWM rankings agreed for 90% of fault–bearing combinations, with an average rank difference of 0.10. Disagreements (Δrank = 1) were limited to non-critical fault types (e.g., Ball_014_1 and IR_007_1) in Bearings 2, 4, and 10, indicating minimal impact on maintenance prioritization, as illustrated in Table 11.

Table 11. Top 10 Largest Fault-Level Ranking Disagreements (RF-BWM vs. GB-BWM).

| Bearing ID | Fault Type | RF_BWM Rank | GB_BWM Rank | Rank Difference |
|---|---|---|---|---|
| 2 | Ball_014_1 | 6 | 5 | 1 |
| 2 | Ball_021_1 | 5 | 6 | 1 |
| 2 | IR_021_1 | 2 | 3 | 1 |
| 2 | OR_021_6_1 | 3 | 2 | 1 |
| 4 | Ball_021_1 | 5 | 6 | 1 |
| 4 | IR_007_1 | 6 | 5 | 1 |
| 10 | Ball_014_1 | 4 | 5 | 1 |
| 10 | IR_007_1 | 5 | 4 | 1 |
| 10 | IR_014_1 | 7 | 8 | 1 |
| 10 | OR_014_6_1 | 8 | 7 | 1 |

5. Discussion
The results of this comprehensive study provide significant insights into the development and deployment of effective predictive maintenance systems. The discussion moves beyond the interpretation of individual results to synthesize the findings and explore their broader implications for industrial practice and academic research.

5.1. From a Model to a Methodology: The Importance of Comprehensive Validation
A primary finding of this research is that selecting the “best” model is a nuanced process that extends far beyond a single accuracy score. While Gradient Boosting achieved the highest accuracy on the initial holdout test set, the more rigorous 10-fold cross-validation revealed that Random Forest had a slightly higher mean accuracy and, therefore, greater stability. The subsequent t-test confirmed that their performance was statistically indistinguishable.
This confluence of results leads to a more sophisticated conclusion: for a real-world dynamic system that requires periodic retraining to combat concept drift, the model with the 20-fold-faster training time (Random Forest) is the superior engineering choice. This highlights that the principal contribution of this paper is not merely a model, but a validation methodology. The multi-phase funnel—moving from a broad benchmark to rigorous statistical testing, operational cost analysis, and finally dynamic simulation—provides a holistic and trustworthy assessment of a model’s suitability for industrial deployment. This approach prevents “overfitting” to a single metric and instead promotes a balanced decision-making process that considers accuracy, stability, efficiency, and adaptability. It serves as a blueprint for practitioners seeking to move beyond academic proofs of concept to build genuinely reliable and effective PdM systems.

The validation of BWM with both RF and GB models underscores the framework’s adaptability and reliability across machine learning paradigms. While GB and RF assigned distinct feature weights—GB prioritizing impulsiveness (kurtosis) and central tendency (mean), and RF emphasizing variability (sd) and energy (RMS)—their aggregated bearing rankings remained fully consistent. This alignment indicates that total BWM scores effectively capture cumulative fault severity, serving as a stable metric for risk assessment regardless of the underlying ML model. At the fault level, near-perfect agreement (90%) between RF-BWM and GB-BWM rankings reinforces the framework’s robustness. Minor rank swaps (Δ = 1) are negligible for practical maintenance decisions, suggesting that BWM’s logic is resilient to model-specific feature nuances. These findings imply that BWM can be integrated with diverse ML pipelines, expanding its utility for real-world condition monitoring.
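The agreement statistics cited in Section 4.7.1 can be reproduced directly from the importance values in Table 9; a short SciPy sketch:

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

# Feature importances from Table 9, in the same feature order for both models:
# mean, rms, kurtosis, sd, form, min, max, crest, skewness
rf = np.array([0.164986, 0.195682, 0.116552, 0.199533, 0.070369,
               0.088606, 0.090695, 0.058993, 0.014585])
gb = np.array([0.233794, 0.187070, 0.243190, 0.159986, 0.110324,
               0.026796, 0.013556, 0.012097, 0.013188])

r, r_p = pearsonr(rf, gb)       # linear correlation of the raw importances
rho, rho_p = spearmanr(rf, gb)  # rank correlation of the induced orderings
# r and rho match the values reported in Section 4.7.1 (0.7331 and 0.7500).
```

Because both `r_p` and `rho_p` fall below 0.05 for these vectors, the reported "p < 0.05" significance claim checks out numerically.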
Notably, GB-BWM’s higher priority scores (e.g., Bearing 7’s score increased from 29.10 to 42.67) reflect its enhanced sensitivity to complex fault patterns, potentially improving early detection of subtle defects. However, the correlation between RF and GB feature importances (p < 0.05) suggests overlapping insights, reducing the risk of critical omissions in fault prioritization. 5.2. The Imperative of Adaptability in Real-World Systems The concept drift simulation provides empirical validation for a reality well-understood in industrial settings: operating conditions are not static. Over time, machines wear, production schedules change, raw materials vary, and environmental factors fluctuate. Each of these changes can alter the underlying statistical properties of the sensor data, causing the performance of a static, pre-trained ML model to degrade. The results clearly showed that an adaptive model capable of incremental learning is essential to maintain high accuracy in such a dynamic environment. This finding positions adaptability not as an optional feature but as a fundamental requirement for the long-term success of any industrial PdM implementation. It challenges the common practice of deploying static models and underscores the need for MLOps (Machine Learning Operations) frameworks that can monitor performance degradation and trigger automated retraining pipelines. The demonstrated resilience of the adaptive model makes a strong case that future PdM systems must be designed as living systems that evolve with the equipment they monitor. 5.3. The SECI Knowledge Cycle in Practice: From Tacit Expertise to Explicit, Actionable Intelligence This research provides a practical demonstration of the SECI (Socialization, Externalization, Combination, Internalization) knowledge creation model, effectively bridging the gap between the tacit knowledge of human experts and the explicit, data-driven outputs of the ML-BWM framework [21]. 
The Externalization phase—converting tacit knowledge into an explicit form—is the cornerstone of this framework. An experienced maintenance engineer develops an intuitive, tacit understanding of which machine signals indicate trouble. The ML model codifies this intuition by analyzing historical data and producing explicit, quantifiable outputs. A prime empirical example is the model’s identification of sd (standard deviation) and rms (root mean square) as the most critical features for fault detection. This result externalizes the engineer’s “feel” for vibration anomalies into a ranked, numerical format that can be stored, shared, and audited. Furthermore, the granular error analysis in Section 4.3, which revealed the model’s confusion between ball faults and the OR_014_6_1 fault, is another form of externalization. It makes the tacit difficulty of distinguishing similar fault signatures an explicit, documented pattern.

The Combination phase, where explicit knowledge is synthesized to create new knowledge, is demonstrated in the final prioritization step. The explicit feature importances (knowledge set 1) are mathematically combined with the BWM framework (knowledge set 2) to generate a new explicit output: the final priority scores and the ranked list of at-risk bearings shown in Table 10. This ranked list represents a higher order of explicit knowledge that is directly actionable for maintenance planning. Finally, the framework facilitates Internalization and Socialization, completing the knowledge spiral. When engineers use the system’s prioritized recommendations (explicit knowledge) to guide their work, they observe the outcomes and begin to internalize the data-driven patterns. For instance, an engineer who repeatedly acts on an OR_014_6_1 alert and discovers a ball fault will internalize this pattern, refining their own tacit expertise.
This enhanced intuition can then be shared with colleagues through on-the-job training and collaborative problem-solving (Socialization), creating a continuous cycle of organizational learning where human expertise is augmented by AI and vice versa.

5.4. Equating Feature Importance with BWM Weights
The methodological choice to use machine learning feature importance scores directly as weights for the Best–Worst Method (BWM) requires justification, because it constitutes a departure from the standard expert-driven approach. The core justification for this equivalence is the pursuit of an objective, scalable, and data-driven decision-making process. In predictive maintenance, the “importance” of an attribute (a sensor property) is its ability to predict an impending fault. The Random Forest model’s feature importance values are a quantitative, direct measure of exactly this predictive capability: they reflect how much each attribute contributes to the model’s classification accuracy, as learned from previous instances. By using these scores as the BWM weights, we ground the prioritization model in empirical evidence about which sensor readings are most predictive of a machine’s state of health.

This empirically driven approach has several advantages over expert judgment, such as consistency, repeatability, and the potential to find non-obvious patterns in the data that a human expert might not notice. Thus, this framework can be viewed as creating a sound, objective baseline for BWM. Data-driven weights eliminate the initial subjectivity and inconsistency of exclusive dependence on human evaluations, creating a firm, evidence-based foundation. This baseline can then be critiqued and, if necessary, modulated by domain experts who might superimpose contextual knowledge of operational risk, safety, and cost—aspects not explicitly captured in the fault classification model.
This positions the framework as a strong, data-enhanced tool that supports expert judgment and renders the BWM process more scalable, transparent, and empirically grounded.

5.5. Practical Considerations for Deployment: Data Integrity and Robustness

A crucial aspect of deploying any predictive maintenance framework in a real-world industrial setting is its ability to handle imperfect data. The simulations in this study have already demonstrated the framework's resilience to noisy inputs through its adaptive learning capability, which maintained high accuracy even after the introduction of statistical noise to the data stream. However, sensor failures and intermittent data loss represent additional challenges that must be addressed to ensure operational robustness. To handle missing data and sensor failures, the framework can be enhanced with a preliminary data integrity module. This module would employ a multi-stage strategy:

1. Detection: The first step involves identifying missing values. This includes not only null entries but also invalid placeholder values, such as a zero reading for a feature like standard deviation, which should be programmatically marked as missing.

2. Imputation: Once identified, missing values must be imputed to maintain a complete dataset for the model. The choice of imputation method depends on the extent of the missing data:

• For sporadic or small gaps (e.g., less than 5% of a feature's data), computationally efficient methods like mean or median imputation can be used. For time-series data, forward-fill or backward-fill methods are often preferable, as they preserve the temporal sequence.

• For more significant data gaps, advanced techniques such as K-Nearest Neighbors (KNN) imputation would be employed. KNN imputation estimates a missing value based on the values of its closest neighbors in the feature space, thereby preserving the complex relationships between different sensor readings.
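The detection and imputation stages above can be sketched as a single preprocessing routine. The column names, the zero-as-invalid rule for sd, and the 5% gap threshold are illustrative assumptions from the surrounding text, not a prescribed implementation:

```python
import numpy as np
import pandas as pd
from sklearn.impute import KNNImputer

def clean_sensor_frame(df, invalid_zero_cols=("sd",), gap_threshold=0.05):
    """Data-integrity module sketch: detect missing/invalid values, then impute."""
    df = df.copy()
    # Stage 1 - Detection: flag invalid placeholders (a zero standard
    # deviation is physically implausible) as missing.
    for col in invalid_zero_cols:
        df.loc[df[col] == 0, col] = np.nan
    # Stage 2 - Imputation: cheap temporal fills for sporadic gaps...
    for col in df.columns:
        frac = df[col].isna().mean()
        if 0 < frac < gap_threshold:
            df[col] = df[col].ffill().bfill()    # preserves temporal order
    # ...and KNN imputation for larger gaps, which preserves the
    # relationships between different sensor readings.
    if df.isna().any().any():
        df[:] = KNNImputer(n_neighbors=3).fit_transform(df)
    return df

# Small illustrative frame: a zero sd reading and two genuine gaps
df = pd.DataFrame({"sd":  [0.10, 0.0, 0.20, 0.30, np.nan, 0.25],
                   "rms": [1.00, 1.1, np.nan, 1.20, 1.30, 1.15]})
clean = clean_sensor_frame(df)
```

In production this routine would sit between the sensor ingestion layer and the classifier, so the model only ever sees a complete feature matrix.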
By incorporating these data imputation strategies as a preprocessing step, the framework can mitigate the impact of common data quality issues, ensuring that the machine learning model receives a clean and complete dataset. This enhancement significantly strengthens the framework's reliability and readiness for deployment in dynamic and often imperfect industrial environments.

6. Conclusions

This study demonstrates a transformative approach to predictive maintenance, shifting industrial practices from reactive responses to proactive, data-driven strategies. By integrating Machine Learning (ML) with the Best–Worst Method (BWM), the proposed framework enables organizations to make timely, informed decisions that significantly enhance operational efficiency, reliability, and asset longevity. Leveraging real-time insights from IoT-enabled sensor networks, the system rapidly detects early indicators of potential faults, prioritizes interventions, and recommends remedial actions through a synergistic blend of expert knowledge and computational intelligence.

Compared with traditional expert-dependent approaches, the ML-enhanced BWM framework exhibits superior adaptability to changing operational conditions and delivers improved predictive accuracy. Empirical validation using the CWRU bearing fault dataset achieved a detection accuracy of 96%, underscoring its practical efficacy. Beyond accuracy, the proposed approach offers tangible benefits in reducing downtime, maintenance costs, and resource inefficiencies. Given its scalability, the framework holds strong potential for adoption across diverse industrial sectors, contributing to the development of resilient, sustainable, and self-optimizing Industry 4.0 ecosystems.

7. Future Research Directions

Future work should advance the integration of ML with BWM by exploring hybrid multi-criteria decision-making (MCDM) frameworks, such as combining BWM with fuzzy methodologies, the Analytic Hierarchy Process (AHP), or fuzzy TOPSIS.
These combinations could enhance decision-making robustness, particularly under uncertainty, and provide complementary perspectives for prioritizing maintenance actions or overcoming implementation barriers. Incorporating explainable AI (XAI) is essential to improve transparency, interpretability, and stakeholder trust in algorithm-driven recommendations. Additionally, the deployment of advanced sensor technologies, edge computing architectures, and real-time optimization algorithms will be critical for handling high-velocity, high-volume industrial data streams. Industrial-scale pilot programs should be undertaken to assess the robustness and scalability of the proposed framework in varied operational environments. Finally, embedding ML-enhanced BWM frameworks into enterprise resource planning (ERP) systems and knowledge management infrastructures will facilitate seamless integration into organizational workflows, thereby accelerating adoption and maximizing operational impact in the era of Industry 4.0.

Author Contributions: Conceptualization, E.M. and K.S.; methodology, E.M.; software, E.M.; validation, E.M. and A.M.; formal analysis, E.M.; investigation, E.M.; resources, K.S.; data curation, E.M.; writing—original draft preparation, E.M.; writing—review and editing, A.M. and K.S.; visualization, E.M.; supervision, K.S.; project administration, K.S. All authors have read and agreed to the published version of the manuscript.

Funding: This research received no external funding.

Data Availability Statement: Publicly available datasets were analyzed in this study. This data can be found here: Case Western Reserve University Bearing Data Center at https://engineering.case.edu/bearingdatacenter.

Conflicts of Interest: The authors declare no conflict of interest.

References

1. Zheng, X.; Lu, J.; Kiritsis, D. The emergence of cognitive digital twin: Vision, challenges and opportunities. Int. J. Prod. Res. 2022, 60, 7610–7632.
2.
Case Western Reserve University. Bearing Data Center. Case School of Engineering. 2004. Available online: https://engineering.case.edu/bearingdatacenter (accessed on 25 July 2025).
3. Ren, S.; Liang, J.; Lu, H.; Wang, J.; Wu, H.; Lu, H.; Bao, Y.; Chen, H. Personalized Intelligent Knowledge Management Design of Electric Power Professional Technology Based on Domain Characteristic Knowledge Map. In Proceedings of the 2023 International Conference on Industrial IoT, Big Data and Supply Chain (IIoTBDSC), Wuhan, China, 22–24 September 2023; IEEE: New York, NY, USA, 2023; pp. 186–191.
4. Padeli, W.; Mustafa, W.A.; Pangil, F.; Abd Kadir, K.; Dwita, V. Knowledge management and the Fourth Industrial Revolution (4IR): A recent systematic review. J. Adv. Res. Appl. Sci. Eng. Technol. 2025, 51, 18–33.
5. Andronie, M.; Lăzăroiu, G.; Karabolevski, O.L.; Ștefănescu, R.; Hurloiu, I.; Dijmărescu, A.; Dijmărescu, I. Remote big data management tools, sensing and computing technologies, and visual perception and environment mapping algorithms in the Internet of Robotic Things. Electronics 2022, 12, 22.
6. Bessaoudi, M.; Habbouche, H.; Benkedjouh, T.; Mesloub, A. A hybrid approach for gearbox fault diagnosis based on deep learning techniques. Int. J. Adv. Manuf. Technol. 2024, 133, 2861–2874.
7. Çalışkan, M.; Cicioğlu, M.; Çalhan, A.; Dirik, A.E. Machine Learning-Based Failure Detection Using Internet of Things: A Case Study for Proactive Maintenance. 2024. Available online: https://ssrn.com/abstract=5038202 (accessed on 25 July 2025).
8. Olajiga, O.K.; Ari, E.C.; Olulawal, K.A.; Montero, D.J.P.; Adeleke, A.K. Intelligent monitoring systems in manufacturing: Current state and future perspectives. Eng. Sci. Technol. J. 2024, 5, 750–759. https://doi.org/10.51594/estj.v5i3.870.
9. Adimulam, T.; Bhoyar, M.; Reddy, P. AI-Driven Predictive Maintenance in IoT-Enabled Industrial Systems. Iconic Res. Eng. J. 2019, 2, 398–410.
10.
Sun, D.; Hu, J.; Wu, H.; Wu, J.; Yang, J.; Sheng, Q.Z.; Dustdar, S. A comprehensive survey on collaborative data-access enablers in the IIoT. ACM Comput. Surv. 2023, 56, 1–37.
11. Zero, E.; Sallak, M.; Sacile, R. Predictive Maintenance in IoT-Monitored Systems for Fault Prevention. J. Sens. Actuator Netw. 2024, 13, 57.
12. Garcés-Jiménez, A.; Rodrigues, A.; Gómez-Pulido, J.M.; Raposo, D.; Gómez-Pulido, J.A.; Silva, J.S.; Boavida, F. Industrial Internet of Things embedded devices fault detection and classification. A case study. Internet Things 2024, 25, 101042.
13. Mellouli, H.E.; Anwar, M.; Abdelhamid, Z. Enhancing Industrial Decision-Making Through ML-Integrated Frameworks and Multi-Criteria Decision-Making Approach. 2024. Available online: https://assets-eu.researchsquare.com/files/rs-4125064/v1/d0d40d08-2e3f-448a-b150-10d30ff96f40.pdf (accessed on 25 July 2025).
14. Lajimi, H.F.; Haeri, S.A.S.; Sorouni, Z.J.; Salimi, N. Supplier selection based on multi-stakeholder Best-Worst Method. J. Supply Chain Manag. Sci. 2021, 2, 19–32.
15. Sadjadi, S.J.; Karimi, M. Best worst multi criteria decision making method: A robust approach. Decis. Sci. Lett. 2018, 7, 323–340. https://doi.org/10.5267/j.dsl.2018.3.003.
16. Yucesan, M.; Gul, M.; Celik, E. A holistic FMEA approach by fuzzy-based Bayesian network and best–worst method. Complex Intell. Syst. 2021, 7, 1547–1564.
17. Parab, G.U. Manufacturing 4.0: AI-Driven analytics for predictive maintenance. Int. J. Multidiscip. Res. 2024, 6. Available online: https://www.ijfmr.com/papers/2024/6/33539.pdf (accessed on 25 July 2025).
18. Deters, J.K.; Janus, S.; Silva, J.A.L.; Wörtche, H.J.; Zuidema, S.U. Sensor-based agitation prediction in institutionalized people with dementia: A systematic review. Pervasive Mob. Comput. 2024, 98, 101876.
19. Wellsandt, S.; Klein, K.; Hribernik, K.; Lewandowski, M.; Bousdekis, A.; Mentzas, G.; Thoben, K.D.
Hybrid-augmented intelligence in predictive maintenance with digital intelligent assistants. Annu. Rev. Control 2022, 53, 382–390.
20. De Silva, P.; Gunarathne, N.; Kumar, S. Exploring the impact of digital knowledge, integration and performance on sustainable accounting, reporting and assurance. Meditari Account. Res. 2024, 33, 497–552.
21. Liu, Y.; Guo, B.; Li, N.; Ding, Y.; Zhang, Z.; Yu, Z. CrowdTransfer: Enabling Crowd Knowledge Transfer in AIoT Community. IEEE Commun. Surv. Tutor. 2024, 27, 1191–1237.
22. Kumar, R.; Agrawal, N. Analysis of multi-dimensional Industrial IoT (IIoT) data in Edge-Fog-Cloud based architectural frameworks: A survey on current state and research challenges. J. Ind. Inf. Integr. 2023, 35, 100504.
23. Nonaka, I.; Takeuchi, H. The Knowledge-Creating Company: How Japanese Companies Create the Dynamics of Innovation; Oxford University Press: Oxford, UK, 1995.
24. Lorenc, A.; Iwaszczuk, N. How to find disruptions in logistics processes in the cold chain and avoid waste of products? Appl. Sci. 2024, 14, 255.
25. Lorenc, A.; Szarata, J.; Czuba, M. Real-time location system (RTLS) based on the Bluetooth technology for internal logistics. Sustainability 2023, 15, 4976.
26. Lorenc, A.; Kuźnar, M.; Lerher, T. Solving product allocation problem (PAP) by using ANN and clustering. FME Trans. 2021, 49, 206–213.
27. Yang, Y.; Zhai, J.; Wang, H.; Xu, X.; Hu, Y.; Wen, J. An Improved Fault Diagnosis Method for Rolling Bearing Based on ReliefF and Optimized Random Forests Algorithm. Machines 2025, 13, 183.
28. Gama, J.; Žliobaitė, I.; Bifet, A.; Pechenizkiy, M.; Bouchachia, A. A survey on concept drift adaptation. ACM Comput. Surv. 2014, 46, 1–37.

Disclaimer/Publisher's Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s).
MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.