Federated Learning · Tool Condition Monitoring

Federated learning for tool condition monitoring in machining

Scalable, data-sovereign learning across machines and sites, without transferring raw data.

  • Use case: tool wear monitoring in milling
  • Learning mode: federated (local training, shared models)
  • Data flow: raw data stays on the shop floor

Status quo: tool-wear models in machining

Single-site models reach good lab accuracy but rarely transfer to new machines and jobs.

Typical ML models (single site / single setup)

Accuracy
  • CNN / LSTM: F1 ~ 0.78
  • Gradient Boosting: F1 ~ 0.74
  • Autoencoder: AUC ~ 0.70
  • Hybrid: F1 ~ 0.81

These models give good results in the lab or on one dataset, but are often unstable when transferred to new machines or jobs.

Why "one model per use case" rarely scales

Obstacles
  • Domain shift: a different machine, tool, material, coolant or parameter set changes the signal statistics.
  • Label scarcity: wear labels require measurement and machine downtime, so ground truth is expensive.
  • Rare failure cases: chatter and breakage are rare, so the data is heavily imbalanced.
  • Data silos and governance: production data often cannot be merged into one central store.
  • Drift: bearings, spindle and sensors change over time, so models age in operation.

A model tuned perfectly to one setup generalizes poorly, so maintenance and retraining effort rises sharply as machines and jobs are added.

What is federated learning?

Federated learning trains a shared model across many data sources without moving the raw data.

Federated Learning (FL)

What it is
  • Decentralized training across several data sources (machines or sites).
  • Training happens locally on each machine or site.
  • Only model updates (weights or gradients) are shared.
  • An aggregation server builds a global model (for example via FedAvg).
  • Suited to data restrictions (IP, privacy, network or IT constraints).
  • Goal: better generalization across heterogeneous production environments.
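The aggregation step can be sketched in a few lines. This is a minimal illustration of FedAvg-style weighted averaging, assuming each client sends a flat parameter vector together with its local sample count; the function name `fedavg` and the example values are hypothetical.

```python
import numpy as np

def fedavg(client_weights, client_sizes):
    """Weighted average of client parameter vectors (FedAvg-style aggregation)."""
    sizes = np.asarray(client_sizes, dtype=float)
    stacked = np.stack([np.asarray(w, dtype=float) for w in client_weights])
    # Each client contributes in proportion to its number of local samples.
    return (stacked * (sizes / sizes.sum())[:, None]).sum(axis=0)

# Hypothetical parameter vectors from three sites (illustrative values only)
w_m1 = np.array([1.0, 2.0])
w_m2 = np.array([3.0, 4.0])
w_m3 = np.array([5.0, 6.0])

global_w = fedavg([w_m1, w_m2, w_m3], client_sizes=[100, 100, 200])
print(global_w)  # [3.5 4.5]
```

Weighting by sample count keeps a site with little data from pulling the global model as hard as a site with a lot of data.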

Transfer Learning (TL)

How it differs
  • A model is pre-trained on a source domain and fine-tuned on the target.
  • Typically the data (or features) must be available at the target.
  • It does not protect data sovereignty on its own.

Which problems does FL solve?

  • Generalization across many machines without centralizing raw data.
  • Continuous learning under drift and changing jobs.
  • Fast rollout: one global model for the whole fleet.

Obstacles and pitfalls

  • Non-IID data: data distributions differ between clients, which slows convergence.
  • System heterogeneity: compute power and cycle times vary.
  • Communication: bandwidth, outages and round coordination.
  • Privacy and security: updates can leak information, so use secure aggregation or differential privacy.
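One common way to limit what a single update can reveal is to clip it and add noise before upload. The sketch below shows only that mechanism; `privatize_update`, `clip_norm` and `noise_std` are hypothetical names and values, and a real differential-privacy guarantee would additionally require calibrating the noise to a privacy budget.

```python
import numpy as np

def privatize_update(update, clip_norm=1.0, noise_std=0.1, rng=None):
    """Clip an update to a maximum L2 norm, then add Gaussian noise."""
    if rng is None:
        rng = np.random.default_rng()
    update = np.asarray(update, dtype=float)
    norm = np.linalg.norm(update)
    # Scale down only if the norm exceeds clip_norm; small updates pass unchanged.
    clipped = update * min(1.0, clip_norm / max(norm, 1e-12))
    # Assumption: noise_std is chosen by the operator, not derived from a privacy budget.
    return clipped + rng.normal(0.0, noise_std, size=update.shape)

# With noise disabled, only the clipping is visible.
print(privatize_update([3.0, 4.0], clip_norm=1.0, noise_std=0.0))  # [0.6 0.8]
```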

Transfer learning means "transfer knowledge and adapt locally"; federated learning means "learn together without sharing raw data".

The federated learning process

Sites train locally at the source machines, the cloud aggregates their models, and an improved global model is distributed back to every site.

  • Collect data: each site collects data from its machines.
  • Local training: each site trains a local model on its own data.
  • Upload local models: each site uploads its local model to the cloud.
  • Model aggregation: the cloud aggregates the local models into a global model.
  • Download global model: each site downloads the global model from the cloud.
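The five steps above can be simulated end to end on synthetic data. This is a toy sketch, assuming a linear model trained by gradient descent and three hypothetical sites; it is meant to show the shape of a training round, not a production setup.

```python
import numpy as np

def local_train(w, X, y, lr=0.1, epochs=5):
    """A few gradient-descent steps on a least-squares loss, using local data only."""
    w = w.copy()
    for _ in range(epochs):
        grad = 2.0 * X.T @ (X @ w - y) / len(y)
        w -= lr * grad
    return w

rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])  # hypothetical ground truth for the simulation

# Collect data: each site holds its own data set, which never leaves the site.
sites = []
for n in (50, 80, 120):
    X = rng.normal(size=(n, 2))
    y = X @ true_w + 0.01 * rng.normal(size=n)
    sites.append((X, y))

sizes = np.array([len(y) for _, y in sites], dtype=float)
global_w = np.zeros(2)
for round_id in range(20):
    # Local training and upload: only the trained weights are sent.
    local_ws = [local_train(global_w, X, y) for X, y in sites]
    # Model aggregation: sample-weighted average of the local models.
    global_w = np.average(local_ws, axis=0, weights=sizes)
    # Download: every site starts the next round from global_w.

print(global_w)  # close to true_w
```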

Key building blocks

  • Client training on local data.
  • Aggregation of model updates into a global model.
  • Versioning of models and training rounds.
  • Monitoring of model and data drift.

Security

  • Secure aggregation of model updates.
  • Transport encryption (TLS).
  • Access control for participants and assets.
  • Auditing of rounds and artifacts.
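The idea behind secure aggregation can be illustrated with pairwise masks that cancel in the sum. In this sketch a single function generates all masks for readability; in an actual protocol each pair of clients would derive its mask from a shared secret, and client dropouts would need handling. The name `masked_updates` is hypothetical.

```python
import numpy as np

def masked_updates(updates, seed=0):
    """Add pairwise masks that cancel out when all masked updates are summed."""
    rng = np.random.default_rng(seed)
    masked = [np.asarray(u, dtype=float).copy() for u in updates]
    n = len(masked)
    for i in range(n):
        for j in range(i + 1, n):
            # Client i adds the mask and client j subtracts the same mask.
            mask = rng.normal(size=masked[i].shape)
            masked[i] += mask
            masked[j] -= mask
    return masked

updates = [np.array([1.0, 2.0]), np.array([3.0, 4.0]), np.array([5.0, 6.0])]
masked = masked_updates(updates)

# The server sees only masked vectors, yet their sum equals the true sum.
print(np.allclose(sum(masked), sum(updates)))  # True
```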

Data generation at a single machine

Each machine is instrumented to turn raw process signals into feature streams that can be labeled and learned from.

Milling process · video

Machine M1 · raw data → feature streams

Process recording coming soon

Recording of the cutting process, used to illustrate where the sensor data originates.

Live telemetry · Grafana

Spindle current · vibration · feed

Live dashboard available on the internal network

Telemetry such as spindle current, vibration and feed is streamed and feature-extracted per time window.
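Per-window feature extraction might look like the sketch below, which computes RMS and FFT band energies for non-overlapping windows. The sampling rate, window length and band limits are assumptions for illustration, and `window_features` is a hypothetical helper.

```python
import numpy as np

def window_features(signal, fs, window_s=1.0, bands=((0, 500), (500, 2000))):
    """Return one row per window: [RMS, energy in band 1, energy in band 2, ...]."""
    signal = np.asarray(signal, dtype=float)
    n = int(fs * window_s)                      # samples per window
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)      # frequency axis of each window
    rows = []
    for start in range(0, len(signal) - n + 1, n):
        w = signal[start:start + n]
        rms = np.sqrt(np.mean(w ** 2))
        power = np.abs(np.fft.rfft(w)) ** 2
        band_energy = [power[(freqs >= lo) & (freqs < hi)].sum() for lo, hi in bands]
        rows.append([rms, *band_energy])
    return np.array(rows)

# Hypothetical test signal: a 100 Hz sine sampled at 5 kHz for two seconds
fs = 5000
t = np.arange(2 * fs) / fs
feats = window_features(np.sin(2 * np.pi * 100 * t), fs)
print(feats.shape)  # (2, 3)
```

Keeping the window length and band limits in function arguments lets every machine use the same code with its own configuration.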

Scaling: data generation across three machines

The same instrumentation and pipeline building block extends from one machine to a fleet.

M1 · 3-axis milling

Signals: vibration (XYZ), spindle current, PLC. Features: windowing, RMS, FFT bands. Labelling: tool-life proxy and periodic measurement.

Edge agent → MQTT

M2 · 5-axis / aluminum

Signals: acoustic emission, vibration, torque. Features: same API and time base. Local storage: ring buffer plus batch upload.

Edge agent → MQTT

M3 · robot cell

Signals: force/torque, motor currents. Features: same pipeline modules. Events: anomaly flags and process windows.

Edge agent → MQTT

What matters for scaling

  • A unified time base (clock sync), the same window definition and the same units.
  • Schema and metadata per job: tool ID, material, cutting parameters, sensor position.
  • "Same code, different config": the pipeline as a template per machine type.
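"Same code, different config" could be expressed as one typed template per machine plus a metadata record per job. The class names, field names and example values below are hypothetical; the signal lists follow the three machines described above.

```python
from dataclasses import dataclass, asdict

@dataclass(frozen=True)
class PipelineConfig:
    machine_id: str
    signals: tuple
    window_s: float = 1.0    # assumed shared window definition
    time_base: str = "UTC"   # assumed unified, clock-synced time base

@dataclass(frozen=True)
class JobMetadata:
    tool_id: str
    material: str
    cutting_params: dict
    sensor_position: str

# One template instance per machine; only the configuration differs.
CONFIGS = {
    "M1": PipelineConfig("M1", ("vibration_xyz", "spindle_current", "plc")),
    "M2": PipelineConfig("M2", ("acoustic_emission", "vibration", "torque")),
    "M3": PipelineConfig("M3", ("force_torque", "motor_currents")),
}

# Hypothetical job record attached to every feature batch
job = JobMetadata("T-0001", "aluminum", {"rpm": 8000, "feed_mm_min": 1200}, "spindle housing")
record = {"config": asdict(CONFIGS["M2"]), "job": asdict(job)}
print(record["job"]["material"])  # aluminum
```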

Application: condition monitoring with a trained FL model

A federated global model is more stable across machines and jobs than a single-site baseline.

Before: single-site models

Baseline
  • M1 → M1 (seen): F1 ~ 0.82
  • M1 → M2 (unseen): F1 ~ 0.61
  • M1 → M3 (unseen): F1 ~ 0.57

Single-site models typically overfit strongly to one setup and process, which leads to poor transfer.

After: federated global model

FL
  • Global → M1: F1 ~ 0.86
  • Global → M2: F1 ~ 0.81
  • Global → M3: F1 ~ 0.79

Performance is more stable across machines and jobs, which means less local retraining and faster commissioning.
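The before/after figures can be condensed into a mean and a spread per approach. The short sketch below uses only the approximate F1 values quoted above; `summarize` is a hypothetical helper.

```python
from statistics import mean

# Approximate F1 scores as quoted above
baseline = {"M1": 0.82, "M2": 0.61, "M3": 0.57}   # single-site model trained on M1
federated = {"M1": 0.86, "M2": 0.81, "M3": 0.79}  # federated global model

def summarize(scores):
    """Mean F1 and spread (max minus min) across machines."""
    values = list(scores.values())
    return {"mean": round(mean(values), 3), "spread": round(max(values) - min(values), 3)}

print(summarize(baseline))   # {'mean': 0.667, 'spread': 0.25}
print(summarize(federated))  # {'mean': 0.82, 'spread': 0.07}
```

On these numbers the spread between the best and worst machine shrinks from about 0.25 to about 0.07, which is the sense in which the global model is more stable.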

Operating mode

  • Inference at the edge: condition indicator, remaining tool life or anomaly score in real time.
  • Round-based training: locally during idle times or adaptively on drift and job changes.
  • Rollout: the global model is signed, versioned and deployed to the fleet.
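A signed, versioned rollout could be sketched with a manifest that records the version and a hash of the weights. As a standard-library stand-in, this example authenticates the manifest with an HMAC over a shared key; a production rollout would more likely use asymmetric signatures. All names and values are hypothetical.

```python
import hashlib
import hmac
import json

def package_model(weights_bytes, version, key):
    """Build a manifest (version + SHA-256 of the weights) and an HMAC tag over it."""
    digest = hashlib.sha256(weights_bytes).hexdigest()
    manifest = json.dumps({"version": version, "sha256": digest}, sort_keys=True)
    tag = hmac.new(key, manifest.encode(), hashlib.sha256).hexdigest()
    return manifest, tag

def verify_model(weights_bytes, manifest, tag, key):
    """Check the tag first, then check that the weights match the manifest hash."""
    expected = hmac.new(key, manifest.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, tag):
        return False
    return json.loads(manifest)["sha256"] == hashlib.sha256(weights_bytes).hexdigest()

key = b"demo-key-not-for-production"   # assumption: key provisioned out of band
weights = b"\x00\x01\x02\x03"          # placeholder for serialized model weights
manifest, tag = package_model(weights, "1.4.0", key)

print(verify_model(weights, manifest, tag, key))            # True
print(verify_model(weights + b"\xff", manifest, tag, key))  # False
```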

Outlook: challenges and pitfalls

Federated learning avoids raw-data centralization, but its success depends on standardized data, reliable orchestration and protected aggregation.

Data and semantics

Standardization
  • A unified data description: units, sensor position, sampling, job and tool metadata.
  • Label definitions: what counts as a "wear state" or an "anomaly"?
  • Quality: missing data, synchronization, drift, sensor changes.

MLOps and operations

Orchestration
  • Orchestration of model deployment (versioning, rollback, canary).
  • Round control: when to train, who participates, how to handle outages?
  • Monitoring: model drift, data drift, performance per machine and job.
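Data drift on a single feature can be tracked with a simple distribution comparison. The sketch below uses the population stability index (PSI) as one possible measure, which is an assumption and not something the monitoring above prescribes; the bin count and the synthetic data are illustrative only.

```python
import numpy as np

def population_stability_index(reference, current, bins=10):
    """PSI between a reference sample and a current sample of one feature."""
    reference = np.asarray(reference, dtype=float)
    current = np.asarray(current, dtype=float)
    # Inner bin edges at the quantiles of the reference data
    inner_edges = np.quantile(reference, np.linspace(0, 1, bins + 1))[1:-1]

    def fractions(x):
        idx = np.searchsorted(inner_edges, x, side="right")
        counts = np.bincount(idx, minlength=bins)
        # Clip to avoid log(0) for empty bins
        return np.clip(counts / len(x), 1e-6, None)

    ref_frac, cur_frac = fractions(reference), fractions(current)
    return float(np.sum((cur_frac - ref_frac) * np.log(cur_frac / ref_frac)))

rng = np.random.default_rng(0)
reference = rng.normal(0.0, 1.0, 5000)   # e.g. a vibration feature at commissioning
same = rng.normal(0.0, 1.0, 5000)        # same process, later sample
shifted = rng.normal(1.0, 1.0, 5000)     # simulated drift: mean shifted by one sigma

psi_same = population_stability_index(reference, same)
psi_shifted = population_stability_index(reference, shifted)
print(psi_same < psi_shifted)  # True
```

Computing such an index per machine and per job would give a trigger for the drift-adaptive training rounds.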

Transferability of the knowledge

  • How well does global knowledge fit new tools and materials (out-of-distribution)?
  • Strategies: personalized FL, adapter layers, cluster-FL by machine type.
  • Security and privacy: secure aggregation, differential privacy, robust aggregation against poisoning.
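Robust aggregation can be illustrated by replacing the mean with a coordinate-wise median, which is one of several possible robust rules. The example values are hypothetical and `median_aggregate` is an illustrative helper.

```python
import numpy as np

def median_aggregate(client_updates):
    """Coordinate-wise median of client updates, less sensitive to outliers than the mean."""
    return np.median(np.stack(client_updates), axis=0)

honest = [np.array([1.0, 1.0]), np.array([1.1, 0.9]), np.array([0.9, 1.1])]
poisoned = np.array([100.0, -100.0])   # one manipulated update
updates = honest + [poisoned]

mean_result = np.mean(np.stack(updates), axis=0)
median_result = median_aggregate(updates)
print(mean_result)    # dragged far away from the honest updates
print(median_result)  # [1.05 0.95]
```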

Takeaway

  • Federated learning fits deployments where raw data must not be centralized.
  • Standardized data, reliable orchestration and secured aggregation remain prerequisites in production.