Gathering and Downloading New Training Data
You can design your digital twin model so that, when instances identify a problem, they report an anomaly to the system. The system stores the reported property values along with a label indicating whether they constitute an anomaly.
Digital Twin Code Changes
Whether you use TensorFlow or ML.NET algorithms, if you followed the Making predictions from a C# digital twin model tutorial, the code changes needed to start reporting anomalies are the same.
To report anomalies, use the other method of the IAnomalyDetectionProvider interface, ReportAnomalyDataAsync:
using System.Collections.Generic;
using System.Threading.Tasks;

namespace Scaleout.Streaming.DigitalTwin.Core
{
    /// <summary>
    /// Encapsulates the capabilities of a ScaleOut real-time digital twin
    /// anomaly detection provider.
    /// </summary>
    public interface IAnomalyDetectionProvider
    {
        /// <summary>
        /// Detects anomalies by using the trained algorithm and the provided property values.
        /// </summary>
        /// <param name="properties">A dictionary of the properties to use for the prediction</param>
        /// <returns>True if an anomaly is detected, False otherwise</returns>
        bool DetectAnomaly(Dictionary<string, float> properties);

        /// <summary>
        /// Adds anomaly data used for retraining ML algorithms. Anomaly data consists of
        /// a collection of properties and their associated values, along with a label
        /// to indicate whether these properties constitute an anomaly.
        /// </summary>
        /// <param name="properties">A collection of properties and their values</param>
        /// <param name="isAnomaly">True if the values constitute an anomaly</param>
        Task ReportAnomalyDataAsync(Dictionary<string, float> properties, bool isAnomaly);
    }
}
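Because the interface is small, it can be convenient to exercise your message-processing logic outside ScaleOut with a stand-in implementation. The following is a minimal sketch, not part of the ScaleOut API: the InMemoryAnomalyProvider class, its fixed-threshold DetectAnomaly rule, and its Reports list are hypothetical, and the interface is redeclared in a hypothetical AnomalyReportingSketch namespace so the sample compiles without the ScaleOut assemblies. It keeps every reported data point together with its label, mirroring what the system stores.

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;

namespace AnomalyReportingSketch
{
    // Local copy of the interface shown above, so this sketch compiles
    // without the ScaleOut assemblies.
    public interface IAnomalyDetectionProvider
    {
        bool DetectAnomaly(Dictionary<string, float> properties);
        Task ReportAnomalyDataAsync(Dictionary<string, float> properties, bool isAnomaly);
    }

    // Hypothetical in-memory stand-in for a real provider, intended for unit tests.
    public sealed class InMemoryAnomalyProvider : IAnomalyDetectionProvider
    {
        private readonly string _property;
        private readonly float _threshold;

        // Labeled data points captured by ReportAnomalyDataAsync.
        public List<(Dictionary<string, float> Properties, bool IsAnomaly)> Reports { get; }
            = new List<(Dictionary<string, float> Properties, bool IsAnomaly)>();

        public InMemoryAnomalyProvider(string property, float threshold)
        {
            _property = property;
            _threshold = threshold;
        }

        // Stand-in for a trained model: flags an anomaly when one property exceeds a fixed threshold.
        public bool DetectAnomaly(Dictionary<string, float> properties) =>
            properties.TryGetValue(_property, out float value) && value > _threshold;

        public Task ReportAnomalyDataAsync(Dictionary<string, float> properties, bool isAnomaly)
        {
            // Copy the dictionary so later changes by the caller do not alter stored data.
            Reports.Add((new Dictionary<string, float>(properties), isAnomaly));
            return Task.CompletedTask;
        }
    }
}
```

Recording a copy of each dictionary, rather than the caller's instance, keeps the captured data stable even if the message processor reuses the same dictionary for the next reading.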
Suppose you have a way to determine that a problem happened (here called ProblemHappened()). You can then add a call to ReportAnomalyDataAsync in your MessageProcessor:
public override ProcessingResult ProcessMessages(ProcessingContext context, SensorsRTModel digitalTwin, IEnumerable<DigitalTwinMessage> newMessages)
{
    // Look up the anomaly detection provider by name from the context
    var algorithm = context.AnomalyDetectionProviders["Overheating"];

    // ...

    if (ProblemHappened())
    {
        // Build the dictionary of properties with the current values
        var properties = new Dictionary<string, float>();
        properties[nameof(SensorsRTModel.Temperature)] = digitalTwin.Temperature;
        properties[nameof(SensorsRTModel.RPM)] = digitalTwin.RPM;
        properties[nameof(SensorsRTModel.Friction)] = digitalTwin.Friction;

        // Report the anomaly using the current values. ProcessMessages is not an
        // async method, so wait for the task to complete instead of using await.
        algorithm.ReportAnomalyDataAsync(properties, isAnomaly: true).GetAwaiter().GetResult();
    }

    return ProcessingResult.DoUpdate;
}
Provide the values of the tracked properties along with the appropriate label; the system stores this new data for later use.
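Because isAnomaly is an explicit parameter, the same call can also report readings that turned out to be normal. If you want to do that, it can help to build the property dictionary in one place so that detection and reporting always use the same keys. The sketch below assumes, as the nameof expressions above suggest, that the keys are the model's property names; the AnomalyReporting helper, the AnomalyReportingSketch2 namespace, and the simplified SensorsRTModel class are hypothetical, and the interface is a local copy of the one shown earlier.

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;

namespace AnomalyReportingSketch2
{
    // Local copy of the IAnomalyDetectionProvider interface shown earlier.
    public interface IAnomalyDetectionProvider
    {
        bool DetectAnomaly(Dictionary<string, float> properties);
        Task ReportAnomalyDataAsync(Dictionary<string, float> properties, bool isAnomaly);
    }

    // Hypothetical, simplified model with the three tracked properties.
    public class SensorsRTModel
    {
        public float Temperature { get; set; }
        public float RPM { get; set; }
        public float Friction { get; set; }
    }

    public static class AnomalyReporting
    {
        // Builds the property dictionary in one place so that DetectAnomaly
        // and ReportAnomalyDataAsync receive identical keys.
        public static Dictionary<string, float> ToProperties(SensorsRTModel twin) =>
            new Dictionary<string, float>
            {
                [nameof(SensorsRTModel.Temperature)] = twin.Temperature,
                [nameof(SensorsRTModel.RPM)] = twin.RPM,
                [nameof(SensorsRTModel.Friction)] = twin.Friction,
            };

        // Reports the current values with an explicit label (true or false).
        // Blocks until the report completes because ProcessMessages is synchronous.
        public static void Report(IAnomalyDetectionProvider provider, SensorsRTModel twin, bool isAnomaly) =>
            provider.ReportAnomalyDataAsync(ToProperties(twin), isAnomaly).GetAwaiter().GetResult();
    }
}
```

Inside ProcessMessages, a call such as AnomalyReporting.Report(algorithm, digitalTwin, ProblemHappened()) would then report each reading with its label, once the helper is adapted to your real model class.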
Deploying a Digital Twin Model for Data Gathering
When you deploy a digital twin model that uses a machine learning algorithm, select the check box that displays the machine learning options.
To deploy a digital twin model that gathers new training data, select the “Gather data only” option on the Deploy page.
Track and Download Training Data
You can track how many new data points have been gathered by navigating to the Machine Learning tab of the digital twin model.
On the same page, you can also download a CSV file containing the new training data as reported by the digital twin instances.
Note
This file will have the same format as the file that was used to train your original algorithm, with the same column order, making it easy to add to your existing training dataset.
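Because the downloaded file uses the same columns in the same order as the original training file, combining the two can be as simple as appending rows. The following is a minimal sketch, assuming both CSV files start with a single header row (check your files) and contain no quoted fields that span multiple lines; the TrainingDataMerger class, its methods, and the TrainingDataSketch namespace are hypothetical. It refuses to merge when the headers differ, which catches a column-order mismatch before retraining.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

namespace TrainingDataSketch
{
    public static class TrainingDataMerger
    {
        // Appends the data rows of newLines to originalLines.
        // Assumes each input starts with one header row; throws if the headers differ.
        public static List<string> Merge(IReadOnlyList<string> originalLines, IReadOnlyList<string> newLines)
        {
            if (originalLines.Count == 0 || newLines.Count == 0)
                throw new ArgumentException("Both files must contain at least a header row.");
            if (originalLines[0].Trim() != newLines[0].Trim())
                throw new InvalidOperationException(
                    $"Header mismatch: '{originalLines[0]}' vs '{newLines[0]}'");

            var merged = new List<string>(originalLines);
            // Skip the header of the new file and any blank lines.
            merged.AddRange(newLines.Skip(1).Where(line => !string.IsNullOrWhiteSpace(line)));
            return merged;
        }

        // File-based wrapper: writes the combined dataset to outputPath.
        public static void MergeFiles(string originalPath, string newDataPath, string outputPath) =>
            File.WriteAllLines(outputPath, Merge(File.ReadAllLines(originalPath), File.ReadAllLines(newDataPath)));
    }
}
```

For example, TrainingDataMerger.MergeFiles("original.csv", "new-data.csv", "extended.csv") (hypothetical file names) would write the extended dataset used for retraining.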
Retrain the Algorithm
If the algorithm does not support incremental retraining, you may need to retrain it from scratch on the original dataset combined with the new training data.
For ML.NET algorithms that don’t support retraining, build your extended dataset and use the ScaleOut Machine Learning Training Tool to train a new algorithm and generate the zip file.
For TensorFlow algorithms, retrain the model using your own Python scripts, and then use the ScaleOut Machine Learning Training Tool to generate the zip file.
In either case, you will need to manually upload the new algorithm.