Gathering and Downloading New Training Data

You can design your digital twin model so that when instances identify a problem, they report an anomaly to the system. The system will store the values along with the label.

Digital Twin Code Changes

Whether you use TensorFlow or ML.NET algorithms, if you followed the "Making predictions from a C# digital twin model" tutorial, the code changes needed to start reporting anomalies are the same.

To report anomalies, use the second method of the IAnomalyDetectionProvider interface, ReportAnomalyDataAsync:

namespace Scaleout.Streaming.DigitalTwin.Core
{
  /// <summary>
  /// Encapsulates the capabilities of a ScaleOut real-time digital twin
  /// anomaly detection provider
  /// </summary>
  public interface IAnomalyDetectionProvider
  {
      /// <summary>
      /// Detects anomalies by using the trained algorithm and the provided property values
      /// </summary>
      /// <param name="properties">A dictionary of the properties to use for the prediction</param>
      /// <returns>True if an anomaly is detected, False otherwise</returns>
      bool DetectAnomaly(Dictionary<string, float> properties);

      /// <summary>
      /// Adds anomaly data, which is used for retraining ML algorithms. Anomaly data
      /// consists of a collection of properties and their associated values, along
      /// with a label indicating whether these properties constitute an anomaly.
      /// </summary>
      /// <param name="properties">A collection of properties and their values</param>
      /// <param name="isAnomaly">True if the values constitute an anomaly</param>
      Task ReportAnomalyDataAsync(Dictionary<string, float> properties, bool isAnomaly);
  }
}

Suppose you have a way to determine that a problem occurred (call it ProblemHappened()). You can then add a call to ReportAnomalyDataAsync in your MessageProcessor:

public override ProcessingResult ProcessMessages(ProcessingContext context, SensorsRTModel digitalTwin, IEnumerable<DigitalTwinMessage> newMessages)
{
  // Look up the Anomaly Detection Provider by name from the context
  var algorithm = context.AnomalyDetectionProviders["Overheating"];

  // ...

  if (ProblemHappened())
  {
      // Build the dictionary of properties with the current values
      var properties = new Dictionary<string, float>();
      properties[nameof(SensorsRTModel.Temperature)] = digitalTwin.Temperature;
      properties[nameof(SensorsRTModel.RPM)] = digitalTwin.RPM;
      properties[nameof(SensorsRTModel.Friction)] = digitalTwin.Friction;

      // Report the anomaly using current values. ProcessMessages is not declared
      // async, so wait on the returned Task rather than using await.
      algorithm.ReportAnomalyDataAsync(properties, isAnomaly: true).GetAwaiter().GetResult();
  }

  // ...

  return ProcessingResult.DoUpdate;
}

Provide the values for the tracked properties along with the appropriate label. This new data will be stored for later use.
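Because ReportAnomalyDataAsync takes a label, the same call can in principle record normal readings (isAnomaly: false) as well as anomalies. The following is a minimal sketch of that idea, not ScaleOut's implementation. The interface is copied locally from the listing above so the sketch compiles on its own, and InMemoryProvider and ReportSampleAsync are hypothetical names introduced only for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Local copy of the interface shown above so the sketch compiles on its own;
// in a real project it comes from Scaleout.Streaming.DigitalTwin.Core.
public interface IAnomalyDetectionProvider
{
    bool DetectAnomaly(Dictionary<string, float> properties);
    Task ReportAnomalyDataAsync(Dictionary<string, float> properties, bool isAnomaly);
}

// Hypothetical in-memory stand-in for a real provider (illustration only).
public sealed class InMemoryProvider : IAnomalyDetectionProvider
{
    public List<KeyValuePair<Dictionary<string, float>, bool>> Reported { get; } =
        new List<KeyValuePair<Dictionary<string, float>, bool>>();

    // This stand-in never detects anything; it only records reported data.
    public bool DetectAnomaly(Dictionary<string, float> properties) => false;

    public Task ReportAnomalyDataAsync(Dictionary<string, float> properties, bool isAnomaly)
    {
        // Copy the dictionary so later changes by the caller do not alter stored data.
        var copy = new Dictionary<string, float>(properties);
        Reported.Add(new KeyValuePair<Dictionary<string, float>, bool>(copy, isAnomaly));
        return Task.CompletedTask;
    }
}

public static class AnomalyReporting
{
    // Hypothetical helper: builds the property dictionary and reports it with the given label.
    public static Task ReportSampleAsync(IAnomalyDetectionProvider provider,
        float temperature, float rpm, float friction, bool isAnomaly)
    {
        var properties = new Dictionary<string, float>
        {
            ["Temperature"] = temperature,
            ["RPM"] = rpm,
            ["Friction"] = friction
        };
        return provider.ReportAnomalyDataAsync(properties, isAnomaly);
    }
}
```

In a message processor you would pass the provider obtained from the context instead of the stand-in, and choose the label based on your own problem check.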

Deploying a Digital Twin Model for Data Gathering

When you deploy a digital twin model that uses a machine learning algorithm, select the check box to display machine learning options:

ml_deploy

When deploying a digital twin model that will gather new training data, pick the “Gather data only” option on the Deploy page:

ml_deploy_gather

Track and Download Training Data

You can track how many new data points have been gathered by navigating to the Machine Learning tab of the digital twin model.

ml_tab

ml_track_data

On the same page, you can also download a CSV file containing the new training data as reported by the digital twin instances.

ml_download_data

Note

This file will have the same format as the file that was used to train your original algorithm, with the same column order, making it easy to add to your existing training dataset.
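Since the downloaded file matches the original training file's column order, combining the two can be as simple as appending rows. The sketch below is one possible way to do that. It assumes both files start with an identical header row, which you should verify against your own files. TrainingDataMerger and the file paths are hypothetical names, not part of any ScaleOut tool.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class TrainingDataMerger
{
    // Appends the data rows of the downloaded file to those of the original
    // dataset, keeping a single header row. Returns the number of data rows written.
    // Assumption: both CSV files begin with the same header row.
    public static int Merge(string originalPath, string downloadedPath, string outputPath)
    {
        string[] original = File.ReadAllLines(originalPath);
        string[] downloaded = File.ReadAllLines(downloadedPath);

        if (original.Length == 0 || downloaded.Length == 0)
            throw new InvalidDataException("Both files must contain a header row.");

        // Refuse to merge if the columns differ in name or order.
        if (!string.Equals(original[0].Trim(), downloaded[0].Trim(), StringComparison.Ordinal))
            throw new InvalidDataException("Header mismatch between the two files.");

        // Skip each header, drop blank lines, and keep the original rows first.
        List<string> rows = original.Skip(1)
            .Concat(downloaded.Skip(1))
            .Where(line => !string.IsNullOrWhiteSpace(line))
            .ToList();

        File.WriteAllLines(outputPath, new[] { original[0] }.Concat(rows));
        return rows.Count;
    }
}
```

Calling TrainingDataMerger.Merge("original.csv", "downloaded.csv", "extended.csv") would write the combined dataset to extended.csv, which can then serve as input for retraining.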

Retrain the Algorithm

In cases where the algorithm itself does not support incremental retraining, you may need to train the algorithm again from scratch by adding the new training data to the original dataset.

For ML.NET algorithms that don’t support retraining, build your extended dataset and use the ScaleOut Machine Learning Training Tool to train a new algorithm and generate the zip file.

For TensorFlow algorithms, retrain the model using your Python scripts, then use the ScaleOut Machine Learning Training Tool to generate the zip file.

In either case, you will need to manually upload the new algorithm.