Hosting Digital Twins Using In-Memory Computing

To achieve low message-processing latency, high throughput, transparent scalability across thousands of simultaneous instances, and high availability, the ScaleOut Digital Twin Streaming Service hosts digital twins on the ScaleOut StreamServer in-memory computing platform. This platform incorporates an in-memory data grid (IMDG) for hosting digital twin instances as memory-based objects and an integrated in-memory compute engine for running the message-processing and simulation code defined by digital twin models.

When a digital twin model is deployed to the ScaleOut Digital Twins™ service, its associated class definition and code are loaded into the IMDG’s compute engine and await incoming messages or simulation time events. When a model is used for streaming analytics in a live system, the ScaleOut Digital Twins UI lets the user connect to physical data sources by establishing connections between the IMDG and popular event hubs, such as Azure IoT Hub, AWS IoT Core, and Azure Kafka; messages can also be sent to the IMDG using a REST service. Using these connections, the IMDG receives messages from data sources and directs them to real-time digital twin instances running within the IMDG. The IMDG automatically creates a new instance when the first message arrives from that instance’s data source. (Alternatively, a CSV file can specify the initial set of instances created by each model.) For each instance, the in-memory compute engine runs the real-time digital twin model’s message-processing code, which can send messages back to the data sources or to other instances in a hierarchy within the IMDG.

As an example, the following diagram shows three real-time digital twin instances processing messages for corresponding rental cars. The arrows indicate that the IMDG directs incoming messages from each car to its corresponding instance for message processing.

car_instances_image
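
The routing behavior described above can be sketched in a few lines of Python. This is only an illustration, not the ScaleOut API: the names TwinHost, CarTwin, deliver, and process_message are hypothetical, and a plain dictionary stands in for the IMDG. The sketch shows the two properties the text relies on: each data source maps to exactly one instance, and an instance is created when the first message from its source arrives.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class CarTwin:
    """Hypothetical in-memory state for one rental car."""
    car_id: str
    last_speed_mph: float = 0.0
    messages_seen: int = 0


def process_message(twin: CarTwin, message: dict) -> None:
    """Hypothetical message-processing code defined by the model."""
    twin.last_speed_mph = message["speed_mph"]
    twin.messages_seen += 1


class TwinHost:
    """Toy stand-in for the IMDG: one instance per data source, keyed by ID."""

    def __init__(self) -> None:
        self.instances: Dict[str, CarTwin] = {}

    def deliver(self, source_id: str, message: dict) -> CarTwin:
        # Create the instance when the first message arrives from its source.
        twin = self.instances.get(source_id)
        if twin is None:
            twin = CarTwin(car_id=source_id)
            self.instances[source_id] = twin
        # Run the model's code against the instance that owns this source.
        process_message(twin, message)
        return twin


host = TwinHost()
for source_id, speed in [("car-1", 52.0), ("car-2", 61.5), ("car-1", 58.0)]:
    host.deliver(source_id, {"speed_mph": speed})
```

After the loop, the host holds two instances, and the instance for car-1 reflects both of its messages.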

The IMDG implements a software-based, key-value store of serialized objects that spans a cluster of commodity servers (or cloud instances). Its architecture provides cost-effective scalability and high availability while hiding the complexity of distributed in-memory storage from the applications that use it. This makes an IMDG an excellent fit for hosting digital twin instances. The IMDG’s integrated compute engine can take full advantage of a cluster’s computing power to run application code within the IMDG, where the data lives, to maximize performance and avoid network bottlenecks. This enables the IMDG to run message-processing code with low latency (typically just a few milliseconds) and to scale transparently, as servers are added, to deliver high throughput for large numbers of instances. The IMDG’s ability to scale ensures that message-processing latency stays low as workloads grow. Because incoming messages are delivered directly to their associated real-time digital twin instances, no network overhead or database request is required to access an instance’s state information.
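
To make the idea of a key-value store that spans a cluster more concrete, the following Python sketch assigns instance keys to servers with a stable hash. How the ScaleOut IMDG actually places objects is not described here; the modulo-hash scheme, the owner function, and the server names are assumptions chosen only to show that every key has exactly one owning server, so code for that instance can run where its state is stored.

```python
import hashlib
from collections import Counter
from typing import List


def owner(key: str, servers: List[str]) -> str:
    """Pick the server that stores a key (illustrative modulo hashing)."""
    # A cryptographic digest gives the same result on every run and machine,
    # unlike Python's built-in hash() for strings.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return servers[int.from_bytes(digest[:8], "big") % len(servers)]


servers = ["server-a", "server-b", "server-c"]  # hypothetical cluster
keys = [f"car-{i}" for i in range(3000)]

# Count how many instances each server would host.
load = Counter(owner(key, servers) for key in keys)
```

With a uniform hash, the instances spread roughly evenly across the servers, which is what lets added servers absorb a growing workload.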

As part of message processing, real-time digital twin instances can create alerts for human attention and/or feedback directed at their corresponding data sources. For example, the rental car application could alert managers when a driver repeatedly exceeds the speed limit according to criteria specific to the driver’s age and driving history, and it could allow a manager to query the status of any specific car to examine dynamic information. These data flows are illustrated in the following diagram:

alerts_and_queries_image
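
A minimal Python sketch of the alerting and query flows for the rental car example follows. The policy in allowed_speeding_events (a base tolerance of three events, reduced for drivers under 25 or with prior violations) is invented for illustration, as are all class, field, and function names; the point is only that the alert decision uses per-instance state rather than the incoming message alone.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DriverProfile:
    age: int
    prior_violations: int


@dataclass
class CarTwin:
    car_id: str
    driver: DriverProfile
    speeding_events: int = 0
    last_speed_mph: float = 0.0
    alerts: List[str] = field(default_factory=list)


def allowed_speeding_events(driver: DriverProfile) -> int:
    """Hypothetical policy: tolerate fewer events for riskier drivers."""
    limit = 3
    if driver.age < 25:
        limit -= 1
    if driver.prior_violations > 0:
        limit -= 1
    return limit


def process_message(twin: CarTwin, message: dict) -> Optional[str]:
    """Update state and return an alert once the tolerance is exceeded."""
    twin.last_speed_mph = message["speed_mph"]
    if message["speed_mph"] <= message["speed_limit_mph"]:
        return None
    twin.speeding_events += 1
    if twin.speeding_events <= allowed_speeding_events(twin.driver):
        return None
    alert = f"{twin.car_id}: {twin.speeding_events} speeding events"
    twin.alerts.append(alert)
    return alert


def query_status(twin: CarTwin) -> dict:
    """What a manager might see when querying one car."""
    return {
        "car_id": twin.car_id,
        "last_speed_mph": twin.last_speed_mph,
        "speeding_events": twin.speeding_events,
    }


twin = CarTwin("car-7", DriverProfile(age=22, prior_violations=0))
results = [
    process_message(twin, {"speed_mph": s, "speed_limit_mph": 65.0})
    for s in (70.0, 72.0, 60.0, 75.0)
]
```

Here the 22-year-old driver is allowed two speeding events, so the alert is raised on the third one.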

Although the in-memory state of a digital twin instance consists only of the data needed for message processing, the application can reference (or update) historical data on external database servers to broaden its context, as shown below. For example, the rental car application could access a driver’s history only when incoming telemetry indicates a driving pattern that calls for it. It could also store past events in a database for archival purposes.

persist_to_db_image
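
The on-demand database access described above might look like the following Python sketch. It uses an in-memory SQLite database as a stand-in for an external database server; the table names, columns, and the rule that a speeding message triggers the history lookup are all assumptions made for illustration.

```python
import sqlite3
from dataclasses import dataclass
from typing import List, Optional

# Stand-in for an external database holding history and an event archive.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE driving_history (car_id TEXT, event TEXT)")
db.execute("CREATE TABLE event_archive (car_id TEXT, speed_mph REAL)")
db.executemany(
    "INSERT INTO driving_history VALUES (?, ?)",
    [("car-1", "hard braking"), ("car-1", "speeding"), ("car-2", "speeding")],
)


@dataclass
class CarTwin:
    car_id: str
    history: Optional[List[str]] = None  # not loaded until needed


def load_history(twin: CarTwin, conn: sqlite3.Connection) -> List[str]:
    """Fetch history once and keep it in the instance's in-memory state."""
    if twin.history is None:
        rows = conn.execute(
            "SELECT event FROM driving_history WHERE car_id = ? ORDER BY rowid",
            (twin.car_id,),
        ).fetchall()
        twin.history = [row[0] for row in rows]
    return twin.history


def process_message(twin: CarTwin, message: dict, conn: sqlite3.Connection) -> bool:
    """Archive the event; load history only if the telemetry warrants it."""
    conn.execute(
        "INSERT INTO event_archive VALUES (?, ?)",
        (twin.car_id, message["speed_mph"]),
    )
    suspicious = message["speed_mph"] > message["speed_limit_mph"]
    if suspicious:
        load_history(twin, conn)
    return suspicious


twin = CarTwin("car-1")
first = process_message(twin, {"speed_mph": 50.0, "speed_limit_mph": 65.0}, db)
history_after_first = twin.history
second = process_message(twin, {"speed_mph": 80.0, "speed_limit_mph": 65.0}, db)
```

The first message leaves the history unloaded; only the second, which exceeds the limit, causes a database read.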

In addition to providing fast message processing, the IMDG’s compute engine can perform continuous, data-parallel calculations (e.g., MapReduce) on the state of all digital twin instances to implement real-time, aggregate analytics that complete every few seconds. Moving this state information to an offline store for batch analysis using big data platforms such as Apache Spark can create significant delays (minutes to hours). Instead, the IMDG can perform this analysis in place and in real time to immediately identify aggregate trends and thereby maximize situational awareness. A key benefit of the digital twin model is its ability to define state information that can be processed in this manner and offer additional feedback for both message processing and alerting.
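
The data-parallel aggregation described above can be sketched in Python as a map step over each instance followed by a reduce step that merges partial results. The two inner lists stand in for the sets of instances held on two servers; the region field, the function names, and the choice of statistic (cars and speeding events per region) are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple

Totals = Tuple[int, int]  # (number of cars, number of speeding events)


@dataclass
class CarTwin:
    car_id: str
    region: str
    speeding_events: int


def map_twin(twin: CarTwin) -> Tuple[str, Totals]:
    """Map step: emit one (region, partial totals) pair per instance."""
    return twin.region, (1, twin.speeding_events)


def reduce_pairs(pairs: Iterable[Tuple[str, Totals]]) -> Dict[str, Totals]:
    """Reduce step: sum the totals for each region."""
    totals: Dict[str, Totals] = defaultdict(lambda: (0, 0))
    for region, (cars, events) in pairs:
        current_cars, current_events = totals[region]
        totals[region] = (current_cars + cars, current_events + events)
    return dict(totals)


# Each inner list stands in for the instances stored on one server.
partitions = [
    [CarTwin("car-1", "west", 2), CarTwin("car-2", "east", 0)],
    [CarTwin("car-3", "west", 1)],
]

# Reduce locally on each partition, then merge the partial results.
partials = [reduce_pairs(map(map_twin, part)) for part in partitions]
merged = reduce_pairs(
    (region, totals) for partial in partials for region, totals in partial.items()
)
```

Because each partition is reduced where its instances reside, only the small per-region totals need to be combined across servers.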

In summary, the ScaleOut Digital Twins IMDG provides a fast, scalable platform for hosting digital twin instances and running their message-processing code while simultaneously performing real-time aggregate analytics to maximize situational awareness. Its ability to automatically correlate messages by data source and to let applications analyze these messages within the context of a digital twin’s dynamic state information enables the IMDG to deliver important new capabilities for stream processing along with consistently high performance.