The NamedMap API

This topic contains a conceptual overview of the NamedMap API.

The NamedMap class lets you store large numbers of objects (typically millions) in the distributed data grid in a way that allows them to be analyzed efficiently in bulk.

Populating a NamedMap

To efficiently add a large number of objects, use the Soss.Client.Concurrent.BulkLoader<TKey, TValue> class, an instance of which can be obtained by calling the NamedMap<TKey, TValue>.CreateBulkLoader method. The BulkLoader class allows multiple map operations to be grouped together prior to being sent to the ScaleOut service, resulting in fewer round trips and higher overall throughput.
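The effect of grouping operations can be illustrated with a small, language-neutral model. The sketch below is written in Python and does not use the ScaleOut library; `FakeService`, `BatchingLoader`, and the batch size of 1000 are hypothetical names and values chosen only to show how buffering puts reduces the number of round trips.

```python
class FakeService:
    """Hypothetical stand-in for the remote service; it only counts round trips."""

    def __init__(self):
        self.store = {}
        self.round_trips = 0

    def put_many(self, items):
        # One simulated network round trip carries a whole batch of puts.
        self.round_trips += 1
        self.store.update(items)


class BatchingLoader:
    """Hypothetical loader that buffers puts and sends them in batches."""

    def __init__(self, service, batch_size=1000):
        self.service = service
        self.batch_size = batch_size
        self.pending = {}

    def put(self, key, value):
        self.pending[key] = value
        if len(self.pending) >= self.batch_size:
            self.flush()

    def flush(self):
        if self.pending:
            self.service.put_many(self.pending)
            self.pending = {}

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.flush()  # send whatever is still buffered
        return False


service = FakeService()
with BatchingLoader(service, batch_size=1000) as loader:
    for i in range(2500):
        loader.put(i, f"value-{i}")

print(service.round_trips)  # 3 batches instead of 2500 individual puts
```

In this model, 2500 puts reach the service in three batches (1000, 1000, and the remaining 500 when the loader is closed).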

Client Cache

To maximize access performance, the NamedMap maintains an internal near cache that contains deserialized versions of recently accessed objects. When reading objects, this client-side cache reduces access response time by eliminating data motion and deserialization overhead.

A configurable number of recently read values can be stored in the client application's memory using this client cache. On subsequent reads, the cached value for a key is returned if it is no older than a configurable coherency interval. A coherency interval of 0 disables the client cache and causes every read operation to access the StateServer service.

The client cache size can be configured by using the NamedMap<TKey, TValue>.ClientCacheSize property (the default size is 10000 entries). The client cache's coherency interval can be adjusted by using the NamedMap<TKey, TValue>.CoherencyIntervalMilliseconds property (the default coherency interval is 0 milliseconds, which disables the client cache).
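The interaction between cache size and coherency interval can be modeled in a few lines. The following Python sketch is a conceptual illustration only, not the NamedMap implementation: `NearCache`, its parameters, and the least-recently-used eviction policy are assumptions made for the example, and an injected clock stands in for real time.

```python
import time
from collections import OrderedDict


class NearCache:
    """Hypothetical client cache: bounded size plus a maximum entry age."""

    def __init__(self, fetch, max_entries=10000, coherency_interval_ms=0,
                 clock=time.monotonic):
        self.fetch = fetch                      # function that reads from the service
        self.max_entries = max_entries
        self.interval_s = coherency_interval_ms / 1000.0
        self.clock = clock
        self.entries = OrderedDict()            # key -> (value, time cached)

    def get(self, key):
        # An interval of 0 disables caching: every read goes to the service.
        if self.interval_s > 0 and key in self.entries:
            value, cached_at = self.entries[key]
            if self.clock() - cached_at <= self.interval_s:
                self.entries.move_to_end(key)   # mark as recently used
                return value
        value = self.fetch(key)
        if self.interval_s > 0:
            self.entries[key] = (value, self.clock())
            self.entries.move_to_end(key)
            if len(self.entries) > self.max_entries:
                self.entries.popitem(last=False)  # evict least recently used (assumed policy)
        return value


remote_reads = []

def fetch(key):
    remote_reads.append(key)
    return f"value-for-{key}"

now = [0.0]  # fake clock, in seconds
cache = NearCache(fetch, max_entries=2, coherency_interval_ms=500,
                  clock=lambda: now[0])
cache.get("a")   # miss: read from the service
cache.get("a")   # hit: served from the client cache
now[0] += 1.0    # one second passes; the entry is now older than 500 ms
cache.get("a")   # stale: read from the service again

disabled = NearCache(fetch, coherency_interval_ms=0, clock=lambda: now[0])
disabled.get("b")
disabled.get("b")  # interval of 0: both reads go to the service
```

A longer interval saves more round trips but lets a client read values that are up to that many milliseconds out of date.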

If the client cache is enabled, changes made to a cached object are visible to all threads that use the same NamedMap instance, because the client cache is an in-process reference cache. If multiple threads may modify objects from the map at the same time, the application may need to make deep copies of those objects first; otherwise, the objects in the client cache may become corrupted.
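The difference between a shared reference and a deep copy can be shown with a minimal model. This Python sketch does not use the NamedMap API; `reference_cache` and `read` are hypothetical stand-ins for a cache that hands out the same in-process object on every read.

```python
import copy

# Stand-in for an in-process reference cache.
reference_cache = {"order-1": {"items": ["apple"], "total": 3}}

def read(key):
    # Returns a reference to the cached object, not a copy.
    return reference_cache[key]

shared = read("order-1")
shared["items"].append("pear")    # also changes what every other reader sees

private = copy.deepcopy(read("order-1"))
private["items"].append("plum")   # affects only this private copy

print(reference_cache["order-1"]["items"])  # ['apple', 'pear']
print(private["items"])                     # ['apple', 'pear', 'plum']
```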

Parallel Method Invocations

The ScaleOut StateServer service hosts a powerful parallel execution engine that allows you to efficiently analyze all of the objects in a NamedMap by using all of the hosts in your data grid. Each host simultaneously performs operations only on its local subset of keys, which boosts performance by avoiding the movement of data across the network. An invocation is initiated by calling the NamedMap<TKey, TValue>.Invoke method.

Prior to calling the Invoke method, a Parallel Method Invocation (PMI) operation must be defined using the Soss.Client.Concurrent.NamedMapInvokable<TKey, TValue, TParam, TResult> class, whose Eval and Merge callbacks need to be specified at construction. PMI calls require an invocation grid (Soss.Client.InvocationGrid) to be assigned to the named map instance via the NamedMap<TKey, TValue>.InvocationGrid property.
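The division of work between the two callbacks can be sketched with a small model. The Python code below is not the ScaleOut API: the `hosts` partitions, `eval_entry`, `merge`, and `invoke` are hypothetical, and the sketch assumes that Eval produces one result per object and that Merge combines two results into one, which is how the callback names are read here.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce

# Hypothetical partitions: each "host" holds its own subset of the map's keys.
hosts = [
    {"a": 5, "b": 12},
    {"c": 7},
    {"d": 20, "e": 1},
]

def eval_entry(key, value, threshold):
    """Eval (assumed role): produce a result for one object."""
    return 1 if value > threshold else 0

def merge(left, right):
    """Merge (assumed role): combine two results into one."""
    return left + right

def run_on_host(partition, threshold):
    # Each host evaluates only its local objects and merges the results locally.
    results = [eval_entry(k, v, threshold) for k, v in partition.items()]
    return reduce(merge, results)

def invoke(threshold):
    # All hosts work in parallel; only the small per-host results are combined.
    with ThreadPoolExecutor(max_workers=len(hosts)) as pool:
        per_host = list(pool.map(lambda p: run_on_host(p, threshold), hosts))
    return reduce(merge, per_host)

count_over_6 = invoke(6)
print(count_over_6)  # 3 values (12, 7, 20) exceed the threshold
```

Because only per-host results travel to the caller, the objects themselves never leave the host that stores them.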

Queries

Queries are parallel operations that run simultaneously on all hosts in the data grid and return a list of matching keys. To query a map, call NamedMap<TKey, TValue>.ExecuteParallelQuery with a Soss.Client.Concurrent.QueryCondition<TKey, TValue> implementation as an argument. If this query condition argument is null, all of the map's keys are returned.

Query operations require an invocation grid (Soss.Client.InvocationGrid) to be assigned to the named map instance via the NamedMap<TKey, TValue>.InvocationGrid property.
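The query behavior described in this section, including the "no condition returns every key" rule, can be modeled as follows. This Python sketch is illustrative only; `partitions` and `execute_parallel_query` are hypothetical stand-ins and not the library's method.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical partitions: each "host" stores part of a name -> age map.
partitions = [
    {"alice": 34, "bob": 19},
    {"carol": 52},
    {"dave": 27},
]

def execute_parallel_query(partitions, condition=None):
    """Return the keys whose entries satisfy the condition (all keys if None)."""

    def query_host(partition):
        if condition is None:
            return list(partition)
        return [k for k, v in partition.items() if condition(k, v)]

    # Every host filters its own keys at the same time.
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        per_host = list(pool.map(query_host, partitions))
    return [key for host_keys in per_host for key in host_keys]

over_30 = execute_parallel_query(partitions, lambda name, age: age >= 30)
all_keys = execute_parallel_query(partitions)

print(sorted(over_30))   # ['alice', 'carol']
print(sorted(all_keys))  # ['alice', 'bob', 'carol', 'dave']
```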

MapReduce Operations

The NamedMap API allows you to run MapReduce operations on data stored in a NamedMap. This built-in, in-memory MapReduce framework does not require Hadoop, which simplifies the development of MapReduce applications. To get started with Simple MapReduce, you will need to override one method on each of the following three abstract classes: Mapper<TKey, TValue, MK, MV>, Combiner<MK, MV>, and Reducer<MK, MV, OK, OV>.

The NamedMap API offers two RunMapReduce method signatures:
  1. The first overload, NamedMap<TKey, TValue>.RunMapReduce<MK, MV, OK, OV>(NamedMap<OK, OV>, Mapper<TKey, TValue, MK, MV>, Combiner<MK, MV>, Reducer<MK, MV, OK, OV>, TimeSpan), is an instance method that runs the MapReduce task using locally stored key/value pairs.

  2. The second overload, NamedMap<TKey, TValue>.RunMapReduce<MK, MV, OK, OV>(NamedMap<TKey, TValue>, NamedMap<OK, OV>, Mapper<TKey, TValue, MK, MV>, Combiner<MK, MV>, Reducer<MK, MV, OK, OV>, TimeSpan), is a static method that takes the input named map as an argument and writes the reduced values into a separate named map.

If a combiner is specified, the key it returns must be the same as the key it was passed as a parameter.

The standard word count application can be written easily with this framework. For an implementation of a simple word count application, see the code example at the end of the documentation for the NamedMap<TKey, TValue>.RunMapReduce<MK, MV, OK, OV>(NamedMap<OK, OV>, Mapper<TKey, TValue, MK, MV>, Combiner<MK, MV>, Reducer<MK, MV, OK, OV>, TimeSpan) method.
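To make the roles of the mapper, combiner, and reducer concrete, here is a language-neutral word count model in Python. It is not the library's RunMapReduce method or its official example: `run_map_reduce`, the partition layout, and the callback shapes are assumptions for illustration. The sketch also enforces the rule that a combiner must return the key it was given.

```python
from collections import defaultdict

def mapper(key, value):
    # Emit (word, 1) for every word in a line of text.
    for word in value.lower().split():
        yield word, 1

def combiner(key, values):
    # Pre-aggregates counts on one host; the key must stay unchanged.
    return key, sum(values)

def reducer(key, values):
    # Produces the final (word, count) output pair.
    return key, sum(values)

def run_map_reduce(partitions, mapper, combiner, reducer):
    """Hypothetical driver: map and combine per host, then shuffle and reduce."""
    shuffled = defaultdict(list)
    for partition in partitions:
        local = defaultdict(list)
        for key, value in partition.items():
            for mk, mv in mapper(key, value):
                local[mk].append(mv)
        for mk, mvs in local.items():
            if combiner is not None:
                ck, cv = combiner(mk, mvs)
                assert ck == mk, "combiner must return the key it was given"
                mvs = [cv]
            shuffled[mk].extend(mvs)
    output = {}
    for mk, mvs in shuffled.items():
        ok, ov = reducer(mk, mvs)
        output[ok] = ov
    return output

# Hypothetical input: line number -> text, spread across two hosts.
partitions = [
    {1: "the quick brown fox", 2: "the lazy dog"},
    {3: "the fox"},
]

counts = run_map_reduce(partitions, mapper, combiner, reducer)
print(counts["the"], counts["fox"])  # 3 2
```

The combiner step is optional in this model: passing `None` yields the same counts, with more intermediate values moved between hosts.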

Custom Serialization

Custom serialization can be used to efficiently store keys and values in memory. To use a custom serializer, derive a class from Soss.Client.Concurrent.CustomSerializer<T> and implement the desired serialization and deserialization logic. Instances of the concrete CustomSerializer subclasses can then be provided to the NamedMap<TKey, TValue> constructor.

Every instance of a client application across the grid should use the same set of custom serializers for a given named map; otherwise, serialization errors may occur.
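The following Python sketch illustrates the general idea of a compact custom serializer and why all clients must agree on the format. It is a conceptual model, not the CustomSerializer<T> class: `PointSerializer`, its method names, and the binary layout are hypothetical.

```python
import struct

class PointSerializer:
    """Hypothetical custom serializer for (x, y) integer pairs."""

    FORMAT = "<ii"  # two little-endian 32-bit signed integers, 8 bytes total

    def serialize(self, value):
        x, y = value
        return struct.pack(self.FORMAT, x, y)

    def deserialize(self, data):
        return struct.unpack(self.FORMAT, data)

serializer = PointSerializer()
point = (3, -7)
encoded = serializer.serialize(point)
decoded = serializer.deserialize(encoded)
print(len(encoded), decoded)  # 8 (3, -7)

# A client that assumes a different layout cannot read the same bytes.
try:
    struct.unpack("<iii", encoded)  # expects 12 bytes, receives 8
    mismatch_detected = False
except struct.error:
    mismatch_detected = True
```

The failed read at the end is the kind of error that can occur when two clients use different serializers for the same map.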

For Parallel Method Invocation (PMI) operations, if a PMI parameter object or result object needs a custom serializer, the serializer can be supplied through the NamedMapInvokable<TKey, TValue, TParam, TResult> constructor when defining the invocation operation.

Usage Notes