Deployment Considerations

DEP-1: Provision Sufficient Physical Memory

To maintain high performance and stability, it is important that the SOSS service always runs in physical memory. Otherwise, the service will start paging to disk, which will cause the distributed cache to become unresponsive. SOSS’s memory requirements for each host in the distributed cache include the following components:

  • memory required for the host’s portion of stored objects
  • memory required for object replicas
  • extra memory to handle dynamic load-balancing
  • extra memory to handle recovery from host failures
  • extra memory for other SOSS data structures

The formula to provision memory for objects and replicas is as follows. First, determine a maximum expected storage requirement for the distributed cache by multiplying the average object size by the maximum number of objects expected to be stored at any one time; call that number M bytes. The total memory required for object storage in the SOSS cache will then be (R+1)*M, where R is the number of replicas per object (1 or 2) and the extra 1 accounts for the primary copy of each object. The memory needed per host is then (R+1)*M/N, where N is the number of hosts in the server farm. Note that the memory required per host decreases as hosts are added for a given total cache size.

For example, if you need to store 300MB of object data on a 6-server farm, the distributed cache will need to store 600MB total if one replica is used per object, and each host then needs 100MB of available RAM for stored objects.
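
The calculation can be sketched in a few lines of Python. This is only an illustration, not part of SOSS: the function name and the use of decimal megabytes are assumptions, and the total counts one primary copy plus R replicas so that it reproduces the example above.

```python
def per_host_object_memory(avg_object_bytes, max_objects, replicas, hosts):
    """Return (total_bytes, per_host_bytes) needed for objects plus replicas."""
    if replicas not in (1, 2):
        raise ValueError("replicas must be 1 or 2")
    if hosts < 1:
        raise ValueError("hosts must be at least 1")
    m = avg_object_bytes * max_objects   # M: peak object data, primary copies only
    total = (replicas + 1) * m           # one primary copy plus R replicas
    return total, total / hosts


MB = 1_000_000  # decimal megabytes, an assumption made for round numbers

# The example above: 300MB of object data (here 3,000 objects of 100KB each),
# one replica per object, six hosts.
total, per_host = per_host_object_memory(100_000, 3_000, replicas=1, hosts=6)
print(total / MB, per_host / MB)  # 600.0 100.0
```

The result covers only stored objects and replicas; it does not include the extra memory for load-balancing, failure recovery, and other SOSS data structures.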

We strongly recommend that you provision servers with at least 50% more memory than the above calculated minimum requirement to handle the "extra" memory needs listed above. In addition, the amount of extra memory you need to handle host failures depends on the number of failures that the distributed cache must handle and the number of servers in the farm. The amount of required extra capacity decreases as you add servers to the farm. If you have four hosts, the surviving three hosts together have to handle an extra 25% of the total workload (about an extra 8.3% per host) after the first host failure. After a second failure, the two surviving hosts together would have to absorb an extra 50% of the total workload (an extra 25% per host).
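
As a rough illustration of these headroom rules, the following Python sketch applies the 50% margin and computes how much workload shifts to each survivor after host failures. The function names are hypothetical, and the sketch assumes that the workload is spread evenly across hosts before and after a failure.

```python
def provisioned_memory(minimum_bytes, headroom=0.5):
    """Apply the recommended headroom (at least 50%) to the calculated minimum."""
    return minimum_bytes * (1 + headroom)


def extra_load_after_failures(hosts, failed):
    """Return (share of total workload shifted, extra share per surviving host)."""
    if not 0 <= failed < hosts:
        raise ValueError("failed must be between 0 and hosts - 1")
    shifted = failed / hosts                    # workload the failed hosts carried
    return shifted, shifted / (hosts - failed)  # split evenly among the survivors


print(provisioned_memory(100))  # 150.0 (MB per host for a 100MB minimum)
for failed in (1, 2):
    shifted, per_host = extra_load_after_failures(4, failed)
    print(f"{failed} failed: {shifted:.0%} shifted, {per_host:.1%} extra per survivor")
```

Running the loop for a larger farm shows why the required extra capacity per host shrinks as servers are added.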

The basic memory overhead of the ScaleOut StateServer service itself is about 2MB for each host running the SOSS service. Also, a small amount of additional memory is used per-object.

You also can elect to run SOSS in a fixed amount of memory per host by setting the configuration options to trigger memory reclamation when a specific memory limit is reached. This usage model must match the requirements of your application so that objects are not unexpectedly removed from the distributed cache. Even in this usage model, extra memory needs to be provisioned to handle load-balancing and failover.

DEP-2: Make sure that the CPU is not overloaded

The SOSS service typically uses no more than 5-8% of the CPU under moderate load. If the CPU becomes overloaded, SOSS may be unable to maintain service. Other SOSS hosts may detect this problem by missing heartbeat messages from the overloaded server and may then remove the overloaded server from the distributed cache.

Also, it is important to provision extra CPU capacity to handle host failures. To ensure the continued operation of SOSS when a host failure occurs, the remaining hosts in the farm need to be able to take up the load. For example, if each host in a three server farm has a CPU utilization of 90% and one host fails, the other two will likely become overloaded. However, if each server’s CPU were being utilized at 30%, then the remaining two should each be able to absorb the additional 15% load and still maintain service.
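
The arithmetic behind this example can be sketched as follows. The function is hypothetical and assumes that the load of a failed host is redistributed evenly and that CPU utilization scales linearly with load, which real workloads only approximate.

```python
def utilization_after_failures(per_host_utilization, hosts, failed=1):
    """Estimate per-host CPU utilization (%) once the survivors absorb the load."""
    survivors = hosts - failed
    if survivors < 1:
        raise ValueError("at least one host must survive")
    return per_host_utilization * hosts / survivors


print(utilization_after_failures(90, hosts=3))  # 135.0, well past saturation
print(utilization_after_failures(30, hosts=3))  # 45.0, the original 30% plus 15%
```

Any result approaching 100% indicates that the farm cannot tolerate that number of failures at its current load.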

DEP-3: Make sure that the network is not overloaded

Because of the need to replicate objects during updates and access remotely stored objects, SOSS generates enough network traffic to warrant the use of gigabit Ethernet to interconnect SOSS hosts. If you are using virtual servers, a fast network is especially important. For large computational grids with high access rates, a 10 Gigabit Ethernet or an InfiniBand network will avoid network saturation and allow SOSS to maintain linear scalability.

SOSS typically uses less than 10% of network bandwidth. Any sustained network usage that is over 40% should be investigated. In practice, networks probably can only sustain about 50% of their "rated" bandwidth. For example, a 100 Mbps network probably can sustain only about 50 Mbps.

You can estimate the maximum required network bandwidth, B bytes/sec, for an SOSS distributed cache on the network that interconnects the SOSS hosts as follows:

B = (reads/sec * object size) + (updates/sec * object size * (R+1))

where R is the number of replicas per object (which can be set to 1 or 2). Additional network bandwidth may be required to connect remote clients to the distributed cache.

For example, a distributed cache with one replica per object (R = 1) that handles 125 reads/sec and 125 updates/sec for 100KB objects would use about 37.5 Mbytes/sec, or about 375 Mbits/sec allowing for data encoding overhead. (This is beyond the theoretical saturation point for Fast Ethernet.)
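
The estimate can be reproduced with a short Python sketch. The helper names are hypothetical. The sketch assumes decimal units (100KB = 100,000 bytes), 10 bits on the wire per byte to allow for encoding overhead as in the example, and the rule of thumb above that a network sustains about 50% of its rated bandwidth.

```python
BITS_PER_BYTE_ON_WIRE = 10  # 8 data bits plus an allowance for encoding overhead


def required_bandwidth_bytes(reads_per_sec, updates_per_sec, object_bytes, replicas):
    """B = (reads/sec * object size) + (updates/sec * object size * (R+1))."""
    return (reads_per_sec * object_bytes
            + updates_per_sec * object_bytes * (replicas + 1))


def to_mbits(bytes_per_sec):
    """Convert bytes/sec to Mbits/sec, including the encoding allowance."""
    return bytes_per_sec * BITS_PER_BYTE_ON_WIRE / 1_000_000


def sustainable_mbits(rated_mbits, fraction=0.5):
    """Bandwidth a network can realistically sustain (about half its rating)."""
    return rated_mbits * fraction


b = required_bandwidth_bytes(125, 125, 100_000, replicas=1)
print(b / 1_000_000, to_mbits(b))               # 37.5 375.0
print(to_mbits(b) <= sustainable_mbits(100))    # False: Fast Ethernet is too slow
print(to_mbits(b) <= sustainable_mbits(1_000))  # True: gigabit Ethernet suffices
```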

Note that this is a worst case bandwidth estimate based on a pattern of repeated read/update pairs. SOSS’s internal client and server caches optimize network usage by eliminating repeated reads. However, SOSS also uses a small amount of network bandwidth for cache validation checks, point-to-point heartbeats, and multicast discovery.

DEP-4: Use a separate back-end subnet for best security

If your "front-end" network subnet is connected to the Internet (for example, in a Web farm), you should consider configuring ScaleOut StateServer for use on a separate, secure network subnet, typically a firewalled, "back-end" network used to connect your servers to an internal database server. This will enhance security for the data in the SOSS cache.

DEP-5: Use a single network switch for SOSS when possible

You should connect all servers to the same network switch when this is feasible and matches your bandwidth needs. If SOSS servers are connected across two or more switches and a connection between the switches fails, the distributed cache will partition itself into two separate caches. To maintain data integrity after this "split brain" situation is corrected, SOSS usually has to restart the hosts connected to one of the two switches; this can lead to the loss of the latest updates to the distributed cache.

For the same reason, you also should avoid splitting a single SOSS distributed cache across two physical sites, such as two data centers in different cities. In addition to the risk of the above "split brain" issue should a link between sites fail, a single WAN link usually causes performance problems due to its relatively low bandwidth and high latency in comparison to a LAN subnet. Instead, consider using ScaleOut GeoServer® to couple two or more distributed caches and replicate updates across WAN links. This product was designed specifically to solve the problem of maintaining cached data that is shared across two or more sites.

DEP-6: Use a dedicated caching tier with SOSS Remote Client

While the ScaleOut StateServer service can be deployed directly to web servers in your front-end tier, it is usually advantageous to run a dedicated caching tier. Web and application servers can then access ScaleOut hosts over the network by installing ScaleOut’s Remote Client libraries. This offers several benefits:

  • The ScaleOut service process will not compete with your application’s worker process for memory and CPU resources.
  • Since web server machines tend to be stopped, started, and generally more actively managed than back-end tiers, the use of a dedicated caching farm reduces disruptions to the ScaleOut data grid and greatly simplifies overall management.
  • Hardware can be allocated more effectively to address the role of each tier. For example, a small number of high-memory systems can be used for the caching tier, and a larger number of low-memory systems can be deployed on your web tier. Also, the caching tier optionally can be provisioned with a dedicated gigabit or InfiniBand network to increase throughput.

DEP-7: Use banks of web farms instead of a large monolithic farm

To maintain flexibility of operations and higher overall availability for web applications, consider using a topology that includes multiple banks of server farms rather than a single large farm. In this configuration, an IP load-balancer would maintain affinity of Web or application clients to each bank, and each bank would run a separate SOSS distributed cache. This topology allows an entire bank to be taken down for maintenance or taken offline when traffic is slow. It also avoids a networking bottleneck and improves availability in case of networking failures which would otherwise result in an outage of the entire farm.

DEP-8: Use a VPN for connections over public networks

For high performance, SOSS does not encrypt communications between SOSS hosts and clients. You should consider using a virtual private network (VPN) to connect SOSS stores using the ScaleOut GeoServer option. This provides secure communications between sites and allows you to use gateway addresses that are only routable across the VPN.

When using ScaleOut’s Remote Client libraries to access a data grid, you should also use a virtual private network (VPN) to connect remote clients to an SOSS store if the remote clients access the SOSS hosts over a public network. This provides secure communications from remote clients located outside the SOSS store’s data center and allows you to use gateway addresses that are only routable across the VPN.

DEP-9: If you use NLB, make sure SOSS is on a separate NIC

ScaleOut StateServer is designed to work seamlessly with load balancers such as Network Load Balancing (NLB) in the Windows Server operating system. However, you should not use ScaleOut StateServer on the same network interface that has NLB installed and enabled. NLB filters incoming multicast network traffic, and this blocks ScaleOut StateServer’s multicast management messages. Instead, be sure to configure ScaleOut StateServer to use a different network interface.

DEP-10: Guidelines for sharing sessions in an ASP.NET farm

After installing ScaleOut StateServer to store ASP.NET session objects, the following additional steps must be taken to ensure that session objects are visible to all servers in a server farm. These steps are required whenever any "out of process" session provider (e.g., a database server or a distributed cache) is used to store ASP.NET session objects; they enable ASP.NET to use a common mechanism for identifying session objects across the server farm.

  1. In your .NET configuration, confirm that the <machineKey> setting is identical on all of the servers, and make sure that it is not set to "autoGenerate". Be sure to use an explicitly defined validationKey and decryptionKey. Strictly speaking, the machine key is not used for session state, but it must be synchronized for ASP.NET Forms authentication and ASP.NET view state to work in a web farm. See Microsoft TechNet’s Generate a Machine Key topic for more information.
  2. Confirm that the path to the application in IIS is exactly the same on all of your servers. Folder names must be capitalized identically, too.
  3. Is the Web application hosted somewhere other than the default Web site in IIS? If so, applications that run out of other sites will need to have their application paths synchronized across web servers in the IIS metabase (for example, "\LM\W3SVC\3"). A site’s ID (the final numeric element in its metabase path) can be manually configured in the IIS Manager by opening your web site’s "Advanced Settings" dialog and setting the "ID" field.
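
The first step can be checked with a small script once you have collected the configuration files from each server. The sketch below is not part of SOSS or ASP.NET: the function name is hypothetical, and the key values in the example are placeholders rather than valid keys. It reports any server whose validationKey or decryptionKey is missing or auto-generated, and it reports a mismatch between servers.

```python
import xml.etree.ElementTree as ET

KEY_ATTRIBUTES = ("validationKey", "decryptionKey")


def machine_key_problems(configs):
    """configs maps a server name to the text of its .NET configuration file."""
    problems = []
    seen = {}
    for server, xml_text in configs.items():
        # Locate the <machineKey> element wherever it appears in the file.
        element = next(ET.fromstring(xml_text).iter("machineKey"), None)
        for attr in KEY_ATTRIBUTES:
            value = element.get(attr) if element is not None else None
            if value is None or "autogenerate" in value.lower():
                problems.append(f"{server}: {attr} is not explicitly defined")
        if element is not None:
            seen[server] = tuple(element.get(a) for a in KEY_ATTRIBUTES)
    if len(set(seen.values())) > 1:
        problems.append("machineKey differs between servers: "
                        + ", ".join(sorted(seen)))
    return problems


TEMPLATE = ('<configuration><system.web>'
            '<machineKey validationKey="{0}" decryptionKey="{1}" />'
            '</system.web></configuration>')
explicit = TEMPLATE.format("AAAA", "BBBB")  # placeholder values, not real keys
auto = TEMPLATE.format("AutoGenerate,IsolateApps", "AutoGenerate,IsolateApps")

print(machine_key_problems({"web1": explicit, "web2": explicit}))  # []
for problem in machine_key_problems({"web1": explicit, "web3": auto}):
    print(problem)
```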

DEP-11: Be sure to enable event handling’s configuration parameter

If your application needs to catch asynchronous events, such as object timeouts, be sure to enable event handling on every SOSS host by setting the max_event_tries configuration parameter to a non-zero value; a typical value would be 3. Otherwise, the SOSS host will not deliver events to your application. By default, this parameter is set to 1 to minimize eventing overhead, but a higher value may be desired to account for transient unavailability of client applications (for example, if an ASP.NET worker process recycles). For more information, please see the Configuration Parameters topic.

DEP-12: Configure host gateways when using the ScaleOut Remote Client libraries

When using the ScaleOut Remote Client option, it is important to properly configure the Gateway Information on every SOSS host. By default, these gateways are set to the IP addresses used by each SOSS host to communicate with its peers on the selected network interface. If remote clients need to access SOSS hosts using a different network subnet, failing to configure the SOSS host gateways will cause loss of connectivity to the SOSS distributed cache.

For example, assume two SOSS hosts communicate with each other on a back-end 10.0.1.x network and that they have the IP addresses 10.0.1.1 and 10.0.1.2 respectively. If remote clients access the SOSS hosts using the 10.0.1.x subnet, no gateway reconfiguration is needed. However, if these SOSS hosts are also connected to a separate front-end network, for example, 192.168.1.x using IP addresses 192.168.1.1 and 192.168.1.2 respectively, and remote clients access the SOSS hosts using these IP addresses, then the SOSS gateways must be configured with the front-end IP addresses instead of the default back-end addresses.

In the above example, the remote clients also would be configured to populate their client configuration files with the 192.168.1.x IP addresses so that the clients can find the SOSS hosts. After initially connecting to an SOSS host, the client libraries automatically download the host gateways for client access to the distributed cache. If the gateways are configured with back-end IP addresses not reachable from the remote clients, the clients will then lose connectivity to the SOSS hosts.
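
A quick way to reason about this example is to test whether each configured gateway address falls inside the subnet that remote clients can reach. The Python sketch below uses the standard ipaddress module. The function name is hypothetical, the /24 prefix is an assumption about the "x" notation used above, and the check ignores any routing between subnets.

```python
import ipaddress


def unreachable_gateways(gateways, client_subnet):
    """Return the gateway addresses that lie outside the client-facing subnet."""
    network = ipaddress.ip_network(client_subnet)
    return [g for g in gateways if ipaddress.ip_address(g) not in network]


front_end = "192.168.1.0/24"  # subnet used by the remote clients in the example

# Default gateways (back-end addresses): clients would lose connectivity.
print(unreachable_gateways(["10.0.1.1", "10.0.1.2"], front_end))
# Gateways reconfigured with the front-end addresses: nothing is unreachable.
print(unreachable_gateways(["192.168.1.1", "192.168.1.2"], front_end))
```

An empty list means every gateway that the client libraries download is on the subnet the clients use.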

Please see the section Configuring the Remote Client Option for full details on configuring remote clients.

DEP-13: Standalone hosts can use a loopback adapter if a network is not available or to keep hosts from discovering each other

ScaleOut StateServer requires a network connection and must be bound to an IP address or subnet for its normal operations. This enables the caching service to detect if a NIC has failed or if the network switch has lost power. If you are running a single-host SOSS cache for evaluation purposes, you can use the Microsoft loopback adapter to create a virtual network environment on standalone servers or laptops that do not have network connectivity. The loopback adapter can also be used to prevent SOSS hosts running on development machines from discovering each other.

  1. Install the loopback adapter:

    1. In Windows Device Manager, click on the top-level computer node in the tree and select "Add legacy hardware".
    2. Moving through the wizard, select the "Install the hardware that I manually select from a list (Advanced)" option.
    3. In the list of hardware types, select "Network adapters".
    4. When selecting a network adapter, choose "Microsoft" as the manufacturer, and "Microsoft KM-TEST Loopback Adapter" (Windows 8/2012 and higher) or "Microsoft Loopback Adapter" (older versions of Windows) as the adapter.
  2. Once the loopback adapter is installed, assign a static IP address to the new connection using Windows' TCP/IP properties dialog.
  3. Use the SOSS Console application to configure the ScaleOut StateServer service to use the new network interface.

DEP-14: When running SOSS on VMware guests, follow these guidelines for best performance

  1. Provision VMs without overloading the available CPU or network bandwidth on the physical server.
  2. Ensure that multicast is enabled in the virtual environment if automatic discovery is configured (the "Use Multicast" checkbox in the SOSS Console’s Host Configuration tab). Some virtual environments block multicast, which SOSS uses for dynamic host discovery.
  3. While SOSS can run on multiple VMs on a single physical machine, this configuration is not highly available, because a single hardware failure could cause the loss of multiple SOSS servers, thus potentially causing a loss of data. When high availability is a concern, we recommend running only one SOSS server per physical machine.
  4. Avoid dynamically moving VMs (using VMware’s vMotion feature, for example) that are running the SOSS service, as the pause in point-to-point network traffic may be interpreted by the SOSS service as a failure scenario.
  5. Avoid taking and/or consolidating snapshots of VMs, as snapshot activity may quiesce the guest system for an extended period, which may be detected by other instances of the SOSS service as an outage.
  6. In general, ScaleOut servers should leave the distributed store before any planned maintenance which could lead to a pause or drop in network traffic or guest responsiveness.
  7. If possible, use VMware’s latest virtual network adapter (VMXNET 3) for the best possible networking performance in your VMware guests. See VMware KB article 1001805 for details on choosing a network adapter for your virtual machines.
  8. ScaleOut’s net_perf_factor parameter allows servers to tune their sensitivity to network delays. For example, reducing net_perf_factor may allow vMotion to run without interfering with SOSS. If you decide to tune net_perf_factor, we recommend starting at 75 or 85, and moving downward if there are still issues. Please do not tune net_perf_factor lower than 50, as this would increase recovery time by more than a factor of 2 and delay recovery from legitimate outages. net_perf_factor is described in more detail in the Configuration Parameters topic.

DEP-15: When configuring SOSS with multicast disabled, follow these guidelines

When configuring a newly installed host for use without multicast, you should copy the soss_params.txt file from one of the currently running hosts to the newly installed host prior to connecting to the host group. Otherwise, the currently running store could obtain incorrect global parameters from the newly installed host. (This problem arises because SOSS uses multicast by default, and a newly installed host automatically forms a functional, singleton host group once its network interface is selected. This may allow it to propagate global parameters to other hosts once it connects to the host group.)

Copying the soss_params.txt to the newly installed host prior to first starting the StateServer service also simplifies installation of the new host. It preconfigures the network interface, license key, and the other group hosts which make up the host group so that the host can automatically connect to the group when it is first started. It also ensures that the new host uses the correct global parameters.