Management Considerations

MGT-1: Track memory usage

To maintain high performance and stability, it is important that the SOSS service always runs in physical memory. Otherwise, the service will start paging to disk, which will make the distributed cache unresponsive and cause the SOSS console to display the yellow “not ready” icon for one or more hosts (i.e., servers). On Windows, you can track the amount of available physical memory and the “page fault delta” for the SOSS service process (soss_svr.exe) using the Windows Task Manager; on Red Hat Linux, you can use the System Monitor.

If the amount of available physical memory falls below about 200 MB, or if the page fault delta remains non-zero for more than a second or two at a time, the SOSS service has insufficient physical memory, and corrective action must be taken immediately. Provision more physical memory, add more hosts to the distributed cache so that the memory load is spread across more servers, or both. Please see the above section on provisioning memory for an SOSS cache.
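The two warning signs above can be expressed as a simple check. The following is a minimal sketch, assuming you already sample the host’s available physical memory (in MB) and the page fault delta for soss_svr.exe once per second; the function name and the two-sample limit (an approximation of “a second or two”) are hypothetical choices, not part of SOSS.

```python
from typing import Sequence

# 200 MB comes from the guidance above. The two-sample limit is a
# hypothetical approximation of "a second or two" at one sample per second.
MIN_AVAILABLE_MB = 200
MAX_NONZERO_SAMPLES = 2


def memory_is_insufficient(available_mb: float,
                           page_fault_deltas: Sequence[int]) -> bool:
    """Return True if the SOSS service appears to lack physical memory.

    available_mb: available physical memory on the host, in MB.
    page_fault_deltas: page fault deltas for soss_svr.exe, one sample
        per second, oldest first.
    """
    if available_mb < MIN_AVAILABLE_MB:
        return True
    # Find the longest run of consecutive non-zero page fault deltas.
    longest_run = 0
    current_run = 0
    for delta in page_fault_deltas:
        current_run = current_run + 1 if delta != 0 else 0
        longest_run = max(longest_run, current_run)
    return longest_run > MAX_NONZERO_SAMPLES


if __name__ == "__main__":
    print(memory_is_insufficient(512, [0, 3, 0, 0]))  # brief spike: False
    print(memory_is_insufficient(512, [4, 7, 9, 5]))  # sustained paging: True
    print(memory_is_insufficient(150, [0, 0, 0]))     # low memory: True
```

A single short burst of page faults is tolerated; only a sustained run, or low available memory, is flagged.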

Note that, after a period of peak usage, SOSS currently does not return all dynamically allocated memory to the operating system, on the assumption that this memory will be needed again. (This capability will be offered in a future release.) You can use SOSS’s configuration parameters to control its peak memory usage and eviction policy, which keeps the service’s memory usage within a desired limit.

MGT-2: Use a sequential shutdown process

Never shut down a server running the SOSS service, or restart the service, unless you have first issued the SOSS Leave command (“soss leave” from the command line) and waited until this command has fully completed. The SOSS console signals full completion when the corresponding host’s icon turns into a red circle (with an embedded square) and the host’s status is marked as inactive. Allow up to two minutes for a Leave command to complete on a heavily loaded SOSS store. If you are shutting down multiple servers, allow each server to fully complete its leave operation before shutting down the next one, unless you run the “Leave all hosts” command to stop the entire SOSS store.
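If you script shutdowns of several servers, the rule above translates into a loop that never moves on until the current host is inactive. In this sketch, leave_host, host_is_inactive, and shut_down are hypothetical hooks you would supply (for example, running “soss leave” on the host and checking the host status the console reports); the 120-second timeout reflects the two-minute allowance in the text.

```python
import time
from typing import Callable, Iterable


def shut_down_sequentially(
    hosts: Iterable[str],
    leave_host: Callable[[str], None],
    host_is_inactive: Callable[[str], bool],
    shut_down: Callable[[str], None],
    timeout_s: float = 120.0,
    poll_interval_s: float = 5.0,
) -> None:
    """Leave and shut down hosts strictly one at a time.

    All three callables are hypothetical hooks supplied by the caller.
    Raises TimeoutError, without touching any later host, if a host does
    not finish leaving within timeout_s.
    """
    for host in hosts:
        leave_host(host)  # e.g. issue "soss leave" on that host
        deadline = time.monotonic() + timeout_s
        while not host_is_inactive(host):
            if time.monotonic() >= deadline:
                raise TimeoutError(f"{host} did not finish leaving")
            time.sleep(poll_interval_s)
        # The leave has completed, so it is now safe to stop this server.
        shut_down(host)
```

Raising on timeout, rather than shutting the server down anyway, preserves the point of the rule: a server is never stopped while its leave is still in progress.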

If an active SOSS service process is stopped prematurely, SOSS will invoke its recovery mechanisms, which affect cache performance and unnecessarily disrupt normal operations. Using the “Leave” command allows SOSS to synchronize load rebalancing and the membership change with ongoing cache accesses, so that the distributed cache rebalances smoothly with minimal impact on performance.

MGT-3: Avoid SOSS restart as first step in recovery

If you encounter a problem with an SOSS server, such as a persistent yellow icon in the SOSS console indicating that a host is not ready, do not restart all SOSS hosts as the first step in remedying the problem. During normal operation, the SOSS console may temporarily display a yellow icon if network congestion or memory paging occurs while the distributed store is adding replicas or rebalancing the load. If you are using the SOSS console, make sure to select the “Local Store” icon in the left-hand tree list. This ensures that the host and store status refreshes every few seconds and that you are seeing the latest status indications.

If SOSS detects a server or network outage, the distributed cache recovers automatically in most cases, usually within several seconds (up to a minute in some circumstances). If a problem persists, first wait several seconds to see whether the SOSS console’s “not ready” condition clears on its own. If necessary, next try killing the service process (soss_svr.exe) on the suspect server only, and give SOSS several seconds to self-heal. If this does not resolve the problem and another SOSS host remains in a “not ready” condition, try killing the service process on that host and let the distributed cache self-heal. If the problem still persists, you may need to restart the SOSS service process on all hosts, although this is rarely required.
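The escalation order above can be summarized in a short sketch. Everything here is hypothetical: host_is_ready, kill_and_restart, and restart_all are caller-supplied hooks (for example, wrappers around the console status and the service controls described below), and settle_s is an assumed wait based on the “several seconds” guidance. What matters is the order: wait first, then act on the suspect host only, then on any other host that stays not ready, and restart everything only as a last resort.

```python
import time
from typing import Callable, List


def recover(
    hosts: List[str],
    suspect: str,
    host_is_ready: Callable[[str], bool],
    kill_and_restart: Callable[[str], None],
    restart_all: Callable[[], None],
    settle_s: float = 10.0,
) -> str:
    """Escalate recovery one step at a time and report which step worked.

    host_is_ready, kill_and_restart, and restart_all are hypothetical
    hooks; settle_s is an assumed wait for SOSS to self-heal.
    """

    def all_ready() -> bool:
        return all(host_is_ready(h) for h in hosts)

    # Step 1: give the distributed cache a chance to recover on its own.
    time.sleep(settle_s)
    if all_ready():
        return "self-healed"

    # Step 2: kill and restart the service only on the suspect host.
    kill_and_restart(suspect)
    time.sleep(settle_s)
    if all_ready():
        return "recovered after restarting " + suspect

    # Step 3: handle any other host that is still not ready, one at a time.
    for host in hosts:
        if host != suspect and not host_is_ready(host):
            kill_and_restart(host)
            time.sleep(settle_s)
            if all_ready():
                return "recovered after restarting " + host

    # Step 4 (rarely needed): restart the SOSS service on every host.
    restart_all()
    return "restarted all hosts"
```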

You should never need to reboot the operating system on an affected host; it is always sufficient to kill and restart the SOSS service process. (Note that the SOSS console’s “Restart” command may not be sufficient to kill and restart an SOSS host, because it relies on the host first leaving the store successfully before a service restart is attempted.) On Windows, you can kill the service process with the Task Manager and restart it with the “net start soss” command or by using the Windows Service Control Manager (SCM). On Red Hat Linux, you can use the Services Configuration Tool to kill and restart the sossd daemon. After the SOSS service restarts and rejoins the store, you may have to restart the local host’s client application. On Windows, you can use the iisreset command to restart the IIS Web server.
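On Windows, the kill-and-restart step can also be scripted instead of using Task Manager. The sketch below assumes the standard Windows taskkill command (its /F option forces termination and /IM selects a process by image name) together with the “net start soss” command from the text; the function name is hypothetical, and it defaults to a dry run that only returns the commands for review. It does not cover Red Hat Linux, where the text recommends the Services Configuration Tool.

```python
import subprocess
from typing import List

# Kill soss_svr.exe, then restart the "soss" service, mirroring the
# manual Task Manager + "net start soss" procedure described above.
WINDOWS_RESTART_COMMANDS: List[List[str]] = [
    ["taskkill", "/F", "/IM", "soss_svr.exe"],
    ["net", "start", "soss"],
]


def restart_soss_on_windows(dry_run: bool = True) -> List[str]:
    """Kill and restart the local SOSS service (Windows only, sketch).

    With dry_run=True (the default) the commands are only returned as
    strings; with dry_run=False they are executed in order, and a failing
    command raises subprocess.CalledProcessError.
    """
    printable = [" ".join(cmd) for cmd in WINDOWS_RESTART_COMMANDS]
    if not dry_run:
        for cmd in WINDOWS_RESTART_COMMANDS:
            subprocess.run(cmd, check=True)
    return printable
```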

MGT-4: Avoid simultaneous management changes on multiple machines

Do not change configuration parameters simultaneously by running the management tools on different hosts. Doing so can cause multiple hosts to record inconsistent parameter values. Make changes to ScaleOut StateServer’s configuration using only one management tool at a time.

MGT-5: Avoid dynamically moving virtual servers running SOSS

SOSS supports the use of virtual servers, and many customers make wide use of them with SOSS. However, SOSS’s mechanisms for maintaining its membership may be disrupted if a virtual server is dynamically moved to another physical server, for example, by using VMware vMotion. SOSS uses point-to-point heartbeat messages to detect membership changes due to server or network outages, and live virtual server migration can introduce delays that SOSS detects as outages. This may trigger recovery actions and could eventually lead to data loss.
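To see why a migration pause can look like an outage, consider a generic heartbeat-timeout detector. This sketch is not SOSS’s actual membership algorithm, and the 2-second timeout and 3-second pause in the example are hypothetical values chosen only for illustration: any gap between heartbeats longer than the timeout is reported, whether the peer crashed or was merely paused.

```python
from typing import List, Tuple


def find_suspected_outages(arrival_times_s: List[float],
                           timeout_s: float) -> List[Tuple[float, float]]:
    """Return (start, end) gaps between consecutive heartbeats that exceed
    timeout_s. A timeout-based detector cannot tell a crashed peer from one
    that was briefly frozen, so both show up here."""
    gaps = []
    for earlier, later in zip(arrival_times_s, arrival_times_s[1:]):
        if later - earlier > timeout_s:
            gaps.append((earlier, later))
    return gaps


if __name__ == "__main__":
    # Heartbeats every 0.5 s, then a hypothetical 3 s pause while the
    # virtual machine is frozen, then normal heartbeats again.
    times = [0.0, 0.5, 1.0, 1.5, 4.5, 5.0]
    print(find_suspected_outages(times, timeout_s=2.0))  # [(1.5, 4.5)]
```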

MGT-6: Use a rolling upgrade when upgrading minor and hot fix releases

Starting with version 3.0, SOSS ensures that minor and hot fix releases are backwards compatible with previous minor releases within the same major release. For example, version 3.1.6 is backwards compatible with version 3.1.5. This lets you maintain service to applications while upgrading individual SOSS hosts one at a time.
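The compatibility rule above can be written as a small check. This is a sketch under stated assumptions: versions are dotted numeric strings such as “3.1.6”, the function names are hypothetical, and the check also requires the target to be newer than the installed version, since the text only describes upgrading to the next version.

```python
from typing import Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn "3.1.6" into (3, 1, 6) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def rolling_upgrade_supported(installed: str, target: str) -> bool:
    """A rolling upgrade is possible only within the same major release,
    starting with version 3.0, and (by assumption) only to a newer release."""
    old, new = parse_version(installed), parse_version(target)
    same_major = old[0] == new[0]
    return same_major and old[0] >= 3 and new > old
```

For example, moving from 3.1.5 to 3.1.6 passes the check, while moving from any 4.x release to 5.0 fails it, which matches the clean-install requirement for Version 5.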

To perform a rolling upgrade to the next SOSS version, carry out the following steps on each host, one host at a time. Fully complete all steps, and verify that the distributed cache is running normally, before upgrading the next host:

  1. Open the SOSS management console on the host, select the local host from the list on the left, and go to the “Host Status” tab.
  2. Click “Leave” to cause the host to leave the distributed cache.
  3. Wait for the host to completely finish leaving. Its icon will change from green with a triangle, through yellow with a minus, to red with a square.
  4. Uninstall SOSS on this host.
  5. Install the new version of SOSS on this host.
  6. Open the SOSS management console and configure the host as necessary. Note that the host should automatically update its license key entry from other SOSS hosts when the SOSS service process restarts and detects an SOSS distributed cache.
  7. In the SOSS console, go to the “Host Status” tab and click “Join” to join the distributed cache.
  8. Wait for the host to finish joining. Its icon will change from red with a square, through yellow with a plus, to green with a triangle.
  9. Verify that load-balancing has completed and that the distributed cache is running normally before upgrading the next host.
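If you script this procedure, the key property is that every step finishes on one host before the next host is touched. The sketch below assumes you supply a hypothetical hook for each step (for example, a wrapper that prompts an operator or calls your own deployment tooling); the step names are labels for the numbered list above, not SOSS commands.

```python
from typing import Callable, Dict, Iterable, List

# Labels for the numbered steps above; each maps to a hypothetical hook.
STEP_ORDER: List[str] = [
    "leave",         # steps 1-2: open the console and click Leave
    "wait_left",     # step 3: icon turns red with a square
    "uninstall",     # step 4
    "install",       # step 5
    "configure",     # step 6
    "join",          # step 7
    "wait_joined",   # step 8: icon turns green with a triangle
    "verify_store",  # step 9: load balancing done, cache running normally
]


def rolling_upgrade(hosts: Iterable[str],
                    steps: Dict[str, Callable[[str], bool]]) -> List[str]:
    """Run every step on one host before moving to the next host.

    Each hook returns True on success. The first failure raises
    RuntimeError, so no further hosts are touched. On success, returns
    the hosts in the order they were upgraded.
    """
    upgraded = []
    for host in hosts:
        for name in STEP_ORDER:
            if not steps[name](host):
                raise RuntimeError(f"step '{name}' failed on {host}; "
                                   "resolve it before upgrading other hosts")
        upgraded.append(host)
    return upgraded
```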

MGT-7: Follow this process to update a license key for the same major version

SOSS applies a newly installed license key when the SOSS service starts up on each server. Also, starting with version 5.4.4, whenever a valid license key is installed using the management tools (i.e., the management console or the command-line control program), the key is propagated to all hosts in the host group, and all active hosts immediately switch to using the newly installed key. In earlier version 5 releases, a newly installed license key is propagated to all hosts in the host group only if it has a newer creation date. Avoid applying an incorrect or expired key, because it could adversely affect all servers in an active membership.

For version 5.4.4 and later, you can upgrade a license key on all hosts by using the following procedure:

  1. Have the first server leave the store, then update the license key on the first SOSS server using the SOSS management console. When using the command-line control program, it is not necessary for the host to leave the store.
  2. Verify that the license key has the desired properties.
  3. For version 5.4.4 and later, just rejoin the first server to the store if necessary. For earlier version 5 releases, restart the server and then rejoin it to the store.
  4. Verify that the management console’s Host Configuration tab on all servers now reflects the new license key and its associated properties.

You can also use the above procedure for earlier version 5 releases if the license key has a newer creation date than the currently installed key. Otherwise, please use the following rolling upgrade procedure to install an older license key. (This situation could occur when reverting to an older permanent key after installing a temporary key for testing purposes.)

  1. Have the first server leave the store, then update the license key on the first SOSS server using the SOSS management tools.
  2. Verify that the license key has the desired properties.
  3. Restart SOSS on the first server.
  4. When the first server rejoins the store and begins to take object load, restart SOSS on the next server. (There is no need to insert the new license key; it automatically propagates to the restarted server.)
  5. Repeat step 4 for each remaining server until all servers have restarted and rejoined the store.
  6. Verify that the Host Configuration tab on each server now reflects the new license properties.

Each service restart should be momentary, so this shouldn’t impact your application any more than a leave/join without a service restart.

Note that when license keys are being changed in connection with a version upgrade of SOSS software, different procedures may be required. Also note that major version 5 license keys are not compatible with major version 4 or earlier keys.
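The choice between the two procedures in MGT-7 depends on the installed version and on the new key’s creation date. The sketch below encodes that decision; the function, the Procedure enum, and the tuple version format are hypothetical, and NOT_APPLICABLE simply flags cases (a key for a different major version, or an installation that is not version 5) that MGT-7 does not cover.

```python
from enum import Enum
from typing import Tuple


class Procedure(Enum):
    PROPAGATE_AND_REJOIN = "update the key on one host, then rejoin it if needed"
    RESTART_AND_REJOIN = "update the key on one host, restart it, then rejoin it"
    ROLLING_RESTART = "update the key on one host, then restart hosts one at a time"
    NOT_APPLICABLE = "not covered by MGT-7"


def license_update_procedure(installed_version: Tuple[int, ...],
                             key_major: int,
                             key_is_newer: bool) -> Procedure:
    """Choose an MGT-7 procedure (sketch).

    installed_version: SOSS version on the hosts, e.g. (5, 4, 4).
    key_major: major version the new license key was issued for.
    key_is_newer: whether the new key's creation date is newer than the
        creation date of the currently installed key.
    """
    # Version 5 keys are not compatible with version 4 or earlier keys,
    # and MGT-7 only describes version 5 behavior.
    if installed_version[0] != 5 or key_major != installed_version[0]:
        return Procedure.NOT_APPLICABLE
    # 5.4.4 and later: any valid key propagates and takes effect at once.
    if installed_version >= (5, 4, 4):
        return Procedure.PROPAGATE_AND_REJOIN
    # Earlier 5.x releases: only a key with a newer creation date propagates.
    if key_is_newer:
        return Procedure.RESTART_AND_REJOIN
    return Procedure.ROLLING_RESTART
```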

MGT-8: Upgrading an SOSS data grid to Version 5

Upgrading from an earlier version to Version 5 requires a clean install, so your SOSS store must be brought down completely. Because of the significant improvements in Version 5, a rolling upgrade is not available, and a new license key is required. The steps to upgrade are as follows:

  1. Open the SOSS Management Console on one of the hosts in your store and select “Leave All Hosts” from the Store menu.
  2. Wait for all hosts to completely finish leaving. The icons will change from green with a triangle, through yellow with a minus, to red with a square.
  3. Close the SOSS Console.
  4. Uninstall SOSS on each host in the store.
  5. Install version 5.0 of SOSS on the first host.
  6. Configure the first host as necessary, being sure to add the 5.0 license key.
  7. Install 5.0 and configure each of the remaining hosts one at a time. Note that the hosts should automatically update the license key entry from the other SOSS hosts when the SOSS service process restarts and detects the SOSS distributed cache.
  8. In the SOSS Management Console, go to the Store menu and select “Join All Hosts”.
  9. Wait for the hosts to finish joining. The icons will change from red with a square, through yellow with a plus, to green with a triangle.
  10. In the SOSS Management Console, select the “Test Store” command from the Store menu.
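The sequence above can also be kept as an ordered checklist. The following sketch only generates the plan and does not run anything; the action labels are descriptive text invented for this example, not SOSS commands, and it assumes the management console is used on the first host in the list.

```python
from typing import List, Tuple

Action = Tuple[str, str]  # (action label, target host or "store")


def version5_upgrade_plan(hosts: List[str]) -> List[Action]:
    """Return the MGT-8 steps as an ordered checklist (sketch)."""
    if not hosts:
        raise ValueError("need at least one host")
    first, rest = hosts[0], hosts[1:]
    plan: List[Action] = [
        ("leave all hosts", "store"),                    # step 1
        ("wait until every host is inactive", "store"),  # step 2
        ("close SOSS console", first),                   # step 3
    ]
    plan += [("uninstall SOSS", h) for h in hosts]       # step 4
    plan += [("install 5.0", first),                     # step 5
             ("configure and add 5.0 license key", first)]  # step 6
    # Step 7: the license key should propagate to these hosts automatically.
    plan += [("install 5.0 and configure", h) for h in rest]
    plan += [("join all hosts", "store"),                # step 8
             ("wait until every host is active", "store"),  # step 9
             ("test store", "store")]                    # step 10
    return plan
```

Keeping the plan as data makes it easy to check, before any host is touched, that every uninstall precedes the first 5.0 install, which is what distinguishes this clean install from a rolling upgrade.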