D2 Zookeeper Configuration Properties

Tiers of Configuration
Cluster Level Properties
partitionProperties Level Properties
Service Level Properties
transportClient Level Properties
degraderProperties Level Properties
loadBalancerStrategy Level Properties

Tiers of Configuration

D2 configuration is organized in tiers. The outline below shows how the tiers nest.

List of all clusters

    cluster A
        cluster level configuration (see below for more info)
        services (all the services that belong to cluster A)
            service A-1
                service level properties (see below for more info)
                loadBalancerStrategyProperties
                    "loadBalancerStrategy" level properties
                    http.loadBalancer.updateIntervalMs
                    http.loadBalancer.globalStepDown
                    other load balancer properties
                transportClientProperties
                    "transportClient" level properties
                    http.maxResponseSize
                    http.shutdownTimeout
                    other transport client properties
                degraderProperties
                    "degraderProperties" level properties
                    degrader.lowLatency
                    degrader.maxDropDuration
                    other degrader properties
            service A-2
                service level properties
            other services under cluster A
        partitionProperties for cluster A
            "partitionProperties" level properties
            partitionType
            partitionKeyRegex
            other partitionProperties level properties

    cluster B
        partitionProperties for cluster B (optional)
        services (all the services that belong to cluster B)
            service B-1
            service B-2
            etc

    cluster C
    cluster D
    etc

As you can see, there are multiple tiers of configuration. The sections below enumerate each level and the properties that belong to it.
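To make the nesting concrete, here is a minimal sketch of the tiers as a nested Python dictionary. The dict layout, the names `clusterA` and `serviceA1`, the `lookup` helper, and every value are assumptions chosen for illustration. Keying services by name is also a simplification. None of this is the literal encoding D2 stores in ZooKeeper.

```python
# Illustrative only: the nesting mirrors the tiers in the outline, but the
# dict layout, the cluster/service names, and the values are assumptions.
clusters = {
    "clusterA": {
        "partitionProperties": {
            "partitionType": "RANGE",
            "partitionKeyRegex": r"/profiles/(\d+)",  # hypothetical pattern
        },
        "services": {
            "serviceA1": {
                "path": "/serviceA1",  # hypothetical context path
                "loadBalancerStrategyProperties": {
                    "http.loadBalancer.updateIntervalMs": 5000,
                    "http.loadBalancer.globalStepDown": 0.2,
                },
                "transportClientProperties": {
                    "http.maxResponseSize": 2 * 1024 * 1024,
                    "http.shutdownTimeout": 10000,
                },
                "degraderProperties": {
                    "degrader.lowLatency": 500,
                    "degrader.maxDropDuration": 60000,
                },
            },
        },
    },
}


def lookup(config, cluster, service, tier, name):
    """Walk cluster -> service -> property tier -> property name."""
    return config[cluster]["services"][service][tier][name]


print(lookup(clusters, "clusterA", "serviceA1",
             "degraderProperties", "degrader.lowLatency"))
```

Each property name lives in exactly one tier, so a lookup always names the cluster, the service, and the property group before the property itself.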

Cluster Level Properties

Property Name	Description
partitionProperties	A map containing all the properties to partition the cluster. (See below for more details)
services	A list of d2 services that belong to this cluster.

partitionProperties Level Properties

Property Name	Description
partitionType	The type of partitioning your cluster uses. Valid values are RANGE and HASH.
partitionKeyRegex	The regex pattern used to extract the key from the URI.
partitionSize	Only if you choose partitionType RANGE. The size of each partition, i.e. the size of the key range covered by one partition.
partitionCount	The number of partitions in the cluster.
keyRangeStart	Only if you choose partitionType RANGE. This is the number where the key range starts. Normally we start at 0.
hashAlgorithm	Only if you choose partitionType HASH. The type of hash to use. Valid values are MODULO and MD5.
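The following sketch shows one way these properties could combine to map a request URI to a partition. It is an assumption-laden illustration, not D2's implementation. The function names, the use of the first capture group as the key, and the way the MD5 digest is reduced to a partition number are all hypothetical.

```python
import hashlib
import re


def extract_key(uri, partition_key_regex):
    """Return the first capture group of partitionKeyRegex matched in the URI."""
    match = re.search(partition_key_regex, uri)
    if match is None:
        raise ValueError("URI does not contain a partition key: " + uri)
    return match.group(1)


def range_partition(key, key_range_start, partition_size, partition_count):
    """RANGE: each partition covers partition_size consecutive integer keys."""
    index = (key - key_range_start) // partition_size
    if not 0 <= index < partition_count:
        raise ValueError("key %d is outside the configured range" % key)
    return index


def modulo_partition(key, partition_count):
    """HASH with a modulo-style algorithm: integer key mod partition count."""
    return key % partition_count


def md5_partition(key, partition_count):
    """HASH with MD5: digest the key text, then reduce it (assumed scheme)."""
    digest = hashlib.md5(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % partition_count


key = int(extract_key("/profiles/1234", r"/profiles/(\d+)"))
print(range_partition(key, key_range_start=0, partition_size=1000,
                      partition_count=10))
print(modulo_partition(key, partition_count=10))
```

With keyRangeStart 0 and partitionSize 1000, key 1234 falls in partition 1. With a modulo hash over 10 partitions, it falls in partition 4.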

Service Level Properties

Property Name	Description
loadBalancerStrategyList	The list of strategies that you want to use in your LoadBalancer. Valid values are random, degraderV2, and degraderV3. Only degraderV3 supports partitioning. The random load balancer simply picks a random server to send the request to, so you can't do sticky routing if you choose it.
path	The context path of your service.
loadBalancerStrategyProperties	The properties of the D2 LoadBalancer.
transportClientProperties	A map of all properties related to the creation of the transport client.
degraderProperties	Properties of the D2 Degrader. This is a map of all properties related to how D2 perceives a single server's health, so that D2 can redirect traffic to healthier servers. Contrast this with the LoadBalancer properties, which are used to determine the health of the entire cluster. The difference is that if the health of the cluster deteriorates, D2 will start dropping requests instead of redirecting traffic.
banned	A list of all the servers that shouldn't be used.

transportClient Level Properties

Properties used to create a client to talk to a server.

Property Name	Description
http.queryPostThreshold	The maximum length of a URL before we convert a GET into a POST, because the server's header buffer size may be limited. Default is Integer.MAX_VALUE (i.e. not enabled).
http.poolSize	Maximum size of the underlying HTTP connection pool. Default is 200.
http.requestTimeout	Timeout, in ms, to get a connection from the pool or create one, send the request, and receive a response (if applicable). Default is 10000.
http.idleTimeout	Interval, in ms, after which idle connections will be automatically closed. Default is 25000.
http.shutdownTimeout	Timeout, in ms, the client should wait after shutdown is initiated before terminating outstanding requests. Default is 10000.
http.maxResponseSize	Maximum response size, in bytes, that the client can process. Default is 2 MB.
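As a rough illustration of how these properties might be consumed, the sketch below merges per-service overrides onto the documented defaults and applies the http.queryPostThreshold rule. The helper names `effective_properties` and `choose_method` are hypothetical. Reading "2 MB" as 2 * 1024 * 1024 bytes and comparing the threshold against the full URL length are assumptions.

```python
INT_MAX = 2**31 - 1  # Java's Integer.MAX_VALUE

# Defaults as listed in the table; the byte count for "2 MB" is assumed.
DEFAULTS = {
    "http.queryPostThreshold": INT_MAX,
    "http.poolSize": 200,
    "http.requestTimeout": 10000,
    "http.idleTimeout": 25000,
    "http.shutdownTimeout": 10000,
    "http.maxResponseSize": 2 * 1024 * 1024,
}


def effective_properties(overrides):
    """Return the defaults with any configured overrides applied on top."""
    merged = dict(DEFAULTS)
    merged.update(overrides)
    return merged


def choose_method(method, url, props):
    """Switch a GET to a POST once the URL is longer than the threshold."""
    if method == "GET" and len(url) > props["http.queryPostThreshold"]:
        return "POST"
    return method


props = effective_properties({"http.queryPostThreshold": 20})
print(choose_method("GET", "/profiles?ids=1,2,3,4,5,6,7,8,9", props))
print(choose_method("GET", "/profiles?ids=1", props))
```

With the default threshold the conversion never triggers, which matches the "not enabled" default in the table.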

degraderProperties Level Properties

Note that each degrader represents a single server among the many servers in a cluster.

Property Name	Description
degrader.name	Name that will show up in the logs (makes debugging easier).
degrader.logEnabled	Whether or not logging is enabled in degrader
degrader.latencyToUse	What kind of latency to use for our calculation. We support AVERAGE (default), PCT50, PCT90, PCT95, PCT99
degrader.overrideDropRate	The fraction of calls that should be dropped. A value larger than 0 means this client will permanently drop that fraction of the calls. Default is -1.0.
degrader.maxDropRate	The maximum fraction of calls that can be dropped. A value greater than or equal to 0 and less than 1 means the client can never be degraded to the point of dropping all calls. Default is 1.0.
degrader.maxDropDuration	The maximum duration, in ms, that is allowed when all requests are dropped. For example if maxDropDuration is 1 min and the last request that should not be dropped is older than 1 min, then the next request should not be dropped. Default is 60000.
degrader.upStep	The drop rate incremental step every time a degrader crosses the high water mark. Default is 0.2.
degrader.downStep	The drop rate decremental step every time a degrader recovers below the low water mark. Default is 0.2.
degrader.minCallCount	The minimum number of calls needed before we use the tracker statistics to determine whether a client is healthy or not. Default is 5.
degrader.highLatency	If the latency of the client exceeds this value, we'll increment the computed drop rate. The higher the computed drop rate, the less traffic will go to this server. Default is 3000.
degrader.lowLatency	If the latency of the client is less than this value, we'll decrement the computed drop rate. The lower the computed drop rate, the more traffic will go to this server. Default is 500.
degrader.highErrorRate	If the error rate is higher than this value, we'll increment the computed drop rate, which causes less traffic to go to this server.
degrader.lowErrorRate	If the error rate is lower than this value, we'll decrement the computed drop rate, which causes more traffic to go to this server.
degrader.highOutstanding	If the number of outstanding calls is higher than this value, we'll increment the computed drop rate, which causes less traffic to go to this server. Default is 10000.
degrader.lowOutstanding	If the number of outstanding calls is lower than this value, we'll decrement the computed drop rate, which causes more traffic to go to this server. Default is 500.
degrader.minOutstandingCount	The number of outstanding calls should be greater than or equal to this value for the degrader to use the average outstanding latency to determine whether a high or low water mark condition has been met. The high and low water mark conditions are based on any of these: errorRate, latency, and outstandingCount. Default is 5.
degrader.overrideMinCallCount	If overridden, we will use this value as the minimum number of calls needed before we compute the drop rate. Default is -1.
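The interaction between the step sizes and the water marks can be sketched as a single update function. This is a simplified illustration under stated assumptions, not the degrader's actual algorithm. It considers latency only (error rate and outstanding calls are left out), and the function name `next_drop_rate` and its signature are hypothetical. The default arguments are the defaults listed in the table.

```python
def next_drop_rate(drop_rate, latency, call_count, *,
                   high_latency=3000, low_latency=500,
                   up_step=0.2, down_step=0.2,
                   max_drop_rate=1.0, min_call_count=5):
    """Return the computed drop rate after one round of statistics.

    Latency-only simplification: error rate and outstanding-call
    conditions are intentionally omitted from this sketch.
    """
    if call_count < min_call_count:
        # Too few calls to trust the statistics; leave the rate unchanged.
        return drop_rate
    if latency > high_latency:
        # Crossed the high water mark: step up, capped at maxDropRate.
        return min(max_drop_rate, drop_rate + up_step)
    if latency < low_latency:
        # Recovered below the low water mark: step down, floored at 0.
        return max(0.0, drop_rate - down_step)
    return drop_rate


rate = 0.0
for latency in (3500, 4000, 2000, 300):
    rate = next_drop_rate(rate, latency, call_count=10)
    print(latency, round(rate, 2))
```

In this run the rate rises on the two slow rounds, holds while latency sits between the two water marks, and steps back down once latency falls below lowLatency.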

loadBalancerStrategy Level Properties

Properties for load balancers. These affect all servers in a cluster.

Property Name	Description
http.loadBalancer.hashMethod	The kind of hash method to use (this is relevant to stickiness). Valid values are none or uriRegex.
http.loadBalancer.hashConfig	If you declare this, you need to define the list of regexes used to parse the URL.
http.loadBalancer.updateIntervalMs	The time interval, in ms, at which the load balancer updates its state (for example, deciding whether to rebalance the traffic or increase the drop rate). Default value is 5000.
http.loadBalancer.pointsPerWeight	The maximum number of points a client gets in a hashring per 1.0 of weight. Default is 100. Increasing this number increases the computation needed to create a hashring but leads to a more even distribution in the hashring.
http.loadBalancer.lowWaterMark	If the cluster average latency, in ms, is lower than this, we'll reduce the entire cluster's drop rate. (This affects all the clients in the same cluster regardless of whether they are healthy or not.) Default value is 500.
http.loadBalancer.highWaterMark	If the cluster average latency, in ms, is higher than this, we'll increase the cluster's drop rate. (This affects all the clients in the same cluster regardless of whether they are healthy or not.) Default value is 3000.
http.loadBalancer.initialRecoveryLevel	Once a cluster is completely degraded, this is the baseline from which the cluster starts recovering. For example, suppose a healthy client has 100 points in a hashring and 0 points when completely degraded. With an initial recovery level of 0.005, the client gets 0.5 points, which is not enough to be reintroduced because a client needs at least 1 point. Default value is 0.01.
http.loadBalancer.ringRampFactor	Once a cluster is in recovery mode, this is the multiplication factor used to increase the number of points for a client in the ring. For example, a healthy client has 100 points in a hashring and is now completely degraded with 0 points. The initialRecoveryLevel is set to 0.005 and ringRampFactor is set to 2. On turn #1 of recovery the client gets 0.5 points, which is not enough to be reintroduced into the ring. On turn #2, because ringRampFactor is 2, it gets 1 point. On turn #3 it gets 2 points, and so on. Default value is 1.
http.loadBalancer.globalStepUp	The step size used when incrementing the drop rate in the cluster. Default value is 0.2. For example, if globalStepUp is 0.2, the drop rate goes from 0.0 to 0.2, then to 0.4, and so on as the cluster becomes more degraded.
http.loadBalancer.globalStepDown	Same as http.loadBalancer.globalStepUp, except that this is for decrementing the drop rate.
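The recovery example for initialRecoveryLevel and ringRampFactor can be worked through in a few lines. This is a sketch of the arithmetic only. The function names are hypothetical, and capping the recovery level at 1.0 is an assumption rather than documented behavior.

```python
def recovery_points(healthy_points, initial_recovery_level,
                    ring_ramp_factor, turns):
    """Points a fully degraded client would hold on each recovery turn."""
    level = initial_recovery_level
    points = []
    for _ in range(turns):
        # Assumed cap: a client never holds more than its healthy points.
        points.append(healthy_points * min(level, 1.0))
        level *= ring_ramp_factor
    return points


def first_turn_in_ring(points_per_turn):
    """1-based turn at which the client first has at least 1 point."""
    for turn, points in enumerate(points_per_turn, start=1):
        if points >= 1:
            return turn
    return None


# The example from the table: 100 healthy points, level 0.005, factor 2.
ramp = recovery_points(100, 0.005, 2, turns=3)
print(ramp)
print(first_turn_in_ring(ramp))
```

With these inputs the client holds 0.5, 1, and 2 points on turns 1 through 3, so it re-enters the ring on turn 2. With a ringRampFactor of 1 the level never grows, so in this sketch a client that starts below 1 point would stay out of the ring.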