Performance

Performance Benchmark

Environment Setup

(1) Machine Setup

In this test, we use the server with the following setup:

Operating System and Software:

	Red Hat Enterprise Linux workstation
	Release 6.1 (Santiago)
	Kernel Linux 2.6.32-131.4.1.el6.x86_64
	GNOME 2.28.2
	MySQL Server version: 5.1.52

Hardware:

	8-core Intel Xeon CPU  E5620 @ 2.40 GHz
	Memory: 64GB
	Disksize: 350GB

(2) System Startup Configuration

We used default startup configuration for both Sensei and MySQL. However, Sensei requires custom property file for each application. So to start Sensei, checkout the sensei code from https://github.com/linkedin/sensei/downloads, Sensei comes with a car demo setup, so replace the car demo configuration file with this one: https://github.com/sguo/search-perf/tree/master/configs/mysql/cars/conf, and replace the original car demo data with a much bigger 3 million dataset(we will talk about how the data source is generated in the following section). change directory to Sensei home directory and start sensei as below,

  $ bin/start-sensei-node.sh  example/cars/conf/

( By default, Sensei will start up with 1GB memory: HEAP_OPTS="-Xmx1g -Xms1g -XX:NewSize=256m" )

(3) Data Source

We use car data to generate the testing dataset, and the testing dataset will be used by both MySQL and Sensei. To do this, firstly download the car demo dataset from: https://raw.github.com/linkedin/sensei/master/example/cars/data/cars.json then we replicated car data 200 times to get to 3M docs. The python script to do this could be downloaded here: https://github.com/kwei/search-perf/tree/master/data. To generate the large 3 million testing dataset, simply run:

  python expandData.py cars.json cars3m.json 3000000

It will create the testing data which contains 3 million lines. we also created a MySql table:

         CREATE TABLE car(
             id INT not null AUTO_INCREMENT,
             category VARCHAR(30),
             city VARCHAR(255),
             mileage INT,
             tags VARCHAR(255),
             color VARCHAR(15),
             price FLOAT,
             groupid LONG,
             makemodel VARCHAR(255),
             year INT,
             contents VARCHAR(255),
             PRIMARY KEY (id),
             fulltext (contents)
         )
mysql> desc cars;
	+-----------+--------------+------+-----+---------+----------------+
	| Field     | Type         | Null | Key | Default | Extra          |
	+-----------+--------------+------+-----+---------+----------------+
	| id        | int(11)      | NO   | PRI | NULL    | auto_increment |
	| category  | varchar(30)  | YES  |     | NULL    |                |
	| city      | varchar(255) | YES  |     | NULL    |                |
	| mileage   | int(11)      | YES  |     | NULL    |                |
	| tags      | varchar(255) | YES  |     | NULL    |                |
	| color     | varchar(15)  | YES  |     | NULL    |                |
	| price     | float        | YES  |     | NULL    |                |
	| groupid   | mediumtext   | YES  |     | NULL    |                |
	| makemodel | varchar(255) | YES  |     | NULL    |                |
	| year      | int(11)      | YES  |     | NULL    |                |
	| contents  | varchar(255) | YES  | MUL | NULL    |                |
	+-----------+--------------+------+-----+---------+----------------+
	11 rows in set (0.00 sec)

We insert the 3 million testing dataset into MySQL database. Note that columns color and price have indexes, and contents is a fulltext field containing tokens from: color,category,tags, makemodel, and city.

To create index for the two columns of the table, we can execute the following two SQL statements:

	mysql> create index index_color on car (color);
	Query OK, 3000000 rows affected (2 min 42.32 sec)
	Records: 3000000  Duplicates: 0  Warnings: 0
	mysql> create index index_price on car (price);
	Query OK, 3000000 rows affected (2 min 47.14 sec)
	Records: 3000000  Duplicates: 0  Warnings: 0

Data Inserting Test

Inserting data results (3M records):

    MySQL:  Time spent:1469924 ms ( 2040.92 rows per second)
    Sensei: Time spent:449413 ms  ( 6675.37 rows per second)

To insert data into MySQL server, we used the MySQL java connector, and write the custom code to measure the inserting time.

To insert data into Sensei, we have configured the Gateway in the configuration file, so when Sensei is started, it will automatically insert data into the system.

Both Sensei and MySQL insert the data one by one for time measurement convenience.

Query Latency Test

We evaluated Sensei query latency in two ways. First we tested the latency while no inserting data statement is executed, then we tested the query latency while another thread keeps inserting data into the table.

(1) Query latency without inserting data

We did the following 10 types of query latency test. Each result covers a certain type of queries.

Summary

# Query Type MySQL Sensei (JSON API) Sensei (BQL)
1 Simple select with a range query, including result count 1.19 s 34 ms 33 ms
2 Sorting on one field with no index 2.35 s 40 ms 37 ms
3 Sorting on one indexed field 0.00 s 40 ms 39 ms
4 Sorting on two indexed fields 2.31 s 45 ms 49 ms
5 Sorting on three fields, 2 index, 1 not indexed 2.66 s 46 ms 45 ms
6 Group By on a field with an index 5.07 s 72 ms / 46 ms (facet-only) 60 ms / 47 ms (facet-only)
7 Group By on a field without an index 3.59 s 49 ms / 40 ms (facet-only) 41 ms / 44 ms (facet-only)
8 With text search 1.11 s 44 ms 45 ms
9 With text search, with sort 3.78 s 46 ms 46 ms
10 With text search, with sort and group by 3.71 s 65 ms / 52 ms (facet-only) 64 ms / 52 ms (facet-only)

Details

(2) Query latency while inserting data

We also did the following 10 types of query latency test while keep inserting another 3 million data items. Each result also covers a certain type of queries.

Since the testing MySQL query, Sensei JSON and BQL statements are the same, we only list the latency values in the following summary table.

# Query Type MySQL Sensei (JSON API) Sensei (BQL)
1 Simple select with a range query, including result count 1.40 s 28 ms 28 ms
2 Sorting on one field with no index 2.78 s 37 ms 36 ms
3 Sorting on one indexed field 0.00 s 40 ms 38 ms
4 Sorting on two indexed fields 3.92 s 56 ms 57 ms
5 Sorting on three fields, 2 index, 1 not indexed 4.59 s 63 ms 63 ms
6 Group By on a field with an index 14.91 s 114 ms / 65 ms (facet-only) 94 ms / 64 ms (facet-only)
7 Group By on a field without an index 6.58 s 78 ms / 69 ms (facet-only) 56 ms / 70 ms (facet-only)
8 With text search 8.44 s 79 ms 82 ms
9 With text search, with sort 10.09 s 82 ms 82 ms
10 With text search, with sort and group by 10.72 s 122 ms / 98 ms (facet-only) 123 ms / 95 ms (facet-only)

Sensei Multi-Thread Performance Test

We also conducted the multi-thread performance test for Sensei. The following pictures show our experimental results:

Latency test (Multi-thread):

Latency test (Multi-thread)

QPS test (Multi-thread):

QPS test (Multi-thread)

The testing query in JSON format is:

	{
	    "partitions":[
	        
	    ],
	    "facetInit":{
	        
	    },
	    "facets":{
	        "tags":{
	            "max":100000,
	            "order":"hits",
	            "minHit":0,
	            "expand":true
	        },
	        "category":{
	            "max":100000,
	            "order":"hits",
	            "minHit":0,
	            "expand":true
	        },
	        "price":{
	            "max":100000,
	            "order":"hits",
	            "minHit":0,
	            "expand":true
	        }
	    },
	    "fetchStored":false,
	    "selections":[
	        {
	            "terms":{
	                "category":{
	                    "values":[
	                        "sub-compact",
	                        "luxury",
	                        "station wagon",
	                        "compact"
	                    ],
	                    "excludes":[
	                        "sedan",
	                        "mini-van",
	                        "sports car"
	                    ],
	                    "operator":"or",
	                    "_noOptimize":false
	                }
	            }
	        },
	        {
	            "terms":{
	                "tags":{
	                    "values":[
	                        "automatic",
	                        "chick magnet",
	                        "reliable",
	                        "cool"
	                    ],
	                    "excludes":[
	                        "expensive",
	                        "electric",
	                        "highend"
	                    ],
	                    "operator":"or",
	                    "_noOptimize":false
	                }
	            }
	        },
	        {
	            "range":{
	                "price":{
	                    "to":"6700",
	                    "include_lower":true,
	                    "include_upper":true,
	                    "from":"*"
	                }
	            }
	        }
	    ],
	    "explain":false,
	    "sorts":[
	        
	    ],
	    "termVectors":[
	        
	    ]
	}