Getting Started

Building the Library

It is recommended to use Scala 2.12 and Spark 3.1.1. To build, run the following:

./gradlew build

This will produce a JAR file in the ./dualip/build/libs/ directory.

Tests typically run with the test task. If you want to force-run all tests, you can use:

./gradlew cleanTest test --no-build-cache

Add a DuaLip Dependency to Your Project

Please check Artifactory for the latest artifact versions.

Gradle Example

The artifacts are available in LinkedIn's Artifactory instance and in Maven Central, so you can specify either repository in the top-level build.gradle file.

repositories {
    mavenCentral()
    maven {
        url "https://linkedin.jfrog.io/artifactory/open-source/"
    }
}

Add the DuaLip dependency to the module-level build.gradle file. Here are some examples for multiple recent Spark/Scala version combinations:

dependencies {
    compile 'com.linkedin.dualip:dualip_3.1.1_2.12:2.4.8'
}

dependencies {
    compile 'com.linkedin.dualip:dualip_3.1.1_2.12:2.4.6'
}

dependencies {
    compile 'com.linkedin.dualip:dualip_3.1.1_2.12:2.4.2'
}

Using the JAR File

Depending on the mode of usage, the built JAR can be deployed as part of an offline data pipeline, depended upon to build jobs using its APIs, or added to the classpath of a Spark Jupyter notebook or a Spark Shell instance. For example:

$SPARK_HOME/bin/spark-shell --jars target/dualip_2.12.jar

Usage Examples

Currently the library supports two different solvers:

1. MooSolver: This solves multi-objective optimization problems, which include a few global or cohort-level constraints and are characterized by small number of rows in \(A\) (usually less than one hundred)

2. MatchingSolver: This solves matching problems, where we have a large number of per-item constraints. The number of rows of \(A\) here is quite large and can range up to 1 million.

Both the solvers support a wide range of constraints \(\mathcal{C}_i\) as seen here as well as a wide variety of first-order optimization methods.

There is a unified driver implementation com.linkedin.dualip.solver.LPSolverDriver for both of these problems which serves as the primary entry point.

We currently support a parallel version of the MooSolver, which can solve many separate small Moo problems in parallel. The number of such small Moo problems can range up to tens of millions.

Call com.linkedin.dualip.solver.ParallelLPSolverDriver to leverage the extreme-scale parallelism power of our DuaLip solver.

For detailed usage please see the Parameters and the Demo.