Managing Jobs

Once a job is submitted, it has its own lifecycle on the cluster which is distinct from the submitter. To manage the lifecycle of jobs, you can use either SQL or CLI commands to list, cancel, suspend, resume, and restart them.

Listing Jobs

Use the list-jobs command to get a list of all jobs running in the cluster:

bin/hz-cli list-jobs

Example result:

ID                  STATUS             SUBMISSION TIME         NAME
0401-9f77-b9c0-0001 RUNNING            2020-03-07T15:59:49.234 hello-world

You can also list completed jobs by adding the -a parameter:

bin/hz-cli list-jobs -a

Example result:

ID                  STATUS             SUBMISSION TIME         NAME
0402-de9d-35c0-0001 RUNNING            2020-03-08T15:14:11.439 hello-world-v2
0402-de21-7f00-0001 FAILED             2020-03-08T15:12:04.893 hello-world

SHOW JOBS;

Example result:

+--------------------+
|name                |
+--------------------+
|hello-world         |
+--------------------+

For more details about this statement, see the SQL reference documentation.

Canceling Jobs

Streaming jobs run indefinitely until canceled. To stop a job, you must cancel it.

bin/hz-cli cancel hello-world

Example result:

Canceling job id=0402-de21-7f00-0001, name=hello-world, submissionTime=2020-03-08T15:12:04.893
Job canceled.

When a job is canceled, the snapshot for the job is lost and the job can’t be resumed. Canceled jobs have a failed status.

DROP JOB IF EXISTS hello-world;

Result:

OK

When a job is canceled, the snapshot for the job is lost and the job can’t be resumed. Canceled jobs have a failed status.

To save a snapshot of the job, use the WITH SNAPSHOT clause.

For more details about this statement, see the SQL reference documentation.

Suspending and Resuming Jobs

Suspending and resuming jobs can be useful for example when you need to perform maintenance on a data source or sink without disrupting a running job.

When a job is suspended, all the metadata about the job is kept in the cluster. A snapshot of the job’s computational state is taken during a suspend operation and then once resumed, the job is gracefully started from the same snapshot.

To suspend and resume a job, it must be configured with a processing guarantee. To learn more about setting a processing guarantee, see Configuring Jobs.

Use the suspend <job_name_or_id> and resume <job_name_or_id> commands to suspend and resume jobs:

bin/hz-cli suspend hello-world

Example result:

Suspending job id=0401-9f77-b9c0-0001, name=hello-world, submissionTime=2020-03-07T15:59:49.234...
Job suspended.

bin/hz-cli resume hello-world

Example result:

Resuming job id=0401-9f77-b9c0-0001, name=hello-world, submissionTime=2020-03-07T15:59:49.234...
Job resumed.

To configure a job to be suspended automatically if its execution fails, see JobConfig.setSuspendOnFailure.

Use the ALTER JOB statement to suspend and resume jobs:

ALTER JOB hello-world SUSPEND;

Result:

OK

ALTER JOB hello-world RESUME;

For more details about this statement, see the SQL reference documentation.

Restarting Jobs

Restarting a job allows you to suspend and resume it in one step. This can be useful when you want to have control over when the job should be scaled. For example, if a job’s auto-scaling option is disabled and you add 3 nodes to a cluster you can manually restart the job at the desired point to make sure that all the new nodes can run it.

bin/hz-cli restart hello-world

Example result:

Restarting job id=0401-9f77-b9c0-0001, name=hello-world, submissionTime=2020-03-07T15:59:49.234...

ALTER JOB hello-world RESTART;

Result:

OK

For more details about this statement, see the SQL reference documentation.

Managing Jobs

Listing Jobs

Canceling Jobs

Suspending and Resuming Jobs

Restarting Jobs

Help and support