Running Data Pipelines
Data Pipelines allow you to process data stored in one location and send the result to another, such as from a data lake to an analytics database or into a payment processing system. You can also use the same source and sink so the pipeline only processes data.
With the Hazelcast Platform Operator, you can run Data Pipelines from existing JAR files. Depending on the data source, Data Pipelines can be used for stream or batch processing. To create a Data Pipeline using the JetJob CR, the Jet Engine must be configured in the Hazelcast CR: both enabled and resourceUploadEnabled must be set to true. To learn more about Data Pipelines and the Jet Engine, refer to the Platform documentation.
Configuring the JetJob Resource
Below are the configuration options for the JetJob resource. You can find more detailed information in the API Reference page.
| Field | Description |
|---|---|
| name | Name of the Jet Job to be created. If empty, the CR name will be used. It cannot be updated after the Jet Job is created successfully. |
| hazelcastResourceName | Name of the Hazelcast resource. |
| state | Manages the job state. The default value is 'Running', and its value must be 'Running' when the job is created. |
| jarName | Name of the JAR file to run. The file must be present on the member. |
| mainClass | Name of the main class that will be run for the submitted job. |
| bucketConfig | Bucket from which the JAR file specified in jarName will be downloaded. |
| remoteURL | URL from where the file will be downloaded. |
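As a minimal sketch of how these fields fit together, the following JetJob sets mainClass explicitly. The CR name, the job name, and the class name com.example.MyPipeline are hypothetical placeholders, and the sketch assumes a Hazelcast resource named hazelcast whose members already have the JAR available.

```yaml
apiVersion: hazelcast.com/v1alpha1
kind: JetJob
metadata:
  name: jet-job-main-class             # hypothetical CR name
spec:
  name: my-pipeline-job                # optional; the CR name is used when empty
  hazelcastResourceName: hazelcast     # assumed name of the Hazelcast resource
  state: Running                       # jobs must be created in the Running state
  jarName: my-data-pipeline.jar        # must be present on the member
  mainClass: com.example.MyPipeline    # hypothetical main class inside the JAR
```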
Providing the JAR file for Data Pipeline
To run a Data Pipeline, you need to provide a JAR file that contains the pipeline. The JAR file can be downloaded before the cluster starts by configuring jet.bucketConfig, jet.remoteURLs, or jet.configMaps in the Hazelcast CR. This way, all the files from the configured source will be accessible to the members when the cluster starts.
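The following sketch shows what the remoteURLs and configMaps options might look like in a Hazelcast CR. It assumes that jet.remoteURLs takes a list of URLs and that jet.configMaps takes a list of ConfigMap names; check the API Reference for the exact schema. The URL and the ConfigMap name are placeholders, and configuring only one of the two would normally be enough.

```yaml
apiVersion: hazelcast.com/v1alpha1
kind: Hazelcast
metadata:
  name: hazelcast
spec:
  clusterSize: 3
  licenseKeySecretName: hazelcast-license-key
  jet:
    enabled: true
    resourceUploadEnabled: true
    # Assumed to be a list of URLs; this URL is a placeholder.
    remoteURLs:
      - "https://example.com/jars/my-data-pipeline.jar"
    # Assumed to be a list of ConfigMap names; this name is a placeholder.
    configMaps:
      - my-pipeline-jars
```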
Another option is to configure bucketConfig or remoteURL in the JetJob CR. This way, only the JAR file specified in the jarName parameter will be downloaded, at runtime, before the Data Pipeline starts.
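A JetJob that downloads its own JAR at runtime could look roughly like the sketch below. It assumes that bucketConfig in the JetJob CR accepts the same secretName and bucketURI keys as in the Hazelcast CR; the CR name and the commented-out URL are placeholders.

```yaml
apiVersion: hazelcast.com/v1alpha1
kind: JetJob
metadata:
  name: jet-job-from-bucket            # hypothetical CR name
spec:
  hazelcastResourceName: hazelcast
  state: Running
  jarName: my-data-pipeline.jar        # only this file is downloaded
  bucketConfig:
    secretName: br-secret-gcp          # secret with the bucket credentials
    bucketURI: "gs://your-bucket/path/to/jars"
  # Alternative to bucketConfig (placeholder URL):
  # remoteURL: "https://example.com/jars/my-data-pipeline.jar"
```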
JetJob state management
Once the job is created, you can use the state field to manage its lifecycle.
The following state values are available:
- Running. All jobs must be created with the Running state. It runs a newly created job or resumes a Suspended job.
- Suspended. Gracefully suspends a Running job.
- Canceled. Gracefully stops the job.
- Restarted. Suspends and resumes the job in one step.
Deleting the JetJob resource will forcefully cancel the job.
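For instance, suspending a running job should only require changing the state field and re-applying the resource. The sketch below reuses the jet-job-sample resource from the Example Configuration section and assumes the job was originally created in the Running state.

```yaml
apiVersion: hazelcast.com/v1alpha1
kind: JetJob
metadata:
  name: jet-job-sample
spec:
  name: my-test-jet-job
  hazelcastResourceName: hazelcast
  state: Suspended                 # was Running; set back to Running to resume
  jarName: my-data-pipeline.jar
```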
Example Configuration
The following example enables the Jet Engine on a Hazelcast cluster and defines a JetJob resource that runs the Data Pipeline from my-data-pipeline.jar on that cluster.
apiVersion: hazelcast.com/v1alpha1
kind: Hazelcast
metadata:
  name: hazelcast
spec:
  clusterSize: 3
  repository: 'docker.io/hazelcast/hazelcast-enterprise'
  jet:
    enabled: true
    resourceUploadEnabled: true
    bucketConfig:
      secretName: br-secret-gcp
      bucketURI: "gs://your-bucket/path/to/jars"
  licenseKeySecretName: hazelcast-license-key
---
apiVersion: hazelcast.com/v1alpha1
kind: JetJob
metadata:
  name: jet-job-sample
spec:
  name: my-test-jet-job
  hazelcastResourceName: hazelcast
  state: Running
  jarName: my-data-pipeline.jar