Pipelines

Instructions

From within the directory of each pipeline, you can run it via:

python pipeline.py <arguments>

train pipeline

The train pipeline deployment script accepts the following command line arguments for publishing the training pipeline:

Argument             Description
--pipeline_name      Name of the pipeline that will be deployed
--build_number       (Optional) The build number
--dataset            Name of the default dataset in the AML workspace that should be used for training
--runconfig          Path to the runconfig that configures the training
--source_directory   Path to the source directory containing the training code

Example:

python pipeline.py --pipeline_name training_pipeline --dataset german-credit-dataset --runconfig pipeline.runconfig --source_directory ../../models/model1/

The published pipeline can be called via its REST API, so it can be triggered on demand whenever you want to retrain. Alternatively, you can use an orchestrator of your choice to trigger it, e.g., you could trigger it directly from Azure Data Factory once new data has been processed. You may follow this tutorial.
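
As an illustration, the following sketch shows one way to call the published training pipeline's REST endpoint from Python. It is not part of this repository: the pipeline name training_pipeline matches the example above, the experiment name training_pipeline_run is made up, Workspace.from_config() assumes a config.json for your workspace, and InteractiveLoginAuthentication is used for simplicity (a CI/CD setup would typically use a service principal instead).

# Minimal sketch: trigger the published training pipeline via its REST endpoint
import requests
from azureml.core import Workspace
from azureml.core.authentication import InteractiveLoginAuthentication
from azureml.pipeline.core import PublishedPipeline

ws = Workspace.from_config()  # assumes a config.json for the AML workspace

# Look up the published pipeline by the name used during deployment
published = next(
    p for p in PublishedPipeline.list(ws) if p.name == "training_pipeline"
)

# Obtain an AAD token and call the pipeline's REST endpoint
auth = InteractiveLoginAuthentication()
headers = auth.get_authentication_header()

response = requests.post(
    published.endpoint,
    headers=headers,
    json={"ExperimentName": "training_pipeline_run"},  # experiment name is an assumption
)
response.raise_for_status()
print("Submitted run:", response.json().get("Id"))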

batch-inference pipeline

The batch-inference pipeline deployment script accepts the following command line arguments for publishing the batch inferencing pipeline:

Argument          Description
--pipeline_name   Name of the pipeline that will be deployed
--build_number    (Optional) The build number
--dataset         Name of the default dataset in the AML workspace that should be used for batch inferencing
--model_name      Name of the registered model that should be used for batch inferencing
--runconfig       Path to the runconfig that configures the batch inferencing

Example:

python pipeline.py --pipeline_name batch_inferencing_pipeline --dataset german-credit-batch --model_name german-credit --runconfig pipeline.runconfig

The published pipeline can be called via its REST API, so it can be triggered on demand whenever you want to run batch inferencing. The destination for the results of the batch scoring process can be changed in the code. Alternatively, you can use an orchestrator of your choice to trigger it, e.g., you could trigger it directly from Azure Data Factory once new data has been processed. You may follow this tutorial.
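
As a hedged illustration, the sketch below triggers the published batch inferencing pipeline through the AML SDK rather than the raw REST endpoint. The pipeline name batch_inferencing_pipeline matches the example above; the experiment name batch_inferencing_run and the use of Workspace.from_config() are assumptions.

# Minimal sketch: submit the published batch inferencing pipeline via the AML SDK
from azureml.core import Workspace
from azureml.pipeline.core import PublishedPipeline

ws = Workspace.from_config()  # assumes a config.json for the AML workspace

# Look up the published pipeline by the name used during deployment
published = next(
    p for p in PublishedPipeline.list(ws) if p.name == "batch_inferencing_pipeline"
)

# Submit a run on demand; an orchestrator would perform the equivalent call
run = published.submit(ws, experiment_name="batch_inferencing_run")
run.wait_for_completion(show_output=True)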