# ML Pipelines
ML Pipelines in AML allow you to combine multiple parts of your Machine Learning process into a single pipeline. For example, a pipeline could consist of feature preprocessing, model training, model evaluation, and finally model registration. A pipeline wraps these steps into one self-contained unit that you can either run on demand through AML or expose as a RESTful API. The latter allows other users or applications to trigger and run the pipeline. By adding parameters, the pipeline can be made dynamic, e.g., allowing you to feed in new data as it becomes available.
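As a taste of what making a pipeline dynamic via parameters looks like, here is a tiny, hypothetical snippet of a pipeline definition with one parameter. The syntax follows the AML CLI (v1) pipeline YAML schema, which is covered in more detail below, and all names are placeholders:

```yaml
# Hypothetical pipeline parameter: callers (e.g., via the published REST
# endpoint) can override data_path at submission time to feed in new data.
pipeline:
  name: my-pipeline
  parameters:
    data_path:
      type: string
      default: my-default-data-path
```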
In AML, you can create pipelines in 3 different ways:
- Python-based (covered in this repo)
- YAML-based (covered in this repo)
- Designer-based (graphically, not covered in this repo)
## Comparison
| Pipeline type | Use Cases | Limitations |
|---|---|---|
| Python-based pipelines | Recommended for complex use cases with many steps | A bit more code |
| YAML-based pipelines | Quick to get started, great for simple use cases with few steps | Less flexible |
## YAML-based Pipelines
### Executing training in a ML Pipeline
- Open the terminal and navigate to the `pipelines-yaml/train` folder
- Open `pipeline.yml` in your editor and adapt the necessary fields, which are most likely:
    - `default_compute` - point to the AML Compute cluster created earlier
    - `dataset_name` - point to your dataset registered earlier
    - `script_name` - point to your training script
    - `arguments` - adapt to point to the data path and add further arguments (similar to the training stage)

  A sketch of what such a `pipeline.yml` could look like is shown after this list.
- (Optional) Open `runconfig.yml` and adapt if needed (e.g., using a different Docker image or Conda env)
- From the command line, you can now run the training in a pipeline (asynchronously):

  ```
  az ml run submit-pipeline -n training-pipeline-exp -y pipeline.yml
  ```
- For more details on how to publish the pipeline, check the README.md
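For orientation, here is a minimal sketch of what such a `pipeline.yml` could look like. The structure follows the AML CLI (v1) pipeline YAML schema, but all names (pipeline, cluster, dataset, script) are placeholders, and the argument-binding syntax (`input:training_data`) is an assumption on my part, so treat the actual file in the `pipelines-yaml/train` folder as the authoritative reference:

```yaml
# Minimal sketch of a training pipeline.yml (AML CLI v1 pipeline schema).
# All names and values are placeholders - adapt them to your workspace.
pipeline:
  name: training-pipeline
  default_compute: cpu-cluster            # your AML Compute cluster
  data_references:
    training_data:
      dataset_name: my-training-dataset   # your registered dataset
      bind_mode: mount
  steps:
    train:
      type: PythonScriptStep
      name: "Train model"
      script_name: train.py               # your training script
      source_directory: ./
      runconfig: runconfig.yml
      arguments:
        - "--data-path"
        - input:training_data             # resolves to the mounted dataset path
      inputs:
        training_data:
          source: training_data
```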
### Executing batch inferencing in a ML Pipeline
- Open the terminal and navigate to the `pipelines-yaml/batch-inference` folder
- Open `pipeline.yml` in your editor and adapt the necessary fields, which are most likely:
    - `default_compute` - point to the AML Compute cluster created earlier
    - `dataset_name` - point to your batch scoring dataset registered earlier
    - `script_name` - point to your batch scoring script
    - `arguments` - adapt to point to the data path and add further arguments (similar to the training stage)
- (Optional) Open `runconfig.yml` and adapt if needed - since we are running this job in parallel, the parameters are a bit different (a sketch of typical parallel run settings is shown after this list)
- (Optional) Open the `parallel_run_env` folder and adapt the conda env, etc. if needed
- From the command line, you can now run the batch inferencing in a pipeline (asynchronously):

  ```
  az ml run submit-pipeline -n batch-inferencing-pipeline-exp -y pipeline.yml
  ```
- For more details on how to publish the pipeline, check the README.md
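To illustrate how the parallel settings differ from a plain training run, here is a minimal sketch of such a parallel run configuration. The field names correspond to the options of AML's `ParallelRunConfig` (this YAML mapping is an assumption on my part), and all values are placeholders; the `runconfig.yml` in the folder remains the authoritative reference:

```yaml
# Sketch of a parallel run configuration - field names follow AML's
# ParallelRunConfig options; all values are placeholders.
entry_script: batch_score.py  # scoring script, invoked once per mini-batch
mini_batch_size: "10"         # files per mini-batch (or data size for tabular datasets)
error_threshold: 10           # failed items to tolerate before the run aborts
output_action: append_row     # collect all mini-batch results into one output file
environment: parallel_run_env # points to the conda environment definition
compute_target: cpu-cluster
node_count: 2                 # number of nodes to scale out to
process_count_per_node: 2     # worker processes per node
run_invocation_timeout: 300   # seconds allowed per mini-batch invocation
```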
## Python-based Pipelines
For the Python-based pipelines, check the instructions in the README.md.
Great, we now have our training and batch scoring running in ML pipelines. Let's move on to the next section to automate them.