Running flows with Docker¶
In the Deployments tutorial and the Storage and Infrastructure tutorial, we created configuration that lets you create flow runs through the API, using flow code uploaded to a remotely accessible location.
In this tutorial, we'll further configure the deployment so flow runs are executed in a Docker container. We'll run our Docker instance locally, but you can extend this tutorial to run it on remote machines.
In this tutorial we'll:
- Configure a Docker Container infrastructure block that enables creating flow runs in a container.
- Build and apply a new log_flow.py deployment that uses the new infrastructure block.
- Create a flow run from this deployment that spins up a Docker container and executes, logging a message.
Prerequisites¶
To run a deployed flow in a Docker container, you'll need the following:
- We'll use the flow script and deployment from the Deployments tutorial.
- We'll also use the remote storage block created in the Storage and Infrastructure tutorial.
- You must run a standalone Prefect server (prefect server start) or use Prefect Cloud.
- You'll need Docker Engine installed and running on the same machine as your agent.
Docker Desktop works fine for local testing if you don't already have Docker Engine configured in your environment.
Run a Prefect server
This tutorial assumes you're already running a Prefect server with prefect server start, as described in the Deployments tutorial.
If you shut down the server from a previous tutorial, you can start it again by opening another terminal session and running the prefect server start CLI command.
Create an infrastructure block¶
Most users will find it easiest to configure new infrastructure blocks through the Prefect server or Prefect Cloud UI.
You can see any previously configured storage blocks by opening the Prefect UI and navigating to the Blocks page. To create a new infrastructure block, select the + button on this page. Prefect displays a page of available block types. Select run-infrastructure from the Capability list to filter just the infrastructure blocks.
Use these base blocks to create your own infrastructure blocks containing the settings needed to run flows in your environment.
For this tutorial, find the Docker Container block, then select Add + to see the options for a Docker infrastructure block.
To configure this Docker Container block to run the log_flow.py deployment, we just need to add two pieces of information.
First, give the block a Block Name. We used "log-tutorial".
Second, we need to make sure the container includes any additional files, libraries, or configuration needed to run log_flow.py. By default, Prefect uses a preconfigured container that includes installations of Python and Prefect.
Recall that in the Storage and Infrastructure tutorial we needed to install the s3fs library (pip install s3fs) to use an S3 storage block. The container needs the same library, so you'll need to request it in the configuration of the Docker Container infrastructure block. When the agent spins up a container for a flow run, the s3fs package is then installed before the flow run starts.
As a convenience, we can use the EXTRA_PIP_PACKAGES environment variable to install dependencies at runtime. If defined, pip install ${EXTRA_PIP_PACKAGES} is executed before the flow run starts.
In the Env (Optional) box, enter the following to specify that the s3fs package should be installed. Note that we use JSON formatting to specify the environment variable (key) and packages to install (value).
{
    "EXTRA_PIP_PACKAGES": "s3fs"
}
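Because the Env value is ordinary JSON, it can be generated rather than typed by hand. The following is a minimal sketch, not part of the tutorial itself: build_env and save_block are hypothetical helper names. Joining several packages with spaces follows from the value being expanded into pip install ${EXTRA_PIP_PACKAGES}. The optional save_block function assumes a Prefect 2.x installation that provides prefect.infrastructure.DockerContainer; it is not needed if you create the block in the UI, and it is not called here.

```python
import json


def build_env(packages):
    """Return the Env mapping for the Docker Container block.

    `packages` is a list of pip requirement strings. They are joined with
    spaces because the value is expanded into
    `pip install ${EXTRA_PIP_PACKAGES}` inside the container.
    """
    return {"EXTRA_PIP_PACKAGES": " ".join(packages)}


def save_block(env, name="log-tutorial"):
    """Optionally create the block from code instead of the UI.

    Assumption: Prefect 2.x is installed and exposes
    prefect.infrastructure.DockerContainer. This function is defined for
    illustration only and is never called in this sketch.
    """
    from prefect.infrastructure import DockerContainer  # imported lazily

    DockerContainer(env=env).save(name)


env = build_env(["s3fs"])
env_json = json.dumps(env, indent=4)
print(env_json)  # paste this text into the Env (Optional) box
```

The printed text matches the JSON shown above, so the helper only pays off when the package list grows or is shared between blocks.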
Using infrastructure blocks with deployments¶
Using an infrastructure block when building a deployment is similar to using a storage block. You can pass a custom infrastructure block to the prefect deployment build command with the -ib or --infra-block option, giving the type and name of the block in the format type/name, with type and name separated by a forward slash.
- type is the type of infrastructure block, such as docker-container, kubernetes-job, or process.
- name is the name you specified when creating the block.
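To make the type/name format concrete, here is a small sketch that splits such a reference into its two parts. parse_block_slug is a hypothetical helper written for this illustration; it does not reproduce how the Prefect CLI itself validates the argument.

```python
def parse_block_slug(slug):
    """Split a 'type/name' block reference into (type, name).

    Raises ValueError when the slash is missing, either part is empty,
    or more than one slash is present.
    """
    block_type, sep, name = slug.partition("/")
    if not sep or not block_type or not name or "/" in name:
        raise ValueError(f"expected 'type/name', got {slug!r}")
    return block_type, name


block_type, name = parse_block_slug("docker-container/log-tutorial")
print(block_type)  # docker-container
print(name)        # log-tutorial
```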
The prefect deployment build command also supports specifying a built-in infrastructure type prepopulated with defaults by using the --infra or -i option and passing the name of the infrastructure type: docker-container, kubernetes-job, or process.
Build a deployment with Docker infrastructure¶
To demonstrate using an infrastructure block, we'll create a new variation of the deployment for the log_flow example from the Deployments tutorial. For this deployment, we'll pass the following options to the prefect deployment build command:
- Use the storage block created in the Storage and Infrastructure tutorial by passing -sb s3/log-test or --storage-block s3/log-test.
- Use the infrastructure block created earlier by passing -ib docker-container/log-tutorial or --infra-block docker-container/log-tutorial.
$ prefect deployment build ./log_flow.py:log_flow -n log-flow-docker -sb s3/log-test -ib docker-container/log-tutorial -q test -o log-flow-docker-deployment.yaml
Found flow 'log-flow'
Successfully uploaded 4 files to s3://bucket-full-of-sunshine/flows/test
Deployment YAML created at
'/Users/terry/prefect-tutorial/log-flow-docker-deployment.yaml'.
What did we do here? Let's break down the command:
- prefect deployment build is the Prefect CLI command that enables you to prepare the settings for a deployment.
- ./log_flow.py:log_flow specifies the location of the flow script file and the name of the entrypoint flow function, separated by a colon.
- -n log-flow-docker specifies a name for the deployment. For ease of identification, the name includes a reference to the Docker infrastructure.
- -sb s3/log-test specifies a storage block by type and name. If you used a different storage block type or block name, your command may be different.
- -ib docker-container/log-tutorial specifies an infrastructure block by type and name.
- -q test specifies a work queue for the deployment. Work queues direct scheduled runs to agents.
- -o log-flow-docker-deployment.yaml specifies the name for the deployment YAML file. We do this to create a new deployment file rather than overwriting the previous one.
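If you script your deployments, the same command can be assembled as an argument list so that each option stays paired with its value. This is only a sketch: build_command is a hypothetical helper, and the values are the ones used in this tutorial. It prints the command rather than running it.

```python
import shlex


def build_command(entrypoint, name, storage_block, infra_block, queue, output):
    """Return the `prefect deployment build` call as an argument list."""
    return [
        "prefect", "deployment", "build", entrypoint,
        "-n", name,            # deployment name
        "-sb", storage_block,  # storage block as type/name
        "-ib", infra_block,    # infrastructure block as type/name
        "-q", queue,           # work queue
        "-o", output,          # deployment YAML file to write
    ]


argv = build_command(
    entrypoint="./log_flow.py:log_flow",
    name="log-flow-docker",
    storage_block="s3/log-test",
    infra_block="docker-container/log-tutorial",
    queue="test",
    output="log-flow-docker-deployment.yaml",
)
command_line = shlex.join(argv)
print(command_line)
```

To execute the command from Python instead of printing it, the list could be passed to subprocess.run(argv, check=True), assuming the prefect CLI is on your PATH.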
Apply the deployment¶
Now we can apply the deployment YAML to create the deployment on the API.
$ prefect deployment apply log-flow-docker-deployment.yaml
Successfully loaded 'log-flow-docker'
Deployment 'log-flow/log-flow-docker' successfully created with id
'a52fe285-d646-4e57-affd-257acf92782a'.
To execute flow runs from this deployment, start an agent that pulls work from the 'test'
work queue:
$ prefect agent start -q 'test'
Open the Prefect UI at http://127.0.0.1:4200/ and select the Deployments page. You'll see a list of all deployments that have been created in this Prefect server instance, including the new log-flow/log-flow-docker deployment.
Edit the deployment in the UI¶
log_flow expects a runtime parameter for its greeting, and we didn't provide one as part of this deployment yet. We could edit log-flow-docker-deployment.yaml to add a parameter and apply the edited YAML to update the deployment on the API.
Instead, let's edit the deployment through the Prefect UI. Select log-flow/log-flow-docker to see the deployment's details.
Select the menu next to Run, then select Edit to edit the deployment.
Scroll down to the Parameters section and provide a value for the name parameter. We used "Ford Prefect" here.
Select Save to save these changes to the deployment.
Create a flow run in Docker¶
When you create flow runs from this deployment, the agent pulls the default Prefect Docker image and starts a container from it. The container pip installs the prerequisites we specified, retrieves the flow script from remote storage, and starts the Prefect engine to execute the flow run.
Let's create a flow run for this deployment. The flow run will execute in a Docker container on your local machine.
Run a Prefect agent
This tutorial assumes you're already running a Prefect agent with prefect agent start, as described in the Deployments tutorial.
If you shut down the agent from a previous tutorial, you can start it again by opening another terminal session and running the prefect agent start -q test CLI command. This agent pulls work from the test work queue created previously.
Note also that the PREFECT_API_URL setting should be configured to point to the URL of your Prefect server or Prefect Cloud.
If you're running the agent in the same environment or machine as your server, it should already be set. If not, run this command to set the API URL to point at the Prefect instance just started:
$ prefect config set PREFECT_API_URL=http://127.0.0.1:4200/api
Set variable 'PREFECT_API_URL' to 'http://127.0.0.1:4200/api'
Updated profile 'default'
You can check the settings for your environment with the prefect config view CLI command.
# View current configuration
$ prefect config view
PREFECT_PROFILE='default'
PREFECT_API_URL='http://127.0.0.1:4200/api' (from profile)
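If you want to check the API URL from a script, output in this shape is easy to parse. The sketch below is illustrative only: parse_config_view is a hypothetical helper, and it assumes each setting is printed as NAME='value' with an optional "(from ...)" suffix, exactly as in the sample above. Other Prefect versions may format the output differently.

```python
import re

# One setting per line: NAME='value', optionally followed by "(from <source>)".
SETTING_LINE = re.compile(
    r"^(?P<key>[A-Z0-9_]+)='(?P<value>[^']*)'(?:\s+\(from (?P<source>[^)]+)\))?$"
)


def parse_config_view(text):
    """Map each setting name to a (value, source) tuple.

    `source` is None when the line has no "(from ...)" suffix.
    Lines that do not look like settings are ignored.
    """
    settings = {}
    for line in text.splitlines():
        match = SETTING_LINE.match(line.strip())
        if match:
            settings[match["key"]] = (match["value"], match["source"])
    return settings


sample = """\
PREFECT_PROFILE='default'
PREFECT_API_URL='http://127.0.0.1:4200/api' (from profile)
"""
settings = parse_config_view(sample)
api_url, source = settings["PREFECT_API_URL"]
print(api_url)  # http://127.0.0.1:4200/api
print(source)   # profile
```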
On the deployment details page, select Run, then select Now with defaults. This creates a new flow run using the default parameters and other settings.
Go to the terminal session running the Prefect agent. You should see logged output showing:
- The agent submitting the flow run.
- The Docker container being created.
- Installation of the storage library.
- The task run creating log messages.
- The flow run completing.
- The Docker container closing down.
23:19:52.252 | INFO | prefect.agent - Submitting flow run '2d520993-3697-4105-987f-70398e2a65fe'
23:19:52.449 | INFO | prefect.infrastructure.docker-container - Creating Docker container 'woodoo-peacock'...
23:19:53.034 | INFO | prefect.agent - Completed submission of flow run '2d520993-3697-4105-987f-70398e2a65fe'
23:19:53.065 | INFO | prefect.infrastructure.docker-container - Docker container 'woodoo-peacock' has status 'running'
+pip install s3fs
Collecting s3fs
Downloading s3fs-2022.7.1-py3-none-any.whl (27 kB)
...
03:20:02.773 | INFO | Flow run 'woodoo-peacock' - Created task run 'log_task-99465d2b-0' for task 'log_task'
03:20:02.774 | INFO | Flow run 'woodoo-peacock' - Executing 'log_task-99465d2b-0' immediately...
03:20:02.808 | INFO | Task run 'log_task-99465d2b-0' - Hello Ford Prefect!
03:20:02.808 | INFO | Task run 'log_task-99465d2b-0' - Prefect Version = 2.2.0 🚀
03:20:02.837 | INFO | Task run 'log_task-99465d2b-0' - Finished in state Completed()
03:20:02.869 | INFO | Flow run 'woodoo-peacock' - Finished in state Completed('All states completed.')
23:20:03.410 | INFO | prefect.infrastructure.docker-container - Docker container 'woodoo-peacock' has status 'exited'
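To confirm the container lifecycle without reading the whole log, you could pull out the status lines. This is a rough sketch, assuming log lines in the "time | level | source - message" shape shown above; parse_line and container_statuses are hypothetical helpers, and real agent output may pad or format fields differently.

```python
import re

# "HH:MM:SS.mmm | LEVEL | rest of line"
LOG_LINE = re.compile(
    r"^(?P<time>\d{2}:\d{2}:\d{2}\.\d{3}) \| (?P<level>\w+)\s*\| (?P<rest>.*)$"
)
STATUS = re.compile(r"Docker container '[^']+' has status '(?P<status>[^']+)'")


def parse_line(line):
    """Return a dict for a log line, or None for other output such as pip's."""
    match = LOG_LINE.match(line)
    if match is None:
        return None
    # The source and the message are separated by the first " - ".
    source, _, message = match["rest"].partition(" - ")
    return {
        "time": match["time"],
        "level": match["level"],
        "source": source,
        "message": message,
    }


def container_statuses(lines):
    """Collect the container statuses in the order they were logged."""
    statuses = []
    for line in lines:
        record = parse_line(line)
        if record is None:
            continue
        found = STATUS.search(record["message"])
        if found:
            statuses.append(found["status"])
    return statuses


sample = [
    "23:19:53.065 | INFO | prefect.infrastructure.docker-container - Docker container 'woodoo-peacock' has status 'running'",
    "+pip install s3fs",
    "03:20:02.808 | INFO | Task run 'log_task-99465d2b-0' - Hello Ford Prefect!",
    "23:20:03.410 | INFO | prefect.infrastructure.docker-container - Docker container 'woodoo-peacock' has status 'exited'",
]
statuses = container_statuses(sample)
print(statuses)  # ['running', 'exited']
```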
In the Prefect UI, go to the Flow Runs page and select the flow run. You should see the "Hello Ford Prefect!" log message created by the flow running in the Docker container!
Cleaning up¶
When you're finished, just close the Prefect UI tab in your browser, and close the terminal sessions running the Prefect server and agent.