Running Commands on Google Compute Engine Using Apache Airflow
Somewhere in Taichung. Shot with Pentax KX and Agfa APX 400.
This'll probably be a short one.
I was implementing an Apache Airflow DAG to automate the execution of machine learning models on Google Compute Engine instances, and our team decided that sending a command to the instance over an SSH connection was the easiest way to do this. However, I was having trouble establishing the connection, and the official documentation didn't provide much help.
I eventually got things to work by piecing together information from several StackOverflow posts. I thought I'd create this page so that anyone who needs to implement the same feature can save themselves some headache.
On The Airflow Side
We mainly rely on two Airflow components: the SSHOperator operator and the ComputeEngineSSHHook hook. To run a command over SSH, you need to set up the operator like this:
from airflow.providers.google.cloud.hooks.compute_ssh import ComputeEngineSSHHook
from airflow.providers.ssh.operators.ssh import SSHOperator

<name of task> = SSHOperator(
    task_id="<descriptive name for the UI>",
    ssh_hook=ComputeEngineSSHHook(
        instance_name="<GCE instance name>",
        zone="<zone the machine is in>",  # e.g. "us-east4-c"
        project_id="<GCP project id>",
        use_oslogin=True,  # IMPORTANT
        use_iap_tunnel=False,
    ),
    command="<some command>",
    dag=dag,
)
Note that we are setting use_oslogin to True here. You can leave the rest of the parameters at their default values.
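For context, here is a minimal sketch of how the operator might fit into a complete DAG. The DAG id, start date, instance name, project id, and command below are hypothetical placeholders, not values from this post:

from datetime import datetime

from airflow import DAG
from airflow.providers.google.cloud.hooks.compute_ssh import ComputeEngineSSHHook
from airflow.providers.ssh.operators.ssh import SSHOperator

# Hypothetical DAG that runs a model on a GCE instance when triggered manually.
dag = DAG(
    dag_id="run_model_on_gce",
    start_date=datetime(2023, 1, 1),
    schedule_interval=None,
    catchup=False,
)

run_model = SSHOperator(
    task_id="run_model",
    ssh_hook=ComputeEngineSSHHook(
        instance_name="ml-worker-1",      # hypothetical instance name
        zone="us-east4-c",
        project_id="my-gcp-project",      # hypothetical project id
        use_oslogin=True,
        use_iap_tunnel=False,
    ),
    command="python /opt/models/train.py",  # hypothetical command
    dag=dag,
)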
On the Compute Engine Side
The thing you really need to pay attention to is setting the custom metadata field "enable-oslogin" when you create the instance.
You can also change this value after the machine is created. To do this, open the Compute Engine dashboard and click on the instance you wish to connect to.
Then click the "Edit" button at the top.
Then scroll down to the metadata section, add the key "enable-oslogin", and set its value to "TRUE".
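If you prefer the command line, the same metadata can be set with gcloud. The instance name and zone here are placeholders:

gcloud compute instances add-metadata ml-worker-1 \
    --zone=us-east4-c \
    --metadata enable-oslogin=TRUE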
With these settings in place, you should be able to run SSH commands from your Airflow DAGs!