Solving the frequent Fernet Key cryptography error for containerized Apache-Airflow
The Issue
For security purposes, sensitive connection and administrative information is encrypted with a Fernet key before being stored in Airflow's back-end database. This includes any passwords for your connection objects, as well as service account keys for services such as Google Cloud.
However, if you have built the Airflow webserver as a containerized service, then every time you modify and rebuild your container you run the risk of invalidating your Fernet key and losing access to your connections.
Airflow finds the Fernet key you would like to use in its config file, which by default gets generated at airflow/airflow.cfg when you first run the airflow initdb command. There is some insecurity built into this approach, since the key gets hard-coded into the file. If you're using the puckel/docker-airflow repository's Dockerfile or docker-compose.yaml as a base for building your Airflow service, then the point at which your Fernet key gets generated is here, in the scripts/entrypoint.sh file:

: "${AIRFLOW__CORE__FERNET_KEY:=${FERNET_KEY:=$(python -c "from cryptography.fernet import Fernet; FERNET_KEY = Fernet.generate_key().decode(); print(FERNET_KEY)")}}"
That's a pretty slick, maybe brilliant, one-liner as far as bash-scripted one-liners go. If you're not familiar with bash scripting, the breakdown is as follows.

The : at the start of the line is the shell's null command; combined with the "${VAR:=default}" expansion, it lets you define a variable with a default value without otherwise doing anything. Here, entrypoint.sh creates the AIRFLOW__CORE__FERNET_KEY variable for the script if the variable does not already exist in the environment. So, you could override it by specifying this variable in your Dockerfile or docker-compose.yaml file with something like ENV AIRFLOW__CORE__FERNET_KEY='some string you generated or made up'.

The =${FERNET_KEY:= portion assigns the value of FERNET_KEY to AIRFLOW__CORE__FERNET_KEY if FERNET_KEY already exists in the environment (maybe you decided to pass it in from somewhere else); if it does not exist, then the := tells bash to create a default value with the one-line call to Python's cryptography library.

The python -c ... part you can probably understand if you know at least some Python: the string gets passed as a command (-c) to python, and the print(FERNET_KEY) call prints the random Fernet key to stdout.

Finally, $( ) tells bash to evaluate the entire expression in a subprocess and substitute its stdout output. So, the output of this portion is the Fernet key itself.

You may wonder why the key shows up in two variables in entrypoint.sh, but I realized that it is necessary to have two places where the user can manually define it, depending on their use case. AIRFLOW__CORE__FERNET_KEY is the environment variable the airflow initdb command will look for when creating the back-end database, so if the user wants to change it and uses docker-compose, she should set it in the docker-compose file. FERNET_KEY belongs in the Dockerfile, because that is what is accessible to entrypoint.sh, which gets executed at the end of the Dockerfile. $FERNET_KEY then gets written into the airflow.cfg file at line 122:

# Secret key to save connection passwords in the db
fernet_key = $FERNET_KEY
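The default-value pattern is easy to try in isolation. Below is a minimal, self-contained sketch of the same ${VAR:=default} expansion; the variable names OUTER and INNER are made up for illustration and stand in for AIRFLOW__CORE__FERNET_KEY and FERNET_KEY:

```shell
# Nothing set yet: the inner := runs the command substitution
# and assigns both variables.
unset OUTER INNER
: "${OUTER:=${INNER:=$(echo generated-key)}}"
echo "$OUTER"   # generated-key
echo "$INNER"   # generated-key

# INNER already set (e.g. passed in from the environment): its value
# is reused and the command substitution never runs.
unset OUTER
INNER='key-from-env'
: "${OUTER:=${INNER:=$(echo generated-key)}}"
echo "$OUTER"   # key-from-env
```

Note that the leading : matters: without it, bash would try to execute the expanded key as a command.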
The Solution
So, now that we understand exactly what is going on, we can troubleshoot the cryptography.fernet error that might appear in your DAG task execution logs when a task fails as it tries to access the back-end database for connection and other runtime data. This will likely happen every time you rebuild your airflow_webserver container, unless you also rebuild the database and its data dump each time (if you're like me, you prefer to keep the database as a persistent volume so you have some permanency to your Airflow execution and scheduling data). The easiest thing to do is to just re-enter, in the Airflow UI, your connections and other entries that use the Fernet key for encryption, though if you have many connections, that becomes very tedious.
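To see why the error occurs, here is a minimal sketch using the cryptography library directly: data encrypted under the original key raises InvalidToken when decrypted with a freshly generated one, which is exactly what happens when a container rebuild regenerates the key under your connections:

```python
from cryptography.fernet import Fernet, InvalidToken

original_key = Fernet.generate_key()          # key the database was built with
token = Fernet(original_key).encrypt(b"my-connection-password")

rebuilt_key = Fernet.generate_key()           # new key after a container rebuild
try:
    Fernet(rebuilt_key).decrypt(token)
except InvalidToken:
    print("InvalidToken: the rebuilt key cannot decrypt the stored connection")

# The original key still decrypts the stored value just fine.
assert Fernet(original_key).decrypt(token) == b"my-connection-password"
```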
The second easiest thing is to create a task that recreates the connections and other database entries you need, scheduled to run @once so that you can just trigger it after rebuilding your webserver container. The Python task would look something like this:
from airflow import DAG, settings
from airflow.models import Connection
from airflow.operators.python_operator import PythonOperator

dag = DAG( .... )

def set_connection(**config):
    # Build a single Connection object from the passed-in attributes
    conn = Connection()
    for k, v in config.items():
        setattr(conn, k, v)
    # Persist it to the back-end database
    session = settings.Session()
    session.add(conn)
    session.commit()
    session.close()

task = PythonOperator(
    dag=dag,
    task_id='set-connections',
    python_callable=set_connection,
    ...........
)
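The setattr loop in set_connection is just generic attribute assignment; here is a runnable sketch of the same pattern using a plain placeholder class instead of airflow.models.Connection. The attribute names mirror real Connection fields, but the class and the values are made up for illustration:

```python
class FakeConnection:
    """Stand-in for airflow.models.Connection, for illustration only."""
    pass

config = {
    "conn_id": "my_gcp_conn",             # made-up connection id
    "conn_type": "google_cloud_platform",
    "password": "not-a-real-secret",      # would be Fernet-encrypted by Airflow
}

conn = FakeConnection()
for k, v in config.items():
    setattr(conn, k, v)

print(conn.conn_id)  # my_gcp_conn
```

In the real task, a dictionary like config would be passed to the operator (e.g. via op_kwargs), so the same callable can recreate any connection you feed it.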
Of course, try to avoid hard-coding the config for your connections directly into your file. You can store it in a more secure place, such as a dedicated database with encryption or a gcloud bucket, and pull that connection configuration data into your script. Also, for added security, Airflow Connection objects have a rotate_fernet_key method you can explore to change the encryption in the back-end database regularly!
Reference
Original article: https://qiita.com/ctivan/items/068a26fc6ba25110a87a (text shared under the CC protocol; please keep this URL as the reference)