Solving the frequent Fernet Key cryptography error for containerized Apache-Airflow


The Issue

For security purposes, Airflow encrypts sensitive connection and administrative information with a Fernet key before storing it in its back-end database. This includes the passwords of your connection objects as well as service account keys, e.g., for Google Cloud.
However, if you run the Airflow webserver as a containerized service, every time you modify and rebuild the container you risk generating a new Fernet key. Values already encrypted with the old key can then no longer be decrypted, and you lose access to your connections.
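To see why a regenerated key breaks existing connections, here is a minimal sketch using the cryptography library directly, outside Airflow. It assumes only that a password was encrypted with one Fernet key and is later read with a freshly generated one; the variable names and the try_decrypt helper are illustrative, not Airflow code.

```python
from cryptography.fernet import Fernet, InvalidToken

# Key generated the first time the container started.
old_key = Fernet.generate_key()
# Roughly what lands in the metadata database for a connection password.
stored = Fernet(old_key).encrypt(b"my-db-password")

# After a rebuild, a brand-new key is generated.
new_key = Fernet.generate_key()


def try_decrypt(key, token):
    """Return the plaintext, or None if this key cannot decrypt the token."""
    try:
        return Fernet(key).decrypt(token)
    except InvalidToken:
        return None


print(try_decrypt(old_key, stored))  # b'my-db-password'
print(try_decrypt(new_key, stored))  # None: the new key cannot read old data
```

The ciphertext itself is intact; only the key needed to read it is gone, which is why pinning the key (or re-creating the encrypted entries) is the way out.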
Airflow reads the Fernet key from its config file; by default the key is generated and written to airflow/airflow.cfg when you first run the airflow initdb command. This approach has a built-in weakness, since the key ends up hard-coded in plain text in that file.
For this diagnosis, I assume you're using the puckel/docker-airflow repository's Dockerfile or docker-compose.yaml as a base for your Airflow service. In that case, your Fernet key is generated here, in the scripts/entrypoint.sh file:

: "${AIRFLOW__CORE__FERNET_KEY:=${FERNET_KEY:=$(python -c "from cryptography.fernet \
import Fernet; FERNET_KEY = Fernet.generate_key().decode(); print(FERNET_KEY)")}}"

That's a pretty sexy, maybe brilliant, one-liner as far as bash one-liners go. If you're not familiar with bash scripting, here is the breakdown:

The : at the start of the line is bash's no-op builtin. It does nothing by itself, but its argument still gets expanded, and a ${VAR:=default} expansion assigns default to VAR if VAR is unset or empty. Here, entrypoint.sh uses this to set AIRFLOW__CORE__FERNET_KEY only if that variable does not already exist in the environment. So you can override it by specifying the variable in your Dockerfile or docker-compose.yaml file, e.g. with ENV AIRFLOW__CORE__FERNET_KEY='<your key>' in a Dockerfile. The value must be a valid Fernet key (32 url-safe base64-encoded bytes), so generate one rather than making up an arbitrary string.
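Because the override has to be a genuine Fernet key rather than any string, it can be worth checking a candidate before pinning it in a Dockerfile or docker-compose file. A minimal sketch, assuming the cryptography package is installed; is_valid_fernet_key is a hypothetical helper, not part of Airflow or puckel's scripts.

```python
from cryptography.fernet import Fernet


def is_valid_fernet_key(candidate: str) -> bool:
    """Return True if `candidate` can be used as a Fernet key.

    Fernet() raises ValueError when the value is not 32 url-safe
    base64-encoded bytes (base64 decoding errors, binascii.Error,
    are a subclass of ValueError, so they are caught too).
    """
    try:
        Fernet(candidate)
    except ValueError:
        return False
    return True


print(is_valid_fernet_key("some string you made up"))        # False
print(is_valid_fernet_key(Fernet.generate_key().decode()))   # True
```

Generating the value once with Fernet.generate_key() and reusing it across rebuilds avoids both invalid keys and accidental regeneration.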

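=${FERNET_KEY:= This inner expansion supplies the default for the outer one, and bash only evaluates it when AIRFLOW__CORE__FERNET_KEY is unset or empty. If FERNET_KEY already exists in the environment (maybe you decided to pass it from somewhere else), its value is assigned to AIRFLOW__CORE__FERNET_KEY; if it does not, the := tells bash to assign FERNET_KEY a default value produced by the one-line call to Python's cryptography library, and that value is then passed on to AIRFLOW__CORE__FERNET_KEY.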
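The precedence implemented by the two nested expansions can be modelled in plain Python. This is a sketch only: a dict stands in for the process environment, and resolve_fernet_key is a hypothetical name; the real logic is the bash line in entrypoint.sh.

```python
from cryptography.fernet import Fernet


def resolve_fernet_key(env: dict) -> str:
    """Mimic entrypoint.sh's nested ${VAR:=default} expansions.

    As with bash's :=, an unset *or empty* variable counts as missing,
    and the default is also written back into the variable.
    """
    if not env.get("AIRFLOW__CORE__FERNET_KEY"):
        # The inner expansion only runs when the outer variable is missing.
        if not env.get("FERNET_KEY"):
            env["FERNET_KEY"] = Fernet.generate_key().decode()
        env["AIRFLOW__CORE__FERNET_KEY"] = env["FERNET_KEY"]
    return env["AIRFLOW__CORE__FERNET_KEY"]


print(resolve_fernet_key({"AIRFLOW__CORE__FERNET_KEY": "pinned"}))  # pinned
print(resolve_fernet_key({"FERNET_KEY": "from-dockerfile"}))         # from-dockerfile
```

With neither variable set, every call (like every container start) produces a different key, which is exactly the failure mode described in this article.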

The python -c ... part is easy to follow if you know some Python: the string is passed to python as a command via the -c flag, Fernet.generate_key() creates a random key (as bytes, hence the .decode()), and print(FERNET_KEY) writes that key to stdout.
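Run on its own, the Python side of the one-liner yields a 44-character url-safe base64 string that encodes 32 random bytes. The snippet below repeats the same calls and checks those lengths; it assumes only that the cryptography package is installed.

```python
import base64

from cryptography.fernet import Fernet

# Same calls as in the entrypoint.sh one-liner.
FERNET_KEY = Fernet.generate_key().decode()

raw = base64.urlsafe_b64decode(FERNET_KEY)
print(len(FERNET_KEY))  # 44: url-safe base64 text, ending in '=' padding
print(len(raw))         # 32: the underlying random bytes
```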

Wrapping the whole Python command in $( ) (command substitution) tells bash to run it in a subshell and substitute its stdout output in place. So the output of this portion is the Fernet key itself.
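For readers more at home in Python than bash, subprocess offers a rough analogue of $( ) command substitution. A sketch, assuming the interpreter running it (sys.executable) has cryptography installed; ONE_LINER simply holds the program string from entrypoint.sh.

```python
import subprocess
import sys

# The Python program passed to `python -c` in entrypoint.sh.
ONE_LINER = (
    "from cryptography.fernet import Fernet; "
    "FERNET_KEY = Fernet.generate_key().decode(); print(FERNET_KEY)"
)

# Run it in a child process and capture what it writes to stdout.
result = subprocess.run(
    [sys.executable, "-c", ONE_LINER],
    capture_output=True,
    text=True,
    check=True,
)

# $( ) strips trailing newlines; do the same with print()'s newline.
key = result.stdout.rstrip("\n")
print(key)
```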

At first I wondered why Puckel defined two different variables for the key in entrypoint.sh, but I realized that the two variables give the user two places to define the key manually, depending on their use case.

AIRFLOW__CORE__FERNET_KEY is the environment variable Airflow itself reads, for instance when the airflow initdb command creates the back-end database. So if the user wants to change it and uses docker-compose, she should set it in the docker-compose file.

If she just wants to build the webserver on its own, she can set FERNET_KEY in the Dockerfile. That variable is visible to entrypoint.sh, which the Dockerfile declares as the container's entrypoint in its final lines, so the script runs every time the container starts.
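Whichever place you choose, the point is to generate one key once and reuse it. A small sketch that prints a line for each location; the exact lines are illustrative assumptions (a Dockerfile ENV instruction and an entry in a docker-compose service's environment: list), not copied from puckel's files.

```python
from cryptography.fernet import Fernet

# Generate ONE key and reuse it, so rebuilds do not create a new one.
key = Fernet.generate_key().decode()

# For a Dockerfile: picked up by entrypoint.sh as FERNET_KEY.
dockerfile_line = f"ENV FERNET_KEY={key}"

# For a docker-compose service's `environment:` list.
compose_line = f"- AIRFLOW__CORE__FERNET_KEY={key}"

print(dockerfile_line)
print(compose_line)
```

Keep in mind that a key written into a Dockerfile or compose file is stored in plain text, the same weakness noted for airflow.cfg, so keep such files out of public version control.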

In either case, the final value of $FERNET_KEY is assigned to fernet_key on line 122 of the airflow.cfg file:

# Secret key to save connection passwords in the db
fernet_key = $FERNET_KEY
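Airflow's config loader expands environment variables inside values, which is how $FERNET_KEY turns into the actual key (and an AIRFLOW__CORE__FERNET_KEY environment variable overrides the cfg value entirely). The sketch below reproduces the expansion with the standard library; that Airflow does it via os.path.expandvars is my assumption about its internals, and the key value is a made-up placeholder.

```python
import configparser
import os

# A fragment shaped like the airflow.cfg lines above.
CFG = """
[core]
# Secret key to save connection passwords in the db
fernet_key = $FERNET_KEY
"""

# Placeholder standing in for the key exported by entrypoint.sh.
os.environ["FERNET_KEY"] = "example-key-from-entrypoint"

parser = configparser.ConfigParser(interpolation=None)
parser.read_string(CFG)

raw = parser.get("core", "fernet_key")
print(raw)                      # $FERNET_KEY
print(os.path.expandvars(raw))  # example-key-from-entrypoint
```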

The Solution

So, now that we understand exactly what is going on, we can troubleshoot the cryptography.fernet error (typically InvalidToken) that may appear in your DAG task execution logs when a task fails while reading connection and other runtime data from the back-end database. This will likely happen every time you rebuild your airflow_webserver container, unless you also rebuild the database and its data dump each time (if you're like me, you prefer to keep the database on a persistent volume so your Airflow execution and scheduling data survives rebuilds).
The easiest thing to do is just re-enter, in the Airflow UI, your connections and any other entries encrypted with the Fernet key, though if you have many connections, that becomes very tedious.
The second easiest thing is to create a task that recreates connections and other database entries you need, scheduled to run @once so that you can just trigger it after rebuilding your webserver container.
The Python task would look something like this:

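from airflow import DAG, settings
from airflow.models import Connection
from airflow.operators.python_operator import PythonOperator

dag = DAG(...)  # your dag_id, start_date, schedule_interval='@once', etc.


def set_connection(**config):
    """Recreate one Airflow connection from keyword arguments."""
    # Build ONE Connection and set every field on it; assigning
    # `password` or `extra` encrypts them with the current Fernet key.
    conn = Connection()
    for k, v in config.items():
        setattr(conn, k, v)

    session = settings.Session()
    try:
        # Drop stale rows with the same conn_id (encrypted with the
        # old key) so the connection is not duplicated, then re-add it.
        session.query(Connection).filter(
            Connection.conn_id == conn.conn_id
        ).delete()
        session.add(conn)
        session.commit()
    finally:
        session.close()


task = PythonOperator(
    dag=dag,
    task_id='set-connections',
    python_callable=set_connection,
    # ... remaining arguments, e.g. op_kwargs with the connection
    # fields (conn_id, conn_type, host, login, password, ...)
)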

Of course, try to avoid hard-coding the config for your connections directly into your DAG file. You can store it somewhere more secure, such as a dedicated encrypted database or a gcloud bucket, and pull that configuration into your script. Also, for added security, Airflow connection objects have a rotate_fernet_key method you can explore to re-encrypt the back-end database with a new key regularly!
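Key rotation rests on cryptography's MultiFernet. As I understand Airflow's documentation, you set fernet_key to a comma-separated "new_key,old_key" pair, run the rotate command, and then drop the old key; treat those Airflow details as assumptions and check the docs for your version. The sketch below shows only the underlying cryptography calls, not Airflow's own code.

```python
from cryptography.fernet import Fernet, MultiFernet

old_key = Fernet.generate_key()
new_key = Fernet.generate_key()

# A value encrypted before the rotation.
token = Fernet(old_key).encrypt(b"my-db-password")

# List the NEW key first: MultiFernet encrypts with the first key
# but tries every key when decrypting.
keys = MultiFernet([Fernet(new_key), Fernet(old_key)])

# rotate() decrypts with whichever key works and re-encrypts with the first.
rotated = keys.rotate(token)

print(Fernet(new_key).decrypt(rotated))  # b'my-db-password'
```

Once every stored value has been rotated, the old key can be removed without losing access.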

Reference

More material on this topic (Solving the frequent Fernet Key cryptography error for containerized Apache-Airflow) can be found at https://qiita.com/ctivan/items/068a26fc6ba25110a87a
