Write a blog article describing how to configure remote logging for Airflow using Amazon S3 or MinIO
Outline:
- Why is remote logging needed?
  - In a normal deployment of Airflow, logs are kept on worker instances.
  - If Airflow is deployed on Kubernetes, the log files will be lost when a pod restarts or gets rescheduled. To prevent this problem, task logs can be stored in a remote storage system such as Amazon S3 or MinIO.
- Airflow S3 Connection
  - Create an Amazon Web Services connection so that Airflow is able to read and write logs. Without a connection, the workers cannot authenticate against the storage backend and no logs are uploaded.
  - Amazon S3 connection details are managed from the Airflow web UI under Admin → Connections.
  - For MinIO instances: add the client connection details, including the endpoint URL, so that workers can connect to the correct bucket (see the sketch below).
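The connection can also be created from the command line, which keeps the setup reproducible. A minimal sketch, assuming the `apache-airflow-providers-amazon` package is installed; the connection id `minio`, the endpoint URL, and the placeholder credentials are examples to adapt:

```sh
# Create an AWS-type connection named "minio" pointing at a MinIO endpoint.
# For real Amazon S3, drop the endpoint_url extra and use your AWS keys.
# Note: older Amazon provider versions read the custom endpoint from the
# "host" key in the extra field instead of "endpoint_url".
airflow connections add minio \
    --conn-type aws \
    --conn-login '<minio-access-key>' \
    --conn-password '<minio-secret-key>' \
    --conn-extra '{"endpoint_url": "http://minio:9000"}'
```

Whatever id you choose here has to match the `remote_log_conn_id` setting described in the next section.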
- Airflow task setup
  - Specify the logging options via `airflow.cfg` (a sample snippet follows this list):
    - `remote_logging = True`: turns on remote logging
    - `remote_base_log_folder = s3://airflow/logs`: base path to which logs should be written
    - `remote_log_conn_id = minio`: name of the Airflow connection which should be used for retrieving credentials
    - `encrypt_s3_logs = False`: controls whether the logs should be encrypted by Airflow before being uploaded to S3
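Put together, the relevant `airflow.cfg` block could look like the sketch below, using the example values from the outline. In Airflow 2 these options live in the `[logging]` section (in 1.10 they were under `[core]`):

```ini
[logging]
# Ship task logs to remote storage instead of keeping them only on workers
remote_logging = True
# Bucket ("airflow") and prefix ("logs") the task logs are written to
remote_base_log_folder = s3://airflow/logs
# Airflow connection that holds the storage credentials
remote_log_conn_id = minio
# Leave encryption at rest to S3/MinIO; Airflow does not encrypt before upload
encrypt_s3_logs = False
```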
  - Airflow environment variables (sample below):
    - `AIRFLOW__LOGGING__REMOTE_LOGGING`: turns on remote logging
    - `AIRFLOW__LOGGING__REMOTE_BASE_LOG_FOLDER`: base path for logs
    - `AIRFLOW__LOGGING__REMOTE_LOG_CONN_ID`: name of the Airflow connection for retrieving credentials
    - `AIRFLOW__LOGGING__ENCRYPT_S3_LOGS`: encrypt S3 logs
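The same settings expressed as environment variables, which is usually the more convenient form for Docker or Kubernetes deployments. Environment variables take precedence over `airflow.cfg`:

```sh
# Mirrors the airflow.cfg snippet above; set these on every Airflow component
# that writes or serves task logs (workers, scheduler, webserver).
export AIRFLOW__LOGGING__REMOTE_LOGGING=True
export AIRFLOW__LOGGING__REMOTE_BASE_LOG_FOLDER=s3://airflow/logs
export AIRFLOW__LOGGING__REMOTE_LOG_CONN_ID=minio
export AIRFLOW__LOGGING__ENCRYPT_S3_LOGS=False
```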