
Refactor HttpAsyncHook to support session-based async HTTP operations and simplify LivyAsyncHook #60458

Merged
dabla merged 76 commits into apache:main from dabla:feature/add-session-async-http-hook
Mar 14, 2026

Conversation

@dabla

@dabla dabla commented Jan 13, 2026

Contributor

Was generative AI tooling used to co-author this PR?

  • Yes (please specify the tool below)

This PR refactors HttpAsyncHook to natively support session-based async HTTP usage, making it easier and safer to perform multiple HTTP requests within a single aiohttp.ClientSession.

As part of this refactor, LivyAsyncHook (which extends HttpAsyncHook) is simplified to reuse shared logic, removing duplicated code and avoiding internal state mutation during request execution.

Before this change, users who wanted to reuse an aiohttp.ClientSession had to manage the session lifecycle themselves and pass it into HttpAsyncHook.run(). This resulted in boilerplate-heavy task code and made it harder to apply hook-level configuration consistently.

Instead of having to do this:

    @task(
        retries=3,
        retry_delay=timedelta(seconds=60),
        show_return_value_in_logs=True,
    )
    async def fetch_forms(project_id: int):
        from aiohttp import ClientSession
        from functools import partial

        from airflow.providers.http.hooks.http import HttpAsyncHook

        auth_type = partial(BearerTokenAuth, url="https://localhost/v1/sessions")
        hook = HttpAsyncHook(http_conn_id="http_conn_id", auth_type=auth_type, method="GET")

        async with ClientSession() as session:
            response = await hook.run(
                session=session,
                endpoint=f"v1/projects/{project_id}/forms",
            )
            return await response.json()

You can now do this instead, because HttpAsyncHook exposes a session-based API that encapsulates session creation, configuration, authentication, and retry logic:

    @task(
        retries=3,
        retry_delay=timedelta(seconds=60),
        show_return_value_in_logs=True,
    )
    async def fetch_forms(project_id: int):
        from functools import partial

        from airflow.providers.http.hooks.http import HttpAsyncHook

        auth_type = partial(
            BearerTokenAuth, url="https://localhost/v1/sessions"
        )
        hook = HttpAsyncHook(
            http_conn_id="http.odk.acc", auth_type=auth_type, method="GET"
        )

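        # Single request: the hook creates and closes the session itself.
        response = await hook.run(
            endpoint=f"v1/projects/{project_id}/forms",
        )
        return await response.json()

        # Alternative task body (use instead of the lines above) when
        # multiple requests need to be done with the same session:
        async with hook.session() as session:
            response_1 = await session.run(
                endpoint=f"v1/projects/{project_id}/forms",
            )
            forms_1 = await response_1.json()

            response_2 = await session.run(
                endpoint=f"v1/projects/{project_id}/forms",
            )
            forms_2 = await response_2.json()

            # Return once, after both requests, so neither is unreachable.
            return forms_1, forms_2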

This approach:

  • Eliminates manual aiohttp.ClientSession management in tasks (no more boilerplate code needed).
  • Makes multi-request workflows easier and less error-prone thanks to the async context manager.
  • Keeps HTTP configuration and retry behaviour inside the hook.
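The pattern behind these points can be sketched with the standard library alone. This is a minimal illustration under stated assumptions, not the provider's implementation: `SketchHook`, `SketchSession`, `make_flaky_send`, and the default values for `retry_limit` and `retry_delay` are hypothetical, and a fake transport stands in for aiohttp.ClientSession.

```python
import asyncio
import logging
from contextlib import asynccontextmanager

log = logging.getLogger(__name__)


class SketchSession:
    """Hypothetical stand-in for a hook-managed session (not the Airflow class)."""

    def __init__(self, send, retry_limit, retry_delay):
        self._send = send
        self._retry_limit = retry_limit
        self._retry_delay = retry_delay
        self.closed = False

    async def run(self, endpoint):
        # Retry lives here, so task code does not need its own retry loop.
        for attempt in range(1, self._retry_limit + 1):
            try:
                return await self._send(endpoint)
            except ConnectionError as exc:
                log.warning("Attempt %d for %s failed: %s", attempt, endpoint, exc)
                if attempt == self._retry_limit:
                    raise
                await asyncio.sleep(self._retry_delay)


class SketchHook:
    """Hypothetical hook that owns session creation and cleanup."""

    def __init__(self, send, retry_limit=3, retry_delay=0.0):
        self._send = send
        self._retry_limit = retry_limit
        self._retry_delay = retry_delay

    @asynccontextmanager
    async def session(self):
        session = SketchSession(self._send, self._retry_limit, self._retry_delay)
        try:
            yield session
        finally:
            # A real hook would close the underlying HTTP client here.
            session.closed = True

    async def run(self, endpoint):
        # Single-request convenience: open a session, run once, clean up.
        async with self.session() as session:
            return await session.run(endpoint)


def make_flaky_send(failures):
    """Return a fake transport that fails `failures` times, then succeeds."""
    calls = {"count": 0}

    async def send(endpoint):
        calls["count"] += 1
        if calls["count"] <= failures:
            raise ConnectionError("simulated outage")
        return {"endpoint": endpoint, "attempts": calls["count"]}

    return send


async def demo():
    hook = SketchHook(make_flaky_send(failures=2))
    async with hook.session() as session:
        first = await session.run("v1/projects/1/forms")
        second = await session.run("v1/projects/2/forms")
    return first, second, session.closed
```

Running `asyncio.run(demo())` sends both requests through one session object. The first call succeeds on its third attempt, and the session is marked closed once the `async with` block exits, even if a request raises.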

LivyAsyncHook improvements

  • LivyAsyncHook now reuses the session and request logic provided by the HttpAsyncHook session context manager.
  • Duplicate logic previously hidden in _do_api_call_async has been removed.
  • The hook no longer mutates internal state (e.g. self.method) when calling run_method.
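One reason mutating instance state per call can be risky is that concurrent calls on a shared instance may observe each other's values. The sketch below illustrates that general hazard; it is an assumption about why avoiding mutation helps, not a description of the actual Livy code. `MutatingClient`, `StatelessClient`, and the endpoints are hypothetical.

```python
import asyncio


class MutatingClient:
    """Anti-pattern: the HTTP method is stored on the instance and overwritten per call."""

    def __init__(self):
        self.method = "GET"

    async def run_method(self, endpoint, method):
        self.method = method        # shared state changes here
        await asyncio.sleep(0)      # yield control, as real I/O would
        return f"{self.method} {endpoint}"  # may read another call's method


class StatelessClient:
    """The method travels with the call, so concurrent calls cannot interfere."""

    async def run_method(self, endpoint, method="GET"):
        await asyncio.sleep(0)
        return f"{method} {endpoint}"


async def compare():
    mutating, stateless = MutatingClient(), StatelessClient()
    racy = await asyncio.gather(
        mutating.run_method("/batches", "POST"),
        mutating.run_method("/batches/1", "DELETE"),
    )
    safe = await asyncio.gather(
        stateless.run_method("/batches", "POST"),
        stateless.run_method("/batches/1", "DELETE"),
    )
    return racy, safe
```

With the mutating client, the first call resumes after the second has already overwritten `self.method`, so it reports the wrong verb. Passing the method as an argument keeps each call self-contained.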

Overall behaviour is unchanged, but the implementation is more DRY, predictable, and maintainable.

  • Read the Pull Request Guidelines for more information. Note: commit author/co-author name and email in commits become permanently public when merged.
  • For fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
  • When adding dependency, check compliance with the ASF 3rd Party License Policy.
  • For significant user-facing changes create newsfragment: {pr_number}.significant.rst or {issue_number}.significant.rst, in airflow-core/newsfragments.

dabla added 26 commits January 13, 2026 18:48
dabla added 20 commits February 23, 2026 08:07
@dabla dabla merged commit 49f48e3 into apache:main Mar 14, 2026
136 checks passed
abhijeets25012-tech pushed a commit to abhijeets25012-tech/airflow that referenced this pull request Apr 9, 2026
… and simplify LivyAsyncHook (apache#60458)

* refactor: Refactored HttpAsyncHook to easily support session based run operations

* fix: Fixed import of LoggingMixin

* refactor: LivyAsyncHook now reuses logic from HttpAsyncHook which is more DRY

* refactor: Reformatted HttpAsyncHook

* refactor: Fixed possible None types for merged_headers

* refactor: Changed type of _retryable_error_async method

* refactor: Removed unused import

* refactor: Moved SessionConfig inside AsyncHttpSession

* refactor: Reformatted run method of HttpAsyncHook

* refactor: Removed unused import from LivyHook module

* Revert "refactor: Moved SessionConfig inside AsyncHttpSession"

This reverts commit f9c503d.

* refactor: Added docstring for retry_limit and retry_delay parameters

* refactor: Reformatted docstring in _retryable_error_async method

* refactor: Added docstring for SessionConfig and AsyncHttpSession

* refactor: Added warning logging when run attempt fails

* refactor: Refactored run_method of LivyAsyncHook

* refactor: Refactored unit tests for LivyAsyncHook

* refactor: Reformatted AsyncHttpSession

* refactor: Reformatted run_method of LivyAsyncHook

* refactor: Escape aiohttp.ClientSession in docstring of session contextmanager in HttpAsyncHook

* refactor: Also take into account extra_options from connection when building AsyncHttpSession

* refactor: Fixed mocking of test_run_method_success

* refactor: Removed unused imports

* refactor: Reorganized imports

* refactor: Run method of LivyAsyncHook must internally use session from HttpAsyncHook so it doesn't rely on the error handling of the HttpAsyncHook run method

* refactor: Escape reserved words in HttpAsyncHook

* refactor: Mock get_async_connection in TestLivyAsyncHook

* refactor: Mock get_async_connection in TestLivyAsyncHook should be patched on http hook module

* refactor: Mock get_async_connection in TestLivyAsyncHook should be patched on http hook module

* refactor: Make sure get_async_connection is mocked with real Connection

* refactor: Reformatted Livy unit test

* refactor: Add get_async_connection mock in test_run_put_method_with_type_error

* refactor: Make sure http provider dependency is set to next release when livy provider is released

* refactor: Added TODO on asgiref dependency as it can probably be removed, since it will be resolved transitively through common-compat provider

* refactor: Removed asgiref dependency in livy provider

* refactor: Removed asgiref reference from docs

* refactor: Fixed assertion of Connection type in test_build_get_hook of TestLivyAsyncHook

* refactor: Don't need to assert connections anymore in test_build_get_hook of TestLivyAsyncHook

---------

Co-authored-by: David Blain <david.blain@b-holding.be>
@NBardelot

NBardelot commented Jun 2, 2026

Contributor

Hello @dabla, I notice that in commit 49f48e3 you modified the http provider's hook to import the pydantic BaseModel. You modified providers/apache/livy/pyproject.toml to update the dependencies for the livy provider, but I think you forgot to declare the new dependency for the http provider, which now depends on pydantic being available.

You force "apache-airflow-providers-http>=5.1.0" in providers/apache/livy/pyproject.toml but apache-airflow-providers-http itself does not list pydantic as a dependency (even now in the main branch).

@dabla

dabla commented Jun 2, 2026

Contributor Author

> Hello @dabla, I notice that in commit 49f48e3 you modified the http provider's hook to import the pydantic BaseModel. You modified providers/apache/livy/pyproject.toml to update the dependencies for the livy provider, but I think you forgot to declare the new dependency for the http provider, which now depends on pydantic being available.
>
> You force "apache-airflow-providers-http>=5.1.0" in providers/apache/livy/pyproject.toml but apache-airflow-providers-http itself does not list pydantic as a dependency (even now in the main branch).

You're right. It currently works because pydantic is pulled in transitively through apache-airflow. However, since apache-airflow-providers-http now imports pydantic directly, it would be cleaner and more robust to declare pydantic as an explicit dependency of the provider rather than relying on a transitive one.

@NBardelot

Contributor

You are absolutely right. I forgot to mention that my case was a not-yet-upgraded Airflow v2 instance (version 2.11.2), where upgrading the http provider without the explicit dependency makes the HTTPHook fail to import from pydantic. If anyone searches for this issue, the workaround is to explicitly add pydantic to the requirements alongside the main apache-airflow-providers-http requirement.
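For anyone wanting to confirm whether an environment is affected, a small standard-library check along these lines may help. It is only a sketch: `describe_dependency` is a hypothetical helper, and it assumes the import name and the distribution name are both `pydantic`.

```python
from importlib import metadata, util


def describe_dependency(name):
    """Report whether `name` is importable and which distribution version is installed."""
    if util.find_spec(name) is None:
        return f"{name}: NOT importable; add it to your requirements explicitly"
    try:
        version = metadata.version(name)
    except metadata.PackageNotFoundError:
        # Importable, but no installed distribution metadata under this name.
        version = "unknown"
    return f"{name}: importable (version {version})"


if __name__ == "__main__":
    print(describe_dependency("pydantic"))
```

Run it with the same interpreter that Airflow uses, so the result reflects the environment in which the provider is imported.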

@dabla

dabla commented Jun 5, 2026

Contributor Author

> You are absolutely right. I forgot to mention that my case was a not-yet-upgraded Airflow v2 instance (version 2.11.2), where upgrading the http provider without the explicit dependency makes the HTTPHook fail to import from pydantic. If anyone searches for this issue, the workaround is to explicitly add pydantic to the requirements alongside the main apache-airflow-providers-http requirement.

Thanks for raising this issue, @NBardelot. I'll open a PR to fix it and add the dependency to the http provider.

@dabla

dabla commented Jun 10, 2026

Contributor Author

> Hello @dabla, I notice that in commit 49f48e3 you modified the http provider's hook to import the pydantic BaseModel. You modified providers/apache/livy/pyproject.toml to update the dependencies for the livy provider, but I think you forgot to declare the new dependency for the http provider, which now depends on pydantic being available.
>
> You force "apache-airflow-providers-http>=5.1.0" in providers/apache/livy/pyproject.toml but apache-airflow-providers-http itself does not list pydantic as a dependency (even now in the main branch).

I've created a PR to address this issue, thanks again for pointing this out.


3 participants