zhenxun_bot/zhenxun/utils/http_utils.py
HibiKier 4e33bf3a50
Version update (#1666)
* Parent plugin loading

* Add tests: updating and installing plugins (#1594)

* Test updating and installing plugins

* Sourcery suggestions

* 👷 Add pytest

* 🎨 Optimize code

* 🐛 Bug fixes

* 🐛 Fix plugin installation returning 403 (#1595)

* Improve test methods
* VSCode test configuration
* Refactor the plugin installation process

* 🎨 Update readme

* Update README.md

* 🐛 Fix bugs and pin versions

* 🐛 Fix superuser toggling of group features

* 🐛 Fix plugin store update check (#1597)

* 🐛 Fix plugin store update check

* 🐛 Malicious command detection issue

* 🐛 Add plugin status checks (#1598)

* Improve test cases

* 🐛 Change plugin update and installation logic

* 🐛 Fix updating group member info

* 🎨 Code cleanup

* 🚀 Update Dockerfile (#1599)

* 🎨 Update requirements

* Add the aiocache dependency

* Add GitHub mirrors

* Add multiple sources for fetching the repository tree

* 🐛 Fix test cases

* Add API caching

* 🎨 Apply Sourcery suggestions

* 🐛 Revise file download logic

* 🎨 Optimize code

* 🐛 Fix occasional plugin toggle errors

* Refactor the self-check UI

* 🐛 Fix self-check HTML

* Fix a sign-in logic bug and make the code flexible enough for configurable sign-in favorability levels (#1606)

* Fix known sign-in issues

* Fix known sign-in issues

* Rename parameters

* Change the uid check

---------

Co-authored-by: HibiKier <45528451+HibiKier@users.noreply.github.com>

* 🎨 Improve code structure

* 🐛 Remove private-chat help when a plugin is modified in private chat

* 🐛 Filter parent plugins

* 🐛 Fix self-check on ARM (#1607)

* 🐛 Fix self-check on ARM

* Improve tests

* Support random functions on MySQL, PostgreSQL, and SQLite

* 🔧 VSCode configuration changes

* 🔧 VSCode configuration changes

* Add a gold coin leaderboard

Co-Authored-By: HibiKier <45528451+HibiKier@users.noreply.github.com>

* 📝 Update README

Co-Authored-By: HibiKier <45528451+HibiKier@users.noreply.github.com>

* 🔨 Extract GitHub-related operations (#1609)

* 🔨 Extract GitHub-related operations

* 🔨 Refactor the API strategy

* Cap the size of the sign-in/gold leaderboards (#1616)

* Cap the size of the sign-in/gold leaderboards

* 🐛 Fix superuser id retrieval

* 🐛 Fix archive extraction paths and mounting (#1619)

* 🐛 Fix zhenxun help image ordering when there are few features (#1620)

* 🐛 Adapt sign-in text (#1622)

* 🐛 Provide a default for the favorability leaderboard (#1624)

* 🎈 Prefer the GitHub API (#1625)

* Refactor help; keep normal users from querying admin plugins (#1626)

* 🐛 Fix matching group permission against plugin level (#1627)

* When an admin tries to ban Zhenxun, the ban is turned back on them (#1628)

* Add a toggle for group speech-time detection (#1630)

* 🐳 chore: support automatic version bumping (#1629)

* 🎈 perf(github_utils): support iterating GitHub download URLs (#1632)

* 🎈 perf(github_utils): support iterating GitHub download URLs

* 🐞 fix(http_utils): fix several download issues

* 🦄 refactor(http_utils): partial refactor

* chore(version): Update version to v0.2.2-e6f17c4

---------

Co-authored-by: AkashiCoin <AkashiCoin@users.noreply.github.com>

* 🧪 test(auto_update): fix test cases (#1633)

* 🐛 Fix error when the plugin store is empty (#1634)

* 🐛 Fix matching group permission against plugin level (#1635)

* message_build supports AtAll (#1639)

* 🎈 perf: download plugins by commit hash (#1641)

* 🎈 perf: download plugins by commit hash

* chore(version): Update version to v0.2.2-f9c7360

---------

Co-authored-by: AkashiCoin <AkashiCoin@users.noreply.github.com>

* 🐳 chore: change the run-check trigger paths (#1642)

* 🐳 chore: change the run-check trigger paths

* 🐳 chore: add the tests directory

* Refactor QQ group event handling (#1643)

* 🐛 Make sign-in names adaptive (#1644)

* 🎨 Update README (#1645)

* 🐛 fix(http_utils): wrong Content-Length in streaming downloads (#1647)

* 🐛 Fix feature status display in group help (#1650)

* 🐛 Fix group welcome message settings (#1651)

* 🐛 Fix first-launch error after downloading the WebUI (#1652)

* 🐛 Fix first-launch error after downloading the WebUI

* chore(version): Update version to v0.2.2-4a8ef85

---------

Co-authored-by: HibiKier <HibiKier@users.noreply.github.com>

* Remove the default image folder: 爬 (#1653)

* Provide plugin install/uninstall hooks for plugin initialization when installing/removing plugins (#1654)

* Add superuser and admin help templates (#1655)

* Add a personal info command (#1657)

* Rename the personal info menu (#1658)

* Add a plugin store API (#1659)

* Add a plugin store API

* chore(version): Update version to v0.2.2-7e15f20

---------

Co-authored-by: HibiKier <HibiKier@users.noreply.github.com>

* Restore cd, block, and count limits to the configuration file (#1662)

* 🎨 Update README (#1663)

* 🎨 Bump the version number (#1664)

* 🎨 Update requirements (#1665)

---------

Co-authored-by: AkashiCoin <l1040186796@gmail.com>
Co-authored-by: fanyinrumeng <42991257+fanyinrumeng@users.noreply.github.com>
Co-authored-by: AkashiCoin <i@loli.vet>
Co-authored-by: Elaga <1728903318@qq.com>
Co-authored-by: AkashiCoin <AkashiCoin@users.noreply.github.com>
Co-authored-by: HibiKier <HibiKier@users.noreply.github.com>
2024-10-01 00:42:23 +08:00


import time
import asyncio
from pathlib import Path
from typing import Any, Literal, ClassVar
from collections.abc import AsyncGenerator
from contextlib import asynccontextmanager
from asyncio.exceptions import TimeoutError

import rich
import httpx
import aiofiles
from retrying import retry
from playwright.async_api import Page
from nonebot_plugin_alconna import UniMessage
from nonebot_plugin_htmlrender import get_browser
from httpx import Response, ConnectTimeout, HTTPStatusError

from zhenxun.services.log import logger
from zhenxun.configs.config import BotConfig
from zhenxun.utils.message import MessageUtils
from zhenxun.utils.user_agent import get_user_agent

# from .browser import get_browser


class AsyncHttpx:
    proxy: ClassVar[dict[str, str | None]] = {
        "http://": BotConfig.system_proxy,
        "https://": BotConfig.system_proxy,
    }

    @classmethod
    @retry(stop_max_attempt_number=3)
    async def get(
        cls,
        url: str | list[str],
        *,
        params: dict[str, Any] | None = None,
        headers: dict[str, str] | None = None,
        cookies: dict[str, str] | None = None,
        verify: bool = True,
        use_proxy: bool = True,
        proxy: dict[str, str] | None = None,
        timeout: int = 30,
        **kwargs,
    ) -> Response:
        """GET request.

        Args:
            url: a single url, or a list of fallback urls tried in order
            params: query parameters
            headers: request headers
            cookies: cookies
            verify: verify SSL certificates
            use_proxy: use the default proxy
            proxy: explicit proxy override
            timeout: timeout in seconds
        """
        urls = [url] if isinstance(url, str) else url
        return await cls._get_first_successful(
            urls,
            params=params,
            headers=headers,
            cookies=cookies,
            verify=verify,
            use_proxy=use_proxy,
            proxy=proxy,
            timeout=timeout,
            **kwargs,
        )
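    # Usage sketch for `get` (illustrative only; the URLs are placeholders, not
    # part of this module). Passing a list tries each mirror in order and
    # returns the first response that arrives without a transport error; HTTP
    # error statuses are still returned, so check them yourself:
    #
    #     resp = await AsyncHttpx.get(
    #         ["https://mirror-a.example.com/data", "https://mirror-b.example.com/data"],
    #         timeout=10,
    #     )
    #     resp.raise_for_status()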
    @classmethod
    async def _get_first_successful(
        cls,
        urls: list[str],
        **kwargs,
    ) -> Response:
        last_exception = None
        for url in urls:
            try:
                return await cls._get_single(url, **kwargs)
            except Exception as e:
                last_exception = e
                if url != urls[-1]:
                    logger.warning(f"获取 {url} 失败, 尝试下一个")
        raise last_exception or Exception("All URLs failed")

    @classmethod
    async def _get_single(
        cls,
        url: str,
        *,
        params: dict[str, Any] | None = None,
        headers: dict[str, str] | None = None,
        cookies: dict[str, str] | None = None,
        verify: bool = True,
        use_proxy: bool = True,
        proxy: dict[str, str] | None = None,
        timeout: int = 30,
        **kwargs,
    ) -> Response:
        if not headers:
            headers = get_user_agent()
        _proxy = proxy if proxy else cls.proxy if use_proxy else None
        async with httpx.AsyncClient(proxies=_proxy, verify=verify) as client:  # type: ignore
            return await client.get(
                url,
                params=params,
                headers=headers,
                cookies=cookies,
                timeout=timeout,
                **kwargs,
            )
    @classmethod
    async def head(
        cls,
        url: str,
        *,
        params: dict[str, Any] | None = None,
        headers: dict[str, str] | None = None,
        cookies: dict[str, str] | None = None,
        verify: bool = True,
        use_proxy: bool = True,
        proxy: dict[str, str] | None = None,
        timeout: int = 30,
        **kwargs,
    ) -> Response:
        """HEAD request.

        Args:
            url: url
            params: query parameters
            headers: request headers
            cookies: cookies
            verify: verify SSL certificates
            use_proxy: use the default proxy
            proxy: explicit proxy override
            timeout: timeout in seconds
        """
        if not headers:
            headers = get_user_agent()
        _proxy = proxy if proxy else cls.proxy if use_proxy else None
        async with httpx.AsyncClient(proxies=_proxy, verify=verify) as client:  # type: ignore
            return await client.head(
                url,
                params=params,
                headers=headers,
                cookies=cookies,
                timeout=timeout,
                **kwargs,
            )
    @classmethod
    async def post(
        cls,
        url: str,
        *,
        data: dict[str, Any] | None = None,
        content: Any = None,
        files: Any = None,
        verify: bool = True,
        use_proxy: bool = True,
        proxy: dict[str, str] | None = None,
        json: dict[str, Any] | None = None,
        params: dict[str, str] | None = None,
        headers: dict[str, str] | None = None,
        cookies: dict[str, str] | None = None,
        timeout: int = 30,
        **kwargs,
    ) -> Response:
        """POST request.

        Args:
            url: url
            data: form data
            content: raw body content
            files: files to upload
            verify: verify SSL certificates
            use_proxy: use the default proxy
            proxy: explicit proxy override
            json: JSON body
            params: query parameters
            headers: request headers
            cookies: cookies
            timeout: timeout in seconds
        """
        if not headers:
            headers = get_user_agent()
        _proxy = proxy if proxy else cls.proxy if use_proxy else None
        async with httpx.AsyncClient(proxies=_proxy, verify=verify) as client:  # type: ignore
            return await client.post(
                url,
                content=content,
                data=data,
                files=files,
                json=json,
                params=params,
                headers=headers,
                cookies=cookies,
                timeout=timeout,
                **kwargs,
            )
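    # Usage sketch for `post` (illustrative; the endpoint and payload are made
    # up). `json=` sends an application/json body, while `data=` sends
    # form-encoded fields:
    #
    #     resp = await AsyncHttpx.post(
    #         "https://api.example.com/v1/items",
    #         json={"name": "zhenxun", "count": 1},
    #         use_proxy=False,
    #     )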
    @classmethod
    async def download_file(
        cls,
        url: str | list[str],
        path: str | Path,
        *,
        params: dict[str, str] | None = None,
        verify: bool = True,
        use_proxy: bool = True,
        proxy: dict[str, str] | None = None,
        headers: dict[str, str] | None = None,
        cookies: dict[str, str] | None = None,
        timeout: int = 30,
        stream: bool = False,
        follow_redirects: bool = True,
        **kwargs,
    ) -> bool:
        """Download a file, making up to 3 rounds of attempts across the given urls.

        Args:
            url: a single url, or a list of fallback urls tried in order
            path: destination path
            params: query parameters
            verify: verify SSL certificates
            use_proxy: use the default proxy
            proxy: explicit proxy override
            headers: request headers
            cookies: cookies
            timeout: timeout in seconds
            stream: stream the download (chunked writes plus a progress bar,
                suited to large files)
            follow_redirects: follow HTTP redirects
        """
        if isinstance(path, str):
            path = Path(path)
        path.parent.mkdir(parents=True, exist_ok=True)
        try:
            for _ in range(3):
                if not isinstance(url, list):
                    url = [url]
                for u in url:
                    try:
                        if not stream:
                            response = await cls.get(
                                u,
                                params=params,
                                headers=headers,
                                cookies=cookies,
                                verify=verify,
                                use_proxy=use_proxy,
                                proxy=proxy,
                                timeout=timeout,
                                follow_redirects=follow_redirects,
                                **kwargs,
                            )
                            response.raise_for_status()
                            content = response.content
                            async with aiofiles.open(path, "wb") as wf:
                                await wf.write(content)
                            logger.info(f"下载 {u} 成功.. Path：{path.absolute()}")
                            return True
                        else:
                            if not headers:
                                headers = get_user_agent()
                            _proxy = (
                                proxy if proxy else cls.proxy if use_proxy else None
                            )
                            async with httpx.AsyncClient(
                                proxies=_proxy,  # type: ignore
                                verify=verify,
                            ) as client:
                                async with client.stream(
                                    "GET",
                                    u,
                                    params=params,
                                    headers=headers,
                                    cookies=cookies,
                                    timeout=timeout,
                                    follow_redirects=follow_redirects,
                                    **kwargs,
                                ) as response:
                                    response.raise_for_status()
                                    logger.info(
                                        f"开始下载 {path.name}.. "
                                        f"Path: {path.absolute()}"
                                    )
                                    async with aiofiles.open(path, "wb") as wf:
                                        total = int(
                                            response.headers.get("Content-Length", 0)
                                        )
                                        with rich.progress.Progress(  # type: ignore
                                            rich.progress.TextColumn(path.name),  # type: ignore
                                            "[progress.percentage]{task.percentage:>3.0f}%",  # type: ignore
                                            rich.progress.BarColumn(bar_width=None),  # type: ignore
                                            rich.progress.DownloadColumn(),  # type: ignore
                                            rich.progress.TransferSpeedColumn(),  # type: ignore
                                        ) as progress:
                                            download_task = progress.add_task(
                                                "Download",
                                                total=total if total else None,
                                            )
                                            async for chunk in response.aiter_bytes():
                                                await wf.write(chunk)
                                                await wf.flush()
                                                progress.update(
                                                    download_task,
                                                    completed=response.num_bytes_downloaded,
                                                )
                                    logger.info(
                                        f"下载 {u} 成功.. Path：{path.absolute()}"
                                    )
                            return True
                    except (TimeoutError, ConnectTimeout, HTTPStatusError):
                        logger.warning(f"下载 {u} 失败.. 尝试下一个地址..")
            else:
                logger.error(f"下载 {url} 下载超时.. Path：{path.absolute()}")
        except Exception as e:
            logger.error(f"下载 {url} 错误 Path：{path.absolute()}", e=e)
        return False
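    # Usage sketch for `download_file` (illustrative; the url and path are
    # placeholders). stream=True writes chunks as they arrive and renders a
    # rich progress bar instead of buffering the whole body in memory:
    #
    #     ok = await AsyncHttpx.download_file(
    #         "https://example.com/releases/big.zip",
    #         Path("data/big.zip"),
    #         stream=True,
    #     )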
    @classmethod
    async def gather_download_file(
        cls,
        url_list: list[str] | list[list[str]],
        path_list: list[str | Path],
        *,
        limit_async_number: int | None = None,
        params: dict[str, str] | None = None,
        use_proxy: bool = True,
        proxy: dict[str, str] | None = None,
        headers: dict[str, str] | None = None,
        cookies: dict[str, str] | None = None,
        timeout: int = 30,
        **kwargs,
    ) -> list[bool]:
        """Download several files concurrently, in batches.

        Args:
            url_list: list of urls (each entry may itself be a fallback list)
            path_list: list of destination paths
            limit_async_number: cap on concurrent downloads per batch
            params: query parameters
            use_proxy: use the default proxy
            proxy: explicit proxy override
            headers: request headers
            cookies: cookies
            timeout: timeout in seconds
        """
        if (n := len(url_list)) != len(path_list):
            raise UrlPathNumberNotEqual(
                f"Url数量与Path数量不对等，Url：{len(url_list)}，Path：{len(path_list)}"
            )
        if limit_async_number and n > limit_async_number:
            m = float(n) / limit_async_number
            x = 0
            j = limit_async_number
            _split_url_list = []
            _split_path_list = []
            for _ in range(int(m)):
                _split_url_list.append(url_list[x:j])
                _split_path_list.append(path_list[x:j])
                x += limit_async_number
                j += limit_async_number
            if int(m) < m:
                # the remainder batch starts at x, the index reached above
                _split_url_list.append(url_list[x:])
                _split_path_list.append(path_list[x:])
        else:
            _split_url_list = [url_list]
            _split_path_list = [path_list]
        tasks = []
        result_ = []
        for x, y in zip(_split_url_list, _split_path_list):
            for url, path in zip(x, y):
                tasks.append(
                    asyncio.create_task(
                        cls.download_file(
                            url,
                            path,
                            params=params,
                            headers=headers,
                            cookies=cookies,
                            use_proxy=use_proxy,
                            timeout=timeout,
                            proxy=proxy,
                            **kwargs,
                        )
                    )
                )
            _x = await asyncio.gather(*tasks)
            result_ = result_ + list(_x)
            tasks.clear()
        return result_
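    # Usage sketch for `gather_download_file` (illustrative). With
    # limit_async_number=2 the four downloads below run as two batches of two
    # concurrent tasks; the returned list holds one bool per url/path pair,
    # in input order:
    #
    #     results = await AsyncHttpx.gather_download_file(
    #         [f"https://example.com/img/{i}.png" for i in range(4)],
    #         [Path(f"data/{i}.png") for i in range(4)],
    #         limit_async_number=2,
    #     )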
    @classmethod
    async def get_fastest_mirror(cls, url_list: list[str]) -> list[str]:
        assert url_list

        async def head_mirror(client: type[AsyncHttpx], url: str) -> dict[str, Any]:
            begin_time = time.time()
            response = await client.head(url=url, timeout=6)
            elapsed_time = (time.time() - begin_time) * 1000
            content_length = int(response.headers.get("content-length", 0))
            return {
                "url": url,
                "elapsed_time": elapsed_time,
                "content_length": content_length,
            }

        logger.debug(f"开始获取最快镜像,可能需要一段时间... | URL列表：{url_list}")
        results = await asyncio.gather(
            *(head_mirror(cls, url) for url in url_list),
            return_exceptions=True,
        )
        _results: list[dict[str, Any]] = []
        for result in results:
            if isinstance(result, BaseException):
                logger.warning(f"获取镜像失败,错误:{result}")
            else:
                logger.debug(f"获取镜像成功,结果:{result}")
                _results.append(result)
        _results = sorted(_results, key=lambda r: r["elapsed_time"])
        return [result["url"] for result in _results]
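    # Usage sketch for `get_fastest_mirror` (illustrative; the mirror URLs are
    # placeholders). The result is sorted by HEAD-request latency, so it can be
    # fed straight back into the fallback logic of `get` or `download_file`:
    #
    #     mirrors = await AsyncHttpx.get_fastest_mirror(
    #         ["https://raw.githubusercontent.com", "https://mirror.example.com"]
    #     )
    #     if mirrors:
    #         resp = await AsyncHttpx.get(mirrors)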


class AsyncPlaywright:
    @classmethod
    @asynccontextmanager
    async def new_page(cls, **kwargs) -> AsyncGenerator[Page, None]:
        """Open a new page in a fresh browser context.

        Args:
            **kwargs: forwarded to browser.new_context (e.g. user_agent)
        """
        browser = await get_browser()
        ctx = await browser.new_context(**kwargs)
        page = await ctx.new_page()
        try:
            yield page
        finally:
            await page.close()
            await ctx.close()
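    # Usage sketch for `new_page` (illustrative). The context manager closes
    # the page and its browser context even if the body raises:
    #
    #     async with AsyncPlaywright.new_page(user_agent="Mozilla/5.0 ...") as page:
    #         await page.goto("https://example.com")
    #         title = await page.title()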
    @classmethod
    async def screenshot(
        cls,
        url: str,
        path: Path | str,
        element: str | list[str],
        *,
        wait_time: int | None = None,
        viewport_size: dict[str, int] | None = None,
        wait_until: (
            Literal["domcontentloaded", "load", "networkidle"] | None
        ) = "networkidle",
        timeout: float | None = None,
        type_: Literal["jpeg", "png"] | None = None,
        user_agent: str | None = None,
        **kwargs,
    ) -> UniMessage | None:
        """Take a screenshot. Only meant for quick, simple captures; for
        anything complex, drive the Page yourself.

        Args:
            url: page url
            path: destination path
            element: a selector, or a chain of selectors resolved in order
            wait_time: per-selector wait timeout in seconds
            viewport_size: viewport size
            wait_until: load state to wait for
            timeout: navigation/screenshot timeout
            type_: image format
        """
        if viewport_size is None:
            viewport_size = {"width": 2560, "height": 1080}
        if isinstance(path, str):
            path = Path(path)
        wait_time = wait_time * 1000 if wait_time else None
        element_list = [element] if isinstance(element, str) else element
        async with cls.new_page(
            viewport=viewport_size,
            user_agent=user_agent,
            **kwargs,
        ) as page:
            await page.goto(url, timeout=timeout, wait_until=wait_until)
            card = page
            for e in element_list:
                if not card:
                    return None
                card = await card.wait_for_selector(e, timeout=wait_time)
            if card:
                await card.screenshot(path=path, timeout=timeout, type=type_)
                return MessageUtils.build_message(path)
            return None
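    # Usage sketch for `screenshot` (illustrative; the url, selectors, and
    # path are placeholders). Selectors are resolved in order, each inside the
    # previous match; the final element is captured, and None is returned if
    # any selector misses:
    #
    #     msg = await AsyncPlaywright.screenshot(
    #         "https://example.com/stats",
    #         Path("shots/stats.png"),
    #         ["#main", ".chart"],
    #         wait_time=10,
    #     )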


class UrlPathNumberNotEqual(Exception):
    pass


class BrowserIsNone(Exception):
    pass