🎨: Add Douyin Web A_Bogus encryption algorithm Support
parent 5fffdfa7f3
commit 5b72b41d3b

16 changed files with 861 additions and 112 deletions
68 README.md

@@ -45,13 +45,38 @@
 ## 🔊 V4 Release Notes

 - If you're interested in working on this project together, add WeChat `Evil0ctal` with the remark "github项目重构". Everyone can learn from each other in the group chat; no ads or anything illegal, purely making friends and exchanging techniques.
-- The `X-Bogus` algorithm used by this project can still call the Douyin and TikTok APIs normally; the `A-Bogus` algorithm will not be open-sourced for the time being.
+- This project uses the `X-Bogus` and `A_Bogus` algorithms to request the Douyin and TikTok Web APIs.
 - Due to Douyin's risk control, after deploying this project please **obtain the Douyin site Cookie in a browser and replace it in config.yaml.**
 - Please read the documentation below before opening an issue; solutions to most problems are covered there.
 - This project is completely free, but please comply with the [Apache-2.0 license](https://github.com/Evil0ctal/Douyin_TikTok_Download_API?tab=Apache-2.0-1-ov-file#readme) when using it.
+- This project has a closed-source branch with more endpoints and services; see the information below for details.
+
+## 🔖TikHub.io API
+
+[TikHub.io](https://beta-web.tikhub.io/en-us/users/signin) is an API platform offering various public data endpoints, including Douyin and TikTok. If you would like to support the development of the [Douyin_TikTok_Download_API](https://github.com/Evil0ctal/Douyin_TikTok_Download_API) project, we strongly recommend choosing [TikHub.io](https://beta-web.tikhub.io/en-us/users/signin).
+
+#### Features:
+
+> 📦 Works out of the box
+
+Skip the tedious setup and develop quickly with the packaged SDK. All API endpoints are written to the OpenAPI specification and come with example parameters.
+
+> 💰 Cost advantage
+
+No preset plans and no monthly usage minimum: all consumption is billed instantly by actual usage, with tiered pricing based on your daily request volume. You can also get free credit by checking in daily in the user dashboard, and that free credit never expires.
+
+> ⚡️ Fast support
+
+We have a large Discord community server where admins and other users respond quickly and help you solve your problem.
+
+> 🎉 Embracing open source
+
+Part of TikHub's source code is open-sourced on GitHub, and TikHub sponsors the authors of some open-source projects.
+
+#### Links:
+
 - Discord: [TikHub Discord](https://discord.com/invite/aMEAS8Xsvz)
-- Free Douyin/TikTok API: [TikHub Beta API](https://beta.tikhub.io/)
+- Register: [TikHub signup](https://beta-web.tikhub.io/en-us/users/signup)
+- API Docs: [TikHub API Docs](https://api.tikhub.io/)

 ## 🖥Demo site: I'm fragile... please don't stress-test (·•᷄ࡇ•᷅ )

@@ -95,21 +95,21 @@

 ```
 ./Douyin_TikTok_Download_API
 ├─app
 │  ├─api
 │  │  ├─endpoints
 │  │  └─models
 │  ├─download
 │  └─web
 │     └─views
 └─crawlers
    ├─douyin
    │  └─web
    ├─hybrid
    ├─tiktok
    │  ├─app
    │  └─web
    └─utils
 ```

 ## ✨Supported features:

@@ -121,6 +146,7 @@

 - Complete API documentation ([Demo](https://api.douyin.wtf/docs))
 - Rich API endpoints:
   - Douyin Web API

 - [x] Video data parsing
 - [x] Fetch a user's profile posts
 - [x] Fetch a user's liked posts

@@ -136,14 +162,15 @@

 - [x] Generate verify_fp
 - [x] Generate s_v_web_id
 - [x] Generate the X-Bogus parameter from an API URL
+- [x] Generate the A_Bogus parameter from an API URL
 - [x] Extract a single user id
 - [x] Extract user ids from a list
 - [x] Extract a single video id
 - [x] Extract video ids from a list
 - [x] Extract live room numbers from a list
 - [x] Extract live room numbers from a list

 - TikTok Web API

 - [x] Video data parsing
 - [x] Fetch a user's profile posts
 - [x] Fetch a user's liked posts

@@ -165,7 +192,6 @@

 - [x] Get a user's unique_id
 - [x] Get unique_ids from a list

 ---

 ## 📦Calling the parsing library (deprecated, needs updating):

@@ -257,7 +283,6 @@ https://www.tiktok.com/@evil0ctal/video/7156033831819037994

 ***See the documentation for more demos...***

 ## ⚠️Preparation before deployment (please read carefully):

 - You need to solve the crawler Cookie risk-control problem yourself, otherwise the endpoints may become unusable.

@@ -267,6 +292,7 @@ https://www.tiktok.com/@evil0ctal/video/7156033831819037994

 - https://github.com/Evil0ctal/Douyin_TikTok_Download_API/blob/30e56e5a7f97f87d60b1045befb1f6db147f8590/crawlers/tiktok/web/config.yaml#L6
 - I turned off the demo site's online download feature; someone downloaded a gigantic video and crashed my server outright. You can right-click and save the video on the web parsing result page...
 - The demo site's Cookie is my own; it is not guaranteed to stay valid and is only there for demonstration. If you deploy yourself, please obtain your own Cookie.
+- Video links returned by the TikTok Web API produce an HTTP 403 error when accessed directly; use this project's `/api/download` endpoint to download TikTok videos. This endpoint has been manually disabled on the demo site, so you need to deploy the project yourself.
 - There is a **video tutorial** you can follow: ***[https://www.bilibili.com/video/BV1vE421j7NR/](https://www.bilibili.com/video/BV1vE421j7NR/)***

 ## 💻Deployment (Option 1: Linux)

@@ -734,6 +734,48 @@ async def generate_x_bogus(request: Request,
         raise HTTPException(status_code=status_code, detail=detail.dict())


+# Generate the A-Bogus parameter from an API URL
+@router.get("/generate_a_bogus",
+            response_model=ResponseModel,
+            summary="使用接口网址生成A-Bogus参数/Generate A-Bogus parameter using API URL")
+async def generate_a_bogus(request: Request,
+                           url: str = Query(
+                               example="https://www.douyin.com/aweme/v1/web/aweme/detail/?device_platform=webapp&aid=6383&channel=channel_pc_web&pc_client_type=1&version_code=190500&version_name=19.5.0&cookie_enabled=true&browser_language=zh-CN&browser_platform=Win32&browser_name=Firefox&browser_online=true&engine_name=Gecko&os_name=Windows&os_version=10&platform=PC&screen_width=1920&screen_height=1080&browser_version=124.0&engine_version=122.0.0.0&cpu_core_num=12&device_memory=8&aweme_id=7345492945006595379"),
+                           user_agent: str = Query(
+                               example="Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.212 Safari/537.36")):
+    """
+    # [中文]
+    ### 用途:
+    - 使用接口网址生成A-Bogus参数
+    ### 参数:
+    - url: 接口网址
+    - user_agent: 用户代理,暂时不支持自定义,直接使用默认值即可。
+
+    # [English]
+    ### Purpose:
+    - Generate A-Bogus parameter using API URL
+    ### Parameters:
+    - url: API URL
+    - user_agent: User agent, temporarily does not support customization, just use the default value.
+
+    # [示例/Example]
+    url = "https://www.douyin.com/aweme/v1/web/aweme/detail/?device_platform=webapp&aid=6383&channel=channel_pc_web&pc_client_type=1&version_code=190500&version_name=19.5.0&cookie_enabled=true&browser_language=zh-CN&browser_platform=Win32&browser_name=Firefox&browser_online=true&engine_name=Gecko&os_name=Windows&os_version=10&platform=PC&screen_width=1920&screen_height=1080&browser_version=124.0&engine_version=122.0.0.0&cpu_core_num=12&device_memory=8&aweme_id=7345492945006595379"
+    user_agent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.212 Safari/537.36"
+    """
+    try:
+        a_bogus = await DouyinWebCrawler.get_a_bogus(url, user_agent)
+        return ResponseModel(code=200,
+                             router=request.url.path,
+                             data=a_bogus)
+    except Exception as e:
+        status_code = 400
+        detail = ErrorResponseModel(code=status_code,
+                                    router=request.url.path,
+                                    params=dict(request.query_params),
+                                    )
+        raise HTTPException(status_code=status_code, detail=detail.dict())
+
+
 # Extract a single user id
 @router.get("/get_sec_user_id",
             response_model=ResponseModel,
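The new endpoint passes the full URL to `DouyinWebCrawler.get_a_bogus`, which signs the URL's query string. As a minimal illustration of the first step, here is a sketch of splitting such a URL into the parameter dict that a signer like `ABogus.get_value` consumes; the helper name `url_to_params` is ours, not part of the project:

```python
from urllib.parse import urlsplit, parse_qsl, urlencode

def url_to_params(url: str) -> dict:
    # Break the query string into an ordered {name: value} dict.
    return dict(parse_qsl(urlsplit(url).query))

url = ("https://www.douyin.com/aweme/v1/web/aweme/detail/"
       "?device_platform=webapp&aid=6383&aweme_id=7345492945006595379")
params = url_to_params(url)
# urlencode(params) reproduces the query string that gets signed; the
# resulting a_bogus value is then appended as one more query parameter.
query = urlencode(params)
```
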
@@ -19,10 +19,10 @@ with open(config_path, 'r', encoding='utf-8') as file:
     config = yaml.safe_load(file)


-async def fetch_data(url: str):
+async def fetch_data(url: str, headers: dict = None):
     headers = {
         'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36'
-    }
+    } if headers is None else headers.get('headers')
     async with httpx.AsyncClient() as client:
        response = await client.get(url, headers=headers)
        response.raise_for_status()  # Make sure the response was successful
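Note the shape of the new `headers` argument above: callers pass a wrapper dict and the function unwraps its `'headers'` key, falling back to a default User-Agent when nothing is passed. A standalone sketch of that resolution logic (the function name is illustrative, not from the project):

```python
DEFAULT_HEADERS = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
                  '(KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36'
}

def resolve_headers(headers: dict = None) -> dict:
    # None -> default UA; otherwise unwrap the nested 'headers' key,
    # mirroring `} if headers is None else headers.get('headers')` above.
    return DEFAULT_HEADERS if headers is None else headers.get('headers')
```

A caller is therefore expected to send `fetch_data(url, headers={'headers': {...}})`; passing a bare header mapping would silently resolve to `None`.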
@@ -68,7 +68,7 @@ async def download_file_hybrid(request: Request,
         return FileResponse(path=file_path, media_type='video/mp4', filename=file_name)

     # Fetch the video file
-    response = await fetch_data(url)
+    response = await fetch_data(url) if platform == 'douyin' else await fetch_data(url, headers=await HybridCrawler.TikTokWebCrawler.get_tiktok_headers())

     # Save the file
     async with aiofiles.open(file_path, 'wb') as out_file:
@@ -115,6 +115,7 @@ async def download_file_hybrid(request: Request,

     # 异常处理/Exception handling
     except Exception as e:
+        print(e)
         code = 400
         return ErrorResponseModel(code=code, message=str(e), router=request.url.path, params=dict(request.query_params))
@@ -103,7 +103,7 @@ description = f"""
 #### 备注
 - 本项目仅供学习交流使用,不得用于违法用途,否则后果自负。
 - 如果你不想自己部署,可以直接使用我们的在线API服务:[Douyin_TikTok_Download_API](https://douyin.wtf/docs)
-- 如果你需要更稳定以及更多功能的API服务,可以使用付费API服务:[TikHub API](https://beta.tikhub.io/)
+- 如果你需要更稳定以及更多功能的API服务,可以使用付费API服务:[TikHub API](https://api.tikhub.io/)

 ### [English]

@@ -116,7 +116,7 @@ description = f"""
 #### Note
 - This project is for learning and communication only, and shall not be used for illegal purposes, otherwise the consequences shall be borne by yourself.
 - If you do not want to deploy it yourself, you can directly use our online API service: [Douyin_TikTok_Download_API](https://douyin.wtf/docs)
-- If you need a more stable and feature-rich API service, you can use the paid API service: [TikHub API](https://beta.tikhub.io)
+- If you need a more stable and feature-rich API service, you can use the paid API service: [TikHub API](https://api.tikhub.io)
 """

 docs_url = config['API']['Docs_URL']
@@ -30,8 +30,8 @@ API:
   Redoc_URL: /redoc # API documentation URL | API文档URL

   # API Information
-  Version: V4.0.0 # API version | API版本
-  Update_Time: 2024/04/22 # API update time | API更新时间
+  Version: V4.0.2 # API version | API版本
+  Update_Time: 2024/06/14 # API update time | API更新时间
   Environment: Demo # API environment | API环境

   # Download Configuration
559 crawlers/douyin/web/abogus.py Normal file

@@ -0,0 +1,559 @@
+"""
+Original Author:
+This file is from https://github.com/JoeanAmier/TikTokDownloader
+And is licensed under the GNU General Public License v3.0
+If you use this code, please keep this license and the original author information.
+
+Modified by:
+And this file is now a part of the https://github.com/Evil0ctal/Douyin_TikTok_Download_API open-source project.
+This project is licensed under the Apache License 2.0, and the original author information is kept.
+
+Purpose:
+This file is used to generate the `a_bogus` parameter for the Douyin Web API.
+
+Changes Made:
+1. Changed the ua_code to be compatible with the User-Agent string in the current config file at https://github.com/Evil0ctal/Douyin_TikTok_Download_API/blob/main/crawlers/douyin/web/config.yaml
+"""
+
+from random import randint
+from random import random
+from re import compile
+from time import time
+from urllib.parse import urlencode, quote
+
+
+class ABogus:
+    __filter = compile(r'%([0-9A-F]{2})')
+    __arguments = [0, 1, 14]
+    __end_string = "cus"
+    __version = [1, 0, 1, 5]
+    # Browser environment fingerprint: the char codes of
+    # "1536|742|1536|864|0|0|0|0|1536|864|1536|864|1536|742|24|24|Win32"
+    __env = [49, 53, 51, 54, 124, 55, 52, 50, 124, 49, 53, 51, 54, 124, 56,
+             54, 52, 124, 48, 124, 48, 124, 48, 124, 48, 124, 49, 53, 51, 54,
+             124, 56, 54, 52, 124, 49, 53, 51, 54, 124, 56, 54, 52, 124, 49,
+             53, 51, 54, 124, 55, 52, 50, 124, 50, 52, 124, 50, 52, 124, 87,
+             105, 110, 51, 50]
+    # SM3 initial hash values (IV)
+    __reg = [1937774191, 1226093241, 388252375, 3666478592,
+             2842636476, 372324522, 3817729613, 2969243214]
+    __str = {
+        "s0": "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/=",
+        "s1": "Dkdpgh4ZKsQB80/Mfvw36XI1R25+WUAlEi7NLboqYTOPuzmFjJnryx9HVGcaStCe=",
+        "s2": "Dkdpgh4ZKsQB80/Mfvw36XI1R25-WUAlEi7NLboqYTOPuzmFjJnryx9HVGcaStCe=",
+        "s3": "ckdp1h4ZKsUB80/Mfvw36XIgR25+WQAlEi7NLboqYTOPuzmFjJnryx9HVGDaStCe",
+        "s4": "Dkdpgh2ZmsQB80/MfvV36XI1R45-WUAlEixNLwoqYTOPuzKFjJnry79HbGcaStCe"}
+
+    def __init__(self):
+        self.chunk = []
+        self.size = 0
+        self.reg = self.__reg[:]
+
+    @classmethod
+    def list_1(cls, random_num=None, a=170, b=85, c=45) -> list:
+        return cls.random_list(random_num, a, b, 1, 2, 5, c & a)
+
+    @classmethod
+    def list_2(cls, random_num=None, a=170, b=85) -> list:
+        return cls.random_list(random_num, a, b, 1, 0, 0, 0)
+
+    @classmethod
+    def list_3(cls, random_num=None, a=170, b=85) -> list:
+        return cls.random_list(random_num, a, b, 1, 0, 5, 0)
+
+    @staticmethod
+    def random_list(a: float = None, b=170, c=85, d=0, e=0, f=0, g=0) -> list:
+        r = a or (random() * 10000)
+        v = [r, int(r) & 255, int(r) >> 8]
+        v.append(v[1] & b | d)
+        v.append(v[1] & c | e)
+        v.append(v[2] & b | f)
+        v.append(v[2] & c | g)
+        return v[-4:]
+
+    @staticmethod
+    def from_char_code(*args):
+        return "".join(chr(code) for code in args)
+
+    @classmethod
+    def generate_string_1(cls, random_num_1=None, random_num_2=None, random_num_3=None):
+        return cls.from_char_code(*cls.list_1(random_num_1)) + \
+            cls.from_char_code(*cls.list_2(random_num_2)) + \
+            cls.from_char_code(*cls.list_3(random_num_3))
+
+    def generate_string_2(self, url_params: str, user_agent: str, start_time=0, end_time=0) -> str:
+        a = self.generate_string_2_list(url_params, user_agent, start_time, end_time)
+        e = self.end_check_num(a)
+        a.extend(self.__env)
+        a.append(e)
+        return self.rc4_encrypt(self.from_char_code(*a), "y")
+
+    def generate_string_2_list(self, url_params: str, user_agent: str, start_time=0, end_time=0) -> list:
+        start_time = start_time or int(time() * 1000)
+        end_time = end_time or (start_time + randint(4, 8))
+        params_array = self.sum(self.sum(url_params))
+        # TODO: write a function to generate ua_code (2024-06-13 17:13:08)
+        # Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.212 Safari/537.36
+        ua_code = [76, 98, 15, 131, 97, 245, 224, 133, 122, 199, 241, 166, 79, 34, 90, 191,
+                   128, 126, 122, 98, 66, 11, 14, 40, 49, 110, 110, 173, 67, 96, 138, 252]
+        return self.list_4(
+            (end_time >> 24) & 255,
+            params_array[21],
+            ua_code[23],
+            (end_time >> 16) & 255,
+            params_array[22],
+            ua_code[24],
+            (end_time >> 8) & 255,
+            (end_time >> 0) & 255,
+            (start_time >> 24) & 255,
+            (start_time >> 16) & 255,
+            (start_time >> 8) & 255,
+            (start_time >> 0) & 255,
+        )
+
+    @staticmethod
+    def reg_to_array(a):
+        o = [0] * 32
+        for i in range(8):
+            c = a[i]
+            o[4 * i + 3] = (255 & c)
+            c >>= 8
+            o[4 * i + 2] = (255 & c)
+            c >>= 8
+            o[4 * i + 1] = (255 & c)
+            c >>= 8
+            o[4 * i] = (255 & c)
+        return o
+
+    def compress(self, a):
+        f = self.generate_f(a)
+        i = self.reg[:]
+        for o in range(64):
+            c = self.de(i[0], 12) + i[4] + self.de(self.pe(o), o)
+            c = (c & 0xFFFFFFFF)
+            c = self.de(c, 7)
+            s = (c ^ self.de(i[0], 12)) & 0xFFFFFFFF
+
+            u = self.he(o, i[0], i[1], i[2])
+            u = (u + i[3] + s + f[o + 68]) & 0xFFFFFFFF
+
+            b = self.ve(o, i[4], i[5], i[6])
+            b = (b + i[7] + c + f[o]) & 0xFFFFFFFF
+
+            i[3] = i[2]
+            i[2] = self.de(i[1], 9)
+            i[1] = i[0]
+            i[0] = u
+
+            i[7] = i[6]
+            i[6] = self.de(i[5], 19)
+            i[5] = i[4]
+            i[4] = (b ^ self.de(b, 9) ^ self.de(b, 17)) & 0xFFFFFFFF
+
+        for l in range(8):
+            self.reg[l] = (self.reg[l] ^ i[l]) & 0xFFFFFFFF
+
+    @classmethod
+    def generate_f(cls, e):
+        # Message expansion for the compression function
+        r = [0] * 132
+        for t in range(16):
+            r[t] = (e[4 * t] << 24) | (e[4 * t + 1] << 16) | (e[4 * t + 2] << 8) | e[4 * t + 3]
+            r[t] &= 0xFFFFFFFF
+        for n in range(16, 68):
+            a = r[n - 16] ^ r[n - 9] ^ cls.de(r[n - 3], 15)
+            a = a ^ cls.de(a, 15) ^ cls.de(a, 23)
+            r[n] = (a ^ cls.de(r[n - 13], 7) ^ r[n - 6]) & 0xFFFFFFFF
+        for n in range(68, 132):
+            r[n] = (r[n - 68] ^ r[n - 64]) & 0xFFFFFFFF
+        return r
+
+    @staticmethod
+    def pad_array(arr, length=60):
+        while len(arr) < length:
+            arr.append(0)
+        return arr
+
+    def fill(self, length=60):
+        size = 8 * self.size
+        self.chunk.append(128)
+        self.chunk = self.pad_array(self.chunk, length)
+        for i in range(4):
+            self.chunk.append((size >> 8 * (3 - i)) & 255)
+
+    @staticmethod
+    def list_4(a: int, b: int, c: int, d: int, e: int, f: int,
+               g: int, h: int, i: int, j: int, k: int, m: int) -> list:
+        return [44, a, 0, 0, 0, 0, 24, b, 58, 0, c, d, 0, 24, 97, 1, 0, 239,
+                e, 51, f, g, 0, 0, 0, 0, h, 0, 0, 14, i, j, 0, k, m, 3, 399,
+                1, 399, 1, 64, 0, 0, 0]
+
+    @staticmethod
+    def end_check_num(a: list):
+        # XOR checksum over the whole list
+        r = 0
+        for i in a:
+            r ^= i
+        return r
+
+    @classmethod
+    def decode_string(cls, url_string):
+        return cls.__filter.sub(cls.replace_func, url_string)
+
+    @staticmethod
+    def replace_func(match):
+        return chr(int(match.group(1), 16))
+
+    @staticmethod
+    def de(e, r):
+        # 32-bit rotate left
+        r %= 32
+        return ((e << r) & 0xFFFFFFFF) | (e >> (32 - r))
+
+    @staticmethod
+    def pe(e):
+        # Round constants 0x79CC4519 / 0x7A879D8A
+        return 2043430169 if 0 <= e < 16 else 2055708042
+
+    @staticmethod
+    def he(e, r, t, n):
+        if 0 <= e < 16:
+            return (r ^ t ^ n) & 0xFFFFFFFF
+        elif 16 <= e < 64:
+            return (r & t | r & n | t & n) & 0xFFFFFFFF
+        raise ValueError
+
+    @staticmethod
+    def ve(e, r, t, n):
+        if 0 <= e < 16:
+            return (r ^ t ^ n) & 0xFFFFFFFF
+        elif 16 <= e < 64:
+            return (r & t | ~r & n) & 0xFFFFFFFF
+        raise ValueError
+
+    @staticmethod
+    def convert_to_char_code(a):
+        return [ord(i) for i in a]
+
+    @staticmethod
+    def split_array(arr, chunk_size=64):
+        result = []
+        for i in range(0, len(arr), chunk_size):
+            result.append(arr[i:i + chunk_size])
+        return result
+
+    @staticmethod
+    def char_code_at(s):
+        return [ord(char) for char in s]
+
+    def write(self, e):
+        if isinstance(e, str):
+            e = self.decode_string(e + self.__end_string)
+            e = self.char_code_at(e)
+        self.size = len(e)
+        if len(e) <= 64:
+            self.chunk = e
+        else:
+            chunks = self.split_array(e, 64)
+            for i in chunks[:-1]:
+                self.compress(i)
+            self.chunk = chunks[-1]
+
+    def reset(self):
+        self.chunk = []
+        self.size = 0
+        self.reg = self.__reg[:]
+
+    def sum(self, e, length=60):
+        self.reset()
+        self.write(e)
+        self.fill(length)
+        self.compress(self.chunk)
+        a = self.reg_to_array(self.reg)
+        self.reset()
+        return a
+
+    @classmethod
+    def generate_result_unit(cls, n, s):
+        r = ""
+        for i, j in zip(range(18, -1, -6), (16515072, 258048, 4032, 63)):
+            r += cls.__str[s][(n & j) >> i]
+        return r
+
+    @classmethod
+    def generate_result_end(cls, s, e="s4"):
+        r = ""
+        b = ord(s[120]) << 16
+        r += cls.__str[e][(b & 16515072) >> 18]
+        r += cls.__str[e][(b & 258048) >> 12]
+        r += "=="
+        return r
+
+    @classmethod
+    def generate_result(cls, s, n, e="s4"):
+        r = ""
+        for i in range(n):
+            b = (ord(s[i * 3]) << 16) | (ord(s[i * 3 + 1]) << 8) | ord(s[i * 3 + 2])
+            r += cls.generate_result_unit(b, e)
+        return r
+
+    @classmethod
+    def generate_args_code(cls):
+        a = []
+        for j in range(24, -1, -8):
+            a.append(cls.__arguments[0] >> j)
+        a.append(cls.__arguments[1] / 256)
+        a.append(cls.__arguments[1] % 256)
+        a.append(cls.__arguments[1] >> 24)
+        a.append(cls.__arguments[1] >> 16)
+        for j in range(24, -1, -8):
+            a.append(cls.__arguments[2] >> j)
+        return [int(i) & 255 for i in a]
+
+    @staticmethod
+    def rc4_encrypt(plaintext, key):
+        s = list(range(256))
+        j = 0
+
+        # Key Scheduling Algorithm (KSA)
+        for i in range(256):
+            j = (j + s[i] + ord(key[i % len(key)])) % 256
+            s[i], s[j] = s[j], s[i]
+
+        i = 0
+        j = 0
+        cipher = []
+
+        # Pseudo-Random Generation Algorithm (PRGA)
+        for k in range(len(plaintext)):
+            i = (i + 1) % 256
+            j = (j + s[i]) % 256
+            s[i], s[j] = s[j], s[i]
+            t = (s[i] + s[j]) % 256
+            cipher.append(chr(s[t] ^ ord(plaintext[k])))
+
+        return ''.join(cipher)
+
+    def get_value(self,
+                  url_params: dict,
+                  user_agent: str,
+                  start_time=0,
+                  end_time=0,
+                  random_num_1=None,
+                  random_num_2=None,
+                  random_num_3=None,
+                  ) -> str:
+        string_1 = self.generate_string_1(random_num_1, random_num_2, random_num_3)
+        string_2 = self.generate_string_2(urlencode(url_params), user_agent, start_time, end_time)
+        string = string_1 + string_2
+        return self.generate_result(string, 40, "s4") + self.generate_result_end(string, "s4")
+
+
+if __name__ == "__main__":
+    bogus = ABogus()
+    USERAGENT = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.212 Safari/537.36"
+    url_str = "https://www.douyin.com/aweme/v1/web/aweme/detail/?device_platform=webapp&aid=6383&channel=channel_pc_web&pc_client_type=1&version_code=190500&version_name=19.5.0&cookie_enabled=true&browser_language=zh-CN&browser_platform=Win32&browser_name=Firefox&browser_online=true&engine_name=Gecko&os_name=Windows&os_version=10&platform=PC&screen_width=1920&screen_height=1080&browser_version=124.0&engine_version=122.0.0.0&cpu_core_num=12&device_memory=8&aweme_id=7345492945006595379"
+    # Convert the URL parameters into a dict
+    url_params = dict([param.split("=") for param in url_str.split("?")[1].split("&")])
+    print(f"URL params: {url_params}")
+    a_bogus = bogus.get_value(url_params, USERAGENT)
+    # URL-encode a_bogus
+    a_bogus = quote(a_bogus, safe='')
+    print(a_bogus)
+    print(USERAGENT)
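The `rc4_encrypt` helper in the new file is plain RC4 applied to Python code points. Since RC4 is a symmetric stream cipher, running the same function twice with the same key returns the original input, which makes for a quick sanity check; a standalone sketch mirroring the diff's implementation:

```python
def rc4(text: str, key: str) -> str:
    # Key Scheduling Algorithm (KSA): permute the state with the key.
    s = list(range(256))
    j = 0
    for i in range(256):
        j = (j + s[i] + ord(key[i % len(key)])) % 256
        s[i], s[j] = s[j], s[i]
    # Pseudo-Random Generation Algorithm (PRGA): XOR each code point
    # with the next keystream byte.
    i = j = 0
    out = []
    for ch in text:
        i = (i + 1) % 256
        j = (j + s[i]) % 256
        s[i], s[j] = s[j], s[i]
        out.append(chr(s[(s[i] + s[j]) % 256] ^ ord(ch)))
    return "".join(out)

round_trip = rc4(rc4("a_bogus", "y"), "y")
```

In the class above, `generate_string_2` encrypts its byte list with the fixed key `"y"` before the result is encoded with the custom `s4` alphabet in `generate_result`.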
@@ -4,7 +4,7 @@ TokenManager:
   Accept-Language: zh-CN,zh;q=0.8,zh-TW;q=0.7,zh-HK;q=0.5,en-US;q=0.3,en;q=0.2
   User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.212 Safari/537.36
   Referer: https://www.douyin.com/
-  Cookie: odin_tt=deb76f54241001639f1ebbb3bbdd3637c52604632821dea7f6413b1d0527957d;passport_fe_beating_status=false;sid_guard=c7845c8f01865cc93dcee7b32f8e64a3%7C1715033646%7C21600%7CTue%2C+07-May-2024+04%3A14%3A06+GMT;uid_tt=3a85f4bd9ba5573dcf39917c95135faa;uid_tt_ss=3a85f4bd9ba5573dcf39917c95135faa;sid_tt=c7845c8f01865cc93dcee7b32f8e64a3;sessionid=c7845c8f01865cc93dcee7b32f8e64a3;sessionid_ss=c7845c8f01865cc93dcee7b32f8e64a3;sid_ucp_v1=1.0.0-KDVlNDc1Y2VjOTU3NzFhM2E1M2UyMWExMmQ2OTJhYjNhYzk3YzQ3MGQKCBCurOWxBhgNGgJsZiIgYzc4NDVjOGYwMTg2NWNjOTNkY2VlN2IzMmY4ZTY0YTM;ssid_ucp_v1=1.0.0-KDVlNDc1Y2VjOTU3NzFhM2E1M2UyMWExMmQ2OTJhYjNhYzk3YzQ3MGQKCBCurOWxBhgNGgJsZiIgYzc4NDVjOGYwMTg2NWNjOTNkY2VlN2IzMmY4ZTY0YTM;passport_assist_user=; ttwid=1%7CbfT5_gVNmSYDxhSIwlPZJhBGSdN6dx98CLMd336o8Cs%7C1715033645%7Ceefdce4479938326bd878311d974fe92c6a0d014b89345b3687ead20e6e68b53
+  Cookie: __ac_nonce=0666b92b000a2c224ac28; __ac_signature=_02B4Z6wo00f01cJo1cwAAIDC-hz88a728VnCWdFAABbzbc; ttwid=1%7C3mHLmtqu19mj4mwynGHoMV69QN2dnPid7GkoF6qMGxg%7C1718325937%7C1175da4da9c5aedc0f298981771e3ceb96bb26b590d93d0c23eaf0bb5ecd2d25; douyin.com; device_web_cpu_core=16; device_web_memory_size=-1; architecture=amd64; IsDouyinActive=true; home_can_add_dy_2_desktop=%220%22; dy_swidth=1835; dy_sheight=1147; stream_recommend_feed_params=%22%7B%5C%22cookie_enabled%5C%22%3Atrue%2C%5C%22screen_width%5C%22%3A1835%2C%5C%22screen_height%5C%22%3A1147%2C%5C%22browser_online%5C%22%3Atrue%2C%5C%22cpu_core_num%5C%22%3A16%2C%5C%22device_memory%5C%22%3A0%2C%5C%22downlink%5C%22%3A%5C%22%5C%22%2C%5C%22effective_type%5C%22%3A%5C%22%5C%22%2C%5C%22round_trip_time%5C%22%3A0%7D%22; strategyABtestKey=%221718325939.224%22; volume_info=%7B%22isUserMute%22%3Afalse%2C%22isMute%22%3Atrue%2C%22volume%22%3A0.5%7D; stream_player_status_params=%22%7B%5C%22is_auto_play%5C%22%3A0%2C%5C%22is_full_screen%5C%22%3A0%2C%5C%22is_full_webscreen%5C%22%3A0%2C%5C%22is_mute%5C%22%3A1%2C%5C%22is_speed%5C%22%3A1%2C%5C%22is_visible%5C%22%3A1%7D%22; xgplayer_user_id=778628299652; csrf_session_id=120d8aacffb06addd01cb40859003c8e; passport_csrf_token=6f9c9a1bc411c0e6b5c8e5bee6622f91; passport_csrf_token_default=6f9c9a1bc411c0e6b5c8e5bee6622f91; s_v_web_id=verify_lxdywd34_SU6sqPg8_fjkN_4ldR_BMvz_wvgDZPXkm5fY; msToken=y09BW1cI9bHiuOMAYN0mqoVkihUmHlKs_YaKQdTxtBCekbSed8UidXPK74QjPNgszAmYDSKy5aF1ns1f3L5GazwXUISTHgj2x9Bne9p2; FORCE_LOGIN=%7B%22videoConsumedRemainSeconds%22%3A180%7D; xg_device_score=Infinity; bd_ticket_guard_client_data=eyJiZC10aWNrZXQtZ3VhcmQtdmVyc2lvbiI6MiwiYmQtdGlja2V0LWd1YXJkLWl0ZXJhdGlvbi12ZXJzaW9uIjoxLCJiZC10aWNrZXQtZ3VhcmQtcmVlLXB1YmxpYy1rZXkiOiJCTEhjWkJWemp2MUZRbXY2ZHY5dmtGcVN2eHlqa2ZVZU1laXVtaTRzblh5T2VNSHdhbzNWS1pialYxRHN3VjlLYW9iVk1ROEJDMjQvOVRueHhTY0J1Z0k9IiwiYmQtdGlja2V0LWd1YXJkLXdlYi12ZXJzaW9uIjoxfQ%3D%3D; bd_ticket_guard_client_web_domain=2

 proxies:
   http:
@@ -31,27 +31,25 @@
 # - https://github.com/Johnserf-Seed
 #
 # ==============================================================================
-import re
+import asyncio
 import json
+import os
+import random
+import re
 import time
+import urllib
+from pathlib import Path
+from typing import Union
+from urllib.parse import urlencode, quote
+
+import execjs
 import httpx
 import qrcode
-import random
-import asyncio
 import yaml
 
-from typing import Union
-from pathlib import Path
-
-from crawlers.utils.logger import logger
-from crawlers.utils.utils import (
-    gen_random_str,
-    get_timestamp,
-    extract_valid_urls,
-    split_filename,
-)
+from crawlers.douyin.web.xbogus import XBogus as XB
+from crawlers.douyin.web.abogus import ABogus as AB
+
 from crawlers.utils.api_exceptions import (
     APIError,
     APIConnectionError,
@@ -60,11 +58,13 @@ from crawlers.utils.api_exceptions import (
     APIUnauthorizedError,
     APINotFoundError,
 )
-from crawlers.douyin.web.xbogus import XBogus as XB
-from urllib.parse import quote
-import os
+from crawlers.utils.logger import logger
+from crawlers.utils.utils import (
+    gen_random_str,
+    get_timestamp,
+    extract_valid_urls,
+    split_filename,
+)
 
 # Configuration file path
 # Read the configuration file
@@ -234,6 +234,8 @@ class VerifyFpManager:
 
 
 class BogusManager:
 
+    # Generate the X-Bogus parameter from an endpoint string
     @classmethod
     def xb_str_2_endpoint(cls, endpoint: str, user_agent: str) -> str:
         try:
@@ -243,6 +245,7 @@ class BogusManager:
 
         return final_endpoint[0]
 
+    # Generate the X-Bogus parameter from a params dict
     @classmethod
     def xb_model_2_endpoint(cls, base_endpoint: str, params: dict, user_agent: str) -> str:
         if not isinstance(params, dict):
@@ -262,6 +265,44 @@ class BogusManager:
 
         return final_endpoint
 
+    # Generate the A-Bogus parameter from an endpoint string (JS version)
+    # TODO: Not fully tested yet; do not merge into the main branch for now.
+    @classmethod
+    def ab_str_2_endpoint_js_ver(cls, endpoint: str, user_agent: str) -> str:
+        try:
+            # Extract the request query parameters
+            endpoint_query_params = urllib.parse.urlparse(endpoint).query
+            # Locate the A-Bogus JS file
+            js_path = os.path.dirname(os.path.abspath(__file__))
+            a_bogus_js_path = os.path.join(js_path, 'a_bogus.js')
+            with open(a_bogus_js_path, 'r', encoding='utf-8') as file:
+                js_code = file.read()
+            # A Node.js runtime is required here:
+            # - install Node.js
+            # - install the execjs library
+            # - install the NPM dependency: npm install jsdom
+            node_runtime = execjs.get('Node')
+            context = node_runtime.compile(js_code)
+            arg = [0, 1, 0, endpoint_query_params, "", user_agent]
+            a_bogus = quote(context.call('get_a_bogus', arg), safe='')
+            return a_bogus
+        except Exception as e:
+            raise RuntimeError("Failed to generate A-Bogus: {0}".format(e))
+
+    # Generate the A-Bogus parameter from a params dict; thanks to @JoeanAmier for the pure-Python implementation.
+    @classmethod
+    def ab_model_2_endpoint(cls, params: dict, user_agent: str) -> str:
+        if not isinstance(params, dict):
+            raise TypeError("params must be a dict")
+
+        try:
+            ab_value = AB().get_value(params, user_agent)
+        except Exception as e:
+            raise RuntimeError("Failed to generate A-Bogus: {0}".format(e))
+
+        return quote(ab_value, safe='')
+
 
 class SecUserIdFetcher:
     # Precompiled regular expressions
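Both new A-Bogus helpers return the signature through `quote(..., safe='')`, which percent-encodes every reserved character; this matters because the Base64-style signature can contain `+`, `/`, and `=`, all of which would be misread if embedded raw in a query string. A minimal stdlib sketch of that final encoding step (the sample value is made up, not a real signature):

```python
from urllib.parse import quote

# Hypothetical raw signature value; real A-Bogus output is much longer.
raw_signature = "a+b/c="

# safe='' forces '+', '/', and '=' to be percent-encoded as well,
# so the value can be appended directly to a query string.
encoded = quote(raw_signature, safe='')
print(encoded)  # a%2Bb%2Fc%3D
```

With the default `safe='/'`, the slash would survive unencoded, which is why the code passes `safe=''` explicitly.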
@@ -34,16 +34,23 @@
 
 
 import asyncio  # Async I/O
+import os  # OS operations
 import time  # Time operations
+from urllib.parse import urlencode, quote  # URL encoding
 
 import httpx
 import yaml  # Config file
-import os  # OS operations
 
 # Base crawler client and Douyin API endpoints
 from crawlers.base_crawler import BaseCrawler
 from crawlers.douyin.web.endpoints import DouyinAPIEndpoints
+# Douyin API request models
+from crawlers.douyin.web.models import (
+    BaseRequestModel, LiveRoomRanking, PostComments,
+    PostCommentsReply, PostDetail,
+    UserProfile, UserCollection, UserLike, UserLive,
+    UserLive2, UserMix, UserPost
+)
 # Douyin utility classes
 from crawlers.douyin.web.utils import (AwemeIdFetcher,  # Aweme ID fetching
                                        BogusManager,  # XBogus management
@@ -54,14 +61,6 @@ from crawlers.douyin.web.utils import (AwemeIdFetcher,  # Aweme ID fetching
                                        extract_valid_urls  # URL extraction
                                        )
 
-# Douyin API request models
-from crawlers.douyin.web.models import (
-    BaseRequestModel, LiveRoomRanking, PostComments,
-    PostCommentsReply, PostDanmaku, PostDetail,
-    UserProfile, UserCollection, UserLike, UserLive,
-    UserLive2, UserMix, UserPost
-)
-
 # Configuration file path
 path = os.path.abspath(os.path.dirname(__file__))
@@ -98,9 +97,17 @@ class DouyinWebCrawler:
             # Build a PostDetail BaseModel for the post
             params = PostDetail(aweme_id=aweme_id)
-            # Build a post-detail endpoint with the encrypted parameter
-            endpoint = BogusManager.xb_model_2_endpoint(
-                DouyinAPIEndpoints.POST_DETAIL, params.dict(), kwargs["headers"]["User-Agent"]
-            )
+            # 2024-06-12 22:41:44: XBogus encryption no longer works, so the XBogus
+            # parameter is no longer used; switched to the a_bogus parameter.
+            # endpoint = BogusManager.xb_model_2_endpoint(
+            #     DouyinAPIEndpoints.POST_DETAIL, params.dict(), kwargs["headers"]["User-Agent"]
+            # )
+
+            # Build a post-detail endpoint with the a_bogus encrypted parameter
+            params_dict = params.dict()
+            params_dict["msToken"] = ''
+            a_bogus = BogusManager.ab_model_2_endpoint(params_dict, kwargs["headers"]["User-Agent"])
+            endpoint = f"{DouyinAPIEndpoints.POST_DETAIL}?{urlencode(params_dict)}&a_bogus={a_bogus}"
+
             response = await crawler.fetch_get_json(endpoint)
             return response
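The new request path blanks `msToken` in the model parameters and appends the signature as a separate `a_bogus` query parameter rather than signing the whole URL. A self-contained sketch of that assembly step (the endpoint URL, parameter values, and signature below are placeholders, not real Douyin values):

```python
from urllib.parse import urlencode, parse_qs, urlparse

# Placeholder stand-ins; the real code takes these from PostDetail and ABogus.
post_detail_url = "https://www.douyin.com/aweme/v1/web/aweme/detail/"
params_dict = {"aweme_id": "7372484719365098803", "device_platform": "webapp"}
a_bogus = "FAKE-SIGNATURE"

# Blank msToken, urlencode the params, then append the signature last.
params_dict["msToken"] = ""
endpoint = f"{post_detail_url}?{urlencode(params_dict)}&a_bogus={a_bogus}"

# Round-trip to confirm the query shape; keep_blank_values preserves msToken=.
query = parse_qs(urlparse(endpoint).query, keep_blank_values=True)
print(query["a_bogus"])  # ['FAKE-SIGNATURE']
```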
@@ -239,19 +246,6 @@ class DouyinWebCrawler:
 
     "-------------------------------------------------------utils接口列表-------------------------------------------------------"
 
-    # Fetch a guest Cookie for Douyin Web
-    async def fetch_douyin_web_guest_cookie(self, user_agent: str):
-        headers = {
-            'User-Agent': user_agent,
-            'Cookie': ''
-        }
-        async with httpx.AsyncClient() as client:
-            domain = "https://beta.tikhub.io"
-            uri = "/api/v1/douyin/web/fetch_douyin_web_guest_cookie"
-            url = f"{domain}{uri}?user_agent={user_agent}"
-            response = await client.get(url, headers=headers)
-            return response.json().get("data")
-
     # Generate a real msToken
     async def gen_real_msToken(self, ):
         result = {
@@ -290,6 +284,21 @@ class DouyinWebCrawler:
         }
         return result
 
+    # Generate the a_bogus parameter from an endpoint URL
+    async def get_a_bogus(self, url: str, user_agent: str):
+        endpoint = url.split("?")[0]
+        # Convert the URL query parameters into a dict
+        params = dict([i.split("=") for i in url.split("?")[1].split("&")])
+        # Blank out the msToken parameter in the URL
+        params["msToken"] = ""
+        a_bogus = BogusManager.ab_model_2_endpoint(params, user_agent)
+        result = {
+            "url": f"{endpoint}?{urlencode(params)}&a_bogus={a_bogus}",
+            "a_bogus": a_bogus,
+            "user_agent": user_agent
+        }
+        return result
+
     # Extract a single user's sec_user_id
     async def get_sec_user_id(self, url: str):
         return await SecUserIdFetcher.get_sec_user_id(url)
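One caveat in `get_a_bogus`: it splits each query pair with a bare `split("=")`, which raises a `ValueError` when a value itself contains `=` (common for Base64-encoded values such as `msToken`). `urllib.parse.parse_qsl` is a more robust stdlib alternative, since it splits each pair only on the first `=`; a small comparison with a made-up URL:

```python
from urllib.parse import parse_qsl

# Hypothetical URL whose msToken value ends in Base64 padding '='.
url = "https://example.com/api?aweme_id=123&msToken=abc=="
query_string = url.split("?")[1]

# parse_qsl splits each pair on the first '=' only,
# so values containing '=' survive intact.
params = dict(parse_qsl(query_string, keep_blank_values=True))
print(params["msToken"])  # abc==

# The manual approach in get_a_bogus would raise ValueError here,
# because "msToken=abc==".split("=") yields four elements:
# dict([i.split("=") for i in query_string.split("&")])
```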
@@ -1,3 +1,36 @@
+# ==============================================================================
+# Copyright (C) 2021 Evil0ctal
+#
+# This file is part of the Douyin_TikTok_Download_API project.
+#
+# This project is licensed under the Apache License 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at:
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ==============================================================================
+#                  __
+#                 /> フ
+#                 | _ _ l
+#                 /` ミ_xノ
+#              /       |       Feed me Stars ⭐ ️
+#             /   ヽ   ノ
+#             │   | | |
+#         / ̄|    | | |
+#         | ( ̄ヽ__ヽ_)__)
+#         \二つ
+# ==============================================================================
+#
+# Contributor Link:
+#   - https://github.com/Evil0ctal
+#
+# ==============================================================================
 
 import asyncio
 
 from crawlers.douyin.web.web_crawler import DouyinWebCrawler  # Import the Douyin Web crawler
@@ -24,9 +57,10 @@ class HybridCrawler:
         elif "tiktok" in url:
             platform = "tiktok"
             aweme_id = await self.TikTokWebCrawler.get_aweme_id(url)
-            data = await self.TikTokAPPCrawler.fetch_one_video(aweme_id)
-            # $.aweme_type
-            aweme_type = data.get("aweme_type")
+            data = await self.TikTokWebCrawler.fetch_one_video(aweme_id)
+            data = data.get("itemInfo").get("itemStruct")
+            # $.imagePost exists if aweme_type is photo
+            aweme_type = 150 if data.get("imagePost") else 1
         else:
             raise ValueError("hybrid_parsing_single_video: Cannot judge the video source from the URL.")
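The TikTok Web API response does not carry `aweme_type` directly the way the APP API did, so the new branch infers it from the presence of `imagePost` inside `itemStruct` (150 for photo posts, 1 for videos). A toy illustration with stub payloads (the field values are fabricated; only the field names follow the diff):

```python
def infer_aweme_type(item_struct: dict) -> int:
    # Photo posts expose an imagePost object; plain videos do not.
    return 150 if item_struct.get("imagePost") else 1

video_item = {"id": "1", "video": {"playAddr": "https://example.com/v.mp4"}}
photo_item = {"id": "2", "imagePost": {"images": []}}

print(infer_aweme_type(video_item), infer_aweme_type(photo_item))  # 1 150
```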
@@ -124,14 +158,14 @@ class HybridCrawler:
             # TikTok video data processing
             if url_type == 'video':
                 # Store information in a dictionary
-                wm_video = data['video']['download_addr']['url_list'][0]
+                wm_video = data['video']['downloadAddr']
                 api_data = {
                     'video_data':
                         {
                             'wm_video_url': wm_video,
                             'wm_video_url_HQ': wm_video,
-                            'nwm_video_url': data['video']['play_addr']['url_list'][0],
-                            'nwm_video_url_HQ': data['video']['bit_rate'][0]['play_addr']['url_list'][0]
+                            'nwm_video_url': data['video']['playAddr'],
+                            'nwm_video_url_HQ': data['video']['bitrateInfo'][0]['PlayAddr']['UrlList'][0]
                         }
                 }
             # TikTok image data processing
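The switch from the APP API's snake_case fields (`play_addr.url_list`) to the Web API's camelCase fields is easy to get wrong, since the casing is inconsistent even within one payload (`playAddr` but `PlayAddr.UrlList` inside `bitrateInfo`). A stub payload showing the new lookup paths; the field names follow the diff, the URLs are fabricated:

```python
# Minimal stand-in for itemInfo.itemStruct from the TikTok Web API.
item_struct = {
    "video": {
        "downloadAddr": "https://example.com/wm.mp4",
        "playAddr": "https://example.com/nwm.mp4",
        "bitrateInfo": [
            {"PlayAddr": {"UrlList": ["https://example.com/nwm_hq.mp4"]}}
        ],
    }
}

wm_video = item_struct["video"]["downloadAddr"]
nwm_video = item_struct["video"]["playAddr"]
nwm_video_hq = item_struct["video"]["bitrateInfo"][0]["PlayAddr"]["UrlList"][0]
print(nwm_video_hq)  # https://example.com/nwm_hq.mp4
```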
@@ -140,9 +174,9 @@ class HybridCrawler:
                 # No-watermark image list
                 no_watermark_image_list = []
                 # Watermarked image list
                 watermark_image_list = []
-                for i in data['image_post_info']['images']:
-                    no_watermark_image_list.append(i['display_image']['url_list'][0])
-                    watermark_image_list.append(i['owner_watermark_image']['url_list'][0])
+                for i in data['imagePost']['images']:
+                    no_watermark_image_list.append(i['imageURL']['urlList'][0])
+                    # watermark_image_list.append(i['owner_watermark_image']['url_list'][0])
                 api_data = {
                     'image_data':
                         {
@@ -158,6 +192,7 @@ class HybridCrawler:
         # Test the hybrid single-post parsing endpoint
         # url = "https://v.douyin.com/L4FJNR3/"
         url = "https://www.tiktok.com/@evil0ctal/video/7156033831819037994"
+        # url = "https://www.tiktok.com/@minecraft/photo/7369296852669205791"
         minimal = True
         result = await self.hybrid_parsing_single_video(url, minimal=minimal)
         print(result)
@@ -48,6 +48,9 @@ from crawlers.tiktok.app.models import (
     BaseRequestModel, FeedVideoDetail
 )
 
+# Marks deprecated methods
+from crawlers.utils.deprecated import deprecated
+
 # Configuration file path
 path = os.path.abspath(os.path.dirname(__file__))
@@ -74,6 +77,7 @@ class TikTokAPPCrawler:
     """-------------------------------------------------------handler接口列表-------------------------------------------------------"""
 
     # Fetch a single post's data
+    @deprecated("TikTok APP fetch_one_video is deprecated and will be removed in a future release. Use the Web API instead.")
     async def fetch_one_video(self, aweme_id: str):
         # Get a live TikTok Cookie
         kwargs = await self.get_tiktok_headers()
@@ -3,7 +3,7 @@ TokenManager:
     headers:
       User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.212 Safari/537.36
       Referer: https://www.tiktok.com/
-      Cookie: tt_csrf_token=YmksDB6a-h4cT2fF7JpORI2O9UBMCWjsntIc; ttwid=1%7C0FVb9fFc-sjDG2UdJwdC1AirqYozQ0xfbAS4N72vN2Y%7C1713886256%7C78a9d83445b82b73ca8d4e0cf024ea6cdf1329b7f3866c826b0a69a300ebce46; ak_bmsc=51B1D53481A3A4E4D0CEFF2BCF622DA2~000000000000000000000000000000~YAAQ7uIsF6c4j+SOAQAAANmUCxfRGVXZ4D9xnO97l1yDw0OWyomnVkNY7IUKaggUja0kQzFQ+WG4xaxBcPt0AN0n26KeHXGGKgHYpHPUMUBHGHQGDtE4RLyy7U+LPbSJCqVaSDiPuzxHht0YUIbWogvrFmBfkP4ohcmjkZxWtEI9qQ4Whaobb2CFHGdKNt0zlVNBjJQ3uYRAvUe12zSBynQB18y6QhE8goneRkCEw9VIeft2pFIwNQ8tkWWEjDt6wHNaqeND7eASg5WLzYskWbTt6bPAOhSNRLJ38HZrOB5QNg+xxN5uuCSYmjMXCl8SkvQr91pInmOng+V898FLLBQtefs95whvbpfE0mKwBk5Cz2TkkHcUJa/IoC0CLmNqoEk3AtKxpw/J; tt_chain_token=46Xkv2ukMzyJ2e7XU7y0AQ==; bm_sv=A2E67B998DE8E6A4F1C2C02485467446~YAAQ7uIsF6g4j+SOAQAABdqUCxf1J/K4dYG0k7bbw2m5rFujdlSqMoCKDubu4R602nFvbY6zWC5puJczBv3IXwJJRpQxxR03wDCMVlKTCqjQvgDs8BoCuoNQxfY2fdS+F3bKut2lxXPQ2qctqz4kHBrgspJArHn/zu/IuKCIeSzmV4KcyxW6Zvw3/xMRA0MeHgyuHsTRBS+VrFk8Ju2NbJWWC8uSHbLCM/dhFT7/ktw8RE30r24XpQmhLpVTsUSC~1; tiktok_webapp_theme=light; msToken=ySXERzKCE0QUG0cCg6TWLw3wfEB-6kh6kAfuzhzjcQvmV1jBFloSgIsT9xk-QXFVdI99U1Fqb9mhUpIOldoDkjdZwskB8rvt66MHZaHnvBRZRtOKtTYsWT8osDyQXDVZWdPkvyE598h9; passport_csrf_token=1a47d95ebf68fc3648b0018ee75afc9f; passport_csrf_token_default=1a47d95ebf68fc3648b0018ee75afc9f; perf_feed_cache={%22expireTimestamp%22:1714057200000%2C%22itemIds%22:[%227346425092966206766%22%2C%227353812964207594795%22%2C%227343343741916171563%22]}; msToken=yWwG-ITrCnjJbx5ltBa9FTXdCImOJrl-wtQJSQH3afeEumWZcbo_qcrF6F7-NjYcrG6JVxtJiOU208REZeCSgXEZrrs5_65K741fQ7PSzCGOhz6vUyycq3Xvj4Mu-S0kJ6SqyltHnpJp
+      Cookie: tt_csrf_token=bwnaRGd9-B-0ce8ntqw9jtGzAdvzTRKNpBl0; ak_bmsc=75A1956756DE42FD14ED069AAE7A8780~000000000000000000000000000000~YAAQXCw+F8jpmBGQAQAAIfGsFBj+ZEGzR/ZeiuPpMtItu0QQUQRmjBX2kADliy6QA9rZSfrxRUZc9zuRrI4/xbIrAwA/nkdguGpa+v3QSn/1sk5uP2aqLVm0eYB/SGNafa2h2QvIPbLNiSCRhgq1GalZJL4+udqDnyBRJWE74nin74bZwrVDvCX1s8M2hWqZ9/jTkdm4sfwON9MdJIEtjAPlddQ4gxoqjPoWhfnrm24dhPT4OjL1B8QP1mgurj7zJGspqD53VcjkAl65gHVxp3dwZ5WbPYpqrh9j8wo2u/Wh6uhX+0HWmkv5yVZyTyYQTl3/ilPp9G4CuIUi84gaPLjNYea9AEnphNX0ywzDa6/yegfqyE6r3wqBBDCrR1xRM98YEB4A5PV7pw==; tt_chain_token=ljZFLdRDfyfDflXMg5XGpg==; tiktok_webapp_theme_auto_dark_ab=1; tiktok_webapp_theme=dark; perf_feed_cache={%22expireTimestamp%22:1718503200000%2C%22itemIds%22:[%227348816520216186158%22%2C%227356022137678810410%22%2C%227349561209340857630%22]}; s_v_web_id=verify_lxe3l432_JnDE5WWo_URef_4WrS_88IM_fd1CqEXZs4dZ; passport_csrf_token=af197f073ed95f4dc2636f24d55566a6; passport_csrf_token_default=af197f073ed95f4dc2636f24d55566a6; ttwid=1%7CuNT4GcgvvOjH8rTETh9d9xti_QDJjlcnSK2V7djIpuc%7C1718333954%7Cf81b989a495aedff91302da4d0a3ab6055dea486fb203a4326b37d5a5346ad0c; msToken=1Mhpyi8MlaZjM6bbLDVUhCj_6C0kEO_1_Nb62ByXLg7wy_vLnBxdMFpKclhf4HYnEjCghk2Gq47ZM5jPj3L1yFxQUZvq4oPLo1b2Wfe_33RE94uIxdiR-eSueWbcYDDgOj1Pn9Wyid5Uf5fzBQ7xxFA=; bm_sv=9ADBA7BE06EC41817F117E2279F1410C~YAAQXCw+F8bsmBGQAQAAzSewFBg2fP3Zd0aky2x7S13D97O64xi8EXhoKORBnPQyCHlh0iSlh63FFjoy6peDWaF3lkWaTly3Z7I7WvWk1GCntnYzpJaSCE5EO2OL38zPWpHcgGWuekluvptHXsheedNEefN4SUHVMt4jJynWNeTKrao0RmNLkH4zGs7QO6+MPCt94QFvNfLjBRr0wVcXlN/hx9m6kcvCyzsBBqEnpugoYvZ0SMA+INsKI5PDfQz1~1; msToken=449_l3kdcLmnEHdDP0uACa5EcPVL1NbpjyVv8yah61EwxIPZRDlGwpGIkpIjH0Tk-CDtoKwFrDdP1v2AOpwmdoIz5oQzPEXCdyfGzcVXCHbwMX1fwPxMHpea5yFPUYEDlNWaCFlgLnejRdWeN5sB_lE=
 
   proxies:
     http:
@@ -22,7 +22,7 @@ class BaseRequestModel(BaseModel):
     )
     channel: str = "tiktok_web"
     cookie_enabled: str = "true"
-    device_id: int = 7349090360347690538
+    device_id: int = 7380187414842836523
     device_platform: str = "web_pc"
     focus_state: str = "true"
     from_page: str = "user"
@@ -430,15 +430,17 @@ class SecUserIdFetcher:
 class AwemeIdFetcher:
     # https://www.tiktok.com/@scarlettjonesuk/video/7255716763118226715
     # https://www.tiktok.com/@scarlettjonesuk/video/7255716763118226715?is_from_webapp=1&sender_device=pc&web_id=7306060721837852167
+    # https://www.tiktok.com/@zoyapea5/photo/7370061866879454469
 
     # Precompiled regular expressions
-    _TIKTOK_AWEMEID_PARREN = re.compile(r"video/(\d*)")
-    _TIKTOK_NOTFOUND_PARREN = re.compile(r"notfound")
+    _TIKTOK_AWEMEID_PATTERN = re.compile(r"video/(\d+)")
+    _TIKTOK_PHOTOID_PATTERN = re.compile(r"photo/(\d+)")
+    _TIKTOK_NOTFOUND_PATTERN = re.compile(r"notfound")
 
     @classmethod
     async def get_aweme_id(cls, url: str) -> str:
         """
-        Get the aweme_id of a TikTok post
+        Get the aweme_id or photo_id of a TikTok post
         Args:
             url: post link
         Return:
@@ -453,11 +455,27 @@ class AwemeIdFetcher:
         url = extract_valid_urls(url)
 
         if url is None:
-            raise (
-                APINotFoundError("Invalid input URL. Class name: {0}".format(cls.__name__))
-            )
+            raise APINotFoundError("Invalid input URL. Class name: {0}".format(cls.__name__))
 
-        transport = httpx.AsyncHTTPTransport(retries=5)
+        # Handle full (non-shortened) links directly
+        if "tiktok" in url and "@" in url:
+            print(f"Input URL needs no redirect: {url}")
+            video_match = cls._TIKTOK_AWEMEID_PATTERN.search(url)
+            photo_match = cls._TIKTOK_PHOTOID_PATTERN.search(url)
+
+            if not video_match and not photo_match:
+                raise APIResponseError("aweme_id or photo_id not found in the response")
+
+            aweme_id = video_match.group(1) if video_match else photo_match.group(1)
+
+            if aweme_id is None:
+                raise RuntimeError("Failed to get aweme_id or photo_id, {0}".format(url))
+
+            return aweme_id
+
+        # Handle shortened links: follow the redirect, then parse the final URL
+        print(f"Input URL needs a redirect: {url}")
+        transport = httpx.AsyncHTTPTransport(retries=10)
         async with httpx.AsyncClient(
             transport=transport, proxies=TokenManager.proxies, timeout=10
         ) as client:
@@ -465,32 +483,28 @@ class AwemeIdFetcher:
                 response = await client.get(url, follow_redirects=True)
 
                 if response.status_code in {200, 444}:
-                    if cls._TIKTOK_NOTFOUND_PARREN.search(str(response.url)):
+                    if cls._TIKTOK_NOTFOUND_PATTERN.search(str(response.url)):
                         raise APINotFoundError("Page unavailable, possibly due to regional restrictions (proxy). Class name: {0}"
                                                .format(cls.__name__)
                                                )
 
-                    match = cls._TIKTOK_AWEMEID_PARREN.search(str(response.url))
-                    if not match:
-                        raise APIResponseError(
-                            "aweme_id not found in the response"
-                        )
-
-                    aweme_id = match.group(1)
+                    video_match = cls._TIKTOK_AWEMEID_PATTERN.search(str(response.url))
+                    photo_match = cls._TIKTOK_PHOTOID_PATTERN.search(str(response.url))
+
+                    if not video_match and not photo_match:
+                        raise APIResponseError("aweme_id or photo_id not found in the response")
+
+                    aweme_id = video_match.group(1) if video_match else photo_match.group(1)
 
                     if aweme_id is None:
-                        raise RuntimeError(
-                            "Failed to get aweme_id, {0}".format(response.url)
-                        )
+                        raise RuntimeError("Failed to get aweme_id or photo_id, {0}".format(response.url))
 
                     return aweme_id
                 else:
-                    raise ConnectionError(
-                        "Unexpected status code {0}; please check and retry".format(response.status_code)
-                    )
+                    raise ConnectionError("Unexpected status code {0}; please check and retry".format(response.status_code))
 
             except httpx.RequestError as exc:
                 # Captures all httpx request-related exceptions
                 raise APIConnectionError("Failed to request the endpoint; please check your network. URL: {0}, proxy: {1}, exception class: {2}, details: {3}"
                                          .format(url, TokenManager.proxies, cls.__name__, exc)
                                          )
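The renamed patterns now cover both `/video/` and `/photo/` URLs, and `\d*` was tightened to `\d+` so an empty ID can no longer match. A quick self-contained check of the matching logic, using the sample URLs from the comments in the diff:

```python
import re

# Same patterns as AwemeIdFetcher after the rename.
TIKTOK_AWEMEID_PATTERN = re.compile(r"video/(\d+)")
TIKTOK_PHOTOID_PATTERN = re.compile(r"photo/(\d+)")

def extract_id(url: str) -> str:
    # Prefer a video match, then fall back to a photo match.
    video_match = TIKTOK_AWEMEID_PATTERN.search(url)
    photo_match = TIKTOK_PHOTOID_PATTERN.search(url)
    if not video_match and not photo_match:
        raise ValueError(f"no aweme_id or photo_id in {url}")
    return video_match.group(1) if video_match else photo_match.group(1)

print(extract_id("https://www.tiktok.com/@scarlettjonesuk/video/7255716763118226715"))
print(extract_id("https://www.tiktok.com/@zoyapea5/photo/7370061866879454469"))
```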
@@ -343,9 +343,9 @@ class TikTokWebCrawler:
 
     async def main(self):
         # Fetch a single post's data
-        # item_id = "7339393672959757570"
-        # response = await self.fetch_one_video(item_id)
-        # print(response)
+        item_id = "7369296852669205791"
+        response = await self.fetch_one_video(item_id)
+        print(response)
 
         # Fetch a user's profile
         # secUid = "MS4wLjABAAAAfDPs6wbpBcMMb85xkvDGdyyyVAUS2YoVCT9P6WQ1bpuwEuPhL9eFtTmGvxw1lT2C"
18  crawlers/utils/deprecated.py  Normal file
@@ -0,0 +1,18 @@
+import warnings
+import functools
+
+
+def deprecated(message):
+    def decorator(func):
+        @functools.wraps(func)
+        async def wrapper(*args, **kwargs):
+            warnings.warn(
+                f"{func.__name__} is deprecated: {message}",
+                DeprecationWarning,
+                stacklevel=2
+            )
+            return await func(*args, **kwargs)
+
+        return wrapper
+
+    return decorator
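Note that the new decorator only supports coroutine functions (its wrapper is `async def`), which fits `fetch_one_video` but would break a plain synchronous function. A self-contained usage sketch, re-declaring the decorator locally so it runs without the `crawlers` package; the decorated function below is a stand-in, not the real crawler method:

```python
import asyncio
import functools
import warnings

def deprecated(message):
    # Same shape as crawlers/utils/deprecated.py: async-only wrapper.
    def decorator(func):
        @functools.wraps(func)
        async def wrapper(*args, **kwargs):
            warnings.warn(f"{func.__name__} is deprecated: {message}",
                          DeprecationWarning, stacklevel=2)
            return await func(*args, **kwargs)
        return wrapper
    return decorator

@deprecated("use the Web API instead")
async def fetch_one_video(aweme_id: str) -> dict:
    # Stand-in for the deprecated APP-API method.
    return {"aweme_id": aweme_id}

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = asyncio.run(fetch_one_video("123"))

print(result)  # {'aweme_id': '123'}
print(caught[0].message)  # fetch_one_video is deprecated: use the Web API instead
```

Because `functools.wraps` copies `__name__`, the warning names the real coroutine rather than `wrapper`.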