🎨: Add Douyin Web A_Bogus encryption algorithm Support

This commit is contained in:
Evil0ctal 2024-06-14 00:49:56 -07:00
parent 5fffdfa7f3
commit 5b72b41d3b
16 changed files with 861 additions and 112 deletions

View file

@ -45,13 +45,38 @@
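## 🔊 V4 Release Notes
- If you are interested in working on this project together, add WeChat `Evil0ctal` with the note "github project refactor". Everyone can exchange ideas and learn from each other in the group. Ads and anything illegal are not allowed; it is purely for making friends and technical discussion.
- The `X-Bogus` algorithm used by this project can still call the Douyin and TikTok APIs normally; the `A-Bogus` algorithm will not be open-sourced for now.
- This project uses the `X-Bogus` and `A_Bogus` algorithms to request the Douyin and TikTok Web APIs.
- Because of Douyin's risk control, after deploying this project **get the Douyin website Cookie from your browser and replace it in config.yaml.**
- Before opening an issue, please read the documentation below; solutions to most problems are included there.
- This project is completely free, but please comply with the [Apache-2.0 license](https://github.com/Evil0ctal/Douyin_TikTok_Download_API?tab=Apache-2.0-1-ov-file#readme) when using it.
- This project has a closed-source branch version with more endpoints and services; see the information below for details.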
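## 🔖TikHub.io API
[TikHub.io](https://beta-web.tikhub.io/en-us/users/signin) is an API platform that provides public data endpoints for Douyin, TikTok, and other services. If you would like to support development of the [Douyin_TikTok_Download_API](https://github.com/Evil0ctal/Douyin_TikTok_Download_API) project, we strongly recommend choosing [TikHub.io](https://beta-web.tikhub.io/en-us/users/signin).
#### Features:
> 📦 Ready to use
Skip the tedious setup and develop quickly with the packaged SDK, which makes calls simpler. All API endpoints are written to the OpenAPI specification and come with example parameters.
> 💰 Cost advantage
There are no preset plan limits and no monthly minimum. All usage is billed in real time by actual consumption, with tiered pricing based on the user's daily request volume. You can also earn free credit through the daily check-in in the user dashboard, and this free credit does not expire.
> ⚡️ Fast support
We have a large Discord community server where admins and other users reply quickly and help you solve your current problem.
> 🎉 Embracing open source
Part of TikHub's source code is open-sourced on GitHub, and TikHub sponsors the authors of some open-source projects.
#### Links: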
- Discord: [TikHub Discord](https://discord.com/invite/aMEAS8Xsvz)
- Free Douyin/TikTok API: [TikHub Beta API](https://beta.tikhub.io/)
- Register: [TikHub signup](https://beta-web.tikhub.io/en-us/users/signup)
- API Docs: [TikHub API Docs](https://api.tikhub.io/)
## 🖥Demo site: I am fragile... please do not stress test (·•᷄ࡇ•᷅
@ -95,21 +120,21 @@
```
./Douyin_TikTok_Download_API
├─app
│  ├─api
│  │  ├─endpoints
│  │  └─models
│  ├─download
│  └─web
│     └─views
└─crawlers
   ├─douyin
   │  └─web
   ├─hybrid
   ├─tiktok
   │  ├─app
   │  └─web
   └─utils
```
## ✨Supported features:
@ -121,6 +146,7 @@
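- Comprehensive API documentation ([Demo](https://api.douyin.wtf/docs))
- Rich set of API endpoints
  - Douyin Web API
    - [x] Video data parsing
    - [x] Get the posts on a user's profile
    - [x] Get the liked posts on a user's profile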
@ -136,14 +162,15 @@
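    - [x] Generate verify_fp
    - [x] Generate s_v_web_id
    - [x] Generate the X-Bogus parameter from an API URL
    - [x] Generate the A_Bogus parameter from an API URL
    - [x] Extract a single user id
    - [x] Extract user ids from a list
    - [x] Extract a single post id
    - [x] Extract post ids from a list
    - [x] Extract live room numbers from a list
  - TikTok Web API
    - [x] Video data parsing
    - [x] Get the posts on a user's profile
    - [x] Get the liked posts on a user's profile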
@ -165,7 +192,6 @@
    - [x] Get a user's unique_id
    - [x] Get unique_ids for a list of users
---
## 📦Calling the parsing library (deprecated, needs updating):
@ -257,7 +283,6 @@ https://www.tiktok.com/@evil0ctal/video/7156033831819037994
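***See the documentation for more demos...***
## ⚠️Preparation before deployment (please read carefully)
- You need to handle crawler Cookie risk control yourself; otherwise the endpoints may stop working.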
@ -267,6 +292,7 @@ https://www.tiktok.com/@evil0ctal/video/7156033831819037994
- https://github.com/Evil0ctal/Douyin_TikTok_Download_API/blob/30e56e5a7f97f87d60b1045befb1f6db147f8590/crawlers/tiktok/web/config.yaml#L6
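- The online download feature on the demo site has been turned off: someone downloaded an enormous video and crashed my server. You can right-click and save the video from the web parsing result page...
- The demo site's Cookie is my own and is not guaranteed to stay valid; it is for demonstration only. If you deploy the project yourself, get your own Cookie.
- Video links returned by the TikTok Web API produce an HTTP 403 error when accessed directly. Use this project's `/api/download` endpoint to download TikTok videos. This endpoint has been manually disabled on the demo site, so you need to deploy the project yourself.
- Here is a **video tutorial** for reference: ***[https://www.bilibili.com/video/BV1vE421j7NR/](https://www.bilibili.com/video/BV1vE421j7NR/)***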
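As a rough illustration of the `/api/download` note above, the request URL for a self-hosted instance could be assembled as follows. This is only a sketch: the host `http://localhost:8000`, the helper name `build_download_url`, and the query parameter name `url` are assumptions for illustration and are not confirmed by this commit.

```python
from urllib.parse import urlencode


def build_download_url(api_base: str, video_url: str) -> str:
    """Build a request URL for the /api/download endpoint of a self-hosted instance."""
    # Assumption: the endpoint takes the share link in a query parameter named `url`.
    query = urlencode({"url": video_url})
    return f"{api_base.rstrip('/')}/api/download?{query}"


download_url = build_download_url(
    "http://localhost:8000",  # hypothetical address of your own deployment
    "https://www.tiktok.com/@evil0ctal/video/7156033831819037994",
)
print(download_url)
```

`urlencode` percent-encodes the share link, so it can be passed safely as a single query value.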
## 💻Deployment (Option 1: Linux)

View file

@ -734,6 +734,48 @@ async def generate_x_bogus(request: Request,
        raise HTTPException(status_code=status_code, detail=detail.dict())


# Generate the A-Bogus parameter from an API URL
@router.get("/generate_a_bogus",
            response_model=ResponseModel,
            summary="使用接口网址生成A-Bogus参数/Generate A-Bogus parameter using API URL")
async def generate_a_bogus(request: Request,
                           url: str = Query(
                               example="https://www.douyin.com/aweme/v1/web/aweme/detail/?device_platform=webapp&aid=6383&channel=channel_pc_web&pc_client_type=1&version_code=190500&version_name=19.5.0&cookie_enabled=true&browser_language=zh-CN&browser_platform=Win32&browser_name=Firefox&browser_online=true&engine_name=Gecko&os_name=Windows&os_version=10&platform=PC&screen_width=1920&screen_height=1080&browser_version=124.0&engine_version=122.0.0.0&cpu_core_num=12&device_memory=8&aweme_id=7345492945006595379"),
                           user_agent: str = Query(
                               example="Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.212 Safari/537.36")):
    """
    # [中文]
    ### 用途:
    - 使用接口网址生成A-Bogus参数
    ### 参数:
    - url: 接口网址
    - user_agent: 用户代理暂时不支持自定义直接使用默认值即可
    # [English]
    ### Purpose:
    - Generate A-Bogus parameter using API URL
    ### Parameters:
    - url: API URL
    - user_agent: User agent, temporarily does not support customization, just use the default value.
    # [示例/Example]
    url = "https://www.douyin.com/aweme/v1/web/aweme/detail/?device_platform=webapp&aid=6383&channel=channel_pc_web&pc_client_type=1&version_code=190500&version_name=19.5.0&cookie_enabled=true&browser_language=zh-CN&browser_platform=Win32&browser_name=Firefox&browser_online=true&engine_name=Gecko&os_name=Windows&os_version=10&platform=PC&screen_width=1920&screen_height=1080&browser_version=124.0&engine_version=122.0.0.0&cpu_core_num=12&device_memory=8&aweme_id=7345492945006595379"
    user_agent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.212 Safari/537.36"
    """
    try:
        a_bogus = await DouyinWebCrawler.get_a_bogus(url, user_agent)
        return ResponseModel(code=200,
                             router=request.url.path,
                             data=a_bogus)
    except Exception as e:
        status_code = 400
        detail = ErrorResponseModel(code=status_code,
                                    router=request.url.path,
                                    params=dict(request.query_params),
                                    )
        raise HTTPException(status_code=status_code, detail=detail.dict())


# Extract a single user id
@router.get("/get_sec_user_id",
            response_model=ResponseModel,

View file

@ -19,10 +19,10 @@ with open(config_path, 'r', encoding='utf-8') as file:
    config = yaml.safe_load(file)


async def fetch_data(url: str):
async def fetch_data(url: str, headers: dict = None):
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36'
    }
    } if headers is None else headers.get('headers')
    async with httpx.AsyncClient() as client:
        response = await client.get(url, headers=headers)
        response.raise_for_status()  # Ensure the response was successful
@ -68,7 +68,7 @@ async def download_file_hybrid(request: Request,
                return FileResponse(path=file_path, media_type='video/mp4', filename=file_name)

            # Fetch the video file
            response = await fetch_data(url)
            response = await fetch_data(url) if platform == 'douyin' else await fetch_data(url, headers=await HybridCrawler.TikTokWebCrawler.get_tiktok_headers())

            # Save the file
            async with aiofiles.open(file_path, 'wb') as out_file:
@ -115,6 +115,7 @@ async def download_file_hybrid(request: Request,
    # Exception handling
    except Exception as e:
        print(e)
        code = 400
        return ErrorResponseModel(code=code, message=str(e), router=request.url.path, params=dict(request.query_params))
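The header handling that this hunk adds to `fetch_data` can be sketched in isolation. This is a minimal illustration, assuming (as the `.get('headers')` call in the diff implies) that `get_tiktok_headers()` returns a dict with a `headers` key. `resolve_headers` and `DEFAULT_HEADERS` are hypothetical names, and the fallback used when that key is missing is an addition that is not part of the commit.

```python
from typing import Optional

# Default headers, mirroring the User-Agent hard-coded in fetch_data.
DEFAULT_HEADERS = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
                  "(KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36"
}


def resolve_headers(crawler_config: Optional[dict] = None) -> dict:
    """Pick the request headers: defaults for Douyin, crawler-supplied ones for TikTok."""
    if crawler_config is None:
        return dict(DEFAULT_HEADERS)
    # Assumption: the crawler config nests the real headers under the "headers" key.
    # Falling back to the defaults avoids passing None on to the HTTP client.
    return crawler_config.get("headers") or dict(DEFAULT_HEADERS)


douyin_headers = resolve_headers()
tiktok_headers = resolve_headers(
    {"headers": {"User-Agent": "example-agent", "Cookie": "example"}, "proxies": None}
)
print(douyin_headers["User-Agent"])
print(tiktok_headers)
```

Keeping the selection in a small pure function makes the Douyin and TikTok branches easy to check without issuing a network request.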

View file

@ -103,7 +103,7 @@ description = f"""
#### 备注
- 本项目仅供学习交流使用不得用于违法用途否则后果自负
- 如果你不想自己部署可以直接使用我们的在线API服务[Douyin_TikTok_Download_API](https://douyin.wtf/docs)
- 如果你需要更稳定以及更多功能的API服务可以使用付费API服务[TikHub API](https://beta.tikhub.io/)
- 如果你需要更稳定以及更多功能的API服务可以使用付费API服务[TikHub API](https://api.tikhub.io/)
### [English]
@ -116,7 +116,7 @@ description = f"""
#### Note
- This project is for learning and communication only, and shall not be used for illegal purposes, otherwise the consequences shall be borne by yourself.
- If you do not want to deploy it yourself, you can directly use our online API service: [Douyin_TikTok_Download_API](https://douyin.wtf/docs)
- If you need a more stable and feature-rich API service, you can use the paid API service: [TikHub API](https://beta.tikhub.io)
- If you need a more stable and feature-rich API service, you can use the paid API service: [TikHub API](https://api.tikhub.io)
"""
docs_url = config['API']['Docs_URL']

View file

@ -30,8 +30,8 @@ API:
Redoc_URL: /redoc # API documentation URL | API文档URL
# API Information
Version: V4.0.0 # API version | API版本
Update_Time: 2024/04/22 # API update time | API更新时间
Version: V4.0.2 # API version | API版本
Update_Time: 2024/06/14 # API update time | API更新时间
Environment: Demo # API environment | API环境
# Download Configuration

View file

@ -0,0 +1,559 @@
"""
Original Author:
This file is from https://github.com/JoeanAmier/TikTokDownloader
And is licensed under the GNU General Public License v3.0
If you use this code, please keep this license and the original author information.
Modified by:
And this file is now a part of the https://github.com/Evil0ctal/Douyin_TikTok_Download_API open-source project.
This project is licensed under the Apache License 2.0, and the original author information is kept.
Purpose:
This file is used to generate the `a_bogus` parameter for the Douyin Web API.
Changes Made:
1. Changed the ua_code to compatible with the current config file User-Agent string in https://github.com/Evil0ctal/Douyin_TikTok_Download_API/blob/main/crawlers/douyin/web/config.yaml
"""
from random import randint
from random import random
from re import compile
from time import time
from urllib.parse import urlencode, quote


class ABogus:
    __filter = compile(r'%([0-9A-F]{2})')
    __arguments = [0, 1, 14]
    __end_string = "cus"
    __version = [1, 0, 1, 5]
    # ASCII codes of "1536|742|1536|864|0|0|0|0|1536|864|1536|864|1536|742|24|24|Win32"
    __env = [
        49, 53, 51, 54, 124, 55, 52, 50,
        124, 49, 53, 51, 54, 124, 56, 54,
        52, 124, 48, 124, 48, 124, 48, 124,
        48, 124, 49, 53, 51, 54, 124, 56,
        54, 52, 124, 49, 53, 51, 54, 124,
        56, 54, 52, 124, 49, 53, 51, 54,
        124, 55, 52, 50, 124, 50, 52, 124,
        50, 52, 124, 87, 105, 110, 51, 50]
    # These eight words have the same values as the SM3 hash initial vector
    __reg = [
        1937774191,
        1226093241,
        388252375,
        3666478592,
        2842636476,
        372324522,
        3817729613,
        2969243214,
    ]
    __str = {
        "s0": "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/=",
        "s1": "Dkdpgh4ZKsQB80/Mfvw36XI1R25+WUAlEi7NLboqYTOPuzmFjJnryx9HVGcaStCe=",
        "s2": "Dkdpgh4ZKsQB80/Mfvw36XI1R25-WUAlEi7NLboqYTOPuzmFjJnryx9HVGcaStCe=",
        "s3": "ckdp1h4ZKsUB80/Mfvw36XIgR25+WQAlEi7NLboqYTOPuzmFjJnryx9HVGDaStCe",
        "s4": "Dkdpgh2ZmsQB80/MfvV36XI1R45-WUAlEixNLwoqYTOPuzKFjJnry79HbGcaStCe"}

    def __init__(self, ):
        self.chunk = []
        self.size = 0
        self.reg = self.__reg[:]

    @classmethod
    def list_1(cls, random_num=None, a=170, b=85, c=45, ) -> list:
        return cls.random_list(
            random_num,
            a,
            b,
            1,
            2,
            5,
            c & a,
        )

    @classmethod
    def list_2(cls, random_num=None, a=170, b=85, ) -> list:
        return cls.random_list(
            random_num,
            a,
            b,
            1,
            0,
            0,
            0,
        )

    @classmethod
    def list_3(cls, random_num=None, a=170, b=85, ) -> list:
        return cls.random_list(
            random_num,
            a,
            b,
            1,
            0,
            5,
            0,
        )

    @staticmethod
    def random_list(
            a: float = None,
            b=170,
            c=85,
            d=0,
            e=0,
            f=0,
            g=0,
    ) -> list:
        r = a or (random() * 10000)
        v = [
            r,
            int(r) & 255,
            int(r) >> 8,
        ]
        s = v[1] & b | d
        v.append(s)
        s = v[1] & c | e
        v.append(s)
        s = v[2] & b | f
        v.append(s)
        s = v[2] & c | g
        v.append(s)
        return v[-4:]

    @staticmethod
    def from_char_code(*args):
        return "".join(chr(code) for code in args)

    @classmethod
    def generate_string_1(
            cls,
            random_num_1=None,
            random_num_2=None,
            random_num_3=None,
    ):
        return cls.from_char_code(*cls.list_1(random_num_1)) + cls.from_char_code(
            *cls.list_2(random_num_2)) + cls.from_char_code(*cls.list_3(random_num_3))

    def generate_string_2(
            self,
            url_params: str,
            user_agent: str,
            start_time=0,
            end_time=0,
    ) -> str:
        a = self.generate_string_2_list(
            url_params,
            user_agent,
            start_time,
            end_time,
        )
        e = self.end_check_num(a)
        a.extend(self.__env)
        a.append(e)
        return self.rc4_encrypt(self.from_char_code(*a), "y")

    def generate_string_2_list(
            self,
            url_params: str,
            user_agent: str,
            start_time=0,
            end_time=0,
    ) -> list:
        start_time = start_time or int(time() * 1000)
        end_time = end_time or (start_time + randint(4, 8))
        params_array = self.sum(self.sum(url_params))
        # TODO: write a function that generates ua_code (2024-06-13 17:13:08)
        # Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.212 Safari/537.36
        ua_code = [76, 98, 15, 131, 97, 245, 224, 133, 122, 199, 241, 166, 79, 34, 90, 191, 128, 126, 122, 98, 66, 11, 14, 40, 49, 110, 110, 173, 67, 96, 138, 252]
        return self.list_4(
            (end_time >> 24) & 255,
            params_array[21],
            ua_code[23],
            (end_time >> 16) & 255,
            params_array[22],
            ua_code[24],
            (end_time >> 8) & 255,
            (end_time >> 0) & 255,
            (start_time >> 24) & 255,
            (start_time >> 16) & 255,
            (start_time >> 8) & 255,
            (start_time >> 0) & 255,
        )

    @staticmethod
    def reg_to_array(a):
        o = [0] * 32
        for i in range(8):
            c = a[i]
            o[4 * i + 3] = (255 & c)
            c >>= 8
            o[4 * i + 2] = (255 & c)
            c >>= 8
            o[4 * i + 1] = (255 & c)
            c >>= 8
            o[4 * i] = (255 & c)
        return o

    def compress(self, a):
        f = self.generate_f(a)
        i = self.reg[:]
        for o in range(64):
            c = self.de(i[0], 12) + i[4] + self.de(self.pe(o), o)
            c = (c & 0xFFFFFFFF)
            c = self.de(c, 7)
            s = (c ^ self.de(i[0], 12)) & 0xFFFFFFFF
            u = self.he(o, i[0], i[1], i[2])
            u = (u + i[3] + s + f[o + 68]) & 0xFFFFFFFF
            b = self.ve(o, i[4], i[5], i[6])
            b = (b + i[7] + c + f[o]) & 0xFFFFFFFF
            i[3] = i[2]
            i[2] = self.de(i[1], 9)
            i[1] = i[0]
            i[0] = u
            i[7] = i[6]
            i[6] = self.de(i[5], 19)
            i[5] = i[4]
            i[4] = (b ^ self.de(b, 9) ^ self.de(b, 17)) & 0xFFFFFFFF
        for l in range(8):
            self.reg[l] = (self.reg[l] ^ i[l]) & 0xFFFFFFFF

    @classmethod
    def generate_f(cls, e):
        r = [0] * 132
        for t in range(16):
            r[t] = (e[4 * t] << 24) | (e[4 * t + 1] << 16) | (e[4 * t + 2] << 8) | e[4 * t + 3]
            r[t] &= 0xFFFFFFFF
        for n in range(16, 68):
            a = r[n - 16] ^ r[n - 9] ^ cls.de(r[n - 3], 15)
            a = a ^ cls.de(a, 15) ^ cls.de(a, 23)
            r[n] = (a ^ cls.de(r[n - 13], 7) ^ r[n - 6]) & 0xFFFFFFFF
        for n in range(68, 132):
            r[n] = (r[n - 68] ^ r[n - 64]) & 0xFFFFFFFF
        return r

    @staticmethod
    def pad_array(arr, length=60):
        while len(arr) < length:
            arr.append(0)
        return arr

    def fill(self, length=60):
        size = 8 * self.size
        self.chunk.append(128)
        self.chunk = self.pad_array(self.chunk, length)
        for i in range(4):
            self.chunk.append((size >> 8 * (3 - i)) & 255)

    @staticmethod
    def list_4(
            a: int,
            b: int,
            c: int,
            d: int,
            e: int,
            f: int,
            g: int,
            h: int,
            i: int,
            j: int,
            k: int,
            m: int,
    ) -> list:
        return [
            44, a, 0, 0, 0, 0, 24, b,
            58, 0, c, d, 0, 24, 97, 1,
            0, 239, e, 51, f, g, 0, 0,
            0, 0, h, 0, 0, 14, i, j,
            0, k, m, 3, 399, 1, 399, 1,
            64, 0, 0, 0]

    @staticmethod
    def end_check_num(a: list):
        r = 0
        for i in a:
            r ^= i
        return r

    @classmethod
    def decode_string(cls, url_string, ):
        decoded = cls.__filter.sub(cls.replace_func, url_string)
        return decoded

    @staticmethod
    def replace_func(match):
        return chr(int(match.group(1), 16))

    @staticmethod
    def de(e, r):
        r %= 32
        return ((e << r) & 0xFFFFFFFF) | (e >> (32 - r))

    @staticmethod
    def pe(e):
        return 2043430169 if 0 <= e < 16 else 2055708042

    @staticmethod
    def he(e, r, t, n):
        if 0 <= e < 16:
            return (r ^ t ^ n) & 0xFFFFFFFF
        elif 16 <= e < 64:
            return (r & t | r & n | t & n) & 0xFFFFFFFF
        raise ValueError

    @staticmethod
    def ve(e, r, t, n):
        if 0 <= e < 16:
            return (r ^ t ^ n) & 0xFFFFFFFF
        elif 16 <= e < 64:
            return (r & t | ~r & n) & 0xFFFFFFFF
        raise ValueError

    @staticmethod
    def convert_to_char_code(a):
        d = []
        for i in a:
            d.append(ord(i))
        return d

    @staticmethod
    def split_array(arr, chunk_size=64):
        result = []
        for i in range(0, len(arr), chunk_size):
            result.append(arr[i:i + chunk_size])
        return result

    @staticmethod
    def char_code_at(s):
        return [ord(char) for char in s]

    def write(self, e, ):
        if isinstance(e, str):
            e = self.decode_string(e + self.__end_string)
            e = self.char_code_at(e)
        self.size = len(e)
        if len(e) <= 64:
            self.chunk = e
        else:
            chunks = self.split_array(e, 64)
            for i in chunks[:-1]:
                self.compress(i)
            self.chunk = chunks[-1]

    def reset(self, ):
        self.chunk = []
        self.size = 0
        self.reg = self.__reg[:]

    def sum(self, e, length=60):
        self.reset()
        self.write(e)
        self.fill(length)
        self.compress(self.chunk)
        a = self.reg_to_array(self.reg)
        self.reset()
        return a

    @classmethod
    def generate_result_unit(cls, n, s):
        r = ""
        for i, j in zip(range(18, -1, -6), (16515072, 258048, 4032, 63)):
            r += cls.__str[s][(n & j) >> i]
        return r

    @classmethod
    def generate_result_end(cls, s, e="s4"):
        r = ""
        b = ord(s[120]) << 16
        r += cls.__str[e][(b & 16515072) >> 18]
        r += cls.__str[e][(b & 258048) >> 12]
        r += "=="
        return r

    @classmethod
    def generate_result(cls, s, n, e="s4"):
        r = ""
        for i in range(n):
            b = ((ord(s[i * 3]) << 16) | (ord(s[i * 3 + 1]))
                 << 8) | ord(s[i * 3 + 2])
            r += cls.generate_result_unit(b, e)
        return r

    @classmethod
    def generate_args_code(cls):
        a = []
        for j in range(24, -1, -8):
            a.append(cls.__arguments[0] >> j)
        a.append(cls.__arguments[1] / 256)
        a.append(cls.__arguments[1] % 256)
        a.append(cls.__arguments[1] >> 24)
        a.append(cls.__arguments[1] >> 16)
        for j in range(24, -1, -8):
            a.append(cls.__arguments[2] >> j)
        return [int(i) & 255 for i in a]

    @staticmethod
    def rc4_encrypt(plaintext, key):
        s = list(range(256))
        j = 0
        # Key Scheduling Algorithm (KSA)
        for i in range(256):
            j = (j + s[i] + ord(key[i % len(key)])) % 256
            s[i], s[j] = s[j], s[i]
        i = 0
        j = 0
        cipher = []
        # Pseudo-Random Generation Algorithm (PRGA)
        for k in range(len(plaintext)):
            i = (i + 1) % 256
            j = (j + s[i]) % 256
            s[i], s[j] = s[j], s[i]
            t = (s[i] + s[j]) % 256
            cipher.append(chr(s[t] ^ ord(plaintext[k])))
        return ''.join(cipher)

    def get_value(self,
                  url_params: dict,
                  user_agent: str,
                  start_time=0,
                  end_time=0,
                  random_num_1=None,
                  random_num_2=None,
                  random_num_3=None,
                  ) -> str:
        string_1 = self.generate_string_1(
            random_num_1,
            random_num_2,
            random_num_3,
        )
        string_2 = self.generate_string_2(
            urlencode(url_params),
            user_agent,
            start_time,
            end_time,
        )
        string = string_1 + string_2
        return self.generate_result(
            string, 40, "s4") + self.generate_result_end(string, "s4")


if __name__ == "__main__":
    bogus = ABogus()
    USERAGENT = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.212 Safari/537.36"
    url_str = "https://www.douyin.com/aweme/v1/web/aweme/detail/?device_platform=webapp&aid=6383&channel=channel_pc_web&pc_client_type=1&version_code=190500&version_name=19.5.0&cookie_enabled=true&browser_language=zh-CN&browser_platform=Win32&browser_name=Firefox&browser_online=true&engine_name=Gecko&os_name=Windows&os_version=10&platform=PC&screen_width=1920&screen_height=1080&browser_version=124.0&engine_version=122.0.0.0&cpu_core_num=12&device_memory=8&aweme_id=7345492945006595379"
    # Convert the URL parameters to a dict
    url_params = dict([param.split("=") for param in url_str.split("?")[1].split("&")])
    print(f"URL参数: {url_params}")
    a_bogus = bogus.get_value(url_params, USERAGENT)
    # URL-encode a_bogus
    a_bogus = quote(a_bogus, safe='')
    print(a_bogus)
    print(USERAGENT)
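The `rc4_encrypt` method in this file follows the textbook RC4 layout (key scheduling, then keystream generation). A property worth keeping in mind is that RC4 is symmetric: applying it twice with the same key returns the input. The sketch below is a standalone bytes-based version written for illustration, not the project's method, which works on strings of character codes. The function name `rc4` and the sample key and plaintext are assumptions.

```python
def rc4(data: bytes, key: bytes) -> bytes:
    """Textbook RC4 over bytes; encryption and decryption are the same operation."""
    state = list(range(256))
    j = 0
    # Key Scheduling Algorithm (KSA): permute the state using the key.
    for i in range(256):
        j = (j + state[i] + key[i % len(key)]) % 256
        state[i], state[j] = state[j], state[i]
    i = j = 0
    out = bytearray()
    # Pseudo-Random Generation Algorithm (PRGA): XOR each byte with the keystream.
    for byte in data:
        i = (i + 1) % 256
        j = (j + state[i]) % 256
        state[i], state[j] = state[j], state[i]
        out.append(byte ^ state[(state[i] + state[j]) % 256])
    return bytes(out)


ciphertext = rc4(b"Plaintext", b"Key")
roundtrip = rc4(ciphertext, b"Key")
print(ciphertext.hex())
print(roundtrip)
```

Because the cipher is its own inverse, a round trip like this is a quick sanity check when porting the routine between languages.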

View file

@ -4,7 +4,7 @@ TokenManager:
Accept-Language: zh-CN,zh;q=0.8,zh-TW;q=0.7,zh-HK;q=0.5,en-US;q=0.3,en;q=0.2
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.212 Safari/537.36
Referer: https://www.douyin.com/
Cookie: odin_tt=deb76f54241001639f1ebbb3bbdd3637c52604632821dea7f6413b1d0527957d;passport_fe_beating_status=false;sid_guard=c7845c8f01865cc93dcee7b32f8e64a3%7C1715033646%7C21600%7CTue%2C+07-May-2024+04%3A14%3A06+GMT;uid_tt=3a85f4bd9ba5573dcf39917c95135faa;uid_tt_ss=3a85f4bd9ba5573dcf39917c95135faa;sid_tt=c7845c8f01865cc93dcee7b32f8e64a3;sessionid=c7845c8f01865cc93dcee7b32f8e64a3;sessionid_ss=c7845c8f01865cc93dcee7b32f8e64a3;sid_ucp_v1=1.0.0-KDVlNDc1Y2VjOTU3NzFhM2E1M2UyMWExMmQ2OTJhYjNhYzk3YzQ3MGQKCBCurOWxBhgNGgJsZiIgYzc4NDVjOGYwMTg2NWNjOTNkY2VlN2IzMmY4ZTY0YTM;ssid_ucp_v1=1.0.0-KDVlNDc1Y2VjOTU3NzFhM2E1M2UyMWExMmQ2OTJhYjNhYzk3YzQ3MGQKCBCurOWxBhgNGgJsZiIgYzc4NDVjOGYwMTg2NWNjOTNkY2VlN2IzMmY4ZTY0YTM;passport_assist_user=; ttwid=1%7CbfT5_gVNmSYDxhSIwlPZJhBGSdN6dx98CLMd336o8Cs%7C1715033645%7Ceefdce4479938326bd878311d974fe92c6a0d014b89345b3687ead20e6e68b53
Cookie: __ac_nonce=0666b92b000a2c224ac28; __ac_signature=_02B4Z6wo00f01cJo1cwAAIDC-hz88a728VnCWdFAABbzbc; ttwid=1%7C3mHLmtqu19mj4mwynGHoMV69QN2dnPid7GkoF6qMGxg%7C1718325937%7C1175da4da9c5aedc0f298981771e3ceb96bb26b590d93d0c23eaf0bb5ecd2d25; douyin.com; device_web_cpu_core=16; device_web_memory_size=-1; architecture=amd64; IsDouyinActive=true; home_can_add_dy_2_desktop=%220%22; dy_swidth=1835; dy_sheight=1147; stream_recommend_feed_params=%22%7B%5C%22cookie_enabled%5C%22%3Atrue%2C%5C%22screen_width%5C%22%3A1835%2C%5C%22screen_height%5C%22%3A1147%2C%5C%22browser_online%5C%22%3Atrue%2C%5C%22cpu_core_num%5C%22%3A16%2C%5C%22device_memory%5C%22%3A0%2C%5C%22downlink%5C%22%3A%5C%22%5C%22%2C%5C%22effective_type%5C%22%3A%5C%22%5C%22%2C%5C%22round_trip_time%5C%22%3A0%7D%22; strategyABtestKey=%221718325939.224%22; volume_info=%7B%22isUserMute%22%3Afalse%2C%22isMute%22%3Atrue%2C%22volume%22%3A0.5%7D; stream_player_status_params=%22%7B%5C%22is_auto_play%5C%22%3A0%2C%5C%22is_full_screen%5C%22%3A0%2C%5C%22is_full_webscreen%5C%22%3A0%2C%5C%22is_mute%5C%22%3A1%2C%5C%22is_speed%5C%22%3A1%2C%5C%22is_visible%5C%22%3A1%7D%22; xgplayer_user_id=778628299652; csrf_session_id=120d8aacffb06addd01cb40859003c8e; passport_csrf_token=6f9c9a1bc411c0e6b5c8e5bee6622f91; passport_csrf_token_default=6f9c9a1bc411c0e6b5c8e5bee6622f91; s_v_web_id=verify_lxdywd34_SU6sqPg8_fjkN_4ldR_BMvz_wvgDZPXkm5fY; msToken=y09BW1cI9bHiuOMAYN0mqoVkihUmHlKs_YaKQdTxtBCekbSed8UidXPK74QjPNgszAmYDSKy5aF1ns1f3L5GazwXUISTHgj2x9Bne9p2; FORCE_LOGIN=%7B%22videoConsumedRemainSeconds%22%3A180%7D; xg_device_score=Infinity; bd_ticket_guard_client_data=eyJiZC10aWNrZXQtZ3VhcmQtdmVyc2lvbiI6MiwiYmQtdGlja2V0LWd1YXJkLWl0ZXJhdGlvbi12ZXJzaW9uIjoxLCJiZC10aWNrZXQtZ3VhcmQtcmVlLXB1YmxpYy1rZXkiOiJCTEhjWkJWemp2MUZRbXY2ZHY5dmtGcVN2eHlqa2ZVZU1laXVtaTRzblh5T2VNSHdhbzNWS1pialYxRHN3VjlLYW9iVk1ROEJDMjQvOVRueHhTY0J1Z0k9IiwiYmQtdGlja2V0LWd1YXJkLXdlYi12ZXJzaW9uIjoxfQ%3D%3D; bd_ticket_guard_client_web_domain=2
proxies:
http:

View file

@ -31,27 +31,25 @@
# - https://github.com/Johnserf-Seed
#
# ==============================================================================
import re
import asyncio
import json
import os
import random
import re
import time
import urllib
from pathlib import Path
from typing import Union
from urllib.parse import urlencode, quote
import execjs
import httpx
import qrcode
import random
import asyncio
import yaml
from typing import Union
from pathlib import Path
from crawlers.douyin.web.xbogus import XBogus as XB
from crawlers.douyin.web.abogus import ABogus as AB
from crawlers.utils.logger import logger
from crawlers.utils.utils import (
gen_random_str,
get_timestamp,
extract_valid_urls,
split_filename,
)
from crawlers.utils.api_exceptions import (
APIError,
APIConnectionError,
@ -60,11 +58,13 @@ from crawlers.utils.api_exceptions import (
APIUnauthorizedError,
APINotFoundError,
)
from crawlers.douyin.web.xbogus import XBogus as XB
from urllib.parse import quote
import os
from crawlers.utils.logger import logger
from crawlers.utils.utils import (
gen_random_str,
get_timestamp,
extract_valid_urls,
split_filename,
)
# Configuration file path
# Read the configuration file
@ -234,6 +234,8 @@ class VerifyFpManager:
class BogusManager:

    # Generate the X-Bogus parameter from an endpoint string
    @classmethod
    def xb_str_2_endpoint(cls, endpoint: str, user_agent: str) -> str:
        try:
@ -243,6 +245,7 @@ class BogusManager:
        return final_endpoint[0]

    # Generate the X-Bogus parameter from a params dict
    @classmethod
    def xb_model_2_endpoint(cls, base_endpoint: str, params: dict, user_agent: str) -> str:
        if not isinstance(params, dict):
@ -262,6 +265,44 @@ class BogusManager:
        return final_endpoint

    # Generate the A-Bogus parameter from an endpoint string
    # TODO: Testing is not finished; not submitted to the main branch for now.
    @classmethod
    def ab_str_2_endpoint_js_ver(cls, endpoint: str, user_agent: str) -> str:
        try:
            # Get the request parameters
            endpoint_query_params = urllib.parse.urlparse(endpoint).query
            # Locate the A-Bogus JS file
            js_path = os.path.dirname(os.path.abspath(__file__))
            a_bogus_js_path = os.path.join(js_path, 'a_bogus.js')
            with open(a_bogus_js_path, 'r', encoding='utf-8') as file:
                js_code = file.read()
            # A Node environment is required here
            # - Install Node.js
            # - Install the execjs library
            # - Install the NPM dependencies
            #   - npm install jsdom
            node_runtime = execjs.get('Node')
            context = node_runtime.compile(js_code)
            arg = [0, 1, 0, endpoint_query_params, "", user_agent]
            a_bougus = quote(context.call('get_a_bogus', arg), safe='')
            return a_bougus
        except Exception as e:
            raise RuntimeError("生成A-Bogus失败: {0})".format(e))

    # Generate the A-Bogus parameter from a params dict. Thanks to @JoeanAmier for the pure-Python version of the algorithm.
    @classmethod
    def ab_model_2_endpoint(cls, params: dict, user_agent: str) -> str:
        if not isinstance(params, dict):
            raise TypeError("参数必须是字典类型")
        try:
            ab_value = AB().get_value(params, user_agent)
        except Exception as e:
            raise RuntimeError("生成A-Bogus失败: {0})".format(e))
        return quote(ab_value, safe='')


class SecUserIdFetcher:
    # Precompiled regular expressions
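`ab_model_2_endpoint` wraps the generated value in `quote(..., safe='')`. One way to see why: the `s4` alphabet used by `abogus.py` contains `/`, and the result ends in `=` padding, both of which have special meaning in a query string. The sketch below is an illustration under stated assumptions, not the project's encoder. It treats the encoding as standard Base64 with a substituted alphabet, which only matches `generate_result` when every character code is below 256 (the original works on character codes that are not guaranteed to stay in that range). `encode_with_s4` is a hypothetical helper name.

```python
import base64
from urllib.parse import quote

STANDARD = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
# The "s4" alphabet from abogus.py
CUSTOM_S4 = "Dkdpgh2ZmsQB80/MfvV36XI1R45-WUAlEixNLwoqYTOPuzKFjJnry79HbGcaStCe"
TABLE = str.maketrans(STANDARD, CUSTOM_S4)


def encode_with_s4(data: bytes) -> str:
    """Base64-encode bytes, then swap each character into the s4 alphabet."""
    # The "=" padding is not part of the translation table, so it passes through unchanged.
    return base64.b64encode(data).decode("ascii").translate(TABLE)


raw = encode_with_s4(bytes([0x39, 0xB0, 0x00, 0xFA]))
safe = quote(raw, safe="")
print(raw)
print(safe)
```

With `safe=''`, `quote` also escapes `/`, which it would leave untouched by default, so the signature survives as a single query value.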

View file

@ -34,16 +34,23 @@
import asyncio  # Asynchronous I/O
import os  # OS operations
import time  # Time operations
from urllib.parse import urlencode, quote  # URL encoding
import httpx
import yaml  # Configuration file
import os  # OS operations
# Base crawler client and Douyin API endpoints
from crawlers.base_crawler import BaseCrawler
from crawlers.douyin.web.endpoints import DouyinAPIEndpoints
# Douyin API request models
from crawlers.douyin.web.models import (
    BaseRequestModel, LiveRoomRanking, PostComments,
    PostCommentsReply, PostDetail,
    UserProfile, UserCollection, UserLike, UserLive,
    UserLive2, UserMix, UserPost
)
# Douyin utility classes
from crawlers.douyin.web.utils import (AwemeIdFetcher,  # Aweme ID fetcher
                                       BogusManager,  # XBogus manager
@ -54,14 +61,6 @@ from crawlers.douyin.web.utils import (AwemeIdFetcher, # Aweme ID获取
                                       extract_valid_urls  # URL extraction
                                       )
# Douyin API request models
from crawlers.douyin.web.models import (
    BaseRequestModel, LiveRoomRanking, PostComments,
    PostCommentsReply, PostDanmaku, PostDetail,
    UserProfile, UserCollection, UserLike, UserLive,
    UserLive2, UserMix, UserPost
)
# Configuration file path
path = os.path.abspath(os.path.dirname(__file__))
@ -98,9 +97,17 @@ class DouyinWebCrawler:
            # Create the BaseModel parameters for a post detail request
            params = PostDetail(aweme_id=aweme_id)
            # Generate the post detail endpoint with the encrypted parameter
            endpoint = BogusManager.xb_model_2_endpoint(
                DouyinAPIEndpoints.POST_DETAIL, params.dict(), kwargs["headers"]["User-Agent"]
            )
            # 2024-06-12 22:41:44: X-Bogus signing no longer works, so it is no longer used; switched to the a_bogus parameter.
            # endpoint = BogusManager.xb_model_2_endpoint(
            #     DouyinAPIEndpoints.POST_DETAIL, params.dict(), kwargs["headers"]["User-Agent"]
            # )
            # Generate the post detail endpoint with the a_bogus parameter
            params_dict = params.dict()
            params_dict["msToken"] = ''
            a_bogus = BogusManager.ab_model_2_endpoint(params_dict, kwargs["headers"]["User-Agent"])
            endpoint = f"{DouyinAPIEndpoints.POST_DETAIL}?{urlencode(params_dict)}&a_bogus={a_bogus}"
            response = await crawler.fetch_get_json(endpoint)
        return response
@ -239,19 +246,6 @@ class DouyinWebCrawler:
    "-------------------------------------------------------utils接口列表-------------------------------------------------------"

    # Get the Douyin Web guest Cookie
    async def fetch_douyin_web_guest_cookie(self, user_agent: str):
        headers = {
            'User-Agent': user_agent,
            'Cookie': ''
        }
        async with httpx.AsyncClient() as client:
            domain = "https://beta.tikhub.io"
            uri = "/api/v1/douyin/web/fetch_douyin_web_guest_cookie"
            url = f"{domain}{uri}?user_agent={user_agent}"
            response = await client.get(url, headers=headers)
            return response.json().get("data")

    # Generate a real msToken
    async def gen_real_msToken(self, ):
        result = {
@ -290,6 +284,21 @@ class DouyinWebCrawler:
        }
        return result

    # Generate the A-Bogus parameter from an API URL
    async def get_a_bogus(self, url: str, user_agent: str):
        endpoint = url.split("?")[0]
        # Convert the URL parameters to a dict
        params = dict([i.split("=") for i in url.split("?")[1].split("&")])
        # Blank out the msToken parameter from the URL
        params["msToken"] = ""
        a_bogus = BogusManager.ab_model_2_endpoint(params, user_agent)
        result = {
            "url": f"{endpoint}?{urlencode(params)}&a_bogus={a_bogus}",
            "a_bogus": a_bogus,
            "user_agent": user_agent
        }
        return result

    # Extract a single user id
    async def get_sec_user_id(self, url: str):
        return await SecUserIdFetcher.get_sec_user_id(url)
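The flow of `get_a_bogus` (split the URL, blank out `msToken`, sign, reassemble) can be sketched with the standard library's query parser instead of manual `split("=")` calls, which raise an error when a value itself contains `=`. This is an illustration under stated assumptions: `build_signed_url` and `fake_sign` are hypothetical names, and `fake_sign` merely stands in for `BogusManager.ab_model_2_endpoint`. Note that `parse_qsl` percent-decodes values, which the code in the commit does not do.

```python
from typing import Callable
from urllib.parse import parse_qsl, urlencode, urlsplit


def build_signed_url(url: str, user_agent: str,
                     sign: Callable[[dict, str], str]) -> dict:
    """Rebuild an API URL with an empty msToken and an appended a_bogus value."""
    parts = urlsplit(url)
    endpoint = f"{parts.scheme}://{parts.netloc}{parts.path}"
    # keep_blank_values keeps parameters such as "msToken=" instead of dropping them.
    params = dict(parse_qsl(parts.query, keep_blank_values=True))
    params["msToken"] = ""
    a_bogus = sign(params, user_agent)
    return {
        "url": f"{endpoint}?{urlencode(params)}&a_bogus={a_bogus}",
        "a_bogus": a_bogus,
        "user_agent": user_agent,
    }


def fake_sign(params: dict, user_agent: str) -> str:
    # Placeholder signer for illustration only; the real value comes from the A-Bogus code.
    return "SIGNATURE"


result = build_signed_url(
    "https://www.douyin.com/aweme/v1/web/aweme/detail/?aid=6383&aweme_id=7345492945006595379&msToken=abc",
    "example-agent",
    fake_sign,
)
print(result["url"])
```

Passing the signer in as a callable keeps the URL handling testable without the signing code or a network connection.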

View file

@ -1,3 +1,36 @@
# ==============================================================================
# Copyright (C) 2021 Evil0ctal
#
# This file is part of the Douyin_TikTok_Download_API project.
#
# This project is licensed under the Apache License 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at:
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
#         __
#        />  フ
#       |  _  _ l
#       ` ミ_x
#      /      | Feed me Stars ⭐
#     /  ヽ   ノ
#     │  | | |
#  / ̄|   | | |
#  | ( ̄ヽ__ヽ_)__)
#  \二つ
# ==============================================================================
#
# Contributor Link:
# - https://github.com/Evil0ctal
#
# ==============================================================================
import asyncio
from crawlers.douyin.web.web_crawler import DouyinWebCrawler  # Import the Douyin Web crawler
@ -24,9 +57,10 @@ class HybridCrawler:
        elif "tiktok" in url:
            platform = "tiktok"
            aweme_id = await self.TikTokWebCrawler.get_aweme_id(url)
            data = await self.TikTokAPPCrawler.fetch_one_video(aweme_id)
            # $.aweme_type
            aweme_type = data.get("aweme_type")
            data = await self.TikTokWebCrawler.fetch_one_video(aweme_id)
            data = data.get("itemInfo").get("itemStruct")
            # $.imagePost exists if aweme_type is photo
            aweme_type = 150 if data.get("imagePost") else 1
        else:
            raise ValueError("hybrid_parsing_single_video: Cannot judge the video source from the URL.")
@ -124,14 +158,14 @@ class HybridCrawler:
            # TikTok video data processing
            if url_type == 'video':
                # Store information in a dictionary
                wm_video = data['video']['download_addr']['url_list'][0]
                wm_video = data['video']['downloadAddr']
                api_data = {
                    'video_data':
                        {
                            'wm_video_url': wm_video,
                            'wm_video_url_HQ': wm_video,
                            'nwm_video_url': data['video']['play_addr']['url_list'][0],
                            'nwm_video_url_HQ': data['video']['bit_rate'][0]['play_addr']['url_list'][0]
                            'nwm_video_url': data['video']['playAddr'],
                            'nwm_video_url_HQ': data['video']['bitrateInfo'][0]['PlayAddr']['UrlList'][0]
                        }
                }
            # TikTok image data processing
@ -140,9 +174,9 @@ class HybridCrawler:
                no_watermark_image_list = []
                # With watermark image list
                watermark_image_list = []
                for i in data['image_post_info']['images']:
                    no_watermark_image_list.append(i['display_image']['url_list'][0])
                    watermark_image_list.append(i['owner_watermark_image']['url_list'][0])
                for i in data['imagePost']['images']:
                    no_watermark_image_list.append(i['imageURL']['urlList'][0])
                    # watermark_image_list.append(i['owner_watermark_image']['url_list'][0])
                api_data = {
                    'image_data':
                        {
@ -158,6 +192,7 @@ class HybridCrawler:
        # Test hybrid parsing single video endpoint
        # url = "https://v.douyin.com/L4FJNR3/"
        url = "https://www.tiktok.com/@evil0ctal/video/7156033831819037994"
        # url = "https://www.tiktok.com/@minecraft/photo/7369296852669205791"
        minimal = True
        result = await self.hybrid_parsing_single_video(url, minimal=minimal)
        print(result)
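The switch from the TikTok APP response to the Web response changes which keys the hybrid parser reads. The sketch below restates that mapping as a small standalone function. It assumes the Web `itemStruct` has the shape the diff reads from (`imagePost.images[].imageURL.urlList`, `video.downloadAddr`, `video.playAddr`). `summarize_tiktok_item` is a hypothetical helper, and the sample dictionaries are made-up placeholders rather than real API output.

```python
def summarize_tiktok_item(item: dict) -> dict:
    """Classify a TikTok Web itemStruct and pull out the media URLs the diff uses."""
    if item.get("imagePost"):
        # Photo posts carry an imagePost object; the diff maps them to aweme_type 150.
        images = [img["imageURL"]["urlList"][0] for img in item["imagePost"]["images"]]
        return {"aweme_type": 150, "no_watermark_image_list": images}
    video = item["video"]
    # Everything else is treated as a video (aweme_type 1).
    return {
        "aweme_type": 1,
        "wm_video_url": video["downloadAddr"],
        "nwm_video_url": video["playAddr"],
    }


# Placeholder inputs shaped like the fields read in the diff (not real API responses).
photo_item = {"imagePost": {"images": [{"imageURL": {"urlList": ["https://example.com/1.jpg"]}}]}}
video_item = {"video": {"downloadAddr": "https://example.com/wm.mp4",
                        "playAddr": "https://example.com/nwm.mp4"}}

photo_summary = summarize_tiktok_item(photo_item)
video_summary = summarize_tiktok_item(video_item)
print(photo_summary)
print(video_summary)
```

Checking for the presence of `imagePost` replaces the old lookup of an `aweme_type` field, which the Web response handled here does not appear to provide.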

View file

@ -48,6 +48,9 @@ from crawlers.tiktok.app.models import (
    BaseRequestModel, FeedVideoDetail
)
# Marker for deprecated methods
from crawlers.utils.deprecated import deprecated
# Configuration file path
path = os.path.abspath(os.path.dirname(__file__))
@ -74,6 +77,7 @@ class TikTokAPPCrawler:
    """-------------------------------------------------------handler接口列表-------------------------------------------------------"""

    # Get data for a single post
    @deprecated("TikTok APP fetch_one_video is deprecated and will be removed in a future release. Use Web API instead. | TikTok APP fetch_one_video 已弃用将在将来的版本中删除。请改用Web API。")
    async def fetch_one_video(self, aweme_id: str):
        # Get TikTok's live Cookie
        kwargs = await self.get_tiktok_headers()
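The commit imports `deprecated` from `crawlers.utils.deprecated`, but that module is not part of the diff shown here. A decorator of this kind is commonly built on the standard `warnings` module; the sketch below is one possible shape, written as an assumption rather than a copy of the project's implementation.

```python
import asyncio
import functools
import warnings


def deprecated(message: str):
    """Return a decorator that emits a DeprecationWarning whenever the function is called."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # stacklevel=2 attributes the warning to the caller, not to this wrapper.
            warnings.warn(message, DeprecationWarning, stacklevel=2)
            # For an async function this returns the coroutine, which the caller awaits.
            return func(*args, **kwargs)
        return wrapper
    return decorator


@deprecated("fetch_one_video is deprecated; use the Web API instead.")
async def fetch_one_video(aweme_id: str) -> dict:
    # Stand-in body for illustration; the real method performs an API request.
    return {"aweme_id": aweme_id}


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = asyncio.run(fetch_one_video("7156033831819037994"))

print(result)
print([str(w.message) for w in caught])
```

Because the wrapper is a plain function, the warning is raised when the coroutine is created, which is early enough for callers and test runs to notice it.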

View file

@ -3,7 +3,7 @@ TokenManager:
headers:
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.212 Safari/537.36
Referer: https://www.tiktok.com/
Cookie: tt_csrf_token=YmksDB6a-h4cT2fF7JpORI2O9UBMCWjsntIc; ttwid=1%7C0FVb9fFc-sjDG2UdJwdC1AirqYozQ0xfbAS4N72vN2Y%7C1713886256%7C78a9d83445b82b73ca8d4e0cf024ea6cdf1329b7f3866c826b0a69a300ebce46; ak_bmsc=51B1D53481A3A4E4D0CEFF2BCF622DA2~000000000000000000000000000000~YAAQ7uIsF6c4j+SOAQAAANmUCxfRGVXZ4D9xnO97l1yDw0OWyomnVkNY7IUKaggUja0kQzFQ+WG4xaxBcPt0AN0n26KeHXGGKgHYpHPUMUBHGHQGDtE4RLyy7U+LPbSJCqVaSDiPuzxHht0YUIbWogvrFmBfkP4ohcmjkZxWtEI9qQ4Whaobb2CFHGdKNt0zlVNBjJQ3uYRAvUe12zSBynQB18y6QhE8goneRkCEw9VIeft2pFIwNQ8tkWWEjDt6wHNaqeND7eASg5WLzYskWbTt6bPAOhSNRLJ38HZrOB5QNg+xxN5uuCSYmjMXCl8SkvQr91pInmOng+V898FLLBQtefs95whvbpfE0mKwBk5Cz2TkkHcUJa/IoC0CLmNqoEk3AtKxpw/J; tt_chain_token=46Xkv2ukMzyJ2e7XU7y0AQ==; bm_sv=A2E67B998DE8E6A4F1C2C02485467446~YAAQ7uIsF6g4j+SOAQAABdqUCxf1J/K4dYG0k7bbw2m5rFujdlSqMoCKDubu4R602nFvbY6zWC5puJczBv3IXwJJRpQxxR03wDCMVlKTCqjQvgDs8BoCuoNQxfY2fdS+F3bKut2lxXPQ2qctqz4kHBrgspJArHn/zu/IuKCIeSzmV4KcyxW6Zvw3/xMRA0MeHgyuHsTRBS+VrFk8Ju2NbJWWC8uSHbLCM/dhFT7/ktw8RE30r24XpQmhLpVTsUSC~1; tiktok_webapp_theme=light; msToken=ySXERzKCE0QUG0cCg6TWLw3wfEB-6kh6kAfuzhzjcQvmV1jBFloSgIsT9xk-QXFVdI99U1Fqb9mhUpIOldoDkjdZwskB8rvt66MHZaHnvBRZRtOKtTYsWT8osDyQXDVZWdPkvyE598h9; passport_csrf_token=1a47d95ebf68fc3648b0018ee75afc9f; passport_csrf_token_default=1a47d95ebf68fc3648b0018ee75afc9f; perf_feed_cache={%22expireTimestamp%22:1714057200000%2C%22itemIds%22:[%227346425092966206766%22%2C%227353812964207594795%22%2C%227343343741916171563%22]}; msToken=yWwG-ITrCnjJbx5ltBa9FTXdCImOJrl-wtQJSQH3afeEumWZcbo_qcrF6F7-NjYcrG6JVxtJiOU208REZeCSgXEZrrs5_65K741fQ7PSzCGOhz6vUyycq3Xvj4Mu-S0kJ6SqyltHnpJp
Cookie: tt_csrf_token=bwnaRGd9-B-0ce8ntqw9jtGzAdvzTRKNpBl0; ak_bmsc=75A1956756DE42FD14ED069AAE7A8780~000000000000000000000000000000~YAAQXCw+F8jpmBGQAQAAIfGsFBj+ZEGzR/ZeiuPpMtItu0QQUQRmjBX2kADliy6QA9rZSfrxRUZc9zuRrI4/xbIrAwA/nkdguGpa+v3QSn/1sk5uP2aqLVm0eYB/SGNafa2h2QvIPbLNiSCRhgq1GalZJL4+udqDnyBRJWE74nin74bZwrVDvCX1s8M2hWqZ9/jTkdm4sfwON9MdJIEtjAPlddQ4gxoqjPoWhfnrm24dhPT4OjL1B8QP1mgurj7zJGspqD53VcjkAl65gHVxp3dwZ5WbPYpqrh9j8wo2u/Wh6uhX+0HWmkv5yVZyTyYQTl3/ilPp9G4CuIUi84gaPLjNYea9AEnphNX0ywzDa6/yegfqyE6r3wqBBDCrR1xRM98YEB4A5PV7pw==; tt_chain_token=ljZFLdRDfyfDflXMg5XGpg==; tiktok_webapp_theme_auto_dark_ab=1; tiktok_webapp_theme=dark; perf_feed_cache={%22expireTimestamp%22:1718503200000%2C%22itemIds%22:[%227348816520216186158%22%2C%227356022137678810410%22%2C%227349561209340857630%22]}; s_v_web_id=verify_lxe3l432_JnDE5WWo_URef_4WrS_88IM_fd1CqEXZs4dZ; passport_csrf_token=af197f073ed95f4dc2636f24d55566a6; passport_csrf_token_default=af197f073ed95f4dc2636f24d55566a6; ttwid=1%7CuNT4GcgvvOjH8rTETh9d9xti_QDJjlcnSK2V7djIpuc%7C1718333954%7Cf81b989a495aedff91302da4d0a3ab6055dea486fb203a4326b37d5a5346ad0c; msToken=1Mhpyi8MlaZjM6bbLDVUhCj_6C0kEO_1_Nb62ByXLg7wy_vLnBxdMFpKclhf4HYnEjCghk2Gq47ZM5jPj3L1yFxQUZvq4oPLo1b2Wfe_33RE94uIxdiR-eSueWbcYDDgOj1Pn9Wyid5Uf5fzBQ7xxFA=; bm_sv=9ADBA7BE06EC41817F117E2279F1410C~YAAQXCw+F8bsmBGQAQAAzSewFBg2fP3Zd0aky2x7S13D97O64xi8EXhoKORBnPQyCHlh0iSlh63FFjoy6peDWaF3lkWaTly3Z7I7WvWk1GCntnYzpJaSCE5EO2OL38zPWpHcgGWuekluvptHXsheedNEefN4SUHVMt4jJynWNeTKrao0RmNLkH4zGs7QO6+MPCt94QFvNfLjBRr0wVcXlN/hx9m6kcvCyzsBBqEnpugoYvZ0SMA+INsKI5PDfQz1~1; msToken=449_l3kdcLmnEHdDP0uACa5EcPVL1NbpjyVv8yah61EwxIPZRDlGwpGIkpIjH0Tk-CDtoKwFrDdP1v2AOpwmdoIz5oQzPEXCdyfGzcVXCHbwMX1fwPxMHpea5yFPUYEDlNWaCFlgLnejRdWeN5sB_lE=
proxies:
http:


@ -22,7 +22,7 @@ class BaseRequestModel(BaseModel):
)
channel: str = "tiktok_web"
cookie_enabled: str = "true"
device_id: int = 7349090360347690538
device_id: int = 7380187414842836523
device_platform: str = "web_pc"
focus_state: str = "true"
from_page: str = "user"


@ -430,15 +430,17 @@ class SecUserIdFetcher:
class AwemeIdFetcher:
# https://www.tiktok.com/@scarlettjonesuk/video/7255716763118226715
# https://www.tiktok.com/@scarlettjonesuk/video/7255716763118226715?is_from_webapp=1&sender_device=pc&web_id=7306060721837852167
# https://www.tiktok.com/@zoyapea5/photo/7370061866879454469
# Precompiled regular expressions
_TIKTOK_AWEMEID_PARREN = re.compile(r"video/(\d*)")
_TIKTOK_NOTFOUND_PARREN = re.compile(r"notfound")
_TIKTOK_AWEMEID_PATTERN = re.compile(r"video/(\d+)")
_TIKTOK_PHOTOID_PATTERN = re.compile(r"photo/(\d+)")
_TIKTOK_NOTFOUND_PATTERN = re.compile(r"notfound")
@classmethod
async def get_aweme_id(cls, url: str) -> str:
"""
Get the aweme_id of a TikTok post
Get the aweme_id or photo_id of a TikTok post
Args:
url: link to the post
Return:
@ -453,11 +455,27 @@ class AwemeIdFetcher:
url = extract_valid_urls(url)
if url is None:
raise (
APINotFoundError("The input URL is invalid. Class name: {0}".format(cls.__name__))
)
raise APINotFoundError("The input URL is invalid. Class name: {0}".format(cls.__name__))
transport = httpx.AsyncHTTPTransport(retries=5)
# Handle URLs that are not short links
if "tiktok" in url and "@" in url:
print(f"The input URL does not need a redirect: {url}")
video_match = cls._TIKTOK_AWEMEID_PATTERN.search(url)
photo_match = cls._TIKTOK_PHOTOID_PATTERN.search(url)
if not video_match and not photo_match:
raise APIResponseError("aweme_id or photo_id not found in the response")
aweme_id = video_match.group(1) if video_match else photo_match.group(1)
if aweme_id is None:
raise RuntimeError("Failed to get aweme_id or photo_id, {0}".format(url))
return aweme_id
# Handle short links: get the aweme_id from the redirected URL
print(f"The input URL needs a redirect: {url}")
transport = httpx.AsyncHTTPTransport(retries=10)
async with httpx.AsyncClient(
transport=transport, proxies=TokenManager.proxies, timeout=10
) as client:
@ -465,32 +483,28 @@ class AwemeIdFetcher:
response = await client.get(url, follow_redirects=True)
if response.status_code in {200, 444}:
if cls._TIKTOK_NOTFOUND_PARREN.search(str(response.url)):
if cls._TIKTOK_NOTFOUND_PATTERN.search(str(response.url)):
raise APINotFoundError("The page is unavailable, possibly due to regional restrictions (proxy). Class name: {0}"
.format(cls.__name__)
)
match = cls._TIKTOK_AWEMEID_PARREN.search(str(response.url))
if not match:
raise APIResponseError(
"{0} not found in the response".format("aweme_id")
)
video_match = cls._TIKTOK_AWEMEID_PATTERN.search(str(response.url))
photo_match = cls._TIKTOK_PHOTOID_PATTERN.search(str(response.url))
aweme_id = match.group(1)
if not video_match and not photo_match:
raise APIResponseError("aweme_id or photo_id not found in the response")
aweme_id = video_match.group(1) if video_match else photo_match.group(1)
if aweme_id is None:
raise RuntimeError(
"Failed to get {0}, {1}".format("aweme_id", response.url)
)
raise RuntimeError("Failed to get aweme_id or photo_id, {0}".format(response.url))
return aweme_id
else:
raise ConnectionError(
"Unexpected API status code {0}, please check and retry".format(response.status_code)
)
raise ConnectionError("Unexpected API status code {0}, please check and retry".format(response.status_code))
except httpx.RequestError as exc:
# 捕获所有与 httpx 请求相关的异常情况 (Captures all httpx request-related exceptions)
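# Catch all httpx request-related exceptions
raise APIConnectionError("Request to the endpoint failed, please check the current network environment. URL: {0}, proxy: {1}, exception class name: {2}, exception details: {3}"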
.format(url, TokenManager.proxies, cls.__name__, exc)
)
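
A minimal sketch of the ID extraction that the updated `AwemeIdFetcher` performs on a full (non-short) link. The two patterns are copied from this commit. `extract_item_id` is a hypothetical helper name used only for illustration, and it returns `None` rather than raising `APIResponseError` so that the snippet runs on its own.

```python
import re
from typing import Optional

# Same expressions as the patterns added in this commit.
VIDEO_PATTERN = re.compile(r"video/(\d+)")
PHOTO_PATTERN = re.compile(r"photo/(\d+)")


def extract_item_id(url: str) -> Optional[str]:
    """Return the numeric ID from a /video/ or /photo/ link, or None.

    Hypothetical helper: the real method raises instead of returning None.
    """
    # Try the video pattern first, then fall back to the photo pattern.
    match = VIDEO_PATTERN.search(url) or PHOTO_PATTERN.search(url)
    return match.group(1) if match else None


if __name__ == "__main__":
    print(extract_item_id("https://www.tiktok.com/@scarlettjonesuk/video/7255716763118226715"))
    print(extract_item_id("https://www.tiktok.com/@zoyapea5/photo/7370061866879454469"))
```

Because both patterns use `\d+`, a successful match always has a non-empty group, so a `None` check on the extracted ID is only a safety net.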


@ -343,9 +343,9 @@ class TikTokWebCrawler:
async def main(self):
# Fetch data for a single post
# item_id = "7339393672959757570"
# response = await self.fetch_one_video(item_id)
# print(response)
item_id = "7369296852669205791"
response = await self.fetch_one_video(item_id)
print(response)
# Fetch the user's profile information
# secUid = "MS4wLjABAAAAfDPs6wbpBcMMb85xkvDGdyyyVAUS2YoVCT9P6WQ1bpuwEuPhL9eFtTmGvxw1lT2C"


@ -0,0 +1,18 @@
import warnings
import functools


def deprecated(message):
    def decorator(func):
        @functools.wraps(func)
        async def wrapper(*args, **kwargs):
            warnings.warn(
                f"{func.__name__} is deprecated: {message}",
                DeprecationWarning,
                stacklevel=2
            )
            return await func(*args, **kwargs)

        return wrapper

    return decorator
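
A short usage sketch for this decorator, assuming it is only ever applied to coroutine functions (the wrapper is `async` and awaits the wrapped call). `fetch_legacy` and `demo` are hypothetical names for illustration. The decorator is repeated so that the snippet is self-contained.

```python
import asyncio
import functools
import warnings


def deprecated(message):
    # Same shape as the decorator in this commit: wraps coroutine functions only.
    def decorator(func):
        @functools.wraps(func)
        async def wrapper(*args, **kwargs):
            warnings.warn(
                f"{func.__name__} is deprecated: {message}",
                DeprecationWarning,
                stacklevel=2,
            )
            return await func(*args, **kwargs)

        return wrapper

    return decorator


@deprecated("Use the Web API instead.")
async def fetch_legacy(item_id: str) -> dict:
    # Hypothetical stand-in for a deprecated crawler method.
    return {"item_id": item_id}


async def demo():
    # Record warnings so they can be inspected instead of printed to stderr.
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        result = await fetch_legacy("123")
    return result, [str(w.message) for w in caught]


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Python's default filters hide `DeprecationWarning` in most contexts, so callers may need `warnings.simplefilter("always")` or the `-W` command-line option to see the message.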