diff --git a/README.md b/README.md index 5af6041..b393748 100644 --- a/README.md +++ b/README.md @@ -45,13 +45,38 @@ ## 🔊 V4 版本备注 - 感兴趣一起写这个项目的给请加微信`Evil0ctal`备注github项目重构,大家可以在群里互相交流学习,不允许发广告以及违法的东西,纯粹交朋友和技术交流。 -- 本项目使用的`X-Bogus`算法依旧可以正常调用Douyin以及TikTok的API,`A-Bogus`算法暂时不会开源。 +- 本项目使用`X-Bogus`算法以及`A_Bogus`算法请求抖音和TikTok的Web API。 - 由于Douyin的风控,部署完本项目后请在**浏览器中获取Douyin网站的Cookie然后在config.yaml中进行替换。** - 请在提出issue之前先阅读下方的文档,大多数问题的解决方法都会包含在文档中。 - 本项目是完全免费的,但使用时请遵守:[Apache-2.0 license](https://github.com/Evil0ctal/Douyin_TikTok_Download_API?tab=Apache-2.0-1-ov-file#readme) -- 本项目有一个闭源的分支版本,包含更多的接口和服务,详情请查看下方的信息。 + +## 🔖TikHub.io API + +[TikHub.io](https://beta-web.tikhub.io/en-us/users/signin)是一个API平台,提供包括Douyin、TikTok在内的各种公开数据接口,如果您想支持 [Douyin_TikTok_Download_API](https://github.com/Evil0ctal/Douyin_TikTok_Download_API) 项目的开发,我们强烈建议您选择[TikHub.io](https://beta-web.tikhub.io/en-us/users/signin)。 + +#### 特点: + +> 📦 开箱即用 + +省去繁琐的使用流程,使用封装好的SDK快速进行开发,让调用变得更简单,所有API接口都按照OpenAPI规范进行编写,并且附带示例参数。 + +> 💰 成本优势 + +不预设套餐限制,没有月度使用门槛,所有消费按实际使用量即时计费,并且根据用户每日的请求量进行阶梯式计费,同时可以通过每日签到在用户后台进行签到获取免费的额度,并且这些免费额度不会过期。 + +> ⚡️ 快速支持 + +我们有一个庞大的Discord社区服务器,管理员和其他用户会在服务器中快速的回复你,帮助你快速解决当前的问题。 + +> 🎉 拥抱开源 + +TikHub的部分源代码会开源在Github上,并且会赞助一些开源项目的作者。 + +#### 链接: + - Discord: [TikHub Discord](https://discord.com/invite/aMEAS8Xsvz) -- Free Douyin/TikTok API: [TikHub Beta API](https://beta.tikhub.io/) +- Register: [TikHub signup](https://beta-web.tikhub.io/en-us/users/signup) +- API Docs: [TikHub API Docs](https://api.tikhub.io/) ## 🖥演示站点: 我很脆弱...请勿压测(·•᷄ࡇ•᷅ ) @@ -95,25 +120,25 @@ ``` ./Douyin_TikTok_Download_API - ├─app - │ ├─api - │ │ ├─endpoints - │ │ └─models - │ ├─download - │ └─web - │ └─views - └─crawlers - ├─douyin - │ └─web - ├─hybrid - ├─tiktok - │ ├─app - │ └─web - └─utils +├─app +│ ├─api +│ │ ├─endpoints +│ │ └─models +│ ├─download +│ └─web +│ └─views +└─crawlers +├─douyin +│ └─web +├─hybrid +├─tiktok +│ ├─app +│ └─web +└─utils ``` ## ✨支持功能: - + - 网页端批量解析(支持抖音/TikTok混合解析) - 在线下载视频或图集。 - 
制作[pip包](https://pypi.org/project/douyin-tiktok-scraper/)方便快速导入你的项目 @@ -121,6 +146,7 @@ - 完善的API文档([Demo/演示](https://api.douyin.wtf/docs)) - 丰富的API接口: - 抖音网页版API + - [x] 视频数据解析 - [x] 获取用户主页作品数据 - [x] 获取用户主页喜欢作品数据 @@ -136,14 +162,15 @@ - [x] 生成verify_fp - [x] 生成s_v_web_id - [x] 使用接口网址生成X-Bogus参数 + - [x] 使用接口网址生成A_Bogus参数 - [x] 提取单个用户id - [x] 提取列表用户id - [x] 提取单个作品id - [x] 提取列表作品id - [x] 提取列表直播间号 - [x] 提取列表直播间号 - - TikTok网页版API + - [x] 视频数据解析 - [x] 获取用户主页作品数据 - [x] 获取用户主页喜欢作品数据 @@ -165,7 +192,6 @@ - [x] 获取用户unique_id - [x] 获取列表unique_id - --- ## 📦调用解析库(已废弃需要更新): @@ -257,16 +283,16 @@ https://www.tiktok.com/@evil0ctal/video/7156033831819037994 ***更多演示请查看文档内容......*** - ## ⚠️部署前的准备工作(请仔细阅读): - 你需要自行解决爬虫Cookie风控问题,否则可能会导致接口无法使用。 - - 抖音网页端Cookie(自行获取并替换下面配置文件中的Cookie): + - 抖音网页端Cookie(自行获取并替换下面配置文件中的Cookie): - https://github.com/Evil0ctal/Douyin_TikTok_Download_API/blob/30e56e5a7f97f87d60b1045befb1f6db147f8590/crawlers/douyin/web/config.yaml#L7 - TikTok网页端Cookie(自行获取并替换下面配置文件中的Cookie): - https://github.com/Evil0ctal/Douyin_TikTok_Download_API/blob/30e56e5a7f97f87d60b1045befb1f6db147f8590/crawlers/tiktok/web/config.yaml#L6 - 演示站点的在线下载功能被我关掉了,有人下的视频巨大无比直接给我服务器干崩了,你可以在网页解析结果页面右键保存视频... 
- 演示站点的Cookie是我自己的,不保证长期有效,只起到演示作用,自己部署的话请自行获取Cookie。 +- Video links returned by the TikTok Web API produce an HTTP 403 error when accessed directly. Use this project's `/api/download` endpoint to download TikTok videos; it has been manually disabled on the demo site, so you need to deploy the project yourself. - 这里有一个**视频教程**可以参考:***[https://www.bilibili.com/video/BV1vE421j7NR/](https://www.bilibili.com/video/BV1vE421j7NR/)*** ## 💻部署(方式一 Linux) @@ -387,7 +413,7 @@ docker run -d --name douyin_tiktok_api -p 80:80 \ docker stop douyin_tiktok_api # Remove -docker rm douyin_tiktok_api +docker rm douyin_tiktok_api ``` ## 📸截图 diff --git a/app/api/endpoints/douyin_web.py b/app/api/endpoints/douyin_web.py index 61de621..9b884ee 100644 --- a/app/api/endpoints/douyin_web.py +++ b/app/api/endpoints/douyin_web.py @@ -734,6 +734,48 @@ async def generate_x_bogus(request: Request, raise HTTPException(status_code=status_code, detail=detail.dict()) +# Generate the A-Bogus parameter from an API URL +@router.get("/generate_a_bogus", + response_model=ResponseModel, + summary="使用接口网址生成A-Bogus参数/Generate A-Bogus parameter using API URL") +async def generate_a_bogus(request: Request, + url: str = Query( + example="https://www.douyin.com/aweme/v1/web/aweme/detail/?device_platform=webapp&aid=6383&channel=channel_pc_web&pc_client_type=1&version_code=190500&version_name=19.5.0&cookie_enabled=true&browser_language=zh-CN&browser_platform=Win32&browser_name=Firefox&browser_online=true&engine_name=Gecko&os_name=Windows&os_version=10&platform=PC&screen_width=1920&screen_height=1080&browser_version=124.0&engine_version=122.0.0.0&cpu_core_num=12&device_memory=8&aweme_id=7345492945006595379"), + user_agent: str = Query( + example="Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.212 Safari/537.36")): + """ + # [中文] + ### 用途: + - 使用接口网址生成A-Bogus参数 + ### 参数: + - url: 接口网址 + - user_agent: 用户代理,暂时不支持自定义,直接使用默认值即可。 + + # [English] + ### Purpose: + - Generate the A-Bogus parameter from an API URL + ### Parameters: + - url: API URL + - user_agent: User agent. Custom values are not supported yet; use the default value. 
+ + # [示例/Example] + url = "https://www.douyin.com/aweme/v1/web/aweme/detail/?device_platform=webapp&aid=6383&channel=channel_pc_web&pc_client_type=1&version_code=190500&version_name=19.5.0&cookie_enabled=true&browser_language=zh-CN&browser_platform=Win32&browser_name=Firefox&browser_online=true&engine_name=Gecko&os_name=Windows&os_version=10&platform=PC&screen_width=1920&screen_height=1080&browser_version=124.0&engine_version=122.0.0.0&cpu_core_num=12&device_memory=8&aweme_id=7345492945006595379" + user_agent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.212 Safari/537.36" + """ + try: + a_bogus = await DouyinWebCrawler.get_a_bogus(url, user_agent) + return ResponseModel(code=200, + router=request.url.path, + data=a_bogus) + except Exception as e: + status_code = 400 + detail = ErrorResponseModel(code=status_code, + router=request.url.path, + params=dict(request.query_params), + ) + raise HTTPException(status_code=status_code, detail=detail.dict()) + + # 提取单个用户id @router.get("/get_sec_user_id", response_model=ResponseModel, diff --git a/app/api/endpoints/download.py b/app/api/endpoints/download.py index 758ba8c..e58225a 100644 --- a/app/api/endpoints/download.py +++ b/app/api/endpoints/download.py @@ -19,10 +19,10 @@ with open(config_path, 'r', encoding='utf-8') as file: config = yaml.safe_load(file) -async def fetch_data(url: str): +async def fetch_data(url: str, headers: dict = None): headers = { 'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36' - } + } if headers is None else headers.get('headers') async with httpx.AsyncClient() as client: response = await client.get(url, headers=headers) response.raise_for_status() # 确保响应是成功的 @@ -68,7 +68,7 @@ async def download_file_hybrid(request: Request, return FileResponse(path=file_path, media_type='video/mp4', filename=file_name) # 获取视频文件 - response = await fetch_data(url) + response = 
await fetch_data(url) if platform == 'douyin' else await fetch_data(url, headers=await HybridCrawler.TikTokWebCrawler.get_tiktok_headers()) # 保存文件 async with aiofiles.open(file_path, 'wb') as out_file: @@ -115,6 +115,7 @@ async def download_file_hybrid(request: Request, # 异常处理/Exception handling except Exception as e: + print(e) code = 400 return ErrorResponseModel(code=code, message=str(e), router=request.url.path, params=dict(request.query_params)) diff --git a/app/main.py b/app/main.py index da03f3a..38b6415 100644 --- a/app/main.py +++ b/app/main.py @@ -103,7 +103,7 @@ description = f""" #### 备注 - 本项目仅供学习交流使用,不得用于违法用途,否则后果自负。 - 如果你不想自己部署,可以直接使用我们的在线API服务:[Douyin_TikTok_Download_API](https://douyin.wtf/docs) -- 如果你需要更稳定以及更多功能的API服务,可以使用付费API服务:[TikHub API](https://beta.tikhub.io/) +- 如果你需要更稳定以及更多功能的API服务,可以使用付费API服务:[TikHub API](https://api.tikhub.io/) ### [English] @@ -116,7 +116,7 @@ description = f""" #### Note - This project is for learning and communication only, and shall not be used for illegal purposes, otherwise the consequences shall be borne by yourself. 
- If you do not want to deploy it yourself, you can directly use our online API service: [Douyin_TikTok_Download_API](https://douyin.wtf/docs) -- If you need a more stable and feature-rich API service, you can use the paid API service: [TikHub API](https://beta.tikhub.io) +- If you need a more stable and feature-rich API service, you can use the paid API service: [TikHub API](https://api.tikhub.io) """ docs_url = config['API']['Docs_URL'] diff --git a/config.yaml b/config.yaml index fef8f7c..1e75415 100644 --- a/config.yaml +++ b/config.yaml @@ -30,8 +30,8 @@ API: Redoc_URL: /redoc # API documentation URL | API文档URL # API Information - Version: V4.0.0 # API version | API版本 - Update_Time: 2024/04/22 # API update time | API更新时间 + Version: V4.0.2 # API version | API版本 + Update_Time: 2024/06/14 # API update time | API更新时间 Environment: Demo # API environment | API环境 # Download Configuration diff --git a/crawlers/douyin/web/abogus.py b/crawlers/douyin/web/abogus.py new file mode 100644 index 0000000..a19d348 --- /dev/null +++ b/crawlers/douyin/web/abogus.py @@ -0,0 +1,559 @@ +""" +Original Author: +This file is from https://github.com/JoeanAmier/TikTokDownloader +And is licensed under the GNU General Public License v3.0 +If you use this code, please keep this license and the original author information. + +Modified by: +And this file is now a part of the https://github.com/Evil0ctal/Douyin_TikTok_Download_API open-source project. +This project is licensed under the Apache License 2.0, and the original author information is kept. + +Purpose: +This file is used to generate the `a_bogus` parameter for the Douyin Web API. + +Changes Made: +1. 
Changed the ua_code to compatible with the current config file User-Agent string in https://github.com/Evil0ctal/Douyin_TikTok_Download_API/blob/main/crawlers/douyin/web/config.yaml +""" + +from random import randint +from random import random +from re import compile +from time import time +from urllib.parse import urlencode, quote + + +class ABogus: + __filter = compile(r'%([0-9A-F]{2})') + __arguments = [0, 1, 14] + __end_string = "cus" + __version = [1, 0, 1, 5] + __env = [ + 49, + 53, + 51, + 54, + 124, + 55, + 52, + 50, + 124, + 49, + 53, + 51, + 54, + 124, + 56, + 54, + 52, + 124, + 48, + 124, + 48, + 124, + 48, + 124, + 48, + 124, + 49, + 53, + 51, + 54, + 124, + 56, + 54, + 52, + 124, + 49, + 53, + 51, + 54, + 124, + 56, + 54, + 52, + 124, + 49, + 53, + 51, + 54, + 124, + 55, + 52, + 50, + 124, + 50, + 52, + 124, + 50, + 52, + 124, + 87, + 105, + 110, + 51, + 50] + __reg = [ + 1937774191, + 1226093241, + 388252375, + 3666478592, + 2842636476, + 372324522, + 3817729613, + 2969243214, + ] + __str = { + "s0": "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/=", + "s1": "Dkdpgh4ZKsQB80/Mfvw36XI1R25+WUAlEi7NLboqYTOPuzmFjJnryx9HVGcaStCe=", + "s2": "Dkdpgh4ZKsQB80/Mfvw36XI1R25-WUAlEi7NLboqYTOPuzmFjJnryx9HVGcaStCe=", + "s3": "ckdp1h4ZKsUB80/Mfvw36XIgR25+WQAlEi7NLboqYTOPuzmFjJnryx9HVGDaStCe", + "s4": "Dkdpgh2ZmsQB80/MfvV36XI1R45-WUAlEixNLwoqYTOPuzKFjJnry79HbGcaStCe"} + + def __init__(self, ): + self.chunk = [] + self.size = 0 + self.reg = self.__reg[:] + + @classmethod + def list_1(cls, random_num=None, a=170, b=85, c=45, ) -> list: + return cls.random_list( + random_num, + a, + b, + 1, + 2, + 5, + c & a, + ) + + @classmethod + def list_2(cls, random_num=None, a=170, b=85, ) -> list: + return cls.random_list( + random_num, + a, + b, + 1, + 0, + 0, + 0, + ) + + @classmethod + def list_3(cls, random_num=None, a=170, b=85, ) -> list: + return cls.random_list( + random_num, + a, + b, + 1, + 0, + 5, + 0, + ) + + @staticmethod + def random_list( + a: float 
= None, + b=170, + c=85, + d=0, + e=0, + f=0, + g=0, + ) -> list: + r = a or (random() * 10000) + v = [ + r, + int(r) & 255, + int(r) >> 8, + ] + s = v[1] & b | d + v.append(s) + s = v[1] & c | e + v.append(s) + s = v[2] & b | f + v.append(s) + s = v[2] & c | g + v.append(s) + return v[-4:] + + @staticmethod + def from_char_code(*args): + return "".join(chr(code) for code in args) + + @classmethod + def generate_string_1( + cls, + random_num_1=None, + random_num_2=None, + random_num_3=None, + ): + return cls.from_char_code(*cls.list_1(random_num_1)) + cls.from_char_code( + *cls.list_2(random_num_2)) + cls.from_char_code(*cls.list_3(random_num_3)) + + def generate_string_2( + self, + url_params: str, + user_agent: str, + start_time=0, + end_time=0, + ) -> str: + a = self.generate_string_2_list( + url_params, + user_agent, + start_time, + end_time, + ) + e = self.end_check_num(a) + a.extend(self.__env) + a.append(e) + return self.rc4_encrypt(self.from_char_code(*a), "y") + + def generate_string_2_list( + self, + url_params: str, + user_agent: str, + start_time=0, + end_time=0, + ) -> list: + start_time = start_time or int(time() * 1000) + end_time = end_time or (start_time + randint(4, 8)) + params_array = self.sum(self.sum(url_params)) + # TODO: 需要编写一个函数来生成ua_code 2024年6月13日17:13:08 + # Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.212 Safari/537.36 + ua_code = [76, 98, 15, 131, 97, 245, 224, 133, 122, 199, 241, 166, 79, 34, 90, 191, 128, 126, 122, 98, 66, 11, 14, 40, 49, 110, 110, 173, 67, 96, 138, 252] + return self.list_4( + (end_time >> 24) & 255, + params_array[21], + ua_code[23], + (end_time >> 16) & 255, + params_array[22], + ua_code[24], + (end_time >> 8) & 255, + (end_time >> 0) & 255, + (start_time >> 24) & 255, + (start_time >> 16) & 255, + (start_time >> 8) & 255, + (start_time >> 0) & 255, + ) + + @staticmethod + def reg_to_array(a): + o = [0] * 32 + for i in range(8): + c = a[i] + o[4 * i + 3] = (255 
& c) + c >>= 8 + o[4 * i + 2] = (255 & c) + c >>= 8 + o[4 * i + 1] = (255 & c) + c >>= 8 + o[4 * i] = (255 & c) + + return o + + def compress(self, a): + f = self.generate_f(a) + i = self.reg[:] + for o in range(64): + c = self.de(i[0], 12) + i[4] + self.de(self.pe(o), o) + c = (c & 0xFFFFFFFF) + c = self.de(c, 7) + s = (c ^ self.de(i[0], 12)) & 0xFFFFFFFF + + u = self.he(o, i[0], i[1], i[2]) + u = (u + i[3] + s + f[o + 68]) & 0xFFFFFFFF + + b = self.ve(o, i[4], i[5], i[6]) + b = (b + i[7] + c + f[o]) & 0xFFFFFFFF + + i[3] = i[2] + i[2] = self.de(i[1], 9) + i[1] = i[0] + i[0] = u + + i[7] = i[6] + i[6] = self.de(i[5], 19) + i[5] = i[4] + i[4] = (b ^ self.de(b, 9) ^ self.de(b, 17)) & 0xFFFFFFFF + + for l in range(8): + self.reg[l] = (self.reg[l] ^ i[l]) & 0xFFFFFFFF + + @classmethod + def generate_f(cls, e): + r = [0] * 132 + + for t in range(16): + r[t] = (e[4 * t] << 24) | (e[4 * t + 1] << + 16) | (e[4 * t + 2] << 8) | e[4 * t + 3] + r[t] &= 0xFFFFFFFF + + for n in range(16, 68): + a = r[n - 16] ^ r[n - 9] ^ cls.de(r[n - 3], 15) + a = a ^ cls.de(a, 15) ^ cls.de(a, 23) + r[n] = (a ^ cls.de(r[n - 13], 7) ^ r[n - 6]) & 0xFFFFFFFF + + for n in range(68, 132): + r[n] = (r[n - 68] ^ r[n - 64]) & 0xFFFFFFFF + + return r + + @staticmethod + def pad_array(arr, length=60): + while len(arr) < length: + arr.append(0) + return arr + + def fill(self, length=60): + size = 8 * self.size + self.chunk.append(128) + self.chunk = self.pad_array(self.chunk, length) + for i in range(4): + self.chunk.append((size >> 8 * (3 - i)) & 255) + + @staticmethod + def list_4( + a: int, + b: int, + c: int, + d: int, + e: int, + f: int, + g: int, + h: int, + i: int, + j: int, + k: int, + m: int, + ) -> list: + return [ + 44, + a, + 0, + 0, + 0, + 0, + 24, + b, + 58, + 0, + c, + d, + 0, + 24, + 97, + 1, + 0, + 239, + e, + 51, + f, + g, + 0, + 0, + 0, + 0, + h, + 0, + 0, + 14, + i, + j, + 0, + k, + m, + 3, + 399, + 1, + 399, + 1, + 64, + 0, + 0, + 0] + + @staticmethod + def end_check_num(a: list): + 
r = 0 + for i in a: + r ^= i + return r + + @classmethod + def decode_string(cls, url_string, ): + decoded = cls.__filter.sub(cls.replace_func, url_string) + return decoded + + @staticmethod + def replace_func(match): + return chr(int(match.group(1), 16)) + + @staticmethod + def de(e, r): + r %= 32 + return ((e << r) & 0xFFFFFFFF) | (e >> (32 - r)) + + @staticmethod + def pe(e): + return 2043430169 if 0 <= e < 16 else 2055708042 + + @staticmethod + def he(e, r, t, n): + if 0 <= e < 16: + return (r ^ t ^ n) & 0xFFFFFFFF + elif 16 <= e < 64: + return (r & t | r & n | t & n) & 0xFFFFFFFF + raise ValueError + + @staticmethod + def ve(e, r, t, n): + if 0 <= e < 16: + return (r ^ t ^ n) & 0xFFFFFFFF + elif 16 <= e < 64: + return (r & t | ~r & n) & 0xFFFFFFFF + raise ValueError + + @staticmethod + def convert_to_char_code(a): + d = [] + for i in a: + d.append(ord(i)) + return d + + @staticmethod + def split_array(arr, chunk_size=64): + result = [] + for i in range(0, len(arr), chunk_size): + result.append(arr[i:i + chunk_size]) + return result + + @staticmethod + def char_code_at(s): + return [ord(char) for char in s] + + def write(self, e, ): + if isinstance(e, str): + e = self.decode_string(e + self.__end_string) + e = self.char_code_at(e) + self.size = len(e) + if len(e) <= 64: + self.chunk = e + else: + chunks = self.split_array(e, 64) + for i in chunks[:-1]: + self.compress(i) + self.chunk = chunks[-1] + + def reset(self, ): + self.chunk = [] + self.size = 0 + self.reg = self.__reg[:] + + def sum(self, e, length=60): + self.reset() + self.write(e) + self.fill(length) + self.compress(self.chunk) + a = self.reg_to_array(self.reg) + self.reset() + return a + + @classmethod + def generate_result_unit(cls, n, s): + r = "" + for i, j in zip(range(18, -1, -6), (16515072, 258048, 4032, 63)): + r += cls.__str[s][(n & j) >> i] + return r + + @classmethod + def generate_result_end(cls, s, e="s4"): + r = "" + b = ord(s[120]) << 16 + r += cls.__str[e][(b & 16515072) >> 18] + r += 
cls.__str[e][(b & 258048) >> 12] + r += "==" + return r + + @classmethod + def generate_result(cls, s, n, e="s4"): + r = "" + for i in range(n): + b = ((ord(s[i * 3]) << 16) | (ord(s[i * 3 + 1])) + << 8) | ord(s[i * 3 + 2]) + r += cls.generate_result_unit(b, e) + return r + + @classmethod + def generate_args_code(cls): + a = [] + for j in range(24, -1, -8): + a.append(cls.__arguments[0] >> j) + a.append(cls.__arguments[1] / 256) + a.append(cls.__arguments[1] % 256) + a.append(cls.__arguments[1] >> 24) + a.append(cls.__arguments[1] >> 16) + for j in range(24, -1, -8): + a.append(cls.__arguments[2] >> j) + return [int(i) & 255 for i in a] + + @staticmethod + def rc4_encrypt(plaintext, key): + s = list(range(256)) + j = 0 + + # Key Scheduling Algorithm (KSA) + for i in range(256): + j = (j + s[i] + ord(key[i % len(key)])) % 256 + s[i], s[j] = s[j], s[i] + + i = 0 + j = 0 + cipher = [] + + # Pseudo-Random Generation Algorithm (PRGA) + for k in range(len(plaintext)): + i = (i + 1) % 256 + j = (j + s[i]) % 256 + s[i], s[j] = s[j], s[i] + t = (s[i] + s[j]) % 256 + cipher.append(chr(s[t] ^ ord(plaintext[k]))) + + return ''.join(cipher) + + def get_value(self, + url_params: dict, + user_agent: str, + start_time=0, + end_time=0, + random_num_1=None, + random_num_2=None, + random_num_3=None, + ) -> str: + string_1 = self.generate_string_1( + random_num_1, + random_num_2, + random_num_3, + ) + string_2 = self.generate_string_2( + urlencode(url_params), + user_agent, + start_time, + end_time, + ) + string = string_1 + string_2 + return self.generate_result( + string, 40, "s4") + self.generate_result_end(string, "s4") + + +if __name__ == "__main__": + bogus = ABogus() + USERAGENT = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.212 Safari/537.36" + url_str = 
"https://www.douyin.com/aweme/v1/web/aweme/detail/?device_platform=webapp&aid=6383&channel=channel_pc_web&pc_client_type=1&version_code=190500&version_name=19.5.0&cookie_enabled=true&browser_language=zh-CN&browser_platform=Win32&browser_name=Firefox&browser_online=true&engine_name=Gecko&os_name=Windows&os_version=10&platform=PC&screen_width=1920&screen_height=1080&browser_version=124.0&engine_version=122.0.0.0&cpu_core_num=12&device_memory=8&aweme_id=7345492945006595379" + # 将url参数转换为字典 + url_params = dict([param.split("=") for param in url_str.split("?")[1].split("&")]) + print(f"URL参数: {url_params}") + a_bogus = bogus.get_value(url_params, USERAGENT) + # 使用url编码a_bogus + a_bogus = quote(a_bogus, safe='') + print(a_bogus) + print(USERAGENT) diff --git a/crawlers/douyin/web/config.yaml b/crawlers/douyin/web/config.yaml index 790b96a..00ea12c 100644 --- a/crawlers/douyin/web/config.yaml +++ b/crawlers/douyin/web/config.yaml @@ -4,7 +4,7 @@ TokenManager: Accept-Language: zh-CN,zh;q=0.8,zh-TW;q=0.7,zh-HK;q=0.5,en-US;q=0.3,en;q=0.2 User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.212 Safari/537.36 Referer: https://www.douyin.com/ - Cookie: odin_tt=deb76f54241001639f1ebbb3bbdd3637c52604632821dea7f6413b1d0527957d;passport_fe_beating_status=false;sid_guard=c7845c8f01865cc93dcee7b32f8e64a3%7C1715033646%7C21600%7CTue%2C+07-May-2024+04%3A14%3A06+GMT;uid_tt=3a85f4bd9ba5573dcf39917c95135faa;uid_tt_ss=3a85f4bd9ba5573dcf39917c95135faa;sid_tt=c7845c8f01865cc93dcee7b32f8e64a3;sessionid=c7845c8f01865cc93dcee7b32f8e64a3;sessionid_ss=c7845c8f01865cc93dcee7b32f8e64a3;sid_ucp_v1=1.0.0-KDVlNDc1Y2VjOTU3NzFhM2E1M2UyMWExMmQ2OTJhYjNhYzk3YzQ3MGQKCBCurOWxBhgNGgJsZiIgYzc4NDVjOGYwMTg2NWNjOTNkY2VlN2IzMmY4ZTY0YTM;ssid_ucp_v1=1.0.0-KDVlNDc1Y2VjOTU3NzFhM2E1M2UyMWExMmQ2OTJhYjNhYzk3YzQ3MGQKCBCurOWxBhgNGgJsZiIgYzc4NDVjOGYwMTg2NWNjOTNkY2VlN2IzMmY4ZTY0YTM;passport_assist_user=; 
ttwid=1%7CbfT5_gVNmSYDxhSIwlPZJhBGSdN6dx98CLMd336o8Cs%7C1715033645%7Ceefdce4479938326bd878311d974fe92c6a0d014b89345b3687ead20e6e68b53 + Cookie: __ac_nonce=0666b92b000a2c224ac28; __ac_signature=_02B4Z6wo00f01cJo1cwAAIDC-hz88a728VnCWdFAABbzbc; ttwid=1%7C3mHLmtqu19mj4mwynGHoMV69QN2dnPid7GkoF6qMGxg%7C1718325937%7C1175da4da9c5aedc0f298981771e3ceb96bb26b590d93d0c23eaf0bb5ecd2d25; douyin.com; device_web_cpu_core=16; device_web_memory_size=-1; architecture=amd64; IsDouyinActive=true; home_can_add_dy_2_desktop=%220%22; dy_swidth=1835; dy_sheight=1147; stream_recommend_feed_params=%22%7B%5C%22cookie_enabled%5C%22%3Atrue%2C%5C%22screen_width%5C%22%3A1835%2C%5C%22screen_height%5C%22%3A1147%2C%5C%22browser_online%5C%22%3Atrue%2C%5C%22cpu_core_num%5C%22%3A16%2C%5C%22device_memory%5C%22%3A0%2C%5C%22downlink%5C%22%3A%5C%22%5C%22%2C%5C%22effective_type%5C%22%3A%5C%22%5C%22%2C%5C%22round_trip_time%5C%22%3A0%7D%22; strategyABtestKey=%221718325939.224%22; volume_info=%7B%22isUserMute%22%3Afalse%2C%22isMute%22%3Atrue%2C%22volume%22%3A0.5%7D; stream_player_status_params=%22%7B%5C%22is_auto_play%5C%22%3A0%2C%5C%22is_full_screen%5C%22%3A0%2C%5C%22is_full_webscreen%5C%22%3A0%2C%5C%22is_mute%5C%22%3A1%2C%5C%22is_speed%5C%22%3A1%2C%5C%22is_visible%5C%22%3A1%7D%22; xgplayer_user_id=778628299652; csrf_session_id=120d8aacffb06addd01cb40859003c8e; passport_csrf_token=6f9c9a1bc411c0e6b5c8e5bee6622f91; passport_csrf_token_default=6f9c9a1bc411c0e6b5c8e5bee6622f91; s_v_web_id=verify_lxdywd34_SU6sqPg8_fjkN_4ldR_BMvz_wvgDZPXkm5fY; msToken=y09BW1cI9bHiuOMAYN0mqoVkihUmHlKs_YaKQdTxtBCekbSed8UidXPK74QjPNgszAmYDSKy5aF1ns1f3L5GazwXUISTHgj2x9Bne9p2; FORCE_LOGIN=%7B%22videoConsumedRemainSeconds%22%3A180%7D; xg_device_score=Infinity; 
bd_ticket_guard_client_data=eyJiZC10aWNrZXQtZ3VhcmQtdmVyc2lvbiI6MiwiYmQtdGlja2V0LWd1YXJkLWl0ZXJhdGlvbi12ZXJzaW9uIjoxLCJiZC10aWNrZXQtZ3VhcmQtcmVlLXB1YmxpYy1rZXkiOiJCTEhjWkJWemp2MUZRbXY2ZHY5dmtGcVN2eHlqa2ZVZU1laXVtaTRzblh5T2VNSHdhbzNWS1pialYxRHN3VjlLYW9iVk1ROEJDMjQvOVRueHhTY0J1Z0k9IiwiYmQtdGlja2V0LWd1YXJkLXdlYi12ZXJzaW9uIjoxfQ%3D%3D; bd_ticket_guard_client_web_domain=2 proxies: http: diff --git a/crawlers/douyin/web/utils.py b/crawlers/douyin/web/utils.py index 1528d4e..40a0a09 100644 --- a/crawlers/douyin/web/utils.py +++ b/crawlers/douyin/web/utils.py @@ -31,27 +31,25 @@ # - https://github.com/Johnserf-Seed # # ============================================================================== - - -import re +import asyncio import json +import os +import random +import re import time +import urllib +from pathlib import Path +from typing import Union +from urllib.parse import urlencode, quote + +import execjs import httpx import qrcode -import random -import asyncio import yaml -from typing import Union -from pathlib import Path +from crawlers.douyin.web.xbogus import XBogus as XB +from crawlers.douyin.web.abogus import ABogus as AB -from crawlers.utils.logger import logger -from crawlers.utils.utils import ( - gen_random_str, - get_timestamp, - extract_valid_urls, - split_filename, -) from crawlers.utils.api_exceptions import ( APIError, APIConnectionError, @@ -60,11 +58,13 @@ from crawlers.utils.api_exceptions import ( APIUnauthorizedError, APINotFoundError, ) - -from crawlers.douyin.web.xbogus import XBogus as XB - -from urllib.parse import quote -import os +from crawlers.utils.logger import logger +from crawlers.utils.utils import ( + gen_random_str, + get_timestamp, + extract_valid_urls, + split_filename, +) # 配置文件路径 # Read the configuration file @@ -234,6 +234,8 @@ class VerifyFpManager: class BogusManager: + + # 字符串方法生成X-Bogus参数 @classmethod def xb_str_2_endpoint(cls, endpoint: str, user_agent: str) -> str: try: @@ -243,6 +245,7 @@ class BogusManager: return 
final_endpoint[0] + # Generate the X-Bogus parameter from a params dict @classmethod def xb_model_2_endpoint(cls, base_endpoint: str, params: dict, user_agent: str) -> str: if not isinstance(params, dict): @@ -262,6 +265,44 @@ class BogusManager: return final_endpoint + # Generate the A-Bogus parameter from an endpoint string (JS version) + # TODO: Not fully tested yet; not to be merged into the main branch for now. + @classmethod + def ab_str_2_endpoint_js_ver(cls, endpoint: str, user_agent: str) -> str: + try: + # Extract the query string from the endpoint + endpoint_query_params = urllib.parse.urlparse(endpoint).query + # Locate the A-Bogus JS file + js_path = os.path.dirname(os.path.abspath(__file__)) + a_bogus_js_path = os.path.join(js_path, 'a_bogus.js') + with open(a_bogus_js_path, 'r', encoding='utf-8') as file: + js_code = file.read() + # A Node environment is required here + # - Install Node.js + # - Install the execjs library + # - Install the NPM dependencies + # - npm install jsdom + node_runtime = execjs.get('Node') + context = node_runtime.compile(js_code) + arg = [0, 1, 0, endpoint_query_params, "", user_agent] + a_bogus = quote(context.call('get_a_bogus', arg), safe='') + return a_bogus + except Exception as e: + raise RuntimeError("Failed to generate A-Bogus: {0}".format(e)) + + # Generate the A-Bogus parameter from a params dict; thanks to @JoeanAmier for the pure-Python version of the algorithm. + @classmethod + def ab_model_2_endpoint(cls, params: dict, user_agent: str) -> str: + if not isinstance(params, dict): + raise TypeError("params must be a dict") + + try: + ab_value = AB().get_value(params, user_agent) + except Exception as e: + raise RuntimeError("Failed to generate A-Bogus: {0}".format(e)) + + return quote(ab_value, safe='') + class SecUserIdFetcher: # 预编译正则表达式 diff --git a/crawlers/douyin/web/web_crawler.py b/crawlers/douyin/web/web_crawler.py index 12f86bf..81a2014 100644 --- a/crawlers/douyin/web/web_crawler.py +++ b/crawlers/douyin/web/web_crawler.py @@ -34,16 +34,23 @@ import asyncio # 异步I/O +import os # OS operations import time # 时间操作 +from urllib.parse import urlencode, quote # URL encoding import httpx import yaml # 配置文件 -import os # 系统操作 # 基础爬虫客户端和抖音API端点 from crawlers.base_crawler import BaseCrawler from crawlers.douyin.web.endpoints import DouyinAPIEndpoints - +# Douyin API request models +from 
crawlers.douyin.web.models import ( + BaseRequestModel, LiveRoomRanking, PostComments, + PostCommentsReply, PostDetail, + UserProfile, UserCollection, UserLike, UserLive, + UserLive2, UserMix, UserPost +) # 抖音应用的工具类 from crawlers.douyin.web.utils import (AwemeIdFetcher, # Aweme ID获取 BogusManager, # XBogus管理 @@ -54,14 +61,6 @@ from crawlers.douyin.web.utils import (AwemeIdFetcher, # Aweme ID获取 extract_valid_urls # URL提取 ) -# 抖音接口数据请求模型 -from crawlers.douyin.web.models import ( - BaseRequestModel, LiveRoomRanking, PostComments, - PostCommentsReply, PostDanmaku, PostDetail, - UserProfile, UserCollection, UserLike, UserLive, - UserLive2, UserMix, UserPost -) - # 配置文件路径 path = os.path.abspath(os.path.dirname(__file__)) @@ -98,9 +97,17 @@ class DouyinWebCrawler: # 创建一个作品详情的BaseModel参数 params = PostDetail(aweme_id=aweme_id) # 生成一个作品详情的带有加密参数的Endpoint - endpoint = BogusManager.xb_model_2_endpoint( - DouyinAPIEndpoints.POST_DETAIL, params.dict(), kwargs["headers"]["User-Agent"] - ) + # 2024-06-12 22:41:44: X-Bogus signing no longer works for this endpoint, so the request is signed with the a_bogus parameter instead. + # endpoint = BogusManager.xb_model_2_endpoint( + # DouyinAPIEndpoints.POST_DETAIL, params.dict(), kwargs["headers"]["User-Agent"] + # ) + + # Build the post-detail endpoint with the a_bogus signature parameter + params_dict = params.dict() + params_dict["msToken"] = '' + a_bogus = BogusManager.ab_model_2_endpoint(params_dict, kwargs["headers"]["User-Agent"]) + endpoint = f"{DouyinAPIEndpoints.POST_DETAIL}?{urlencode(params_dict)}&a_bogus={a_bogus}" + response = await crawler.fetch_get_json(endpoint) return response @@ -239,19 +246,6 @@ class DouyinWebCrawler: "-------------------------------------------------------utils接口列表-------------------------------------------------------" - # 获取抖音Web的游客Cookie - async def fetch_douyin_web_guest_cookie(self, user_agent: str): - headers = { - 'User-Agent': user_agent, - 'Cookie': '' - } - async with httpx.AsyncClient() as client: - domain = "https://beta.tikhub.io" - uri = "/api/v1/douyin/web/fetch_douyin_web_guest_cookie" 
- url = f"{domain}{uri}?user_agent={user_agent}" - response = await client.get(url, headers=headers) - return response.json().get("data") - # 生成真实msToken async def gen_real_msToken(self, ): result = { @@ -290,6 +284,21 @@ class DouyinWebCrawler: } return result + # Generate the a_bogus parameter from an API URL + async def get_a_bogus(self, url: str, user_agent: str): + endpoint = url.split("?")[0] + # Convert the URL query string into a dict (split each pair on the first "=" only) + params = dict([i.split("=", 1) for i in url.split("?")[1].split("&")]) + # Blank out the msToken parameter (the key is kept, with an empty value) + params["msToken"] = "" + a_bogus = BogusManager.ab_model_2_endpoint(params, user_agent) + result = { + "url": f"{endpoint}?{urlencode(params)}&a_bogus={a_bogus}", + "a_bogus": a_bogus, + "user_agent": user_agent + } + return result + # 提取单个用户id async def get_sec_user_id(self, url: str): return await SecUserIdFetcher.get_sec_user_id(url) diff --git a/crawlers/hybrid/hybrid_crawler.py b/crawlers/hybrid/hybrid_crawler.py index 5532ef9..dd1fb6c 100644 --- a/crawlers/hybrid/hybrid_crawler.py +++ b/crawlers/hybrid/hybrid_crawler.py @@ -1,3 +1,36 @@ +# ============================================================================== +# Copyright (C) 2021 Evil0ctal +# +# This file is part of the Douyin_TikTok_Download_API project. +# +# This project is licensed under the Apache License 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at: +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# ==============================================================================
+#         __
+#        />  フ
+#       |  _  _ l
+#       /` ミ_xノ
+#      /      | Feed me Stars ⭐ ️
+#     /  ヽ   ノ
+#     │  | | |
+#  / ̄|   | | |
+#  | ( ̄ヽ__ヽ_)__)
+#  \二つ
+# ==============================================================================
+#
+# Contributor Link:
+# - https://github.com/Evil0ctal
+#
+# ==============================================================================
+
 import asyncio
 
 from crawlers.douyin.web.web_crawler import DouyinWebCrawler # 导入抖音Web爬虫
@@ -24,9 +57,10 @@ class HybridCrawler:
         elif "tiktok" in url:
             platform = "tiktok"
             aweme_id = await self.TikTokWebCrawler.get_aweme_id(url)
-            data = await self.TikTokAPPCrawler.fetch_one_video(aweme_id)
-            # $.aweme_type
-            aweme_type = data.get("aweme_type")
+            data = await self.TikTokWebCrawler.fetch_one_video(aweme_id)
+            data = data.get("itemInfo").get("itemStruct")
+            # $.imagePost exists if aweme_type is photo
+            aweme_type = 150 if data.get("imagePost") else 1
         else:
             raise ValueError("hybrid_parsing_single_video: Cannot judge the video source from the URL.")
 
@@ -124,14 +158,14 @@ class HybridCrawler:
             # TikTok视频数据处理/TikTok video data processing
             if url_type == 'video':
                 # 将信息储存在字典中/Store information in a dictionary
-                wm_video = data['video']['download_addr']['url_list'][0]
+                wm_video = data['video']['downloadAddr']
                 api_data = {
                     'video_data':
                         {
                             'wm_video_url': wm_video,
                             'wm_video_url_HQ': wm_video,
-                            'nwm_video_url': data['video']['play_addr']['url_list'][0],
-                            'nwm_video_url_HQ': data['video']['bit_rate'][0]['play_addr']['url_list'][0]
+                            'nwm_video_url': data['video']['playAddr'],
+                            'nwm_video_url_HQ': data['video']['bitrateInfo'][0]['PlayAddr']['UrlList'][0]
                         }
                 }
             # TikTok图片数据处理/TikTok image data processing
@@ -140,9 +174,9 @@ class HybridCrawler:
                 no_watermark_image_list = []
                 # 有水印图片列表/With watermark image list
                 watermark_image_list = []
-                for i in data['image_post_info']['images']:
-                    no_watermark_image_list.append(i['display_image']['url_list'][0])
-                    watermark_image_list.append(i['owner_watermark_image']['url_list'][0])
+                for i in data['imagePost']['images']:
+                    no_watermark_image_list.append(i['imageURL']['urlList'][0])
+                    # watermark_image_list.append(i['owner_watermark_image']['url_list'][0])
                 api_data = {
                     'image_data':
                         {
@@ -158,6 +192,7 @@ class HybridCrawler:
         # 测试混合解析单一视频接口/Test hybrid parsing single video endpoint
         # url = "https://v.douyin.com/L4FJNR3/"
         url = "https://www.tiktok.com/@evil0ctal/video/7156033831819037994"
+        # url = "https://www.tiktok.com/@minecraft/photo/7369296852669205791"
         minimal = True
         result = await self.hybrid_parsing_single_video(url, minimal=minimal)
         print(result)
diff --git a/crawlers/tiktok/app/app_crawler.py b/crawlers/tiktok/app/app_crawler.py
index 25905b6..104b84f 100644
--- a/crawlers/tiktok/app/app_crawler.py
+++ b/crawlers/tiktok/app/app_crawler.py
@@ -48,6 +48,9 @@ from crawlers.tiktok.app.models import (
     BaseRequestModel, FeedVideoDetail
 )
 
+# 标记已废弃的方法
+from crawlers.utils.deprecated import deprecated
+
 # 配置文件路径
 path = os.path.abspath(os.path.dirname(__file__))
 
@@ -74,6 +77,7 @@ class TikTokAPPCrawler:
     """-------------------------------------------------------handler接口列表-------------------------------------------------------"""
 
     # 获取单个作品数据
+    @deprecated("TikTok APP fetch_one_video is deprecated and will be removed in a future release. Use Web API instead.
| TikTok APP fetch_one_video 已弃用,将在将来的版本中删除。请改用Web API。") async def fetch_one_video(self, aweme_id: str): # 获取TikTok的实时Cookie kwargs = await self.get_tiktok_headers() diff --git a/crawlers/tiktok/web/config.yaml b/crawlers/tiktok/web/config.yaml index c3016fb..4550651 100644 --- a/crawlers/tiktok/web/config.yaml +++ b/crawlers/tiktok/web/config.yaml @@ -3,7 +3,7 @@ TokenManager: headers: User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.212 Safari/537.36 Referer: https://www.tiktok.com/ - Cookie: tt_csrf_token=YmksDB6a-h4cT2fF7JpORI2O9UBMCWjsntIc; ttwid=1%7C0FVb9fFc-sjDG2UdJwdC1AirqYozQ0xfbAS4N72vN2Y%7C1713886256%7C78a9d83445b82b73ca8d4e0cf024ea6cdf1329b7f3866c826b0a69a300ebce46; ak_bmsc=51B1D53481A3A4E4D0CEFF2BCF622DA2~000000000000000000000000000000~YAAQ7uIsF6c4j+SOAQAAANmUCxfRGVXZ4D9xnO97l1yDw0OWyomnVkNY7IUKaggUja0kQzFQ+WG4xaxBcPt0AN0n26KeHXGGKgHYpHPUMUBHGHQGDtE4RLyy7U+LPbSJCqVaSDiPuzxHht0YUIbWogvrFmBfkP4ohcmjkZxWtEI9qQ4Whaobb2CFHGdKNt0zlVNBjJQ3uYRAvUe12zSBynQB18y6QhE8goneRkCEw9VIeft2pFIwNQ8tkWWEjDt6wHNaqeND7eASg5WLzYskWbTt6bPAOhSNRLJ38HZrOB5QNg+xxN5uuCSYmjMXCl8SkvQr91pInmOng+V898FLLBQtefs95whvbpfE0mKwBk5Cz2TkkHcUJa/IoC0CLmNqoEk3AtKxpw/J; tt_chain_token=46Xkv2ukMzyJ2e7XU7y0AQ==; bm_sv=A2E67B998DE8E6A4F1C2C02485467446~YAAQ7uIsF6g4j+SOAQAABdqUCxf1J/K4dYG0k7bbw2m5rFujdlSqMoCKDubu4R602nFvbY6zWC5puJczBv3IXwJJRpQxxR03wDCMVlKTCqjQvgDs8BoCuoNQxfY2fdS+F3bKut2lxXPQ2qctqz4kHBrgspJArHn/zu/IuKCIeSzmV4KcyxW6Zvw3/xMRA0MeHgyuHsTRBS+VrFk8Ju2NbJWWC8uSHbLCM/dhFT7/ktw8RE30r24XpQmhLpVTsUSC~1; tiktok_webapp_theme=light; msToken=ySXERzKCE0QUG0cCg6TWLw3wfEB-6kh6kAfuzhzjcQvmV1jBFloSgIsT9xk-QXFVdI99U1Fqb9mhUpIOldoDkjdZwskB8rvt66MHZaHnvBRZRtOKtTYsWT8osDyQXDVZWdPkvyE598h9; passport_csrf_token=1a47d95ebf68fc3648b0018ee75afc9f; passport_csrf_token_default=1a47d95ebf68fc3648b0018ee75afc9f; 
perf_feed_cache={%22expireTimestamp%22:1714057200000%2C%22itemIds%22:[%227346425092966206766%22%2C%227353812964207594795%22%2C%227343343741916171563%22]}; msToken=yWwG-ITrCnjJbx5ltBa9FTXdCImOJrl-wtQJSQH3afeEumWZcbo_qcrF6F7-NjYcrG6JVxtJiOU208REZeCSgXEZrrs5_65K741fQ7PSzCGOhz6vUyycq3Xvj4Mu-S0kJ6SqyltHnpJp + Cookie: tt_csrf_token=bwnaRGd9-B-0ce8ntqw9jtGzAdvzTRKNpBl0; ak_bmsc=75A1956756DE42FD14ED069AAE7A8780~000000000000000000000000000000~YAAQXCw+F8jpmBGQAQAAIfGsFBj+ZEGzR/ZeiuPpMtItu0QQUQRmjBX2kADliy6QA9rZSfrxRUZc9zuRrI4/xbIrAwA/nkdguGpa+v3QSn/1sk5uP2aqLVm0eYB/SGNafa2h2QvIPbLNiSCRhgq1GalZJL4+udqDnyBRJWE74nin74bZwrVDvCX1s8M2hWqZ9/jTkdm4sfwON9MdJIEtjAPlddQ4gxoqjPoWhfnrm24dhPT4OjL1B8QP1mgurj7zJGspqD53VcjkAl65gHVxp3dwZ5WbPYpqrh9j8wo2u/Wh6uhX+0HWmkv5yVZyTyYQTl3/ilPp9G4CuIUi84gaPLjNYea9AEnphNX0ywzDa6/yegfqyE6r3wqBBDCrR1xRM98YEB4A5PV7pw==; tt_chain_token=ljZFLdRDfyfDflXMg5XGpg==; tiktok_webapp_theme_auto_dark_ab=1; tiktok_webapp_theme=dark; perf_feed_cache={%22expireTimestamp%22:1718503200000%2C%22itemIds%22:[%227348816520216186158%22%2C%227356022137678810410%22%2C%227349561209340857630%22]}; s_v_web_id=verify_lxe3l432_JnDE5WWo_URef_4WrS_88IM_fd1CqEXZs4dZ; passport_csrf_token=af197f073ed95f4dc2636f24d55566a6; passport_csrf_token_default=af197f073ed95f4dc2636f24d55566a6; ttwid=1%7CuNT4GcgvvOjH8rTETh9d9xti_QDJjlcnSK2V7djIpuc%7C1718333954%7Cf81b989a495aedff91302da4d0a3ab6055dea486fb203a4326b37d5a5346ad0c; msToken=1Mhpyi8MlaZjM6bbLDVUhCj_6C0kEO_1_Nb62ByXLg7wy_vLnBxdMFpKclhf4HYnEjCghk2Gq47ZM5jPj3L1yFxQUZvq4oPLo1b2Wfe_33RE94uIxdiR-eSueWbcYDDgOj1Pn9Wyid5Uf5fzBQ7xxFA=; bm_sv=9ADBA7BE06EC41817F117E2279F1410C~YAAQXCw+F8bsmBGQAQAAzSewFBg2fP3Zd0aky2x7S13D97O64xi8EXhoKORBnPQyCHlh0iSlh63FFjoy6peDWaF3lkWaTly3Z7I7WvWk1GCntnYzpJaSCE5EO2OL38zPWpHcgGWuekluvptHXsheedNEefN4SUHVMt4jJynWNeTKrao0RmNLkH4zGs7QO6+MPCt94QFvNfLjBRr0wVcXlN/hx9m6kcvCyzsBBqEnpugoYvZ0SMA+INsKI5PDfQz1~1; 
msToken=449_l3kdcLmnEHdDP0uACa5EcPVL1NbpjyVv8yah61EwxIPZRDlGwpGIkpIjH0Tk-CDtoKwFrDdP1v2AOpwmdoIz5oQzPEXCdyfGzcVXCHbwMX1fwPxMHpea5yFPUYEDlNWaCFlgLnejRdWeN5sB_lE=
 
     proxies:
       http:
diff --git a/crawlers/tiktok/web/models.py b/crawlers/tiktok/web/models.py
index 55ff8d7..36ad7dd 100644
--- a/crawlers/tiktok/web/models.py
+++ b/crawlers/tiktok/web/models.py
@@ -22,7 +22,7 @@ class BaseRequestModel(BaseModel):
     )
     channel: str = "tiktok_web"
     cookie_enabled: str = "true"
-    device_id: int = 7349090360347690538
+    device_id: int = 7380187414842836523
     device_platform: str = "web_pc"
     focus_state: str = "true"
     from_page: str = "user"
diff --git a/crawlers/tiktok/web/utils.py b/crawlers/tiktok/web/utils.py
index a53fd76..5b71031 100644
--- a/crawlers/tiktok/web/utils.py
+++ b/crawlers/tiktok/web/utils.py
@@ -430,15 +430,17 @@ class SecUserIdFetcher:
 class AwemeIdFetcher:
     # https://www.tiktok.com/@scarlettjonesuk/video/7255716763118226715
     # https://www.tiktok.com/@scarlettjonesuk/video/7255716763118226715?is_from_webapp=1&sender_device=pc&web_id=7306060721837852167
+    # https://www.tiktok.com/@zoyapea5/photo/7370061866879454469
 
     # 预编译正则表达式
-    _TIKTOK_AWEMEID_PARREN = re.compile(r"video/(\d*)")
-    _TIKTOK_NOTFOUND_PARREN = re.compile(r"notfound")
+    _TIKTOK_AWEMEID_PATTERN = re.compile(r"video/(\d+)")
+    _TIKTOK_PHOTOID_PATTERN = re.compile(r"photo/(\d+)")
+    _TIKTOK_NOTFOUND_PATTERN = re.compile(r"notfound")
 
     @classmethod
     async def get_aweme_id(cls, url: str) -> str:
         """
-        获取TikTok作品aweme_id
+        获取TikTok作品aweme_id或photo_id
         Args:
             url: 作品链接
         Return:
@@ -453,11 +455,27 @@ class AwemeIdFetcher:
         url = extract_valid_urls(url)
 
         if url is None:
-            raise (
-                APINotFoundError("输入的URL不合法。类名:{0}".format(cls.__name__))
-            )
+            raise APINotFoundError("输入的URL不合法。类名:{0}".format(cls.__name__))
 
-        transport = httpx.AsyncHTTPTransport(retries=5)
+        # 处理不是短连接的情况
+        if "tiktok" and "@" in url:
+            print(f"输入的URL无需重定向: {url}")
+            video_match = cls._TIKTOK_AWEMEID_PATTERN.search(url)
+            photo_match = cls._TIKTOK_PHOTOID_PATTERN.search(url)
+
+            if not video_match and not photo_match:
+                raise APIResponseError("未在响应中找到 aweme_id 或 photo_id")
+
+            aweme_id = video_match.group(1) if video_match else photo_match.group(1)
+
+            if aweme_id is None:
+                raise RuntimeError("获取 aweme_id 或 photo_id 失败,{0}".format(url))
+
+            return aweme_id
+
+        # 处理短连接的情况,根据重定向后的链接获取aweme_id
+        print(f"输入的URL需要重定向: {url}")
+        transport = httpx.AsyncHTTPTransport(retries=10)
         async with httpx.AsyncClient(
                 transport=transport, proxies=TokenManager.proxies, timeout=10
         ) as client:
@@ -465,32 +483,28 @@ class AwemeIdFetcher:
                 response = await client.get(url, follow_redirects=True)
 
                 if response.status_code in {200, 444}:
-                    if cls._TIKTOK_NOTFOUND_PARREN.search(str(response.url)):
+                    if cls._TIKTOK_NOTFOUND_PATTERN.search(str(response.url)):
                         raise APINotFoundError("页面不可用,可能是由于区域限制(代理)造成的。类名: {0}"
                                                .format(cls.__name__)
                                                )
 
-                    match = cls._TIKTOK_AWEMEID_PARREN.search(str(response.url))
-                    if not match:
-                        raise APIResponseError(
-                            "未在响应中找到 {0}".format("aweme_id")
-                        )
+                    video_match = cls._TIKTOK_AWEMEID_PATTERN.search(str(response.url))
+                    photo_match = cls._TIKTOK_PHOTOID_PATTERN.search(str(response.url))
 
-                    aweme_id = match.group(1)
+                    if not video_match and not photo_match:
+                        raise APIResponseError("未在响应中找到 aweme_id 或 photo_id")
+
+                    aweme_id = video_match.group(1) if video_match else photo_match.group(1)
 
                     if aweme_id is None:
-                        raise RuntimeError(
-                            "获取 {0} 失败,{1}".format("aweme_id", response.url)
-                        )
+                        raise RuntimeError("获取 aweme_id 或 photo_id 失败,{0}".format(response.url))
 
                     return aweme_id
                 else:
-                    raise ConnectionError(
-                        "接口状态码异常 {0},请检查重试".format(response.status_code)
-                    )
+                    raise ConnectionError("接口状态码异常 {0},请检查重试".format(response.status_code))
 
             except httpx.RequestError as exc:
-                # 捕获所有与 httpx 请求相关的异常情况 (Captures all httpx request-related exceptions)
+                # 捕获所有与 httpx 请求相关的异常情况
                 raise APIConnectionError("请求端点失败,请检查当前网络环境。 链接:{0},代理:{1},异常类名:{2},异常详细信息:{3}"
                                          .format(url, TokenManager.proxies, cls.__name__, exc)
                                          )
diff --git a/crawlers/tiktok/web/web_crawler.py b/crawlers/tiktok/web/web_crawler.py
index 8fd2b37..c7e03ac 100644
--- a/crawlers/tiktok/web/web_crawler.py
+++ b/crawlers/tiktok/web/web_crawler.py
@@ -343,9 +343,9 @@ class TikTokWebCrawler:
 
     async def main(self):
         # 获取单个作品数据
-        # item_id = "7339393672959757570"
-        # response = await self.fetch_one_video(item_id)
-        # print(response)
+        item_id = "7369296852669205791"
+        response = await self.fetch_one_video(item_id)
+        print(response)
 
         # 获取用户的个人信息
         # secUid = "MS4wLjABAAAAfDPs6wbpBcMMb85xkvDGdyyyVAUS2YoVCT9P6WQ1bpuwEuPhL9eFtTmGvxw1lT2C"
diff --git a/crawlers/utils/deprecated.py b/crawlers/utils/deprecated.py
new file mode 100644
index 0000000..095554d
--- /dev/null
+++ b/crawlers/utils/deprecated.py
@@ -0,0 +1,18 @@
+import warnings
+import functools
+
+
+def deprecated(message):
+    def decorator(func):
+        @functools.wraps(func)
+        async def wrapper(*args, **kwargs):
+            warnings.warn(
+                f"{func.__name__} is deprecated: {message}",
+                DeprecationWarning,
+                stacklevel=2
+            )
+            return await func(*args, **kwargs)
+
+        return wrapper
+
+    return decorator
\ No newline at end of file
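
Review note on `crawlers/utils/deprecated.py`: the `deprecated` decorator in this patch always wraps its target in an `async def wrapper` and awaits the result, so it fits coroutine functions such as `fetch_one_video` but would fail if applied to a plain synchronous function. The following is a minimal sketch, not the project's implementation, of a variant that handles both cases. It assumes `inspect.iscoroutinefunction` is an acceptable way to tell the two apart; `fetch_old` and `parse_old` are hypothetical names used only for illustration.

```python
import asyncio
import functools
import inspect
import warnings


def deprecated(message):
    """Illustrative variant: mark a sync or async callable as deprecated."""

    def decorator(func):
        def warn():
            # stacklevel=3 skips this helper and the wrapper that calls it,
            # so the warning is attributed to the frame above the wrapper.
            warnings.warn(
                f"{func.__name__} is deprecated: {message}",
                DeprecationWarning,
                stacklevel=3,
            )

        if inspect.iscoroutinefunction(func):
            @functools.wraps(func)
            async def async_wrapper(*args, **kwargs):
                warn()
                return await func(*args, **kwargs)

            return async_wrapper

        @functools.wraps(func)
        def sync_wrapper(*args, **kwargs):
            warn()
            return func(*args, **kwargs)

        return sync_wrapper

    return decorator


# Hypothetical coroutine, standing in for something like fetch_one_video.
@deprecated("use the Web API instead")
async def fetch_old(item_id):
    return {"id": item_id}


# Hypothetical synchronous helper, to show the non-async path.
@deprecated("use parse_new instead")
def parse_old(text):
    return text.strip()
```

Python's default warning filters hide `DeprecationWarning` unless it is triggered by code in `__main__`, so callers may need `warnings.simplefilter("default")` or the `-W default` interpreter flag to see the message.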