LLM RLHF (PPO Reward Model) Preference Data Ranking Assistant

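A tool that supports local ollama models. It is used to rank the responses generated by a large language model and produce preference data for training a reward model, which is then used to optimize the SFT model.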

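Description: In short, after SFT fine-tuning, you decide what kind of model you want RLHF (with a PPO reward model) to produce, and rank your model's generated responses accordingly. You then export the preference data to train a reward model, and use that reward model to train the SFT model.

Open-source repository: https://github.com/zjrwtx/preference_databuilder

Demo video: https://www.bilibili.com/video/BV1P1421675z/?spm_id_from=333.999.0.0&vd_source=5531fb0981ef79f87198a3c2651dff93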
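The ranking-to-preference-data step can be illustrated with a minimal sketch. It assumes a ranking is a list of responses ordered best to worst, and expands it into pairwise records. The function names and the `prompt` / `chosen` / `rejected` field names are hypothetical, chosen for illustration; they are not confirmed to match the export format of this tool.

```python
import json
from itertools import combinations


def ranking_to_pairs(prompt, ranked_responses):
    """Expand a best-to-worst ranking into pairwise preference records.

    Hypothetical helper: the field names below are an assumption,
    not the tool's documented export schema.
    """
    pairs = []
    # combinations() preserves input order, so `better` always
    # ranks ahead of `worse` in the original list.
    for better, worse in combinations(ranked_responses, 2):
        pairs.append({"prompt": prompt, "chosen": better, "rejected": worse})
    return pairs


def to_jsonl(pairs):
    """Serialize the records as JSON Lines, one record per line."""
    return "\n".join(json.dumps(p, ensure_ascii=False) for p in pairs)


# Three ranked responses yield three (chosen, rejected) pairs.
example_pairs = ranking_to_pairs(
    "What is RLHF?", ["answer A", "answer B", "answer C"]
)
print(to_jsonl(example_pairs))
```

A ranking of n responses produces n(n-1)/2 pairs, so a reward model trained on pairwise comparisons sees every ordering implied by a single ranking.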
