universsky2015
禅与计算机程序设计艺术
2 years ago

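#禅与计算机程序设计艺术# #ChatGPT# Introduction to ChatGPT
ChatGPT interacts in a conversational way. The dialogue format makes it possible for ChatGPT to answer follow-up questions, admit its mistakes, challenge incorrect premises, and reject inappropriate requests.

ChatGPT is built on the GPT (Generative Pre-trained Transformer) family of models, with further refinement and optimization. GPT is a large language model that can generate many different types of text, while ChatGPT is optimized specifically for dialogue: given the conversation context, it generates human-like conversational replies.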

Methods
We trained this model using Reinforcement Learning from Human Feedback (RLHF).

To create a reward model for reinforcement learning, we needed to collect comparison data, which consisted of two or more model responses ranked by quality. To collect this data, we took conversations that AI trainers had with the chatbot. We randomly selected a model-written message, sampled several alternative completions, and had AI trainers rank them. Using these reward models, we can fine-tune the model using Proximal Policy Optimization. We performed several iterations of this process.
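
To make the comparison step concrete, here is a minimal sketch of how ranked completions could be turned into a training signal for a reward model. The post does not state which loss was used; the pairwise logistic (Bradley-Terry style) loss below is an assumption, chosen because it is a common way to learn from rankings. The function name `pairwise_ranking_loss` and the example scores are hypothetical.

```python
import math
from itertools import combinations


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def pairwise_ranking_loss(rewards_best_first):
    """Average -log(sigmoid(r_better - r_worse)) over all ranked pairs.

    `rewards_best_first` holds the reward model's scalar scores for the
    completions of one prompt, ordered as the trainers ranked them
    (best first). This is an illustrative assumption, not the exact
    objective used to train ChatGPT.
    """
    # combinations() preserves input order, so each pair is (better, worse).
    pairs = list(combinations(rewards_best_first, 2))
    if not pairs:
        raise ValueError("need at least two ranked responses")
    # -log(sigmoid(d)) equals softplus(-d), with d = better - worse.
    total = sum(softplus(-(better - worse)) for better, worse in pairs)
    return total / len(pairs)


# Scores that agree with the human ranking give a lower loss
# than scores that contradict it.
agrees = pairwise_ranking_loss([2.0, 1.0, 0.0])
contradicts = pairwise_ranking_loss([0.0, 1.0, 2.0])
print(agrees < contradicts)
```

Minimizing a loss of this kind pushes the reward model to score the preferred completion higher than the others; the resulting scalar reward is what a policy-gradient method such as Proximal Policy Optimization would then try to maximize during fine-tuning.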
