Alibaba's Qwen Team Makes AI Models Think Deeper with New Algorithm
Alibaba's Qwen team has developed a new algorithm that enables AI models to think more deeply by assigning a different reward to each step of the reasoning process. The current approach to reinforcement learning, which gives every token the same reward, has been shown to limit how long a model's chain of thought can grow.
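
To make the contrast concrete, here is a minimal Python sketch of the two credit-assignment schemes described above. It is an illustration only, not the Qwen team's algorithm: the function names, the step boundaries, and the per-step weights are all hypothetical, and the article does not say how the real method computes its step-level rewards.

```python
def uniform_credit(num_tokens, outcome_reward):
    """Baseline: every token in the response gets the same reward."""
    return [outcome_reward] * num_tokens


def stepwise_credit(step_lengths, step_weights, outcome_reward):
    """Each reasoning step gets its own reward; tokens inherit their step's value.

    step_lengths: number of tokens in each reasoning step (hypothetical split).
    step_weights: one made-up weight per step; how a real method would derive
                  these is not specified in the source.
    """
    if len(step_lengths) != len(step_weights):
        raise ValueError("need exactly one weight per reasoning step")
    credits = []
    for length, weight in zip(step_lengths, step_weights):
        credits.extend([outcome_reward * weight] * length)
    return credits


# A response split into three reasoning steps of 3, 2, and 4 tokens.
step_lengths = [3, 2, 4]
step_weights = [0.5, 1.0, 2.0]  # hypothetical values, for illustration only

uniform = uniform_credit(sum(step_lengths), outcome_reward=1.0)
stepwise = stepwise_credit(step_lengths, step_weights, outcome_reward=1.0)
```

In the uniform case all nine tokens carry an identical training signal, so the model cannot tell which parts of its reasoning mattered. In the stepwise case the signal differs from step to step, which is the property the article attributes to the new algorithm.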