Daily Paper Cast

🤗 Upvotes: 13 | cs.CL

Authors:
Lin Yueyu, Li Zhiyuan, Peter Yue, Liu Xiao

Title:
ARWKV: Pretrain is not what we need, an RNN-Attention-Based Language Model Born from Transformer

Arxiv:
http://arxiv.org/abs/2501.15570v1

Abstract:
As is known, hybrid quadratic and subquadratic attention models in multi-head architectures have surpassed both Transformer and Linear RNN models , with these works primarily focusing on reducing KV complexity and improving efficiency. For further research on expressiveness, we introduce our series of models distilled from Qwen 2.5, based on pure native RWKV-7 attention, which aims to make RNN more expressive and demonstrates state tracking ability beyond transformers. We work with QRWK 32B based on RWKV-6 architecture, another approach that reduces the entire knowledge processing time to just 8 hours using 16 AMD MI300X GPUs while maintaining Qwen 2.5's performance. In fact, the distillation process can utilize any LLM, not just Qwen, and enables knowledge transfer from larger LLMs to smaller ones with more fewer tokens. We will explain the detailed process and share our insights on building more powerful foundation models. Please note that this is an ongoing work that will be updated continuously. The model checkpoints and source code are available at \href{https://github.com/yynil/RWKVInside}{https://github.com/yynil/RWKVInside}, \href{https://huggingface.co/RWKV-Red-Team/ARWKV-7B-Preview-0.1}{https://huggingface.co/RWKV-Red-Team/ARWKV-7B-Preview-0.1}.

What is Daily Paper Cast?

We update every weekday to discuss highest-voted papers from Huggingface Daily Paper (https://huggingface.co/papers). Both the podcast scripts and audio are generated by AI. Feedback and suggestions are welcome! Email us: dailypapercast.ai@gmail.com

Creator:
Jingwen Liang, 3D ML, https://www.linkedin.com/in/jingwen-liang/
Gengyu Wang, LLM ML, http://wanggengyu.com

Listen on:
Spotify: https://open.spotify.com/show/21nrhmdaA8qoBiH8q03NXL
Apple Podcast: https://podcasts.apple.com/us/podcast/daily-paper-cast/id1777620236

Cover Image by Kawen Kuang https://kawen.art