Developing compact yet high-performing language models remains a significant challenge in artificial intelligence. Large-scale models often require extensive computational resources, […]
Curiosity-Driven Reinforcement Learning from Human Feedback (CD-RLHF): An AI Framework that Mitigates the Diversity-Alignment Trade-off in Language Models
Large Language Models (LLMs) have become increasingly reliant on Reinforcement Learning from Human Feedback (RLHF) for fine-tuning across various applications, […]
Memorization vs. Generalization: How Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL) Shape Foundation Model Learning
Modern AI systems rely heavily on post-training techniques like supervised fine-tuning (SFT) and reinforcement learning (RL) to adapt foundation models […]
The Allen Institute for AI (AI2) Releases Tülu 3 405B: Scaling Open-Weight Post-Training with Reinforcement Learning from Verifiable Rewards (RLVR) to Surpass DeepSeek V3 and GPT-4o in Key Benchmarks
Post-training techniques, such as instruction tuning and reinforcement learning from human feedback, have become essential for refining language models. But, […]
Meta AI Proposes EvalPlanner: A Preference Optimization Algorithm for Thinking-LLM-as-a-Judge
The rapid advancement of Large Language Models (LLMs) has significantly improved their ability to generate long-form responses. However, evaluating these […]
From Deep Knowledge Tracing to DKT2: A Leap Forward in Educational AI
Knowledge Tracing (KT) plays a crucial role in Intelligent Tutoring Systems (ITS) by modeling students’ knowledge states and predicting their […]
Decoupling Tokenization: How Over-Tokenized Transformers Redefine Vocabulary Scaling in Language Models
Tokenization plays a fundamental role in the performance and scalability of Large Language Models (LLMs). Despite being a critical component, […]
Microsoft now hosts AI model accused of copying OpenAI data
Fresh on the heels of a controversy in which ChatGPT-maker OpenAI accused the Chinese company behind DeepSeek R1 of using […]
Quantization Space Utilization Rate (QSUR): A Novel Post-Training Quantization Method Designed to Enhance the Efficiency of Large Language Models (LLMs)
Post-training quantization (PTQ) focuses on reducing the size and improving the speed of large language models (LLMs) to make them […]
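The core idea behind PTQ can be sketched with a toy example: map trained float weights to low-bit integers plus a scale factor, trading a little precision for a much smaller representation. This is a minimal illustrative sketch using symmetric absmax int8 quantization of a single weight list; real PTQ methods (including QSUR-style approaches) operate layer-by-layer on full models, often with calibration data.

```python
# Minimal sketch of symmetric int8 post-training quantization for one
# weight tensor. Illustrative only, not the QSUR method itself.

def quantize_int8(weights):
    """Map float weights to int8 values plus an absmax scale factor."""
    scale = max(abs(w) for w in weights) / 127.0
    q = [max(-128, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from int8 values."""
    return [v * scale for v in q]

weights = [0.32, -1.27, 0.05, 0.9]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
# Each restored weight differs from the original by at most one
# quantization step (i.e., by at most `scale`).
```

Storing int8 values instead of float32 cuts memory roughly 4x, which is the size/speed gain the teaser refers to; the research question in PTQ is how to choose the quantization grid so the accuracy loss stays small.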
Creating An AI Agent-Based System with LangGraph: A Beginner’s Guide
What is an Agent? An agent is a Large Language Model (LLM)-powered system that can decide its own workflow. Unlike […]
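The "decides its own workflow" idea can be sketched in a few lines: instead of a fixed pipeline, a decision function (the LLM, in a real system) picks the next node to run based on the current state. This is a hypothetical toy loop in the spirit of LangGraph's state-graph model; the `decide` heuristic and node names are invented stand-ins, whereas LangGraph itself wires real LLM calls into nodes via its `StateGraph` API.

```python
# Toy agent loop: a decision function chooses the next step from the
# shared state, rather than following a hard-coded sequence.

def decide(state):
    """Stand-in for an LLM choosing the next step from the state."""
    if "result" in state:
        return "finish"
    return "search"

def search(state):
    """Hypothetical tool node: pretend to look up the query."""
    state["result"] = f"docs about {state['query']}"
    return state

NODES = {"search": search}

def run_agent(state):
    # The agent picks each next node at runtime until it decides to stop.
    while True:
        step = decide(state)
        if step == "finish":
            return state
        state = NODES[step](state)

final = run_agent({"query": "LangGraph"})
# final["result"] == "docs about LangGraph"
```

A fixed pipeline would always run `search`; here the control flow itself is a function of the state, which is the distinction the guide's opening sentence draws.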
