Language Model

YuLan-Mini: A 2.42B Parameter Open Data-efficient Language Model with Long-Context Capabilities and Advanced Training Techniques

Large language models (LLMs) built using transformer architectures heavily depend on pre-training with large-scale data to predict sequential tokens. This […]

Quasar-1: A Rigorous Mathematical Framework for Temperature-Guided Reasoning in Language Models

Large language models (LLMs) encounter significant difficulties in performing efficient and logically consistent reasoning. Existing methods, such as chain-of-thought (CoT) prompting, […]

Meet SemiKong: The World’s First Open-Source Semiconductor-Focused LLM

The semiconductor industry enables advancements in consumer electronics, automotive systems, and cutting-edge computing technologies. The production of semiconductors involves sophisticated […]

Google DeepMind Introduces Differentiable Cache Augmentation: A Coprocessor-Enhanced Approach to Boost LLM Reasoning and Efficiency

Large language models (LLMs) are integral to solving complex problems across language processing, mathematics, and reasoning domains. Enhancements in computational […]

DeepSeek-AI Just Released DeepSeek-V3: A Strong Mixture-of-Experts (MoE) Language Model with 671B Total Parameters with 37B Activated for Each Token

The field of Natural Language Processing (NLP) has made significant strides with the development of large language models (LLMs). However, […]

A Comprehensive Analytical Framework for Mathematical Reasoning in Multimodal Large Language Models

Mathematical reasoning has emerged as a critical frontier in artificial intelligence, particularly in developing Large Language Models (LLMs) capable of […]

This Research from Amazon Explores Step-Skipping Frameworks: Advancing Efficiency and Human-Like Reasoning in Language Models

The pursuit of enhancing artificial intelligence (AI) capabilities is significantly influenced by human intelligence, particularly in reasoning and problem-solving. Researchers […]

Tsinghua University Researchers Just Open-Sourced CogAgent-9B-20241220: The Latest Version of CogAgent

Graphical User Interfaces (GUIs) are central to how users engage with software. However, building intelligent agents capable of effectively navigating […]

Qwen Team Releases QvQ: An Open-Weight Model for Multimodal Reasoning

Multimodal reasoning, the ability to process and integrate information from diverse data sources such as text, images, and video, remains a demanding […]

This AI Paper by The Data Provenance Initiative Team Highlights Challenges in Multimodal Dataset Provenance, Licensing, Representation, and Transparency for Responsible Development

The advancement of artificial intelligence hinges on the availability and quality of training data, particularly as multimodal foundation models grow […]