MatMul-free Technology Could Reduce Language Models' Dependence on Nvidia GPUs

Researchers from the University of California, Santa Cruz, Soochow University, and the University of California, Davis have introduced a new architecture for language models that removes the need for matrix multiplications (MatMul). The change significantly reduces the memory and time costs of training and running models.

Matrix multiplication (MatMul) is one of the most computationally intensive operations in transformer models. As language models grow, MatMul costs escalate, demanding substantial computational resources and driving up latency.

In their work, the researchers propose MatMul-free models that match the performance of modern transformers while requiring significantly less memory to run. Unlike traditional models that use 16-bit weights, the new architecture employs ternary weights restricted to the values -1, 0, and +1, which greatly reduces computational cost.
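
For intuition, the sketch below shows one common way to ternarize a full-precision weight matrix: scale by the mean absolute value, round, and clip to {-1, 0, +1} (absmean-style quantization, as in BitNet-type models). The exact recipe used by the authors may differ in its details.

```python
import numpy as np

def ternary_quantize(w: np.ndarray, eps: float = 1e-8):
    """Quantize a float weight matrix to {-1, 0, +1} plus one scalar scale.

    Absmean-style sketch; the paper's exact quantization recipe may differ.
    """
    scale = np.mean(np.abs(w)) + eps                  # per-tensor scaling factor
    w_ternary = np.clip(np.round(w / scale), -1, 1)   # entries in {-1, 0, +1}
    return w_ternary.astype(np.int8), scale

w = np.random.randn(4, 4).astype(np.float32)
w_t, s = ternary_quantize(w)
print(w_t)  # every entry is -1, 0, or +1
```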

Limiting the weights to these three values allows matrix multiplication to be replaced by additions and subtractions, simplifying computation considerably. The new architecture is built from “BitLinear layers,” which use ternary weights to achieve similar results at a fraction of the cost.
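
The sketch below shows why this works: once every weight is -1, 0, or +1, each output of a linear layer is just a signed sum of selected inputs, so the weights never need to be multiplied (the actual models use optimized fused kernels rather than a Python loop).

```python
import numpy as np

def ternary_linear(x: np.ndarray, w_ternary: np.ndarray, scale: float) -> np.ndarray:
    """Compute x @ W without multiplying by the weights.

    x:         (batch, in_features) activations
    w_ternary: (in_features, out_features) matrix with entries in {-1, 0, +1}
    scale:     per-tensor scale restored at the end
    """
    out = np.zeros((x.shape[0], w_ternary.shape[1]), dtype=x.dtype)
    for j in range(w_ternary.shape[1]):
        col = w_ternary[:, j]
        # +1 weights add the input, -1 weights subtract it, 0 weights are skipped
        out[:, j] = x[:, col == 1].sum(axis=1) - x[:, col == -1].sum(axis=1)
    return out * scale

# Sanity check against an ordinary dense matmul
x = np.random.randn(2, 4).astype(np.float32)
w_t = np.sign(np.random.randn(4, 3)).astype(np.int8)
assert np.allclose(ternary_linear(x, w_t, 0.5), x @ (w_t * 0.5), atol=1e-5)
```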

The researchers also introduced a MatMul-free Linear Gated Recurrent Unit (MLGRU) to replace the traditional token mixer (self-attention). It updates hidden states through element-wise gating and ternary projections, bypassing expensive matrix multiplications.
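
A simplified, hypothetical version of such a recurrent step is sketched below; the gate names and equations are illustrative rather than a faithful reproduction of the paper's MLGRU, and the dense matmuls stand in for ternary BitLinear projections.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def silu(z):
    return z * sigmoid(z)

def mlgru_step(x_t, h_prev, W_f, W_c, W_g, scale):
    """One MLGRU-style recurrent update (simplified, illustrative sketch).

    In the real model the projections are ternary, so each product below
    reduces to additions/subtractions; plain matmuls keep the example short.
    """
    f_t = sigmoid(x_t @ W_f * scale)         # forget gate
    c_t = silu(x_t @ W_c * scale)            # candidate hidden state
    h_t = f_t * h_prev + (1.0 - f_t) * c_t   # element-wise state update, no attention
    g_t = sigmoid(x_t @ W_g * scale)         # output gate
    return h_t, g_t * h_t                    # new hidden state and gated output

# One step with an 8-dimensional hidden state
d = 8
x_t, h_prev = np.random.randn(d), np.zeros(d)
W_f, W_c, W_g = (np.sign(np.random.randn(d, d)) for _ in range(3))
h_t, o_t = mlgru_step(x_t, h_prev, W_f, W_c, W_g, scale=0.1)
```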

For the channel mixer, they used a Gated Linear Unit (GLU) adapted to work with ternary weights. This adaptation reduces computational complexity and memory consumption while keeping feature mixing effective.
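
A minimal sketch of such a gated channel mixer follows; parameter names are illustrative, and the real layers would use ternary (BitLinear) projections rather than the dense matmuls written here.

```python
import numpy as np

def silu(z):
    return z / (1.0 + np.exp(-z))

def glu_channel_mixer(x, W_gate, W_up, W_down, scale):
    """GLU-style channel mixer (illustrative sketch, not the authors' exact layer).

    With ternary weights each of the three projections turns into a pattern of
    additions and subtractions; dense matmuls are kept here for brevity.
    """
    gate = silu(x @ W_gate * scale)        # gating branch
    up = x @ W_up * scale                  # value branch
    return (gate * up) @ W_down * scale    # element-wise gating, then down-projection
```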

The researchers compared two variants of their model against the Transformer++ architecture (used in Llama-2) and found that the new models make better use of additional compute, with the performance gap narrowing as model size and training budget grow.

MatMul-free models also demonstrated superiority in several language tasks. For instance, a model with 2.7 billion parameters outperformed Transformer++ on two challenging tests (ARC-Challenge and OpenbookQA), while maintaining comparable performance on other tasks.

As expected, MatMul-free models require less memory and exhibit lower latency than Transformer++. For a model with 13 billion parameters, the MatMul-free version consumed only 4.19 GB of GPU memory with a latency of 695.48 ms, whereas Transformer++ required 48.50 GB and 3183.10 ms.

The researchers also developed an optimized GPU implementation and a dedicated FPGA accelerator for MatMul-free models. The optimized GPU kernels accelerated training by 25.6% and reduced memory consumption by 61% compared to the unoptimized implementation.

The authors believe their research could pave the way for developing more efficient and hardware-friendly deep learning architectures.

Due to computational resource limitations, they were unable to test the architecture on models with more than 100 billion parameters. However, researchers hope their work will inspire other institutions to create and utilize such lightweight models.

Ideally, such an architecture would make language models far less dependent on high-performance graphics processors like Nvidia's, allowing researchers to deploy powerful models on cheaper, more readily available hardware.

The code and all of the models are already available to the research community, enabling collaborative, transparent development and further improvement of the architecture.
