PARAMANU-GANITA: Language Model with Mathematical Capabilities

  • 2024-04-22 18:55:56
  • Mitodru Niyogi, Arnab Bhattacharya

Abstract

In this paper, we present Paramanu-Ganita, a novel 208-million-parameter autoregressive (AR) decoder-based language model for mathematics. The model is pretrained from scratch with a context size of 4096 on our curated mixed mathematical corpus. We evaluate the model on both perplexity and the GSM8k mathematical benchmark. Despite being 35 times smaller than 7B LLMs, Paramanu-Ganita outperformed generalist LLMs such as LLaMa-1 7B by 28.4 percentage points, LLaMa-2 7B by 27.6 points, Falcon 7B by 32.6 points, and PaLM 8B by 35.3 points, as well as math-specialised LLMs such as Minerva 8B by 23.2 points and LLEMMA 7B by 3.0 points, in GSM8k test accuracy. Paramanu-Ganita also outperformed much larger LLMs: PaLM 62B by 6.4 points, Falcon 40B by 19.8 points, LLaMa-1 33B by 3.8 points, and Vicuna 13B by 11.8 points. These large margins indicate that the reasoning capabilities of language models are not restricted to LLMs with a huge number of parameters. Paramanu-Ganita took 146 A100 hours to train, whereas the math-specialised LLEMMA 7B was trained for the equivalent of 23,000 A100 hours. Thus, our approach of pretraining powerful domain-specialised language models from scratch is much more cost-effective than continual training of LLMs for domain adaptation. Hence, we conclude that strong mathematical reasoning in a language model requires neither giant LLMs nor immense computing power. Finally, we note that we have trained Paramanu-Ganita on only a part of our entire mathematical corpus and have yet to explore the full potential of our model.
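The size and compute comparisons in the abstract reduce to two ratios, which the short sketch below works out from the quoted figures. The variable names are illustrative, and the 7e9 value is an assumption standing in for a nominal "7B" model; real 7B-class models differ somewhat in their exact parameter counts, which may explain why the nominal ratio comes out slightly under the quoted 35x.

```python
# Figures quoted in the abstract.
ganita_params = 208e6        # Paramanu-Ganita parameter count
ganita_a100_hours = 146      # Paramanu-Ganita training time
llemma_a100_hours = 23_000   # LLEMMA 7B training time (A100-hour equivalent)

# Assumption: a nominal "7B" model has exactly 7e9 parameters.
nominal_7b_params = 7e9

# How many times smaller the model is, and how much less compute it used.
param_ratio = nominal_7b_params / ganita_params
compute_ratio = llemma_a100_hours / ganita_a100_hours

print(f"parameter ratio vs. nominal 7B: {param_ratio:.1f}x")
print(f"A100-hour ratio vs. LLEMMA 7B:  {compute_ratio:.1f}x")
```

Under these assumptions the parameter ratio is about 34x and the training-compute ratio is about 158x in favour of Paramanu-Ganita.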

 
