Introducing MPT-7B-Instruct: A Powerful Model for Instruction Following

We are excited to introduce MPT-7B-Instruct, a variant of the MPT-7B base model specialized for short-form instruction following. It was finetuned on a dataset derived from Databricks' dolly-15k and Anthropic's Helpful and Harmless (HH-RLHF) datasets.

Model Description:

  • MPT-7B-Instruct uses a modified decoder-only transformer architecture with FlashAttention and ALiBi (Attention with Linear Biases). Notably, it uses no positional embeddings and no biases; position information enters only through ALiBi's attention biases (see the sketch below).
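
For intuition, here is a minimal sketch of the ALiBi bias computation as described in the ALiBi paper, not MPT's actual implementation; it assumes a power-of-two head count:

```python
import torch

def alibi_bias(n_heads: int, seq_len: int) -> torch.Tensor:
    """Per-head linear distance penalties added to attention logits.

    Follows the ALiBi paper's slope schedule for power-of-two head counts;
    MPT's real implementation lives in its Hugging Face repo.
    """
    # Geometric sequence of slopes: 2^(-8/n), 2^(-16/n), ..., one per head.
    start = 2.0 ** (-8.0 / n_heads)
    slopes = torch.tensor([start ** (i + 1) for i in range(n_heads)])
    # distance[q, k] = k - q: zero for the current token, negative for the past.
    distance = torch.arange(seq_len).view(1, 1, -1) - torch.arange(seq_len).view(1, -1, 1)
    distance = distance.clamp(max=0)  # causal: only past positions get a penalty
    return slopes.view(-1, 1, 1) * distance  # shape: (n_heads, seq_len, seq_len)
```

Because position enters only through these relative-distance penalties, the model can in principle attend over sequences longer than those seen during training.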

Training and Data:

  • Trained by MosaicML, MPT-7B-Instruct builds on the MPT-7B base model, which was trained on an impressive 1 trillion tokens of text and code.
  • The finetuning data combined Databricks' dolly-15k with Anthropic's Helpful and Harmless (HH-RLHF) dataset; MosaicML released the combined set as dolly_hhrlhf on Hugging Face.

Capabilities:

  • Designed for short-form instruction following, this model excels at tasks such as format conversion (e.g., YAML to JSON) and generating text from provided instructions; the prompt-format sketch below shows the input structure the model expects.
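
For reference, the model card documents the Dolly-style prompt template the model was finetuned on; the small helper below (format_prompt is our own name) wraps an instruction in that template:

```python
PROMPT_TEMPLATE = """Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
{instruction}

### Response:
"""

def format_prompt(instruction: str) -> str:
    # Wrap a raw instruction in the template MPT-7B-Instruct was finetuned on.
    return PROMPT_TEMPLATE.format(instruction=instruction)

prompt = format_prompt("Convert the following YAML to JSON:\nname: example\nversion: 1")
```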

Technical Specifications:

  • Parameters: 6.7 billion
  • Layers: 32
  • Heads: 32 attention heads
  • Model Dimension: 4096
  • Vocabulary Size: 50,432
  • Sequence Length: 2048 tokens at training time; longer contexts (e.g., 4096) can be configured at load time thanks to ALiBi (see the sketch below)
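
Concretely, the Hugging Face model card shows how to raise the context window by overriding the config before loading; a minimal sketch:

```python
import transformers

# Load the config first so the context window can be overridden before the
# model is instantiated (ALiBi lets MPT extrapolate beyond its 2048-token
# training length).
config = transformers.AutoConfig.from_pretrained(
    "mosaicml/mpt-7b-instruct", trust_remote_code=True
)
config.max_seq_len = 4096

model = transformers.AutoModelForCausalLM.from_pretrained(
    "mosaicml/mpt-7b-instruct", config=config, trust_remote_code=True
)
```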

Licensing:

  • The model is licensed under CC-BY-SA-3.0, which permits commercial use (with attribution and share-alike terms).

Usage:

  • MPT-7B-Instruct can be integrated into standard text-generation pipelines. When loading the model, be sure to set trust_remote_code=True, since MPT ships custom modeling code.
  • It uses the EleutherAI/gpt-neox-20b tokenizer; an end-to-end loading sketch follows below.
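
Putting it together, a minimal end-to-end sketch following the model card's loading pattern (the generation settings are illustrative):

```python
import torch
import transformers

model_name = "mosaicml/mpt-7b-instruct"

# trust_remote_code=True is required because MPT ships custom modeling code.
model = transformers.AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.bfloat16, trust_remote_code=True
)
# MPT reuses the GPT-NeoX tokenizer rather than shipping its own.
tokenizer = transformers.AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")

pipe = transformers.pipeline("text-generation", model=model, tokenizer=tokenizer)

prompt = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\nExplain ALiBi in one sentence.\n\n### Response:\n"
)
out = pipe(prompt, max_new_tokens=100, do_sample=True, temperature=0.7)
print(out[0]["generated_text"])
```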

Limitations and Biases:

  • Like other large language models, MPT-7B-Instruct may produce factually incorrect output and can generate biased or offensive content. It should not be relied on as a sole source of accurate information.

For a more comprehensive understanding, please refer to the blog post "Introducing MPT-7B: A New Standard for Open-Source, Commercially Usable LLMs" and the model's documentation on Hugging Face.