Introducing MPT-7B-Instruct: A Powerful Model for Instruction Following
We are excited to introduce MPT-7B-Instruct, a variant of the MPT-7B model finetuned for short-form instruction following. It was produced by finetuning MPT-7B on a dataset derived from Databricks' Dolly-15k and Anthropic's Helpful and Harmless (HH-RLHF) datasets.
Model Description:
- MPT-7B-Instruct uses a modified decoder-only transformer architecture with FlashAttention and ALiBi (Attention with Linear Biases). Notably, it uses no positional embeddings and no biases; a rough sketch of the ALiBi idea follows.
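As an illustration of the ALiBi idea (not MosaicML's exact implementation), the bias is a fixed, head-specific linear penalty added to attention scores, which removes the need for learned positional embeddings. The helper name and shapes below are assumptions made for this sketch:

```python
import torch

def alibi_bias(n_heads: int, seq_len: int) -> torch.Tensor:
    """Illustrative ALiBi bias: a head-specific linear penalty on attention
    scores that grows with query-key distance (simplified power-of-two case)."""
    # Geometric slopes per head, e.g. 2^-1, 2^-2, ..., 2^-8 for 8 heads.
    slopes = torch.tensor([2.0 ** (-8.0 * (h + 1) / n_heads) for h in range(n_heads)])
    # Distance from each query position i back to key position j (i - j).
    dist = torch.arange(seq_len)[:, None] - torch.arange(seq_len)[None, :]
    # Penalty grows linearly with distance; shape: (heads, queries, keys).
    # Future positions (negative distance) are handled by the causal mask.
    return -slopes[:, None, None] * dist.clamp(min=0).float()

# The bias is simply added to the pre-softmax attention scores.
```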
Training and Data:
- Trained by MosaicML, MPT-7B-Instruct builds on the MPT-7B base model, which was trained on an impressive 1 trillion tokens of text and code.
- The finetuning process involved data from Databricks Dolly-15k and Anthropic's Helpful and Harmless datasets.
Capabilities:
- Designed for short-form instruction following, this model excels in tasks such as format conversion (e.g., YAML to JSON) and text generation based on provided instructions.
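To make the instruction-following behavior concrete, the Hugging Face model card describes a Dolly-style prompt template. The helper below is a minimal sketch of that format; the example instruction is purely illustrative:

```python
def format_prompt(instruction: str) -> str:
    """Wrap a raw instruction in the template MPT-7B-Instruct was finetuned on."""
    return (
        "Below is an instruction that describes a task. "
        "Write a response that appropriately completes the request.\n"
        "### Instruction:\n"
        f"{instruction}\n"
        "### Response:\n"
    )

# Example: ask the model to convert YAML to JSON.
prompt = format_prompt("Convert the following YAML to JSON:\nmodel: mpt-7b\nparams: 6.7B")
```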
Technical Specifications:
- Parameters: 6.7 billion
- Layers: 32
- Heads: 32 attention heads
- Model Dimension: 4096
- Vocabulary Size: 50,432
- Sequence Length: Up to 4096 tokens (configurable at load time; see the sketch below)
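Because ALiBi imposes no fixed positional embedding table, the maximum sequence length can be raised when the model is loaded. A minimal sketch, assuming the Hugging Face repo id mosaicml/mpt-7b-instruct and the max_seq_len config attribute described on the model card:

```python
import transformers

name = "mosaicml/mpt-7b-instruct"

# Load the config first so the sequence length can be adjusted before the
# model weights are instantiated.
config = transformers.AutoConfig.from_pretrained(name, trust_remote_code=True)
config.max_seq_len = 4096  # total (input + output) tokens per request

model = transformers.AutoModelForCausalLM.from_pretrained(
    name, config=config, trust_remote_code=True
)
```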
Licensing:
- The model is licensed under CC-By-SA-3.0, permitting commercial use.
Usage:
- MPT-7B-Instruct can be integrated into standard text-generation pipelines. When loading the model, be sure to set trust_remote_code=True, since it ships custom modeling code.
- It uses the EleutherAI/gpt-neox-20b tokenizer. A minimal loading and generation sketch follows.
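A minimal end-to-end sketch, assuming the model is published as mosaicml/mpt-7b-instruct on Hugging Face and that a CUDA device is available (adjust the device and dtype to your hardware):

```python
import torch
import transformers

model = transformers.AutoModelForCausalLM.from_pretrained(
    "mosaicml/mpt-7b-instruct",
    trust_remote_code=True,       # required: the model ships custom modeling code
    torch_dtype=torch.bfloat16,
)
tokenizer = transformers.AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")

generator = transformers.pipeline(
    "text-generation", model=model, tokenizer=tokenizer, device=0
)

prompt = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n"
    "### Instruction:\nWrite a haiku about open-source LLMs.\n### Response:\n"
)
print(generator(prompt, max_new_tokens=64, do_sample=True)[0]["generated_text"])
```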
Limitations and Biases:
- Like other large language models, MPT-7B-Instruct may produce factually incorrect output and could generate biased or offensive content. It should not be used as a sole source for factually accurate information.
For more detail, see the MosaicML blog post "Introducing MPT-7B: A New Standard for Open-Source, Commercially Usable LLMs" and the model card on Hugging Face.