Llama3.1 (8B) Inference
To run this, press "Runtime" and press "Run all" on a free Tesla T4 Google Colab instance!
To install Unsloth on your local device, follow our guide. This notebook is licensed LGPL-3.0.
You will learn how to do data prep, how to train, how to run the model, and how to save it.
News
Train MoEs - DeepSeek, GLM, Qwen and gpt-oss 12x faster with 35% less VRAM. Blog
You can now train embedding models 1.8-3.3x faster with 20% less VRAM. Blog
Ultra Long-Context Reinforcement Learning is here with 7x more context windows! Blog
3x faster LLM training with 30% less VRAM and 500K context. 3x faster • 500K Context
New in Reinforcement Learning: FP8 RL • Vision RL • Standby • gpt-oss RL
Visit our docs for all our model uploads and notebooks.
Installation
Unsloth
If you want to finetune Llama-3 2x faster and use 70% less VRAM, go to our finetuning notebook!
==((====))==  Unsloth: Fast Llama patching release 2024.4
   \\   /|    GPU: Tesla T4. Max memory: 14.748 GB. Platform = Linux.
O^O/ \_/ \    Pytorch: 2.2.1+cu121. CUDA = 7.5. CUDA Toolkit = 12.1.
\        /    Bfloat16 = FALSE. Xformers = 0.0.25.post1. FA = False.
 "-____-"     Free Apache license: http://github.com/unslothai/unsloth
Unused kwargs: ['_load_in_4bit', '_load_in_8bit', 'quant_method']. These kwargs are not used in <class 'transformers.utils.quantization_config.BitsAndBytesConfig'>.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Change the "value" part to call the model!
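The outputs in this notebook show each prompt wrapped in Llama-3 header tokens with a `human` role, which suggests the messages use ShareGPT-style `from` and `value` keys. The sketch below illustrates, in plain Python, how such a message list could map onto the prompt string. `render_llama3_prompt` is a hypothetical helper written for this illustration and is not part of Unsloth or transformers; in the notebook the tokenizer's chat template does this work. The blank line after each header follows the published Llama 3 prompt format and is an assumption here, since the flattened output only shows a space.

```python
# Hypothetical helper: mimics the Llama-3 chat layout seen in this notebook's outputs.
# In the real notebook the tokenizer's chat template builds this string.

def render_llama3_prompt(messages, add_generation_prompt=True):
    """Render ShareGPT-style messages ({"from": ..., "value": ...}) as a Llama-3 prompt."""
    parts = ["<|begin_of_text|>"]
    for message in messages:
        parts.append(
            f"<|start_header_id|>{message['from']}<|end_header_id|>\n\n"
            f"{message['value']}<|eot_id|>"
        )
    if add_generation_prompt:
        # Open an assistant turn so the model knows it should answer next.
        parts.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(parts)


messages = [
    # Edit only the "value" text to ask a different question.
    {"from": "human", "value": "Continue the fibonacci sequence: 1, 1, 2, 3, 5, 8,"},
]

prompt = render_llama3_prompt(messages)
print(prompt)
```

Only the `value` string needs to change between questions; the role markers and the trailing assistant header stay the same.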
Unsloth makes inference natively 2x faster! No need to change or do anything!
The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results. Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
<|begin_of_text|><|start_header_id|>human<|end_header_id|> Continue the fibonacci sequence: 1, 1, 2, 3, 5, 8,<|eot_id|><|start_header_id|>assistant<|end_header_id|> The next numbers in the Fibonacci sequence would be: 13, 21, 34, 55, 89, 144,...<|eot_id|>
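The warning above asks for an `attention_mask` and a `pad_token_id`. As a rough illustration of what those are, the sketch below pads a batch of token id lists to a common length and builds the matching mask (1 for real tokens, 0 for padding). `pad_and_mask` is a hypothetical helper, the token ids are made up, and left padding is assumed because it is the usual choice when generating with decoder-only models. The pad id 128001 is the `eos_token_id` reported in the warning.

```python
PAD_TOKEN_ID = 128001  # eos_token_id reported in the warning, reused as the pad id


def pad_and_mask(batch, pad_token_id=PAD_TOKEN_ID):
    """Left-pad token id lists to equal length and build matching attention masks."""
    max_len = max(len(ids) for ids in batch)
    input_ids, attention_mask = [], []
    for ids in batch:
        n_pad = max_len - len(ids)
        # Padding goes on the left so generation continues from the real tokens.
        input_ids.append([pad_token_id] * n_pad + list(ids))
        attention_mask.append([0] * n_pad + [1] * len(ids))
    return input_ids, attention_mask


# Made-up token ids, for illustration only.
batch = [[11, 12, 13, 14], [21, 22]]
input_ids, attention_mask = pad_and_mask(batch)
print(input_ids)       # [[11, 12, 13, 14], [128001, 128001, 21, 22]]
print(attention_mask)  # [[1, 1, 1, 1], [0, 0, 1, 1]]
```

With Hugging Face transformers, calling the tokenizer on text normally returns both `input_ids` and `attention_mask`, and `generate` accepts `attention_mask` and `pad_token_id` arguments; passing them explicitly is the usual way to address this warning.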
<|begin_of_text|><|start_header_id|>human<|end_header_id|> Describe the tallest tower in the world.<|eot_id|><|start_header_id|>assistant<|end_header_id|> The tallest tower in the world is the Tokyo Skytree, located in Tokyo, Japan. It stands at an incredible height of 634 meters (2,080 feet) and was completed in 2012. The Tokyo Skytree is not only the tallest tower in the world but also the tallest free-standing tower, meaning it is not supported by any external structures.
The Tokyo Skytree was built as a broadcasting tower, designed to replace the aging Tokyo Tower, which was built in 1958. The new tower was designed to provide better broadcasting services to the Tokyo metropolitan area, as well as to serve as a iconic landmark and tourist attraction.
The tower's design is unique, with a distinctive shape that resembles a giant antenna. It has a square base that tapers to a point at the top, with a series of observation decks and broadcasting equipment installed along the way. The tower's height is divided into three sections: the base, the middle, and the top, each with its own unique features and attractions.
The Tokyo Skytree has become a popular tourist destination, attracting millions of visitors each year. Visitors can take elevators to the observation decks, which offer stunning views of the Tokyo skyline and the surrounding landscape. The tower also features a variety of shops, restaurants, and entertainment facilities, making it a must-visit destination for anyone traveling to Tokyo.
In addition to its height, the Tokyo Skytree is also notable for its innovative design and construction. It was built using a unique " lattice" structure, which allows it to withstand strong winds and earthquakes, while also providing a strong and stable foundation for the broadcasting equipment and observation decks. The tower's construction was also designed to minimize its environmental impact, using sustainable materials and energy-efficient systems wherever possible.
Overall, the Tokyo Skytree is an impressive and iconic structure that has become a symbol of Tokyo's modernity and innovation. Its height, design, and attractions make it a must-visit destination for anyone interested in architecture, engineering, or simply wanting to experience the thrill of being at the top of the world's tallest tower.<|eot_id|>
<|begin_of_text|><|start_header_id|>human<|end_header_id|> What is Unsloth?<|eot_id|><|start_header_id|>assistant<|end_header_id|> Unsloth is a fascinating topic! Unsloth is a term used to describe a hypothetical, hypothetical creature that is the opposite of a sloth. While sloths are known for their slow movements and sedentary lifestyle, Unsloth would be a creature that is incredibly fast, agile, and energetic.
The concept of Unsloth is often used as a thought experiment or a humorous idea, rather than a serious scientific concept. It's a fun way to imagine what a creature would be like if it were the exact opposite of a sloth in terms of its physical abilities and behavior.
In reality, there is no such creature as Unsloth, and it's not a recognized scientific term. However, the idea of Unsloth can be a fun and imaginative concept to explore, and it can even inspire creative writing, art, or even scientific speculation about what such a creature might look like or how it might behave.<|eot_id|>
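The streamed outputs above include the prompt and the special tokens along with the answer. If only the assistant's text is needed, one option is to slice it out of the decoded string. The sketch below does this with plain string operations; `extract_assistant_reply` is a hypothetical helper for illustration, and it assumes the transcript uses the same marker tokens shown in the outputs above.

```python
def extract_assistant_reply(decoded):
    """Return the text of the last assistant turn from a decoded Llama-3 transcript."""
    header = "<|start_header_id|>assistant<|end_header_id|>"
    # rpartition splits on the last occurrence, so earlier turns are ignored.
    _, found, tail = decoded.rpartition(header)
    if not found:
        raise ValueError("no assistant turn found")
    # Keep everything up to the end-of-turn marker, if one is present.
    reply, _, _ = tail.partition("<|eot_id|>")
    return reply.strip()


decoded = (
    "<|begin_of_text|><|start_header_id|>human<|end_header_id|> "
    "Continue the fibonacci sequence: 1, 1, 2, 3, 5, 8,<|eot_id|>"
    "<|start_header_id|>assistant<|end_header_id|> "
    "The next numbers in the Fibonacci sequence would be: 13, 21, 34, 55, 89, 144,...<|eot_id|>"
)
print(extract_assistant_reply(decoded))
```

Hugging Face tokenizers also accept `skip_special_tokens=True` when decoding, which drops the marker tokens but leaves the role names and the prompt text in place, so slicing on the assistant header can still be useful.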
And we're done! If you have questions about Unsloth, find a bug, need help, or want to keep up with the latest LLM news and join projects, feel free to join our Discord!
Some other resources:
- Looking to use Unsloth locally? Read our Installation Guide for details on installing Unsloth on Windows, Docker, and AMD or Intel GPUs.
- Learn how to do Reinforcement Learning with our RL Guide and notebooks.
- Read our guides and notebooks for Text-to-speech (TTS) and vision model support.
- Explore our LLM Tutorials Directory to find dedicated guides for each model.
- Need help with inference? Read our Inference & Deployment page for details on using vLLM, llama.cpp, Ollama, etc.



