{"id":93040,"date":"2025-03-26T21:26:58","date_gmt":"2025-03-26T13:26:58","guid":{"rendered":"https:\/\/dhome.com.tw\/?p=93040"},"modified":"2025-03-27T03:03:19","modified_gmt":"2025-03-26T19:03:19","slug":"llms-revolutionizing-ai-hugging-face-video","status":"publish","type":"post","link":"https:\/\/dhome.com.tw\/llms-revolutionizing-ai-hugging-face-video\/","title":{"rendered":"LLMs: Revolutionizing AI Hugging Face Video Tutorial LinkedIn Learning, formerly Lynda com"},"content":{"rendered":"

Fine-Tuning Large Language Models (LLMs) by Shaw Talebi

\"fine<\/p>\n

This automation is particularly valuable for small teams or individual developers who need to deploy custom LLMs quickly and efficiently. Model quantisation is a technique used to reduce the size of an AI model by representing its parameters with fewer bits. Large models carry a heavy memory and compute burden, and quantisation alleviates this by reducing the precision of the parameters. For instance, instead of storing each parameter as a 32-bit floating-point number, it may be represented with fewer bits, such as an 8-bit integer. This compression reduces the memory footprint of the model, making it more efficient to deploy and execute, especially in resource-constrained environments like mobile or edge devices. QLoRA is a popular example of quantisation applied to LLMs and can be used to deploy LLMs locally or host them on external servers.
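As a minimal sketch of what 4-bit quantised loading looks like in practice, the snippet below uses the Hugging Face transformers integration with bitsandbytes and the NF4 configuration popularised by QLoRA; the checkpoint name and exact settings are placeholder assumptions, not a prescribed recipe.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# NF4 4-bit quantisation, as used by QLoRA
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",               # normalised float 4-bit data type
    bnb_4bit_compute_dtype=torch.bfloat16,   # do matmuls in bf16 for stability
    bnb_4bit_use_double_quant=True,          # also quantise the quantisation constants
)

checkpoint = "mistralai/Mistral-7B-v0.1"     # placeholder checkpoint
model = AutoModelForCausalLM.from_pretrained(
    checkpoint,
    quantization_config=bnb_config,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
```

With this configuration, the 7B-parameter placeholder model fits in a few gigabytes of GPU memory instead of the ~28 GB needed at full 32-bit precision.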

\"fine<\/p>\n

Fine-tuning large language models (LLMs) is an essential step for anyone looking to leverage AI for specialized tasks. While these models perform exceptionally well on general tasks, they often require fine-tuning to handle more niche, task-oriented challenges effectively. This article will walk you through the key aspects of fine-tuning LLMs, starting with what fine-tuning is, to help you understand the basics and implement the process for optimal results.

GitHub – TimDettmers/bitsandbytes: Accessible large language models via k-bit quantization for…

Before generating the output, we prepare a simple prompt template as shown below. There is also a method called soft prompting, or prompt tuning, where we add new trainable tokens to the model prompt; these new tokens are trained while all other tokens and model weights are kept frozen (see the sketch after this paragraph). Lastly, you can put all of this in a Pandas DataFrame, split it into training, validation, and test sets, and save them for use in the training process. If you created further synthetic data, as I did with capitalization and partial sentences, make sure that the train, validation, and test sets each contain a consistent proportion of such data. Below is the prompt I used to generate the bootstrapping dataset, which I later updated to contain examples.
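As a rough illustration of soft prompting, here is a minimal sketch using the PEFT library's prompt-tuning support; the base model, initialisation text, and number of virtual tokens are illustrative assumptions.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PromptTuningConfig, PromptTuningInit, TaskType, get_peft_model

base = "gpt2"  # placeholder base model
model = AutoModelForCausalLM.from_pretrained(base)
tokenizer = AutoTokenizer.from_pretrained(base)

# Add trainable "virtual" prompt tokens; all original model weights stay frozen
peft_config = PromptTuningConfig(
    task_type=TaskType.CAUSAL_LM,
    prompt_tuning_init=PromptTuningInit.TEXT,
    prompt_tuning_init_text="Classify the sentiment of this sentence:",  # assumed task
    num_virtual_tokens=8,
    tokenizer_name_or_path=base,
)
model = get_peft_model(model, peft_config)
model.print_trainable_parameters()  # only the virtual prompt tokens are trainable
```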

The comprehensive training enables the model to handle various tasks proficiently, making it suitable for environments where versatile performance is necessary. Confluent Cloud for Apache Flink® supports AI model inference and enables the use of models as resources in Flink SQL, just like tables and functions: you can use a SQL statement to create a model resource and invoke it for inference in streaming queries. Remember that Hugging Face datasets are stored on disk by default, so this will not inflate your memory usage! Once the columns have been added, you can stream batches from the dataset and add padding to each batch (see the sketch below), which greatly reduces the number of padding tokens compared to padding the entire dataset. You can see that all the modules were successfully initialized and the model has started training.
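A minimal sketch of per-batch (dynamic) padding with a Hugging Face dataset follows; the checkpoint and dataset are placeholders chosen only to make the example self-contained.

```python
from datasets import load_dataset
from torch.utils.data import DataLoader
from transformers import AutoTokenizer, DataCollatorWithPadding

checkpoint = "bert-base-uncased"  # placeholder checkpoint
tokenizer = AutoTokenizer.from_pretrained(checkpoint)

# Placeholder dataset: GLUE MRPC sentence pairs
raw = load_dataset("glue", "mrpc", split="train")

def tokenize(batch):
    # No padding here; each batch is padded later to its own longest sequence
    return tokenizer(batch["sentence1"], batch["sentence2"], truncation=True)

tokenized = raw.map(tokenize, batched=True,
                    remove_columns=["sentence1", "sentence2", "idx"])

collator = DataCollatorWithPadding(tokenizer=tokenizer)
loader = DataLoader(tokenized, batch_size=16, shuffle=True, collate_fn=collator)

batch = next(iter(loader))
print(batch["input_ids"].shape)  # padded only to this batch's longest sequence
```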

Users provide the model with a more focused dataset, which may include industry-specific terminology or task-focused interactions, with the objective of helping the model generate more relevant responses for a specific use case. Fine-tuning means taking a pre-trained LLM and refining its weights on a labelled dataset to improve its performance on a specific task. It's like teaching an expert new skills that are highly relevant to your needs: while the base model may have a broad understanding, fine-tuning sharpens its abilities, making it better suited to task-specific applications.
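To make this concrete, here is a minimal supervised fine-tuning sketch with the transformers Trainer; the base model, the training file, and all hyperparameters are placeholder assumptions rather than the article's recommended setup.

```python
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

checkpoint = "gpt2"  # placeholder base model
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
tokenizer.pad_token = tokenizer.eos_token  # gpt2 has no pad token by default
model = AutoModelForCausalLM.from_pretrained(checkpoint)

# Placeholder dataset: one training example per line of text
dataset = load_dataset("text", data_files={"train": "train.txt"})["train"]

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset.map(tokenize, batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="ft-out", num_train_epochs=1,
                           per_device_train_batch_size=4),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()  # refines the pre-trained weights on the labelled data
```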

\"fine<\/p>\n

However, users must be mindful of the resource requirements and potential limitations in customisation and complexity management. While large language model (LLM) applications undergo some form of evaluation, continuous monitoring remains inadequately implemented in most cases. This section outlines the components necessary to establish an effective monitoring programme aimed at safeguarding users and preserving brand integrity. A tech company used quantised LLMs to deploy advanced NLP models on mobile devices, enabling offline functionality for applications such as voice recognition and translation.

Data Format for SFT / Generic Trainer

In the context of optimising model fine-tuning, pattern analysis of LoRA and full fine-tuning (FT) reveals significant differences in learning behaviours and updates. Despite its computational efficiency, previous studies have suggested that LoRA's limited number of trainable parameters might contribute to its performance discrepancies when compared to FT. RAG systems can dynamically retrieve information during generation, making them highly adaptable to changing data and capable of delivering more relevant and informed outputs. This technique is beneficial for applications where the accuracy and freshness of information are critical, such as customer support, content creation, and research. By leveraging RAG, businesses can ensure their language models remain current and provide high-quality responses that are well grounded in the latest information available.
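To make the retrieval step concrete, here is a minimal RAG sketch using sentence-transformers for embedding and similarity search; the embedding model, toy corpus, and prompt format are illustrative assumptions, and a production system would typically use a vector database instead of an in-memory list.

```python
from sentence_transformers import SentenceTransformer, util

# Placeholder corpus of documents to ground the model's answers
corpus = [
    "Our refund window is 30 days from the date of purchase.",
    "Premium subscribers get 24/7 phone support.",
    "Orders over $50 ship free within the continental US.",
]
embedder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed embedding model
corpus_embeddings = embedder.encode(corpus, convert_to_tensor=True)

def retrieve(query: str, k: int = 2) -> list[str]:
    """Return the k corpus passages most similar to the query."""
    query_embedding = embedder.encode(query, convert_to_tensor=True)
    hits = util.semantic_search(query_embedding, corpus_embeddings, top_k=k)[0]
    return [corpus[hit["corpus_id"]] for hit in hits]

question = "Do you offer free shipping?"
context = "\n".join(retrieve(question))
# The retrieved context is prepended to the prompt before generation
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}\nAnswer:"
print(prompt)  # feed this prompt to an LLM of your choice
```

Because retrieval happens at query time, updating the corpus immediately changes what the model can ground its answers in, with no retraining required.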

The key is formulating the right mapping from your text inputs to the desired outputs. Let's now use the ROUGE metric to quantify the validity of the summarizations produced by models (a minimal example follows this paragraph). It compares a summarization to a "baseline" summary, which is usually created by a human. While it's not a perfect metric, it does indicate the overall increase in summarization effectiveness that we have accomplished by fine-tuning. Here, the model is prepared for QLoRA training using the `prepare_model_for_kbit_training()` function.
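As a minimal sketch of computing ROUGE with the Hugging Face evaluate library (the example summaries are made up):

```python
import evaluate

rouge = evaluate.load("rouge")

# Made-up example: a model summary vs. a human-written baseline summary
predictions = ["the model was fine-tuned on dialogue summarization data"]
references = ["we fine-tuned the model on a dialogue summarization dataset"]

scores = rouge.compute(predictions=predictions, references=references)
print(scores)  # rouge1, rouge2, rougeL, rougeLsum scores
```

Comparing these scores before and after fine-tuning gives a rough but useful measure of how much summarization quality improved.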

This would involve teaching them the basics of medicine, such as anatomy, physiology, and pharmacology, as well as teaching them about specific medical conditions and treatments. In LoRA, the weight matrix is scaled by alpha/r, and thus a higher value of alpha assigns more weight to the LoRA activations. On the deployment side, for instance, a large e-commerce platform implemented traditional on-premises GPU-based deployment to handle millions of customer queries daily.

\"fine<\/p>\n

As you can imagine, it would take a lot of time to create this data for your document if you were to do it manually. Don't worry, I'll show you how to do it easily with the Haystack annotation tool. Out_proj is a linear layer used to project the decoder output into the vocabulary space. The layer is responsible for converting the decoder's hidden state into a probability distribution over the vocabulary, which is then used to select the next token to generate.

As the value of r is increased, the number of parameters updated during low-rank adaptation increases. Intuitively, a lower r may lead to a quicker, less computationally intensive training process, but may affect the quality of the resulting model. However, increasing r beyond a certain value may not yield any discernible improvement in the quality of the model's output. How the value of r affects adaptation (fine-tuning) quality will be put to the test shortly; the sketch below shows where r and alpha are set in practice.
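As a minimal sketch of where r and lora_alpha are configured, using the PEFT library; the base model and target module names are architecture-dependent assumptions (out_proj, mentioned above, is a common target on some decoder architectures).

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("gpt2")  # placeholder model

lora_config = LoraConfig(
    r=8,                        # rank of the low-rank update matrices
    lora_alpha=16,              # updates are scaled by alpha / r = 2.0
    target_modules=["c_attn"],  # architecture-dependent; e.g. out_proj elsewhere
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only a small fraction of weights train
```

Raising r here grows the low-rank matrices (and hence the trainable parameter count), which is exactly the quality-versus-cost trade-off discussed above.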