Thoughts on applied AI and machine learning
Describing Double Descent with WeightWatcher

Double Descent (DD) is something that has surprised statisticians, computer scientists, and deep learning practitioners, but it was known in the physics literature in the 80s. And while DD can seem complicated in deep learning models, the original model is actually very easy to understand, and reproduce, with just a few lines of Python. IMHO,...
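To give a flavor of how little code a toy reproduction takes, here is a minimal sketch (my own illustration, not necessarily the post's exact model): double descent in minimum-norm least squares with random ReLU features, where the test error typically spikes near the interpolation threshold (number of features p close to the number of samples n) and then drops again as p grows.

import numpy as np

rng = np.random.default_rng(0)
n, d, n_test = 100, 20, 2000
w_true = rng.normal(size=d)

def make_data(m):
    X = rng.normal(size=(m, d))
    return X, X @ w_true + 0.5 * rng.normal(size=m)   # noisy linear target

X_train, y_train = make_data(n)
X_test, y_test = make_data(n_test)

for p in [10, 50, 90, 100, 110, 200, 1000]:
    R = rng.normal(size=(d, p)) / np.sqrt(d)      # random projection
    Z_train = np.maximum(X_train @ R, 0.0)        # random ReLU features
    Z_test = np.maximum(X_test @ R, 0.0)
    beta = np.linalg.pinv(Z_train) @ y_train      # minimum-norm least-squares fit
    mse = np.mean((Z_test @ beta - y_test) ** 2)
    print(f"p = {p:5d}   test MSE = {mse:10.3f}")  # peak expected near p = n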

Fri Mar 1, 2024 11:56
SVDSmoothing LLM Layers with WeightWatcher

Microsoft Research recently published the LASER method ("Layer-Selective Rank Reduction") in the very popular paper "The Truth is in There: Improving Reasoning in Language Models with Layer-Selective Rank Reduction". It got a lot of press (The Verge) because it hints that it may be possible to improve the truthfulness of LLMs...
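The core operation behind this kind of layer-selective rank reduction is easy to state: pick a layer, compute the SVD of its weight matrix, and keep only the top singular components. Here is a plain-numpy sketch of that single step; the function name and keep_frac parameter are mine for illustration, not the paper's or weightwatcher's API.

import numpy as np

def svd_truncate(W, keep_frac=0.10):
    """Return a low-rank approximation of W, keeping the top fraction of singular values."""
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    k = max(1, int(keep_frac * len(S)))
    return (U[:, :k] * S[:k]) @ Vt[:k, :]

W = np.random.randn(1024, 4096)              # stand-in for one transformer MLP weight matrix
W_reduced = svd_truncate(W, keep_frac=0.10)  # rank drops to ~10% of min(1024, 4096)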

Tue Feb 13, 2024 11:33
Evaluating LLMs with WeightWatcher Part III: The Magic of Mistral, a Story of Dragon Kings

Recently, the Mistral models have taken the LLM world by storm. The Mistral Mixture of Experts (MoE) 8x7B model outperforms other models in its weight class, such as LLaMA 2 70B and GPT-3.5. Here's a quick review of its performance on different LLM benchmarks. And even the smaller Mistral 7B model seems to be "punching well above its weight...

Tue Jan 30, 2024 11:06
Evaluating Fine-Tuned LLMs with WeightWatcher Part II: PEFT / LoRa Models

Evaluating LLMs is hard, especially when you don't have a lot of test data. In the last post, we saw how to evaluate fine-tuned LLMs using the open-source weightwatcher tool. Specifically, we looked at models after the 'deltas' (or updates) have been merged into the base model. In this post, we will look at LLMs fine-tuned using Parameter Efficient...
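For context on what "merging the deltas" means for a LoRA-style update, here is a minimal sketch with toy shapes in plain numpy (not PEFT's internal code): the trained low-rank factors are scaled and added back into the frozen base weight, and it is that merged matrix which then gets analyzed.

import numpy as np

d_out, d_in, r, lora_alpha = 768, 768, 8, 16
W_base = 0.02 * np.random.randn(d_out, d_in)   # frozen base weight
A = 0.02 * np.random.randn(r, d_in)            # trained LoRA factor A (r x d_in)
B = np.random.randn(d_out, r)                  # trained LoRA factor B (d_out x r)

W_merged = W_base + (lora_alpha / r) * (B @ A)  # merged weight that the tool sees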

Sun Jan 28, 2024 11:11
Evaluating Fine-Tuned LLMs with WeightWatcher

If you are fine-tuning your own LLMs, you need a way to evaluate them. And while there are over a dozen popular methods to choose from, each of them is biased toward a specific, narrowly scoped measure. None of them can identify potential internal problems in your model, and in the end, you will probably need to design a custom...
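The basic weightwatcher workflow looks roughly like this; the analyze and get_summary calls are the tool's documented API, while the specific Hugging Face model is just a placeholder for your own fine-tuned LLM.

import weightwatcher as ww
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("gpt2")   # your fine-tuned LLM goes here

watcher = ww.WeightWatcher(model=model)
details = watcher.analyze()              # per-layer metrics (alpha, etc.) as a pandas DataFrame
summary = watcher.get_summary(details)   # average quality metrics for the whole model
print(summary)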

Wed Jan 24, 2024 11:14
WeightWatcher new feature: fix_fingers='clip_xmax'

WeightWatcher 0.7 has just been released, and it includes the new and improved advanced feature for analyzing Deep Neural Networks (DNNs) called fix_fingers. To activate it, simply use: details = watcher.analyze(..., fix_fingers='clip_xmax', ...) This will take a little longer to run, and will yield more reliable alpha values for your model layers,...
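A slightly fuller, hedged example of trying the option on a model and comparing the fitted alphas per layer (the model choice is a placeholder, and the 'alpha' column is how I understand the details DataFrame, so treat the specifics as illustrative):

import weightwatcher as ww
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("gpt2")       # any DNN / LLM you want to check
watcher = ww.WeightWatcher(model=model)

details_default = watcher.analyze()
details_fixed = watcher.analyze(fix_fingers='clip_xmax')   # slower, but more reliable alpha

print(details_default['alpha'].describe())
print(details_fixed['alpha'].describe())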

Wed Mar 22, 2023 02:10
