CALCULATED CONTENT
Double Descent (DD) is something that has surprised statisticians, computer scientists, and deep learning practitioners, but it was known in the physics literature in the 80s. And while DD can seem complicated in deep learning models, the original model is actually very easy to understand, and reproduce, with just a few lines of Python. IMHO,...
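As a rough illustration of how small such a reproduction can be, here is a minimal sketch. It is not necessarily the model the post goes on to describe: it assumes a plain linear regression in which only the first p of 100 Gaussian features are fitted with the pseudo-inverse, and every name and constant in it (double_descent_curve, the widths, the noise level) is made up for this example.

```python
import numpy as np

def double_descent_curve(n_train=20, n_features=100, widths=(5, 20, 80),
                         noise=0.5, n_trials=30, seed=0):
    """Average test MSE of a pseudo-inverse fit that sees only the first p features."""
    rng = np.random.default_rng(seed)
    errors = {p: 0.0 for p in widths}
    for _ in range(n_trials):
        # Hypothetical data model: linear teacher with Gaussian inputs.
        w_true = rng.normal(size=n_features) / np.sqrt(n_features)
        X_train = rng.normal(size=(n_train, n_features))
        X_test = rng.normal(size=(1000, n_features))
        y_train = X_train @ w_true + noise * rng.normal(size=n_train)
        y_test = X_test @ w_true  # noiseless test targets
        for p in widths:
            # pinv gives ordinary least squares for p < n_train and the
            # minimum-norm interpolating solution for p >= n_train.
            w_hat = np.linalg.pinv(X_train[:, :p]) @ y_train
            mse = np.mean((X_test[:, :p] @ w_hat - y_test) ** 2)
            errors[p] += mse / n_trials
    return errors

errors = double_descent_curve()
for p, err in errors.items():
    print(f"p = {p:3d}  test MSE = {err:.2f}")
```

Under these assumptions the test error should peak where the number of fitted features equals the number of training points (p = 20 here) and come back down once the model is heavily over-parameterized (p = 80).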
Recently, Microsoft Research published the LASER method, "Layer-Selective Rank Reduction", in a very popular paper, "The Truth is in There: Improving Reasoning in Language Models with Layer-Selective Rank Reduction". It got a lot of press (The Verge) because it hints that it may be possible to improve the truthfulness of LLMs...
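The excerpt does not spell out the procedure, so the following is only a sketch of the general idea of rank reduction, assuming it means replacing a layer's weight matrix with a truncated-SVD approximation. The helper low_rank_approx, the matrix shape, and the chosen rank are hypothetical, and nothing here selects which layers to reduce.

```python
import numpy as np

def low_rank_approx(W, rank):
    """Best rank-`rank` approximation of W in the Frobenius norm (truncated SVD)."""
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    # Keep only the `rank` largest singular values and their vectors.
    return (U[:, :rank] * S[:rank]) @ Vt[:rank, :]

rng = np.random.default_rng(0)
W = rng.normal(size=(64, 32))           # stand-in for one layer's weight matrix
W_reduced = low_rank_approx(W, rank=8)  # same shape as W, but rank 8

print("rank after reduction:", np.linalg.matrix_rank(W_reduced))
print("relative error:", np.linalg.norm(W - W_reduced) / np.linalg.norm(W))
```

A random matrix like this one has no low-rank structure, so the relative error is large; the example only shows the mechanics of the operation, not its effect on a trained model.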
Recently, the Mistral models have taken the LLM world by storm. The Mistral Mixture of Experts (MoE) 8x7B model outperforms other models in its weight class, such as Llama 2 70B and GPT-3.5. Here’s a quick review of its performance on different LLM benchmarks. And even the smaller Mistral 7B model seems to be “punching well above its weight...
Evaluating LLMs is hard, especially when you don’t have a lot of test data. In the last post, we saw how to evaluate fine-tuned LLMs using the open-source weightwatcher tool. Specifically, we looked at models after the ‘deltas’ (or updates) have been merged into the base model. In this post, we will look at LLMs fine-tuned using Parameter Efficient...
If you are fine-tuning your own LLMs, you need a way to evaluate them. And while there are over a dozen popular methods to choose from, each of them is biased toward a specific, narrowly scoped measure. None of them can identify potential internal problems in your model, and in the end, you will probably need to design a custom...
WeightWatcher 0.7 has just been released, and it includes a new and improved advanced feature for analyzing Deep Neural Networks (DNNs), called fix_fingers. To activate this, simply use: details = watcher.analyze(..., fix_fingers='clip_xmax', ...) This will take a tiny bit longer, and will yield a more reliable alpha for your model layers,...