Sharing a video from my YT channel that breaks down the new KAN paper. It covers the core concepts needed to understand the paper: the Kolmogorov-Arnold representation theorem, splines, MLPs, comparisons between MLPs and KANs, and the challenges ahead. It also highlights some of the notable properties and results of KANs, like continual learning, sparsification,...
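For reference, the Kolmogorov-Arnold representation theorem mentioned above is usually stated roughly as follows. The notation is the common textbook form, not something taken from the video itself:

```latex
% For any continuous f : [0,1]^n \to \mathbb{R} there exist continuous
% univariate functions \phi_{q,p} : [0,1] \to \mathbb{R} and
% \Phi_q : \mathbb{R} \to \mathbb{R} such that
f(x_1, \dots, x_n) \;=\; \sum_{q=1}^{2n+1} \Phi_q\!\left( \sum_{p=1}^{n} \phi_{q,p}(x_p) \right)
```

In other words, a multivariate continuous function can be written using only sums and compositions of one-dimensional functions. As I understand it, KANs make those one-dimensional functions learnable (parameterized as splines) and stack them into deeper networks.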
What are your best guesses on how it works (training and architecture) vs. the typical VL formula of pretrained vision encoder + pretrained LLM -> fine-tune on multimodal tasks? E.g. is it fully mixed-modality pre-training of the entire system? Does the model embed all modalities into a shared space for prediction? Does the system "self-select" the modality...
I am in the market for a new MacBook (upgrading from a 2018 Intel MBP). Of course, big models I can fit on remote servers. But I find that for my domain, it's often simpler to just run things on my laptop for anywhere between 10 minutes and 10 hours while I go do something else. It seems like the new MacBook Airs are quite capable and I'm wondering whether...
Is active learning with BERT (for certain applications) still a relevant paradigm to submit papers under? Or is this line of work likely to be rejected for being "out of date"? My idea is related to using BERT for medical classification, and I'm sure that LLMs may perform better. Wondering whether it would be worth it to invest time into a big...
Hey r/MachineLearning, I published a new article where I built an observable semantic research paper application. This is an extensive tutorial where I go into detail about: developing a RAG pipeline to process and retrieve the most relevant PDF documents from the arXiv API; developing a Chainlit-driven web app with a Copilot for online paper retrieval....
I'm trying to extract past key, value pair using attention_layers and hidden_state for a particular layer:

    def new_past_key_values(attention_layers, hidden_state, idx):
        W_k = attention_layers[idx].k_proj
        W_v = attention_layers[idx].v_proj
        new_key = W_k(hidden_state)
        new_value = W_v(hidden_state)
        batch_size, seq_length, hidden_dim = hidden_state.size()...
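Since the post is cut off, here is a minimal sketch of how the key/value projection and per-head reshape could look in PyTorch. Everything beyond `k_proj`, `v_proj`, and `hidden_state` is an assumption for illustration: the helper name `project_key_value`, the `num_heads` argument, the `(batch, heads, seq, head_dim)` cache layout, and the toy `nn.Linear` layers standing in for a real model's projections.

```python
import torch
from torch import nn


def project_key_value(k_proj, v_proj, hidden_state, num_heads):
    """Hypothetical helper: project a hidden state to per-head keys and values.

    Assumes hidden_state has shape (batch, seq, hidden) and that the cache
    layout is (batch, heads, seq, head_dim); real models may differ.
    """
    batch_size, seq_length, hidden_dim = hidden_state.size()
    head_dim = hidden_dim // num_heads

    key = k_proj(hidden_state)
    value = v_proj(hidden_state)

    # (batch, seq, hidden) -> (batch, seq, heads, head_dim) -> (batch, heads, seq, head_dim)
    key = key.view(batch_size, seq_length, num_heads, head_dim).transpose(1, 2)
    value = value.view(batch_size, seq_length, num_heads, head_dim).transpose(1, 2)
    return key, value


# Toy stand-ins for one attention layer's projections (not a real model).
torch.manual_seed(0)
hidden_dim, num_heads = 16, 4
k_proj = nn.Linear(hidden_dim, hidden_dim, bias=False)
v_proj = nn.Linear(hidden_dim, hidden_dim, bias=False)
hidden_state = torch.randn(2, 5, hidden_dim)

key, value = project_key_value(k_proj, v_proj, hidden_state, num_heads)
print(key.shape, value.shape)
```

Two caveats if the result should match a model's own cache: in many architectures the hidden state is normalized before the projections, and some apply rotary position embeddings to the keys before caching them. Neither step is included in this sketch.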