[D] Kolmogorov Arnold Networks: A visual paper breakdown (Video)

Sharing a video from my YT channel that breaks down the new KAN paper. It goes into all the core concepts required to understand the paper - the Kolmogorov Arnold Representation Theorem, Splines, MLPs, comparisons between MLPs and KANs, challenges ahead, and highlights some of the amazing properties/results of KANs like continual learning, sparsification,...

Tue May 14, 2024 21:41
[D] GPT-4o "natively" multi-modal, what does this actually mean?

What are your best guesses on how it works (training and architecture) vs. the typical VL formula of pretrained vision encoder + pretrained LLM -> fine-tune with multimodal tasks? E.g., is it fully mixed-modality pre-training of the entire system? Does the model embed all modalities into a shared space for prediction? Does the system "self-select" the modality...

Tue May 14, 2024 21:41
[D] Which MacBook for machine learning/Bayesian statistics?

I am in the market for a new MacBook (upgrading from a 2018 Intel MBP). Of course, I can fit big models on remote servers. But I find that for my domain, it's often simpler to just run things on my laptop for anywhere between 10 minutes and 10 hours while I go do something else. It seems like the new MacBook Airs are quite capable and I'm wondering whether...

Tue May 14, 2024 21:41
[D] Is BERT still relevant in 2024 for an EMNLP submission?

Is active learning with BERT (for certain applications) still a relevant paradigm to submit papers under? Or is this line of work likely to be rejected for being "out of date"? My idea is related to using BERT for medical classification, and I'm sure that LLMs may perform better. Wondering whether it would be worth it to invest time into a big...

Tue May 14, 2024 21:41
[R] Building an Observable arXiv RAG Chatbot with LangChain, Chainlit, and Literal AI

Hey r/MachineLearning, I published a new article where I built an observable semantic research paper application. This is an extensive tutorial where I go into detail about: developing a RAG pipeline to process and retrieve the most relevant PDF documents from the arXiv API; developing a Chainlit-driven web app with a Copilot for online paper retrieval....

Tue May 14, 2024 21:41
past key values from hidden states [D]

I'm trying to extract the past key, value pair using attention_layers and hidden_state for a particular layer:

    def new_past_key_values(attention_layers, hidden_state, idx):
        W_k = attention_layers[idx].k_proj
        W_v = attention_layers[idx].v_proj
        new_key = W_k(hidden_state)
        new_value = W_v(hidden_state)
        batch_size, seq_length, hidden_dim = hidden_state.size()...
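The snippet above is cut off right after the shapes are unpacked, so what the author does next is not known. One step that commonly follows in multi-head attention is splitting the projected keys and values into per-head slices, since many key/value caches are laid out as (batch, num_heads, seq_length, head_dim). Below is a minimal, dependency-free sketch of that shape bookkeeping on nested lists. The helper name `split_heads` and the `num_heads` parameter are hypothetical and do not come from the post; real models may also normalize the hidden state first or apply positional encodings to the keys, which this sketch ignores.

```python
def split_heads(projected, num_heads):
    """Regroup (batch, seq, hidden) nested lists as (batch, heads, seq, head_dim).

    Hypothetical helper for illustration only; it is not part of the
    original post or of any library.
    """
    hidden_dim = len(projected[0][0])
    if hidden_dim % num_heads != 0:
        raise ValueError("hidden_dim must be divisible by num_heads")
    head_dim = hidden_dim // num_heads
    return [
        [
            # Each head takes a contiguous slice of every token's vector.
            [token[h * head_dim:(h + 1) * head_dim] for token in sequence]
            for h in range(num_heads)
        ]
        for sequence in projected
    ]


# Toy stand-in for a projected key: batch_size=1, seq_length=2, hidden_dim=4.
new_key = [[[1, 2, 3, 4], [5, 6, 7, 8]]]
key_heads = split_heads(new_key, num_heads=2)
```

With tensors, the same regrouping is usually a reshape to (batch, seq_length, num_heads, head_dim) followed by swapping the sequence and head axes.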

Tue May 14, 2024 18:42
