
Shot #

16

Speculative Decoding Made Easy With Only One Model

You already know that Speculative Decoding reduces LLM inference time, don't you? (If you don't, watch shot #14 to learn how Speculative Decoding works!) The magic behind Speculative Decoding lies in leveraging a smaller draft model. But what if you don't have a smaller model on hand? In this shot you'll learn how to achieve a 10% speedup via Speculative Decoding, by simulating a smaller model using the layers of the big model! This shot was filmed in front of a live audience 👥👥, as part of the One Shot Learning’s 2nd Anniversary Fest 🎉 (https://www.oneshotlearning.io/anniversary-fest-2024). The audience was thrilled with this festive event (just as we were). Want to know about future events? Join our Telegram community! https://t.me/one_shot_learning

Shot #

15

Superposition Unveiled in LLMs

In this shot we'll investigate superposition in LLMs! Superposition is a phenomenon where an entity exists in multiple states simultaneously. But what does this have to do with LLMs? We'll explore superposition by combining two distinct pieces of text into a single superposed text. Then, we'll test if a frozen LLM can still access and utilize the information from the original texts when exposed to the superposed text! If the model succeeds in the task, it means that superposition is inherent to the way LLMs work. Join us as we uncover the hidden workings of these AI systems!

Shot #

14

From Paper to Code - Speculative Decoding

In this shot we implemented a cool paper - from zero to hero! Speculative decoding is a neat algorithm introduced by two different papers that were published independently around the same time! The idea is relatively simple: Instead of generating tokens using a big model - which costs you time and money - you can use a smaller one to generate the tokens more efficiently, and then verify them using the big model. The result? Faster and cheaper text generation, without compromising on quality. In fact, there's a mathematical proof showing that the output distribution is identical to that of the big model. Watch the video to learn how to implement the algorithm.
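The verification step at the heart of the algorithm can be sketched in a few lines. Below is a minimal, self-contained illustration of the accept/reject rule from the papers, with toy probability distributions standing in for real model outputs (the function names and tiny vocabulary are ours, for illustration only):

```python
import random

def sample(dist, rng):
    """Sample an index from a discrete probability distribution."""
    u, acc = rng(), 0.0
    for i, pr in enumerate(dist):
        acc += pr
        if u < acc:
            return i
    return len(dist) - 1

def speculative_step(p, q, rng=random.random):
    """One draft-then-verify step of speculative sampling.

    p: target (big) model's distribution over the vocabulary
    q: draft (small) model's distribution over the vocabulary
    Returns a token index whose distribution is exactly p.
    """
    # 1. The draft model proposes a token by sampling from q.
    draft = sample(q, rng)
    # 2. The big model verifies: accept with probability min(1, p/q).
    if rng() < min(1.0, p[draft] / q[draft]):
        return draft
    # 3. On rejection, resample from the residual max(0, p - q), normalized.
    residual = [max(0.0, pi - qi) for pi, qi in zip(p, q)]
    total = sum(residual)
    return sample([r / total for r in residual], rng)
```

The accept/reject rule is exactly what makes the proof go through: tokens the draft over-proposes get rejected just often enough, and the residual distribution fills in what the draft under-proposes.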

Shot #

13

The AI Agent You’ve Always Wanted – Using LangChain

Everyone knows that AI agents are the next big thing in Generative AI. In this shot, we implemented an AI agent to assist with a personal pain point - writing LinkedIn posts about tech trends. LangChain was the obvious choice: We built the agent according to the ReAct methodology, and enhanced it with the needed tools. We've come up with some interesting results that are seasoned with criticism and insights.

Shot #

12

Unveiling the Wisdom of Hidden Layers for Early Exit

Imagine you could cut 30% of the processing time ⚡️ of a Transformer model with almost no impact on its accuracy 🎯! In this shot you'll see how it can be done. Using the information provided in the hidden layers, we've implemented an early exit strategy. Our latest chaser - Unveiling Vision Transformer’s Gut (https://youtu.be/b6BnhqWQZic) - hints at this. The results were decent, but we wanted even better ones. So we made some neat adaptations to the hidden layers to get the promised runtime performance improvement - with negligible impact on accuracy. This shot was filmed in front of a live audience 👥👥, as part of the One Shot Learning's Anniversary Fest 🎉 (https://www.oneshotlearning.io/anniversary-fest-2023). The audience was thrilled with this festive event (just as we were). Want to know about future events? Join our Telegram community!
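To give a flavor of the idea, here's a minimal sketch of confidence-based early exit, assuming a small classifier head is attached to each layer (the names `early_exit_predict` and `heads` are our own; the shot's actual adaptations to the hidden layers are more involved):

```python
def early_exit_predict(hidden_states, heads, threshold=0.9):
    """Run intermediate classifier heads layer by layer and stop
    as soon as one is confident enough.

    hidden_states: per-layer representations of one input
    heads: one classifier per layer, mapping a hidden state
           to a probability distribution over classes
    """
    for layer, (h, head) in enumerate(zip(hidden_states, heads)):
        probs = head(h)
        conf = max(probs)
        if conf >= threshold:  # confident enough: skip the remaining layers
            return probs.index(conf), layer
    # No layer was confident: fall back to the final layer's prediction.
    return probs.index(conf), len(hidden_states) - 1
```

The runtime savings come from how often the loop exits before the last layer; the threshold trades speed against accuracy.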

Chaser

Unveiling Vision Transformer’s Gut

What if you could open the Transformer's black box and see how it builds its predictions layer by layer? It turns out you can! In this chaser, we’ll dissect ViT (Vision Transformer). To get a sense of the flow of predictions within the Transformer, we used a cool flow diagram called a Sankey diagram, which revealed some interesting insights!

Shot #

11

Cracking the Job Interview - Argmax

In this shot, we take on a new challenge from Uri Goren, founder of Argmax, and dive into the world of recommender systems. Together with Uri, we tackle a home assignment that involves creating a web app with personalized search capabilities. Users can search using free text, or by uploading an image, which makes things more interesting! Using OpenAI's CLIP model, we embed items and queries into a shared space, enabling effective search. We also experiment with personalizing the search experience for each user. With Uri's years of experience in the domain and our expertise, we avoid common pitfalls and crack the job interview!

Shot #

10

ChatGPT Goes Beyond Its Knowledge Cut-Off With External Database Integration

In this video, we demonstrate the power of ChatGPT, a highly advanced language model 🤖 developed by OpenAI. However, as the model’s knowledge cutoff is 2021, it may not be aware of the most recent developments 😢. To combat this, we utilized TechCrunch as a source for up-to-date information 🗞️: We scraped it, and using OpenAI’s API, we embedded it into a high-dimensional vector space. Now, when a user asks a question, instead of solely relying on ChatGPT’s knowledge, we retrieve the most relevant item from the TechCrunch database (using vector similarity), resulting in a smarter and more informed response! 💪 Watch as we put this enhanced version of ChatGPT to the test and see the impressive results for yourself! 👀
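The retrieve-then-prompt step can be sketched as follows. This is a toy illustration where bag-of-words vectors stand in for the real API embeddings, and all names are our own:

```python
import math
from collections import Counter

def embed(text):
    """Toy embedding: a bag-of-words vector. The real pipeline
    would call an embedding API (e.g. OpenAI's) instead."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(question, articles):
    """Return the article most similar to the question."""
    q_vec = embed(question)
    return max(articles, key=lambda art: cosine(q_vec, embed(art)))

def build_prompt(question, articles):
    """Prepend the retrieved context so the LLM can answer
    beyond its knowledge cutoff."""
    context = retrieve(question, articles)
    return f"Context: {context}\n\nQuestion: {question}\nAnswer:"
```

The key design choice is that retrieval happens at query time, so the knowledge base can be refreshed without touching the model.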

Shot #

9

Cracking the Job Interview - ZipRecruiter (Digestif)

The use of open-source models is pretty easy these days, thanks to great packages such as sklearn and huggingface 🤗. But what if we want to use these models in a different way - not the way they were intended? This is a different story... In this shot, you’ll learn how easy it is to open the black box of deep models and modify the architecture to suit your needs. We’ll continue from where we left off in the previous shot (don’t worry if you haven’t seen it yet), where we cracked the job interview of ZipRecruiter. We’ll take SetFit, the Transformers-based model we used to classify a job description, to the next level 🚀! SetFit sometimes misses the obvious correlation between a word’s presence and its label. That makes sense, since deep models (especially those that output sentence embeddings, like SetFit) are effective at understanding semantics, but not at memorizing words 😕. Shallow models, such as a bag of words, are ideal for this purpose. So we decided to take the deep model and change its architecture to make it both deep and shallow, resulting in improved accuracy! 🎯
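A minimal sketch of the deep-and-shallow idea: concatenate the dense sentence embedding with bag-of-words presence flags, so a single linear head on top sees both views (the names and toy vocabulary are ours, not SetFit's actual API):

```python
def hybrid_features(text, embed, vocab):
    """Concatenate a deep sentence embedding with shallow
    bag-of-words indicator features over a fixed vocabulary.

    embed: a function mapping text to a dense semantic vector
    vocab: words whose mere presence is a strong label signal
    """
    deep = embed(text)  # semantics: what the text means
    tokens = set(text.lower().split())
    shallow = [1.0 if w in tokens else 0.0 for w in vocab]  # memorization: which words appear
    return deep + shallow
```

This way the classifier can lean on semantics when the wording varies, and on exact word presence when one word gives the label away.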

Shot #

8

Cracking the Job Interview - ZipRecruiter (Aperitif)

This is the first of a brand new series of shots where we crack the job interview! We were given a home task that is part of the Data Scientist hiring process in ZipRecruiter, and we solved it - the One Shot Learning way. We were presented with a multi-class prediction challenge: given a job description, our goal was to predict one of six possible labels. As part of the challenge, we were also asked to identify what the six classes represent. First, we used a state-of-the-art classification model called SetFit. After we saw which classes the model struggled with, we proceeded to figure out what the classes mean.

Shot #

7

Kaggle Meets Real Life - Featuring Nathaniel Shimoni

There is a widespread belief that knowledge gained from Kaggle can be applied to the real world. But have you ever met someone who actually applied it to their day job? We have! Meet Nathaniel Shimoni, a Kaggle expert who has participated in many competitions. In one of them, he was tasked with predicting the daily sales of Rossmann stores. At the time, his job at Strauss was very similar to this challenge, so trying to solve it was an obvious choice for him. We discussed (and implemented) many key points, including what's the right loss function to use, how to preprocess time-series features, how to split into train and test sets in this delicate scenario, and more... Join us to see some of Nathaniel's tricks from the competition, and get a taste of his Kaggle magic powder!

Shot #

6

Overcoming Tower of Babel Using Calibration

Multiple models can be helpful in solving a given task (and you know it if you watched shot #5 😉). Having multiple models, however, can complicate your task if it involves ranking. 😫 Worry not, shot #6 is here to help! In this shot you’ll see that when ranking items based on your models’ scores, the top items may be biased (i.e. more likely to be selected by one of the models). That's because the models' probability distributions are not necessarily generated in the same manner. You’ll also learn how to handle this situation by using model calibration to ensure the models speak the same language, making it easier for them to collaborate (and solving the Tower of Babel scenario). 🤝
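One simple way to make scores comparable is histogram-binning calibration, sketched below. This is our illustrative choice of method (the shot may use a different one), and the names are our own:

```python
def fit_histogram_calibrator(scores, labels, n_bins=10):
    """Map raw model scores in [0, 1] to empirical probabilities via
    histogram binning, so scores from different models mean the same thing.

    scores: raw scores from one model on held-out data
    labels: the true 0/1 outcomes for those scores
    """
    bins = [[] for _ in range(n_bins)]
    for s, y in zip(scores, labels):
        bins[min(int(s * n_bins), n_bins - 1)].append(y)
    # Per-bin empirical positive rate; empty bins fall back to the bin center.
    rates = [sum(b) / len(b) if b else (i + 0.5) / n_bins
             for i, b in enumerate(bins)]

    def calibrate(s):
        return rates[min(int(s * n_bins), n_bins - 1)]
    return calibrate
```

Fit one calibrator per model on held-out data, then rank items by the calibrated scores: a calibrated 0.7 from model A and a calibrated 0.7 from model B now mean the same thing.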

Shot #

5

The Power of Two: When Two Models Are Better Than One

So you've finished training your super-duper machine learning model, right? Turns out it might pay off to train two different models rather than hoping one model will do the whole job. This is especially true for greedy models, such as decision trees. In this shot, we saw how a decision tree learned not to use one of the features. Based on our data visualizations, we believed this feature should be useful. So how can we make the model use it? As easy as pie - we split the model into two! We significantly improved the accuracy by training two different trees on two subsets of the data (split on the feature).
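The split-into-two idea can be sketched like this; to keep the toy runnable we use mean predictors in place of the two decision trees, and the class and names are our own:

```python
class SplitModel:
    """Route each example by one feature and train a separate
    sub-model per side - the feature the single model ignored
    becomes the split criterion. In the shot, each sub-model
    was a decision tree; here it's just a mean predictor."""

    def __init__(self, split_feature, threshold):
        self.split_feature = split_feature
        self.threshold = threshold

    def fit(self, X, y):
        # Partition the training data on the chosen feature.
        low = [t for x, t in zip(X, y) if x[self.split_feature] <= self.threshold]
        high = [t for x, t in zip(X, y) if x[self.split_feature] > self.threshold]
        self.low_mean = sum(low) / len(low)
        self.high_mean = sum(high) / len(high)
        return self

    def predict(self, x):
        # Route the example to the sub-model trained on its side.
        if x[self.split_feature] <= self.threshold:
            return self.low_mean
        return self.high_mean
```

The point is that the split is forced by hand, so the feature influences every prediction even if a greedy learner would never have chosen it.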

Chaser

Deming Regression Demystified

In shot #4 we emphasized the need for privacy when conducting a salary survey. Toward that goal, we added noise to the respondents' data, and used Deming Regression - a special linear model that handles noise in the features! In this chaser, with the help of a cool interactive visualization, we'll take a closer look at what the model aims to optimize.
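For reference, Deming regression has a closed-form solution. Here's a compact sketch in pure Python (our own naming), where `delta` is the assumed ratio of the y-noise variance to the x-noise variance:

```python
import math

def deming_fit(xs, ys, delta=1.0):
    """Closed-form Deming regression: unlike ordinary least squares,
    it accounts for noise in x as well as in y, minimizing weighted
    perpendicular distances to the line."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    # Sample (co)variances of the observed, noisy data.
    sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    syy = sum((y - my) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    slope = (syy - delta * sxx
             + math.sqrt((syy - delta * sxx) ** 2 + 4 * delta * sxy ** 2)
             ) / (2 * sxy)
    intercept = my - slope * mx
    return slope, intercept
```

With `delta=1` this reduces to orthogonal regression; in the survey setting, `delta` encodes how much noise was deliberately added to the features versus the target.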

Shot #

4

Breaking the Privacy-Utility Tradeoff with Deming Regression

Imagine you’re conducting a salary survey with the goal of training a model to predict the salary. Cool, right? Not if you don’t handle user privacy... How can we make sure the collected data can’t be used to identify the users, while still being able to properly train our model? In this shot, we’ll have our cake and eat it too: We’ll use a lesser-known model called Deming regression to handle our anonymized data, and it’ll achieve quality similar to a model trained on the private data!

Shot #

3

Reinforcement (Prompt) Learning

So now you know how to execute few-shot learning, right? (You don’t? Head over to shot #2!) The next challenge is choosing the best set of examples (prompt)... Reinforcement Learning to the rescue! In this shot we’ll draw an analogy between few-shot learning and the multi-armed bandit setting: given a set of predefined prompts, we’ll balance the exploration-exploitation tradeoff using Thompson sampling, on a mission to identify the best prompt under a constrained budget.
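A minimal sketch of Thompson sampling over a fixed set of prompts, with a Beta posterior over each prompt's success rate (the names and the 0/1 reward model are our own simplification):

```python
import random

def pick_prompt(successes, failures, rng=random):
    """Thompson sampling: draw one sample from each prompt's Beta
    posterior and pick the prompt with the highest draw."""
    draws = [rng.betavariate(s + 1, f + 1)  # Beta(1, 1) uniform prior
             for s, f in zip(successes, failures)]
    return draws.index(max(draws))

def run_bandit(true_rates, budget, rng=random):
    """Spend a constrained budget of LLM calls to find the best prompt."""
    k = len(true_rates)
    successes, failures = [0] * k, [0] * k
    for _ in range(budget):
        arm = pick_prompt(successes, failures, rng)
        # Reward = whether the chosen prompt produced a correct answer
        # (simulated here with a fixed per-prompt success rate).
        if rng.random() < true_rates[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return successes, failures
```

Because each posterior draw is random, under-explored prompts still get occasional tries, but the budget increasingly flows to the prompt that keeps succeeding.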

Shot #

2

What's Your Sentiment Towards Few Shot Learning?

In this shot we'll demonstrate one of the latest main paradigms in NLP: few-shot learning. We'll create a high-performing sentiment classifier of IMDB reviews using Large Language Models (LLMs), in less than 30 minutes. And the beautiful part? No model training is required! In the shot we've used AI21 Studio as the LLM provider - which is open to the public.
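At its core, few-shot learning is just prompt construction: show the model a handful of labeled examples and let it complete the pattern. A minimal sketch (our own naming, not the AI21 Studio API):

```python
def few_shot_prompt(examples, review):
    """Build a few-shot prompt: labeled examples followed by the
    new review, leaving the sentiment for the LLM to complete."""
    lines = [f"Review: {text}\nSentiment: {label}\n"
             for text, label in examples]
    lines.append(f"Review: {review}\nSentiment:")
    return "\n".join(lines)
```

The prompt is then sent to the LLM provider, and the completion (e.g. "positive" or "negative") is the classification - no training step at all.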

Shot #

1

Exploratory Data Analysis: The Olympic Medals

In this shot we'll perform a classic Exploratory Data Analysis (EDA) using the Olympic Medals dataset. Join us to learn some cool insights:
- Which sport activities pass the test of time?
- Which countries send more female athletes?
- How are medals spread across countries and activities?