- 292
- 2 016 156
RAIL
United States
Joined Mar 24, 2015
The official channel of the Robotic AI & Learning Lab at UC Berkeley
We are faculty, students, and post-docs who conduct research on machine learning, robotics, and everything in-between.
Website: rail.eecs.berkeley.edu/
Lab director: Sergey Levine [people.eecs.berkeley.edu/~svlevine/]
SERL: A Software Suite for Sample-Efficient Robotic Reinforcement Learning
Jianlan Luo*, Zheyuan Hu*, Charles Xu, You Liang Tan, Jacob Berg, Archit Sharma, Stefan Schaal, Chelsea Finn, Abhishek Gupta, Sergey Levine
Project page: serl-robot.github.io/
Code: github.com/rail-berkeley/serl
2,182 views
Videos
FMB: a Functional Manipulation Benchmark for Generalizable Robotic Learning
776 views · 4 months ago
FMB: a Functional Manipulation Benchmark for Generalizable Robotic Learning Jianlan Luo*, Charles Xu*, Fangchen Liu, Liam Tan, Zipeng Lin, Jeffrey Wu, Pieter Abbeel, Sergey Levine Project website: functional-manipulation-benchmark.github.io/
Making Real-World Reinforcement Learning Practical
11K views · 4 months ago
Lecture by Sergey Levine about progress on real-world deep RL. Covers these papers: A Walk in the Park: Learning to Walk in 20 Minutes With Model-Free Reinforcement Learning: sites.google.com/berkeley.edu/walk-in-the-park Grow Your Limits: Continuous Improvement with Real-World RL for Robotic Locomotion: sites.google.com/berkeley.edu/aprl Reset-Free Reinforcement Learning via Multi-Task Learnin...
RLIF: Interactive Imitation Learning as Reinforcement Learning
950 views · 5 months ago
We present a reinforcement learning algorithm that runs under DAgger-like assumptions, which can improve upon suboptimal experts without knowing ground-truth rewards. RLIF: Interactive Imitation Learning as Reinforcement Learning Jianlan Luo*, Perry Dong*, Yuexiang Zhai, Yi Ma, Sergey Levine Project page: rlif-page.github.io/
CS 285: Guest Lecture: Dorsa Sadigh
1.2K views · 5 months ago
CS 285: Guest Lecture: Aviral Kumar
1.8K views · 5 months ago
CS 285: Lecture 23, Part 2: Challenges & Open Problems
1.8K views · 6 months ago
CS 285: Lecture 23, Part 1: Challenges & Open Problems
1.4K views · 6 months ago
Large-Scale Data-Driven Robotic Learning
1.9K views · 6 months ago
Presentation by Sergey Levine prepared for the "Towards Generalist Robots" workshop at CoRL. Covers these works: Bridge v2: rail-berkeley.github.io/bridgedata/ GRIF: rail-berkeley.github.io/grif/ SuSIE: rail-berkeley.github.io/susie/ Q-Transformer: qtransformer.github.io/ PTR: sites.google.com/view/ptr-final/ ICVF: dibyaghosh.com/icvf/ V-PTR: dibyaghosh.com/vptr/ RT-X: robotics-transformer-x.gi...
CS 285: Lecture 21, RL with Sequence Models & Language Models, Part 3
1.2K views · 6 months ago
CS 285: Lecture 21, RL with Sequence Models & Language Models, Part 2
1.4K views · 6 months ago
CS 285: Lecture 21, RL with Sequence Models & Language Models, Part 1
3.7K views · 6 months ago
CS 285: Andrea Zanette: Towards a Statistical Foundation for Reinforcement Learning
1.1K views · 6 months ago
CS 285: Eric Mitchell: Reinforcement Learning from Human Feedback: Algorithms & Applications
3.5K views · 6 months ago
Guest lecture in CS 285 by Eric Mitchell (Stanford)
Reinforcement Learning with Large Datasets: Robotics, Image Generation, and LLMs
3.9K views · 6 months ago
CS 285: Lecture 18, Variational Inference, Part 4
1.5K views · 6 months ago
Navigation with Large Language Models: Semantic Guesswork as a Heuristic for Planning (Summary)
1.3K views · 7 months ago
NoMaD: Goal Masked Diffusion Policies for Navigation and Exploration (Summary Video)
1.4K views · 7 months ago
CS 285: Lecture 2, Imitation Learning. Part 3
5K views · 8 months ago
CS 285: Lecture 2, Imitation Learning. Part 1
8K views · 8 months ago
CS 285: Lecture 2, Imitation Learning. Part 5
3.2K views · 8 months ago
CS 285: Lecture 2, Imitation Learning. Part 2
7K views · 8 months ago
CS 285: Lecture 2, Imitation Learning. Part 4
3.3K views · 8 months ago
CS 285: Lecture 1, Introduction. Part 3
6K views · 8 months ago
CS 285: Lecture 1, Introduction. Part 1
19K views · 8 months ago
CS 285: Lecture 1, Introduction. Part 2
6K views · 8 months ago
Multi-Stage Cable Routing Through Hierarchical Imitation Learning
1.1K views · 10 months ago
Data-Driven Reinforcement Learning for Robotic Manipulation
2.4K views · 10 months ago
Reinforcement Learning with Large Datasets: a Path to Resourceful Autonomous Agents
2.4K views · 10 months ago
ViNT: A Foundation Model for Visual Navigation (Summary Video)
2.6K views · 10 months ago
Fascinating!
Is the equation for CAL-QL the same in the paper and in this video, or did it change?
Great talk!
Great lecture.
I have a question about the Gaussian mixture. Does it output n Gaussian distributions, or does it add them together with weights? (Although I don't think adding them with weights would work.) And when talking about multimodality, does it mean we can have different ways to get to the solution, like the example with the tree? Then how does adding degrees of freedom relate to multimodality, or to different ways of doing something?
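On the Gaussian-mixture question above: in a mixture policy the network outputs the parameters of all n components (means, standard deviations, and mixture weights), and an action is drawn by first picking one component according to its weight, not by averaging the components. A minimal sketch with made-up numbers (the two-mode values here are purely illustrative, not from the lecture):

```python
import random

random.seed(0)

# A mixture-of-Gaussians policy head outputs K means, K standard deviations,
# and K mixture weights per action dimension (toy numbers for illustration).
means = [-1.0, 1.0]       # two modes, e.g. "steer left" vs "steer right"
stds = [0.1, 0.1]
weights = [0.5, 0.5]      # mixture weights, normalized to sum to 1

def sample_action():
    # First pick a component index by its weight, then sample that Gaussian.
    # The modes stay separate: the policy does NOT average the means, which
    # would collapse both modes into the single (often invalid) action 0.0.
    k = random.choices(range(len(weights)), weights=weights)[0]
    return random.gauss(means[k], stds[k])

samples = [sample_action() for _ in range(1000)]
# Samples cluster near -1 and +1; essentially none land near the naive
# weighted average of the means (0.0).
```

Multimodality in this sense means there are several distinct valid solutions (e.g. passing the tree on the left or on the right); a single Gaussian would average the modes into an invalid middle action, while the mixture keeps them separate, which is why adding components (degrees of freedom) lets the policy represent multiple ways of doing the same task.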
Extremely interesting. The part about being less aggressive when prediction is harder reminds me of some speculative thoughts I had about artificial neurotransmitters aping the locus coeruleus norepinephrine system to control error rates of subcircuits and speed up learning. That idea is still underdeveloped, though, and I was thinking more about optimizing the joint relationships of different routes into a neuron, which would have atrocious scaling done naively.
So happy to find this open source here!!!
Wow... that was really a TOP video
What happens in NoMaD at 2:45? It seems it generates a batch of noisy trajectories?
1:00
3:54
Sergey Levine is a gem for this world!
nice intro to RL
i am really grateful for your great lecture.
Around 7:00: in this case, is the probability `pi theta (a != a* | s)` the probability for a non-deterministic policy? Or does the probability come from the state distribution while the policy is deterministic? Or are both the policy and the states non-deterministic?
To ask about this as well: at around 12:23, both the policy and the states are stochastic, right?
Offline RL for language models is indeed a promising direction to explore. It's worth noting that Sergey, an expert in the field, has expressed concerns about the feasibility of online RL with language models. This reminds me how brilliant the RLHF approach is.
Good energy, mildly funny, by far the best articulation... can only come from the DPO inventors, Eric Mitchell et al.
Excellent explanation! Thank you for the open-source lectures.
Thank you!
It's good, it's really good.
Very interesting, thank you for sharing!
Great presentation! Thank you for sharing
Thanks for posting this!
Incredibly well explained. Thank you! Great examples at the end.
Chapter 1: Introduction to Reinforcement Learning in the Real World [0:00-0:54]
- Sergey Levine discusses the importance of reinforcement learning (RL) in AI, differentiating it from generative AI techniques.
- Highlights the ability of RL to achieve results beyond human capabilities, using the example of AlphaGo's unique moves.
Chapter 2: Advantages of Real-World RL and Challenges in Simulation [0:54-2:13]
- Emphasizes the power of RL in real-world scenarios over simulation.
- Discusses the challenges in building accurate simulators, especially for complex environments involving human interaction.
Chapter 3: Reinforcement Learning in Complex Environments [2:13-3:18]
- Describes how RL is more effective in complex, variable environments.
- Explains the need for advanced learning algorithms to interact with these environments for emergent behaviors.
- Argues for RL's potential in optimizing policies specifically for real-world settings.
Chapter 4: Progress and Practicality of Real-World Deep RL [3:18-5:52]
- Outlines the advancements in deep RL, making it more practical and scalable in the real world.
- Details how sample complexity has significantly improved, allowing real-world locomotion skills to be learned in minutes.
- Discusses leveraging prior data in RL and overcoming challenges previously considered showstoppers.
Chapter 5: Learning Locomotion Skills in the Real World with Deep RL [5:52-11:46]
- Sergey shares examples of learning locomotion skills on real robots, progressing from simplified models to more complex ones.
- Highlights the significant reduction in training time due to improved algorithms and engineering techniques.
- Discusses the importance of regularization and taking numerous gradient steps for fast learning.
Chapter 6: Manipulation Skills Through Real-World Experience [11:46-16:03]
- Transitions to learning manipulation skills in real-world settings.
- Discusses the challenges and solutions for continuous learning without human intervention.
- Explains the integration of multiple tasks to enable autonomous learning and resetting.
Chapter 7: Leveraging Heterogeneous Prior Data in RL [16:03-21:40]
- Focuses on combining data-driven learning with RL to improve efficiency.
- Describes how pre-training with diverse data can lead to rapid adaptation to new tasks.
- Uses examples to illustrate the efficiency of RL with prior data in both locomotion and manipulation tasks.
Chapter 8: Scaling Deep RL for Practical Applications [21:40-26:30]
- Sergey discusses the scalability of deep RL in practical, real-world applications.
- Provides examples of real-world deployments, including navigation and waste-sorting robots.
- Emphasizes the importance of continuous improvement and adaptation in diverse environments.
Chapter 9: Future Directions and Potential of Real-World RL [26:30-38:20]
- Concludes with a discussion of the future potential of, and improvements needed in, real-world RL.
- Suggests areas for further research, including leveraging prior data for exploration and lifelong learning.
- Acknowledges the need for more stable, efficient, and reliable RL methods for mainstream adoption.
Can the rest of the lectures be uploaded? Thanks a lot!
This was truly an amazing presentation.
thank you mate
Absolutely brilliant talk!
thank you for sharing
We need to get people focused on RL again.
Very interesting talk, thank you for sharing! I used to work in Reinforcement Learning (RL) over 26 years ago on an inverted pendulum control system that was first simulated, then applied to a real pendulum. I was amazed at how well and how quickly it was able to learn how to balance. It seemed a medium resolution of the state space was best for learning rates. I had Dr. Barto's and Dr. Sutton's book, the first edition, on Reinforcement Learning that was released in 1997. Did some k-means clustering on the input space to focus attention of learning on regions of interest in the state space and also used Fuzzy Logic (a sort of merging with language and math to Fuzzify and De-Fuzzify the state space) with Radial Basis Functions as approximators. With the seemingly sudden take off of RL again now in the last few years, I have now bought their 2nd edition of RL and am slowly going through it, also bought their key reference "The Hedonistic Neuron" by Klopf.
Thanks - I haven't been tracking RL for the past several years, so this is a nice high-level update on things, with linked papers for details. Given this progress, are we about to see an explosion in robotics deployment? If so, will it be mostly industrial, or will there be some consumer impact as well?
This dude is a beast
so flippin cool
love this video!!
Thank you Prof. Levine!! :)
Thank you so much! I used a Genetic Algorithm to initialize the weights of my neural network. The network was later trained using DQN, and I found that DQN unlearned everything trained by the GA. Now I am clear about the unlearning part.
Thanks a lot. This is great
Thanks a lot professor
Thanks a lot
As a computational fluid dynamics engineer and RL beginner, I am very grateful for this video. There are a lot of opportunities for RL as an engineering tool.
Thank you for uploading these lectures!
Awesome lectures Sergei, Thank you!
Lillicrap LOL
Meet the Poppypoops
Many thanks, Prof. Levine, for sharing this valuable knowledge with students/researchers around the world!
Where is the sound?!
Still couldn't understand how the total variation divergence is 2. Anyone with an explanation?
Also confusing there. Did you figure it out?
@@chenpeng4176 I found on the internet that the total variation divergence comes to 1, but here he says "when we sum over all the states, the absolute value is 2." I think it should equal the number of states.
Well, I mean, you sum over the absolute values. So if the LHS of the minus sign is 1 and the RHS is zero, you get a one; similarly, if the LHS of the minus sign is zero and the RHS is one, you again get a one. Add them up and you have a 2.
If you think about two distributions P1(s1)=1, P1(s2)=0 and P2(s1)=0, P2(s2)=1, the divergence over all states could be 2.
@@snl92 The maximum of the summed absolute differences between two probability distributions equals 2 (in the case of disjoint supports). That's why the total variation distance includes a 1/2 factor to normalize the absolute difference. Here's an example from ChatGPT: "Let's construct an example with two probability distributions \(P\) and \(Q\) over a discrete space \(X = \{1, 2, 3, 4, 5\}\), where these distributions are completely disjoint. \(P\) places all its probability mass on the odd numbers, and \(Q\) places all its probability mass on the even numbers: \(P(1) = 0.4\), \(P(2) = 0\), \(P(3) = 0.4\), \(P(4) = 0\), \(P(5) = 0.2\), while \(Q(1) = 0\), \(Q(2) = 0.5\), \(Q(3) = 0\), \(Q(4) = 0.5\), \(Q(5) = 0\). The sum of the absolute differences \(|P(x) - Q(x)|\) across all \(x \in X\) is: \[ \sum_{x \in X} |P(x) - Q(x)| = |0.4 - 0| + |0 - 0.5| + |0.4 - 0| + |0 - 0.5| + |0.2 - 0| = 0.4 + 0.5 + 0.4 + 0.5 + 0.2 = 2.0 \] Thus the raw sum of the absolute differences reaches its theoretical maximum of 2 when the distributions are completely disjoint. Applying the normalization factor gives the total variation distance \( TV(P, Q) = \frac{1}{2} \cdot 2 = 1 \), indicating maximal dissimilarity between \(P\) and \(Q\)."
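The arithmetic in the thread above is easy to check numerically. A minimal pure-Python sketch, using the same hypothetical five-outcome distributions from the example:

```python
# Two distributions with disjoint support, as in the example above:
# P puts its mass on odd outcomes, Q on even outcomes.
p = [0.4, 0.0, 0.4, 0.0, 0.2]
q = [0.0, 0.5, 0.0, 0.5, 0.0]

# Raw sum of absolute differences: reaches its maximum of 2
# exactly when the two supports are disjoint.
raw_sum = sum(abs(pi - qi) for pi, qi in zip(p, q))

# The 1/2 factor normalizes the total variation distance into [0, 1].
tv = 0.5 * raw_sum

print(round(raw_sum, 10), round(tv, 10))  # 2.0 1.0
```

The rounding is only there to suppress floating-point noise; the point is that the unnormalized sum is 2 while the normalized TV distance is 1, reconciling both answers in the thread.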
AlphaGo learns through self-play, which means move 37 is in-distribution with respect to its own play but out-of-distribution with respect to human play.