RAIL
SERL: A Software Suite for Sample-Efficient Robotic Reinforcement Learning
Jianlan Luo*, Zheyuan Hu*, Charles Xu, You Liang Tan, Jacob Berg, Archit Sharma, Stefan Schaal, Chelsea Finn, Abhishek Gupta, Sergey Levine
Project page: serl-robot.github.io/
Code: github.com/rail-berkeley/serl
2,182 views

Videos

FMB: a Functional Manipulation Benchmark for Generalizable Robotic Learning
776 views · 4 months ago
FMB: a Functional Manipulation Benchmark for Generalizable Robotic Learning Jianlan Luo*, Charles Xu*, Fangchen Liu, Liam Tan, Zipeng Lin, Jeffrey Wu, Pieter Abbeel, Sergey Levine Project website: functional-manipulation-benchmark.github.io/
Making Real-World Reinforcement Learning Practical
11K views · 4 months ago
Lecture by Sergey Levine about progress on real-world deep RL. Covers these papers: A Walk in the Park: Learning to Walk in 20 Minutes With Model-Free Reinforcement Learning: sites.google.com/berkeley.edu/walk-in-the-park Grow Your Limits: Continuous Improvement with Real-World RL for Robotic Locomotion: sites.google.com/berkeley.edu/aprl Reset-Free Reinforcement Learning via Multi-Task Learnin...
RLIF: Interactive Imitation Learning as Reinforcement Learning
950 views · 5 months ago
We present a reinforcement learning algorithm that runs under DAgger-like assumptions, which can improve upon suboptimal experts without knowing ground-truth rewards. RLIF: Interactive Imitation Learning as Reinforcement Learning Jianlan Luo*, Perry Dong*, Yuexiang Zhai, Yi Ma, Sergey Levine Project page: rlif-page.github.io/
CS 285: Guest Lecture: Dorsa Sadigh
1.2K views · 5 months ago
CS 285: Guest Lecture: Aviral Kumar
1.8K views · 5 months ago
CS 285: Lecture 23, Part 2: Challenges & Open Problems
1.8K views · 6 months ago
CS 285: Lecture 23, Part 1: Challenges & Open Problems
1.4K views · 6 months ago
Large-Scale Data-Driven Robotic Learning
1.9K views · 6 months ago
Presentation by Sergey Levine prepared for the "Towards Generalist Robots" workshop at CoRL. Covers these works: Bridge v2: rail-berkeley.github.io/bridgedata/ GRIF: rail-berkeley.github.io/grif/ SuSIE: rail-berkeley.github.io/susie/ Q-Transformer: qtransformer.github.io/ PTR: sites.google.com/view/ptr-final/ ICVF: dibyaghosh.com/icvf/ V-PTR: dibyaghosh.com/vptr/ RT-X: robotics-transformer-x.gi...
CS 285: Lecture 21, RL with Sequence Models & Language Models, Part 3
1.2K views · 6 months ago
CS 285: Lecture 21, RL with Sequence Models & Language Models, Part 2
1.4K views · 6 months ago
CS 285: Lecture 21, RL with Sequence Models & Language Models, Part 1
3.7K views · 6 months ago
CS 285: Andrea Zanette: Towards a Statistical Foundation for Reinforcement Learning
1.1K views · 6 months ago
CS 285: Eric Mitchell: Reinforcement Learning from Human Feedback: Algorithms & Applications
3.5K views · 6 months ago
Guest lecture in CS 285 by Eric Mitchell (Stanford)
Reinforcement Learning with Large Datasets: Robotics, Image Generation, and LLMs
3.9K views · 6 months ago
CS 285: Lecture 18, Variational Inference, Part 4
1.5K views · 6 months ago
Navigation with Large Language Models: Semantic Guesswork as a Heuristic for Planning (Summary)
1.3K views · 7 months ago
NoMaD: Goal Masked Diffusion Policies for Navigation and Exploration (Summary Video)
1.4K views · 7 months ago
CS 285: Lecture 2, Imitation Learning. Part 3
5K views · 8 months ago
CS 285: Lecture 2, Imitation Learning. Part 1
8K views · 8 months ago
CS 285: Lecture 2, Imitation Learning. Part 5
3.2K views · 8 months ago
CS 285: Lecture 2, Imitation Learning. Part 2
7K views · 8 months ago
CS 285: Lecture 2, Imitation Learning. Part 4
3.3K views · 8 months ago
CS 285: Lecture 1, Introduction. Part 3
6K views · 8 months ago
CS 285: Lecture 1, Introduction. Part 1
19K views · 8 months ago
CS 285: Lecture 1, Introduction. Part 2
6K views · 8 months ago
Multi-Stage Cable Routing Through Hierarchical Imitation Learning
1.1K views · 10 months ago
Data-Driven Reinforcement Learning for Robotic Manipulation
2.4K views · 10 months ago
Reinforcement Learning with Large Datasets: a Path to Resourceful Autonomous Agents
2.4K views · 10 months ago
ViNT: A Foundation Model for Visual Navigation (Summary Video)
2.6K views · 10 months ago

COMMENTS

  • @ZinzinsIA · 1 day ago

    Fascinating!

  • @nileshramgolam2908 · 4 days ago

    Is the equation for CAL-QL the same in the paper and in this video, or did it change?

  • @sami9323 · 5 days ago

    Great talk!

  • @kevon217 · 14 days ago

    Great lecture.

  • @muzesu4195 · 27 days ago

    I have a question about the mixture of Gaussians. Does it output n Gaussian distributions, or add them together with weights? (Although I don't think adding with weights would work.) And when talking about multimodality, does it mean there are different ways to reach the solution, like the example with the tree? Then how does adding degrees of freedom relate to multimodality, i.e. different ways to do something?
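    A minimal NumPy sketch of a mixture-of-Gaussians policy head (illustrative only; the shapes and function names below are assumptions, not taken from the lecture). The network outputs K means, K log standard deviations, and K mixture weights per state; the mixture density is the weighted sum of the K component densities (the Gaussians' samples are not added together), and sampling first picks one component according to its weight and then samples from that Gaussian. Each component can capture one mode, e.g. passing the tree on the left vs. on the right, so extra components are what let the policy represent several distinct solutions instead of averaging them.

        # Hypothetical mixture-of-Gaussians action head, for illustration only.
        import numpy as np

        def sample_mixture(means, log_stds, logits, rng):
            # means, log_stds: (K, action_dim); logits: (K,). Returns one sampled action.
            weights = np.exp(logits - logits.max())
            weights /= weights.sum()                          # softmax over the K components
            k = rng.choice(len(weights), p=weights)           # pick a component by its weight
            return rng.normal(means[k], np.exp(log_stds[k]))  # then sample from that Gaussian

        def mixture_log_prob(action, means, log_stds, logits):
            # log of the mixture density: log sum_k w_k * N(action; mu_k, sigma_k)
            weights = np.exp(logits - logits.max())
            weights /= weights.sum()
            stds = np.exp(log_stds)
            comp = -0.5 * (((action - means) / stds) ** 2 + 2 * log_stds + np.log(2 * np.pi))
            comp = comp.sum(axis=-1)                          # sum per-dimension log-densities
            return np.log(np.sum(weights * np.exp(comp)))

        # Two components over a 1-D action: "steer left" vs. "steer right" around an obstacle.
        rng = np.random.default_rng(0)
        means = np.array([[-1.0], [1.0]])
        log_stds = np.log(np.full((2, 1), 0.1))
        logits = np.zeros(2)                                  # equal mixture weights
        a = sample_mixture(means, log_stds, logits, rng)
        print(a, mixture_log_prob(a, means, log_stds, logits))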

  • @lemurpotatoes7988 · 29 days ago

    Extremely interesting. The part about being less aggressive when prediction is harder reminds me of some speculative thoughts I had about artificial neurotransmitters mimicking the locus coeruleus norepinephrine system to control the error rates of subcircuits and speed up learning. That idea is still underdeveloped, though; I was thinking more about optimizing the joint relationships of different routes into a neuron, which would scale atrociously if done naively.

  • @ShanshanZhang-kw5qf · 29 days ago

    So happy to find this open source here!!!

  • @texwiller7577 · 1 month ago

    Wow... that was really a TOP video

  • @zhenghaopeng6633 · 1 month ago

    What happens in NoMaD at 2:45? It seems it generates a batch of noisy trajectories?

  • @forheuristiclifeksh7836 · 1 month ago

    1:00

  • @forheuristiclifeksh7836 · 1 month ago

    3:54

  • @BellaSportMotoristici · 1 month ago

    Sergey Levine is a gem for this world!

  • @ArvindDevaraj1 · 2 months ago

    nice intro to RL

  • @user-xn2wk9oy5j · 2 months ago

    I am really grateful for your great lecture.

  • @AJ-vx3zk · 2 months ago

    Around 7:00: in this case, is the probability `pi theta (a != a* | s)` the probability under a non-deterministic policy? Or does the probability come from the state distribution while the policy is deterministic? Or are both the policy and the states non-deterministic?

    • @AJ-vx3zk · 2 months ago

      To ask about this also, at around 12:23, both the policy and the states are stochastic here, right?
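      For reference, a sketch of the behavioral cloning bound this part of the lecture builds on, stated in its general expectation form (following the Ross & Bagnell style of analysis rather than the exact statement at these timestamps). The assumption bounds an expectation over training states, so the randomness can come from a stochastic policy, from the state distribution, or from both:

      \[
        \mathbb{E}_{s \sim p_{\mathrm{train}}(s)}\!\left[\pi_\theta(a \neq a^{*} \mid s)\right] \le \varepsilon
        \quad\Longrightarrow\quad
        \mathbb{E}\!\left[\sum_{t=1}^{T} \mathbf{1}\{a_t \neq a_t^{*}\}\right] = O(\varepsilon T^{2}).
      \]

      For a deterministic policy, \(\pi_\theta(a \neq a^{*} \mid s)\) is 0 or 1 at each state and the expectation over \(s\) carries all of the randomness; for a stochastic policy it is a genuine per-state probability.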

  • @user-pu1vr4he6f · 2 months ago

    Offline RL for language models is indeed a promising direction to explore. It's worth noting that Sergey, an expert in the field, has expressed concerns about the feasibility of online RL with language models. This reminds me how brilliant the RLHF approach is.

  • @user-rx5pp3hh1x · 2 months ago

    Good energy, mildly funny, by far the best articulation... it could only come from the DPO inventors, Eric Mitchell et al.

  • @haiyunzhang2002 · 3 months ago

    Excellent explanation! Thank you for the open-source lectures.

  • @krezn · 3 months ago

    Thank you!

  • @timanb2491 · 3 months ago

    It's good, it's really good.

  • @sapienspace8814 · 3 months ago

    Very interesting, thank you for sharing!

  • @mehrdadmoghimi · 4 months ago

    Great presentation! Thank you for sharing

  • @SantoshGupta-jn1wn · 4 months ago

    Thanks for posting this!

  • @joshuasheppard7433 · 4 months ago

    Incredibly well explained. Thank you! Great examples at the end.

  • @user-to9ub5xv7o · 4 months ago

    Chapter 1: Introduction to Reinforcement Learning in the Real World [0:00-0:54] - Sergey Levine discusses the importance of reinforcement learning (RL) in AI, differentiating it from generative AI techniques. - Highlights the ability of RL to achieve results beyond human capabilities, using the example of AlphaGo's unique moves.
    Chapter 2: Advantages of Real-World RL and Challenges in Simulation [0:54-2:13] - Emphasizes the power of RL in real-world scenarios over simulation. - Discusses the challenges in building accurate simulators, especially for complex environments involving human interaction.
    Chapter 3: Reinforcement Learning in Complex Environments [2:13-3:18] - Describes how RL is more effective in complex, variable environments. - Explains the need for advanced learning algorithms to interact with these environments for emergent behaviors. - Argues for RL's potential in optimizing policies specifically for real-world settings.
    Chapter 4: Progress and Practicality of Real-World Deep RL [3:18-5:52] - Outlines the advancements in deep RL, making it more practical and scalable in the real world. - Details how sample complexity has significantly improved, allowing real-world locomotion skills to be learned in minutes. - Discusses leveraging prior data in RL and overcoming challenges previously considered showstoppers.
    Chapter 5: Learning Locomotion Skills in the Real World with Deep RL [5:52-11:46] - Sergey shares examples of learning locomotion skills on real robots, progressing from simplified models to more complex ones. - Highlights the significant reduction in training time due to improved algorithms and engineering techniques. - Discusses the importance of regularization and taking numerous gradient steps for fast learning.
    Chapter 6: Manipulation Skills Through Real-World Experience [11:46-16:03] - Transitions to learning manipulation skills in real-world settings. - Discusses the challenges and solutions for continuous learning without human intervention. - Explains the integration of multiple tasks to enable autonomous learning and resetting.
    Chapter 7: Leveraging Heterogeneous Prior Data in RL [16:03-21:40] - Focuses on combining data-driven learning with RL to improve efficiency. - Describes how pre-training with diverse data can lead to rapid adaptation to new tasks. - Uses examples to illustrate the efficiency of RL with prior data in both locomotion and manipulation tasks.
    Chapter 8: Scaling Deep RL for Practical Applications [21:40-26:30] - Sergey discusses the scalability of deep RL in practical, real-world applications. - Provides examples of real-world deployments, including navigation and waste-sorting robots. - Emphasizes the importance of continuous improvement and adaptation in diverse environments.
    Chapter 9: Future Directions and Potential of Real-World RL [26:30-38:20] - Concludes with a discussion of the future potential of real-world RL and the improvements it needs. - Suggests areas for further research, including leveraging prior data for exploration and lifelong learning. - Acknowledges the need for more stable, efficient, and reliable RL methods for mainstream adoption.

  • @omarrayyann · 4 months ago

    Can the rest of the lectures be uploaded? Thanks a lot!

  • @godwyllaikins3277 · 4 months ago

    This was truly an amazing presentation.

  • @zabean · 4 months ago

    thank you mate

  • @NoNTr1v1aL · 4 months ago

    Absolutely brilliant talk!

  • @ArtOfTheProblem · 4 months ago

    thank you for sharing

  • @chillmathematician3303 · 4 months ago

    We need to get people to focus on RL again.

  • @sapienspace8814 · 4 months ago

    Very interesting talk, thank you for sharing! I used to work in Reinforcement Learning (RL) over 26 years ago, on an inverted pendulum control system that was first simulated and then applied to a real pendulum. I was amazed at how well and how quickly it was able to learn to balance. It seemed that a medium resolution of the state space was best for learning rates. I had the first edition of Dr. Barto's and Dr. Sutton's book on Reinforcement Learning, released in 1997. I did some k-means clustering on the input space to focus the learning on regions of interest in the state space, and also used Fuzzy Logic (a sort of merging of language and math to fuzzify and de-fuzzify the state space) with Radial Basis Functions as approximators. With the seemingly sudden take-off of RL again in the last few years, I have now bought the 2nd edition of their RL book and am slowly going through it, and have also bought their key reference, "The Hedonistic Neuron" by Klopf.

  • @rfernand2 · 4 months ago

    Thanks - I haven't been tracking RL for the past several years, so this is a nice high-level update on things, with linked papers for details. Given this progress, are we about to see an explosion in robotics deployment? If so, will it be mostly industrial, or will there be some consumer impact as well?

  • @MessiahAtaey · 4 months ago

    This dude is a beast

  • @aryansoriginals · 5 months ago

    so flippin cool

  • @aryansoriginals · 5 months ago

    love this video!!

  • @pjhae1445 · 5 months ago

    Thank you Prof. Levine!! :)

  • @thawtar682 · 5 months ago

    Thank you so much! I used a genetic algorithm to initialize the weights of my neural network. It was later trained using DQN, and I found that DQN unlearned everything trained by the GA. Now I am clear about the unlearning part.

  • @amortalbeing · 5 months ago

    Thanks a lot. This is great

  • @amortalbeing · 5 months ago

    Thanks a lot professor

  • @amortalbeing · 5 months ago

    Thanks a lot

  • @thawtar682 · 6 months ago

    As a computational fluid dynamics engineer and RL beginner, I am very grateful for this video. There are a lot of opportunities for RL as an engineering tool.

  • @dailygrowth7967 · 6 months ago

    Thank you for uploading these lectures!

  • @ninatko · 6 months ago

    Awesome lectures, Sergey, thank you!

  • @HelloHelloe · 6 months ago

    Lillicrap LOL

  • @prof_shixo · 6 months ago

    Many thanks, Prof. Levine, for sharing this valuable knowledge with students and researchers around the world!

  • @prof_shixo · 6 months ago

    Where is the sound?

  • @snl92 · 6 months ago

    I still couldn't understand how the total variation divergence is 2. Does anyone have an explanation?

    • @chenpeng4176 · 6 months ago

      Also confusing there. Did you figure it out?

    • @snl92 · 6 months ago

      @chenpeng4176 I found on the internet that the total variation divergence comes to 1, but here he says "when we sum over all of the states, the absolute value is 2." I think it should equal the number of states.

    • @karanbania2785 · 5 months ago

      Well, you sum over the absolute values, so if the LHS of the minus sign is 1 and the RHS is zero, you get a one; similarly, if the LHS of the minus sign is zero and the RHS is one, you again get a one. Add them up and you have a 2.

    • @user-zz1th6vi4e · 3 months ago

      If you think about two distributions P1(s1)=1, P1(s2)=0 and P2(s1)=0, P2(s2)=1, the divergence summed over all states is 2.

    • @adeebmdislam4593 · 3 months ago

      @snl92 The maximum of the summed absolute difference between two probability distributions equals 2 (in the case of disjoint distributions). That's why there is a 1/2 factor to normalize the absolute difference in the total variation distance. Here's an example from ChatGPT:

      "Let's construct an example with two probability distributions \(P\) and \(Q\) over a discrete space \(X = \{1, 2, 3, 4, 5\}\), where these distributions are completely disjoint. \(P\) places all its probability mass on the odd numbers and \(Q\) places all its probability mass on the even numbers:
      \(P(1) = 0.4\), \(P(2) = 0\), \(P(3) = 0.4\), \(P(4) = 0\), \(P(5) = 0.2\);
      \(Q(1) = 0\), \(Q(2) = 0.5\), \(Q(3) = 0\), \(Q(4) = 0.5\), \(Q(5) = 0\).
      The sum of the absolute differences across all \(x \in X\) is
      \[
      \sum_{x \in X} |P(x) - Q(x)| = |0.4 - 0| + |0 - 0.5| + |0.4 - 0| + |0 - 0.5| + |0.2 - 0| = 2.0,
      \]
      so each distribution contributes a total difference of 1 over the points where the other has zero probability mass. Applying the normalization factor gives the total variation distance
      \[
      TV(P, Q) = \tfrac{1}{2} \cdot 2 = 1.
      \]
      This demonstrates how the total variation distance (before normalization) reaches its theoretical maximum of 2 when the distributions are completely disjoint; after normalization, the TV distance is 1, indicating maximal dissimilarity between \(P\) and \(Q\)."

  • @matthewpublikum3114 · 6 months ago

    AlphaGo learns through self-play, which means move 37 is in distribution for AlphaGo but out of distribution with respect to human play.