
Zuckerberg's latest 20,000-word interview: Llama 3, the "most powerful open source model" worth tens of billions of dollars, and everything behind it

On April 18, Meta launched Llama 3, calling it "the most powerful open source large model to date." The debut of Llama 3 once again shifted the competitive landscape of large AI models and set the AI community abuzz.


On the same day, an exclusive interview between Meta CEO Mark Zuckerberg and well-known technology podcast host Dwarkesh Patel was released. The 80-minute conversation focuses on Llama 3, artificial general intelligence (AGI), energy constraints, AI safety, and the risks and implications of open source.


Zuckerberg said that AI has become the core of Meta, that Meta AI is now the most intelligent AI assistant freely available, and that the upcoming large version of Llama 3 will have more than 400 billion parameters.


On the training and development of AI models, Zuckerberg noted that the emergence of Llama 3 confirms the importance of large-scale data and computing resources. Training large-scale AI models may in the future run up against capital and energy constraints. He stressed that the purpose of AI is not to replace humans but to give people more powerful tools to take on more challenging tasks. The following are the highlights of the interview:


The smallest Llama 3, at 8 billion parameters, performs on the same order of magnitude as the largest model of the previous generation, the 70-billion-parameter Llama 2, while the most powerful 405-billion-parameter version is still on the way.

The emergence of Llama 3 confirms the importance of large-scale data and computing resources for AI models. AI is transforming from a "question and answer" tool into a broader "reasoning" system that understands the context of a problem, integrates knowledge from multiple areas, and uses logical reasoning to draw conclusions.

Multimodality is an area of focus for Meta. One modality of particular interest is emotional understanding. If a breakthrough can be made here, so that AI can truly understand and express emotions, interaction between humans and machines will reach unprecedented naturalness and depth.

AI will indeed change the way humans work and is expected to significantly improve programmers' productivity. But the point of AI is not to replace humans; rather, the hope is that these tools give people more powerful abilities so they can accomplish work that was previously unimaginable.

Like the emergence of computers, AI will fundamentally change human life and bring many new applications that were previously impossible. Inference will profoundly change almost all product forms.

Before AI development runs into GPU bottlenecks or insufficient funding, it will run into energy problems. If humans can solve the energy problem, it is entirely possible to build computing clusters far larger than today's.

I think Meta AI general assistant products will appear in the future. Every business will want an AI that represents its interests. AI will drive progress in science, health care, and many other fields, and will ultimately affect every kind of product and the whole economy.

I think the potential risks of over-centralized AI in the future could be as great as those of widespread AI. If one institution has more powerful AI than everyone else, is that not also a bad thing?

I think there are many possible paths for training, and commoditization is genuinely one of them. Commoditization means that with more options on the market, the cost of training falls sharply and becomes far more accessible.

The issue of existential risk does deserve in-depth attention. At present, we are more concerned about content risks, that is, models being used to facilitate violence, fraud, or other harm to people.

Open source is becoming a new and powerful way to build large models. Although specific products will continue to develop, appear and disappear over time, their contributions to human society are lasting.

Meta may soon be able to train large models on its own chips, though Llama-4 will probably not be trained that way yet.


The following is the full text of the interview:


The top version of Llama 3 is still in training

Dwarkesh Patel: Mark, welcome to the podcast.


Mark Zuckerberg: Thank you for having me. I'm a big fan of your podcast.


Dwarkesh Patel: Thank you very much for the compliment. Let’s first talk about the products that will also be released when this interview is released. Can you tell me about the latest developments in Meta AI and related models? What are the exciting aspects?


Mark Zuckerberg: I think what most people will pay attention to is the new version of Meta AI. The most important thing we're doing is upgrading the model. We released Llama-3. We're providing it to the developer community as open source, and it will also power Meta AI. There's a lot to talk about with Llama-3, but I think the most important point is that we now consider Meta AI to be the smartest AI assistant people can get for free, and we've also integrated Google and Bing for real-time knowledge.


We're going to make it more prominent in our apps, at the top of Facebook and Messenger, where you can ask questions directly using the search box. We've also added some creative features that I think are really cool and that people will love. I think animation is a great example, you can basically take any image and make it move.


What people are going to find really amazing is how quickly it can now generate high-quality images, actually generating and updating them in real time as you type. You type in your query and it adapts, like "Show me a picture of a cow standing in a field with mountains in the background, eating macadamia nuts and drinking beer," and it updates the image in real time, which is very cool. I think people will really like it, and I feel it's something most people will relate to in the real world. We're rolling it out, not everywhere, but we're starting with a handful of countries and will expand in the coming weeks and months. I think it's going to be an amazing thing and I'm really excited to get it into people's hands. This is a big step forward for Meta AI.


But if you want to dig a little deeper, Llama-3 is clearly the most technically interesting. We actually trained three versions: 8 billion, 70 billion, and 405 billion parameter dense models, of which the 405 billion model is still training, so we are not releasing it today. But I'm very excited about the performance of the 8 billion and 70 billion models, which are leading for their size. We're going to put out a blog post with all the benchmark results so people can check it out for themselves, and it's obviously open source, so people will have a chance to try it.


We have a roadmap for new versions that will bring multimodality, more multilingual support, and a larger context window. Hopefully later this year we can launch the 405-billion-parameter version. Judging from training so far, it has reached about 85 on MMLU, and we expect it to have leading results on many benchmarks. I'm very excited about all of it. The 70 billion version is also great. We're releasing it today. It scores about 82 on MMLU, with leading scores in math and reasoning. I think it would be really cool to get it into people's hands.


Dwarkesh Patel: Interesting, this is the first time I've heard of that as a benchmark. That's so impressive.


Mark Zuckerberg: The 8 billion parameter version is almost as powerful as the largest version we released, Llama-2. So the smallest Llama-3 is basically as powerful as the largest Llama-2.


Dwarkesh Patel: Before we get into these models, I want to go back in time. I assume you started purchasing these H100s in 2022, or you can tell me exactly when. The stock price was getting hammered at the time. People were asking what was going on with all this capital expenditure. People weren't buying into the metaverse. I assume you were spending that capex on these H100s. How did you know you wanted to buy the H100s? How did you know you'd need GPUs?


Mark Zuckerberg: I think it was because we were developing Reels. We always hope to have enough computing power to build something in the future that we can’t see yet. We encountered a situation when developing Reels where we needed more GPUs to train the model. This is a major evolution of our service. Instead of just ranking content from people or Pages you follow, we're starting to feature what we call non-relevant content, which is content from people or Pages you don't follow.


The pool of content candidates we might show you expanded from thousands to millions. It required a completely different infrastructure. We started working on it but were limited by the infrastructure we had and couldn't catch up with TikTok as fast as we wanted. I basically looked at it and thought: "Hey, we have to make sure we don't get into this situation again. So let's order enough GPUs to do what needs to be done for Reels, content ranking, and feeds. But let's double that." Again, our general principle is that there's always something in the future that we can't see yet.


The road to AGI

Dwarkesh Patel: Did you know that would be AI?


Mark Zuckerberg: We thought it was probably going to be something to do with training large models. At the time I thought it might have something to do with content. It's just pattern matching from running a company: there's always another direction to go in, and at the time I was deep in trying to make the recommendation systems for Reels and other content work well. That was a huge breakthrough for Instagram and Facebook, being able to show people interesting content from people they don't even follow.


But in hindsight, the decision was very correct. And it stemmed from being behind. It's not like, "Oh, I was so far ahead." In fact, most of the time we make decisions that look good later because we messed up earlier and just didn't want to repeat the mistake.


Dwarkesh Patel: This is completely off topic, but I wanted to ask it now. We'll return to AI in a moment. You didn't sell for $1 billion in 2006, but I assume there was some price you would have sold for, right? Did you ever think to yourself, "What do I think Facebook is actually worth, and the price they're offering isn't reasonable"? If they had offered $5 trillion, of course you would have sold. So how did you weigh that choice at the time?


Mark Zuckerberg: I think some of it is just on a personal level. I don't know if I was savvy enough to do that kind of analysis at the time. Everybody around me was making all kinds of arguments for a billion dollars, like, "We need to generate this much revenue, we need to be this big. This is obviously years away." It was way beyond our scale at the time. I didn't really have the financial expertise to engage in that kind of debate.


Deep down, I believed in what we were doing. I did some analysis: "What would I be doing if I weren't doing this? Well, I really like creating things, and I like helping people communicate. I like understanding what's going on with people and how they interact. So if I sold this company, I'd probably just build another similar one, and I kind of like the one I have, so why bother?" I think a lot of the biggest bets people make are simply based on conviction and values. In fact, forward-looking analysis is often very difficult.


Mark Zuckerberg: I don't know what the timeline is. I think these things will gradually progress over time.


Dwarkesh Patel: But ultimately: Llama-10.


Mark Zuckerberg: I think there's a lot to this question. I'm not sure if we're replacing people or more giving people the tools to do more.


Dwarkesh Patel: With Llama-10, will the programmers in this building become 10 times more productive?


Mark Zuckerberg: I hope it's more than 10 times. I don't think there is a single intelligence threshold for humans because people have different skills. I think at some point AI may be able to surpass humans at most things, depending on how powerful the model is.


But I think it's gradual; I don't think AGI is just one thing. You're basically adding different capabilities. Multimodality is a key focus now, initially with photos, images, and text, but eventually extending to video. Because we're so focused on the metaverse, 3D-type stuff is also important. One modality that I'm very focused on, and that I don't see a lot of other people in the industry focusing on, is emotional understanding. There are so many parts of the human brain dedicated to understanding people, understanding expressions and emotions. I think this is a whole modality in itself. If AI can truly understand and express emotions, interaction between humans and machines will become more natural and deeper than ever before.


So in addition to the big improvements in reasoning and memory, there are a lot of different capabilities that you want to train the model on, and memory is a whole thing in itself. I don't think in the future we'll be primarily cramming stuff into a query context window to ask more complex questions. There will be different memory stores and different custom models that will be more personalized. These are just different abilities. And then, obviously, there's making them bigger or smaller; we focus on both. If you're running something like Meta AI, it's very server-based. We also want it to run on smart glasses, where there isn't a lot of space, so you want something very efficient for that.


Dwarkesh Patel: If you're using intelligence at an industrial scale to do inference worth tens of billions of dollars, or ultimately hundreds of billions of dollars, what are the use cases? Is it a simulation? Is it artificial intelligence in the metaverse? What will we use the data center for?


Mark Zuckerberg: Our bet is that it's going to change basically every product. I think there will be a Meta AI universal assistant product. I think it's going to move from something more like a chatbot where you ask a question and it formulates an answer, to you giving it more complex tasks and then it goes off and completes those tasks. So it requires a lot of reasoning and a lot of computation and other means.


And then I think interacting with other people's other agents is going to be a big part of what we do, whether it's for businesses or creators. One of my big theories about this is that there won't be a single AI that you interact with, every business is going to want an AI that represents their interests. They won’t want to interact with you primarily through an AI that sells a competitor’s product.


I think creators are going to be a big group. There are approximately 200 million creators on our platform. They basically all have this dynamic where they want to engage their community, but they're constrained by time. Their communities, in turn, want to engage with them, but there are only so many hours in a day. If you can create something where a creator can basically own an AI, train it the way they want, and engage their community with it, I think that would be very powerful too, and there would be a ton of engagement across all of these things.


These are just consumer use cases. My wife and I run our foundation, the Chan Zuckerberg Initiative. We do a lot of work in science, and obviously there's a lot of AI work that's going to advance science, health care, and all these things. So it ends up affecting products and basically every area of the economy.


Dwarkesh Patel: You mentioned that AI can do multi-step things for you. Does that mean a larger model? For example, for Llama-4, will there still be a 70-billion-parameter version, where you just have to train it on the right data and it will be very powerful? What does the progress look like? Is it vertical scaling? Or, like you said, the same size but different data?


Mark Zuckerberg: I don't know if we know the answer to that question. One thing that seems to be a pattern is that you have the Llama model and then you build some kind of other application-specific code around it. Some of it is fine-tuning for a use case, but some of it is, for example, the logic for how Meta AI should use tools like Google or Bing to bring in real-time knowledge. That's not part of the base Llama model. For Llama-2, we had some of that, and it was more hand-engineered. Part of our goal with Llama-3 was to incorporate more of it into the model itself. As we start getting into more of these agent-like behaviors, I think some of them will again be more hand-engineered, and our goal with Llama-4 will be to incorporate more of that into the model.


At every step, you'll get a sense of what's possible on the horizon. You start tinkering with it, doing some hacks around it. I think it helps you hone your intuition on what you want to try to train in the next version of your model. This makes it more general, because obviously with anything you hand-code, you can unlock some use cases, but it's inherently brittle and non-generic.


Dwarkesh Patel: When you say "incorporate into the model itself," do you mean train it on what the model itself wants? What do you mean by "incorporate into the model itself"?


Mark Zuckerberg: With Llama-2, tool use was very specific, whereas Llama-3 is much better at using tools. We don't have to hand-code everything to make it use Google and search; it can just do it. Similarly for coding, running code, and a lot of similar things. Once you gain that capability, you get a glimpse of what we can start doing next. We don't have to wait for Llama-4 to start building those features, so we can start doing some hacks around it. You do a lot of hand-coding, which, at least in the interim, makes the product better. That then helps point the way to what we want to build into the next version of the model.
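
To make the distinction concrete, here is a minimal sketch of the kind of hand-engineered orchestration described above, where the application code, not the model, decides when to call a search tool. The `generate` and `web_search` functions are hypothetical stand-ins, not Meta's actual pipeline:

```python
def generate(prompt: str) -> str:
    """Hypothetical wrapper around a hosted Llama endpoint; stands in
    for whatever inference API the application actually uses."""
    raise NotImplementedError

def web_search(query: str) -> str:
    """Hypothetical search backend (e.g. a Google or Bing client)."""
    raise NotImplementedError

def answer_with_hand_engineered_tools(question: str) -> str:
    # Step 1: application code, not the model, decides whether to search.
    decision = generate(
        "Does answering this need up-to-date web information? "
        f"Reply YES or NO.\nQuestion: {question}"
    )
    context = ""
    if decision.strip().upper().startswith("YES"):
        # Step 2: the app formulates the query and calls the tool itself.
        query = generate(f"Write a short web search query for: {question}")
        context = web_search(query)
    # Step 3: the model only ever sees plain text. The tool logic lives
    # entirely outside it; training tool use into the model itself, as
    # described for Llama-3, removes this brittle outer loop.
    return generate(f"Context:\n{context}\n\nQuestion: {question}")
```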


Dwarkesh Patel: Which community fine-tune of Llama-3 are you most looking forward to? Maybe not the one most useful to you, but the one you'd most enjoy playing with. Like the one fine-tuned on antiquity, where you'd be talking to Virgil and things like that. What are you interested in?


Mark Zuckerberg: I think the nature of this kind of stuff is that you're going to be surprised. Any specific thing that I think has value, we're probably building. I think you'll get the distilled version. I think you'll get the smaller version. One thing is, I don't think 8 billion is small enough to satisfy a large number of use cases. Over time, I'd love to get a 1-2 billion parameter model, or even a 500 million parameter model, and see what you can do with it.


If with 8 billion parameters we're almost as powerful as the largest Llama-2 model, then with 1 billion parameters you should be able to do some interesting things, and much faster. It's great for classification, and for a lot of the fundamental work of understanding the intent of a user query before feeding it to the most powerful model to refine the prompt. I think this may be a gap the community can help fill. We're also thinking about starting to distill some of this ourselves, but right now the GPUs are busy training the 405 billion model.
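
As a rough sketch of that routing pattern, assuming hypothetical `small_model` and `large_model` inference endpoints: a cheap distilled model handles classification and prompt refinement, and only the queries that need it reach the expensive model.

```python
def small_model(prompt: str) -> str:
    """Hypothetical endpoint for a small distilled model (~1B params)."""
    raise NotImplementedError

def large_model(prompt: str) -> str:
    """Hypothetical endpoint for the most powerful available model."""
    raise NotImplementedError

def answer(query: str) -> str:
    # The small model is fast and cheap, so use it as a classifier first.
    label = small_model(f"Classify this query as SIMPLE or COMPLEX:\n{query}")
    if "COMPLEX" not in label.upper():
        return small_model(query)  # cheap path: answer directly
    # Expensive path: let the small model refine the prompt, then hand
    # the result to the most powerful model, as described above.
    refined = small_model(f"Rewrite this as a clear, detailed prompt:\n{query}")
    return large_model(refined)
```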


Dwarkesh Patel: You have all these GPUs, I think you said there will be 350,000 by the end of the year.


Mark Zuckerberg: That's across the whole fleet. We built two clusters of, I think, 22,000 or 24,000 GPUs each; a single cluster like that is what we use to train the large models, alongside a lot of the other things we do. A lot of our capacity goes into training the Reels model, Facebook's news feed, and the Instagram feed. Inference is a big thing for us because we serve a huge number of people. Given the sheer size of the community we serve, the ratio of inference compute to training we require is probably much higher than at most other companies doing this.


Dwarkesh Patel: One thing that was interesting in the material shared with me beforehand was that you trained on more data than would be compute-optimal for training alone. Inference is a big deal for you, and for the community, so it makes sense to put trillions of tokens in.


Mark Zuckerberg: One of the interesting things, even with the 70-billion-parameter model, is that we thought it would saturate. We trained it on about 15 trillion tokens. Our prediction going in was that the curve would flatten out more, but even at the end it was still learning. We probably could have fed it more tokens and it would have gotten somewhat better.


At some point, you're running a company and you have to do this meta-reasoning: do I want to spend our GPUs on training the 70 billion model further, or do we want to get on with testing hypotheses for Llama-4? We had to make that call, and I think this version of the 70 billion struck a reasonable balance. There will be other 70-billion-parameter models in the future, including a multimodal one, which will arrive over the next period. But it's fascinating that, even at this point, the architectures can absorb so much data.
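
For context on "more data than compute-optimal," here is a rough back-of-the-envelope comparison against the widely cited Chinchilla heuristic of about 20 training tokens per parameter; the heuristic is an outside reference point, not a figure from the interview:

```python
params = 70e9                    # Llama-3 70B
chinchilla_tokens = 20 * params  # ~1.4T tokens would be "compute-optimal"
actual_tokens = 15e12            # ~15T tokens, per the interview

print(f"Compute-optimal estimate: {chinchilla_tokens / 1e12:.1f}T tokens")
print(f"Actually trained on:      {actual_tokens / 1e12:.1f}T tokens")
print(f"Ratio: {actual_tokens / chinchilla_tokens:.0f}x past 'optimal'")
# Over-training a smaller model costs extra training compute but yields a
# stronger model per parameter, which pays off when inference dominates.
```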


Energy bottlenecks restrict development

Dwarkesh Patel: That's really interesting. What does this mean for future models? You mentioned that 8 billion for Llama-3 is better than 70 billion for Llama-2.


Mark Zuckerberg: No, no, it's almost as good. I don't want to exaggerate. It's on the same order of magnitude.


Dwarkesh Patel: Does this mean that 70 billion in Llama-4 will be as good as 405 billion in Llama-3? What does the future look like?


Mark Zuckerberg: That's a great question, right? I don't think anyone knows. Planning for an exponential curve is one of the trickiest things in the world. How long will it last? I think we'll probably continue. I think it's worth investing tens of billions or over $100 billion to build the infrastructure and assume that if it continues to grow, you're going to get something really amazing that will create amazing products. I don't think anyone in the industry can really tell you with any certainty that it's definitely going to continue to expand at that rate. Generally speaking, historically, you hit a bottleneck at some point. With so much energy being put into this area now, maybe those bottlenecks will soon be broken. I think this is an interesting question.


Dwarkesh Patel: What would it be like in a world without these bottlenecks? Assuming progress simply continues at this pace, this seems likely. Looking at the broader picture, forget about Llamas...


Mark Zuckerberg: Well, there will be different bottlenecks. Over the past few years there was an issue with GPU production. Even companies that had the money to buy GPUs couldn't necessarily get as many as they wanted because of all the supply constraints. Now I think that's easing. So you're seeing a bunch of companies looking at investing a lot of money in building these things. I think that will continue for a while. Then there's a capital question: at what point does it stop being worth putting the capital in?


I actually think you'll run into energy constraints before you hit that problem. I don't think anyone has built a gigawatt-scale single training cluster yet. These things just end up moving slower in the physical world. Getting energy permitted is a heavily regulated government function. You start with software, which is regulated to some extent, and I think more than a lot of people in the technology world realize. Obviously, if you're starting a small company, maybe you feel it less. We interact with different governments and regulators around the world, and there are a lot of rules we need to follow and do a good job with. But there is no doubt that energy is far more heavily regulated.


If you're talking about building a big new power plant or a big expansion, and then building transmission lines across other private or public land, that's just a heavily regulated thing. You're talking about years of lead time. If we wanted to build some large facility, powering it would be a very long-term project. I think people will do it, but I don't think it's something where you just reach a certain level of AI, raise a bunch of money, pour it in, and the models keep scaling. You really do run into different bottlenecks along the way.


Dwarkesh Patel: Is there something that Meta couldn't afford even if its R&D or capex budget were ten times what it is now? Some project, maybe AI-related, maybe not, that even a company like Meta doesn't have the resources for? Something that crosses your mind, but that Meta today couldn't fund even by issuing stock or bonds, because it's ten times bigger than your budget?


Mark Zuckerberg: I think energy is one aspect. I think if we had access to energy, we could probably build larger clusters than we currently have.


Dwarkesh Patel: Is this fundamentally limited by funding at the extreme? If you have $1 trillion...


Mark Zuckerberg: I think it's a matter of time. It depends on how far the exponential curve goes. Many data centers now are around 50 megawatts or 100 megawatts, or a large data center might be 150 megawatts. You take an entire data center, fill it with everything you need to do training, and you build the biggest cluster you can. I think there's a group of companies that are doing something like this.


But when you start talking about building a 300-megawatt, 500-megawatt, or 1-gigawatt data center: no one has built a 1-gigawatt data center yet. I think it will happen; it's just a matter of time, but it won't be next year. Some of these things take years to build. Just to put it in perspective, I think of a gigawatt data center as a meaningful nuclear power plant's worth of energy going solely to training a model.


Dwarkesh Patel: Amazon doesn't do that? They have 950 megawatts.


Mark Zuckerberg: I don't know exactly what they did. You have to ask them.


Dwarkesh Patel: But it doesn't have to be in the same place, right? If distributed training works, it can be distributed.


Mark Zuckerberg: Well, I think that's the big question, how is it going to work. It seems likely that in the future, what we call the training of these large models will actually be closer to inference generating synthetic data that is then fed into the model. I don't know what the ratio will be, but I think synthetic data generation is more like inference than training today. Obviously, if you're doing this to train a model, it's part of a wider training process. So it's an open question, that balance and how it's going to evolve.
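
A minimal sketch of the loop being described, with every callable a hypothetical stand-in rather than any actual Meta pipeline:

```python
def synthetic_data_round(model, prompts, quality_filter, train_step):
    """One round of the inference-heavy loop described above: the current
    model generates candidate data, a filter keeps the good samples, and
    the model is updated on what survives."""
    candidates = [model(p) for p in prompts]        # inference-dominated
    kept = [c for c in candidates if quality_filter(c)]
    return train_step(model, kept)                  # comparatively small

# Because most of the compute goes into generating data (inference) rather
# than synchronous gradient updates, this kind of "training" could in
# principle be spread across sites more easily than one monolithic job.
```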


Dwarkesh Patel: Is that also possible with Llama-3, or maybe starting with Llama-4? If you put it out and someone has a lot of computing power, they can use the model you released and make it smarter. Say some random country like Kuwait or the UAE has a ton of compute: could they actually make something much smarter using just Llama-4?


Mark Zuckerberg: I do think there will be such a dynamic, but I also think there is a fundamental limit in the model architecture. I think a model like the 70 billion we trained with the Llama-3 architecture can get better; it can continue to evolve. As I said, we feel that if we kept giving it more data or rotated high-value tokens through again, it would keep improving. We've seen a bunch of different companies around the world take the Llama-2 70 billion architecture and build new models on it. But when you make a generational improvement to something like the Llama-3 70 billion or the Llama-3 405 billion, there's nothing comparable in open source today. I think that's a huge step up in what people can build on, but I don't think it can grow infinitely from there. There are optimizations to be squeezed out of it before you get to the next generational step.


Where will AI develop in the future?

Dwarkesh Patel: Let's zoom in a little bit from the specific model and even the multi-year lead time you need to get energy approval. Looking at the big picture, what will happen to artificial intelligence in the next few decades? Does it feel like another technology, like the metaverse or social, or does it feel like something fundamentally different in the course of human history?


Mark Zuckerberg: I think it's going to be very fundamental. I think it will be more like the creation of the computer itself. You're going to get all these new apps, just like when you got the web or mobile phones. People will basically rethink all of these experiences, because a lot of things that weren't possible before will become possible. So I think that's going to happen, but I think it's a much lower-level kind of innovation. My sense is that it's going to be more like people going from not having computers to having computers.


On a cosmic scale, this would apparently happen rapidly, within a matter of decades. There are some people who are worried that it could really get out of control and go from somewhat smart to extremely smart overnight. I just think there are all these physical limitations that make this unlikely. I just don't think it's going to happen. I think we'll have time to adjust a little bit. But it's really going to change the way we work and give people all these creative tools to do different things. I think it's going to really empower people to do more of what they want to do.


Dwarkesh Patel: So maybe not overnight, but on a cosmic scale, can we think about these milestones in this way? Humanity evolved, then artificial intelligence came along, and then they went out into the galaxy. Maybe it will take decades, maybe it will take a century, but is this the grand picture of what is happening in history right now?


Mark Zuckerberg: Sorry, in what sense?


Dwarkesh Patel: In that sense, there are other technologies, like computers, and even fire, but the development of artificial intelligence itself is as important as human evolution.


Mark Zuckerberg: I think it's tricky. The history of humanity is one of people believing that certain aspects of human nature are really unique in various ways, then coming to accept that that's not true, while human nature remains very special nonetheless. We thought the earth was the center of the universe; it's not, but humans are still pretty awesome and unique, right?


I think another bias people tend to have is the idea that intelligence is somehow fundamentally connected to life. It's not actually clear that this is the case. I don't know that we have a clear enough definition of consciousness or of life to examine it fully. There's all this science fiction about creating intelligence that starts to take on all these human-like behaviors and so forth. The current incarnation of all this feels like it's moving in a direction where intelligence can be quite separate from consciousness and agency and things like that, which I think just makes it a super valuable tool.


Mark Zuckerberg: Obviously, it's hard to predict which direction these things will go over time, which is why I don't think anyone should be dogmatic about how to develop it or what to do. You have to look at it with each release. We're obviously very supportive of open source, but I'm not yet committed to releasing everything we do. I'm basically very inclined to think that open source is good for the community and good for us because we benefit from innovation. However, if at some point, the capabilities of this thing undergo some qualitative changes, and we feel that it is irresponsible to open source it, then we will not open source it. All this is difficult to predict.


Open source risk balance

Dwarkesh Patel: Is there a specific qualitative change you could see while training Llama-4 or Llama-5 that would make you think, "You know what, I'm not sure I want to open source this"?


Mark Zuckerberg: It's a little difficult to answer in the abstract, because any product can exhibit negative behaviors, and as long as you can mitigate those behaviors, it's fine. There are bad things about social media, and we work hard to mitigate them. There are downsides to Llama-2, and we spent a lot of time making sure it doesn't help people commit violent acts and things like that. That doesn't mean it's an autonomous or intelligent agent; it just means it has learned a lot about the world and can answer questions that we think it would be unhelpful for it to answer. I think the issue isn't what behaviors it exhibits, but which behaviors we wouldn't be able to mitigate once it exhibits them.


I think there are so many ways something can be good or bad that it's hard to enumerate them all beforehand. Look at what we've had to deal with in social media and the various types of harm. We've basically identified about 18 or 19 categories of harmful things people do, and we've built AI systems to identify when those things are happening and try to make sure they don't happen on our network. Over time, I think you'll be able to break this down into a more granular taxonomy too. This is something we spend time looking at, because we want to make sure we understand it.


Dwarkesh Patel: In my opinion, that's a good approach. I would be disappointed if, in the future, AI systems weren't widely deployed and everyone lost access to them. At the same time, I'd like to better understand the mitigations. If the mitigation is fine-tuning, the thing about open weights is that you can strip the fine-tuning away, and it's often a surface layer on top of the underlying capabilities. If it's like talking to a biology researcher on Slack... I think the models are nowhere near that. Right now, they're like a Google search. But if I can show one my petri dish and it can explain why my smallpox sample isn't growing and what to change, how do you mitigate that? Because someone can just fine-tune that capability in, right?


Mark Zuckerberg: It's true. I think most people will choose to use the off-the-shelf models, but there are nefarious people who may try to exploit these models for bad ends. On the other hand, one of the reasons I'm so philosophically supportive of open source is that I think an over-centralized AI future could be no less risky than one where it spreads widely. Many people ask, "If we can do this, would the widespread use of these technologies across society be a bad thing?" But another question worth asking is: what if one institution has AI that is more powerful than everyone else's? Isn't that also a bad thing?


A security analogy comes to mind. There are security vulnerabilities in so many different things. If you could travel back a year or two, with just that year or two of extra knowledge about vulnerabilities, you could hack into almost any system. And that's without AI. So it's not entirely fanciful to believe that a very smart AI could identify some vulnerabilities, basically like a human who went back a year or two and could break all these systems.


So how do we as a society deal with that? An important part of the answer is open source software. When the software is improved, the improvement isn't limited to one company's product; it can be deployed widely across many different systems, whether at a bank, a hospital, or a government agency. And as the software hardens, that's because more people can see it and more people can bang on it, and there are standards for how these things work. The world can get upgraded together quite quickly.


I think in a world where AI is very widely deployed and has been progressively hardened over time, all the different systems will be held in check in some way. To me, that's fundamentally much healthier than a world where it's all more concentrated. So there are risks on all sides, but I think this is a risk I don't hear people talking about as much. There's the risk of AI systems doing bad things. But what keeps me up at night is the idea of an untrustworthy actor having super-powerful AI, whether it's a hostile government, an untrustworthy company, or something else. I think that's probably a much bigger risk.


Dwarkesh Patel: Because they have a weapon that no one else has?


Mark Zuckerberg: Or just create a lot of chaos. My gut feeling is that these things end up being very important and valuable for economic, security, and other reasons. If someone you don't trust or an opponent gets something more powerful, then I think this could be a problem. Perhaps the best way to mitigate this is to have good open source AI, make it the standard, and in many ways become the leader. It just ensures that it's a more level and balanced playing field.


Dwarkesh Patel: That seems reasonable to me. If it becomes reality, that's the future I'd prefer. But I want to understand, mechanistically, how the existence of open source AI systems in the world prevents someone from using their AI system to cause chaos. In the specific case of someone making a bioweapon, is the idea that the rest of the world would do a bunch of R&D to figure out vaccines quickly? What's the mechanism?


Mark Zuckerberg: Taking the security issue I mentioned as an example, I think someone with a weaker AI trying to hack into a system protected by a stronger AI is less likely to succeed, at least as far as software security goes.


Dwarkesh Patel: How do we know that everything in the world is like this? What if bioweapons aren't like this?


Mark Zuckerberg: I mean, I don't know that everything in the world is like this. Biological weapons are one of the areas of concern for people who are most worried about these kinds of things, and I think that makes sense. There are some mitigation measures. You can try not to train certain knowledge into the model. There are different ways of doing it, but to some extent, if you get a really bad actor and you don't have other AI to balance them and understand what the threats are, that can be a risk. This is one of the things we need to pay attention to.


Dwarkesh Patel: What do you watch for as you deploy these systems? Say you're training Llama-4 and it deceives you because it thinks you're not paying attention, and you're like, "Whoa, what's going on here?" That's probably unlikely in a Llama-4-class system, but can you imagine a similar scenario where you'd really worry about deception, with billions of copies of the model spreading in the wild?


Mark Zuckerberg: I mean, right now we're seeing a lot of hallucination. It's more that. How you distinguish hallucination from deception is an interesting question. There are many risks and things to consider. At least in running our company, I try to balance these longer-term theoretical risks against what I believe are quite real risks that exist today. So when you talk about deception, the form I'm most concerned about is people using this to generate misinformation and spread it through our networks or others'. The way we've combated this kind of harmful content is by building AI systems that are smarter than the adversarial ones.


This is also part of my theory on this. If you look at the kinds of harm people do or try to do through social networks, some of it is not very adversarial. For example, hate speech isn't super adversarial in the sense that people aren't getting better at being racist. In those cases, I think the AI is getting more sophisticated much faster than people are getting at these problems. And we have problems on both sides: people do bad things, whether they're trying to incite violence or whatever, but we also have a lot of false positives, basically censoring things we shouldn't, which understandably annoys a lot of people. So I think having an AI that gets more and more accurate on this will be good over time.


In these cases, I still count on our AI systems becoming more sophisticated at a faster rate than the adversaries' do. It's an arms race, but I think we're winning it, at least for now. This is something I spend a lot of time thinking about.


Yes, whether it's Llama-4 or Llama-6, we need to think carefully about the behaviors we observe.


Dwarkesh Patel: Part of the reason you made it open source is that there are a lot of other people working on this.


Mark Zuckerberg: So, yes, we want to see what others are observing, what we're observing, and what we can improve. We then evaluate whether we can open source it. For the foreseeable future, I'm optimistic that we can. In the near term, I don't want to lose sight of the actual bad things people are trying to do with these models today. They may not be existential, but they're pretty serious day-to-day harms that we're familiar with from running our services. In fact, that's already where we have to spend a lot of our time.


Dwarkesh Patel: I find the synthetic data thing really interesting. I'm curious why you don't think current models would just loop: why would doing synthetic data over and over again hit an asymptote? If the models get smarter and adopt the kinds of techniques from the papers and blog posts that will be out on launch day, leading to the right chains of thought, why doesn't that form a cycle?


Of course it wouldn't happen overnight, but over months or even years of training. A smarter model gets used, produces better output, then gets smarter again, and so on. It seems like this could be achievable within the parameters of the model architecture.


Mark Zuckerberg: To some extent it might be; I'm not sure. But I think with something like today's 8-billion-parameter models, you're not going to get as good as the state-of-the-art models with hundreds of billions of parameters that incorporate new research into the architecture itself. Those models will be open source too, but that depends on all the questions we just discussed.


We hope that will be the case. However, at every stage, like when you're developing software, there's a lot you can do with software, but to some extent you're limited by the chip that's running it, so there's always going to be different physical limitations. The size of the model will be limited by the energy you can access and use for inference. So I'm also very optimistic that these things will continue to improve rapidly.


I'm more cautious than some people; I just think a runaway scenario is unlikely. But I think it makes sense to keep options open. There are so many unknowns facing us. There's a scenario in which maintaining a balance of power is really important, say an intelligence explosion where whoever gets there first wins. Many things seem possible. Keeping your options open and weighing all of them seems reasonable.


Dwarkesh Patel: In terms of other dangers of open source, I think you make some really valid points about the balance of power and the harms we could eliminate through better alignment techniques or other means. I wish Meta had some kind of framework. Other labs have frameworks where they say, "If we see this specific thing, then we won't open source it, and maybe we won't even deploy it." Just write it down, so the company is prepared and people know what to expect, and so on.


Mark Zuckerberg: That's a fair point about existential risk. Right now we focus more on the types of risks we see today, which are more like content risks: we don't want models doing things that help people commit violence or fraud or harm others in various ways. Talking about existential risk may be more intellectually interesting, but I actually think the real harms that need more energy to mitigate come from someone taking a model and using it to hurt another person. In practice, with the current model, and I'd guess the next generation and even the one after that, the harms are the more mundane ones we see today, like people defrauding each other. I just don't want to underestimate that. I think we have a responsibility to make sure we do a good job on it.


Dwarkesh Patel: Meta is a big company. You can do both.


Mark Zuckerberg: Exactly.


Views on the Metaverse

Dwarkesh Patel: Let's talk about other things. Metaverse. What period in human history would you most like to visit? From 100,000 B.C. to the present day, and you just want to see what it was like back then?


Mark Zuckerberg: Does it have to be the past?


Dwarkesh Patel: Yes, it has to be the past.


Mark Zuckerberg: I'm very interested in American history and classical history. I'm also interested in the history of science. I actually thought it would be interesting to see and try to understand more about how some of the major developments happened. All we have is some limited knowledge about these things. I'm not sure the Metaverse lets you do that, since it's hard to go back in time for things we don't have records of. I'm actually not sure if going back in time would be a big deal. I think it would be cool for a history class or something, but it's probably not the use case I'm most excited about for the Metaverse as a whole.


The main thing is being able to feel present with people no matter where you are. I think that's going to be killer. A lot of what we've discussed in this AI conversation is about the physical constraints behind all of this.


I think one of the lessons of technology is that, as much as possible, you want to move things out of the realm of physical constraints and into software, because software is much easier to build and evolve. It's more democratizing, too: not everyone is going to have a data center, but a lot of people can write code and modify open source code. The metaverse version of this is enabling true digital presence. That's going to be an absolutely huge difference, so that people don't feel they have to be physically together for so many things. Now, I do think there are things that are better in person. These things aren't black and white. It won't be like, "Okay, now you never have to do that in person anymore." But overall, I think it's going to be great for socializing, connecting with people, work, parts of industry, medicine, and a lot of other things.


Dwarkesh Patel: I want to go back to something you said at the beginning of the conversation. You don't sell the company for $1 billion. Regarding the Metaverse, you know you're going to do it, even if the market slams you for it. I'm curious. What's the source of this advantage? You say "Oh, values, I have this intuition", but everyone says that. If you were to say something unique to you, how would you express it? Why are you so convinced of the Metaverse?


Mark Zuckerberg: I think these are different issues. What drives me? We’ve covered a lot of topics. I just really enjoy creating things, and I particularly enjoy creating things around how people communicate and understanding how people express themselves and work. I studied computer science and psychology in college, and I think a lot of other people in the industry studied computer science. So the intersection of those two things has always been important to me.


This is also a very deep driving force. I don't know how to explain it, but I feel in my heart that if I'm not creating something new, I'm doing something wrong. Even when we make the business case for investing $100 billion in AI or huge sums in the metaverse, we have plans, and I think those plans are very clear: if our stuff works, it will be a good investment. But you can't know that from the beginning, and again, people make all kinds of arguments, whether with advisors or different people.


Dwarkesh Patel: Well, how do you have enough conviction to do this? You can't be sure from the start, and people make all kinds of arguments, whether with advisors or others. Where does that confidence come from?


Mark Zuckerberg: The day I stop trying to create new things, I'm done and I'll go somewhere else and create new things. I fundamentally can't run something, or in my own life, without trying to create new things that I think are interesting. To me, it's not even a question of whether we're going to try to create the next thing. I just can't help it, I don't know.


I feel this way in every aspect of my life. Our family built a ranch on Kauai and I helped design all the buildings. We started raising cattle and I was like, "Okay, I want to raise the best cattle in the world, so how do we design the ranch so we can figure out and build everything we need to try to do that?" I don't know, that's just me.


Dwarkesh Patel: I'm not sure, but I'm actually curious about another thing. At 19, you had read a lot of ancient and classical works, in high school and college. What important lesson did you learn from them? Not just interesting things you found, but... by the time you're 19, you haven't consumed that many tokens, and a lot of them were about the classics. Obviously that matters in some way.


Mark Zuckerberg: You haven't consumed that many tokens... That's a good question. Here's one thing I thought was really fascinating. When Augustus became emperor, he tried to establish peace. There was no real conception of peace at the time; peace was understood as the temporary interval before your enemy inevitably attacked you again. So you got a short break. He had this view of changing the economy from something mercenary and militaristic into an actually positive-sum game. That was a very novel idea at the time.


It points at something very fundamental: the bounds on what people at the time could conceive of as a rational way for things to work. This applies to both the metaverse and the AI stuff. Many investors and others can't understand why we would open source this. It's like, "I don't get it. It's open source. Surely that's just a temporary phase before you go back to making things proprietary, right?" But I think open source is a very profound thing in technology that actually creates a lot of winners.


I don’t want to overemphasize the analogy, but I do think that a lot of times, there are patterns in how things are built that people don’t typically understand. They cannot understand how this could be a valuable thing for people, or how it could be a legitimate state of the world. I think there's a lot more to it than people think.


Dwarkesh Patel: That's very interesting. Can I tell you what I was thinking you might have taken from it? It might be completely wrong, but I think part of it is how big a role some of these people played in the Empire at such a young age. For example, Caesar Augustus: by the time he was 19, he was already one of the most important figures in Roman politics. He was leading battles and forming the Second Triumvirate. I wonder if the 19-year-old you was thinking, "I can do this, because Caesar Augustus did it."


Mark Zuckerberg: That's an interesting example, from a lot of history and from American history too. One of my favorite quotes is Picasso's: all children are artists; the challenge is to remain an artist as you grow up. It's easier to have crazy ideas when you're young. There are all these analogies to the innovator's dilemma, in your life and in your company or anything you build. You're earlier in the trajectory, so it's easier to pivot and embrace new ideas without breaking other commitments. I think that's a fun part of running a company: how do you stay dynamic?


Open source a $10 billion model

Dwarkesh Patel: Let's go back to investors and open source. Suppose there's a $10 billion model, and it's completely safe: you've done the evaluations, and, unlike today, the evaluators can also fine-tune the model, which hopefully will be true of future models. Would you open source that $10 billion model?


Mark Zuckerberg: As long as it helps us, yes.


Dwarkesh Patel: But will it help? $10 billion in R&D, now it's open source.


Mark Zuckerberg: This is also a question that we need to evaluate over time. We have a long history of open source software, but we do not tend to open source our products. We will not open source the Instagram code.


We took a lot of underlying infrastructure and made it open source. Probably the biggest one in our history was the Open Compute Project, where we took the designs of all our servers, network switches, and data centers and open-sourced them, and it ended up being enormously helpful. Although many people can design servers, the industry standardized on our designs, which means the supply chain is essentially built around them. So volumes went up, it got cheaper for everyone, and it saved us billions of dollars, which is awesome.


So there are multiple ways in which open source might help us. One is if people figure out how to run models cheaper. Over time, we're going to spend tens of billions of dollars or more on all of this stuff. So if we can be 10% more efficient, we can save billions or tens of billions of dollars. That in itself is probably worth a lot. Especially if there are other competing models out there, our stuff isn't giving away some crazy advantage.


Dwarkesh Patel: So is your view that training will become commoditized?


Mark Zuckerberg: I think there are many ways this could go, and that's one of them. "Commoditized" means it becomes very cheap because there are many options. The other direction it could develop in is qualitative improvement. You mentioned fine-tuning. Right now you're very limited in what you can do by fine-tuning other major models. There are some options, but usually not for the largest models. Being able to do that, to do different application-specific or use-case-specific things, or to build models into specific toolchains: I think that will not only enable more efficient development, it could lead to qualitatively different things.


Here's an analogy. I think one of the problems with the mobile ecosystem in general is that you have these two gatekeeping companies, Apple and Google, that tell you what you're allowed to build. There's an economic version, where it's like we build something and then they take a bunch of your money. But there's a qualitative version, which actually upsets me more.


There have been a lot of times when we've launched, or wanted to launch, a feature and Apple has said, "No, you can't launch that." That sucks. So the question is, are we setting up a world like that for AI, where a handful of companies run these closed models, control the APIs, and can therefore tell you what you're allowed to build?


For us, I can say that it's worth building a model ourselves just to make sure we're not in that position. I don't want any other company telling us what we can build. From an open source perspective, I think a lot of developers don't want those companies telling them what they can build either.


So the question is: what ecosystem gets built around this? What interesting new things come out of it? How much does it improve our products? I know there are many cases where, just as with our database or caching systems or architecture, we'll get valuable contributions from the community that make our products better. And the application-specific work we do on top will still be differentiated enough that it won't really matter, right?


Maybe the model ends up being more like the product itself, in which case I think it becomes a more complex economic calculation whether to open source or not, because to do so is to a large extent commoditizing yourself. But from what I've seen so far, we don't seem to be at that level yet.


Dwarkesh Patel: Do you expect to make significant revenue from licensing your model to cloud providers? So they have to pay a fee to actually offer the model.


Mark Zuckerberg: We'd love to have that, but I don't know how significant it will be. This is basically our license for Llama, which in many ways is a very permissive open source license, except for a restriction on the very largest companies using it. That's why we set that limit. We're not trying to prevent them from using it; we just want them to come talk to us if they're going to take what we built, resell it, and make money from it. If you're a company like Microsoft Azure or Amazon and you're going to resell the model, then we should get a piece of the pie. So come talk to us before you do it. That's how it's gone.


So for Llama-2, we have deals with basically all of these major cloud companies, and Llama-2 is available on all of these clouds as a managed service. I'm assuming that as we release larger and larger models, this will become a bigger thing. It's not the main thing we're doing, but I think it makes sense that if these companies are going to sell our models, we should share the benefits in some way.




Dwarkesh Patel: As far as open source is concerned, I'm curious about whether you think the impact of open source projects such as PyTorch, React, and Open Compute on the world is likely to exceed Meta's impact on social media? I've talked to users of these services and they think this is a possibility, after all, much of the Internet's operation depends on these open source projects.


Mark Zuckerberg: Our consumer products really do have a huge user base around the world, covering almost half the world's population. But I think open source is becoming a new and powerful way to build. It could be like Bell Labs, where they originally developed the transistor to enable long-distance calling, which they achieved and which was quite profitable for them. But five to ten years on, when people looked back at what they were proudest of inventing, they would mention other technologies with a more far-reaching impact.


I firmly believe that many of the projects we build, such as Reality Labs, certain AI projects, and some open source projects, will have a lasting and profound impact on the progress of mankind. Although specific products will continue to develop, appear and disappear over time, their contributions to human society are lasting. This is also an exciting part that we as technology practitioners can participate in together.


Training models on Meta's own chips

Dwarkesh Patel: Regarding your Llama model, when will it be trained on your own custom chip?


Mark Zuckerberg: Very soon; we're working hard on it, but Llama-4 probably won't be the first model trained on a custom chip. Our approach was to first build our own custom silicon to handle inference for our ranking and recommendation workloads, such as Reels and news feed ads. Once we can offload those workloads onto our own chips, we can devote the more expensive Nvidia GPUs to training more complex models.


In the near future, we will hopefully have our own chip that we can use to first train some simpler things and then eventually train these very large models. At the same time, I will say that this project is going very well, we are moving forward in an orderly manner, and we have a long-term road map.


What if Zuckerberg had become CEO of Google+?

Dwarkesh Patel: Last question. This is completely off topic: if you had been made CEO of Google+, could you have made it successful?


Mark Zuckerberg: Google+? Oh. Well, I don't know. That's a very difficult counterfactual.


Dwarkesh Patel: Okay, so the real last question is: did anyone in the office say "Carthago delenda est" (Carthage must be destroyed) when Gemini was launched?


Mark Zuckerberg: No, I think we're more moderate now. This is a good question. The problem is that Google+ doesn't have a CEO. It is just a department within the company. You asked before what the scarcest commodity is, but you asked in terms of dollars. I actually think that for most companies of this size, the scarcest thing is focus.


When you're a startup, maybe you're more limited on funding. You work on just one idea, and you may not have all the resources. At some point you cross a threshold where the nature of what you're doing changes. You're building multiple things and creating more value across them, but you become more constrained in how much attention you can give each of them.


There are always cases where something cool happens randomly somewhere in the organization and I don't even know about it. Those are great. But I think, generally speaking, an organization's capacity is largely limited by what the CEO and the management team can oversee and manage. That's always been a focus for us. As Ben Horowitz says, keep the main thing the main thing, and try to stay focused on your key priorities.


Dwarkesh Patel: Excellent, thank you very much. Mark, you did a great job.