June 17, 2026

Natalie Buda Smith on AI in libraries, human at the center, deeper storytelling, and language recreation (AC Ep47)

“There is a lot of information, more that’s accessible than ever before, and you can access it quicker than ever before, process it quicker than ever before.”

–Natalie Buda Smith

About Natalie Buda Smith

Natalie Buda Smith is Director of Digital Strategy (Artificial Intelligence) at the U.S. Library of Congress, where she leads initiatives to enhance the accessibility, usability, and enjoyment of the library’s digital products and services.

What you will learn

  • How access to human knowledge has evolved from physical artifacts to digital interfaces
  • The differences and challenges between accessing primary sources versus information intermediated by AI and APIs
  • Why maintaining multiple digitization versions preserves valuable historical context
  • The growing trend toward highly personalized information delivery centered on individual users
  • How the Library of Congress is empowering staff with a suite of AI tools to improve quality and efficiency
  • The shifting boundaries between public and proprietary data in the age of AI and open APIs
  • Transformative examples of AI enabling creative storytelling and community-based knowledge preservation
  • Collaborative projects using AI to surface new connections in historical data from diverse institutional collections

Transcript

Ross Dawson: Natalie. It’s wonderful to have you on the show.

Natalie Buda Smith: Great to be here.

Ross: So, information is, you know, what we have captured in writing, through words, through everything—it’s the foundation of civilization. We started off capturing writing, we’re built on that, and that is now something which is the boon to humanity, a gift we have given ourselves.

Now, you know, we used to have places we went into and accessed that information. Now there are obviously many new ways to be able to access that extraordinary boon of human knowledge that we’ve accumulated.

So what is changing today in the ways in which we can all access that wealth of human knowledge that we have captured?

Natalie: Right, there is a lot of information, more that’s accessible than ever before, and you can access it quicker than ever before, process it quicker than ever before. If you think about the dissemination of information, we’re in the United States, we’re celebrating our 250th anniversary, and I recently got to see the draft Declaration of Independence. We have that here at the Library.

It’s two very rich pieces of parchment. There was a pen—there was no such thing as white-out or erasable pen—so they crossed through, they put pieces of paper to patch over it. But to think about in the last 250 years how we’ve come from that parchment for something official to now a digital signature, or you can access data and information so much more quickly than you could even 250 years ago. It seems not that long ago, so the availability of information is more than before, but it’s also about what types of information and when that information was created.

We do have a lot of information that’s been created in the last, you know, 20 or 30 years, and a lot of that’s being used to train LLMs, but there is a history of language and a human history that is very, very rich and has many creative outcomes—very creative thought—that has not really been considered for inclusion in some of the ways that we use AI or the training of AI. So there’s a lot to human history and language that’s more accessible than ever before, but it goes back pretty far.

Ross: It does, it does. So, I guess that’s an interesting question. I suppose, what is the nature or difference between information accessed at source, as it were, versus intermediated by an LLM? Because many people are now predominantly accessing information intermediated by an LLM, as opposed to directly from source.

Natalie: Yeah, there is a big difference. There are even more layers of translation in between what we like to call primary sources and the person using the primary source. There are more layers in between than ever before.

While you can take a rare book or a manuscript and digitize it, when it goes through an API, there are structures around it that you need for the API in order to find it, and then it is presented, translated in a chat interface with some of these AI tools. You have separation there, and it does create distance.

When I started here at the Library, I learned that we often re-digitize artifacts because we may have, let’s say, digitized them in the 1970s and the technologies have improved greatly. So, we’ll go ahead and, for some items, we’ll re-digitize so we have higher quality scans, for example. But the Library keeps every scan. We don’t go and say, “Oh, well, this one’s so much better, let’s get rid of the 1971.”

It’s because the decisions around how to scan that artifact in 1970 gave us information that is relevant and very interesting, so we save multiple versions. For example, we may have digitized it and shown the rigid corners, or we may have shown the printing marks, and we keep all that in the digitization because it tells a story of context that is really important. So when you have these layers of translation between the original object and how you are receiving it, you sometimes lose a lot of the context that you would have had if you looked at the original item. It’s an interesting question for our times to really think about what we are losing when we have those layers of translation.

Ross: You just mentioned before the phrase “human at the center,” and I think information, books, artifacts—these are artifacts of humanity, but then we are the consumers as well.

So, if we think about the human as the center of saying, well, how is it that we best access, consume, integrate, make sense of all of this world of information, then it really becomes about the interface, which relates to some of the things you were saying.

So, what are the ways in which we can build better interfaces where people can access those artifacts of our knowledge?

Natalie: Right. We see a trend happening in that, because the technology is so much more accessible and it’s easier to develop applications and deliver them, we are seeing more and more personalization. We’re able to deliver information for an individual, not just for personas. Back in the 1990s, you had your website, it was the business or institution’s point of view, and then we went to personas—like, we have to shape this so different people can understand or get what they’re looking for. Now we’re really moving towards this individual point of view, or the human at the center, because technology can be malleable and deliver those experiences for the human.

It’s interesting to see that trend and how that’s going to continue. You can have artifacts or information presented to you in the ways that you want them and engage with them in the ways that you want, which is most likely unique to you. It doesn’t have to be the same experience for the majority of people engaging with that artifact. It’ll be interesting to see how more of the individual or the human is at the center of these technologies, and how there are going to be so many different options that it might hopefully lead to more creativity—that’s what I hope—and more discovery, definitely.

Ross: So, what’s the state of the art in your work or what you see around you in that personalization of information delivery, and how is that delivered?

Natalie: Right, so right now, when it comes to using AI at the Library of Congress, we’re putting staff at the center. We’ve had to make some decisions, mainly because we are a source of truth, a source of authenticity.

We are putting staff at the center and really giving staff a suite of tools to pick from and use in order to do their work in a way that’s really high quality. That’s the stance we’re taking at the Library, which is similar, but it’s not just an experience delivered to the public; it’s an experience to do your work. I know there are a lot of individual researchers who have a similar approach or use similar tools to do their work. For example, if you’re looking at a collection and it’s in a language you’re not familiar with, you can use some AI services to translate it into another language.

So we give them that in a suite of tools. Or if you have historical materials that have handwritten text, something that many people can’t read that well anymore, it takes some effort, so we give them an AI service to turn that into OCR text, or text that you can then use in many different ways. Giving staff that toolbox to do that work is the approach we’re taking right now.

Ross: So, a long time ago, I said, you know, librarians are becoming information professionals, and so in that context, this is a professional service where you are choosing to put the professional in front as the interface to the customer and giving them the tools as well—the ways to complement them in being able to serve the customer.

Natalie: Our library is unique. We have so many different items—over 179 million physical items. I know some libraries might have more, but that, in combination with all of our digital collections, makes us quite large.

A lot of times we’re processing things for the customers that aren’t there yet—they might come next year or in the next five years. So we’re not only responding to our patrons’ requests for things that they need now, we have to prepare them to be available for years to come. Putting staff at the center there helps with delivering those services both now and in the future.

Ross: So, of course, the Library deals with public data—essentially, that’s its nature. It is a public service, provides public data, and there is a world out there of proprietary data.

I’m interested in your perspectives, as you know, in your role, or more generally in terms of government as well. How is that evolving, that boundary or perhaps sometimes blurring between public and proprietary data?

Natalie: That is a very big question with a big answer that could take up a whole class, but part of it is making public data available in ways that people want to use it. We’re finding more and more that people want to engage with digital data collections and digital formats, and then do processing or analysis, or use them in digital forms and in different ways—even for creative arts and such. They want to engage not only with seeing items on the website anymore, but also through an API. They’ll often use their own MCP server, or they use our API, and they’ll look for a set of things based on criteria. That’s more efficient for them, especially if they’re using or creating large data sets, than to go through and click through a website.

More and more are wanting to use our collections through those means versus traditional web browsing, so we’ve shifted our delivery. We have several APIs that we make available for the public to use when they’re doing research or looking for items, and we’ve put a lot of thought into constructing them well so people can easily use them.

So that is one way there.
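As a rough illustration of the criteria-based access described above, the sketch below builds a search URL for the loc.gov JSON API. It assumes the public `https://www.loc.gov/search/` endpoint and the query parameters `q` (search terms), `fo=json` (response format), `c` (results per page), and `sp` (page number); check these against the Library's own API documentation before relying on them. The helper name `build_search_url` is hypothetical, and sending the request is left to the reader.

```python
from urllib.parse import urlencode

# Assumed base endpoint of the loc.gov JSON API (verify against the Library's docs).
LOC_SEARCH = "https://www.loc.gov/search/"


def build_search_url(query: str, per_page: int = 25, page: int = 1) -> str:
    """Return a search URL that asks loc.gov for JSON instead of an HTML page."""
    params = {
        "q": query,      # search terms
        "fo": "json",    # assumed format switch: JSON rather than HTML
        "c": per_page,   # assumed results-per-page parameter
        "sp": page,      # assumed page-number parameter
    }
    return f"{LOC_SEARCH}?{urlencode(params)}"


if __name__ == "__main__":
    print(build_search_url("declaration of independence draft", per_page=10))
```

Building the URL separately from fetching it makes the criteria easy to log, review, and reuse when assembling a larger data set.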

But yes, there’s public data, and each region or country has different copyright protections and laws, international protection laws, so it depends on the country that you’re in. In the United States, we have the US Copyright Office, which sets policies for rights.

It is an interesting shift, especially in the age of AI, because of using data in training, and it’s still unsettled here in the United States. We do see a shift towards a lot of these large foundational models licensing content, which is really interesting. They’ll create licensing deals with large publishers or large media companies so that they can use those in the training of their models. We see that happening more and more, but we also see people starting to protect data more and more. For example, some publishers are finding creative ways—maybe using blockchain or some other encryption—so that it’s not as accessible.

We’re in an interesting time with both the claiming of rights and then thinking of ways technically to implement it, so that there are some technical protections along with the rights. Especially being a federal library—for example, we are in the legislative branch, and so we serve Congress—one of our well-used APIs is for legislative data. You can get the full history, almost 250 years of legislative history. Actually, the first Congress was less than 250 years ago, but we have other types of data. You can get our legislative history through our API, and that is all publicly available. We’re really proud that we do that. We share our entire legislative history—all laws that have been passed, and many that have been introduced—so that’s all accessible. That is public data that a lot of people are interested in. So it’s a lot of conversation around data and protections and availability.

Ross: Yeah, which I think the Library plays a critical role in. So, more generally around this framing of humans plus AI, and that’s where, again, I think in this context of libraries being places that retain these repositories of knowledge—or notable information, more accurately—and humans being the people who are accessing that in order to augment their own thinking, to build on that, and in turn to be able to feed that back into further things that they are sharing.

One of the things I’m particularly interested in is going beyond accessing information to how you bring it into your own thinking to be more knowledgeable, to improve your own cognition or sense-making or decision-making. Is there anything you’re doing or you’re aware of, or you see the potential for AI to assist in going beyond the delivery of information to how we can think and learn and be and know better?

Natalie: Definitely, definitely. I think it is really transformational, mainly because of the speed and the amount that you can process.

If you were doing research back in the 1990s compared to today, it’s such a different experience, mainly because of the compute. You can have a research question and think, “Okay, what is this trend over the last decade?” and you can basically do that processing within probably minutes if you have the right data, or hours, as long as you have the right data sources. Decades ago, it took much longer. So the amount that you can consider and process is definitely transformational, and hopefully leads to new discoveries and more creative outcomes because of that.

We’re definitely seeing that, and getting these tools—and a lot of these tools, especially beyond the foundational models, people are making very interesting tools on top of those foundational models—so people have the ability to be more creative. If you’re just trying to get your ideas out, you can now do them in a much more engaging way, with deeper storytelling, more engaging storytelling. It’s really interesting, because you’re able to process, or the AI can assist you to do more with less time. You can actually bring in that more engaging storytelling element, where before it may have been such a large effort just to produce something creative. Now you can really put that extra effort in, and it’s really wonderful to see people and the things that they’re making now—it’s quite astonishing.

Ross: So, for example, what things have you seen which stand out to you?

Natalie: A lot. One of my favorite projects was a group that we worked with—they’re actually a tribe in Long Island, and they had basically lost their written language. It was being carried through older tribe members, and it was really down to just oral histories that the older tribe members were trying to teach the rest of the tribe—the language they had used for hundreds and hundreds of years. There was nothing written anymore. They lost all of that, mainly because they were displaced.

We had a group that’s trying to restore that community. They found that we had a Bible—we have a collection of Bibles—and it was created by missionaries in Long Island who wanted to convert this tribe. What they had done is they had translated a Bible from that tribe’s native language into English so that they could communicate. It’s such a really interesting story because of all those factors involved—trying to change, they had to go through change, but then they’re trying to recreate this. It was so rich there. They actually were able to use that Bible to then recreate the language, and then they created a virtual reality installation with the language and with some other recreations of what their land was like at the period of time when the tribe lived there, and then had younger generations go through that to experience it.

To me, that hit everything that you want to see in something that’s creative and community-based, has that longevity, and just a rich creative experience using emerging technology. I love to see communities and individuals do more and more of that, because it just makes our world a richer place.

Ross: Yeah, absolutely, and that’s part of the thing—some information is relevant to everyone, but there are many aspects of it which are foundational to communities.

So this sort of flows on a little bit to a wonderful project I saw that you’re doing with the Smithsonian around surfacing stories. I think what you’ve raised around this idea of stories, as this interface between information and knowledge or experience, is very important, very powerful.

And I think there are actually many steps along the way where AI can assist us, and the first one has been in surfacing those. So I’d love to hear about that.

Natalie: Yeah, sure thing. The main overall program is called Revolution Crossroads, and it is a program that we are working on with the Smithsonian. We are using a very large collection from the National Archives and Records Administration, so they’re the third partner, because we’re using their data, but we’ve been working with the Smithsonian for a while. We’re taking collections from each of our institutions. We decided to use our Chronicling America collection—it’s a very, very large collection of American newspapers, and there are many different languages.

A lot of people use it for genealogy research, but for other things as well. We took that slice of the American Revolutionary War and created a special data set from our Chronicling America newspaper collection. The Smithsonian has contributed data sets from some physical items they have around the American Revolution, and then the National Archives is using their war pension data set, which is quite large and extensive. Those that fought in the American Revolutionary War were eligible to receive a pension from their time fighting the war. It’s a great resource for understanding history and that time period, but for genealogy as well. You can see if your family fought in the Revolutionary War—not only them recorded, but a lot of times that pension was transferred to other generations, so it went further than just that one individual.

So we have these three very different sets of data that we are using to start to find connections. If you have the newspaper from that time and maybe even a person in that newspaper, you find their war pension record and their family history, and then you see the artifacts or physical items that were around during that time from the Smithsonian. It creates this really interesting story. We’re putting a lot of effort into making those data sets publicly available and easy to use and well documented, and that’s taken a lot longer than one would think, because there’s a lot of work in cleaning up data and making it accessible.
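The kind of cross-collection connection described above can be sketched as a simple join on a person's name. This is only an illustration under stated assumptions: the record layout (`id` and `person` fields), the function names, and the sample values are invented placeholders, not the actual structure or contents of the Chronicling America, war pension, or Smithsonian data sets. Real name matching across historical sources is much harder (spelling variants, shared names, transcription errors).

```python
from collections import defaultdict


def normalize(name: str) -> str:
    """Lowercase and collapse whitespace so 'Jane  Example' matches 'jane example'."""
    return " ".join(name.lower().split())


def link_by_name(newspapers, pensions, artifacts):
    """Group record ids from three sources under each normalized person name."""
    linked = defaultdict(lambda: {"newspapers": [], "pensions": [], "artifacts": []})
    sources = (("newspapers", newspapers), ("pensions", pensions), ("artifacts", artifacts))
    for source, records in sources:
        for record in records:
            linked[normalize(record["person"])][source].append(record["id"])
    # Keep only people who appear in more than one source.
    return {
        name: hits
        for name, hits in linked.items()
        if sum(bool(ids) for ids in hits.values()) > 1
    }


if __name__ == "__main__":
    # Placeholder records for illustration only.
    newspapers = [{"id": "news-1", "person": "Jane  Example"}]
    pensions = [
        {"id": "pension-7", "person": "jane example"},
        {"id": "pension-8", "person": "Sam Sample"},
    ]
    print(link_by_name(newspapers, pensions, artifacts=[]))
```

Exact matching on a normalized name is the simplest possible key; it is used here only to show the shape of the problem.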

We’re doing some work on our own, but we’re also working with other groups—we’re working with some private companies, we’re working with some universities—and we’re helping them think about creative ways to use these data sets and come up with new ideas, and see if they discover anything new from that time period by looking at these data sets, and then thinking of creative ways to present it. So it’s both with us as our institutions, but then we’re doing a lot of external outreach to let people know they’re there and to help them use it and watch what they’re doing with it, and there are some really interesting outcomes.

Ross: So, AI is originally used in that process from your side, but it sounds like potentially you’re also assisting others—not just access APIs, but helping them use that in interesting ways, and obviously the extensive data processing required.

Natalie: Yeah, so we actually have a space on Hugging Face, where you can see the data and all the documentation around that. For example, we worked with one university in the United States, and their Masters of Computer Science program used the data, and then they built retrieval augmented generation—RAG—architecture for training. It’s kind of late, so I’m trying to remember the last word—RAG architecture—in order to add context to it with additional data sources, and then created a chatbot in order to have people ask questions about the American Revolutionary War and find different citations and primary sources. It’s really interesting to see how these three data sets have different manifestations.
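For readers unfamiliar with the pattern, a retrieval-augmented generation pipeline first retrieves the passages most relevant to a question and then hands them to a language model as context, so answers can cite sources. The sketch below shows only that retrieval and prompt-assembly step, using a plain bag-of-words cosine similarity. It is not the university team's implementation: the function names are hypothetical, the sample passages are invented placeholders, and a production system would typically use learned embeddings and an actual model call.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> Counter:
    """Count lowercase word tokens in a string."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question: str, passages: dict[str, str], k: int = 2) -> list[str]:
    """Return the ids of the k passages most similar to the question."""
    q = tokenize(question)
    ranked = sorted(passages, key=lambda pid: cosine(q, tokenize(passages[pid])), reverse=True)
    return ranked[:k]


def build_prompt(question: str, passages: dict[str, str], k: int = 2) -> str:
    """Assemble a prompt that quotes the retrieved passages with their ids."""
    context = "\n".join(f"[{pid}] {passages[pid]}" for pid in retrieve(question, passages, k))
    return f"Answer using only these sources and cite their ids.\n{context}\nQuestion: {question}"


if __name__ == "__main__":
    # Placeholder passages for illustration only.
    docs = {
        "p1": "pension application for a soldier",
        "p2": "newspaper notice about a ship arrival",
        "p3": "silhouette portrait of a merchant",
    }
    print(build_prompt("soldier pension", docs, k=1))
```

Keeping the passage ids in the prompt is what lets a chatbot point back to citations and primary sources rather than answering from the model alone.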

Another university in New York used a data set and they analyzed silhouettes—the black cut-out portraits, the silhouette portraits from people during that time. It was really interesting to see the analysis of all of them together, and what different shapes they are in comparison to each other. It was really cool to see them all laid out together, and visual analysis of those collections—some really creative uses of them.

Ross: So, what excites you for the potential forward from now in applying AI to your institution, to libraries more generally, to the public sector? Where is the biggest potential?

Natalie: To me, it’s really with the amount of work you can do with the same resources or time. Especially at libraries and other institutions like the Smithsonian, a lot of people are here because they really love the work we do. We’re really here for the mission, and so there’s always this tension between “if only I could do more” or “if only I could offer that.”

These tools are allowing us to do new things and more. Right now, it’s a little bit overwhelming, because there’s so much to learn and so much to do, but it will allow us to provide more and provide higher quality services, because you can process more. But it’s within that umbrella of making sure we have the right guardrails, and we do this responsibly, and make sure that it’s authentic and trustworthy.

It’s really exciting to be able to give a patron or fulfill a request in a way that’s delightful. Sometimes a patron might ask, “I’m looking for my grandparents, where they were from.” For example, we had a woman who was asking for a town in Sicily where her grandparents were from, and she was trying to do research on newspapers, and when they came over, and some locations and some context. We were able to help her fill the gaps of the story, and to see someone’s eyes light up because you’re giving them more than they ever expected—that’s just wonderful.

To be able to do that and give people that delight because we have these tools where we can do more and offer more—that’s the best.

Ross: Fantastic. So where can people go to find out more about your work?

Natalie: Our main flagship website is loc.gov, and then specifically about artificial intelligence, you can go to loc.gov/digital-strategy. We have content there around what we’re doing with artificial intelligence. Another space is LinkedIn—I’m always trying to promote the great work that the Library is doing, so LinkedIn with my name, Natalie Buda Smith, and you can see some of our updates there.

Ross: Wonderful. Thank you so much for all of the wonderful work you’re doing, and for sharing with us today.

Natalie: Thank you. Thanks for the discussion. It was great to be able to share that.