
Natural Language Processing in AI

E18 | With Xyonix's Deep Dhillon
Updated Jan 29, 2024

AI For All | E18 | October 12, 2023
On this episode of the AI For All Podcast, Deep Dhillon, Founder and Chief Data Scientist at Xyonix, joins Ryan Chacon and Neil Sahota to discuss natural language processing (NLP) in artificial intelligence. They talk about the evolution of NLP, knowledge graphs, search engines, the impact of ChatGPT on content creation, prompt engineering, mind states, computer vision, multimodality, and what insights businesses can extract using NLP and AI technologies.
About Deep Dhillon
Deep Dhillon has a wealth of expertise on AI, ML, and NLP - all of which he brings to bear as Founder and Chief Data Scientist at Xyonix. Deep helps clients accelerate AI-driven innovation in their products by teaching machines how to read, see, listen, and extract valuable insights from their business data.
Interested in connecting with Deep? Reach out on LinkedIn!
About Xyonix
Xyonix teaches machines how to read your content, see what is in your imagery, watch what is in your videos, and understand what is in your data. With this enhanced ability, your custom-built AI and machine learning models can generate insights and make high-value predictions.
Transcript:
- [Ryan] Welcome everybody to another episode of the AI For All Podcast. I'm Ryan Chacon with my co-host, Neil Sahota, the AI Advisor to the UN and founder of AI for Good. Neil, how's it going?
- [Neil] I'm doing all right. How's everybody else doing?
- [Ryan] Doing well. And we also have Nikolai here as well.
- [Nikolai] Hello.
- [Ryan] So on today's episode, we're going to be discussing some pretty exciting topics that I know a lot of our audience has probably heard of, even if they're not all equally familiar with them.
But we're going to be discussing the use of natural language processing to generate insights, talking about how ChatGPT is impacting the world, computer vision, and using AI for good. Joining us today is Deep Dhillon, the Founder and Chief Data Scientist of Xyonix, a company that teaches machines to understand content, so text, images, video, to extract valuable insights from business data. Deep, thanks for being on the podcast.
- [Deep] Thank you so much for having me.
- [Ryan] Yeah, it's great to have you. I'm really looking forward to this chat. So let's just kick it off by having you talk to us about what is natural language processing exactly. If I was, let's say, not the most tech savvy and new to this space, how would you describe it and talk someone through what it is?
- [Deep] At a high level, natural language processing is the art and science of getting machines to understand natural language. That language can originate in written form, or from people talking into mics and being transcribed. A lot of different ways. But ultimately, it's how do we get machines to understand what people are saying. And the generative aspects of that more recently, like ChatGPT and large language models, are often around how do we get them to respond and say things back, which results in a lot of different types of systems. You have conversational systems, you have analysis systems. There's parts of natural language processing involved in search and being able to find information. So a pretty broad field.
- [Ryan] And obviously this is something that has evolved, not only in the technology itself but in its applications and uses. Can you talk us through the evolution of it as you know it from your experience in this space, how the applications of NLP have grown and changed, and where we are now?
- [Deep] Yeah, sure. If we rewind maybe three decades or so, a lot of the work was fairly simplistic. I don't know how technical you want me to get, but I'll try to give folks a flavor for it. There was a lot of activity around, let's say you have a simple sentence, like person eats pie, and then the next sentence is person goes to restaurant. In the early days, a lot of the simple activity was keeping track of words disassociated from their sequential positioning and the meaning that arises between them. And in that world you can do things like keep track of which words occur most frequently, which would be stop words, so words like and, the, that kind of stuff. And then you could also figure out which words are more discriminating and potentially meaningful.
So early search engines, for example, starting in the early to mid 90s, started doing things like keeping term frequency count matrices, and that evolved into things like term frequency weighted against how often a term occurs across all documents. And that sort of resulted in early search engines.
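To make this concrete, the term-frequency idea can be sketched in a few lines of Python. This is a toy illustration of TF-IDF-style scoring, not how any production search engine is actually implemented:

```python
import math
from collections import Counter

docs = [
    "the person eats the pie",
    "the person goes to the restaurant",
    "the restaurant serves pie and coffee",
]

# Term frequency: how often each word occurs in each document.
tf = [Counter(doc.split()) for doc in docs]

def idf(term):
    # Inverse document frequency: words that appear in fewer
    # documents are more discriminating.
    n_containing = sum(1 for counts in tf if term in counts)
    return math.log(len(docs) / n_containing)

def tf_idf(term, doc_index):
    return tf[doc_index][term] * idf(term)

print(tf_idf("the", 0))   # 0.0 -- a stop word, present in every document
print(tf_idf("eats", 0))  # ~1.10 -- rare, so potentially meaningful
```

Scores like these let an early engine rank documents by how strongly a query word distinguishes them.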
Then, after search engines, there was a long push to try to summarize content: to be able to read a bunch of stuff and summarize it. We've made massive leaps there in the last five or six months, and even before that, in the last three or four years, and those seem like really well solved problems at this point, but that was absolutely not the case for decades.
And so after simple term frequency counting, that kind of activity, machine learning started making an entry. I would say it probably started getting a lot of traction in maybe the early 90s, where we started using things like support vector machines, different types of algorithms that are capable of being given a training dataset. So, you have a bunch of training data and you have some kind of objective. For example, a classic application would be spam detection.
You've got a bunch of maybe a few thousand examples of proper, just regular email communication between people, and then you have a few thousand examples of cases where it's clearly spam. And now folks were building out models, using classical machine learning techniques to try to automatically detect what's spam and what's not spam.
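The setup Deep describes, labeled examples of regular mail and spam feeding a support vector machine, might look like the following sketch. It assumes scikit-learn is installed, and a real system would train on thousands of labeled emails, not four:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Toy stand-ins for the "few thousand examples" of each class.
emails = [
    "Hi, are we still on for lunch tomorrow?",
    "Can you send me the quarterly report by Friday?",
    "WIN a FREE prize now, click here!!!",
    "Cheap meds, limited time offer, act now",
]
labels = ["ham", "ham", "spam", "spam"]

# Bag-of-words features feeding a linear support vector machine.
model = make_pipeline(CountVectorizer(), LinearSVC())
model.fit(emails, labels)

print(model.predict(["Claim your FREE prize now"])[0])
```

The pipeline turns each email into word counts and lets the SVM find a boundary between the two classes.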
And so we had quite a bit of progress there. Search engines, of course, became a lot more powerful, leveraging a lot of signals even beyond the natural language, but the natural language understanding of webpages and what was on them played a huge role. Then we had a huge break in the 2010 to 2013 timeframe with deep neural networks. We've had neural networks for many decades, but around 2010, 2011 we had a breakthrough where we were able to train deep neural networks at much larger scales, and then we had a rapid series of progressions. If you think back to when I was talking about words, in a lot of the early work the community was spending a lot of time doing things like stemming words, so dogs and dog get stemmed to the same root. So dealing with plurality, dealing with morphology, a lot of things that linguists had come up with over the prior decades, and a lot of computer scientists were latching on to that. The huge shift happened when we came up with an insight that in hindsight seems obvious, but at the time was absolutely not. One of the challenges with machine learning in the earlier days, even after we came up with deep neural networks, was how do you come up with a training dataset with a sufficiently large example space that these models can learn well.
The example I always use is the way you teach these machines to work is, there's a couple of ways, but one of the classic ways is through example. So if you imagine a toddler, like a human toddler, they're two and a half years old and they just, they're trying to learn the word furniture. You might point to some chairs, and you might say furniture. You might point to a sofa and say furniture. And then you take them outside, and you show them a wicker chair, and they say, yeah, that's not furniture because now we're outside, like the context has changed. So, we have to build these datasets of thousands and thousands or hundreds of thousands or even millions of examples in order to get things to learn.
And that's a very cumbersome, human constrained process. And so the big breakthrough was when we realized, hey, we can just take unstructured text, meaning think everything on the internet that's ever been written, that's publicly available, and train the models to predict the next word in a sequence of words.
And so what that let us do is have training data essentially for free. It turns out that in order to really successfully predict the next word, the fallout in the early days was that you learned what actual word meanings were. And then as we advanced from predicting the next word to predicting the next sequences of words, the next sentences, you basically learn human language. And not only human language: it turns out that if you're predicting subsequent sequences, you're learning non-human languages like computer programming languages and all of the above. So the models started getting a lot more powerful and learning what a given human can't really learn very well, unless they're an abnormally gifted polyglot. But these things are mastering all languages, human or otherwise, in fairly short order, a few weeks' timeframe with a few million bucks in training. And now we have the output of those rudimentary models, which started coming out a few years ago in the form of large language models. And then we started building layers on top of that, and that's where we got the really big perceived breakthroughs from OpenAI and a lot of others now, Google, Microsoft.
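The training signal Deep describes, predicting the next word from raw text with no human labeling, can be illustrated with a toy bigram model. Actual large language models use deep neural networks over billions of documents, but the self-supervised objective is the same idea:

```python
from collections import Counter, defaultdict

corpus = "the cat sat on the mat . the cat ate the fish .".split()

# For each word, count which words follow it. The "label" at each
# position is simply the next word in the text: no annotation needed.
following = defaultdict(Counter)
for current, nxt in zip(corpus, corpus[1:]):
    following[current][nxt] += 1

def predict_next(word):
    return following[word].most_common(1)[0][0]

print(predict_next("the"))  # cat
```

Scale the corpus from one sentence to the public internet, and the model from bigram counts to a billion-parameter network, and this objective is what yields an LLM.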
- [Neil] Deep, if you could help our audience, they probably hear a lot of these different phrases, natural language processing, natural language understanding, large language models, knowledge graphs. What's the difference between some of these things and why is that actually important?
- [Deep] So natural language processing, I think of as the historic term. It's the geek term too that refers to everything: every time you're taking machines, giving them text, and having them try to understand it and take actions on it. Natural language understanding was a term that emerged a while ago that started capturing the advances over those historic bag-of-words approaches I was talking about, where we were really doing pretty straightforward statistical interpretations and words were divorced from their meaning. So imagine taking a sentence, putting it on a piece of paper, cutting up all the words, throwing them in a bag, and shaking them around. Now you pull out a word, and you don't actually know its association to any of the other words. That was a huge chunk of natural language processing for a while.
So folks started using the term natural language understanding when we started building these embedding spaces that would represent, in the form of a few-hundred-dimensional vector, the meaning of the word king or the word queen or the word dog, so that it encapsulated all of the synonyms of those terms but also had even deeper understandings, like knowing that king refers to male gender and queen to female gender, that sort of thing.
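The embedding idea can be made concrete with hand-made toy vectors along two illustrative axes, royalty and gender. Real embeddings such as word2vec or GloVe learn hundreds of dimensions from corpora rather than being written by hand, but the analogy arithmetic works the same way:

```python
import numpy as np

# Toy two-dimensional "embeddings": (royalty, gender).
vectors = {
    "king":  np.array([0.9,  0.8]),
    "queen": np.array([0.9, -0.8]),
    "man":   np.array([0.1,  0.8]),
    "woman": np.array([0.1, -0.8]),
}

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def nearest(vec, exclude=()):
    return max((w for w in vectors if w not in exclude),
               key=lambda w: cosine(vectors[w], vec))

# The classic demonstration: king - man + woman lands nearest to queen.
result = nearest(vectors["king"] - vectors["man"] + vectors["woman"],
                 exclude=("king", "man", "woman"))
print(result)  # queen
```

The gender relationship Deep mentions lives in a direction of the space, which is why vector arithmetic can move from king to queen.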
Let's see, large language models. That term is really from when we started taking and training models to predict future sequences of words, and then we got them quite large, to the point where we're building these billion-plus parameter models. So these neural networks are massive. The number of parameters in them is massive. The number of machines it takes to train them is significant. It's very difficult to run them; prior models you could easily run on your Mac or something and get stuff to work. Now we're talking about a lot more machines, a lot more hardware, and in some cases millions and millions of dollars to train up the really large models. GPT-4 is an incredibly large model; I don't know for sure, but word on the street is it's a trillion-plus parameters. And then knowledge graphs: you've got a knowledge graph that's got entities in it, like people, places, organizations, and then you've got these relationships between them, like company A is a division of company B, or person A works for company C.
And those relationships can be derived in different ways. They might originate through the mining of unstructured text, they might originate through people mining places like Wikipedia and building it and a lot of, like most search engines, for example, like Google or Bing, they'll have entire groups and divisions that just work on the knowledge graph.
And then if you think about like sometimes in search, for example, you might search for an NBA player, and their stats and everything will appear. That's information that's typically just being pulled straight out of the knowledge graph and presented.
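A knowledge graph reduced to its essence is just entities connected by typed relationships. This toy sketch, three hand-written triples rather than anything like the scale of a real search engine's graph, shows how an info-box-style lookup pulls facts straight from the graph:

```python
# (subject, relation, object) triples: entities and typed relationships.
triples = [
    ("LeBron James", "plays_for", "Los Angeles Lakers"),
    ("Los Angeles Lakers", "member_of", "NBA"),
    ("LeBron James", "born_in", "Akron, Ohio"),
]

def lookup(entity):
    # Gather every fact whose subject is the requested entity,
    # the way a search engine fills a player's info box.
    return {rel: obj for subj, rel, obj in triples if subj == entity}

print(lookup("LeBron James"))
# {'plays_for': 'Los Angeles Lakers', 'born_in': 'Akron, Ohio'}
```

Production graphs add billions of triples, entity disambiguation, and provenance, but the query pattern is this simple at heart.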
- [Nikolai] Why is Google so much better than other search engines? Does it have something to do with the engineering you're talking about here with knowledge graphs?
- [Deep] To be fair to the folks at Bing, I would say that with the one exception of the massive difference in training data that Google gets, all the other machinery and pieces are pretty much in place at Microsoft and in Bing. So it's actually quite a great search engine from that vantage. But it comes down to the sheer corner cases and diversity in feedback. Google captures, I don't know what the percentage is now, but 90-plus percent of the market, so they're getting billions and billions of queries of people looking for incredibly obscure things.
And if you've got a thousand people looking for an obscure thing, and you know that, whatever, 800 of them clicked on link A versus link B and were happy with those results, then the model used to do the ranking in that particular context is just going to get a lot better than a model that has a lot fewer results. Last I checked, Bing had 6 percent of the market, so you just get a lot fewer click-throughs, and so you get a lot less of that human validation of what's good and what's bad. But the models themselves and the algorithms themselves are pretty competitive at this point.
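Why feedback volume matters can be shown with a small simulation, a toy model of click-through estimation rather than any real search engine's ranking algorithm: with ten queries the estimated click-through rate of a result is noisy, while with a thousand it converges toward the truth.

```python
import random

random.seed(0)

# Hypothetical true click-through rates for two competing results.
TRUE_CTR = {"link_a": 0.8, "link_b": 0.2}

def estimate_ctr(link, n_queries):
    # Each query is a coin flip: did the user click this result?
    clicks = sum(random.random() < TRUE_CTR[link] for _ in range(n_queries))
    return clicks / n_queries

for n in (10, 1000):
    estimates = {link: estimate_ctr(link, n) for link in TRUE_CTR}
    print(n, estimates)
# With more traffic, the estimates tighten around 0.8 and 0.2,
# which is why a 90%-share engine learns rankings faster than a 6% one.
```

The same statistics apply to every obscure query: the engine with more traffic gets a trustworthy ranking signal on corner cases the smaller engine rarely sees.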
- [Ryan] Deep, let me ask you real quick. I want to transition a little bit into talking about one of the applications of a lot of the technologies we're discussing, new capabilities, ChatGPT, which has been brought up. If we were to talk about it from the perspective of being able to create content utilizing these technologies and tools, how has ChatGPT helped make that more possible? I know this is something that you're very close to and working on with your company, but just at a high level, explain to our audience how this has opened up a new frontier when it comes to content creation.
- [Deep] The simple answer is every college kid and probably soon every high school kid can just jump in there and get an essay written on anything. I think it's clear that the quality of writing is at least a solid B or something. And so that's the brain-dead obvious entry point. A lot of the garbage we were seeing on the internet, really low quality writing done in volume and getting slurped up, where the task of a Google is to filter through it and find the stuff that makes some sense, a lot of those folks probably just jumped on ChatGPT and started cranking out low-level drivel, and I feel like it gets a bad rap in that sense. But I think any of us who work in high tech are probably hitting ChatGPT like 10, 20, 30, 40 times a day at this point. And humans have this amazing capacity to move beyond what's easy and rearrange the definition of what's good, right?
The example I often use is photography and realism. Rewind a hundred-plus years ago: painters would sit around and render photorealistic paintings, and the benchmark of a good artist was all about your ability to render reality. And then the camera comes along, the camera obscura, and people start shooting photos, and suddenly the ability to render reality wasn't nearly as important as it was when you had all these aristocrats posing for their portraits in their lobbies or whatever.
And so all of a sudden people start asking the question: what makes a good artist? What's the difference between craft and art? And I think the same thing's happening with writing right now. No one's going to make a Hollywood blockbuster by just going into ChatGPT with a fairly simple set of prompts, spitting out a script, then going to the next incarnations of generative video, spitting out some footage and sticking it in front of everybody, because it's just too easy. Everybody can do that, and that's not what Hollywood does. They don't spend nothing and build a film that makes hundreds of millions or billions of dollars. So the bar gets raised, and that's what's happening right now with this content generation. And so you have to go deeper. You have to leverage it more as a thinking engine. What's the topic I should pursue? What really matters in this intellectual space? You go back and forth and you converse, and you dig in on something that you might be writing about, and if you're doing it right, you should write something way better than you would without it, in a lot less time. I think that's generally what's starting to emerge. But overall, that quality bar is going up and will continue to do so as humans become de facto better writers because the bots are helping a lot.
- [Neil] That's the interesting challenge that I think people face: to do that, you have to understand some of the parameters of writing well. At the AI for Good Summit, we had DJ LJ Rich actually MC. She had created an AI system where any song could be covered by any band, in the style that band plays. But now she's actually using generative AI, and she was talking about the importance of prompts. When she creates new music now, she says she created 50 different songs in two days. That shows you the power and the scale. But look at the prompts she was sharing: she's getting into things like the melody, what kind of chords to use, the rhythm, all these different parameters. I know a lot of people think, if I just write a few words, hopefully I'll get something good generated. The best prompts I've seen are like 250, 300 words long, trying to capture as many of these parameters as possible.
- [Deep] Yeah, and even prompts are just so much more dynamic now. The systems we're building at Xyonix aren't just one prompt. It's like thousands of prompts, highly dynamic. We're doing stuff like go into a database, pull out a profile of somebody, and represent that in the prompt. Take the history, because ChatGPT gives you immediate history, like five or ten conversational back-and-forths, but the long memory you have to build into your system yourself: your user profile, what do they care about, what are they interested in? All that stuff ends up manifesting itself in the form of a prompt, and that can change a ton, even in the course of one interaction with one human. And there's also this idea of using different mind states and answering in different ways. Marvin Minsky, decades ago, wrote a pretty seminal work in AI about how the human mind actually works, and in essence, according to this theory, we have all these distinct minds, each of which competes to formulate the thought and the next response. You might have your entertainment mind when you're out in a pub with your friends, and that's how you answer a question, which is totally different than when you're sitting in a lecture hall as a student in front of people. That's your formal student mind. We all have these different minds, and that's stuff you can represent also very easily. So there's just a lot of activity going on, and the level of output you can get from these models, personally, stuff that we would propose even just a year ago as a three- or four-year project, we can get that kind of output in under a day now. A multi-million dollar, multi-year machine learning project before, we're getting that output in a few hours now. And it's not like that's good enough, right?
Because every client we have now, the first thing they're going to do is take your thing, stick it in ChatGPT themselves, and say your thing sucks, it's not as good. And if you're making all these excuses, like, well, I don't have a gigantic model, I don't have a trillion-plus parameter model in my back pocket that I can run for you, and if you're going about things the way we used to in the machine learning world, then there's no way you can keep up. So you have to go to the very front and start with the latest, greatest stuff. And that's another big shift happening right now in machine learning: we went from a world where every data scientist pretty much ran everything on their own machines, to a world where maybe there's a Databricks cluster somewhere they're running on, but still totally controlled by the company, to one where we're starting to lean on these LLMs the way, 15 or 20 years ago, we started leaning on other people's cloud services.
And so that shift is also happening, because it's just not cost effective for one startup company to build and train their own LLM just because their investors are obsessed with them completely owning their own destiny. Okay, sure, but are you going to give me that two million bucks to train it up? No? Okay, then we're not doing that.
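The dynamic prompting Deep describes a moment earlier, a prompt assembled per request from a stored profile, long-term memory, and a chosen mind state, might be sketched like this. Every name here (fetch_profile, build_prompt, the profile fields) is a hypothetical illustration, not Xyonix's actual system:

```python
def fetch_profile(user_id):
    # Stand-in for the database lookup Deep mentions.
    return {"name": "Ava", "interests": ["gardening", "jazz"]}

def build_prompt(user_id, question, memory, mind_state="casual"):
    profile = fetch_profile(user_id)
    # Recent turns come from short-term history; the long memory
    # has to be maintained by your own system, not the model.
    history = "\n".join(memory[-5:])
    return (
        f"Answer in a {mind_state} voice.\n"
        f"User: {profile['name']}; interests: {', '.join(profile['interests'])}.\n"
        f"Recent conversation:\n{history}\n"
        f"Question: {question}"
    )

memory = ["User asked about pruning roses.", "Assistant suggested late winter."]
print(build_prompt("u42", "What about hydrangeas?", memory))
```

Because the profile, memory, and mind state vary per request, the text sent to the model can change thousands of times across one deployment, even within a single conversation.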
- [Neil] I know we talked a lot about the language side. What about like computer vision and how does this tie to some of the other work that's going on?
- [Deep] The vision models are similar by analogy. Language is easier to understand in many ways; everyone can picture reading a set of text and predicting the next word, but you have similar things happening in the vision space.
And the models are being trained to learn the relationships between words and imagery in a way that we haven't really seen before this generation of models, and the output is quite stunning. You can take a terse language prompt, and you can go look for imagery in your stock imagery places like Getty and Shutterfly and all of that, and maybe you get the thing and maybe you don't. If you do, it'll generally be high quality, but it'll have a certain kind of feel to it. But if you want to start doing really wild stuff like co-mingling particular artistic perspectives, like a cubist rendering of an office scene or something, nobody was going to hire an artist to render that. And that's the kind of thing where these models can learn styles, transfer styles, and apply those style transfer approaches to totally different imagery. One area that doesn't get quite as much credit, but is probably the place where we'll see the first really practical examples, is imagery filters and photo filters: being able to really effectively recolorize some black and white film, that kind of stuff, or being able to take a picture of a model in front of a waterfall, yank the person out, and put the same person in a different context with different clothes and a pair of glasses. Those are all things that the same kind of core approach can suddenly make possible, because we have such a deeper understanding of imagery. So you're seeing a lot of capabilities coming out, like every day. It's crazy to keep up with, but there's a lot happening in the imagery space, and also just in the multimodal space in general.
These are systems that don't just train on text or just on imagery; they train on text and images and audio all at the same time, and we're going to see huge advances in these multimodal models in the next year even.
- [Nikolai] When you're training all these different modalities, what are like the differences and even maybe some similarities in that process?
- [Deep] Let's take something like audio. It's a very similar thing. You guys might have seen, a couple years ago, somebody put out a new song by Nirvana that was all built through this generative approach. And the idea is very similar. You have a sequence of audio samples, right? In this case, you take a whole archive of tons of music, and then you maybe fine-tune on just Nirvana, and you're training to predict the future sample point. Think back to a microphone: you talk into it, pressure waves move a diaphragm back and forth, and you get an electrical signal. Transfer that into two dimensions, and that's the waveform you see when you look at an audio signal. That signal then gets sampled at whatever, 44 kilohertz or whatever, and now you have discrete numbers in time.
You can think of those just like words in a sentence. Now you have to predict the future sample points, and that becomes the basis, because in order to learn how to predict future sample points across a large collection of audio, the world's supply of public audio, you've suddenly learned how to process audio information. And what's wild is that you've processed it in a way that's not instrument specific or even voice specific or any of that; it's at this incredibly raw level that humans can't really operate at.
So now you're seeing folks back out and try to do that at the instrument level, first splitting channels out and doing it per instrument. And so there's going to be a whole new generation of home audio stuff. The analogy is quite similar in the imagery space. You've got this two-dimensional input in the form of an image, and that two-dimensional input might be moving in time, in video for example, and each pixel you're tracking and forecasting on. Same general concepts: you have a gigantic collection of unstructured media assets, you're trying to get these neural networks to learn through this unstructured process how to forecast, and through that process they really learn to understand the media on a level that we haven't seen before.
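The audio objective, predicting the next sample from what came before, can be demonstrated on a sampled tone. A pure sine wave obeys an exact two-sample recurrence, so even this trivial linear predictor nails it; neural models learn vastly richer structure from raw audio, but via this same forecasting objective:

```python
import math

# A 440 Hz tone sampled at 44.1 kHz: pressure waves turned into
# discrete numbers in time, as described above.
rate, freq = 44_100, 440.0
theta = 2 * math.pi * freq / rate
samples = [math.sin(theta * n) for n in range(2000)]

# For a pure tone, s[n+1] = 2*cos(theta)*s[n] - s[n-1] exactly
# (the angle-addition identity for sine).
def predict_next(prev, prev2):
    return 2 * math.cos(theta) * prev - prev2

predicted = predict_next(samples[-1], samples[-2])
actual = math.sin(theta * 2000)
print(abs(predicted - actual))  # essentially zero for a pure tone
```

Real music is nowhere near this predictable, which is exactly why a model that learns to forecast it well has had to learn deep structure about instruments, voices, and style.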
- [Ryan] Deep, one of the last questions I wanted to ask before we wrap up: if we tie all of this back to businesses that are listening, owners, employees, you name it, what insights are companies now able to extract from these different technologies? How can they extract those insights from their business data with the ability to use text, images, video, all the different things we've been talking about today? How would you summarize the benefits and the value of what this is going to enable for businesses out there, not just the public as a whole?
- [Deep] Anywhere there's a pattern that a human mind can look at, without a ton of highly specialized expertise and background knowledge, and identify, characterize, and project from, you can use current machine learning and AI techniques. A slightly more business-y way of putting it: imagine yourself with an army of B+ high school students that you don't have to pay anything, or near nothing, and that you can organize around work products of some sort. You could stick them on looking at all the images in your repository, or reading everything that you've got, and put them on the task of figuring it out. Three, even six months ago, you were talking about B+ high school students, but now, on narrowly defined tasks, we're outperforming even really bright college students. The machines are just outperforming them, so we've largely stopped using our humans for training data. So what does that mean for a business? One lens is to sit down with your products and ask yourself, where do I not have efficiency? Where am I losing money or not able to perform as well because there's some inherent inefficiency in the process? Stuff's just happening too slow; I've got a bunch of people bottlenecked, and they can't quite get things out fast enough. That's a good place to look for candidates. The other thing to do is an inventory of your data assets. Figure out what kind of data you have. Do you have a giant collection of text? Giant collections of images or videos? Or if it's business data, what is it? All your customer history, all of your potential vendors or partners, and that historic data.
And then ask yourself, do I have some kind of intuition of what I could do with this? If I could sit down and read every single one of these, all millions of them, what could I do? Most business and product folks have a lot of intuition there. If you're running an insurance company, you know that you've got millions of hours of conversations between somebody making a claim and somebody trying to see if the claim's real or bogus. All that stuff is sitting there, and you know that a lot of insights can come out of it. You can start to figure out what characterizes a bogus claim versus a legit claim. How do they talk? How do they pause? How many times do they call? So you have to trust those intuitions, and then you're going to have to grab the data, look at it, take samples, start diving in, and start trying to actually model some things that a few of you have an intuition around. That's a good starting point. Get a prototype together. Once you have a prototype, start figuring out how you would wrap it up and turn it into a service, make it secure, and integrate it into your UX over time. And then it will usually change how you think about your product too.
- [Neil] Just one last question, Deep. How could people learn more about you, your company and the great work that you're doing?
- [Deep] We're Xyonix, x-y-o-n-i-x. You can come to our website and check it out. We also have a podcast called Your AI Injection, where we talk a lot about stuff like we talked about today. We tend to geek out quite a bit and go long on topics. So those are probably your two best bets. Or just Google us: x-y-o-n-i-x.
- [Ryan] Deep, thanks so much. Really appreciate the time. I think our audience is going to get a ton of value out of this. We covered a lot of very interesting and relevant topics. So really appreciate you taking the time to do this. Hopefully we'll be able to talk again in the future.
- [Deep] All right. Thanks a ton, guys.
Special Guest
Deep Dhillon
- Founder and Chief Data Scientist, Xyonix
Hosted By
AI For All