There is a tremendous amount of discussion about all the capabilities of generative AI, but once in a while it doesn’t hurt to look at the other side of the AI coin. The USC Library published a piece in October of 2023 that did just that.
In addition to the identified limitations discussed below, generative AI may be susceptible to issues that have yet to be uncovered or fully grasped. Clearly we learn new things about it every day.
What Can’t Generative AI Do?
Large language models (LLMs) are susceptible to “hallucinations,” producing fictional information presented as factual or accurate. This includes citations, publications, biographical details, and other data commonly used in research and academic papers. Furthermore, answers generated by LLMs may be incorrect, yet presented as correct or authoritative. ChatGPT has been known to “make things up” when its last data load didn’t cover the time frame it was asked to generate content about.
The fundamental structure of generative AI models, coupled with frequent updates, makes content reproduction challenging. This poses a significant challenge in research and academia, where reproducibility is crucial for establishing credibility.
Generative AI models don’t function as databases of knowledge. Instead, they attempt to synthesize and replicate the information they were trained on. This complexity makes it exceptionally difficult to validate and properly attribute the sources of their content. A generative AI model also has no reason to believe any information it holds is inaccurate. When I asked ChatGPT to tell me how the sky was purple, it explained both why the daytime sky is normally blue and reasons, from volcanic ash to pollution, that it might “appear” purple. But when asked who owns Twitter, ChatGPT referred to it as a publicly traded entity, because its last data load predated Elon Musk purchasing Twitter and renaming it X.
Is the Data Up to Date?
Many common generative AI tools lack internet connectivity and cannot update or verify the content they generate. Additionally, the nature of generative AI models, especially when provided with simple prompts, can lead to content that is overly simplistic, of low quality, or overly generic. When asked for the weather forecast, ChatGPT replied, “I’m sorry, but I don’t have the capability to provide real-time information, including current weather forecasts. Weather conditions can change rapidly, and it’s important to get the most up-to-date information from a reliable source.”
Several generative AI models, including ChatGPT, are trained on data with cutoff dates, resulting in outdated information or an inability to answer questions about current events. In some instances, the data cutoff date may not be explicitly communicated to the user. The capabilities of generative AI are obviously limited by outdated data.
Data Privacy Precautions:
Exercise extra caution when dealing with private, sensitive, or identifiable information, whether directly or indirectly, regardless of using a generative AI service or hosting your own model. While some generative AI tools permit users to set their data retention policies, many collect user prompts and data, presumably for training purposes. USC researchers, staff, and faculty should particularly avoid sharing student information (a potential FERPA violation), proprietary data, or other controlled/regulated information.
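As a minimal illustration of this precaution, the sketch below pre-filters a prompt before it leaves your environment. The `redact_prompt` helper and the specific patterns are illustrative assumptions, not a complete PII filter, and real deployments should use a vetted data-loss-prevention tool rather than a few regular expressions:

```python
import re

# Illustrative patterns only -- a real PII filter needs far broader coverage.
PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "phone": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
}

def redact_prompt(prompt: str) -> str:
    """Replace matched identifiers with bracketed placeholders
    before the prompt is sent to any third-party AI service."""
    for label, pattern in PATTERNS.items():
        prompt = pattern.sub(f"[{label} removed]", prompt)
    return prompt

print(redact_prompt("Email jdoe@usc.edu about student 123-45-6789."))
# → Email [email removed] about student [ssn removed].
```

Scrubbing prompts locally, before they reach a vendor's servers, matters because many services retain prompt data for training by default.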
Salesforce recognizes this. According to their news and insights page, companies are actively embracing generative AI to power business growth. Building trustworthy generative AI requires a firm foundation at the inception of AI development. Salesforce published an overview of their five guidelines for the ethical development of generative AI that builds on their Trusted AI Principles and AI Acceptable Use Policy. The guidelines focus on accuracy, safety, transparency, empowerment, and sustainability – helping Salesforce AI engineers create ethical generative AI from the start.
Additional Considerations:
Apart from providing direct access to generative AI tools, many companies are integrating generative AI functionality into existing products and applications: Google Workspace, Microsoft Office, Notion, and Adobe Photoshop, to name a few. Extra care should be taken when using these tools for research and academic work. In particular, avoid using auto-completion for sentences or generating text without explicit permission. When working with images or videos, clearly communicate and attribute the use of generative AI assistance.
Detecting Generative AI:
In an effort to counter undisclosed and inappropriate uses of generative AI content, many organizations are developing generative AI detectors. These tools use AI to flag content created by generative AI. However, they can be unreliable and have erroneously flagged student content as AI-generated when it was created by a human. Relying solely on these tools to identify the origin of an assignment or work is not advisable. I tested one such tool on content written solely by generative AI. Amazingly, it received a 99% human-generated score. When I rewrote the content in my own words, the score dropped by 20%.
In April 2023, Turnitin introduced a preview of their AI detection tool, available to USC instructors via the Turnitin Feedback Studio.
When in doubt, professors should engage with their students to better understand if and how generative AI tools were used. This interaction provides an essential opportunity for both parties to discuss the nuances of the technology and address any questions or concerns.
Determining how and when the capabilities of generative AI are useful for you is never going to be a cut-and-dried process.
By Shannan Hearne, Tectonic Salesforce Marketing Consultant