Einstein Discovery Dictionary
Familiarize yourself with terminology that is commonly associated with Einstein Discovery.

Actionable Variable
An actionable variable is an explanatory variable that people can control, such as deciding which marketing campaign to use for a particular customer. Contrast these variables with explanatory variables that can’t be controlled, such as a customer’s street address or a person’s age. If a variable is designated as actionable, the model uses prescriptive analytics to suggest actions (improvements) the user can take to improve the predicted outcome.

Actual Outcome
An actual outcome is the real-world value of an observation’s outcome variable after the outcome has occurred. Einstein Discovery calculates model performance by comparing how closely predicted outcomes come to actual outcomes. An actual outcome is sometimes called an observed outcome.

Algorithm
See modeling algorithm.

Analytics Dataset
An Analytics dataset is a collection of related data that is stored in a denormalized, yet highly compressed, form. The data is optimized for analysis and interactive exploration.

Attribute
See variable.

Average
In Einstein Discovery, the average represents the statistical mean for a variable.

Bias
If Einstein Discovery detects bias in your data, it means that variables are being treated unequally in your model. Removing bias from your model can produce more ethical and accountable models and, therefore, predictions. See disparate impact.

Binary Classification Use Case
The binary classification use case applies to business outcomes that are binary: categorical (text) fields with only two possible values, such as win-lose, pass-fail, public-private, retain-churn, and so on. These outcomes separate your data into two distinct groups. For analysis purposes, Einstein Discovery converts the two values into Boolean true and false. Einstein Discovery uses logistic regression to analyze binary outcomes.
Binary classification is one of the main use cases that Einstein Discovery supports. Compare with multiclass classification.

Cardinality
Cardinality is the number of distinct values in a category. Variables with high cardinality (too many distinct values) can result in complex visualizations that are difficult to read and interpret. Einstein Discovery supports up to 100 categories per variable. You can optionally consolidate the remaining categories (categories with fewer than 25 observations) into a category called Other. Null values are put into a category called Unspecified.

Categorical Variable
A categorical variable is a type of variable that represents qualitative values (categories). A model that represents a binary or multiclass classification use case has a categorical variable as its outcome. See category.

Category
A category is a qualitative value that usually contains categorical (text) data, such as Product Category, Lead Status, and Case Subject. Categories are handy for grouping and filtering your data. Unlike measures, you can’t perform math on categories. In Salesforce Help for Analytics datasets, categories are referred to as dimensions.

Causation
Causation describes a cause-and-effect relationship between things. In Einstein Discovery, causality refers to the degree to which variables influence each other (or not), such as between explanatory variables and an outcome variable. Some variables can have an obvious, direct effect on each other (for example, how price and discount affect the sales margin). Other variables can have a weaker, less obvious effect (for example, how weather can affect on-time delivery). Many variables have no effect on each other: they are independent (for example, win-loss records of soccer teams and currency exchange rates). It’s important to remember that you can’t presume a causal relationship between variables based simply on a statistical correlation between them.
Instead, a correlation is a hint that the association between those variables warrants further investigation. Only with more exploration can you determine whether a causal link between them really exists and, if so, how significant that effect is.

Coefficient
A coefficient is a numeric value that represents the impact that an explanatory variable (or a pair of explanatory variables) has on the outcome variable. The coefficient quantifies the change in the mean of the outcome variable when there’s a one-unit shift in the explanatory variable, assuming all other variables in the model remain constant.

Comparative Insight
Comparative insights are insights derived from a model. Comparative insights reveal information about the relationships between explanatory variables and the outcome variable in your story. With comparative insights, you isolate factors (categories or buckets) and compare their impact with other factors or with global averages. Einstein Discovery shows waterfall charts to help you visualize these comparisons.

Correlation
A correlation is simply the association, or “co-relationship,” between two or more things. In Einstein Discovery, correlation describes the statistical association between variables, typically between explanatory variables and an outcome variable. The strength of the correlation is quantified as a percentage. The higher the percentage, the stronger the correlation. However, keep in mind that correlation is not causation. Correlation merely describes the strength of association between variables, not whether they causally affect each other.

Count
A count is the number of observations (rows) associated with an analysis. The count can represent all observations in the dataset, or the subset of observations that meet associated filter criteria.

Dataset
See Analytics dataset.

Date Variable
A date variable is a type of variable that contains date/time (temporal) data.

Dependent Variable
See outcome variable.
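
To make the difference between a coefficient and a correlation concrete, here is a minimal Python sketch. It is not Einstein Discovery's own calculation, which this dictionary doesn't describe. It assumes a single explanatory variable, uses an ordinary least-squares slope as a stand-in for the coefficient and Pearson's r as a stand-in for the correlation, and runs on made-up discount and margin figures. The function names are hypothetical.

```python
from statistics import mean


def slope_coefficient(xs, ys):
    """Least-squares slope: the change in the mean of y per one-unit shift in x."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    return cov / var_x


def pearson_correlation(xs, ys):
    """Pearson's r: strength of linear association, between -1 and 1."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical observations (illustration only): discount in percent
# and the resulting sales margin in percent.
discount = [0, 5, 10, 15, 20]
margin = [40, 37, 35, 31, 30]

coef = slope_coefficient(discount, margin)
corr = pearson_correlation(discount, margin)

print(f"coefficient: {coef:.2f} margin points per discount point")
print(f"correlation strength: {abs(corr):.0%}")
```

The coefficient answers "how much does the outcome move per unit of the explanatory variable," while the correlation only answers "how tightly do the two move together." Neither number establishes that the discount causes the change in margin.
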
Deployment Wizard
The Deployment Wizard is the Einstein Discovery tool used to deploy models into your Salesforce org.

Descriptive Insights
Descriptive insights are insights derived from historical data using descriptive analytics. Descriptive insights show what happened in your data. For example, Einstein Discovery in Reports produces descriptive insights for reports.

Diagnostic Insights
Diagnostic insights are insights derived from a model. Whereas descriptive insights show what happened in your data, diagnostic insights show why it happened. Diagnostic insights drill deeper into correlations to help you understand which variables most significantly impacted the business outcome you’re analyzing. The term “why” refers to a high statistical correlation, not necessarily a causal relationship.

Disparate Impact
If Einstein Discovery detects disparate impact in your data, it means that the data reflects discriminatory practices toward a particular demographic. For example, your data can reveal gender disparities in starting salaries. Removing disparate impact from your model can produce more accountable and ethical insights and, therefore, predictions that are fair and equitable.

Dominant Values
If Einstein Discovery detects dominant values in a variable, it means that the data is unbalanced. Most values are in the same category, which can limit the value of the analysis.

Drift
Over time, a deployed model’s performance can drift, becoming less accurate in predicting outcomes. Drift can occur due to changing factors in the data or in your business environment. Drift also results from now-obsolete assumptions built into the story