Familiarize yourself with terminology that is commonly associated with Einstein Discovery.
Actionable Variable
An actionable variable is an explanatory variable that people can control, such as deciding which marketing campaign to use for a particular customer. Contrast these variables with explanatory variables that can’t be controlled, such as a customer’s street address or a person’s age. If a variable is designated as actionable, the model uses prescriptive analytics to suggest actions (improvements) the user can take to improve the predicted outcome.
Actual Outcome
An actual outcome is the real-world value of an observation’s outcome variable after the outcome has occurred. Einstein Discovery calculates model performance by comparing how closely predicted outcomes come to actual outcomes. An actual outcome is sometimes called an observed outcome.
Algorithm
See modeling algorithm.
Analytics Dataset
An Analytics dataset is a collection of related data that is stored in a denormalized, yet highly compressed, form. The data is optimized for analysis and interactive exploration.
Average
In Einstein Discovery, the average represents the statistical mean for a variable.
Bias
If Einstein Discovery detects bias in your data, it means that variables are being treated unequally in your model. Removing bias from your model can produce more ethical and accountable models and, therefore, predictions. See disparate impact.
Binary Classification Use Case
The binary classification use case applies to business outcomes that are binary: categorical (text) fields with only two possible values, such as win-lose, pass-fail, public-private, retain-churn, and so on. These outcomes separate your data into two distinct groups. For analysis purposes, Einstein Discovery converts the two values into Boolean true and false. Einstein Discovery uses logistic regression to analyze binary outcomes. Binary classification is one of the main use cases that Einstein Discovery supports. Compare with multiclass classification.
Cardinality
Cardinality is the number of distinct values in a category. Variables with high cardinality (too many distinct values) can result in complex visualizations that are difficult to read and interpret. Einstein Discovery supports up to 100 categories per variable. You can optionally consolidate the remaining categories (categories with fewer than 25 observations) into a category called Other. Null values are put into a category called Unspecified.
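For illustration, here's a minimal pandas sketch of consolidating low-count categories; the column, data, and 25-observation cutoff are hypothetical placeholders that mirror the description above, and Einstein Discovery performs this step for you:

```python
import pandas as pd

# Hypothetical dataset with a categorical column.
df = pd.DataFrame({"Product_Category": ["A", "B", "A", "C", None, "D", "A", "B"]})

# Count observations per category (nulls excluded).
counts = df["Product_Category"].value_counts(dropna=True)

# Categories with fewer than 25 observations are consolidated into "Other";
# null values become "Unspecified". (In this tiny sample, every category
# falls below the cutoff.)
MIN_OBSERVATIONS = 25
rare = counts[counts < MIN_OBSERVATIONS].index
df["Product_Category"] = (
    df["Product_Category"]
    .where(~df["Product_Category"].isin(rare), "Other")
    .fillna("Unspecified")
)
```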
Categorical Variable
A categorical variable is a type of variable that represents qualitative values (categories). A model that represents a binary or multiclass classification use case has a categorical variable as its outcome. See category.
Category
A category is a qualitative value that usually contains categorical (text) data, such as Product Category, Lead Status, and Case Subject. Categories are handy for grouping and filtering your data. Unlike measures, you can’t perform math on categories. In Salesforce Help for Analytics datasets, categories are referred to as dimensions.
Causation
Causation describes a cause-and-effect relationship between things. In Einstein Discovery, causality refers to the degree to which variables influence each other (or not), such as between explanatory variables and an outcome variable. Some variables can have an obvious, direct effect on each other (for example, how price and discount affect the sales margin). Other variables can have a weaker, less obvious effect (for example, how weather can affect on-time delivery). Many variables have no effect on each other: they are independent and mutually exclusive (for example, win-loss records of soccer teams and currency exchange rates). It’s important to remember that you can’t presume a causal relationship between variables based simply on a statistical correlation between them. Instead, correlation provides a hint that the association between those variables warrants further investigation. Only with more exploration can you determine whether a causal link between them really exists and, if so, how significant that effect is.
Coefficient
A coefficient is a numeric value that represents the impact that an explanatory variable (or a pair of explanatory variables) has on the outcome variable. The coefficient quantifies the change in the mean of the outcome variable when there’s a one-unit shift in the explanatory variable, assuming all other variables in the model remain constant.
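As a rough illustration of how to read a coefficient, the following sketch fits a simple one-variable linear model with NumPy. The data is hypothetical, and this is not Einstein Discovery's own model-building code:

```python
import numpy as np

# Hypothetical data: discount (%) vs. sales margin.
discount = np.array([0, 5, 10, 15, 20], dtype=float)
margin = np.array([40, 37, 34, 31, 28], dtype=float)

# Fit a simple linear model: margin = intercept + coefficient * discount.
coefficient, intercept = np.polyfit(discount, margin, deg=1)

# A coefficient of -0.6 means each additional point of discount lowers the
# expected margin by 0.6, holding everything else constant.
print(coefficient, intercept)
```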
Comparative Insight
Comparative insights are insights derived from a model. Comparative insights reveal information about the relationships between explanatory variables and the outcome variable in your story. With comparative insights, you isolate factors (categories or buckets) and compare their impact with other factors or with global averages. Einstein Discovery shows waterfall charts to help you visualize these comparisons.
Correlation
A correlation is simply the association—or “co-relationship”—between two or more things. In Einstein Discovery, correlation describes the statistical association between variables, typically between explanatory variables and an outcome variable. The strength of the correlation is quantified as a percentage. The higher the percentage, the stronger the correlation. However, keep in mind that correlation is not causation. Correlation merely describes the strength of association between variables, not whether they causally affect each other.
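A minimal NumPy sketch of measuring the association between two hypothetical variables; Einstein Discovery reports correlation as a percentage, while the Pearson coefficient shown here is the standard statistical form:

```python
import numpy as np

# Two hypothetical variables: marketing spend and opportunity amount.
spend = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
amount = np.array([10.0, 22.0, 29.0, 41.0, 48.0])

# Pearson correlation coefficient (ranges from -1 to 1).
r = np.corrcoef(spend, amount)[0, 1]
print(f"correlation: {r:.2f}")  # strongly positive, but says nothing about causation
```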
Count
A count is the number of observations (rows) associated with an analysis. The count can represent all observations in the dataset, or the subset of observations that meet associated filter criteria.
Dataset
See Analytics dataset.
Date Variable
A date variable is a type of variable that contains date/time (temporal) data.
Dependent Variable
See outcome variable.
Deployment Wizard
The Deployment Wizard is the Einstein Discovery tool used to deploy models into your Salesforce org.
Descriptive Insights
Descriptive insights are insights derived from historical data using descriptive analytics. Descriptive insights show what happened in your data. For example, Einstein Discovery in Reports produces descriptive insights for reports.
Diagnostic Insights
Diagnostic insights are insights derived from a model. Whereas descriptive insights show what happened in your data, diagnostic insights show why it happened. Diagnostic insights drill deeper into correlations to help you understand which variables most significantly impacted the business outcome you’re analyzing. The term why refers to a high statistical correlation, not necessarily a causal relationship.
Disparate Impact
If Einstein Discovery detects disparate impact in your data, it means that the data reflects discriminatory practices toward a particular demographic. For example, your data can reveal gender disparities in starting salaries. Removing disparate impact from your model can produce more accountable and ethical insights and, therefore, predictions that are fair and equitable.
Dominant Values
If Einstein Discovery detects dominant values in a variable, it means that the data is unbalanced. Most values are in the same category, which can limit the value of the analysis.
Drift
Over time, a deployed model’s performance can drift, becoming less accurate in predicting outcomes. Drift can occur due to changing factors in the data or in your business environment. Drift also results from now-obsolete assumptions built into the story on which the model is based. To remedy a model that has drifted, you can refresh it by adjusting story settings, retraining it on newer data, and redeploying it.
Duplicates
If Einstein Discovery detects a duplicate condition in your data, it means that two or more explanatory variables are highly correlated (for example, City and Postal Code). These variables have a duplicate impact on the outcome. Einstein Discovery recommends choosing just one variable to improve results. Consider keeping the most descriptive field (for example, City) to make insights more easily interpretable. This condition is also known as multicollinearity.
Ethical Use
Ethical use reflects the application of AI and machine learning for fair and unbiased purposes. With Einstein Discovery, it’s the practice of producing ethical and accountable stories, insights, and predictions. For an overview, take the Responsible Creation of Artificial Intelligence Trailhead module.
Explanatory Variable
An explanatory variable is a variable that you explore to determine whether, and to what degree, it can influence the outcome variable for your story. Einstein Discovery calculates statistical associations between explanatory variables and the outcome variable. Based on the strength of the association, you can investigate further whether and how that explanatory variable affects the outcome variable. An explanatory variable is sometimes called an input variable, a feature, a predictor variable, or an independent variable.
Feature
See explanatory variable or predictor variable.
Feature Selection
Feature selection involves choosing the optimum set of features (explanatory or predictor variables) in a model. Ideally, a model contains the number of explanatory variables that best explain variations in the outcome variable. A model with too few explanatory variables can be too vague to detect underlying patterns in the data, resulting in an underfitting model. A model with too many explanatory variables can be overly specific and too complex to filter out noise in the data, resulting in an overfitting model. Successful feature selection includes the most influential explanatory variables with no significant lurking variables (important explanatory variables that are missing from the model).
First-Order Analysis
In an insight, a first-order analysis examines how one explanatory variable explains variation in the outcome variable. First-order analysis is sometimes called bivariate analysis.
Generalized Linear Model (GLM)
A generalized linear model (GLM) is a regression-based modeling algorithm that Einstein Discovery uses to build a model.
Goal
A goal specifies the desired outcome for your model. A model’s goal includes its outcome variable plus your preferred direction (minimize or maximize) for the outcome. For example, your goal could be to maximize margin or to minimize costs. Einstein Discovery uses the model goal to orient its analysis and explain the insights it uncovered from the analysis.
Gradient Boosting
Gradient Boosting is a decision tree-based ensemble machine learning algorithm that Einstein Discovery uses to build a model. Also called Gradient Boosting Machine (GBM).
Hint
See signal.
Identical Values
If Einstein Discovery detects identical values in your data, it means that all values for a variable belong to the same category. Having identical values increases complexity—but no benefit—to the analysis of your data.
Importance
Importance is the relative influence of a variable on the model’s predicted outcome. In Einstein Discovery, importance indicates how much the model chooses to use a variable when predicting the outcome. The level of importance is quantified as a percentage. The higher the percentage, the greater the impact. Importance is an advanced metric that considers interactions between variables. If two variables are highly correlated and contain similar information, the model chooses the better variable to use. For example, when predicting energy usage, temperature and the number of hours the air conditioner runs are highly correlated, so only one of them receives a high importance score.
Improvement
An improvement is a suggested action, based on prescriptive analytics, that a user can take to improve the likelihood of a desired outcome. Improvements are associated with actionable variables, which are explanatory variables that people can control. Taking a suggested action can improve the predicted outcome. An improvement is analogous to a prescription in prescriptive analytics.
Imputation
Imputation is a statistical technique for replacing missing numeric values with values that are derived from another subset of your data. With imputation enabled, observations with missing values are safely counted during analysis.
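A minimal pandas sketch of one common form of imputation, replacing missing values with the column mean; the column name and values are hypothetical, and Einstein Discovery applies its own imputation internally:

```python
import pandas as pd

# Hypothetical numeric column with missing values.
df = pd.DataFrame({"Amount": [100.0, None, 250.0, None, 400.0]})

# Mean imputation: replace missing values with the mean of the observed values,
# so these rows can still be counted during analysis.
df["Amount"] = df["Amount"].fillna(df["Amount"].mean())
print(df)
```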
Independent Variable
See explanatory variable or predictor variable.
Insight
An insight is a finding in your data. When you create a model, Einstein Discovery analyzes the data in your dataset and generates insights based on its analysis. Insights provide a starting point for you to investigate the relationships among your model’s explanatory variables and its goal.
k-fold Cross-Validation
Model validation process in which Einstein Discovery randomly divides all the observations in the Analytics dataset into four separate partitions of equal size. Next, it completes four test passes (folds) in which three of the partitions serve as the training set and one partition serves as the test set. For each fold, Einstein Discovery compiles model metrics, then averages the metrics for all four folds.
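The following scikit-learn sketch shows the general 4-fold cross-validation technique described above on hypothetical data; it is not Einstein Discovery's internal implementation:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import KFold

# Hypothetical observations: one explanatory variable, one numeric outcome.
X = np.arange(40, dtype=float).reshape(-1, 1)
y = 3.0 * X.ravel() + np.random.default_rng(0).normal(scale=2.0, size=40)

# Four folds: each fold trains on three partitions and tests on the remaining one.
kfold = KFold(n_splits=4, shuffle=True, random_state=0)
scores = []
for train_idx, test_idx in kfold.split(X):
    model = LinearRegression().fit(X[train_idx], y[train_idx])
    scores.append(r2_score(y[test_idx], model.predict(X[test_idx])))

# Metrics are averaged across the four folds.
print(f"mean R2 across folds: {np.mean(scores):.3f}")
```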
Leakage
Leakage occurs when the data used to train your model includes one or more variables that contain the information that you’re trying to predict. This can result in models that are extremely accurate when, in actuality, they are problematic. To remedy data leakage, remove any variables from your model that are causing the leakage.
Linear Regression
In Einstein Discovery, linear regression is an analytical technique used for the numeric use case.
Logistic Regression
In Einstein Discovery, logistic regression is an analytical technique used for the binary classification use case.
Lurking Variable
A lurking variable is an explanatory variable that is missing from your model but that significantly explains variations in the outcome variable.
Mean
A mean is the statistical average: the sum of all items divided by the number of items.
Measure
A numeric value that quantifies something. See numeric variable.
Model
A model is a sophisticated, custom equation, based on a comprehensive statistical understanding of past outcomes, that is used to predict future outcomes. A model accepts the values of one or more predictor variables as input and produces a predicted outcome as output, along with top factors and improvements (if requested). Einstein Discovery walks you through the steps to create a model based on the outcome you want to improve (your model’s goal), the data you’ve assembled for that purpose (in the Analytics dataset), and other settings that tell Einstein Discovery how to conduct the analysis and communicate its results.
Modeling Algorithm
A modeling algorithm is what Einstein Discovery uses to create a model. Einstein Discovery uses one of several algorithms: generalized linear model (GLM) is a regression-based algorithm, while gradient boosting machine (GBM) and XGBoost are decision tree-based machine learning algorithms.
Model Manager
The Model Manager is the Einstein Discovery tool used to manage predictions and models you’ve deployed.
Model Performance
Model performance refers to the metrics used to describe how well the predictive model performs. These metrics (quality indicators, sometimes called fit statistics) show how well the model’s predictions fit the training data in the dataset. For definitions of the quality indicators shown in Model Performance, see Evaluate Model Quality.
Multiclass Classification Use Case
The multiclass classification use case addresses a business outcome that has between 3 and 10 outcome values, such as five possible service plans or eight possible insurance policies. Multiclass classification is one of the main use cases that Einstein Discovery supports. Compare with Binary Classification.
Noise
Noise is any data that doesn’t meaningfully explain variations in your outcome variable. See signal.
Numerical Variable
A numerical variable is a type of variable that represents quantitative values (numbers), such as revenue or price. You can do math on numeric variables, such as calculating the total revenue or the average price. A numeric value always has an associated unit of measure, such as currency, volume, or weight. A model that represents a numeric use case has a numeric outcome variable. In the Analytics dataset documentation, a numeric column is referred to as a measure.
Numeric Use Case
In Einstein Discovery, the numeric use case applies to model outcome variables that are numeric. Predicting a number field is a regression problem with its own set of metrics to measure model quality. Einstein Discovery uses linear regression to analyze numeric outcomes. The numeric use case is one of the main use cases that Einstein Discovery supports.
Observation
An observation represents an instance of the data you want to analyze. An observation is analogous to a row of data in a CRM Analytics dataset, or to a record in a Salesforce object. For example, if your model’s goal is to maximize opportunity wins, then each observation represents a single opportunity.
Outcome
An outcome is the business result you’re trying to analyze or predict. An outcome is typically a key performance indicator (KPI), such as sales margin or opportunity wins.
Outcome Variable
In a model, the outcome variable is the column selected as the single, primary focus for analysis and predictions. The goal of a model is to maximize or minimize its outcome variable. An outcome variable is sometimes referred to as the response, the target variable, or the dependent variable.
Outlier
If Einstein Discovery detects outliers in your data, it means that a variable contains data points that are unusually distant from the average value (more than five times the standard deviation from the mean for that variable). Uncommonly large or small numbers, potentially from data entry errors or rare events, affect averages (means) and standard deviations, which can reduce the accuracy of insights or predictions. Outliers can be selectively excluded from a model.
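A minimal NumPy sketch of the five-standard-deviations rule described above, applied to a hypothetical numeric variable:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical numeric variable: 100 typical values plus one extreme entry error.
values = np.append(rng.normal(loc=10.0, scale=1.0, size=100), 10_000.0)

mean, std = values.mean(), values.std()

# Flag points more than five standard deviations from the mean,
# mirroring the rule described above.
outliers = values[np.abs(values - mean) > 5 * std]
print(outliers)  # [10000.]
```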
Overfitting
In predictive analytics, overfitting occurs when a model performs well in predicting outcomes on the training data in the dataset, but less well when predicting outcomes for other data, such as production data. Using too many explanatory variables can result in an overly complex predictive model that captures the noise in your data. To mitigate overfitting, Einstein Discovery uses ridge regression and regularization. See also underfitting.
Performance
For predictive models, performance is a qualitative measure of how accurately a model predicts outcomes. Einstein Discovery provides model metrics that measure model performance in different ways.
Predicted Outcome
A prediction. Einstein Discovery calculates model performance by comparing how closely predicted outcomes come to actual outcomes.
Prediction
In Einstein Discovery, a prediction is a derived value (produced by a model) that represents a possible future outcome. You can think of a prediction as the output of a predictive model, based on the inputs of the predictor variables that the model accepts.
Prediction Definition
In Einstein Prediction Service, a parent resource that contains one or more models. If a prediction definition contains multiple models, then each model produces predictions for a different segment of the data.
Prediction Field
A prediction field is a field where Einstein stores prediction scores for a Salesforce object. During deployment, Einstein can create this field automatically (called an automated prediction field), or a custom prediction field can be created later if needed.
Predictive Analytics
Predictive analytics is the practice of analyzing historical and current data, based on AI, machine learning, predictive modeling, and statistical techniques. Einstein Discovery uses predictive analytics to identify patterns and predict probabilistic future outcomes.
Predictor or Predictor Variable
A variable that a model expects as input. A prediction request passes values for each predictor variable that the model requires. Based on the provided input values, the model’s equation produces a prediction as output. Predictors are also known as features and independent variables.
Prescriptive Analytics
Prescriptive analytics is the practice of suggesting actions to improve predicted outcomes.
Proxy Variable
A proxy variable is an explanatory variable that is highly correlated to another explanatory variable in relation to the outcome variable. When a proxy variable, such as a loan applicant’s street address, is highly correlated to a protected characteristic, such as ethnicity, it can reflect discriminatory practices that compromise your analysis and predictions with unwanted bias. Einstein Discovery helps you identify proxy variables so that you can remove them, and the bias they reflect, from consideration in your models, insights, and predictions.
R²
R² measures a regression model’s ability to explain variation in the outcome. It represents the proportion of the variance in the outcome variable that is predictable from one or more explanatory variables. In general, the higher the R², the better the model predicts outcomes. R² is a commonly used metric for numeric use cases.
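As an illustration of the standard R² calculation, here is a short sketch with hypothetical actual and predicted values; Einstein Discovery computes this metric for you:

```python
import numpy as np

# Hypothetical actual and predicted outcomes for a numeric use case.
actual = np.array([10.0, 12.0, 15.0, 20.0, 23.0])
predicted = np.array([11.0, 12.5, 14.0, 19.0, 24.0])

# R^2 = 1 - (residual sum of squares / total sum of squares)
ss_res = np.sum((actual - predicted) ** 2)
ss_tot = np.sum((actual - actual.mean()) ** 2)
r_squared = 1 - ss_res / ss_tot
print(f"R2: {r_squared:.3f}")  # close to 1 means most variance is explained
```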
Ranked Data
Ranked data is used to distribute data by probability. Also known as cumulative fraction, ranked data is often presented as deciles or quantiles. In Einstein Discovery, gain and lift charts are plotted on an x-axis of percentage of ranked data. For example, a ranked data of 0.1 equates to the top decile, or the 10% of records with the highest scores.
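A minimal pandas sketch of bucketing hypothetical prediction scores into deciles and selecting the top decile:

```python
import numpy as np
import pandas as pd

# Hypothetical prediction scores for 1,000 records.
scores = pd.Series(np.random.default_rng(0).random(1000))

# Bucket the scores into deciles; bucket 9 holds the highest scores.
deciles = pd.qcut(scores, 10, labels=False)
top_decile = scores[deciles == 9]
print(len(top_decile))  # ~100 records: the top 10% by score
```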
Recommended Updates
When analyzing your data, Einstein Discovery looks for issues, such as outliers or duplicates, that can decrease the value of the analysis. If it detects any, Einstein Discovery presents recommended updates to fix these data issues in your model.
Residual
A residual is the mathematical difference between the observed (or actual) value and the predicted value. It’s calculated as: residual = observed value − predicted value. Residuals are also known as errors and are used to assess model quality and accuracy. A positive residual means the prediction was too low, while a negative residual means the prediction was too high.
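A quick NumPy illustration of the residual calculation with hypothetical values:

```python
import numpy as np

observed = np.array([100.0, 80.0, 120.0])
predicted = np.array([95.0, 85.0, 120.0])

# residual = observed value - predicted value
residuals = observed - predicted
print(residuals)  # [ 5. -5.  0.]: positive = prediction too low, negative = too high
```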
Ridge Regression
Ridge regression is a regularization approach that Einstein Discovery uses to mitigate model overfitting by preventing coefficients from getting too large.
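The following scikit-learn sketch shows the general ridge regression technique (an L2 penalty that shrinks coefficients) on hypothetical data; it is not Einstein Discovery's internal implementation:

```python
import numpy as np
from sklearn.linear_model import Ridge

# Hypothetical training data with several explanatory variables.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))
y = X @ np.array([2.0, -1.0, 0.5, 0.0, 0.0]) + rng.normal(scale=0.5, size=50)

# The L2 penalty (alpha) shrinks coefficients toward zero, which keeps them
# from growing too large and helps mitigate overfitting.
model = Ridge(alpha=1.0).fit(X, y)
print(model.coef_)
```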
Sampling
Technique of randomly selecting a subset of observations to analyze for the purpose of reducing the time needed to analyze the data. The sample should be large enough to be sufficiently representative of the variability in the data.
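A minimal pandas sketch of random sampling on a hypothetical dataset:

```python
import numpy as np
import pandas as pd

# Hypothetical dataset of 1,000,000 observations.
df = pd.DataFrame({"Amount": np.random.default_rng(0).random(1_000_000)})

# Randomly sample 10% of the rows to speed up analysis; the sample should
# remain large enough to represent the variability in the full data.
sample = df.sample(frac=0.1, random_state=0)
print(len(sample))  # 100000
```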
Score
(noun) A prediction associated with an observation.
(verb) The process of predicting outcomes for a set of observations.
Second-Order Analysis
In an insight, a second-order analysis examines how the combination of two explanatory variables explains variation in the outcome variable. In second-order analysis, the combined impact of both variables together on the outcome is sometimes called the interaction effect. Second-order analysis is sometimes called multivariate analysis.
Segment
A segment is a subset of observations (rows) that meet the criteria specified in the segment filter. See segmentation.
Segmentation
Segmentation involves filtering your data to focus your prediction on a particular group, such as a customer type or region.
Sensitive Variable
A sensitive variable contains data that could potentially be associated with unfair treatment. Some examples are variables associated with race, gender, religion, national origin, sexual orientation, disability, or age. Less obvious examples include proxy variables, such as street address or ZIP code, which can reflect discriminatory practices. Einstein Discovery uses sensitive variables to detect potential bias in your data.
Signal
Signal is an indication of a statistically significant and potentially meaningful pattern in your data. For example, an insight can describe a high correlation between an explanatory variable and the outcome variable. By investigating the relationship further, you can learn whether the correlation helps explain variations in the outcome (possible signal) or not (possible noise). Sometimes referred to as a hint.
Data Insights
Data insights are a collection of descriptive and diagnostic analytics that Einstein Discovery generates based on the data and model settings.
Model Setup Wizard
The model setup wizard is the Einstein Discovery tool used to define your model settings, such as the model goal, data selections, and other preferences. Einstein Discovery uses these model settings to analyze the data, produce insights, and generate predictions.
Strongest Predictor
If Einstein Discovery detects a strongest predictor in your data, it means that a variable explains the most variation in the data. Remove the variable if there’s an obvious mathematical relationship between it and the outcome (for example, Cost and Price). Similarly, remove the variable if it is known only after the outcome is known (for example, Reason for Churn in a customer churn analysis). Excluding strongest predictor variables can expose more subtle patterns in your data.
Terminal State
Data that is finalized and not expected to change. An example of finalized data is the date on which an order shipped. A record that has reached its terminal state represents an actual outcome (also called observed outcome). Define the conditions under which your model’s outcome variable has attained its terminal state.
Text Variable
See categorical variable. See also binary classification use case.
Threshold
In a binary classification model, the threshold value tells your model how to classify a binary outcome. If the calculated probability is above the threshold value, Einstein classifies the outcome one way (such as True or Positive). If the calculated probability is below the threshold value, Einstein classifies the outcome the other way (such as False or Negative). The default threshold is 0.5, but you can tune this value up or down to accommodate your use case. The threshold is sometimes called the Classification Threshold or Decision Threshold.
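A minimal NumPy sketch of applying a classification threshold to hypothetical predicted probabilities:

```python
import numpy as np

# Hypothetical predicted probabilities from a binary classification model.
probabilities = np.array([0.15, 0.48, 0.52, 0.90])

# Compare each probability with the threshold (0.5 by default, tunable).
threshold = 0.5
labels = np.where(probabilities > threshold, "True", "False")
print(labels)  # ['False' 'False' 'True' 'True']
```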
Top Predictors
Top predictors are the conditions that most significantly drive the predicted outcome, in decreasing order of magnitude. A condition is a data value associated with a column. In Einstein Discovery, a predictor consists of one or two conditions. See predictor variables.
Training Set
In predictive analytics, the training set is the portion of the data in your dataset that Einstein Discovery uses to train your model to make predictions. See also validation set.
Underfitting
In predictive analytics, underfitting occurs when a model performs poorly in predicting outcomes on the training data in the dataset. Underfitting is often a result of an excessively simple model in which there aren’t enough variables for a statistical algorithm to capture the underlying patterns in the data. See also overfitting.
Unstructured Text
Free-form text that varies in content and length. Examples include customer comments, survey feedback, social media postings, text messages, and emails. Contrast with categorical variable.
Validation Set
In predictive analytics, the validation set is the portion of the data in your dataset that Einstein Discovery uses to validate the predictions generated by your trained model. See also training set.
Variable
A variable represents a characteristic of the data you’re analyzing. A variable is analogous to a column in a dataset or a field in a Salesforce object. For example, an opportunity has variables—such as the opportunity type, lead source, fiscal year, and expected amount—that describe properties associated with each opportunity. Each variable has one data type (number, text, or date). Einstein Discovery analyzes relationships among two types of variables: outcome variables and explanatory variables. Data scientists sometimes refer to variables as attributes or features.
XGBoost
XGBoost is a decision tree-based, ensemble machine learning algorithm that Einstein Discovery uses to build a model.
Content updated November 2024.