Introduction

Learners make many metacognitive choices while studying, and the choices that they make dictate how much they learn (Finley et al., 2009; Thiede et al., 2003). Broadly, learners select strategies during encoding that benefit later recall, including selecting which items to restudy (Kornell & Metcalfe, 2006; Tullis & Benjamin, 2012), allocating study time (Tullis & Benjamin, 2011), and choosing retrieval practice over restudy (Tullis et al., 2018, but see Karpicke, 2009). Learners may be able to choose effective encoding practices because they have privileged access to their own idiosyncratic mental states during encoding and can therefore base choices on this privileged knowledge that others do not have (Lovelace, 1984; Underwood, 1966). While ample research shows that learners broadly make effective study choices about encoding, less research examines how effectively learners choose or create their retrieval environments (but see Finley et al., 2009).

Although less research has examined control of retrieval, creating cues to support one's memory – one way that learners control their future retrieval environment – is nevertheless a crucial aspect of metacognition. People generate memory cues regularly, including naming computer files so that one remembers their contents, writing “to-do” lists so that one remembers to complete important tasks, and creating mnemonics so that one remembers the names of the Great Lakes for an upcoming test. In fact, 73% of people report using the first letter of to-be-remembered items to create a more memorable structure for information (e.g., ROY G. BIV for the colors in the rainbow) and 57% of people report using rhymes to help them remember (e.g., i before e except after c; Harris, 1980). Students of all ages report creating mnemonics to help remember classroom information (Tullis & Maddox, 2020; Van Etten et al., 1997).

These self-generated memory cues support recall more effectively than cues generated by others across a variety of tasks (Bellezza & Poplawsky, 1974; Jamieson & Schimpf, 1980; Kuo & Hooper, 2004; Saber & Johnson, 2008; for review, see Tullis & Finley, 2018). For instance, in a classic laboratory experiment, participants remembered 91% of words when they were prompted with self-generated cues, but only 55% of targets when prompted by cues generated by others (Mäntylä, 1986). Self-generated cues are typically more effective than cues generated by peers (Tullis & Benjamin, 2015b), produced by experts (Bloom & Lamkin, 2006), or randomly selected from an experimenter-curated list (Finley & Benjamin, 2019). The benefits of self-generation persist over long retention intervals of up to 3 weeks after study (e.g., Bloom & Lamkin, 2006; Mäntylä, 1986; Mäntylä & Nilsson, 1988; but see Kibler & Blick, 1972). Further, self-generated cues have been shown to benefit memory for multiple types of materials, including simple words (Mäntylä, 1986; Mäntylä & Nilsson, 1983), foreign language vocabulary (Atkinson & Raugh, 1975), and complex science concepts (Levin & Levin, 1990; Richmond et al., 2011; Tullis & Qiu, in press). Here, we test competing ideas about why learners' self-generated cues can effectively support memory.

Why are self-generated cues effective?

One simple reason that learners’ self-generated cues benefit memory may be that generating a cue, in and of itself, bolsters memory. Decades of research provides evidence for a generation effect in human memory: When learners generate all or part of a stimulus item, they remember that material better than material they only read (Jacoby, 1978; Hirshman & Bjork, 1988; Slamecka & Graf, 1978). Meta-analysis shows that generation benefits memory across different types of tests, to-be-remembered stimuli, learner populations, and retention intervals (Bertsch et al., 2007). Theories of the generation effect suggest that generation requires more cognitive work than reading, and the extra exertion of mental effort boosts memory (e.g., McFarland Jr. et al., 1980). As applied to learner-generated cues, learners may process to-be-remembered information more deeply when generating cues than when studying others' cues. Learners may make semantic, acoustic, or visual connections between the target information and their own knowledge during cue generation (Fisher & Craik, 1980). Some empirical data hint that generation contributes to the mnemonic benefits of cue generation: Learners spend more time generating their own memory cues than reading cues generated by others (Tullis & Benjamin, 2015a).

Beyond the well-established mnemonic effects of generation, self-generated cues may also confer a metamnemonic benefit. According to this cue-selection hypothesis, self-generated cues may be more effective at supporting memory than cues generated by others in part because learners can choose cues that are tailored to their specific memory needs (Tullis & Benjamin, 2015b). Learners have special access to their own idiosyncratic cognitive states and prior experiences (Lovelace, 1984; Tullis & Fraundorf, 2017; Underwood, 1966), and this private knowledge about their own mental states may allow them to select uniquely effective cues (Kuo & Hooper, 2004; Symons & Johnson, 1997; Tullis & Finley, 2018). Prior research has not been able to untangle the differing contributions of generation and selection because participants only output a single cue for each target; in contrast, the current research requires participants to output multiple potential cues and select which self-generated cue would be most effective.

In the experiments presented here, we examined how – and how effectively – learners generate and select cues for themselves. Our primary interest was in testing three critical predictions of the cue-selection hypothesis. First, the cue-selection hypothesis implies that learner-selected cues should support memory better than unchosen cues, even when controlling for generation effects. We tested this prediction by comparing learners’ memory when given the cues they generated and selected versus memory when given cues they generated but did not select. Honoring learners’ choices from their own self-generated cues should yield better recall than dishonoring those choices. Second, the cue-selection hypothesis suggests that learners consider multiple self-generated cues before selecting an especially effective one. This hypothesis implies that learners choose effective cues rather than the first cues that come to mind. More specifically, learners should benefit from selecting their own cues even when we control for output order of those cues.

Finally, for cue selection (and privileged access to one’s own idiosyncratic mental state) to account for the efficacy of self-generated cues, learners should be particularly adept at selecting cues for their own memory as compared to others’. We tested this by comparing learners’ selections of cues for their own memory to observers’ abilities to select among the same cues. We tested these three critical predictions about cue selection by requiring cue generators to output multiple cues per target (in contrast to all prior research (Tullis & Benjamin, 2015a, 2015b; Tullis & Fraundorf, 2017), which elicited only a single cue per target). To preview, our data support each of these predictions and reliably show that cue selection contributes to the effectiveness of learner-generated cues.

Cue characteristics

While our primary interest was in whether cue selection contributes to the benefit of learner-generated cues, we further examine how learners are able to select better cues for themselves than others. The differential effectiveness of self-selected and other-selected cues may arise because learners may value different cue characteristics when choosing cues for different people.

Past work (Tullis & Benjamin, 2015a, 2015b; Tullis & Fraundorf, 2017) has identified three characteristics of effective cues that generators may value when selecting effective cues: cue-to-target associative strength, idiosyncrasy, and distinctiveness. First, cues with greater cue-to-target associative strength more effectively bring the target to mind than cues with lesser cue-to-target associative strength. Choosing cues that strongly evoke the target items may help learners easily decode their cue during retrieval (Dunlosky et al., 2005).

Second, distinctive cues may enable stronger recall than common cues because they may have fewer extra-list associations (Einstein & McDaniel, 1990) and can limit the range of potential targets. If a cue is distinct (i.e., it is not associated to many potential targets), it narrows the search field during retrieval and improves recall (Hunt & Smith, 1996). For example, beatles may serve as a good cue for the target band for me because beatles does not point to many potential targets. In contrast, school may be a poor cue for band because school points to many possible target words. Distinctiveness may be the single most important attribute in determining whether a cue effectively enables target recall or not (Nairne, 2002). Finally, when generating and selecting their own cues, learners may be able to utilize idiosyncratic cues that rely upon their own episodic experiences. Selecting cues that tie into one's own idiosyncratic past experiences may be mnemonically beneficial, as in the self-reference effect (e.g., Symons & Johnson, 1997), because cues that tie into one's own personal experiences engender organized and elaborate processing.

The cue-selection hypothesis posits that learners can select effective cues for themselves because they appropriately value these cue characteristics. More specifically, when choosing cues for themselves, learners may not place as much value on normative relations and commonality of cues because they can base their cues in their own experiences and idiosyncratic associations. However, when choosing cues for others, learners may select common cues that have strong normative associative relations to the targets to ensure that some association exists for others. We test these ideas across the experiments by comparing characteristics of cues that learners choose for themselves with those that they choose for others.

In sum, across the experiments reported here, learners generated multiple mnemonic cues to help remember a series of target items. Learners then selected which of their generated cues would best support their memory. We compared the effectiveness of their selected cues with those picked randomly from their generated list and with those picked by a different learner. In support of the cue-selection hypothesis, learners picked more effective mnemonic cues than cues selected at random or by other learners.

Experiment 1a

In Experiment 1a, we tested the hypothesis that participants would choose mnemonic cues that support their memories from a list of self-generated options more effectively than random selections would. Participants studied a list of target words and they reported all the potential mnemonic cues that came to mind for each target. Then, participants selected the cue that they believed would most effectively support their memory for each target word. We assessed the effectiveness of these cue selections by comparing performance on a subsequent cued-recall test between honor and dishonor conditions (for other research using honor/dishonor paradigms to test the efficacy of metacognitive control, see Kornell & Metcalfe, 2006; Kimball et al., 2012). In the honor condition, participants were given the cue they had thought would be most effective for supporting memory during the later cued-recall test. In the dishonor condition, participants were given a cue they had generated but did not select as most effective. If learners are adept at selecting cues that particularly support later recall – as predicted by our cue-selection hypothesis – then their memory performance should be greater when given cues that they selected than when given other, unchosen cues. Critically, all cues (even those presented in the dishonor condition) are self-generated, so any differences in cued recall across conditions must be driven by learners' selections from among the potential cues rather than the process of generation itself.

Method

Participants

Our primary interest was the within-subjects comparison between the honor and dishonor conditions. Given a medium effect size (Cohen’s d = 0.5), as suggested by prior studies (Tullis & Benjamin, 2015a), a power analysis suggested that we could obtain 80% power with 34 participants in a paired-samples t-test; our mixed-effects model should, if anything, have even higher power (Quené & van den Bergh, 2004). Thus, 34 undergraduate, introductory educational psychology students participated in order to earn partial course credit.
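The sample-size figure above can be roughly reproduced with a closed-form approximation. The sketch below is not the authors' procedure (the text does not say which software or formula was used); it assumes a two-sided test at alpha = .05 and uses the normal approximation with Guenther's small-sample correction rather than an exact noncentral-t calculation. The function name is illustrative.

```python
import math
from statistics import NormalDist


def paired_t_sample_size(d, alpha=0.05, power=0.80):
    """Approximate N for a two-sided paired-samples t-test.

    Normal approximation plus Guenther's correction (z_alpha^2 / 2).
    This is an approximation, not an exact noncentral-t computation.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value
    z_power = NormalDist().inv_cdf(power)          # quantile for target power
    n = ((z_alpha + z_power) / d) ** 2 + z_alpha ** 2 / 2
    return math.ceil(n)


n_medium = paired_t_sample_size(0.5)  # medium effect, Cohen's d = 0.5
print(n_medium)
```

Under these assumptions the approximation agrees with the 34 participants reported above; larger assumed effects would require fewer participants.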

Materials

Fifty words were collected from the University of South Florida Free Association Norms (Nelson et al., 1998). We chose to-be-remembered words that are likely relevant to a college student's life such that participants could potentially have personal experiences with them. Targets included words like “library,” “hobby,” and “professor”; a full list of materials and data are available on the Open Science Framework at https://osf.io/muhtc/?view_only=36d20866557444fb914c3e26784fafe0.

Procedure

Participants completed the experiment on a desktop computer while up to three other participants participated on other computers in the same room. Participants were instructed to remember a list of target words for an upcoming memory test. Participants were told that they would generate a list of potential cues to help them remember each target. They were instructed that they would get a cue back during the test and would have to recall the corresponding target word. Participants could generate any cue that was a single, English word that is not a form of the target (i.e., a plural or a misspelling of the target). Participants typed in every single cue that came to mind in the order that they came to mind. Participants were required to generate at least two possible cues for each target but were encouraged to provide as many as came to mind. As participants entered each potential cue, the list of generated cues was displayed on the bottom of the screen in the order that they output the cues. When they were finished typing all the possible cues that came to mind, participants were instructed to type "finished." Next, participants had to pick the cue that they believed would be the most helpful during the test by typing in that cue. The target stayed on the screen throughout the cue generation and selection process. After they generated and selected cues for each target, they completed an unrelated motor movement task for approximately 10 min.

Finally, participants took a cued-recall test on the target information. During the cued-recall test, a cue that they generated was displayed on the screen and participants tried to recall the corresponding target word. For a random half of the targets, participants received the cue they had selected. For the other half of the targets, participants' choices were dishonored; they received one of the cues they generated but did not select. The dishonored cue was chosen at random from all the non-selected cues the participant provided.
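The test-phase assignment just described can be sketched as follows. This is an illustrative reconstruction, not the experiment's actual software: the data layout, the function name, the target "campus," and all example cues are hypothetical.

```python
import random


def assign_test_cues(trials, seed=None):
    """trials: dict target -> {"generated": [cues], "selected": cue}.

    Returns dict target -> (condition, test_cue). A random half of the
    targets is honored; each remaining target gets a cue drawn at random
    from the generator's non-selected cues.
    """
    rng = random.Random(seed)
    targets = list(trials)
    rng.shuffle(targets)
    honored = set(targets[: len(targets) // 2])

    plan = {}
    for target, trial in trials.items():
        if target in honored:
            plan[target] = ("honor", trial["selected"])
        else:
            # At least two cues were required per target, so this list
            # is never empty.
            unchosen = [c for c in trial["generated"] if c != trial["selected"]]
            plan[target] = ("dishonor", rng.choice(unchosen))
    return plan


# Hypothetical example data (cues are invented for illustration).
trials = {
    "library": {"generated": ["books", "quiet", "study"], "selected": "quiet"},
    "hobby": {"generated": ["knitting", "fun"], "selected": "knitting"},
    "professor": {"generated": ["lecture", "office", "tenure"], "selected": "office"},
    "campus": {"generated": ["quad", "dorm"], "selected": "quad"},
}
plan = assign_test_cues(trials, seed=1)
```

Because every test cue comes from the participant's own generated list, the two conditions differ only in whether the participant's selection was respected.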

Results

Number of cues output

First, we examined how many potential cues participants output. Participants supplied, on average, 3.81 (SD = 1.36) cues per target item; Fig. 1 depicts the distribution across participants of the average number of cues generated per target.

Fig. 1 Distribution of the average number of cues supplied per target in Experiment 1a

Effectiveness of cue choice

Next, we examined whether participants chose cues effectively by comparing cued recall between honored and dishonored conditions. To test this, we fit a mixed logit model with the accuracy of recall as the dependent variable and the condition as an independent variable. One explanation for any potential benefit of honoring learners’ cue choices within this experiment is simply that requiring learners to output multiple cues prompts them to generate cues they would never consider as possible cues. Honoring learners' choices would then tend to favor earlier – and presumably better – cues. In the dishonor condition, by definition, cues were equally likely to come from any output order position and include ineffective cues output later than effective cues. In fact, as Fig. 2 illustrates, learners do select cues that were generated earlier at greater rates than those output later. To control for the possibility that output order is implicated in cue selection, we added output order and its interaction with condition as additional independent variables to our model.

Fig. 2 Proportion of cues chosen by output position by the cue generator (Experiment 1a) and by an observer (Experiment 1b). Data points are jittered horizontally to display each clearly

Table 1 displays the results of this model. The model revealed a significant effect of condition: The odds of correct recall were 3.77 (95% CI: [2.84, 4.99]) times greater when a participant’s cue choice was honored (M = 0.82) than when it was dishonored (M = 0.57), z = 9.26, p < 0.001. In fact, only one participant showed better memory for dishonored cues than honored cues. Further, this effect cannot be attributed to output order; there was no significant main effect of output order, nor did it significantly interact with condition. Indeed, plotting the proportion of correct cued recall by output position and honor condition (Fig. 3) shows that honored cues outperformed dishonored cues at every output position.
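For readers less familiar with odds ratios, the short sketch below converts the two reported mean proportions (0.82 and 0.57) into a raw odds ratio. The raw value need not match the model-based estimate of 3.77, presumably because the mixed logit model also conditions on output order, its interaction with condition, and random effects; the function names are illustrative.

```python
def odds(p):
    """Convert a proportion into odds."""
    return p / (1 - p)


def odds_ratio(p_a, p_b):
    """Odds of success under condition a relative to condition b."""
    return odds(p_a) / odds(p_b)


# Mean proportions reported for the honor and dishonor conditions.
raw_or = odds_ratio(0.82, 0.57)
print(round(raw_or, 2))
```

The raw ratio comes out somewhat below the model-based estimate, which illustrates why the two should not be treated as interchangeable.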

Table 1 Fixed effect estimates for mixed effects logit model of cued recall accuracy in Experiment 1a as a function of condition
Fig. 3 Proportion of correct cued recall as a function of honor/dishonor condition and cue output position. Error bars show standard errors of the mean; error bars get progressively wider with output position due to the reduced number of cues supplied at those positions

Relation of cue characteristics to recall

Why did honoring participants’ cue choices improve recall? Participants may have chosen cues with characteristics that effectively cued the target. To test this hypothesis, we first examined which cue characteristics were associated with a higher probability of recall (i.e., diagnosticity; van Loon et al., 2014); then, we examined whether those same cue characteristics increased the probability that participants would select those cues (i.e., utilization; van Loon et al., 2014).

As in prior research analyzing cue generation (Tullis & Benjamin, 2015a, 2015b; Tullis & Fraundorf, 2017), we examined three cue characteristics. Normative cue-to-target associative strength was operationalized as the proportion of a sample of participants who freely associate the target with the cue in the South Florida Free Association Norms (Nelson et al., 1998). Cue distinctiveness was determined based on the number of words associated with each cue in the South Florida Free Association Norms; fewer associates indicate a more distinctive cue. Finally, we computed cue commonality as the proportion of participants in this experiment (calculated across all output positions) who supplied the same cue for each target item. Smaller cue commonality indicates that fewer others reported the same cue and the cue is more idiosyncratic.
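The cue commonality measure can be sketched as follows for a single target. The data layout and function name are hypothetical, and the sketch assumes the denominator is all participants, including the one who generated the cue; the text does not specify this detail.

```python
from collections import Counter


def cue_commonality(cues_by_participant):
    """cues_by_participant: dict participant_id -> cues generated for ONE
    target, at any output position.

    Returns dict cue -> proportion of participants who supplied that cue.
    """
    n = len(cues_by_participant)
    counts = Counter()
    for cues in cues_by_participant.values():
        counts.update(set(cues))  # count each participant once per cue
    return {cue: k / n for cue, k in counts.items()}


# Hypothetical cues for the target "library" from four participants.
library_cues = {
    "p1": ["books", "quiet"],
    "p2": ["books", "study"],
    "p3": ["study", "books"],
    "p4": ["fines"],
}
commonality = cue_commonality(library_cues)
```

In this invented example, "books" has high commonality (three of four participants), whereas "fines" is idiosyncratic (one of four).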

Because these cue characteristics are originally measured on different scales (e.g., the number of associated words for cue distinctiveness versus a proportion of participants for cue commonality), we z-scored them so that all of the regression coefficients correspond to the effect of a 1-standard-deviation (SD) change in the cue characteristics (see also Tullis & Fraundorf, 2017). Table 2 displays the means for these cue characteristics (in both the original units and z-scores) for items that the learner eventually remembered and for items that the learner eventually did not remember. However, the bivariate relationships must be interpreted with caution; because these cue characteristics are not orthogonally manipulated, it is possible that the apparent presence or absence of one effect taken alone (e.g., an effect of cue distinctiveness) could instead result from a partial confound with some other cue characteristic (e.g., cue commonality). Rather, the cue characteristics can be better understood in a multiple regression that tests the effect of one cue characteristic while holding others constant.
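A minimal sketch of the standardization step, assuming the sample standard deviation (n - 1 denominator) is used; the text does not say which SD formula was applied, and the example values are invented.

```python
from statistics import mean, stdev


def z_scores(values):
    """Standardize values to mean 0 and (sample) SD 1."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


# Hypothetical numbers of associates for five cues.
n_associates = [3, 8, 12, 20, 7]
z_associates = z_scores(n_associates)
```

After this transformation, a coefficient for any characteristic describes the change in log odds per 1-SD change, which makes coefficients comparable across characteristics measured in different units.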

Table 2 Means and standard deviations in Experiment 1a of cue characteristics for cue-target pairs that were and that were not remembered, measured in original units (top half) or standardized scores (bottom half)

We thus used a mixed logit model, displayed in Table 3, to simultaneously examine the relationship of these three cue characteristics – as well as the control variable of output order – to recall. The model indicated that all three cue characteristics influenced recall (replicating Tullis & Fraundorf, 2017), even when controlling for each other and for output order. Consistent with Nairne (2002), the strongest predictor of cue effectiveness was cue distinctiveness: a 1-SD decrease in the number of words associated with the cue (i.e., a more distinctive cue) increased the odds of correct recall by 1.52 times (95% CI: [1.33, 1.73]). There were also effects of associative strength and cue commonality: A 1-SD increase in associative strength increased the odds of correct recall by 1.31 times (95% CI: [1.09, 1.56]), and a 1-SD increase in cue commonality by 1.45 times (95% CI: [1.23, 1.71]). The order in which a cue was originally output was not significantly associated with its ability to later cue recall.
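The odds ratios and intervals reported here follow from exponentiating logit coefficients. The sketch below assumes Wald-type 95% intervals (estimate ± 1.96 standard errors on the log-odds scale), which the text does not confirm. The coefficient and standard error in the example are back-calculated from the reported distinctiveness effect (1.52, 95% CI [1.33, 1.73]) and are not taken from Table 3.

```python
import math


def odds_ratio_ci(beta, se, z=1.96):
    """Return (odds ratio, lower bound, upper bound) for a logit coefficient."""
    return (
        math.exp(beta),
        math.exp(beta - z * se),
        math.exp(beta + z * se),
    )


# Back-calculated (hypothetical) log-odds estimate and standard error.
beta = math.log(1.52)
se = (math.log(1.73) - math.log(1.33)) / (2 * 1.96)

or_est, ci_low, ci_high = odds_ratio_ci(beta, se)
print(round(or_est, 2), round(ci_low, 2), round(ci_high, 2))
```

An effect stated as decreasing the odds "by k times" corresponds to an odds ratio of 1/k on this scale.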

Table 3 Fixed effect estimates for mixed effects logit model of cued recall accuracy in Experiment 1a as a function of cue characteristics

Relation of cue characteristics to selection

Did learners select certain cues because they had the desirable characteristics of being distinctive, common, and strongly associated with the targets? We next examined utilization: which cue characteristics predicted whether participants would select a cue as one they wanted to cue their later memory. Table 4 displays the average characteristics of cues selected for later use versus cues not selected for later use (regardless of whether or not that selection was ultimately honored, since participants had no idea at the time of choice that some selections would not be honored).

Table 4 Means and standard deviations in Experiment 1a of cue characteristics for cues that were and that were not selected measured in original units (top half) or standardized scores (bottom half)

Table 5 displays the results of a mixed-effects model of the (log) odds that a cue would be selected for later use as a function of the cue characteristics. Participants’ selections were sensitive to all three of the cue characteristics that predicted recall: a 1-SD increase in associative strength increased the odds of cue selection by 1.27 times (95% CI: [1.17, 1.38]), a 1-SD increase in cue commonality by 1.47 times (95% CI: [1.33, 1.63]), and a 1-SD decrease in the number of associates (i.e., a more distinctive cue) increased the odds of selection by 1.44 times (95% CI: [1.33, 1.57]). Further, the magnitudes of these influences on cue selection were roughly similar to their influences on actual recall.

Table 5 Fixed effect estimates for mixed effects logit model of cue choice in Experiment 1a as a function of cue characteristics

In addition, however, participants’ selections were influenced by one feature that had no bearing on actual recall: output order. A 1-SD increase in output order – that is, a cue output about 1.85 serial positions later – decreased the odds that a cue would be selected by 1.89 times (95% CI: [1.73, 2.08]). In fact, this effect was of larger magnitude than any other predictor, even though output order was not significantly predictive of actual recall.

Discussion

Honoring participants' choices about which self-generated cues to present at test produced better memory than dishonoring their choices. Given the general mnemonic benefits of generation (Slamecka & Graf, 1978), the mere act of generating mnemonic cues could be expected to benefit memory through deeper processing of the targets and/or of the relationships between cues and targets (Fisher & Craik, 1980). Although generation may indeed help recall, Experiment 1 shows that specific self-generated cues have mnemonic benefits over and above generation. Both the Honor and Dishonor conditions involved a cue-generation process, but getting the particular cues requested in the Honor condition resulted in superior memory. Thus, Experiment 1 implies that learners can select self-generated memory cues that are likely to be especially effective. Requiring participants to output multiple potential mnemonic cues per target allows us to examine the processes of cue generation and selection separately, which no prior research has done. The results suggest that cue selection contributes to the mnemonic benefits of self-generated cues. The benefit of self-selected cues over non-selected cues is obtained across all output positions.

Further, learners selected the first cue they output as their chosen mnemonic cue on fewer than half of the trials; the percentage of times that learners selected the first cue ranged across participants from 20% to 72%, with a grand mean of 44%. These results indicate that learners do not exclusively rely on the first mnemonic cue that comes to mind to serve as their final cue but rather deliberate over multiple cues before selecting one.

Why did honoring participants’ cue selections improve memory? One reason is likely that these selections were attuned to several properties that made cues effective at cuing the target. Cue-target pairs were more apt to be recalled when the cue words were distinctive, commonly generated, and strongly associated with the target, and participants’ cue selections were appropriately sensitive to all three of these properties. Participants also strongly favored cues that they output earlier in the generation process, even though output order had no actual relation to the usefulness of a cue; we revisit this pattern in the General discussion.

So far, we have demonstrated that learners can choose cues that support their memory more effectively than the average cue. But for cue selection to contribute to the benefits of self-generated memory cues over other-generated cues, the cues that learners select for themselves must be better than the cues that others would select for them.

Experiment 1b

In Experiment 1b, we tested an additional prediction of our cue-selection hypothesis: People should be better at selecting memory cues for themselves than for other learners. Thus, we examined how – and how effectively – people would choose cues for another, different learner. New participants (observers) saw the list of potential cues that a yoked participant generated in Experiment 1a and tried to select the most effective cues for the cue generator – first by selecting among all of the generated cues, and then by choosing between just the learners’ preferred cue and its control from the dishonor condition. In Experiment 1a, honored cues more effectively supported recall than dishonored cues. Thus, if observers select the dishonored cues, that would be evidence that people are better at selecting a cue for themselves than for others.

Further, to the extent that observers and generators differ in the efficacy of their cue selections, we can ask whether these differences in preferences can be accounted for by the characteristics of the cues they select. In Experiment 1a, we found that cue generators favored distinctive, common, and strongly associated cues; are observers sensitive to these properties?

Method

Participants

As in Experiment 1a, 34 participants completed the experiment for partial course credit in introductory educational psychology classes.

Procedure

Each participant in this experiment was yoked to a single participant in Experiment 1a. These participants were instructed about the details of the prior experiment. We told them that prior participants had generated several potential cues to help support their memories for target words on a later cued-recall test. The new participants were instructed to choose the most helpful cue for the prior participant. This choice was made in two steps that were designed for comparison with the choices made by the original generators. First, participants saw the target word displayed at the top of the screen. All of the potential cue words that a specific prior participant generated were displayed at the bottom of the screen in the order in which generators output them. Participants selected the cue word that they believed would be most beneficial for the prior participant. This choice emulated the selection by the generators among all of their cues.

If the prior participant generated more than two possible cues, the participant completed an additional step for the target. Namely, after they selected one out of all of the possible cues, the possible cues were narrowed down to two possible cues: the one that the prior participant chose (the one displayed in the honor condition) and a random non-selected cue (the one that would have been used in the dishonor condition). Participants then selected which cue out of these two would be more effective. This step was included to provide a more direct analogue to the experimental contrast between the honor and dishonor conditions in Experiment 1. Further, reducing the number of possible cue options should simplify the choice and, therefore, provide a stronger test of the cue-selection hypothesis. For example, if observers fail to choose effectively even when presented with only two options, their poor choices cannot be attributed to choice overload (e.g., Lee & Lee, 2004).

Results

Similarity of cue choice

We first calculated how frequently the yoked participant in Experiment 1b picked the same cue as the generators in Experiment 1a. Observers chose the same cue as the generators for 46% (SD = 10%) of the targets. We compared this observed level of agreement to the proportion of agreement that would be obtained by choosing randomly out of the list of potential cues (M = 32%, SD = 9%). Observers chose the same cue as generators more frequently than random selections, t(33) = 8.78, p < 0.001, d = 1.52. In fact, only two of 34 participants chose the generator-selected cues less frequently than random selections. The distribution of observers' choices by output position is displayed in Fig. 2.
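One way to compute the random-selection baseline is to average the per-target chance of matching the generator's choice, 1/k for a target with k generated cues. This is an assumption about how the baseline was derived, not a description of the authors' computation, and the example counts are invented.

```python
def chance_agreement(cues_per_target):
    """Expected agreement if the observer picks uniformly at random.

    cues_per_target: number of cues the generator supplied for each target.
    """
    return sum(1 / k for k in cues_per_target) / len(cues_per_target)


# Hypothetical numbers of generated cues for five targets.
example_counts = [2, 3, 4, 5, 3]
baseline = chance_agreement(example_counts)
print(round(baseline, 3))
```

Under this assumption the baseline is the mean of 1/k rather than 1 divided by the mean number of cues, so it can exceed 1/3.81 (about 26%) even when the average number of cues per target is 3.81.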

We similarly compared how often the observer selected the honor choice between the two possibilities of the honor cue and the dishonor cue. Observers selected the honor choice for 61% (SD = 9%) of the trials, which is significantly more frequently than chance (50%; t(33) = 7.06, p < 0.001, d = 1.21), but also significantly less than perfect agreement (100%; t(33) = 25.38, p < 0.001, d = 4.35).
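The effect sizes in this comparison are consistent with the standard one-sample conversion d = t / sqrt(n), with n = 34 (df + 1). The text does not state how d was computed, so the sketch below is a consistency check under that assumption rather than a reproduction of the analysis.

```python
import math


def cohens_d_from_t(t, n):
    """One-sample (or paired) Cohen's d from a t statistic and sample size."""
    return t / math.sqrt(n)


d_vs_chance = cohens_d_from_t(7.06, 34)    # honor choice vs. 50% chance
d_vs_perfect = cohens_d_from_t(25.38, 34)  # honor choice vs. 100% agreement
print(round(d_vs_chance, 2), round(d_vs_perfect, 2))
```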

Relation of cue characteristics to selection

The above analyses indicated that observers showed some, though not perfect, agreement with generators’ cue choices. Did this level of agreement obtain because of the degree to which observers did – or did not – base their selections on the same cue characteristics that generators did? Table 6 displays the average characteristics of cues selected by the observer versus those not selected.

Table 6 Means and standard deviations in Experiment 1b of cue characteristics for cues that were and that were not selected measured in original units (top half) or standardized scores (bottom half)

Table 7 displays the results of a mixed-effects model of the (log) odds that an observer would select a cue as a function of that cue’s characteristics. Observers were sensitive to many of the same characteristics as generators: a 1-SD increase in associative strength increased the odds of cue selection by 1.12 times (95% CI: [1.04, 1.20]), a 1-SD increase in cue commonality by 1.81 times (95% CI: [1.65, 1.98]), and a 1-SD increase in output order decreased the odds of selection by 2.24 times (95% CI: [2.03, 2.47]). Notably, however, and unlike for cue generators, there was no significant effect of cue distinctiveness; in fact, observers numerically favored cues that were less distinctive (i.e., had more associates).

Table 7 Fixed effect estimates for mixed effects logit model of cue choice in Experiment 1b as a function of cue characteristics
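To make the odds multipliers reported above more concrete, the following sketch converts an odds ratio into a change in selection probability. The 50% baseline probability is purely hypothetical and chosen for illustration; only the odds ratios (1.81 for cue commonality, and 1/2.24 for output order, since that effect is a decrease) come from the text.

```python
def apply_odds_ratio(p_baseline, odds_ratio):
    """Return the probability obtained after multiplying the baseline odds by odds_ratio."""
    baseline_odds = p_baseline / (1 - p_baseline)
    new_odds = baseline_odds * odds_ratio
    return new_odds / (1 + new_odds)

# Hypothetical baseline: a cue with a 50% chance of being selected
p_more_common = apply_odds_ratio(0.50, 1.81)       # +1 SD cue commonality
p_later_output = apply_odds_ratio(0.50, 1 / 2.24)  # +1 SD output order
print(round(p_more_common, 3), round(p_later_output, 3))
```

The same odds ratio implies a different change in probability at a different baseline, which is one reason effects in logit models are reported on the odds scale.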

Next, we directly compared whether generators in Experiment 1a based their selections on different cue characteristics than observers in Experiment 1b. Using only the trials where the generator and observer chose different cues, we determined which characteristics predicted a cue being selected by the generator compared to the yoked observer. Table 8 displays the results of this model, which explicitly tests the differences between Table 5 and Table 7. Generators and observers differed in their use of each of the cue characteristics. More specifically, generators’ selections were more sensitive to cue-to-target associative strength, were more sensitive to the number of associates, were less sensitive to cue commonality, and were more sensitive to output order than observers’ selections were.

Table 8 Fixed effect estimates for mixed effects logit model of self-selected cues relative to observer-selected cues across Experiments 1a and 1b as a function of cue characteristics

Discussion

In Experiment 1b, we tested whether participants could select cues for another learner as effectively as the learners who had generated those cues for themselves. Observers showed an intermediate degree of agreement with the generators’ choices; they selected the generator’s choices more frequently than chance, but they showed significantly less than perfect alignment with the generator. This held true both for the choice among all the generators’ generated cues as well as a choice specifically between the generator’s chosen cue and its dishonor control. This latter comparison is important because it implies that observers not only chose different cues than the original learners, but – given that the honor cues were on average better than the dishonor cues – they also chose less-effective cues (a point we test further in Experiment 2).

Why could people better select cues for themselves than for others? Analysis of the cue characteristics that predicted observers’ cue selections suggests that observers utilized cue characteristics differently than generators did when selecting cues for themselves. Generators relied upon cue-to-target associative strength, cue distinctiveness, and output order more than observers, while observers relied more upon cue commonality. Greater reliance on cue commonality may indicate that observers do not appreciate the value of idiosyncratic cues (which could tie into strong personal experiences). Critically, while generators valued distinctive cues, observers picked cues that pointed to more potential targets. This failure to capitalize on distinctiveness likely damages observers’ cue selections given that our analysis of cue diagnosticity indicated that cue distinctiveness was the feature most strongly predictive of recall (consistent with past work; Nairne, 2002).

Experiment 2

Comparing the data across Experiments 1a and 1b, we can infer that cue selections by others are less effective than cue selections by oneself. Yet, we could not directly compare the efficacy of cues selected by oneself and others because we could not give observer-selected cues to the cue generator, who had already completed the experiment days or weeks earlier. In this final experiment, learners simultaneously participated in dyads. The cues present at the final cued-recall test included a generator’s selections, the paired observer’s selections, and dishonored selections. The addition of observer-selected cues in the recall test allows us to directly compare how effectively cues selected by oneself support retrieval compared to cues selected by others.

Method

Participants

As in both prior experiments, 34 participants completed the experiment for partial course credit in introductory educational psychology classes.

Procedure

Participants signed up to complete the experiment through a participant-management system online. If two participants showed up for the same session, they completed this particular experiment. Otherwise, they completed an unrelated experiment on their own. Participants were given the same instructions as in Experiment 1 about creating cues to support their memory retrieval later. Three changes were made from Experiment 1. First, participants had to supply at least three cues for each target. Second, two separate lists of 45 different targets were used. Each list had 25 targets from Experiment 1 and 20 additional new targets. The two lists did not share any target items. Two lists were needed because a distinct list was given to each participant in a pair. The third change was that participants chose cues for their partner after they finished generating cues for their list of 45 targets. This cue selection phase mirrored that from Experiment 1b closely. Participants were told that they would see the list of potential cues that another participant had generated and were asked to select the most effective cue for the other participant. Participants were not told that their partner participant was also present in the lab. Participants saw the target with the list of the cues that their partner generated in the order that their partner generated them and typed in their selection. Whenever one participant in a pair finished a phase before their partner, they played a computer puzzle game in the NetLogo programming library called Planarity, in which participants had to untangle a digital knot by moving connected nodes.

After both partners finished selecting cues for each other, their memories for the targets were tested. Targets were randomly (but equally) assigned within-subjects to three conditions: self-selected cue, observer-selected cue, or dishonored cue. In the self-selected cue condition, participants received the cue that they selected. In the observer-selected cue condition, participants received the cue that their partner selected for them. Finally, in the dishonor condition, participants received a random cue from the list that they generated, as long as it was not the cue that the participants selected for themselves.
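The assignment of targets to test conditions described above can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the experiment's actual software: the function names, the placeholder target labels, and the fixed random seed are all hypothetical.

```python
import random

CONDITIONS = ("self", "observer", "dishonor")

def assign_conditions(targets, rng):
    """Shuffle the targets, then deal them evenly across the three cue conditions."""
    shuffled = list(targets)
    rng.shuffle(shuffled)
    return {target: CONDITIONS[i % len(CONDITIONS)] for i, target in enumerate(shuffled)}

def cue_for_test(condition, generated_cues, self_choice, observer_choice, rng):
    """Pick the cue shown at test for one target."""
    if condition == "self":
        return self_choice
    if condition == "observer":
        return observer_choice
    # Dishonor: any generated cue except the one the generator chose for themselves
    alternatives = [cue for cue in generated_cues if cue != self_choice]
    return rng.choice(alternatives)

rng = random.Random(0)  # fixed seed only so that the sketch is reproducible
targets = [f"target_{i}" for i in range(45)]  # placeholder labels, not the real stimuli
assignment = assign_conditions(targets, rng)
```

With 45 targets, this yields 15 targets per condition. Because participants supplied at least three cues per target, the list of dishonor alternatives is never empty. As in the description above, the dishonored cue is only required to differ from the self-selected cue.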

Results

Similarity of cue choice

We first calculated the overlap between generators’ and observers’ cue selections (see Fig. 4). Regardless of output order, observers selected the same cues as the generators (M = .49, SD = .11) more than would be expected if they were randomly choosing one of the supplied cues (M = .28, SD = .05), t(33) = 11.37, p < .001, d = 1.98, indicating significant overlap between generators’ and observers’ selections. Only one of 34 participants chose the generator-selected cues less frequently than random selections.

Fig. 4

Proportion of cues chosen by output position by the cue generator and by the observer. Data points are jittered horizontally to display each clearly

Effectiveness of cue choice

Next, we examined how recall differed between self-selected cues, observer-selected cues, and random cues. The average recall by output position and condition is shown in Fig. 5. To compare these three types of trials in our mixed logit model, we used two orthogonal contrasts. The first compared the self- and observer-selected cues to random cues to assess the benefits of human-selected cues relative to random ones. The second contrast specifically compared self-selected to observer-selected cues to determine whether participants could more effectively select a cue for themselves than for another person.

Fig. 5

Proportion of correct cued recall as a function of cue condition and cue output position in Experiment 2. Error bars show standard errors of the mean; error bars get progressively wider with output position due to the reduced number of cues supplied at those positions
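The two orthogonal contrasts used to compare the three cue conditions can be illustrated with a short sketch. The specific weights below are an assumption (the exact coding is not stated here), and applying them to raw recall proportions is only for illustration, because the model itself operates on log odds.

```python
from fractions import Fraction as F

# Condition order: self-selected, observer-selected, random (dishonored)
selected_vs_random = [F(1, 2), F(1, 2), F(-1)]  # human-selected cues vs. random cues
self_vs_observer = [F(1), F(-1), F(0)]          # self-selected vs. observer-selected

def dot(u, v):
    """Sum of element-wise products of two equal-length weight vectors."""
    return sum(a * b for a, b in zip(u, v))

# Two contrasts are orthogonal when their weights have a dot product of zero
orthogonality_check = dot(selected_vs_random, self_vs_observer)

# Condition means reported for Experiment 2 (82%, 76%, 56%)
recall = [F(82, 100), F(76, 100), F(56, 100)]
selected_advantage = dot(selected_vs_random, recall)
self_advantage = dot(self_vs_observer, recall)
print(orthogonality_check, float(selected_advantage), float(self_advantage))
```

Orthogonal coding lets each contrast address a separate question: whether selection by anyone helps, and whether selecting for oneself helps beyond that.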

Table 9 displays the results of this model (see Footnote 3). Both contrasts were significant: When given a cue that someone had selected, the odds of correct recall (M = 79%) were 3.41 (95% CI: [2.43, 4.79]) times greater than when given a random cue (M = 56%), z = 7.08, p < 0.001. There was also a smaller, but significant, benefit of the self-selected cues (M = 82%), with the odds of correct recall being 1.55 (95% CI: [1.11, 2.17]) times greater than with the observer-selected cues (M = 76%), z = 2.56, p = .01.

Table 9 Fixed effect estimates for mixed effects logit model of cued recall accuracy in Experiment 2 as a function of condition
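The reported odds ratios, confidence intervals, and z statistics can be cross-checked against one another. The sketch below assumes the intervals are Wald intervals that are symmetric on the log-odds scale (estimate ± 1.96 standard errors); the function name is hypothetical, and small discrepancies from the reported z values reflect rounding of the published figures.

```python
import math

def wald_z_from_odds_ratio(odds_ratio, ci_low, ci_high, z_crit=1.96):
    """Recover the log-odds estimate, its standard error, and the Wald z
    from an odds ratio and its 95% confidence interval."""
    beta = math.log(odds_ratio)
    standard_error = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    return beta, standard_error, beta / standard_error

# Selected (self or observer) vs. random cues
_, _, z_selected = wald_z_from_odds_ratio(3.41, 2.43, 4.79)
# Self-selected vs. observer-selected cues
_, _, z_self = wald_z_from_odds_ratio(1.55, 1.11, 2.17)
print(round(z_selected, 2), round(z_self, 2))
```

Both recovered values fall within rounding distance of the reported z = 7.08 and z = 2.56.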

Relation of cue characteristics to recall

Thus far, we have seen that (a) cues selected by either the generator or the observer were substantially better than random cues, and (b) self-selected cues were more effective than observer-selected cues. Can the characteristics of the selected cues explain both of these effects?

We first examined cue diagnosticity by examining which cue characteristics predicted recall. Table 10 displays the mean cue characteristics for cue-target pairs that were recalled versus those that were not, and Table 11 shows the results of the mixed logit regression. The results generally mirrored those in Experiment 1a: All three cue characteristics positively predicted recall even controlling for output order, with the strongest effect being that of cue distinctiveness. However, one difference was that, in Experiment 2, output order also predicted performance above and beyond the three beneficial cue characteristics, with the odds of recall declining by 1.19 times (95% CI: [1.05, 1.36]) for each subsequent serial position at which the cue was output. This suggests that later-generated cues may unfavorably differ in other properties, beyond distinctiveness, associative strength, and commonality, that affect recall.

Table 10 Means and standard deviations in Experiment 2 of cue characteristics for cue-target pairs that were and that were not remembered, measured in original units (top half) or standardized scores (bottom half)
Table 11 Fixed effect estimates for mixed effects logit model of cued recall accuracy in Experiment 2 as a function of cue characteristics

Relation of cue characteristics to selection

Can the differential effectiveness of self-selected, observer-selected, and random cues be explained by the characteristics of the cues? For Experiment 2, we examined cue utilization both for self-selected cues (left panel of Table 12) and observer-selected cues (right panel of Table 12).

Table 12 Means and standard deviations in Experiment 2 of cue characteristics for cues that were and that were not selected measured in original units (top half) or standardized scores (bottom half)

We ran two logit mixed models to predict cue choice as a function of cue characteristics: One predicted learners’ choices for themselves and one predicted their partners’ choices. Table 13 displays the results of these models. Both learners’ choices and partners’ choices were sensitive to all three of the beneficial cue characteristics reviewed above: cue-to-target associative strength, cue distinctiveness (i.e., a low number of associates; see Footnote 4), and cue commonality. In addition, both choices were predicted by output order; this relationship is rational given that output order did predict recall in Experiment 2. Overall, then, these results suggest that learners and partners were sensitive to cue characteristics predictive of recall, which may explain at least in part why both self-selected and partner-selected cues were better than random cues.

Table 13 Fixed effect estimates for mixed effects logit model of cue choice in Experiment 2 as a function of cue characteristics for self-selected cues (left panel) and observer-selected cues (right panel)

Next, we examined the differences between self-chosen and observer-chosen cues. Similar to our analysis of Experiment 1b, we took the 785 trials where the learner and partner chose different cues and used another mixed logit model to determine which features were predictive of a cue being chosen by the learner as opposed to the observer. Table 14 displays the results of this model. The only cue characteristic that reliably differed between self- and partner-chosen cues was cue commonality: An observer’s choice of cue was more influenced by cue commonality than was the learner’s own choice, replicating the difference observed in Experiment 1. (Learners were also numerically more sensitive to cue distinctiveness, as in Experiment 1, but this effect did not reach conventional levels of significance in Experiment 2.) Overall, this analysis suggests that the benefits of self-chosen cues did not arise from greater sensitivity to normatively desirable cue characteristics because observer-chosen cues were not significantly less sensitive to these characteristics (and, indeed, were more sensitive to cue commonality). Instead, the comparatively small difference between self-chosen and partner-chosen cues may reflect learners capitalizing on more idiosyncratic features that partners would not know about.

Table 14 Fixed effect estimates for mixed effects logit model of self-selected cues relative to observer-selected cues in Experiment 2 as a function of cue characteristics

Discussion

Experiment 2 provided a particularly strong test of differences in the mnemonic cues selected for oneself versus others because observers experienced the process of cue generation and selection themselves before selecting cues for others. This design isolates the effect of selecting cues for oneself while holding constant experience with generating and selecting cues.

This experiment replicated the finding from Experiment 1 that self-selected mnemonic cues supported cued recall better than random cues (i.e., dishonored selections), and it provided direct evidence that cues that are both self-generated and self-selected are better than cues generated by oneself but selected by an observer. Generators may value idiosyncratic cues because they understand their unique knowledge and processing, while observers have no knowledge of the generator’s knowledge. Additionally, while observers in Experiment 1b did not value the distinctiveness of cues, observers in Experiment 2 showed some appreciation of distinctiveness. This may be because observers in Experiment 2 had previously generated and selected cues for themselves. Generating and selecting cues for oneself may highlight the need for distinctive cues and cause learners to see the importance of this cue characteristic for themselves and others.

General discussion

Across two experiments, we tested whether the ability to select effective cues contributes to the benefits of creating one’s own mnemonic cues. Requiring participants to output multiple potential mnemonic cues per target distinguishes the impact of cue selection from cue generation on later cued recall. Our research is the first to analyze the generation and selection of mnemonic cues separately and shows that the process of selecting cues contributes to the benefits of self-generated cues. While not excluding potential direct benefits of generation to the strength of memory traces, our results suggest that the benefits of self-generated cues are at least in part metamnemonic. When considering mnemonic cues, learners typically output more than two possible choices and selected the first cue that came to mind only half of the time. This suggests that cue generation often involves a process of selection among multiple potential cues. Further, subsequent cued recall was consistently greater when learners' cue selections were honored than dishonored, suggesting that learners could indeed identify and select particularly effective memory cues, even among a list of cues that they generated. Finally, generators selected more effective memory cues for themselves than observers could, indicating that this ability was at least partially specific to selecting cues for one’s own memory.

These differences in cue effectiveness can be explained at least partially by systematic differences in the kinds of cues that people chose. The cue selection criteria used by generators and observers somewhat overlap but differ in important ways. Specifically, generators and observers both appreciated normative cue-to-target associative strength, but generators valued distinctiveness more than observers and observers valued cue commonality more than generators. Our data also suggest that the cue characteristics carry import even when controlling for the impact of choice on recall.

An alternative explanation of our results is that the act of selecting cues itself facilitated memory for those items; recent research suggests that selected items are more memorable than unselected items (Coverdale & Nairne, 2019). However, in the Appendix, we report a supplementary analysis examining whether recall accuracy could be predicted both from our existing cue characteristics and from a variable capturing whether the cue was selected or not. The addition of the selection variable did not eliminate the impact of cue characteristics on recall. In other words, cue characteristics predicted recall, even when accounting for the beneficial effects of choice on recall.

Perspective-taking in memory

One reason that people were more effective in selecting cues for their own memory is that cue generators valued the distinctiveness of a cue more than observers. Observers' failure to value cues' distinctiveness as much as generators do echoes prior research (Tullis & Fraundorf, 2017) and may reveal a limitation of our ability to take the perspective of others (Keysar et al., 2003). The current experiments suggest that, even when distinctive cues that are generated by others are available for selections, observers fail to value distinctiveness of cues as much as generators. Although critical to retrieval (Nairne, 2002), distinctiveness may be a particularly difficult cue to utilize when selecting cues because its benefits may be only apparent during retrieval (i.e., when the target is absent). During encoding and selection, learners see both the cue and target, which can allow learners to easily notice and judge the cue-to-target associative strength. However, the presence of the target during cue generation may obscure considerations for how much the cue narrows down the field of potential targets during retrieval. Generators, then, may value this characteristic more than observers because generators see the target without cues, while observers never study the target without the presence of the cue. The underappreciation of distinctiveness may suggest that observers have a poor model of the needs of memory. But, experience with the task may alleviate some of this perspective-taking burden. When observers experienced the task themselves in Experiment 2, their cue selections became somewhat sensitive to distinctiveness.

By comparison, observers value cue commonality more than generators, and the difference in the value of cue commonality likely reveals an additional challenge to taking perspective. Observers likely chose more common cues than generators because they did not recognize how unique cues would fit with someone else’s experiences. Even though observers were shown all the possible cues for the generator, including idiosyncratic cues, they favored more generic common cues – and favored generic cues to a greater extent than generators did. Differences in the value placed upon these cue characteristics by generators and observers reveal a limitation in our ability to effectively take the perspective of others.

Output order

Beyond the beneficial effects of associative strength, cue distinctiveness, and cue commonality, participants’ cue selections were associated with at least one other variable: output order. In both experiments, both generators and observers tended to favor earlier-output cues, even when controlling for the other cue characteristics. In at least some cases, this preference may be rational: In Experiment 2, earlier output order positively predicted recall over and above the cue characteristics. Although it is possible that this reflects a causal effect of output order itself on recall, it is perhaps more plausible that this reflects a correlation between output order and some other, unmeasured cue characteristic – for example, imageability or frequency – that benefits recall. In other cases, namely Experiment 1, participants preferred earlier cues even when output order was fully uncorrelated with recall (see also Tullis & Fraundorf, 2017). This tendency may thus reflect the broader trend for order to influence or bias decision-making, including in the domain of self-regulated learning (e.g., Ariel et al., 2011).

Beliefs versus experience

Although we found that both generators’ and observers’ choices were sensitive to a broad palette of cue characteristics – some more than others – it is presently unclear whether these choices represent a form of explicit or implicit knowledge. Prior work (e.g., Fraundorf & Benjamin, 2014; Kelley & Jacoby, 1996; Koriat et al., 2004) has established that metacognitive judgments are sometimes made based on learners’ explicit, verbalizable beliefs but sometimes based on their in-the-moment experience with individual memoranda, which do not necessarily align. For example, learners may have favored distinctive cues because of a naïve theory they held about what constitutes a good memory cue. But they could have also selected these cues for other reasons: they resembled cues they have used in the past, for instance, or simply “felt right.” One way for future work to probe learners’ conscious beliefs about effective cues is to adopt a method in which participants read a description of the experiment and judge performance without personally experiencing any individual stimulus (e.g., Koriat et al., 2004; Kornell et al., 2011).

Control of the retrieval environment

Cue generation addresses an important gap in our understanding of metacognition: how learners control their retrieval environments. Learners control their study and encoding by, for instance, choosing study strategies or allocating the amount of time spent on different material (see Finley et al., 2009, for a review). Learners may also regulate their later retrieval, such as by controlling what cues will be available to them (as in the present study) or by adjusting their retrieval strategies (e.g., Fraundorf & Benjamin, 2016; Finley & Benjamin, 2021). However, effectively anticipating and controlling available retrieval routes (i.e., retrieval environments) during encoding may be an especially challenging form of self-regulated learning because learners must take the perspective of their selves in the future (e.g., Kornell & Bjork, 2009). Recent experiments have yielded conflicting results about how effectively learners exercise control over their later retrieval contexts, with some research showing that learners effectively choose test questions that match how they encoded the information (Finley et al., 2012), but other work showing a lack of such appreciation (Finley & Benjamin, 2019).

In the context of cue generation, learners must predict whether the cue they generate will lead them to the appropriate target during later retrieval (i.e., whether they will be able to decode their mediator; Pyc & Rawson, 2010). Retrieval depends upon the overlap between encoding and retrieval contexts, so successful mnemonic cues generated during encoding must partially match the learner's cognitive state at the time of retrieval (e.g., Raaijmakers & Shiffrin, 1980; Ryskin et al., 2015). Mental states naturally shift over time due to new experiences and development (e.g., Estes, 1955). If learners cannot accurately anticipate these changes when generating mnemonic cues, the effectiveness of cues may suffer. Nevertheless, despite natural shifts in our mental contexts (e.g., Kornell & Bjork, 2009), the present results suggest that generators can select cues that are more effective for themselves than others can. This provides even more evidence of the importance of allowing learners to exercise some control over their own learning (Tullis & Benjamin, 2011).

Self-generated cues allow learners to remember information efficiently and effectively. Nevertheless, while self-generated cues are more effective than other-generated cues, they do not yield perfect recall. Self-generated memory cues can fail in two ways: A learner can fail to retrieve the cue (i.e., retrieval deficiency) or the learner can forget how to interpret the cue (i.e., decoding deficiency; Dunlosky et al., 2005; Tullis & Qiu, in press). In our experiments, learners were provided with their mnemonic cue during recall. Therefore, we cannot assess how cue generation impacts learners' ability to retrieve the cue, which has sometimes been shown to limit how much learners can recall (Dunlosky et al., 2005). Future studies can examine how generating cues impacts both retrieval deficiency and cue decodability.

Applications

People generate cues to help them remember information when they take notes in meetings, write to-do lists, name computer files, learn prescription schedules, and encode new acquaintances’ names (Tullis & Finley, 2018). Additionally, students consistently report generating mnemonics to support their learning of new information across a variety of content domains (e.g., Tullis & Maddox, 2020). Successful memory across these diverse tasks may depend upon the creation and selection of effective mnemonic cues. Our results reinforce prior results showing that learners can appropriately generate cues to help their memories (e.g., Tullis & Fraundorf, 2017) and broaden the growing metacognitive literature that suggests that learners can effectively control their own learning, including allocating study time (Tullis & Benjamin, 2011), scheduling study (Benjamin & Bird, 2006), and choosing study strategies (Tullis et al., 2018).

Critically, our current results suggest that people can choose effective cues over ineffective cues, even among those that they generate themselves, and that the benefits of self-generated cues indeed come largely from appropriate selections among multiple cue candidates (rather than from the process of generation itself). Allowing learners to select a specific memory cue for themselves among a list of provided options may be less cognitively demanding than generating one’s own (and require less time) and may not sacrifice mnemonic efficacy. Consequently, teachers or textbooks could provide multiple mnemonics and students could individually choose cues that are effective for their own personal memories. Further, by providing for the selection of personal mnemonic cues among potential cue candidates, we may be able to support memory among low-performing students, who spontaneously generate fewer mnemonic cues than others (Scruggs et al., 1986), students with learning disabilities (Mastropieri & Scruggs, 2000), and elderly adults, who may struggle to generate their own cues (Verhaeghen et al., 1992). Understanding the processes, strengths, and weaknesses of cue generation may ultimately allow us to develop targeted interventions to help people utilize better cues so that they can more effectively remember and apply information.