scorer_detect {vitals} | R Documentation |
Scoring with string detection
Description
The following functions use string pattern detection to score model outputs.
- detect_includes(): Determine whether the target from the sample appears anywhere inside the model output. Can be case sensitive or insensitive (defaults to the latter).
- detect_match(): Determine whether the target from the sample appears at the beginning or end of the model output (defaults to looking at the end). Has options for ignoring case, white-space, and punctuation (all are ignored by default).
- detect_pattern(): Extract matches of a pattern from the model response and determine whether those matches also appear in target.
- detect_answer(): Scorer for model output that precedes answers with "ANSWER: ". Can extract letters, words, or the remainder of the line.
- detect_exact(): Scorer that normalizes the text of the answer and target(s) and performs an exact-match comparison. Returns CORRECT when the answer exactly matches one or more targets.
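The difference between "includes", "match", and "exact" can be illustrated with a few lines of base R. This is only a minimal sketch, not the package's implementation: the helper names (`normalise()`, `sketch_includes()`, `sketch_match()`, `sketch_exact()`) are hypothetical, and punctuation stripping is left out for brevity.

```r
# Hypothetical helpers (not part of vitals) illustrating the matching ideas.

# Normalise text: optionally lower-case it, then trim surrounding whitespace.
normalise <- function(x, case_sensitive = FALSE) {
  if (!case_sensitive) x <- tolower(x)
  trimws(x)
}

# "includes": the target appears anywhere in the output.
sketch_includes <- function(output, target, case_sensitive = FALSE) {
  grepl(
    normalise(target, case_sensitive),
    normalise(output, case_sensitive),
    fixed = TRUE
  )
}

# "match": the target appears at the beginning or the end of the output.
sketch_match <- function(output, target, location = c("end", "begin"),
                         case_sensitive = FALSE) {
  location <- match.arg(location)
  output <- normalise(output, case_sensitive)
  target <- normalise(target, case_sensitive)
  if (location == "end") endsWith(output, target) else startsWith(output, target)
}

# "exact": the normalised output equals at least one of the targets.
sketch_exact <- function(output, targets, case_sensitive = FALSE) {
  normalise(output, case_sensitive) %in% normalise(targets, case_sensitive)
}

sketch_includes("The answer is 4, I think.", "4")          # TRUE
sketch_match("The answer is 4", "4", location = "end")     # TRUE
sketch_match("The answer is 4", "4", location = "begin")   # FALSE
sketch_exact("  Paris ", c("paris", "Paris, France"))      # TRUE
```

In this sketch "includes" is the most lenient check and "exact" the strictest, which is worth keeping in mind when choosing a scorer for free-form model output.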
Usage
detect_includes(case_sensitive = FALSE)
detect_match(
  location = c("end", "begin", "end", "any"),
  case_sensitive = FALSE
)
detect_pattern(pattern, case_sensitive = FALSE, all = FALSE)
detect_exact(case_sensitive = FALSE)
detect_answer(format = c("line", "word", "letter"))
Arguments
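case_sensitive | Logical, whether comparisons are case sensitive. |
location | Where to look for the match: one of the values listed in Usage, that is "end" (the default), "begin", or "any". |
pattern | Regular expression pattern to extract the answer. |
all | Logical: for multiple captures, whether all must match. |
format | What to extract after "ANSWER: ": one of the values listed in Usage, that is "line" (the default), "word", or "letter". |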
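The `pattern`, `all`, and `format` arguments can be pictured with a base R sketch. It rests on assumptions rather than the package's actual code: `sketch_pattern()` and `sketch_answer()` are hypothetical names, `require_all` stands in for `all` under one plausible reading (every extracted match must equal the target), and the regular expressions used for each `format` are illustrative guesses.

```r
# Hypothetical helpers (not part of vitals).

# Extract every match of `pattern` from `output` and compare with `target`.
# `require_all = TRUE` demands that every match equals the target;
# `require_all = FALSE` is satisfied by at least one matching extraction.
sketch_pattern <- function(output, target, pattern,
                           case_sensitive = FALSE, require_all = FALSE) {
  found <- regmatches(
    output,
    gregexpr(pattern, output, ignore.case = !case_sensitive)
  )[[1]]
  if (length(found) == 0) return(FALSE)
  if (!case_sensitive) {
    found <- tolower(found)
    target <- tolower(target)
  }
  hits <- found %in% target
  if (require_all) all(hits) else any(hits)
}

# Pull the text that follows "ANSWER: " as a letter, a word, or the rest
# of the line. Returns NA when no "ANSWER: " marker is found.
sketch_answer <- function(output, format = c("line", "word", "letter")) {
  format <- match.arg(format)
  rx <- switch(
    format,
    line = "ANSWER: *([^\n]+)",
    word = "ANSWER: *(\\w+)",
    letter = "ANSWER: *([A-Za-z])"
  )
  m <- regmatches(output, regexec(rx, output))[[1]]
  if (length(m) < 2) NA_character_ else trimws(m[2])
}

sketch_pattern("x = 12, y = 7", "7", "[0-9]+")                      # TRUE
sketch_pattern("x = 12, y = 7", "7", "[0-9]+", require_all = TRUE)  # FALSE
sketch_answer("Reasoning...\nANSWER: B) Paris\nDone", "letter")     # "B"
sketch_answer("Reasoning...\nANSWER: B) Paris\nDone", "line")       # "B) Paris"
```

Extracting a single letter suits multiple-choice prompts, while extracting the rest of the line suits short free-text answers.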
Value
A function that scores model output based on string matching. Pass the returned value to $eval(scorer). See the documentation for the scorer argument in Task for more information on the return type.
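Because each `detect_*()` call returns a function, the options are fixed once and the resulting scorer can be reused. The base R sketch below illustrates that factory pattern only; `make_includes_scorer()` is a hypothetical name, and the `(output, target)` signature with a logical return value is an assumption made for illustration, not the interface that Task expects.

```r
# Hypothetical scorer factory (not part of vitals): the options are captured
# in a closure, and the returned function does the actual comparison.
make_includes_scorer <- function(case_sensitive = FALSE) {
  force(case_sensitive)
  function(output, target) {
    if (!case_sensitive) {
      output <- tolower(output)
      target <- tolower(target)
    }
    # Compare each output with its own target, element by element.
    mapply(
      grepl, target, output,
      MoreArgs = list(fixed = TRUE),
      USE.NAMES = FALSE
    )
  }
}

scorer <- make_includes_scorer()
scorer(c("2+2 is 4", "It is six"), c("4", "5"))  # TRUE FALSE
```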
See Also
model_graded_qa() and model_graded_fact() for model-based scoring.
Examples
if (!identical(Sys.getenv("ANTHROPIC_API_KEY"), "")) {
  # set the log directory to a temporary directory
  withr::local_envvar(VITALS_LOG_DIR = withr::local_tempdir())

  library(ellmer)
  library(tibble)

  simple_addition <- tibble(
    input = c("What's 2+2?", "What's 2+3?"),
    target = c("4", "5")
  )

  # create a new Task
  tsk <- Task$new(
    dataset = simple_addition,
    solver = generate(solver_chat = chat_anthropic(model = "claude-3-7-sonnet-latest")),
    scorer = detect_includes()
  )

  # evaluate the task (runs solver and scorer)
  tsk$eval()
}