masked_targets_pred {pangoling} | R Documentation
Get the predictability of a target word (or phrase) given a left and right context
Description
Get the predictability (by default, the natural logarithm of the word probability) of a vector of target words (or phrases), given a vector of left contexts and a vector of right contexts, using a masked transformer model.
Usage
masked_targets_pred(
  prev_contexts,
  targets,
  after_contexts,
  log.p = getOption("pangoling.log.p"),
  ignore_regex = "",
  model = getOption("pangoling.masked.default"),
  checkpoint = NULL,
  add_special_tokens = NULL,
  config_model = NULL,
  config_tokenizer = NULL
)
Arguments
prev_contexts: Left context of the target word in left-to-right written languages.

targets: Target words.

after_contexts: Right context of the target word in left-to-right written languages.

log.p: Base of the logarithm used for the output predictability values. If TRUE (the default), the natural logarithm of the word probability is used.

ignore_regex: Can ignore certain characters when calculating the log probabilities. For example, "^[[:punct:]]$" would ignore punctuation that stands alone in a token.

model: Name of a pre-trained model or a folder containing one. One should be able to use models based on "bert". See the Hugging Face website.

checkpoint: Folder of a checkpoint.

add_special_tokens: Whether to include special tokens. It has the same default as the AutoTokenizer method in Python.

config_model: List with other arguments that control how the model from Hugging Face is accessed.

config_tokenizer: List with other arguments that control how the tokenizer from Hugging Face is accessed.
Details
A masked language model (also called a BERT-like or encoder model) is a type of large language model that can be used to predict the content of a masked position in a sentence.
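For intuition, a masked position can also be queried directly with the companion function masked_tokens_pred_tbl() listed under See Also. A minimal sketch, assuming its first argument (here called masked_sentences) accepts sentences containing BERT's [MASK] placeholder:

# Sketch only: the argument name masked_sentences and the shape of the
# returned table are assumptions, not confirmed by this page.
masked_tokens_pred_tbl(
  masked_sentences = "The apple doesn't fall far from the [MASK].",
  model = "bert-base-uncased"
)
# Expected: a table of candidate tokens for the mask with their
# predictability values, so likely fillers such as "tree" rank highly.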
If not specified, the masked model used is the one set in the global option pangoling.masked.default; this can be accessed via getOption("pangoling.masked.default") (by default, "bert-base-uncased"). To change the default, use options(pangoling.masked.default = "newmaskedmodel").
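For instance, the option can be inspected and changed from an R session (a minimal sketch; "distilbert-base-uncased" is only an illustrative model id):

getOption("pangoling.masked.default")    # "bert-base-uncased" unless changed
options(pangoling.masked.default = "distilbert-base-uncased")
getOption("pangoling.masked.default")    # the new default is now in effect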
A list of possible masked language models can be found on the Hugging Face website.
Using the config_model and config_tokenizer arguments, it's possible to control how the model and tokenizer from Hugging Face are accessed; see the Python method from_pretrained for details. In case of errors, check the status of https://status.huggingface.co/.
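As an illustration, both lists appear to be forwarded to the corresponding from_pretrained() calls; a minimal sketch, assuming from_pretrained-style arguments such as revision (which pins a model version on the Hugging Face Hub) are accepted:

masked_targets_pred(
  prev_contexts = "The",
  targets = "apple",
  after_contexts = "doesn't fall far from the tree.",
  model = "bert-base-uncased",
  config_model = list(revision = "main"),      # assumed pass-through argument
  config_tokenizer = list(revision = "main")   # assumed pass-through argument
)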
Value
A named vector of predictability values (by default the natural logarithm of the word probability).
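Because the values are natural log probabilities by default, raw probabilities can be recovered with exp():

preds <- masked_targets_pred(
  prev_contexts = "The",
  targets = "apple",
  after_contexts = "doesn't fall far from the tree.",
  model = "bert-base-uncased"
)
exp(preds)  # back-transforms natural log probabilities into probabilities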
More examples
See the online article on the pangoling website for more examples.
See Also
Other masked model functions:
masked_tokens_pred_tbl()
Examples
masked_targets_pred(
  prev_contexts = c("The", "The"),
  targets = c("apple", "pear"),
  after_contexts = c(
    "doesn't fall far from the tree.",
    "doesn't fall far from the tree."
  ),
  model = "bert-base-uncased"
)