sklift.datasets.fetch_criteo

Criteo Uplift Modeling Dataset

This is a copy of the Criteo AI Lab Uplift Prediction dataset.

Data description

This dataset is constructed by assembling data resulting from several incrementality tests, a particular randomized trial procedure where a random part of the population is prevented from being targeted by advertising.

Fields

Here is a detailed description of the fields (they are comma-separated in the file):

  • f0, f1, f2, f3, f4, f5, f6, f7, f8, f9, f10, f11: feature values (dense, float)

  • treatment: treatment group. Flag if a company participates in the RTB auction for a particular user (binary: 1 = treated, 0 = control)

  • exposure: treatment effect, whether the user has been effectively exposed. Flag if a company wins in the RTB auction for the user (binary)

  • conversion: whether a conversion occurred for this user (binary, label)

  • visit: whether a visit occurred for this user (binary, label)
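The field layout above can be illustrated with a minimal sketch. It builds a tiny in-memory CSV with the documented column names, parses it, and computes the difference in mean response between the treated and control groups. Because the data come from a randomized trial, that difference is a simple estimate of the average uplift. The six rows, their feature values, and the helper names `build_sample_csv`, `parse_rows` and `naive_uplift` are made up for illustration; they are not part of the dataset or of any library.

```python
import csv
import io

# Column names as documented: twelve dense features plus four binary fields.
FEATURES = [f"f{i}" for i in range(12)]
FLAG_COLUMNS = ["treatment", "exposure", "conversion", "visit"]
COLUMNS = FEATURES + FLAG_COLUMNS

# Made-up rows for illustration only: (treatment, exposure, conversion, visit).
FLAGS = [
    (1, 1, 1, 1),
    (1, 0, 0, 1),
    (1, 0, 0, 0),
    (1, 1, 0, 1),
    (0, 0, 0, 0),
    (0, 0, 0, 1),
]


def build_sample_csv():
    """Return a small comma-separated sample with the documented header."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(COLUMNS)
    for i, flags in enumerate(FLAGS):
        # Arbitrary float feature values; they carry no meaning.
        features = [round(0.5 * i + 0.1 * j, 2) for j in range(12)]
        writer.writerow(features + list(flags))
    return buf.getvalue()


def parse_rows(text):
    """Parse CSV text into dicts with float features and int flags."""
    rows = []
    for rec in csv.DictReader(io.StringIO(text)):
        row = {name: float(rec[name]) for name in FEATURES}
        for name in FLAG_COLUMNS:
            row[name] = int(rec[name])
        rows.append(row)
    return rows


def naive_uplift(rows, target="visit"):
    """Mean of `target` among treated rows minus mean among control rows."""
    treated = [r[target] for r in rows if r["treatment"] == 1]
    control = [r[target] for r in rows if r["treatment"] == 0]
    return sum(treated) / len(treated) - sum(control) / len(control)


rows = parse_rows(build_sample_csv())
print(naive_uplift(rows, "visit"))
print(naive_uplift(rows, "conversion"))
```

On these six made-up rows both calls return 0.25. The same difference-in-means calculation applies to the full file, where either `visit` or `conversion` can serve as the label.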

Key figures

  • Format: CSV

  • Size: 297 MB (compressed), 3.2 GB (uncompressed)

  • Rows: 13,979,592

  • Response Ratio:

    • Average Visit Rate: 0.046992

    • Average Conversion Rate: 0.00292

  • Treatment Ratio: 0.85
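As a rough sanity check, the key figures can be turned into approximate row counts. The sketch below assumes the published ratios are rounded, so the resulting counts are estimates rather than exact values from the file.

```python
# Key figures as published above.
ROWS = 13_979_592
VISIT_RATE = 0.046992
CONVERSION_RATE = 0.00292
TREATMENT_RATIO = 0.85

# Approximate group sizes implied by the treatment ratio.
treated_rows = ROWS * TREATMENT_RATIO
control_rows = ROWS - treated_rows

# Approximate number of positive labels implied by the response ratios.
visits = ROWS * VISIT_RATE
conversions = ROWS * CONVERSION_RATE

print(f"treated rows ~ {treated_rows:,.0f}")
print(f"control rows ~ {control_rows:,.0f}")
print(f"visits       ~ {visits:,.0f}")
print(f"conversions  ~ {conversions:,.0f}")
```

The figures imply roughly 11.9 million treated rows against 2.1 million control rows, and only about 41 thousand conversions in total, so both the treatment groups and the conversion label are heavily imbalanced.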

This dataset is released along with the paper “A Large Scale Benchmark for Uplift Modeling” by Eustache Diemert, Artem Betlei, Christophe Renaudin (Criteo AI Lab), and Massih-Reza Amini (LIG, Grenoble INP). The work was published at the AdKDD 2018 Workshop, held in conjunction with KDD 2018.

About Criteo

Criteo is an advertising company that provides online display advertisements. The company was founded and is headquartered in Paris, France. Its product is a form of display advertising that shows interactive banner advertisements generated from each customer’s online browsing preferences and behaviour. The solution operates on a pay-per-click/cost-per-click (CPC) basis.

Link to the company’s website: https://www.criteo.com/