# Data & Code

# Equity Anomaly data

## Portfolio sorts

Stocks are sorted into N portfolios using NYSE breakpoints. Returns within each portfolio are value-weighted.
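The sort can be illustrated with a minimal sketch for a single period. It is not the code used to build the files above. The column names (`signal`, `is_nyse`, `me_lag`, `ret`), the use of equally spaced quantiles as breakpoints, the tie rule at a breakpoint, and the use of lagged market equity as the value weight are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd

def sort_portfolios(df: pd.DataFrame, n_portfolios: int) -> pd.Series:
    """Value-weighted returns of N signal-sorted portfolios for one period.

    Hypothetical columns: signal, is_nyse (bool), me_lag, ret.
    """
    # Breakpoints are quantiles of the signal among NYSE stocks only (assumed: equally spaced).
    probs = np.arange(1, n_portfolios) / n_portfolios
    breakpoints = df.loc[df["is_nyse"], "signal"].quantile(probs).to_numpy()
    # Every stock, NYSE or not, is assigned to a bin numbered 1..N.
    # Assumed tie rule: a stock exactly at a breakpoint goes to the higher bin.
    portfolio = np.searchsorted(breakpoints, df["signal"].to_numpy(), side="right") + 1
    out = df.assign(portfolio=portfolio, wret=df["ret"] * df["me_lag"])
    sums = out.groupby("portfolio")[["wret", "me_lag"]].sum()
    # Value-weighted return: sum(weight * return) / sum(weight), weight = lagged market equity.
    return sums["wret"] / sums["me_lag"]

# Toy cross-section of six stocks, three of them listed on NYSE.
stocks = pd.DataFrame({
    "signal":  [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "is_nyse": [True, False, True, False, True, False],
    "me_lag":  [100.0, 300.0, 200.0, 200.0, 400.0, 200.0],
    "ret":     [0.01, 0.05, 0.02, 0.00, 0.03, -0.01],
})
vw = sort_portfolios(stocks, n_portfolios=2)
```

With two portfolios the single breakpoint is the NYSE median of the signal (3.0 here), so the first two stocks form portfolio 1 and the remaining four form portfolio 2.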

Current data (July 1963 -- December 2019): portfolio sorts (daily), portfolio sorts (monthly), portfolio assignments.

References:

Haddad, Kozak, Santosh (2020) "Factor Timing": data (daily), data (monthly).

Giglio, Kelly, Kozak (2020) "Equity Term Structures without Dividend Strips Data": data (daily), data (monthly).

Kozak, Nagel, Santosh (2018) "Interpreting Factor Models" use an older version of these data.

## Characteristic-managed portfolios

Portfolios are constructed by weighting each stock by its value of a characteristic signal. Firms with market equity below 0.01% of the aggregate US market cap are removed. A characteristic signal is the cross-sectional rank of a given stock's characteristic, centered and normalized by the sum of absolute values of all ranks in the cross section.
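A minimal sketch of this construction for one cross-section, not the published source code. It assumes that "centered" means subtracting the cross-sectional mean rank, that the normalization uses the centered ranks, that the aggregate market cap is the sum of market equity over the firms in the sample, and that the signal is applied to the next-period return. The function and argument names are hypothetical.

```python
import pandas as pd

def rank_signal(characteristic: pd.Series) -> pd.Series:
    """Cross-sectional rank, centered and scaled so absolute values sum to one."""
    ranks = characteristic.rank()        # ties receive their average rank
    centered = ranks - ranks.mean()      # assumed meaning of "centered"
    return centered / centered.abs().sum()

def managed_portfolio_return(characteristic: pd.Series,
                             market_equity: pd.Series,
                             next_return: pd.Series,
                             min_share: float = 1e-4) -> float:
    """Return of a characteristic-managed portfolio for one period."""
    # Drop firms below 0.01% of aggregate market cap (assumed: sum over the sample).
    keep = market_equity >= min_share * market_equity.sum()
    signal = rank_signal(characteristic[keep])
    # Portfolio return is the signal-weighted sum of stock returns.
    return float((signal * next_return[keep]).sum())

# Toy cross-section: firm D is far below the size cutoff and is removed.
firms = pd.DataFrame(
    {"char": [0.1, 0.5, 0.3, 9.0],
     "me":   [5000.0, 3000.0, 2000.0, 0.5],
     "ret":  [0.02, 0.04, 0.01, 0.50]},
    index=["A", "B", "C", "D"],
)
r = managed_portfolio_return(firms["char"], firms["me"], firms["ret"])
```

In the toy data the surviving firms A, B, C receive signals -0.5, 0.5, and 0, so the portfolio is long B and short A in equal amounts.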

Current data (July 1963 -- December 2019): daily, monthly.

References:

Kozak, Nagel, Santosh (2020) "Shrinking the Cross-Section": data (daily), data (monthly), source code (a new version with L1L2 penalty), slides (TeX).

Kozak and Santosh (2020) "Why do Discount Rates Vary?" (used a subset of the data above).

## Characteristic signals

This panel dataset contains the values of characteristic signals for each stock at each point in time. Firms with market equity below 0.01% of the aggregate US market cap are removed. A characteristic signal is the cross-sectional rank of a given stock's characteristic, centered and normalized by the sum of absolute values of all ranks in the cross section.
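One way to write the transformation, under the assumption that "centered" means subtracting the cross-sectional mean rank and that the normalization is applied to the centered ranks. The symbols $q_{i,t}$ (rank of stock $i$ at time $t$), $n_t$ (number of stocks at time $t$), and $z_{i,t}$ (signal) are introduced here for illustration and are not taken from the data files:

$$
z_{i,t} = \frac{q_{i,t} - \bar{q}_t}{\sum_{j=1}^{n_t} \lvert q_{j,t} - \bar{q}_t \rvert},
\qquad
\bar{q}_t = \frac{1}{n_t} \sum_{j=1}^{n_t} q_{j,t}.
$$

For example, three stocks with ranks 1, 2, 3 have centered ranks $-1, 0, 1$, whose absolute values sum to 2, giving signals $-\tfrac{1}{2}, 0, \tfrac{1}{2}$. Under this reading the signals sum to zero and their absolute values sum to one in every cross section.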

Current data (July 1963 -- December 2019): characteristic signals.

References:

Kozak (2020) "Kernel Trick for the Cross-Section": data.

# Synthetic equity strip yield data (preliminary)

This dataset is constructed using the model in Giglio, Kelly, Kozak (2021) "Equity Term Structures without Dividend Strips Data". The data contain end-of-month equity yields $e_{t,n}$, as defined by equation (27) in the paper. Yields are provided for the aggregate market index for maturities of 1--100 years, and for the cross-section of 100 portfolios (50 long and 50 short ends of anomalies below) for maturities of 1--15 years. Note that the S&P 500 strips data (tradable contracts) correspond most closely to the sizeS cross-sectional portfolio in these data (large firms).

Current data (August 1975 -- September 2020): aggregate and cross-sectional synthetic equity strip yields.

References:

Giglio, Kelly, Kozak (2021) "Equity Term Structures without Dividend Strips Data": data.