Setting up an alpha-generating strategy from scratch: a practical example with an equity-sector portfolio driven by a simple macro signal
As a quantitative researcher, your main goal is to find new financial edges. In this article, we give an overview of the pipeline for designing alpha-generating investment strategies, with the associated Python code as usual.
Here are the main steps that will be presented.
- Investment rationale
- Data collection
- Model training
- Backtesting
- Strategy falsification
- Strategy evaluation
Before starting, an important remark: this article is not meant to be exhaustive. Its main purpose is to share some good practices with the community, not to hand out the secret recipe of the magic potion. 😉
In the research process, you must first think about the type of relationships you want to exploit. These can relate to company fundamentals, investor behavior or macroeconomic data, to name a few. Find a theme that interests you and strengthen your knowledge by reading the existing literature and learning the underlying mechanisms.
In this article, we will try to create a strategy based on a macroeconomic variable, the US Consumer Price Index (CPI), one of the main indicators used to measure US inflation. We act as an equity portfolio manager who wants to add alpha to their fund through better sector allocation.
Our investment rationale is simple. We have observed that in periods of high or low inflation, equity performance is dispersed. For example, when the price of oil rises, energy stocks tend to benefit. We suspect that equity sectors do not behave equally across inflation levels and trends.
We measure inflation by the CPI growth rate, which we call “CPI Year on Year growth” (CPI YoY). We decide to use a simple signal and manually define 4 inflation regimes, based on the sign of the difference between CPI YoY and its 12M average, and on the sign of the 12M CPI YoY rate of change (a short classification sketch follows the table below).
| Regimes | CPI YoY – CPI YoY 12M average | CPI YoY 12M rate of change | Frequency |
| --- | --- | --- | --- |
| 1 – Lower than average & Falling | – | – | 42.4% |
| 2 – Lower than average & Rising | – | + | 8.8% |
| 3 – Higher than average & Falling | + | – | 6.3% |
| 4 – Higher than average & Rising | + | + | 42.5% |
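To make the regime classification concrete, here is a minimal sketch with a hypothetical CPI YoY series; it mirrors the logic implemented later in the prepare_data function (see the annex), with codes 0 to 3 corresponding to regimes 1 to 4 in the table.
# minimal illustration of the regime classification (hypothetical CPI YoY series)
import numpy as np
import pandas as pd
cpi_yoy = pd.Series(np.linspace(0.02, 0.08, 36))  # toy CPI YoY path over 36 months
yoy_mean = cpi_yoy.rolling(12).mean()             # 12M average of CPI YoY
yoy_rate = cpi_yoy - cpi_yoy.shift(12)            # 12M rate of change of CPI YoY
regime_code = np.select([(cpi_yoy < yoy_mean) & (yoy_rate < 0),
                         (cpi_yoy < yoy_mean) & (yoy_rate >= 0),
                         (cpi_yoy >= yoy_mean) & (yoy_rate < 0),
                         (cpi_yoy >= yoy_mean) & (yoy_rate >= 0)],
                        [0, 1, 2, 3], default=np.nan)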
To that end, we will analyze the behavior of equity sectors through time and across inflation regimes, in order to generate an adequate allocation for each regime. These allocations will simply follow a ranking weighting scheme (illustrated just below) and will be implemented at the close of the effective CPI release date, as provided historically by FRED, rather than at the official month-end date, which would introduce a forward-looking bias.
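As a quick illustration of the ranking weighting scheme, here is a toy example with three sectors and hypothetical mean returns: each sector weight is proportional to the rank of its average return within the regime.
# toy illustration of the ranking weighting scheme (hypothetical mean returns, 3 sectors)
import numpy as np
import pandas as pd
mean_returns = pd.Series({'Utils': 0.03, 'HiTec': 0.05, 'Enrgy': 0.08})  # hypothetical values
weights = mean_returns.rank(ascending=True) / np.arange(1, len(mean_returns)+1).sum()
print(weights)  # Utils 1/6 ≈ 0.17, HiTec 2/6 ≈ 0.33, Enrgy 3/6 = 0.50, summing to 1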
Now let’s collect the data we need.
First, for the CPI data, we will retrieve it from FRED, the Federal Reserve Economic Data database hosted by the Federal Reserve Bank of St. Louis.
There are already Python libraries that make it easy to request data from this source, but for demonstration purposes, we coded a small class called “FredAPI” to get the data ourselves.
For the equity sector return data, we will use the Fama/French data library. There, you can find data on the well-known FF equity factors but also on equity sectors. As with the FRED API, we coded a small class, called “FFData”, to request the data we need. We download the FF 5-factor daily returns and also the 10-industry daily returns that will represent our equity sectors.
For these series we have access to a long history, going back to the 1920s! However, for our use case, we will use data from 1949 to 2023.
All these classes are included in a file, Utilities.py, which can be found in the annex at the bottom of this article.
from Utilities import *
We create our data objects to request our timeseries.
# get macro data
fred = FredAPI(api_key='your_api_key')
# get CPI historical releases dates
cpi_release_dates = fred.getReleaseDates(release_id=10) # CPI-U
# get CPI published data
cpi_ts = fred.getSerieData(serie_id='CPIAUCNS', start_date='1949-02-28', end_date='9999-12-31', flag_real=True) # CPI-U
# convert & adjust date column that should be end of month
cpi_ts['date_adjusted'] = pd.to_datetime(cpi_ts.date, format="%Y-%m-%d") + pd.offsets.MonthEnd(0)
cpi_ts['realtime_start'] = pd.to_datetime(cpi_ts.realtime_start, format="%Y-%m-%d")
# get factors & industries returns
ff = FFData()
# industries
perf_sectors_raw = ff.getIndustryDailyData(data_source='10_Industry_Portfolios', weighting_scheme='value')
# factors
factors_raw = ff.getFactorDailyData()
Model training step
Now we can prepare our data, i.e. split it into training and testing sets. To that end, we wrote a function, “prepare_data”, to create, align and correct all our features. Note that the training set uses end-of-month dates (actual=False), while the testing set uses the actual CPI release dates (actual=True), consistent with the implementation rule described earlier.
train_data = prepare_data(cpi_ts, perf_sectors_raw, cpi_release_dates, start_date='1949-02-28', end_date='1988-12-31', actual=False)
test_data = prepare_data(cpi_ts, perf_sectors_raw, cpi_release_dates, start_date='1988-12-31', end_date='2023-02-28', actual=True)
We select our testing dataset and create our asset returns and regimes timeseries.
assets = test_data.drop(columns=["Regime", "Regime_code"]) / 100
regimes = test_data[["Regime_code"]]
n_assets = len(assets.T)
We create our model allocation using the training dataset. We also create a benchmark allocation on the same dates (an equal-weight allocation).
# ranking weighting scheme
model_allocation = train_data.groupby('Regime_code').mean(numeric_only=True).rank(axis=1, ascending=True) / np.arange(1, n_assets+1).sum()
# equal weight for benchmark
bench_allocation = model_allocation*0 + 1/len(model_allocation.columns)
Backtesting step
Now we can launch our backtest routine; we created a class, “Backtest”, containing all the backtesting logic, and we run it on our testing dataset. On each CPI release date, we check whether the regime has changed: if so, we rebalance to the new allocation, otherwise we don’t trade and let the weights drift (a minimal sketch of this rule follows).
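Here is a minimal sketch of that rebalance-on-regime-change rule, using a hypothetical two-regime, two-asset allocation model; the full implementation, including NAV, performance and turnover accounting, is in the Backtest class in the annex.
# minimal sketch of the rebalance-on-regime-change rule (hypothetical toy data)
import pandas as pd
toy_alloc = pd.DataFrame({'A': [0.6, 0.4], 'B': [0.4, 0.6]}, index=[0, 1])  # one allocation per regime code
toy_regimes = pd.Series([0, 0, 1, 1, 0])                                    # regime code observed on each date
current_regime, weights = -1, None
for dt, regime in toy_regimes.items():
    if regime != current_regime:
        weights = toy_alloc.loc[regime]  # regime change: rebalance to the model allocation
    # otherwise: no trade, the weights simply drift with asset returns (omitted in this sketch)
    current_regime = regime
print(weights)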
# portfolio backtest
bt = Backtest(data_returns=assets, data_trigger=regimes, data_allocation=model_allocation, start_date="1991-12-31", end_date="2023-02-28")
bt.run()
# benchmark backtest
bt_ref = Backtest(data_returns=assets, data_trigger=regimes, data_allocation=bench_allocation, start_date="1991-12-31", end_date="2023-02-28")
bt_ref.run()
# display annualized statistics
bt.compute_stats(bt_ref)
| | Return | Volatility | Turnover | Sharpe Ratio (w/o cash) | Beta | Tracking Error | Information Ratio |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Portfolio | 12.99% | 17.77% | 1.50 | 0.73 | 1.00 | 2.64% | 0.65 |
| Reference | 11.26% | 17.56% | 0.12 | 0.64 | – | – | – |
This result looks good for a low-frequency strategy: an interesting Information Ratio and a reasonable portfolio turnover (and transaction costs on US equities are very low).
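As a quick sanity check, these figures are consistent with the definitions used in compute_stats: the Sharpe Ratio (without cash) is Return / Volatility, i.e. 12.99% / 17.77% ≈ 0.73 for the portfolio and 11.26% / 17.56% ≈ 0.64 for the reference, and the Information Ratio is the excess return over the tracking error, i.e. (12.99% - 11.26%) / 2.64% ≈ 0.65.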
Falsification step
Here we will test some alternative scenarios, i.e. try to falsify our strategy to show that we exploited a real effect rather than just being lucky. One possible way is to permute the regime labels, so that each regime receives an allocation trained for a different regime. Like this.
# falsification 1
regimes_false = (test_data[["Regime_code"]]-2).replace(-1,2).replace(-2,3)  # permutation of regime codes: 0→3, 1→2, 2→0, 3→1
bt_false = Backtest(data_returns=assets, data_trigger=regimes_false, data_allocation=model_allocation, start_date="1991-12-31", end_date="2023-01-31")
bt_false.run()
bt_false.compute_stats(bt_ref)
# falsification 2
regimes_false2 = (test_data[["Regime_code"]]-1).replace(-1,3)  # cyclic shift of regime codes: 0→3, 1→0, 2→1, 3→2
bt_false2 = Backtest(data_returns=assets, data_trigger=regimes_false2, data_allocation=model_allocation, start_date="1991-12-31", end_date="2023-01-31")
bt_false2.run()
bt_false2.compute_stats(bt_ref)
# falsification 3
regimes_false3 = np.abs(test_data[["Regime_code"]]-3)  # full reversal of regime codes: 0→3, 1→2, 2→1, 3→0
bt_false3 = Backtest(data_returns=assets, data_trigger=regimes_false3, data_allocation=model_allocation, start_date="1991-12-31", end_date="2023-01-31")
bt_false3.run()
bt_false3.compute_stats(bt_ref)
| | Return | Volatility | Turnover | Sharpe Ratio (w/o cash) | Beta | Tracking Error | Information Ratio |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Falsification #1 | 10.81% | 17.73% | 1.62 | 0.61 | 1.00 | 2.44% | -0.18 |
| Falsification #2 | 10.41% | 17.31% | 1.50 | 0.60 | 0.98 | 2.32% | -0.36 |
| Falsification #3 | 9.46% | 17.73% | 1.49 | 0.53 | 0.99 | 3.02% | -0.59 |
If our hypothesis were false, or at least had no impact on performance, we would expect similar results across all backtests regardless of how regimes are mapped to allocations. Instead, all three falsified variants underperform the original strategy (negative Information Ratios versus 0.65), which suggests that the regime signal carries real information.
Strategy evaluation step
Now that we have confirmed that we exploit a real effect, we can assess whether our strategy produces alpha versus our benchmark and versus the market.
# against reference benchmark
X = bt_ref.portfolio_perf.Perf.dropna()
y = bt.portfolio_perf.Perf.dropna()
model = sm.GLS(y,sm.add_constant(X))
res = model.fit()
res.summary()
print(res.params['const']*260) # annualized alpha (daily intercept scaled by ~260 trading days)
We have a beta (vs reference) close to 1 and a positive annualized alpha of 1.57%; the loadings are statistically significant.
We can do the same exercise, but with the FF-5 Factors instead of our reference strategy.
# against FF-5 factors
y = bt.portfolio_perf.Perf.dropna()
X = factors_raw.drop(columns=['RF']) / 100
X = X.loc[y.index]
y = y.loc[X.index]
model = sm.GLS(y,sm.add_constant(X))
res = model.fit()
res.summary()
print(res.params.const*260) # annualized alpha
With this model, we have a market beta slightly below 1 (0.95), neutral exposure to Size (SMB) and Value (HML), and positive exposure to Profitability (RMW) and Investment (CMA). The alpha is robust and positive, around 3.8% annually.
We can also check, on a rolling window, how the market beta and alpha behave over time.
# we can also check the stability of alpha and market beta coefficients through time
model_roll = RollingOLS(y, sm.add_constant(X), window=260)
res_roll = model_roll.fit()
plt.figure(figsize=(10, 6))
(res_roll.params.const*260).plot()                                  # rolling annualized alpha
plt.axhline(res_roll.params.const.mean()*260, linestyle='dashed')   # full-sample average alpha
plt.figure(figsize=(10, 6))
res_roll.params['Mkt-RF'].plot()                                    # rolling market beta
plt.axhline(res_roll.params['Mkt-RF'].mean(), linestyle='dashed')   # full-sample average beta
We can have a look at annual calendar performance to see whether returns were stable over time.
# calendar returns
# calendar (annual) returns of the portfolio and the reference
ptf_annual = bt.portfolio_perf.Value.groupby(bt.portfolio_perf.index.year).last().pct_change()
ref_annual = bt_ref.portfolio_perf.Value.groupby(bt_ref.portfolio_perf.index.year).last().pct_change()
calendar_returns = pd.concat([ptf_annual, ref_annual], axis=1, keys=['Ptf', 'Ref']).dropna()
calendar_returns["Excess"] = calendar_returns["Ptf"] - calendar_returns["Ref"]
plt.figure(figsize=(10, 6))
plt.plot(test_data.value_yoy.resample('Y').last().index.year, test_data.value_yoy.resample('Y').last(),
linestyle="dashed", alpha=0.7, label="US CPI YoY", color='red')
plt.bar(calendar_returns.index, calendar_returns.Excess, label='Excess returns')
plt.legend()
plt.grid(True, linestyle="-.")
plt.show()
We can also have a look at the return distribution of our strategy.
# distribution moments of the portfolio, reference and excess returns
excess_perf = bt.portfolio_perf.Perf - bt_ref.portfolio_perf.Perf
perf_series = {"Ptf": bt.portfolio_perf.Perf, "Ref": bt_ref.portfolio_perf.Perf, "Excess": excess_perf}
moments = pd.DataFrame(index=["Ann. Mean", "Ann. Variance", "Skewness", "Kurtosis"], columns=list(perf_series))
for name, perf in perf_series.items():
    moments.loc["Ann. Mean", name] = perf.mean() * 260
    moments.loc["Ann. Variance", name] = perf.var() * 260
    moments.loc["Skewness", name] = perf.skew()
    moments.loc["Kurtosis", name] = perf.kurt()
| | Ptf | Ref | Excess |
| --- | --- | --- | --- |
| Ann. Mean | 0.137955 | 0.122172 | 0.0157832 |
| Ann. Variance | 0.0315724 | 0.0308491 | 0.000697226 |
| Skewness | -0.215768 | -0.297998 | 0.109316 |
| Kurtosis | 11.6345 | 11.4875 | 5.77481 |
From the estimated moments of the return distributions, we see that the strategy delivers a higher mean at the cost of a slightly higher variance and kurtosis (more tail risk). However, the excess-return distribution is positively skewed, which means that extreme events tend to fall on the positive side (frequent small losses and a few large gains).
From all these results, the portfolio manager concludes that this strategy could be a good diversifier for the overall fund and decides to give it some budget in the tactical pocket!
To conclude, in this article we presented an overview of the main steps in designing alpha-generating strategies. Start by seriously thinking about the rationale of your strategy, then collect and adjust your data. Use the data efficiently, with separate training and testing sets, to build a robust investment model. Backtest your results across multiple market regimes and, most importantly, falsify your strategy to assess its validity. Finally, check whether you have produced significant and stable alpha over time, while also examining your return distribution profile.
Annex
Below are all utility classes and functions used in this article.
import pandas as pd
import numpy as np
import requests
import matplotlib.pyplot as plt
from tqdm import tqdm
import statsmodels.api as sm
from statsmodels.regression.rolling import RollingOLS
from zipfile import ZipFile
import tempfile
from io import StringIO
from urllib3.exceptions import InsecureRequestWarning
from urllib3 import disable_warnings
disable_warnings(InsecureRequestWarning)
class FFData():
"""A set of methods to get financial data from the FF data library"""
def __init__(self):
"""Create the request object"""
self.s = requests.Session()
self.baseURL = "https://mba.tuck.dartmouth.edu/pages/faculty/ken.french/"
def getFactorDailyData(self):
""" Get 5-factor data from a specific Fama French data source """
downloadURL = self.baseURL + 'ftp/F-F_Research_Data_5_Factors_2x3_daily_CSV.zip'
download = self.s.get(url=downloadURL, verify=False)
with tempfile.TemporaryFile() as tmpf:
tmpf.write(download.content)
with ZipFile(tmpf, "r") as zf:
data = zf.open(zf.namelist()[0]).read().decode()
df = pd.read_csv(StringIO(data), engine='python', skiprows=3, skipfooter=0, decimal='.', sep=',').rename(columns={'Unnamed: 0':'Date'})
df_filtered = df.copy()
df_filtered.Date = pd.to_datetime(df_filtered.Date, format="%Y%m%d")
df_filtered = df_filtered.set_index("Date")
return df_filtered.astype(float)
def getIndustryDailyData(self, data_source='10_Industry_Portfolios', weighting_scheme='value'):
""" Get industry data from a specific Fama French data source """
downloadURL = self.baseURL + 'ftp/%s_daily_CSV.zip'%(data_source)
download = self.s.get(url=downloadURL, verify=False)
with tempfile.TemporaryFile() as tmpf:
tmpf.write(download.content)
with ZipFile(tmpf, "r") as zf:
data = zf.open(zf.namelist()[0]).read().decode()
df = pd.read_csv(StringIO(data), engine='python', skiprows=9, skipfooter=1, decimal='.', sep=',').rename(columns={'Unnamed: 0':'Date'})
if weighting_scheme == 'value':
keep_flag = 'first' # keep value weighted returns
else:
keep_flag = 'last' # keep equal weighted returns
df_filtered = df.loc[df[df.Date.str.isnumeric() == True].drop_duplicates(subset=['Date'], keep=keep_flag).index]
df_filtered.Date = pd.to_datetime(df_filtered.Date, format="%Y%m%d")
df_filtered = df_filtered.set_index("Date")
return df_filtered.astype(float)
class FredAPI():
"""A set of methods to get economic data from the FRED® and ALFRED® websites hosted by the Economic Research Division of the Federal Reserve Bank of St. Louis. Requests can be customized according to data source, release, category, series, and other preferences."""
def __init__(self, api_key):
"""Create the request object"""
self.s = requests.Session()
self.api_key = api_key
self.baseURL = "https://api.stlouisfed.org/fred/"
def getAllReleases(self):
""" Get all available releases """
downloadURL = self.baseURL + 'releases?api_key=%s&file_type=json'%(self.api_key)
download = self.s.get(url=downloadURL)
return pd.DataFrame(download.json()['releases'])
def getReleaseDates(self, release_id=10):
""" Get release dates for a specific economic variable """
downloadURL = self.baseURL + '/release/dates?release_id=%s&api_key=%s&file_type=json'%(release_id, self.api_key)
download = self.s.get(url=downloadURL)
return pd.DataFrame(download.json()['release_dates'])
def getReleaseLinkedSeries(self, release_id=10):
""" Get economic series for a specific release of economic data"""
downloadURL = self.baseURL + '/release/series?release_id=%s&api_key=%s&file_type=json'%(release_id, self.api_key)
download = self.s.get(url=downloadURL)
return pd.DataFrame(download.json()['seriess'])
def getSerieData(self, serie_id='CPIAUCNS', start_date='', end_date='', flag_real=False):
""" Get observations time serie for a specific economic variable """
if flag_real == True:
downloadURL = self.baseURL + '/series/observations?series_id=%s&api_key=%s&observation_start=%s&observation_end=%s&realtime_start=%s&realtime_end=%s&file_type=json'%(serie_id, self.api_key, start_date, end_date, start_date, end_date)
else:
downloadURL = self.baseURL + '/series/observations?series_id=%s&api_key=%s&observation_start=%s&observation_end=%s&file_type=json'%(serie_id, self.api_key, start_date, end_date)
download = self.s.get(url=downloadURL)
df = pd.DataFrame(download.json()['observations'])
df["value"] = pd.to_numeric(df["value"])
return df
class Backtest():
""" A set of methods to backtest a 10-industry portfolio following a regime portfolio allocation model """
def __init__(self, data_returns, data_trigger, data_allocation, start_date="1951-03-31", end_date="1989-12-31"):
# convert to datetime
self.start_date = pd.to_datetime(start_date, format='%Y-%m-%d')
self.end_date = pd.to_datetime(end_date, format='%Y-%m-%d')
# subsample data accordingly
self.returns = data_returns.loc[self.start_date:self.end_date]
self.trigger = data_trigger.loc[self.start_date:self.end_date]
self.allocation_model = data_allocation
self.portfolio = pd.DataFrame(index=self.returns.index, columns=self.returns.columns)
self.portfolio_perf = pd.DataFrame(index=self.returns.index, columns=["Value", "Perf", "Turnover"], dtype=float)
# initiate portfolio composition
self.current_state = -1
self.current_portfolio = 1/len(self.allocation_model.columns)
self.portfolio.loc[self.start_date] = self.current_portfolio
self.current_value = 100
self.portfolio_perf.loc[self.start_date, "Value"] = self.current_value
# start after first period
self.dates_to_run = self.returns.iloc[1:].index
def _event(self, dt):
# check if there is a new state, return 1 if yes, else 0
self.new_state = self.trigger.loc[dt].values[0]
if self.new_state != self.current_state:
return 1
else:
return 0
def _rebalance(self, dt):
# apply allocation model
previous_ptf = self.current_portfolio
current_returns = self.returns.loc[dt]
new_ptf = self.allocation_model.loc[self.new_state]
previous_ptf_drifted = previous_ptf * (1 + current_returns)
# compute new nav and performance
self.portfolio_perf.loc[dt, "Value"] = self.current_value * previous_ptf_drifted.sum()
self.portfolio_perf.loc[dt, "Perf"] = (current_returns * previous_ptf).sum()
# also compute turnover of end-of-period weights versus new weights
self.portfolio_perf.loc[dt, "Turnover"] = np.sum(np.abs(new_ptf - (previous_ptf_drifted / previous_ptf_drifted.sum())))
# save new weights from alloc model
self.portfolio.loc[dt] = new_ptf
def _drift(self, dt):
# compute drifted weights
previous_ptf = self.current_portfolio
current_returns = self.returns.loc[dt]
previous_ptf_drifted = previous_ptf * (1 + current_returns)
new_ptf = previous_ptf_drifted / previous_ptf_drifted.sum()
# compute new nav and performance
self.portfolio_perf.loc[dt, "Value"] = self.current_value * previous_ptf_drifted.sum()
self.portfolio_perf.loc[dt, "Perf"] = (current_returns * previous_ptf).sum()
self.portfolio_perf.loc[dt, "Turnover"] = np.nan # no turnover
# save drifted weights
self.portfolio.loc[dt] = new_ptf
return 0
def compute_stats(self, reference=None):
# compute statistic analysis of the backtest
stats = pd.DataFrame(index=["Portfolio", "Benchmark"], columns=["Return", "Volatility", "Turnover"], dtype=float)
stats.loc["Portfolio", "Return"] = self.portfolio_perf.Perf.add(1).prod() ** (260 / len(self.portfolio_perf)) - 1
stats.loc["Portfolio", "Volatility"] = self.portfolio_perf.Perf.std() * np.sqrt(260)
stats.loc["Portfolio", "Sharpe Ratio"] = stats.loc["Portfolio", "Return"] / stats.loc["Portfolio", "Volatility"]
stats.loc["Portfolio", "Turnover"] = self.portfolio_perf.groupby(self.portfolio_perf.index.year).sum().Turnover.mean()
if reference is not None:
stats.loc["Benchmark", "Return"] = reference.portfolio_perf.Perf.add(1).prod() ** (260 / len(reference.portfolio_perf)) - 1
stats.loc["Benchmark", "Volatility"] = reference.portfolio_perf.Perf.std() * np.sqrt(260)
stats.loc["Benchmark", "Sharpe Ratio"] = stats.loc["Benchmark", "Return"] / stats.loc["Benchmark", "Volatility"]
stats.loc["Benchmark", "Turnover"] = reference.portfolio_perf.groupby(reference.portfolio_perf.index.year).sum().Turnover.mean()
stats.loc["Portfolio", "Beta"] = pd.concat([self.portfolio_perf.Perf,reference.portfolio_perf.Perf],axis=1).cov().iloc[0,1] / reference.portfolio_perf.Perf.var()
stats.loc["Portfolio", "Tracking Error"] = (self.portfolio_perf.Perf-reference.portfolio_perf.Perf).std() * np.sqrt(260)
stats.loc["Portfolio", "Information Ratio"] = (stats.loc["Portfolio", "Return"] - stats.loc["Benchmark", "Return"]) / stats.loc["Portfolio", "Tracking Error"]
return stats
def run(self):
for dt in tqdm(self.dates_to_run):
if self._event(dt) == 1:
self._rebalance(dt)
else:
self._drift(dt)
self.current_state = self.new_state # assign new state
self.current_value = self.portfolio_perf.loc[dt, "Value"] # assign new nav
self.current_portfolio = self.portfolio.loc[dt] # assign new portfolio
def prepare_data(data_signal, data_returns, cpi_releases, start_date='1988-12-31', end_date='2023-01-31', actual=True):
""" A function to create inflation features and choose the release date, i.e. actual or end of month """
if actual == True:
cpi_releases["real_date"] = pd.to_datetime(cpi_releases.date, format="%Y-%m-%d")
mapping = cpi_releases.merge(data_signal.sort_values('realtime_start').drop_duplicates(subset='date_adjusted', keep='first').sort_values('date_adjusted')[["realtime_start", "date_adjusted"]], how='left', right_on='realtime_start', left_on='real_date')[["date_adjusted", "real_date"]].dropna()
signal = data_signal.sort_values('realtime_start').drop_duplicates(subset='date_adjusted', keep='last').sort_values('date_adjusted')[["date_adjusted", "value"]].merge(mapping, how='left', on='date_adjusted')
else:
signal = data_signal.sort_values('realtime_start').drop_duplicates(subset='date_adjusted', keep='last').sort_values('date_adjusted')
signal = signal.loc[(signal.date_adjusted >= start_date) & (signal.date_adjusted <= end_date)]
signal['value_yoy'] = signal.value / signal.value.shift(12) - 1
signal['value_yoy_mean'] = signal['value_yoy'].rolling(12).mean() # mean inflation rate of last 12M
signal['value_yoy_rate'] = signal['value_yoy'] - signal['value_yoy'].shift(12) # rate of change of inflation
conditionList = [(signal['value_yoy'] < signal['value_yoy_mean']) & (signal['value_yoy_rate'] < 0),
(signal['value_yoy'] < signal['value_yoy_mean']) & (signal['value_yoy_rate'] >= 0),
(signal['value_yoy'] >= signal['value_yoy_mean']) & (signal['value_yoy_rate'] < 0),
(signal['value_yoy'] >= signal['value_yoy_mean']) & (signal['value_yoy_rate'] >= 0)]
choiceList = ['Lower & Falling', 'Lower & Rising', 'Higher & Falling', 'Higher & Rising']
choiceList_code = [0, 1, 2, 3]
signal['Regime'] = np.select(conditionList, choiceList, default=np.nan)
signal['Regime_code'] = np.select(conditionList, choiceList_code, default=np.nan)
signal.dropna(inplace=True)
data_returns = data_returns.loc[(data_returns.index >= start_date) & (data_returns.index <= end_date)]
if actual == True:
prepared_data = data_returns.merge(signal[['real_date','Regime','Regime_code']] \
.set_index('real_date'), how='left', right_index=True, left_index=True) \
.ffill().dropna()
else:
prepared_data = data_returns.merge(signal[['date_adjusted','Regime','Regime_code']] \
.set_index('date_adjusted'), how='left', right_index=True, left_index=True) \
.ffill().dropna()
return prepared_data