In [59]:
Options are fun Kauri Beckmann Published 26/03/2025 kauri.b@outlook.co.nz
In [ ]:
Stock A is currently trading at $30. What's the probability it trades above $31 tomorrow? What's the probability it stays flat? What's the probability at X time that the price is Y or Z? If we know this, we know whether we should be long or short. Here I seek to answer this question through the lens of options-impled probability. I use two methods: Breeden Litzenberger, and Butterfly Spreads. For your sanity I have hidden sections of code that I find rather uninteresting.
In [19]:
In [20]:
Here I look at Apple call options, with price data pulled directly from Interactive Brokers. Here's a little function, so we can visualise our data. Current stock price: 223.15 Time to expiry: 10 days
In [57]:
original_df = pd.read_pickle("AAPL_2025xxxx_call_data.pkl")
original_df["mid"] = (original_df["ask"] + original_df["bid"])/2
plot_price(original_df, 10)
In [56]:
Right now we've got calls with no volume. Also note we've calculated the mid price. Everything going forward operates on the idea that we can trade at mid, which is not necesessarily a realistic assumption. This is an advantage of working with a more liquid asset. First lets start by filtering our call data. I filter out strike below 190 and above 240; makes butterfly calculations so much simpler as our strike prices are now evenly distributed.
In [23]:
def filter_df(original_df, filter):
filter_low, filter_high, filter_volume, filter_specific_values = filter
filtered_df = original_df
filtered_df = filtered_df[(filtered_df["volume"]) > filter_volume]
filtered_df = filtered_df[filtered_df["strike"] >= filter_low]
filtered_df = filtered_df[(filtered_df["strike"]) <= filter_high]
filtered_df = filtered_df[~filtered_df["strike"].isin(filter_specific_values)]
filtered_df = filtered_df.reset_index(drop=True)
return filtered_df
# filter: low, high, volume, specific values
filter = (190, 240, 0, [])
filtered_df = filter_df(original_df, filter)
plot_price(filtered_df, 10)
In [24]:
"""A few little functions"""
def mid(strike):
"""Simply pulls our mid price"""
mid = filtered_df[filtered_df["strike"] == strike]["mid"].iloc[0]
return round(mid,10)
def butterfly_spread(long1, short, long2):
"""Finds payout and profit structure for any butterfly spread"""
prices = []
payouts = []
profits = []
premium = mid(long1) - 2 * mid(short) + mid(long2)
for i in range(int((long1-(short-long1)*5)*2), int((long2+(short-long1)*5)*2)):
payout = max(i/2 - long1, 0) - 2 * max(i/2 - short, 0) + max(i/2 - long2, 0)
prices.append(i/2)
payouts.append(payout)
profits.append(payout - premium)
return prices, payouts, profits, premium
In [25]:
Butterfly Spreads Long one call, lower strike Short two calls, some strike Long one call, higher strike Lets examine what happens with this payout structure. strike mid price Long 220 5.85 2x short 222.5 4.3 Long 225 2.985
In [26]:
prices, payouts, profits, premium = butterfly_spread(220, 222.5, 225)
print("Premium: ", round(premium,10))
plot_butterfly_spread_payout(prices, payouts, profits)
Premium: 0.235
In [27]:
We generate a maximum payout of 2.5 when the call expires between 220 and 225. Note our long and short strikes should be equidistant, else we generate asymmetric payouts.
In [28]:
Now, for point of argument, let's ignore the payout structure varies between 220 and 225, and assume the payout to be 2.5. We know our premium is 0.235 By logic we can deduce our probability that the stock price will expire between 220 and 225 to be premium/average payout, Implied probability of expiring between 220 and 225 = 0.094 And, what if we repeat this across our entire option pricing data? First, we have a major constraint. Interpolation would allow us to make our spreads infinitely narrow, to generate an infinitely precise probability curve. We can't interpolate options prices. strike price Long 220 5.85 2x short 221.25 5.075 (interpolated) Long 222.5 4.3 If instead we tried to interpolate for strike 221.25, we would find: Premium = 5.85 + 4.3 - 2 x 5.075 = 0 Infinite money glitch? Well... note that this can even occur for real prices, particularly for less liquid options. Sometimes our premium actually falls below zero - some market maker out there is raking it in! But for finding implied probability, interpolating option prices doesn't work. I'll come back to this later.
In [29]:
"""More functions"""
def butterfly_probs(df):
'Calculates the probability associated with a given butterfly spread'
max_payoff = df["strike"].iloc[1] - df["strike"].iloc[0]
#print("max payoff =", max_payoff)
df['butterfly_p'] = np.nan
for i in range(1, len(df)-1):
butterfly_p = (
df.iloc[i - 1]["mid"]
- 2 * df.iloc[i]["mid"]
+ df.iloc[i + 1]["mid"]
)/max_payoff
df.at[i, 'butterfly_p'] = butterfly_p
return df
In [30]:
print("So, let's calculate our butterfly implied probability across our call data range.")
butterfly_probs(filtered_df)
total_butterfly_p = filtered_df['butterfly_p'].sum()
print("Total implied probability:", total_butterfly_p)
plot_butterfly_IP(filtered_df, 30)
So, let's calculate our butterfly implied probability across our call data range. Total implied probability: 0.9439999999999983
In [31]:
Notice how noisy our data appears to be. In part at least, this is because my approach to deriving implied probability is rather crude. A total implied probability below 1 suggests the market is underpricing the options. Obviously this is not the case. Our butterfly spreads are rather wide. We are double-counting our spreads; we calculate butterfly spread 220-225, 222.5-227.5 etc And we are double-counting the payoff; average payoff between two strikes is half of max payout (1.25 not 2.5) We have also missed a bit of data at the tail ends, which we expect to make up some fraction of probability. I intend to come back and update this section once I figure out the mathematics behind this. For now I resolve this crude approach a different way.
In [32]:
As butterflies become narrower, our model becomes more accurate. If we could interpolate, we could model infinitely-small changes in strike price. But as we've seen before, we can't interpolate strike prices... But if we convert back to the inputs for a call price, we might be able to interpolate. Recalling Black Scholes: Inputs: - S, current stock price, 223.15 - t, time to expiry, 10 days - r risk-free rate, SOFR 0.0431 - σ, variance (volatility, unknown) Outputs: - C, call price In our case, we already know the call price. We can plug this in with S, t, r to calculate our Implied Volatility. Apple pays dividends of 0.25 per share. Ignoring dividends is a pretty good assumption.
In [ ]:
def CallPrice(S, sigma, K, T, r):
"""Black-Scholes call option price formula."""
d1 = (math.log(S / K) + (r + 0.5 * sigma**2) * T) / (sigma * T**0.5)
d2 = d1 - sigma * T**0.5
n1 = norm.cdf(d1)
n2 = norm.cdf(d2)
DF = math.exp(-r * T)
return S * n1 - K * DF * n2
def bisection(f, a, b, tol=1e-8):
"""Bisection method to allow us to use the Black Scholes in inverse."""
if f(a) * f(b) >= 0:
print("Bisection method fails: No sign change in interval.")# - old debug code.
#Sometimes Bisection fails and that's OK.
return None
while (b - a) / 2.0 > tol:
c = (a + b) / 2.0
fc = f(c)
if abs(fc) < tol:
return c
elif fc * f(a) < 0:
b = c
else:
a = c
return (a + b) / 2.0
def calculate_IV(row):
"""Find implied volatility using the bisection method."""
def CallPriceVol(vol):
#print(vol, CallPrice(S, vol, row["strike"], T, r) - row["mid"], row["mid"])
return CallPrice(S, vol, row["strike"], T, r) - row["mid"]
# Bisection method with a reasonable range for implied volatility
iv = bisection(CallPriceVol, 0.01, 5)
return iv
In [34]:
So, lets put these new functions to use to calculate our implied volatility!
In [35]:
# Constants
S = 223.15 # Current price
T = 10/365 # Time to maturity, 10 days
r = 0.0431 # SOFR at time of writing 0.0431
# filter: low, high, volume, specific values
filter = (190, 250, 0, [])
filtered_df = filter_df(original_df, filter)
# Apply to dataframe
filtered_df.loc[:, "IV"] = filtered_df.apply(calculate_IV, axis=1)
plot_IV(filtered_df, "IV", 30)
In [36]:
Noting the volatility smile :) Well, it's a bit messy - I'll clean it up with a cubic spline.
In [37]:
spl = splrep(filtered_df["strike"].values, filtered_df["IV"].values, k=5, s=0.01)
filtered_df["smoothed_IV"] = splev(filtered_df["strike"].values, spl)
# Plot the result
plt.plot(filtered_df["strike"].values, filtered_df["smoothed_IV"].values, label="Spline Fit")
plt.plot(filtered_df["strike"].values, filtered_df["IV"].values, label="Observed Data", linestyle='--')
plt.xlabel("Strike Price")
plt.ylabel("Implied Volatility")
plt.title("Spline Fit of Volatility Smile")
plt.legend()
plt.show()
In [38]:
Sanity check: Comparing this to the implied volatility on Interactive Brokers, I confirm the validity of my results. Now we can interpolate! I generate some thousand data points for strike and IV.
In [39]:
interpolated_df = pd.DataFrame({
"strike": np.linspace(filtered_df["strike"].iloc[0], filtered_df["strike"].iloc[-1], 1000)
})
interpolated_df['IV'] = splev(interpolated_df["strike"].values, spl)
In [40]:
Now let's go back to Black Scholes and convert back to price space
In [41]:
interpolated_df["mid"] = interpolated_df.apply(
lambda row: CallPrice(S, row["IV"], row["strike"], T, r), axis=1
)
plot_price(interpolated_df, 0.1)
In [42]:
'''More functions :) '''
def breeden_litzenberger_PDF(S, sigma, strike, T, r, delta_K=0.001):
'''Calculate the second derivative which gives us our Breeden Litzenberger PDF.'''
C_plus = CallPrice(S, sigma, strike + delta_K, T, r)
C_minus = CallPrice(S, sigma, strike - delta_K, T, r)
C = CallPrice(S, sigma, strike, T, r)
scaling_factor = np.exp(r * T)
breeden_litzenberger_PDF = scaling_factor * (C_plus - 2 * C + C_minus) / (delta_K ** 2)
return breeden_litzenberger_PDF
def integrate_BL_PDF(df, normalized, strike_low, strike_high):
'''Intergrate our Breeden Litzenberger PDF'''
if normalized == True:
type = "breeden_litzenberger_PDF_normalized"
else:
type = "breeden_litzenberger_PDF"
valid_strikes = (df['strike'] >= strike_low) & (df['strike'] <= strike_high)
# Use Simpson's rule directly on the valid data points
return simpson(y=df.loc[valid_strikes, type], x=df.loc[valid_strikes, 'strike'])
def integrate_butterfly_PDF(df,normalized, strike_low, strike_high):
'''Intergrate our Butterfly PDF'''
if normalized == True:
type = "butterfly_p_normalized"
else:
type = "butterfly_p"
df_without_na = df.dropna(subset=[type, 'strike'])
valid_strikes = (df_without_na['strike'] >= strike_low) & (df_without_na['strike'] <= strike_high)
max_payoff = df["strike"].iloc[1] - df["strike"].iloc[0]
return simpson(y=df_without_na.loc[valid_strikes, type], x=df_without_na.loc[valid_strikes, 'strike'])/max_payoff
In [43]:
First I find our Breeden Litzenberger probability distribution function. Then I integrate under the curve.
In [44]:
interpolated_df["breeden_litzenberger_PDF"] = interpolated_df.apply(
lambda row: breeden_litzenberger_PDF(S, row["IV"], row["strike"], T, r), axis = 1
)
print("Breeden Litzenberger PDF integral:", integrate_BL_PDF(interpolated_df, False, 0, 10000))
Breeden Litzenberger PDF integral: 1.0915516520802058
In [45]:
Back to our butterfly spreads. With the smoothing from earlier, and very narrow strikes derived from IV, we have resolved our earlier challenges.
In [46]:
butterfly_probs(interpolated_df)
total_butterfly_p = interpolated_df['butterfly_p'].sum()
print("Butterfly implied probability:", total_butterfly_p)
print("Butterfly PDF integral:", integrate_butterfly_PDF(interpolated_df, False, 0, 10000))
plot_PDFs(interpolated_df)
Butterfly implied probability: 0.9901427023950023 Butterfly PDF integral: 0.9901001252365365
In [47]:
I'm not entirely certain why the PDF's are different. In theory both our integrals should both be = 1.0. The Breeden Litzenberger seems a bit high. The shape of the curve is rather different; butterfly spreads appear to consider volatility to be lower than I have troubleshooted and remain uncertain. I opt to normalize both data sets so their total probability = 1.0.
In [48]:
BL_normalize_factor = integrate_BL_PDF(interpolated_df, False, 0, 10000)
butterfly_normalize_factor = integrate_butterfly_PDF(interpolated_df, False, 0, 10000)
interpolated_df["breeden_litzenberger_PDF_normalized"] = interpolated_df["breeden_litzenberger_PDF"]/BL_normalize_factor
interpolated_df["butterfly_p_normalized"] = interpolated_df["butterfly_p"]/butterfly_normalize_factor
In [49]:
But now, we have everything we need to estimate probability of the stock price expiring at any given price. Looking at AAPL, current stock price 223.15, contract expiry in 10 days.
In [50]:
a = 220
b = 225
print("What's the probability the stock price expires between", a, "and", b,"?")
print("Breeden Litzenberger:", round(integrate_BL_PDF(interpolated_df, True, a, b), 3))
print("Butterfly:", round(integrate_butterfly_PDF(interpolated_df, True, a, b), 3))
print("")
What's the probability the stock price expires between 220 and 225 ? Breeden Litzenberger: 0.188 Butterfly: 0.218
In [51]:
What's the probability the stock price expires between 215.35 and 218.0 ? Breeden Litzenberger: 0.077 Butterfly: 0.073 What's the probability the stock price expires below 210 ? Breeden Litzenberger: 0.156 Butterfly: 0.074
In [52]:
What are the next steps? - Resolve what appears to be likely errors with Breeden Litzenberger PDF - Integrate puts for more data points - In constructing the implied volatility curve, I could weight influence based on option activity Ie, low-volume strikes should be less influential than high-volume strikes - Develop my trading strategy based on options-implied probability - Pretty simple... if p(future_price > current_price) > 0.5, size your long bet :)
In [58]:
Options are fun Kauri Beckmann Published 26/03/2025 kauri.b@outlook.co.nz
In [ ]: