Is it Possible to Know the Daily High or Low Intraday with 80% Accuracy?

Introduction

This is an old concept concerning the opening range. The idea is that the opening range often sets the day’s high or low within the first hour of cash equities trading (9:30 am - 10:30 am EST). Recently a trader on [YouTube] made the claim that you can know with 88% probability the high or low of the day after the first hour of trading. He managed to repopularize the idea of using the opening range, and in a more specific way than other methods.

In this article I set out trying to validate or reject this claim with the available intraday data I have. Ideally, if this claim is true, there should be a methodology or mechanical trading approach to exploit this phenomenon.

This idea seemingly has evidence to support it. In a paper called [Exploratory Trading], Adam D. Clark-Joseph examined the existence and value of exploratory trading models used by HFT participants. He concluded that:

This paper presents empirical evidence that HFTs use exploratory trading to obtain part of the superior information that enables them, among other things, to profitably predict price movements

- Adam D. Clark-Joseph

In the paper, he further suggests that market activity by these specific firms is most active during the NY cash session. Since the majority of the volume occurs in the first 30 minutes to an hour of trading, and HFT firms generate the majority of transaction volume, we can assume that the exploratory algorithms are active in that window and contribute to finding the areas of liquidity. These areas can then act as reference points to estimate the probability of the high or low having already occurred.

Claim Number 1

To reiterate, the primary claim is that by 10:30 each day the high or low of the day has already been priced with 88% probability.

Claim Number 2

It’s unclear to me, but I think the actual claim for the 88% probability rule is based on the rule set he specifies. Using 5 minute bars, by 10:30 am EST the market has created a “defining range”, known in other communities as the “initial balance” or “opening range”. If an M5 (5 minute) candle closes above (below) the “defining range” high (low), then the low (high) of the range should not get traded through with 88% probability.

Experimental Setup

I have 1 minute intraday data for 108 different ETFs, with history starting as early as 2004 and ending in 2019. Based on the data we can calculate an estimate of the probability of each of these claims. To be fair, there are limitations to this research. The data stops at 2019, and his claim was specifically for futures assets on days without major economic news releases. He doesn’t specify which economic news releases to ignore, so I include all the available data.

First I will use SPY as the canonical example as it is most closely related to ES and the primary trading vehicle he uses (ES/SPX).

To answer claim 1, I simply extract all the daily (intraday) highs and lows that occurred in the first hour of trading, sum their lengths and divide by the length of the dataframe.
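As a rough illustration of the claim 1 calculation, here is a minimal sketch in plain Python rather than the dataframe code used for the experiment. It assumes bars arrive as (timestamp, high, low) tuples for the regular session, and it treats a day's high (low) as set in the first hour when the first-hour high (low) equals the session high (low). The function name `first_hour_rate` and the tuple layout are hypothetical. Because highs and lows are counted separately, a day that sets both in the first hour contributes twice.

```python
from collections import defaultdict
from datetime import datetime, time

FIRST_HOUR_END = time(10, 30)  # bars stamped before 10:30 count as first hour


def first_hour_rate(bars):
    """bars: iterable of (timestamp, high, low) tuples for regular-session bars.

    Returns (days whose high was set in the first hour
             + days whose low was set in the first hour) / number of days.
    """
    by_day = defaultdict(list)
    for ts, high, low in bars:
        by_day[ts.date()].append((ts.time(), high, low))
    if not by_day:
        raise ValueError("no bars supplied")

    high_days = low_days = 0
    for day_bars in by_day.values():
        day_high = max(h for _, h, _ in day_bars)
        day_low = min(l for _, _, l in day_bars)
        # Extremes of the first hour only; None if the day has no first-hour bars.
        fh_high = max((h for t, h, _ in day_bars if t < FIRST_HOUR_END), default=None)
        fh_low = min((l for t, _, l in day_bars if t < FIRST_HOUR_END), default=None)
        high_days += fh_high == day_high
        low_days += fh_low == day_low
    return (high_days + low_days) / len(by_day)


# Made-up bars for a single day: the low is set at 9:30, the high comes later.
example = [
    (datetime(2019, 5, 13, 9, 30), 10.0, 9.0),
    (datetime(2019, 5, 13, 11, 0), 12.0, 9.5),
]
print(first_hour_rate(example))
```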

To answer claim 2, I iterate over each day computing the “defining range” statistics and then collecting all the information about first range breakout, direction, and if the other side of the range was broken or if neither side of the range was broken. I also collect the event timestamps for each of these scenarios.
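The per-day classification for claim 2 can be sketched as follows. This is a simplified illustration, not the code behind the results: it omits the event timestamps, and the function name `classify_day` and the (bar_time, high, low, close) layout are hypothetical. It also makes one assumption the rule set leaves open: the opposite side of the range counts as broken if any bar after 10:30 trades through it, whether before or after the confirming close.

```python
from datetime import time

RANGE_END = time(10, 30)  # the defining range is built from bars starting before 10:30


def classify_day(bars):
    """bars: one session of 5 minute bars as (bar_time, high, low, close), in time order.

    Returns "bull", "bear", "both", or "none".
    """
    opening = [b for b in bars if b[0] < RANGE_END]
    rest = [b for b in bars if b[0] >= RANGE_END]
    if not opening:
        raise ValueError("no bars inside the defining range window")

    dr_high = max(h for _, h, _, _ in opening)
    dr_low = min(l for _, _, l, _ in opening)

    # Direction of the first 5 minute close outside the defining range.
    direction = None
    for _, _, _, close in rest:
        if close > dr_high:
            direction = "bull"
            break
        if close < dr_low:
            direction = "bear"
            break
    if direction is None:
        return "none"  # no close outside the range all session

    # Assumption: any trade through the opposite side after 10:30 counts as a break.
    if direction == "bull":
        opposite_broken = any(l < dr_low for _, _, l, _ in rest)
    else:
        opposite_broken = any(h > dr_high for _, h, _, _ in rest)
    return "both" if opposite_broken else direction
```

Running this over every session and tallying the four labels gives the bull, bear, both, and none frequencies.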

SPY Result (Claim 1)

For SPY I had intraday data from May 14, 2004 until May 14, 2019, or 3774 rows. The percentage of time either a high or low for the day was made in the first hour during that period was:

SPY result for first hour high or low

So we can say that, over this period, the claim is mostly true: ~75% of the time SPY did make the high or low in the first hour.

SPY Result (Claim 2)

Before answering, I wanted to provide an example image of the defining range and what it looks like when extended during a trading day.

The image shows 2 consecutive trading days in SPY. The black lines represent the high and low formed during the first hour. The dotted red lines mark the highest open or close and the lowest open or close for the same period. They are essentially the highs and lows of the candle bodies.

We can see in these examples that, on both days, the “defining range” high was broken and the low that formed during that first hour was never broken for the remainder of the day.

Ok now for the results.

The first series is the aggregated data for SPY over the data period. Bull means that the “defining range” high was broken first and the defining range low was in fact the low of the day. Bear is the reverse. Both is the situation where both the defining range high and low were broken in the same session. None is the situation in which price traded within the defining range high and low.

If we add the bull and bear cases, we see that accuracy is approximately 76%. That is close to 80% but well short of the 88% claimed in the video. It’s not a bad result. As stated before, there is room for error: this is SPY ETF data instead of futures, and the data stops at 2019. Depending on the range of dates computed, perhaps the accuracy has improved dramatically over the last 4 years.

We can see that the contribution to the opening range concept is that, once the defining range high or low is broken after the first hour, there is more clarity on which side the probabilities lie.

Bonus - What about the other 107 ETFs?

I ran the same analysis for the other ETFs to see what the aggregate data said.

AGGREGATE First Hour

distribution of the percentage that intraday highs and lows are set in the first hour

The first thing to note here is that the data is obviously skewed, with some ETFs showing over 100%. These are likely ETFs with small ranges and/or low liquidity, where both the high and the low are frequently set in the first hour and never violated for the rest of the session, so those days are counted twice. Due to situations like this, the mean is skewed slightly above 90%.
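To see how a percentage above 100% can arise, here is a small sketch with made-up per-day flags. It compares the double-counted measure with a variant that counts each day at most once. The once-per-day variant is shown only for contrast and is not the measure behind the second chart, and the function name `first_hour_rates` is hypothetical.

```python
def first_hour_rates(day_flags):
    """day_flags: list of (high_in_first_hour, low_in_first_hour) booleans, one per day.

    Returns (double_counted_rate, once_per_day_rate).
    """
    n = len(day_flags)
    # Highs and lows are tallied separately, so a day can contribute 0, 1, or 2.
    double_counted = sum(h + l for h, l in day_flags) / n
    # Each day contributes at most 1, however many extremes it set early.
    once_per_day = sum(1 for h, l in day_flags if h or l) / n
    return double_counted, once_per_day


# Made-up flags: two quiet days set both extremes in the first hour.
flags = [(True, True), (True, True), (True, False), (False, False)]
print(first_hour_rates(flags))
```

With two of the four days setting both extremes early, the double-counted rate is 5/4 while the once-per-day rate is 3/4.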

DISTRIBUTION OF THE PERCENTAGE OF HIGHS AND LOWS SET IN THE FIRST HOUR AFTER CONFIRMATION (REMOVES DOUBLE COUNTING)

In the second chart, we can see the effects of removing double counting. The distribution becomes left skewed and the average is now about 75%, which is still high but far from the 90% seen in the other chart or the 88% claim. It’s interesting how different these distributions are. I think the following example illustrates why this discrepancy arises.

Note SHV.

SHV PERCENT HIGH LOW IN FIRST HOUR (TOP 10)

SHV PERCENT HIGH OR LOW AFTER CONFIRMATION OUTSIDE OF DEFINING RANGE (BOTTOM 10)



SHV is the furthest right on the Percent Highs and Lows First Hour chart and the furthest left on the Percent Highs and Lows After Defining Range chart. This happens as a result of SHV either having incorrect data or not trading on some days. When examining the data, SHV has its daily high and low set at 9:30, or within the first hour, on many days. In the first chart these days are double counted, hence the over 100% “accuracy”. In the second chart these days are moved to the none category, which is the highest for SHV at ~56% of the time.

TOP 10 NONE CATEGORY MEANING NO M5 CLOSES OUTSIDE OF THE DEFINING RANGE (FIRST HOUR).

Thus SHV actually breaks out of the opening range less than half the time.

This brings up another question. Should the none days, where the high and low were set within the first hour and price never traded outside the range, be included in our count of accuracy? I think it depends on your goal or strategy.

For example if we include the none category in our SPY calculation that brings the accuracy to 81% which is much closer to the claimed 88%. Here’s what the distribution looks like if we include the none category for the other ETFs.
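The two accuracy definitions differ only in whether none days count as hits. The sketch below uses hypothetical counts per 100 days, chosen so the totals line up with the approximate SPY percentages quoted here. The split between bull and bear is made up for illustration, since only their sum is reported.

```python
def accuracy(counts, include_none=False):
    """counts: dict with the number of days labelled bull, bear, both, and none."""
    total = sum(counts.values())
    hits = counts["bull"] + counts["bear"]
    if include_none:
        hits += counts["none"]  # days that never closed outside the range
    return hits / total


# Hypothetical counts per 100 days; only bull + bear (~76) and
# bull + bear + none (~81) are taken from the text.
spy_like = {"bull": 40, "bear": 36, "both": 19, "none": 5}
print(accuracy(spy_like))
print(accuracy(spy_like, include_none=True))
```

Under either definition, the both days are the only misses.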

distribution of percentage highs and lows occurring in the first hour including days where both the high and low were set in the same first hour (no double counting).

We can see in this aggregate distribution that the claim of 88% accuracy has validity when including days where both the high and the low were set in the first hour. Interesting result.

Final Thoughts

Before I heard the claim of 88% accuracy of the high or low, I was actually a little unfamiliar with the concept of the opening range and its significance in setting the high or low of the day. The better question now becomes: can we create a mechanical strategy to exploit these probabilities? And if so, what kinds can be created?


Are you interested in the code for this experiment? Would you be interested in joining a Discord that shares ideas, strategies and code? I’m soliciting feedback for an upcoming project. Please let me know in the comments or via email. I’m also available for contract work if you have a project you would like to collaborate on. [bcr@blackarbs.com]