Hmmm. I have some explaining to do here. The term “rowcount” is misleading you. All we’re using it for is to populate the tally table. Now, forget about rows because it’s pretty meaningless from this point on…
The 12000 sets the highest character position the tally table can reach, i.e., how far into the string we can look for the beginning of each word. You didn’t say anything about how long the strings were going to be, and I picked 12000 because I couldn’t imagine you’d be checking a string longer than 12000 characters and figured that would be overkill. If all you have in a string is four words, I totally nailed it (yay me), unless you have some excruciatingly long words.
When you set rowcount = 0, you basically told SQL Server to populate the tally table with every row it could produce. So if “master.sys.all_columns” has 6313 rows, the cross join populates the tally table with 6313 * 6313 = 39,853,969 values. That’s the Cartesian product I was talking about, and here I thought 12000 was overkill.
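For reference, the first (rowcount-based) tally build looked roughly like this; it’s a sketch from memory, and the exact system view used is an assumption. The point is how SET ROWCOUNT caps, or fails to cap, the cross join:

```sql
-- Sketch of the old ROWCOUNT-based tally build, for comparison only.
-- SET ROWCOUNT 12000 stops the insert after 12000 rows; SET ROWCOUNT 0
-- removes the cap, so the cross join runs to the full Cartesian product
-- (6313 * 6313 rows if all_columns has 6313 rows).
set rowcount 12000

select identity(int, 1, 1) as bob
into #tally
from master.sys.all_columns a
cross join master.sys.all_columns b

set rowcount 0   -- always reset, or the cap applies to every later query too
```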
So, let’s throw away the first way I sent for building the tally table and use the other one so we don’t have the word “rowcount” throwing us a curve:
-- We're going to use this variable to tell how long the string
-- data is (previously, it was 12000)
declare @str_len int = 128
with things(bob) as
(select 1 as bob
union all
select bob + 1 from things where bob < @str_len
)
select bob into #tally from things option (maxrecursion 0)
-- Note: MAXRECURSION has to be a literal, not a variable, so we can't say
-- "maxrecursion @str_len" here. 0 means "no recursion limit"; the
-- "where bob < @str_len" in the CTE is what actually stops it at 128.
-- (Without the hint, the default limit of 100 would make this fail.)
-- That has populated the tally table with 128 rows and it is plenty big enough
-- unless we have a string that is longer than that. Now, we will use it to split
-- the string data. Do not forget to keep forgetting about rows.
-- Note that we are looking here at the "substring" of the string data. "bob" is
-- giving substring the starting position of each word -- the character right
-- after each blank space, which is what the WHERE clause below picks out.
select substring(<column>, bob, charindex(' ', <column> + ' ', bob) - bob) as word
into #temptable
from #tally, <table>
where bob <= convert(int, len(<column>))
and substring(' ' + <column>, bob, 1) = ' '
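To make the placeholders concrete, here’s a tiny end-to-end example. The #strings table and its col column are made up for illustration; they stand in for your real table and column:

```sql
-- Hypothetical demo table standing in for <table> / <column>
create table #strings (col varchar(128))
insert into #strings values ('the quick brown fox'), ('jumps over')

-- Same splitter as above, with the placeholders filled in.
-- For each position bob where a word starts, substring() reads from bob
-- up to (but not including) the next blank found by charindex().
select substring(col, bob, charindex(' ', col + ' ', bob) - bob) as word
into #temptable
from #tally, #strings
where bob <= convert(int, len(col))
and substring(' ' + col, bob, 1) = ' '

-- #temptable now holds one row per word: the, quick, brown, fox, jumps, over
```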
It takes the string data of the entire table as a whole and splits every string in one pass. As dakboy said, “the query resulted in a massive improvement (we’re talking 8 minutes down to 90 seconds, and a huge reduction in I/O counts & a much improved execution plan).” That’s because it processes the data as a set rather than one row at a time, and it doesn’t care how many rows there are: 1, 2, 12000, 15 million. It doesn’t make any difference.
I hope that clears things up a bit better. From now on when I post some code, I’ll try to remember to explain it fully. If there’s anything that you’re not sure about, sing out.