SQL and Database: Splitting a column into individual words?

I know I’m asking a lot of questions that really expose just how ignorant I am, but hey, I may as well do it now before I’m expected to know it all, right? I do plan to take some classes, but the earliest class doesn’t start until end of May.

Changing the initial rowcount to 128 caused the query to go from over a minute and a half to about 10 seconds.

So, now I have:
SET ROWCOUNT 128
SET NOCOUNT ON

SELECT IDENTITY (INT, 1, 1) bob
  INTO #tally
  FROM MASTER.sys.all_columns c
 cross join MASTER.sys.all_columns

SET ROWCOUNT 0
SET NOCOUNT ON

(SELECT statement and the joins and stuff I added)

The SELECT IDENTITY part is creating an automatically incrementing number into column bob. How is this the location of the space in the string?
What is sys.all_columns? I’m guessing that sys.all_columns c1 is a way to differentiate the columns when the cross join happens.
Where are we comparing items in the master table to itself?

Again, thanks for all of the help!

I’m reading the article. I love this line:

Just occasionally, usually when you’re
searching through text, or you’ve inherited the work of someone whose major talents lie in other areas of life, you need
other magic.

sys.all_columns is a thing called a view, sort of a virtual table that is defined by a SELECT statement. It’s just a list of all the columns in the database. I regret using it and suggest you lose that version of generating the tally table. It’s information you’ll never need unless you become a DBA someday.

The format for the SUBSTRING function is:
SUBSTRING ( expression ,start , length ) – See the technet description at http://technet.microsoft.com/en-us/library/ms187748.aspx

The format for the CHARINDEX function is:
CHARINDEX ( expressionToFind ,expressionToSearch [ , start_location ] ) – http://technet.microsoft.com/en-us/library/ms186323.aspx

Let’s say you have a table named “S_TABLE” and a varchar column named “VARCHAR_COL”. The query would then be:

select substring(VARCHAR_COL, bob, charindex(' ', VARCHAR_COL + ' ', bob) - bob) as word
  into #temptable
  from #tally, S_TABLE
 where bob <= convert(int, len(VARCHAR_COL))
   and substring(' ' + VARCHAR_COL, bob, 1) = ' '

In the first line, CHARINDEX is used to find the next blank space in VARCHAR_COL starting at bob. If it finds the first blank space at position 5, then that’s the value of bob that it uses. Next in SUBSTRING, it takes the part of the string starting at bob = 1 because SUBSTRING doesn’t know anything about what CHARINDEX is doing, only what it’s returning. So what we end up with is:

select substring(VARCHAR_COL, 1, charindex(' ', VARCHAR_COL + ' ', 1) - 1) as word

or using the value that CHARINDEX returns:

select substring(VARCHAR_COL, 1, 5 - 1) as word

That means that the first substring is the first 4 characters of the string, but it also does this for the next iteration of bob which is 6. It will continue to do this for the entire string as long as the conditions in the WHERE clause are being met. And it isn’t just doing that for the one string, it’s doing it for every value of VARCHAR_COL in the entire table. You can see how bob is being used by running this version of the query:

select bob, substring(VARCHAR_COL, bob, charindex(' ', VARCHAR_COL + ' ', bob) - bob) as word
  from #tally, S_TABLE
 where bob <= convert(int, len(VARCHAR_COL))
   and substring(' ' + VARCHAR_COL, bob, 1) = ' '

I hope that clears things up. If you play with SUBSTRING and CHARINDEX separately, it will help you understand how they work together.

BTW, I hope everyone realizes by this time that there are no stupid questions where SQL is concerned and it’s impossible to show your ignorance about it. Can I get an Amen?

Amen, and boy am I learning a lot! :smiley:

This is odd. Based on @Brazil’s advice, I changed the top portion to:

DECLARE @str_len AS INT = 65

;WITH for_tally (spc_loc) AS
    (SELECT 1 AS spc_loc
    UNION ALL
    SELECT spc_loc + 1 from for_tally WHERE spc_loc < @str_len
    )
    SELECT spc_loc into #tally from for_tally OPTION(MAXRECURSION @str_len)

I changed bob to spc_loc and things to for_tally because that seems to describe them. I chose 65 for str_len because that is the length of the field with the names to be split into words.

Just forget all of my edits until I finish my research. I’m getting closer!''Okay, why am I getting an error saying "Incorrect syntax near ‘@str_len’? The first use has no problems WHERE spc_loc < @str_len, but the second maxrecursion @str_len is not happy. I can’t find anything that would cause this after 15 minutes of searching.

Edit: I read that the default maxrecursion is 100, so I eliminated that line entirely, and the query ran in 4 seconds.

That’s my fault. I was trying to simplify things for you, but I didn’t check carefully enough. It looks like you can’t use a variable when setting the option. I don’t know why, but I’ll try to research it. Use this instead. You’ll have to set the string length in two places instead of one:

;WITH for_tally (spc_loc) AS
    (SELECT 1 AS spc_loc
    UNION ALL
    SELECT spc_loc + 1 from for_tally WHERE spc_loc < 65
) 
SELECT spc_loc into #tally from for_tally OPTION(MAXRECURSION 65)

Per your edit: yes, for what you’re looking at right now, you can forget setting the maxrecursion at all, but if you’re saving all this stuff, be sure to make a note that you’ll have to do more than that for longer strings.

EDIT: I finally had time to look at it some more. The technet page at http://technet.microsoft.com/en-us/library/ms181714.aspx says:
“Specifies the maximum number of recursions allowed for this query. number is a nonnegative integer between 0 and 32767. When 0 is specified, no limit is applied. If this option is not specified, the default limit for the server is 100.”

So just always set maxrecursion to 0 and it’ll take whatever you’ve specified in the WHERE clause. I haven’t found any documentation that says you can’t use a variable, only ugly rumors.