SQL and Database: Splitting a column into individual words?

I have a new fun thing I’ve been asked to do. I’m trying to do an analysis of the number of times certain words appear in a field. I’ll need to join with other tables to get groups and counts.

I think the best way to do this would be to create a temp table with all of the words from a field in another table. For example, I need to take the field ‘Company Name’ and break that into a table that would have two entries for that, ie; ‘Company’, ‘Name’.

I found this online, but I don’t have a clue how to use it. Where do I reference the table and field that I’m trying to get the words out of?

Any help would be appreciated!

Export to Excel, countif.

I’m done. Maybe a Vlookup in there.

I’m not sure that’s going to do what you want, but you can give it a try.

A function is similar to a stored procedure. It’s compiled in SQL server and you can call it within your query. So if you have a table named “table1” with a column named “bubba”, you would call the function like this:

select * from table1 cross apply dbo.SplitWords(bubba)

It looks like all it’s going to return is a big list of the individual words in the column and their positions within the string.

So, it would look something like this?
SELECT *
FROM table_name
WHERE condition

CROSS APPLY @words [that looks like what the table is named in the function] (column_name)

[Copy function here]

Edit: A big list of the individual words is exactly what I need.

It would be more like:

select * from table_name
cross apply splitwords(columnname)
where condition.

And you don’t have to select *. Just grab some column that will tell you which row it came from.

Edit: I didn’t explain that very well:
select t.col1, sw.*
from table_name t
cross apply splitwords(columnwithtext) sw
where (some sort of condition)

I got an error saying "CREATE FUNCTION’ must be the first statement in a query batch.

I have:
SELECT table.column
FROM table

CROSS APPLY SplitWords(column)

[pasted create function here]

WHERE condition

This isn’t going to do anything to our production database like add a permanent new table, is it? Will I need to add a drop table at the end of it all?

Oh cheese! You shouldn’t be doing anything in your production database if you can keep from it. Yes, it’s going to create a permanent function (that can be dropped), but I would advise that you copy your data/database to a development SQL instance to do this. No one should be running ad hoc queries in production.

Gah. I’m not sure how I can do this, then. Woodman’s idea of separating them in Excel worked, but I’m left with a column with about 150,000 rows, and no way to get them into a temp table in SQL server so I can join to other tables for some analysis. Heck, I can’t even find a way to get Excel to tell me how many times these words show up!

Don’t you have a development/test server somewhere? I’m not concerned that you would mess up the data, but queries returning large amounts of data can drag production to its knees. I guess it’s possible that your don’t work in an environment where that’s a concern. I’m working from my own view point here, and we don’t let anyone do things in prod unless we have to, and then only the DBAs get to do that.

Do you have a DBA around who can do a database copy to a sandbox for you? That would be the ideal situation.

I just asked about that. My boss doesn’t seem too concerned about running it in production. He thinks that I should be able to do it all in SQL, so I just get to figure it out. Once I get the query/function to the point where it should work, I’ll run it by him before I actually run it, though, just to be sure.

If they’re okay with it…

What you’ll have to do is run the CREATE FUNCTION statement that you found. It will do just that, create a function. Once the function is in SQL Server, you can run your queries. Once you have the results you want, you will drop the function:

DROP FUNCTION SplitWords.

It’ll be like it never existed.

Thanks for all of the help. It’s the end of my work day, so I’ll run it by him tomorrow.

Just one note, as it says, the CREATE FUNCTION statement needs to go first in any batch, as does the DROP FUNCTION statement. As ridiculous as it sounds, that is possible using the GO statement

CREATE FUNCTION SplitWords
AS
statements

GO

SELECT table.column
FROM table
CROSS APPLY SplitWords(column)
WHERE condition

GO

DROP FUNCTION SplitWords

If you have a specific string you are looking for you can do =countif(A:A,“string”)

Assuming your data is in A. Or you could even do a subtotal and count them that way.

The way most of my cow-orkers would do it is to sort them then highlight count them.

I need to count the occurrence of all of them. Highlight and count just doesn’t seem practical for 150,000 rows…

I don’t know if this helps but I’ve had to deal with a crapton of data exported out by horrible people into Excel. I’ve found ASAP Utilites to be very handy in cleaning up the data into something manageable. I know most of the functions can be done through Excel but I just don’t have the time to fiddle with it. It makes it quick and easy (to a point).

If you go the Excel route, or just want to play with what you already have exported, try Pivot Tables. I think that will get you what you want - how many of each word are in a column. I’m not the office Excel guru, but have seen him do nifty things with it.

Pivot tables only work if you have more than one column…

Disclaimer: I hate that I’m doing this. It’s inelegant and embarrassing, but I’m kind of pressed for time this morning. This will at least keep you from having to install a function. This uses a beast called a cursor that operates on a thing called RBAR (row by agonizing row). Not good. Below there’s a line that reads “declare feed cursor for”. The next line is where you’ll enter the column you’re looking at and the table where it exists. This will only give you a list of the words it finds and their positions, but it’s a temp table and you can do things with it:

create table #words (pos int, value varchar(max))
truncate table #words

set nocount on
   DECLARE
      @pos int,
      @i int,
      @j int,
      @s varchar(max), @text varchar(max)

declare feed cursor for
    ***select < column > from < table >***

open feed

fetch next from feed into @text
while @@fetch_status = 0
begin

   SET @pos = 1
   WHILE @pos <= LEN(@text) 
   BEGIN 
      SET @i = CHARINDEX(' ', @text, @pos)
      SET @j = CHARINDEX(',', @text, @pos)
      IF @i > 0 OR @j > 0
      BEGIN
         IF @i = 0 OR (@j > 0 AND @j < @i)
            SET @i = @j

         IF @i > @pos
         BEGIN
            -- @i now holds the earliest delimiter in the string
            SET @s = SUBSTRING(@text, @pos, @i - @pos)

            INSERT INTO #words
            VALUES (@pos, @s)
         END 
         SET @pos = @i + 1

         WHILE @pos < LEN(@text) 
            AND SUBSTRING(@text, @pos, 1) IN (' ', ',')
            SET @pos = @pos + 1 
      END 
      ELSE 
      BEGIN 
         INSERT INTO #words 
         VALUES (@pos, SUBSTRING(@text, @pos, LEN(@text) - @pos + 1))

         SET @pos = LEN(@text) + 1 
      END 
   END 

    fetch next from feed into @text
end
close feed
deallocate feed

select * from #words

Edit: Sorry, I don’t know what’s going on with the font.

(edit by lee - I got dis!)

Edit: Thanks, that looks so much better (saving this in case I need to do something similar later)