I have a new fun thing I’ve been asked to do. I’m trying to do an analysis of the number of times certain words appear in a field. I’ll need to join with other tables to get groups and counts.
I think the best way to do this would be to create a temp table with all of the words from a field in another table. For example, I need to take the field ‘Company Name’ and break that into a table that would have two entries for that, ie; ‘Company’, ‘Name’.
I found this online, but I don’t have a clue how to use it. Where do I reference the table and field that I’m trying to get the words out of?
I’m not sure that’s going to do what you want, but you can give it a try.
A function is similar to a stored procedure. It’s compiled in SQL server and you can call it within your query. So if you have a table named “table1” with a column named “bubba”, you would call the function like this:
select * from table1 cross apply dbo.SplitWords(bubba)
It looks like all it’s going to return is a big list of the individual words in the column and their positions within the string.
This isn’t going to do anything to our production database like add a permanent new table, is it? Will I need to add a drop table at the end of it all?
Oh cheese! You shouldn’t be doing anything in your production database if you can keep from it. Yes, it’s going to create a permanent function (that can be dropped), but I would advise that you copy your data/database to a development SQL instance to do this. No one should be running ad hoc queries in production.
Gah. I’m not sure how I can do this, then. Woodman’s idea of separating them in Excel worked, but I’m left with a column with about 150,000 rows, and no way to get them into a temp table in SQL server so I can join to other tables for some analysis. Heck, I can’t even find a way to get Excel to tell me how many times these words show up!
Don’t you have a development/test server somewhere? I’m not concerned that you would mess up the data, but queries returning large amounts of data can drag production to its knees. I guess it’s possible that your don’t work in an environment where that’s a concern. I’m working from my own view point here, and we don’t let anyone do things in prod unless we have to, and then only the DBAs get to do that.
Do you have a DBA around who can do a database copy to a sandbox for you? That would be the ideal situation.
I just asked about that. My boss doesn’t seem too concerned about running it in production. He thinks that I should be able to do it all in SQL, so I just get to figure it out. Once I get the query/function to the point where it should work, I’ll run it by him before I actually run it, though, just to be sure.
What you’ll have to do is run the CREATE FUNCTION statement that you found. It will do just that, create a function. Once the function is in SQL Server, you can run your queries. Once you have the results you want, you will drop the function:
Just one note, as it says, the CREATE FUNCTION statement needs to go first in any batch, as does the DROP FUNCTION statement. As ridiculous as it sounds, that is possible using the GO statement
CREATE FUNCTION SplitWords
AS statements
GO
SELECT table.column
FROM table
CROSS APPLY SplitWords(column)
WHERE condition
I don’t know if this helps but I’ve had to deal with a crapton of data exported out by horrible people into Excel. I’ve found ASAP Utilites to be very handy in cleaning up the data into something manageable. I know most of the functions can be done through Excel but I just don’t have the time to fiddle with it. It makes it quick and easy (to a point).
If you go the Excel route, or just want to play with what you already have exported, try Pivot Tables. I think that will get you what you want - how many of each word are in a column. I’m not the office Excel guru, but have seen him do nifty things with it.
Disclaimer: I hate that I’m doing this. It’s inelegant and embarrassing, but I’m kind of pressed for time this morning. This will at least keep you from having to install a function. This uses a beast called a cursor that operates on a thing called RBAR (row by agonizing row). Not good. Below there’s a line that reads “declare feed cursor for”. The next line is where you’ll enter the column you’re looking at and the table where it exists. This will only give you a list of the words it finds and their positions, but it’s a temp table and you can do things with it:
create table #words (pos int, value varchar(max))
truncate table #words
set nocount on
DECLARE
@pos int,
@i int,
@j int,
@s varchar(max), @text varchar(max)
declare feed cursor for
***select < column > from < table >***
open feed
fetch next from feed into @text
while @@fetch_status = 0
begin
SET @pos = 1
WHILE @pos <= LEN(@text)
BEGIN
SET @i = CHARINDEX(' ', @text, @pos)
SET @j = CHARINDEX(',', @text, @pos)
IF @i > 0 OR @j > 0
BEGIN
IF @i = 0 OR (@j > 0 AND @j < @i)
SET @i = @j
IF @i > @pos
BEGIN
-- @i now holds the earliest delimiter in the string
SET @s = SUBSTRING(@text, @pos, @i - @pos)
INSERT INTO #words
VALUES (@pos, @s)
END
SET @pos = @i + 1
WHILE @pos < LEN(@text)
AND SUBSTRING(@text, @pos, 1) IN (' ', ',')
SET @pos = @pos + 1
END
ELSE
BEGIN
INSERT INTO #words
VALUES (@pos, SUBSTRING(@text, @pos, LEN(@text) - @pos + 1))
SET @pos = LEN(@text) + 1
END
END
fetch next from feed into @text
end
close feed
deallocate feed
select * from #words
Edit: Sorry, I don’t know what’s going on with the font.
(edit by lee - I got dis!)
Edit: Thanks, that looks so much better (saving this in case I need to do something similar later)