SQL and Database: Cursors and running totals?

Here’s my latest assignment. Take total number of orders for the last three months, divide the number by 4, and divide the account numbers into those four batches. It’s been suggested that I use a cursor to get this to happen.

So far, I have this:

SELECT account_#
,COUNT (DISTINCT Order#) 'Order Count'

FROM table_1

WHERE order_date BETWEEN DATEADD(M, - 3, CURRENT_TIMESTAMP) AND CURRENT_TIMESTAMP

GROUP BY account_#
ORDER BY account_#

From what I’ve read, I would add something like this to the beginning?

DECLARE @cursorname INSENSITIVE CURSOR
FOR 

But how do I get it to stop at 1/4 of the orders and start a new batch?

First you’ll have to get a count of all the account numbers that placed orders for the three month period and divide that number by four:

declare @incr_accts int
select @incr_accts = count(distinct account_#) / 4
  from table_1
 where order_date between dateadd(M, -3, current_timestamp) and current_timestamp

As it runs through the cursor, you’ll keep a running count of each account number it processes. When that count reaches @incr_accts, you’ll reset it to 0.

I have to say that I’m not a fan of cursors. Sometimes there’s no other way than to put things through a loop, but I think there’s probably a more elegant way of doing this. I’ll think about it a bit.

Well, this ain’t all that elegant, but it keeps you from having to loop through a cursor:

create table #repos (rownum int not null, acct_num varchar(128) not null, order_count int not null, batch int null)
insert #repos (rownum, acct_num, order_count)
select row_number() over( order by account_#), account_#, count(Order#)
  from table_1
 where order_date BETWEEN DATEADD(M, - 3, CURRENT_TIMESTAMP) AND CURRENT_TIMESTAMP
 group by account_#
 order by account_#

declare @batch int
select @batch = count(*) / 4 from #repos

update #repos set batch = 1 where rownum <= @batch
update #repos set batch = 2 where rownum <= (@batch * 2) and batch is null
update #repos set batch = 3 where rownum <= (@batch * 3) and batch is null
update #repos set batch = 4 where batch is null


select sum(order_count), batch
  from #repos 
 group by batch

The way it works is to create a temporary table with the account numbers and the number of orders for each account. We use ROW_NUMBER() to populate the “rownum” column. We then get a count of the number of rows in the temp table and divide it by 4. Using that number, we set the “batch” column to be 1, 2, 3, or 4. Then you can get a total or orders for each batch. Is that what you were looking for?

EDIT: Forgot to put in your WHERE clause. Fixed now.

If a cursor is the first approach attempted/suggested for a SQL solution, it’s probably wrong.

Cursors are useful tools, when called for. 75% of the time when people use cursors, they’re not called for, and a set-based operation (did we already have the “SQL is a declarative language” and “to do SQL right, you have to think in sets, which is different from what most people are accustomed to” conversation yet?) will be much better.

Haven’t looked at the actual question yet, nor @Brazil’s answer, but I wanted to throw that out there.

Eleganter:

create table #repos (rownum int not null, acct_num varchar(128) not null, order_count int not null, batch int null)

declare @batch int
select @batch = count(distinct account_#) / 4 from table_1

insert #repos (rownum, acct_num, order_count, batch)
select row_number() over( order by account_#), account_#, count(Order#),
    case 
    when row_number() over( order by account_#) <= @batch then 1
    when row_number() over( order by account_#) <= (@batch * 2) then 2
    when row_number() over( order by account_#) <= (@batch * 3) then 3
    else 4 end
  from table_1
 where order_date BETWEEN DATEADD(M, -3, CURRENT_TIMESTAMP) AND CURRENT_TIMESTAMP
 group by account_#
 order by account_#


select sum(order_count), batch
  from #repos 
 group by batch

I love you guys!

@Brazil, I was kind of thinking of a case statement when it was described to me but wasn’t sure how to do it. I think you just saved me hours of research.

I probably won’t get to test it until tomorrow, but I look forward to playing with it then.

In the cold light of a new dawn, I see it could have been done better:

declare @batch int
select @batch = count(distinct account_#) / 4 from table_1

;with repos (rownum, acct_num, order_count, batch) as
(
select row_number() over( order by account_#), account_#, count(Order#),
    case 
    when row_number() over( order by account_#) <= @batch then 1
    when row_number() over( order by account_#) <= (@batch * 2) then 2
    when row_number() over( order by account_#) <= (@batch * 3) then 3
    else 4 end
  from table_1
where order_date BETWEEN DATEADD(M, -3, CURRENT_TIMESTAMP) AND CURRENT_TIMESTAMP
group by account_#
)
select sum(order_count), batch
  from repos 
 group by batch
order by batch

This uses a CTE to eliminate the temporary table. It should run more efficiently.

sigh

I need to get my running total right before I can play with @Brazil’s stuff. I’ve got running totals, but I need to get running totals ordered by the account numbers. The account numbers aren’t sequential, though, so I haven’t been able to figure out how to do it without the sequential key.

------------------------------
--Specimen count by Agency into temp table #Spec_Count
------------------------------
CREATE TABLE #Spec_Count
(
ID INT IDENTITY (1,1)
,Agency_Code INT
,Spec_Count INT
)

INSERT INTO #Spec_Count (Agency_Code,Spec_Count)
(SELECT S.Agency_Code
,COUNT (DISTINCT S.Specimen_ID)

FROM dbo.REPORT_LOG AS RL
INNER JOIN dbo.SPECIMEN AS S
ON RL.Specimen_Id = S.Specimen_ID

WHERE RL.Report_Generate_Date BETWEEN DATEADD(M, - 3, CURRENT_TIMESTAMP) AND CURRENT_TIMESTAMP

GROUP BY S.Agency_Code
)
------------------------------
--View temp table
------------------------------
SELECT *
FROM #Spec_Count
------------------------------
--Running Total
------------------------------
;WITH 
CTE_RunningTotal
AS
(
    SELECT SC.ID, SC.Agency_Code, SC.Spec_Count, SC.Spec_Count AS running_total
    FROM #Spec_Count AS SC
    WHERE SC.ID = 1
    UNION all
    SELECT SC.ID, SC.Agency_Code, SC.Spec_Count, SC.Spec_Count + C.running_total AS running_total
    from CTE_RunningTotal AS C
        inner join #Spec_Count AS SC ON SC.ID = C.ID + 1
)
SELECT C.Agency_Code, C.Spec_Count, C.running_total
FROM CTE_RunningTotal AS C
OPTION (maxrecursion 0) 

I think I see what you’re trying to do. Maybe. Try running your CTE like this:

;WITH 
CTE_RunningTotal (ID, Agency_Code, Spec_Count, running_total)
AS
(
    SELECT SC.ID, SC.Agency_Code, SC.Spec_Count, SC.Spec_Count AS running_total
    FROM #Spec_Count AS SC
    WHERE SC.ID = 1
    UNION all
    SELECT SC.ID, SC.Agency_Code, SC.Spec_Count, SC.Spec_Count + C.running_total AS running_total
    from #Spec_Count AS C
        inner join #Spec_Count AS SC ON SC.ID = C.ID + 1
)
SELECT Agency_Code, Spec_Count, running_total
FROM CTE_RunningTotal
OPTION (maxrecursion 0)

Hmmm… Not quite. I need to order by agency_code and have a running total on that rather than the ID. For example, if I run the CTE as is, I get something like this;

Agency_Code    Spec_Count    running_total
3581     1    1
123627    2     3
20561     2     5
16199     5     10
110677     15     25

If I run it with an order by Agency_Code, I get something like this;

Agency_Code    Spec_Count    running_total
20                      29                  919451
31                      7                   972032
41                      292                 118999
44                      165                 499448
50                      117                 1094451

I need the running total by agency code.

Oh. The only thing I see is where you’re populating #Spec_Count. Try ordering the agency_code when you’re doing that:

INSERT INTO #Spec_Count (Agency_Code,Spec_Count)
SELECT S.Agency_Code
,COUNT (DISTINCT S.Specimen_ID)
FROM dbo.REPORT_LOG AS RL
INNER JOIN dbo.SPECIMEN AS S
ON RL.Specimen_Id = S.Specimen_ID

WHERE RL.Report_Generate_Date BETWEEN DATEADD(M, - 3, CURRENT_TIMESTAMP) AND CURRENT_TIMESTAMP

GROUP BY S.Agency_Code
Order by S.Agency_Code

That way your agency_codes will be smallest-to-largest and your ID identity will be sequential.

Tried that. Got error "Incorrect syntax near the keyword ‘ORDER’. :frowning:

This is just speculation, but in your original query, you had parentheses around the SELECT statement. I didn’t put them in the rewrite, but I suspect you still have either the opening or closing member somewhere. That’s about all I can think that might give that kind of error.

やった!

That was it!

This now works the way I want it to. It gives me a running total by agency_code. Now that I have this, I can start working the case statement to give me the first agency_code where the running_total is <= 1/4 of the total specimen count. :smiley:

------------------------------
--Specimen count by Agency into temp table #Spec_Count
------------------------------
CREATE TABLE #Spec_Count
(
ID INT IDENTITY (1,1)
,Agency_Code INT
,Spec_Count INT
)

INSERT INTO #Spec_Count (Agency_Code,Spec_Count)
(SELECT S.Agency_Code
,COUNT (DISTINCT S.Specimen_ID)

FROM dbo.REPORT_LOG AS RL WITH (NOLOCK)
INNER JOIN dbo.SPECIMEN AS S WITH (NOLOCK)
ON RL.Specimen_Id = S.Specimen_ID

WHERE RL.Report_Generate_Date BETWEEN DATEADD(M, - 3, CURRENT_TIMESTAMP) AND CURRENT_TIMESTAMP

GROUP BY S.Agency_Code
)
ORDER BY S.Agency_Code
------------------------------
--Running Total
------------------------------
;WITH 
CTE_RunningTotal
AS
(
    SELECT SC.ID, SC.Agency_Code, SC.Spec_Count, SC.Spec_Count AS running_total
    FROM #Spec_Count AS SC
    WHERE SC.ID = 1
    UNION all
    SELECT SC.ID, SC.Agency_Code, SC.Spec_Count, SC.Spec_Count + RT.running_total AS running_total
    from CTE_RunningTotal AS RT
        inner join #Spec_Count AS SC ON SC.ID = RT.ID + 1
)
SELECT RT.Agency_Code, RT.Spec_Count, RT.running_total
FROM CTE_RunningTotal AS RT
OPTION (maxrecursion 0)

If anyone is interested, I got the thing to work, if not completely elegantly for the batches.
The point of this mess was to get the total number of orders, then find out blocks of account numbers that made up 1/4 of the total order number. I’m going to work to get the last four blocks into a single query with batch number. Apparently the WHEN part of a case statement only works with an =?

------------------------------
--Order count by Agency into temp table #Order_Count
------------------------------
CREATE TABLE #Order_Count
(
ID INT
,IDENTITY (1,1)
,Acct_Num INT
,Order_Count INT
)
   INSERT INTO #Order_Count (Acct_Num,Order_Count)
  (
      SELECT S.Acct_Num
     ,COUNT (DISTINCT S.Order_ID)
  
     FROM dbo.REPORT_LOG AS RL 
     INNER JOIN dbo.ORDER AS S
     ON RL.Order_Id = S.Order_ID
    
     WHERE RL.Report_Generate_Date BETWEEN
     DATEADD(M, - 3, CURRENT_TIMESTAMP) AND CURRENT_TIMESTAMP
 
     GROUP BY S.Acct_Num
 )
    
    ORDER BY S.Acct_Num
    
  ------------------------------
  --View #Order_Count table
  ------------------------------
  --SELECT *
  --FROM #Order_Count
  ------------------------------
  --Running Total
  ------------------------------
  ;WITH 
  CTE_RunningTotal
  AS
  (
        SELECT SC.ID, SC.Acct_Num, SC.Order_Count, SC.Order_Count AS running_total
        FROM #Order_Count AS SC
        WHERE SC.ID = 1
        UNION all
        SELECT SC.ID, SC.Acct_Num, SC.Order_Count, SC.Order_Count + RT.running_total AS running_total
    
        FROM
        CTE_RunningTotal AS RT
        INNER JOING#Order_Count AS SC ON SC.ID = RT.ID + 1
)
SELECT RT.ID, RT.Acct_Num, RT.Order_Count, RT.running_total
INTO #RunningTotal 
    
FROM CTE_RunningTotal AS RT
    
OPTION (maxrecursion 0)
------------------------------
--View #RunningTotal table
------------------------------
--SELECT * 
--FROM #RunningTotal
------------------------------
--Declare batch variable
------------------------------
DECLARE @batch INT
SELECT @batch = (SELECT COUNT (DISTINCT S.Order_ID)
   
    FROM dbo.REPORT_LOG AS RL
    INNER JOIN dbo.ORDER AS S
    ON RL.Order_Id = S.Order_ID
    
    WHERE RL.Report_Generate_Date BETWEEN
    DATEADD(M, - 3, CURRENT_TIMESTAMP) AND CURRENT_TIMESTAMP)/4
    
------------------------------
--Agency numbers by Batch 1
------------------------------
SELECT MIN(RT.Acct_Num) AS beg_acct#
,MAX(RT.Acct_Num) AS end_acct#
    
FROM #RunningTotal AS RT
    
WHERE RT.running_total <= @batch
------------------------------
--Agency numbers by Batch 2
------------------------------
SELECT MIN(RT.Acct_Num) AS beg_acct#
,MAX(RT.Acct_Num) AS end_acct#
    
FROM #RunningTotal AS RT
    
WHERE RT.running_total <= (@batch*2)
AND RT.running_Total >= @batch
------------------------------
--Agency numbers by Batch 3
------------------------------
SELECT MIN(RT.Acct_Num) AS beg_acct#
,MAX(RT.Acct_Num) AS end_acct#

FROM #RunningTotal AS RT
    
WHERE RT.running_total <= (@batch*3)
AND RT.running_Total >= (@batch*2)
------------------------------
--Agency numbers by Batch 4
------------------------------
SELECT MIN(RT.Acct_Num) AS beg_acct#
,MAX(RT.Acct_Num) AS end_acct#
    
FROM #RunningTotal AS RT
    
WHERE RT.running_total <= (@batch*4)
AND RT.running_Total >= (@batch*3)
------------------------------
--Data verification check - Count distinct Orders
------------------------------
--SELECT COUNT (DISTINCT S.Order_ID)

--FROM dbo.REPORT_LOG AS RL
--INNER JOIN dbo.ORDER AS S
--ON RL.Order_Id = S.Order_ID
     
--WHERE RL.Report_Generate_Date BETWEEN DATEADD(M, - 3, CURRENT_TIMESTAMP) AND CURRENT_TIMESTAMP
------------------------------
--Delete temporary tables
------------------------------
DROP TABLE #Order_Count
DROP TABLE #RunningTotal

That’s not necessarily true. What’s your CASE statement look like that it isn’t working?