Blog

Using SAS Hash Objects to Solve a Complicated Workload Issue

07/25/2016

On a recent project, I was challenged with automating a task assignment and workload balancing process. I had foundation SAS 9.4 available to me as my tool kit for developing an optimal solution. It was the perfect opportunity to take advantage of the powerful SAS data step hash and hash iterator objects. Before diving into the coding approach, let’s first take a look at the use case.

Use Case Background

Boiled down to the essentials and for the sake of demonstration, the automated task assignment flowed like this: retrieve a task, assign it to the most available agent, and if there are no agents available then assign it as an overflow to the most available group of agents.

There were several types of tasks and each had an associated time cost. For example, task type 1 = 2 hours to complete, task type 2 = 4 hours, task type 3 = 1 hour and task type 4 = 2 hours.

There were also about 50 agents who belong to workgroups. Each agent had a maximum of 160 total number of allocated hours for the month. But if, for example, an agent were on vacation that number would be less. An agent’s availability was computed as:

The group’s availability is computed using the same equation with the hours summed over all agents in the group.

This formula made it easy to compare agent to agent and group to group when determining most available. The higher the value, the more available. The availability represented workload. The ultimate goal was to make sure all agents and groups had approximately the same relative workload, ensuring that new tasks were fairly assigned in a balanced fashion.

Sounds simple enough, so what’s the challenge?

With each new task assigned, the most available agent must be re-determined. Each time you loop through the process you must re-sort by descending availability so the most available agent pops to the top. That’s potentially a lot of sorting.

Let’s take, for example, a set of 19 agents. Each belongs to a group, has allocated hours for the month, and is starting the process with no open tasks assigned. With the exception of Joyce who is on leave for the month, everyone is 100% available.

sas hash code example

We need to equitably assign hundreds of new tasks to these agents:

Each task type will reduce an agent’s available hours by a different amount:

Task ID_1 is read in and assigned to the first available agent, William. His available hours are reduced and his availability is recalculated. The agents are re-sorted so the agent with the most availability is first and ready for the next task assignment.

After assigning the first 18 new tasks you can see how much the order of agents by availability has changed.

How can this iterative sort-and-assign logic be efficiently coded in Base SAS?

Hash Iterator Object Made It Solution Simple

A single data step using the hash and hash iterator objects was a natural fit. The trick is to declare “availability” as the hash key and use the “ordered:’D’” option so the iterator object returns data in descending key-value order. That way the hash iterator object first method retrieves the most available agent. The hash key-value is recalculated with each record read from the new tasks data set but you cannot update a hash key on-the-fly. Therefore, as each new task is read in and assigned, update the availability value to a corresponding persistent hash table defined with a static hash key-value, ie agent name (or group name). Then, use the hash object output method to write values from that hash table to a SAS data set which is then re-loaded into the ordered availability-keyed hash table at the top of the data step.

Below is the code for this basic automatic assignment and workload balancing logic. The hash object approach handles the iterative sorting concisely and executes reasonably quickly. The actual logic I implemented was more complicated but processed and assigned 2,000 new task records in about 30 seconds and half a million in about an hour.

The hash object is a particularly good solution for this use case because there is a relatively small set of agents and groups. The larger the set to load into a hash object, the more memory required. For an in-depth explanation of the hash and hash iterator objects, Paul Dorfman’s Data Step Hash Objects as Programming Tools tutorial is an excellent place to start.

options nonotes; 
/* the hash output method generates too many notes */
data work.new_assignments(keep=task_id agent group task 
reason_assigned taskHours);  
length agent $60 group $200 
available_hours allocated_hours availability group_available_hours         
group_allocated_hours group_availability 8          
taskType $8 taskHours 8         
reason_assigned $200;  

/* hash table of agents sorted by descending available_hours - this is reloaded after processing each task     
because the hash key value is updated */  
dcl hash agents(dataset:"work.agents", ordered:"D");  

/* load by available hours in descending order */  
dcl hiter i_agents("agents");  
rc=agents.defineKey("availability","agent","group");  
rc=agents.defineData("availability","allocated_hours","available_hours","agent","group");  
rc=agents.defineDone();  

/* hash table of groups sorted by descending group_available_hours - this is reloaded after processing each     
task because the hash key value is updated */  
dcl hash Groups(dataset:"work.groups", ordered:"D");  
/* load by available hours in descending order */  
dcl hiter i_Groups("groups");  
rc=Groups.defineKey("group_availability","group");  
rc=Groups.defineData("group_availability","group_available_hours","group_allocated_hours","group");  
rc=Groups.defineDone();  if _n_=1 then do; 
/* initialize hash tables which persist over each data step iteration */    
/* hash table to hold updated agent available_hours */    
dcl hash updateAgents(dataset:"work.agents");     
dcl hiter i_updateAgents("updateAgents");    
rc=updateAgents.defineKey("agent","group");    
rc=updateAgents.defineData("agent","group","available_hours","allocated_hours","availability");    
rc=updateAgents.defineDone();    
/* hash table to hold updated group available_hours */    
dcl hash updateGroups(dataset:"work.groups");    
dcl hiter i_updateGroups("updateGroups");    
rc=updateGroups.defineKey("group");    
rc=updateGroups.defineData("group","group_available_hours","group_allocated_hours","group_availability");    
rc=updateGroups.defineDone();    
/* taskHours is used as a look up table */    
dcl hash _taskHours(dataset:"wl.taskHours");    
rc=_taskHours.defineKey("taskType");    
rc=_taskHours.defineData("taskHours");    
rc=_taskHours.defineDone();    
call missing(agent, group, allocated_hours, taskType,taskHours,                  
    group_allocated_hours, group_availability);  
end; 
/* initialize hash tables which persist over each data step iteration */  
set wl.new_tasks end=lastrec;  
/* look up number of hours to complete the task */  
taskType=task;  rc=_taskHours.find();  
/* get agent with most availability */  
rc=i_agents.first();  
/* if the agent's availability is greater than 0 then assign to agent */  
if availability > 0 then do;    
/* apply task hours to available hours*/    
available_hours=available_hours-taskHours; 
/* permits available_hours go negative  */    
availability=available_hours/allocated_hours;    

/* output updates to task data set */    
reason_assigned='task assigned to most available agent';    
/* put new hours available for this agent to the update table */    
rc=updateAgents.replace();    
/* put new hours available for this group to the update table */    
rc=updateGroups.find();    

group_available_hours=group_available_hours-taskHours;    
group_availability=group_available_hours/group_allocated_hours;    
rc=updateGroups.replace();    
output work.new_assignments;  
end;  

else do;  
/* assign to group */     

agent="";    
reason_assigned='task assigned to most available group';    
output work.new_assignments;  
end;  

/* output updated data to work.agents so can be reloaded into the agents hash table in descending order by      
available_hours on the next iteration of the data step */  

rco=updateAgents.output(dataset:"work.agents");  

/* clear agents hash table so is ready for reloading */  

rc=i_agents.last();  
rc=i_agents.next();  
rc=agents.clear();  

/* output updated data to work.groups so can be reloaded into the groups hash table in descending order by
     group_available_hours on the next iteration of the data step */  

  rco=updateGroups.output(dataset:"work.groups");  

/* clear group hash table so is ready for reloading */  

  rc=i_groups.last();
  rc=i_groups.next();  
  rc=groups.clear();
run;

options notes;

…..