Architectural Support For Efficient On-Chip Parallel Execution