Improving Data-Dependent Parallelism in GPUs Through Programmer-Transparent Architectural Support