An extensible and scalable Pilot-MapReduce framework for data intensive applications on distributed cyberinfrastructure