A Survey to Fix the Threshold and Implementation for Detecting Duplicate Web Documents