Implement ia64 optimized mutex primitives. It properly uses
acquire/release memory ordering semantics in lock/unlock path.
2nd version making them all static inline functions.
Signed-off-by: Ken Chen <kenneth.w.chen@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com>