Spent some quality time reading the native code behind java.lang.Object, particularly checking how synchronization is implemented (really complicated code, and critically important for the whole platform as you can imagine).
It is implemented like this (a ton of details left out): of the 8 bytes of the object header, 4 of them are interpreted, if the object is used as a lock, as a pointer to an ObjectMonitor, basically a queue of waiting threads. The implied CAS operation associated with entering or exiting a synchronized block is targeted (unless I misread it) at a word _inside_ the bytes of the Object itself, not at the linked structure. This basically means that if you have a "locks" Object array (initialized with "new Object()" in a loop), and you grab one lock from one thread and a neighboring lock from another, these are likely to interfere with one another: the CASes hit the same cache line, and one succeeding may spuriously cause the other to fail and retry. ReentrantLock is less susceptible to this problem, since it weighs in at 40 bytes instead of 8, but still. All of this is fixable through padding.
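A minimal sketch of the padding fix, under the assumption of a 64-byte cache line (the class and field names are mine, purely illustrative). Since an array of references doesn't control where the lock objects themselves land on the heap, the trick is to make each lock object itself big enough that two of them can't share a cache line:

```java
public class LockPadding {
    // Unpadded: Objects allocated back-to-back in a loop often end up
    // adjacent on the heap, so their header words (the CAS targets)
    // can land on the same cache line.
    static final Object[] plain = new Object[16];

    // Padded: seven long fields (56 bytes) plus the header push each
    // instance past a typical 64-byte cache line, so the header words
    // of neighboring locks no longer collide.
    static final class PaddedLock {
        long p1, p2, p3, p4, p5, p6, p7; // filler, never read
    }
    static final PaddedLock[] padded = new PaddedLock[16];

    static {
        for (int i = 0; i < plain.length; i++) {
            plain[i] = new Object();
            padded[i] = new PaddedLock();
        }
    }

    public static void main(String[] args) {
        // Synchronizing on a PaddedLock still CASes into its header,
        // but that header now sits on its own cache line.
        synchronized (padded[0]) {
            System.out.println("locked");
        }
    }
}
```

Caveat: the JIT can in principle eliminate unused fields, and heap layout is not guaranteed, so treat this as a heuristic rather than a contract.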
This is not to imply that only CAS operations on nearby locations are a problem; even plain writes and reads of nearby memory locations from different threads cause false sharing and increased cache-coherency traffic.
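To make the plain-writes case concrete, here is a hypothetical two-counter sketch: each thread writes only its own field, so the result is always correct, but because the two longs typically sit on the same cache line, the writes still fight over it and only throughput suffers:

```java
public class FalseSharingDemo {
    // Two logically independent counters in one object: on most JVMs
    // these longs are laid out adjacently, hence on the same cache line.
    static final class Shared {
        volatile long a; // written only by thread 1
        volatile long b; // written only by thread 2
    }

    static long run() throws InterruptedException {
        final Shared s = new Shared();
        final long N = 1_000_000;
        Thread t1 = new Thread(() -> { for (long i = 0; i < N; i++) s.a++; });
        Thread t2 = new Thread(() -> { for (long i = 0; i < N; i++) s.b++; });
        t1.start(); t2.start();
        t1.join(); t2.join();
        // Correctness is unaffected by false sharing -- each field has a
        // single writer -- only the coherency traffic costs time.
        return s.a + s.b;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(run()); // prints 2000000
    }
}
```

The same padding idea as above (filler fields between `a` and `b`) would put the counters on separate cache lines.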