Index: chap-one-side/one-side-2.tex
===================================================================
--- chap-one-side/one-side-2.tex	(revision 1363)
+++ chap-one-side/one-side-2.tex	(working copy)
@@ -93,20 +93,20 @@
 
 \MPIupdate{3.0}{270}{\MPI/ supports two fundamentally different memory models: separate
 and unified. The
-first model makes no assumption about memory consistency and is
+separate model makes no assumption about memory consistency and is
 highly portable. This model is similar to that of weakly coherent memory
 systems: the user must impose correct ordering of memory accesses
 through synchronization calls\MPIdelete{3.0}{270}{; for efficiency, the implementation can delay
 communication operations until the synchronization calls occur}. The
-second model can exploit cache-coherent hardware and
-hardware-accelerated one-sided operations that are commonly available
+unified model can exploit cache-coherent hardware and
+hardware-accelerated one-sided operations that are commonly available
 in high-performance systems. \MPIdelete{3.0}{270}{In this model, communication can be
 independent of synchronization calls.}%
 %
 The two different models are discussed in detail in
 Section~\ref{sec:1sided-memmodel}.
 %
-Both models support a large number of synchronization calls to support
+Both models provide several synchronization calls to support
 different synchronization styles.}
 
 The design of the \RMA/ functions allows implementors to take 
@@ -154,7 +154,7 @@
 \MPIupdate{3.0}{284}{\mpifunc{MPI\_WIN\_ALLOCATE\_SHARED} differs from
 \mpifunc{MPI\_WIN\_ALLOCATE} in that the allocated memory can be
 accessed from all processes in the window's group with direct load/store instructions. Some
-restrictions apply to the specified communicator.}
+restrictions may apply to the specified communicator.}
 \mpifunc{MPI\_WIN\_CREATE\_DYNAMIC} creates a window that allows the
 user to dynamically control which memory is exposed by the window.} 
 %\MPIupdate{3.0}{270}{Add here a mention of a third method that exposes all of
@@ -236,10 +236,9 @@
 process, at window creation.
 
 \begin{rationale}
-The window size is specified using an address sized integer, so as to
-allow windows that span more than 4 GB of address space.  (Even if
-the physical memory size is less than 4 GB, the address range
-may be larger than 4 GB, if addresses are not contiguous.)
+The window size is specified using an address-sized integer,\MPIdelete{3.0}{0}{ so as} to allow windows that span more than 4 GB of
+address space.  (Even if the physical memory size is less than 4 GB, the
+address range may be larger than 4 GB, if addresses are not contiguous.)
 \end{rationale}
 
 \begin{users}
@@ -255,13 +254,14 @@
 The following info key\MPIreplace{3.0}{270}{ is}{s are} predefined:
 
 \begin{description}
-\item{\infokey{no\_locks}} --- if  set to \constskip{true},
-then the implementation may assume that the
+\item{\infokey{no\_locks}} --- if set to \constskip{true},
+then the implementation may assume that \MPIreplace{3.0}{0}{the
 local window is never locked (by a call to 
-\mpifunc{MPI\_WIN\_LOCK}\MPIupdate{3.0}{270}{ or \mpifunc{MPI\_WIN\_LOCK\_ALL}}).
-This implies that this window is not used for
-3-party communication, and \RMA/ can be implemented with no (less)
-asynchronous
+\mpifunc{MPI\_WIN\_LOCK}\MPIupdate{3.0}{270}{ or
+\mpifunc{MPI\_WIN\_LOCK\_ALL}}).}{passive target synchronization (i.e.,
+\mpifunc{MPI\_WIN\_LOCK}, \mpifunc{MPI\_WIN\_LOCK\_ALL}) will not be used on
+the given window.} This implies that this window is not used for 3-party
+communication, and \RMA/ can be implemented with no (less) asynchronous
 agent activity at this process.
 \MPIupdateBegin{3.0}{270}%
 \item{\infokey{accumulate\_ordering}} --- controls the ordering of accumulate 
@@ -278,18 +278,12 @@
 \end{description}
 
 \MPIupdateBegin{3.0}{270}%
-
 \begin{users}
-If windows are passed to libraries, the user
-needs to ensure that the info keys specified at window creation are
-communicated to the called library, which might need to constrain the
-operations on the passed window.
-% hcomment{option 2:} The info query mechanism described in
-% Section~\ref{} can be used to query the specified info arguments windows
-% that have been passed to a library. It is recommended that libraries
-% check attached info keys for each passed window.
+The info query mechanism described in Section~\ref{subsec:window-info}
+can be used to query the specified info arguments of windows that have been
+passed to a library. It is recommended that libraries check attached
+info keys for each passed window.
 \end{users}
-
 \MPIupdateEnd{3.0}%
 
 The various processes in the group of
@@ -461,7 +455,7 @@
 \begin{rationale}
 By allocating (potentially aligned) memory instead of allowing the user
 to pass in an arbitrary buffer, this call can improve the performance
-for systems with remote direct memory access significantly. 
+for systems with remote direct memory access. 
 \MPIupdate{3.0}{270}{This also permits the collective allocation of memory and
   supports what is sometimes called the ``symmetric allocation'' model
   that can be more scalable (for example, the implementation can
@@ -511,8 +505,8 @@
 
 
 This is a collective call executed by all processes in the group of
-\mpiarg{comm}. On each process $i$, it allocates memory of at least size
-\mpiarg{size} bytes that is shared among all processes in \mpiarg{comm},
+\mpiarg{comm}. On each process $i$, it allocates memory of at least 
+\mpiarg{size} bytes that is shared among all processes in \mpiarg{comm},
 and returns a pointer to
 the locally allocated segment in \mpiarg{baseptr} that can be used for
 load/store accesses on the calling process. The locally allocated memory can be 
@@ -520,12 +514,10 @@
 other processes can be queried using the function
 \mpifunc{MPI\_WIN\_SHARED\_QUERY}. The call also returns a window object that
 can be used by all processes in \mpiarg{comm} to perform \RMA/ operations. The
-size argument may be different at each process and \mpiarg{size = 0} is valid;
-however, a library might allocate and expose more memory in order to create a
-fast, globally symmetric allocation.  It is the user's responsibility to
-ensure that the communicator \mpiarg{comm} represents a group of
-processes that can create a shared memory segment that can be accessed
-by all processes in the group.
+size argument may be different at each process and \mpiarg{size = 0} is
+valid.  It is the user's responsibility to ensure that the communicator
+\mpiarg{comm} represents a group of processes that can create a shared
+memory segment that can be accessed by all processes in the group.
 %
 The discussions of 
 rationales for \mpifunc{MPI\_ALLOC\_MEM} and \mpifunc{MPI\_FREE\_MEM} in
@@ -674,7 +666,7 @@
 this memory is typically allocated using \texttt{malloc} or
 \texttt{new} respectively.  In \MPIII/ RMA, the programmer must create
 a window with a predefined amount of memory and then
-implement routines for allocating memory from within that
+implement routines for allocating memory from within the window's
 memory.  In addition, there is no easy way to handle the situation
 where the predefined amount of memory turns out to be inadequate.
 To support this model, the routine \mpifunc{MPI\_WIN\_CREATE\_DYNAMIC}
@@ -805,7 +797,7 @@
 64-bit pointer) cannot be expressed as an address at the origin (for
 example, the origin uses 32-bit pointers).  For this reason, a portable
 MPI implementation should ensure that the type \mpiarg{MPI\_AINT}
-(cf.~Table~\ref{table:pttopt:datatypes:c_f} on
+(see~Table~\ref{table:pttopt:datatypes:c_f} on
 Page~\pageref{table:pttopt:datatypes:c_f}) is able to store addresses
 from any process.
 \end{implementors}
@@ -1054,11 +1046,12 @@
 \begin{implementors}
 \mpifunc{MPI\_WIN\_FREE} requires a barrier synchronization: no process
 can return from free until all processes in the group of \mpiarg{win}
-called free.  This\MPIreplace{3.0}{270}{,}{ is} to ensure that no process will attempt to access a
+called free.  This\MPIreplace{3.0}{270}{, to ensure}{ ensures} that no process will attempt to access a
 remote window (e.g., with lock/unlock) after it was freed. \MPIupdate{3.0}{270}{The
 only exception to this rule is when the user sets the
-\infoval{no\_locks} info \MPIreplace{3.0}{xx:5/11/11}{argument}{key} to true when creating the window. In that case, the local window can be
-freed without barrier synchronization.}
+\infoval{no\_locks} info \MPIreplace{3.0}{xx:5/11/11}{argument}{key} to
+true when creating the window. In that case, an \MPI/ implementation may
+free the local window without barrier synchronization.}
 \end{implementors}
 
 \subsection{Window Attributes}
@@ -1113,8 +1106,8 @@
 %
 \MPIupdateBegin{3.0}{283}%
 A detailed listing of the type of the pointer in the attribute value
-argument to \mpifunc{MPI\_Win\_get\_attr} and
-\mpifunc{MPI\_Win\_set\_attr} is shown in
+argument to \mpifunc{MPI\_WIN\_GET\_ATTR} and
+\mpifunc{MPI\_WIN\_SET\_ATTR} is shown in
 Table~\ref{table:c-attr-types}.
 %
 
@@ -1131,8 +1124,8 @@
 \end{tabular}
 \end{center} 
 \caption{%
-C types of attribute value argument to \mpifunc{MPI\_Win\_get\_attr} and
-\mpifunc{MPI\_Win\_set\_attr}.
+C types of attribute value argument to \mpifunc{MPI\_WIN\_GET\_ATTR} and
+\mpifunc{MPI\_WIN\_SET\_ATTR}.
 } 
 \label{table:c-attr-types}
 \end{table} 
@@ -1297,13 +1290,13 @@
 \MPIdelete{3.0}{270}{and} \mpifunc{MPI\_ACCUMULATE} \MPIreplace{3.0}{270}{updates}{and \mpifunc{MPI\_RACCUMULATE} update} locations in the target memory,
 e.g.\MPIupdate{3.0}{270}{,} by adding to these locations values sent from the caller
 memory\MPIreplace{3.0}{270}{.}{; \mpifunc{MPI\_GET\_ACCUMULATE}, \mpifunc{MPI\_RGET\_ACCUMULATE} and
-\mpifunc{MPI\_FETCH\_AND\_OP} atomically return the data
+\mpifunc{MPI\_FETCH\_AND\_OP} perform an atomic read-modify-write and return the data
 before the accumulate operation; and
 \mpifunc{MPI\_COMPARE\_AND\_SWAP} performs a remote atomic compare and swap
 operation.}
 These operations are {\em nonblocking}: the call initiates
 the transfer, but the transfer may continue after the call returns.
-The transfer is completed, both at the origin and at the target, when
+The transfer is completed, at the origin or at both the origin and the target, when
 a subsequent {\em synchronization} call is issued by the caller on
 the involved window object.  These synchronization calls are described in
 Section~\ref{sec:1sided-sync}, page~\pageref{sec:1sided-sync}.
@@ -1703,9 +1696,10 @@
 accumulation of 
 a sum by having all involved processes add their contribution to the
 sum variable in the memory of one process.
-\MPIupdate{3.0}{270}{The accumulate functions have slightly different semantics than
-  the put and get functions; see Section~\ref{sec:1sided-semantics}
-  for details.}
+\MPIupdate{3.0}{270}{The accumulate functions have slightly different
+semantics with respect to overlapping data accesses than
+the put and get functions; see Section~\ref{sec:1sided-semantics}
+for details.}
 
 \MPIupdateBegin{3.0}{270}%
 \subsubsection{Accumulate Function}
@@ -1973,7 +1967,7 @@
 \mpiarg{result\_addr}) must be disjoint. 
 Any of the predefined operations for \mpifunc{MPI\_REDUCE}, as well as 
 \const{MPI\_NO\_OP} or \const{MPI\_REPLACE}, can be specified as
-\mpiarg{op}.  User-defined functions cannot be used. 
+\mpiarg{op}; user-defined functions cannot be used. 
 %
 The \mpiarg{datatype} argument must be a predefined datatype.
 %
@@ -2036,8 +2030,8 @@
 
 Another useful operation is an atomic compare and swap where the
 value at the origin is compared to the value at the target,
-which is atomically replaced by a third value \MPIupdate{3.0}{270}{only} if origin and target are
-equal.
+which is atomically replaced by a third value \MPIupdate{3.0}{270}{only}
+if the values at the origin and the target are equal.
 
 \begin{funcdef}{MPI\_COMPARE\_AND\_SWAP(origin\_addr, compare\_addr, result\_addr, datatype, target\_rank, target\_disp, win)}
 \funcarg{\IN}{origin\_addr}{initial address of buffer (choice)} 
@@ -2331,7 +2325,7 @@
 
 
 \begin{funcdef2}{MPI\_RGET\_ACCUMULATE(origin\_addr, origin\_count,
-origin\_datatype, result\_addr,}{result\_count, result\_datatype,
+origin\_datatype, result\_addr, result\_count,}{ result\_datatype,
 target\_rank, target\_disp, target\_count, target\_datatype, op, win, request)}
 \funcarg{\IN}{origin\_addr}{initial address of buffer (choice)} 
 \funcarg{\IN}{origin\_count}{number of entries in origin buffer (non-negative integer)}
@@ -2420,8 +2414,8 @@
 \end{figure}
 
 In the \RMA/ unified model, public and private copies are identical and
-updates via put or accumulate calls are observed by load operations 
-without additional \RMA/ calls. A store access to a window is
+updates via put or accumulate calls are eventually observed by load operations 
+without additional \RMA/ calls. A store access to a window is eventually 
 visible to remote get or accumulate calls without additional \RMA/
 calls. These stronger semantics of the \RMA/ unified model allow the
 user to omit some synchronization calls and potentially improve
@@ -3316,8 +3310,8 @@
 \subsection{Flush and Sync}
 \label{sec:1sided-flush}
 
-All flush and sync functions can be called only within lock-unlock or
-lockall-unlockall epochs.
+All flush and sync functions can be called only within passive target
+epochs.
 
 \begin{funcdef}{MPI\_WIN\_FLUSH(rank, win)}
 \funcarg{\IN}{rank}{rank of target window (non-negative integer)}
@@ -3354,9 +3348,11 @@
 All \RMA/ operations issued \MPIupdate{3.0}{270}{by the calling process} to any target on the specified window
 prior to this call \MPIupdate{3.0}{270}{and in the specified window} will have
 completed both at the origin and at the target when this call
-returns. \mpifunc{MPI\_WIN\_FLUSH\_ALL} completes locally in the sense
-used in this document, meaning that the call must return without
-requiring the target processes to call any \MPI/ routine.
+returns. 
+% htor - removed after discussion at the Forum
+%\mpifunc{MPI\_WIN\_FLUSH\_ALL} completes locally in the sense
+%used in this document, meaning that the call must return without
+%requiring the target processes to call any \MPI/ routine.
 %This function can be called only within lock-unlock
 %or lockall-unlockall epochs.
 
@@ -3378,9 +3374,10 @@
 initiated by the calling process to the target process specified by rank
 on the specified window. \MPIupdate{3.0}{270}{For example, after this routine completes, the user may 
 reuse any buffers provided to put, get, or accumulate operations.}
-\mpifunc{MPI\_WIN\_FLUSH\_LOCAL} completes locally in the sense
-used in this document, meaning that the call must return without
-requiring the target processes to call any \MPI/ routine.%RMA operations 
+% htor - removed after discussion at the Forum
+%\mpifunc{MPI\_WIN\_FLUSH\_LOCAL} completes locally in the sense
+%used in this document, meaning that the call must return without
+%requiring the target processes to call any \MPI/ routine.%RMA operations 
 %issued prior to this call with rank as the target will have completed
 %at the origin when this call returns. 
 %This function can be called only
@@ -3401,9 +3398,10 @@
 All \RMA/ operations issued to any target prior to this call 
 in this window will have completed at the origin when
 \mpifunc{MPI\_WIN\_FLUSH\_LOCAL\_ALL} returns.
-\mpifunc{MPI\_WIN\_FLUSH\_LOCAL\_ALL} completes locally in the sense
-used in this document, meaning that the call must return without
-requiring the target processes to call any \MPI/ routine.
+% htor - removed after discussion at the Forum
+%\mpifunc{MPI\_WIN\_FLUSH\_LOCAL\_ALL} completes locally in the sense
+%used in this document, meaning that the call must return without
+%requiring the target processes to call any \MPI/ routine.
 % \begin{funcdef}{MPI\_WIN\_ALL\_FLUSH\_ALL(win)}
 % \funcarg{\IN}{win}{window object (handle)}
 % \end{funcdef}
@@ -3797,7 +3795,7 @@
 or \mpifunc{MPI\_WIN\_LOCK}}{\mpifunc{MPI\_WIN\_LOCK}, 
 \mpifunc{MPI\_WIN\_LOCK\_ALL}, or \mpifunc{MPI\_WIN\_SYNC}} is executed on that window by the
 window owner. \MPIupdate{3.0}{270}{In the \RMA/ unified memory model, an update by a put or
-accumulate call to a public window copy becomes visible in the private
+accumulate call to a public window copy eventually becomes visible in the private
 copy in process memory without additional \RMA/ calls.}
 \label{rma:rule:updatetoprivate}
 \end{enumerate}
@@ -3816,7 +3814,7 @@
 synchronization call on that window (\ref{rma:rule:updatetoprivate}).
 Thus, updates to process memory can always be delayed \MPIupdate{3.0}{270}{in the \RMA/
 separate memory model} until the process executes a suitable
-synchronization call\MPIupdate{3.0}{270}{, while they have to complete in the \RMA/ unified
+synchronization call\MPIupdate{3.0}{270}{, while they must complete in the \RMA/ unified
 model without additional synchronization calls}.  
 %
 \MPIreplace{3.0}{270}{Updates to a public window copy can also be delayed until the window
@@ -3853,7 +3851,7 @@
 
 \MPIupdateBegin{3.0}{270}%
 The behavior of some \MPI/ \RMA/ operations may be
-\emph{undefined} in some situations.  For example, the result of
+\emph{undefined} in certain situations.  For example, the result of
 several origin processes performing concurrent \mpifunc{MPI\_PUT}
 operations to the same target location is undefined.  In addition, the
 result of a single origin process performing multiple
@@ -3861,7 +3859,7 @@
 same access epoch is also undefined.
 The result at the target may have all of the
 data from one of the \mpifunc{MPI\_PUT} operations (the ``last'' one,
-in some sense), or bytes from some of each of the operations, or
+in some sense), bytes from some of each of the operations, or
 something else.  In \MPIII/, such operations were \emph{erroneous}.
 That meant that an \MPI/ implementation was permitted to signal an MPI
 exception.  Thus, user programs or tools that used \MPI/ \RMA/ could not
@@ -3888,7 +3886,7 @@
 \MPIupdateEnd{3.0}%
 
 %\color{green}
-A \MPIreplace{3.0}{270}{correct program}{program with well-defined outcome in the \const{MPI\_WIN\_SEPARATE} memory model} 
+A \MPIreplace{3.0}{270}{correct program}{program with a well-defined outcome in the \const{MPI\_WIN\_SEPARATE} memory model} 
 must obey the following rules.
 
 \begin{enumerate}
@@ -3991,31 +3989,33 @@
   memory, but there are no atomicity or ordering guarantees if
   more than one byte is updated.  Updates are stable in the sense that
   once data appears in memory, the data remains until replaced by
-  another update.  This permits \MPIdelete{3.0}{284}{the local process} to update memory 
-  \MPIreplace{3.0}{284}{in its local window}{with store operations} without requiring a lock/unlock or other \RMA/
-  synchronization epoch.  Users are cautioned that remote accesses to
+  another update.  This permits \MPIdelete{3.0}{284}{the local process}
+  updates to memory 
+  \MPIreplace{3.0}{284}{in its local window}{with store operations}
+  without requiring an \RMA/ epoch.  Users are cautioned that remote accesses to
   a window that is updated by the local process have defined
   behavior only if the other rules given here and in this chapter
   are followed.
 \item
 A location in a window must not be accessed as a
 target of an \RMA/ 
-operation once an update to that location has started until the
+operation once an update to that location has started and until the
 update completes at the target. There is one
-exception to this rule: in the case where the same variable is updated
+exception to this rule: in the case where the same location is updated
 by two concurrent accumulates with the same
 predefined datatype on the same window. Additional restrictions on the
 operation apply; see the info key \mpiarg{accumulate\_ops} in
 Section~\ref{chap:one-side-2:win_create}.
 \item
 A put or accumulate must not access a target
-window once a \MPIreplace{3.0}{284}{local update}{store operation}
-or a put or accumulate update to another (overlapping) target window
-has started on the same location in the target window until the update
+window once a \MPIreplace{3.0}{284}{local update}{store,} put, or
+accumulate update to another (overlapping) target window
+has started on the same location in the target window and until the update
 completes at the target window.
-Conversely, a \MPIreplace{3.0}{284}{local update}{store operation} in process memory
+Conversely, a \MPIreplace{3.0}{284}{local update}{store operation} 
 to a location in a window must not start once a put or
-accumulate update to the same location in that target window has started until the put or accumulate
+accumulate update to the same location in that target window has started
+and until the put or accumulate
 update completes at the target.  
 \end{enumerate}
 Note that \mpifunc{MPI\_WIN\_FLUSH} and \mpifunc{MPI\_WIN\_FLUSH\_ALL}
