--- /dev/null
+What: /sys/devices/system/node/nodeX/compact
+Date: February 2010
+Contact: Mel Gorman <mel@csn.ul.ie>
+Description:
+ When this file is written to, all memory within that node
+ will be compacted. When it completes, memory will be freed
+ into blocks which have as many contiguous pages as possible.
a range being two hyphen-separated decimal numbers, the smallest and
largest node numbers in the range. For example, mpol=bind:0-3,5,7,9-15
+A memory policy with a valid NodeList will be saved, as specified, for
+use at file creation time. When a task allocates a file in the file
+system, the mount option memory policy will be applied with a NodeList,
+if any, modified by the calling task's cpuset constraints
+[See Documentation/cgroups/cpusets.txt] and any optional flags, listed
+below. If the resulting NodeList is the empty set, the effective memory
+policy for the file will revert to "default" policy.
+
NUMA memory allocation policies have optional flags that can be used in
conjunction with their modes. These optional flags can be specified
when tmpfs is mounted by appending them to the mode before the NodeList.
See Documentation/vm/numa_memory_policy.txt for a list of all available
-memory allocation policy mode flags.
+memory allocation policy mode flags and their effect on memory policy.
=static is equivalent to MPOL_F_STATIC_NODES
=relative is equivalent to MPOL_F_RELATIVE_NODES
--- /dev/null
+XFS Delayed Logging Design
+--------------------------
+
+Introduction to Re-logging in XFS
+---------------------------------
+
+XFS logging is a combination of logical and physical logging. Some objects,
+such as inodes and dquots, are logged in logical format where the details
+logged are made up of the changes to in-core structures rather than on-disk
+structures. Other objects - typically buffers - have their physical changes
+logged. The reason for these differences is to reduce the amount of log space
+required for objects that are frequently logged. Some parts of inodes are more
+frequently logged than others, and inodes are typically more frequently logged
+than any other object (except maybe the superblock buffer) so keeping the
+amount of metadata logged low is of prime importance.
+
+The reason that this is such a concern is that XFS allows multiple separate
+modifications to a single object to be carried in the log at any given time.
+This allows the log to avoid needing to flush each change to disk before
+recording a new change to the object. XFS does this via a method called
+"re-logging". Conceptually, this is quite simple - all it requires is that any
+new change to the object is recorded with a *new copy* of all the existing
+changes in the new transaction that is written to the log.
+
+That is, if we have a sequence of changes A through to F, and the object was
+written to disk after change D, we would see in the log the following series
+of transactions, their contents and the log sequence number (LSN) of the
+transaction:
+
+  Transaction    Contents     LSN
+     A           A            X
+     B           A+B          X+n
+     C           A+B+C        X+n+m
+     D           A+B+C+D      X+n+m+o
+      <object written to disk>
+     E           E            Y (> X+n+m+o)
+     F           E+F          Y+p
+
+In other words, each time an object is relogged, the new transaction contains
+the aggregation of all the previous changes currently held only in the log.
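+
+As a rough illustration of the table above, the following Python sketch models
+re-logging for a single object. The function relog() and its arguments are
+invented for this example and are not XFS interfaces; the sketch only tracks
+which changes each transaction carries, not the LSNs.
+
```python
# Minimal model of re-logging for one object. Each new transaction carries
# every change that is so far held only in the log. All names are illustrative.

def relog(changes, flushed_after):
    """Return (transaction, contents) pairs for a sequence of changes.

    flushed_after names the change after which the object was written to
    disk; from then on the log no longer needs the earlier changes.
    """
    pending = []          # changes currently held only in the log
    transactions = []
    for change in changes:
        pending.append(change)
        transactions.append((change, "+".join(pending)))
        if change == flushed_after:
            pending = []  # object is on disk, later transactions start afresh
    return transactions


log = relog("ABCDEF", flushed_after="D")
for name, contents in log:
    print(f"{name:<14}{contents}")
```
+
+The Contents column grows with every relog until the object is written to
+disk, which is the growth in logged metadata described in the text.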
+
+This relogging technique also allows objects to be moved forward in the log so
+that an object being relogged does not prevent the tail of the log from ever
+moving forward. This can be seen in the table above by the changing
+(increasing) LSN of each subsequent transaction - the LSN is effectively a
+direct encoding of the location in the log of the transaction.
+
+This relogging is also used to implement long-running, multiple-commit
+transactions. These transactions are known as rolling transactions, and require
+a special log reservation known as a permanent transaction reservation. A
+typical example of a rolling transaction is the removal of extents from an
+inode which can only be done at a rate of two extents per transaction because
+of reservation size limitations. Hence a rolling extent removal transaction
+keeps relogging the inode and btree buffers as they get modified in each
+removal operation. This keeps them moving forward in the log as the operation
+progresses, ensuring that the current operation never gets blocked by itself if
+the log wraps around.
+
+Hence it can be seen that the relogging operation is fundamental to the correct
+working of the XFS journalling subsystem. From the above description, most
+people should be able to see why the XFS metadata operations write so much to
+the log - repeated operations to the same objects write the same changes to
+the log over and over again. Worse is the fact that objects tend to get
+dirtier as they get relogged, so each subsequent transaction is writing more
+metadata into the log.
+
+Another feature of the XFS transaction subsystem is that most transactions are
+asynchronous. That is, they don't commit to disk until either a log buffer is
+filled (a log buffer can hold multiple transactions) or a synchronous operation
+forces the log buffers holding the transactions to disk. This means that XFS is
+doing aggregation of transactions in memory - batching them, if you like - to
+minimise the impact of the log IO on transaction throughput.
+
+The limitation on asynchronous transaction throughput is the number and size of
+log buffers made available by the log manager. By default there are 8 log
+buffers available and the size of each is 32kB - the size can be increased up
+to 256kB by use of a mount option.
+
+Effectively, this gives us the maximum bound of outstanding metadata changes
+that can be made to the filesystem at any point in time - if all the log
+buffers are full and under IO, then no more transactions can be committed until
+the current batch completes. It is now common for a single current CPU core to
+be able to issue enough transactions to keep the log buffers full and under
+IO permanently. Hence the XFS journalling subsystem can be considered to be IO
+bound.
+
+Delayed Logging: Concepts
+-------------------------
+
+The key thing to note about the asynchronous logging combined with the
+relogging technique XFS uses is that we can be relogging changed objects
+multiple times before they are committed to disk in the log buffers. If we
+return to the previous relogging example, it is entirely possible that
+transactions A through D are committed to disk in the same log buffer.
+
+That is, a single log buffer may contain multiple copies of the same object,
+but only one of those copies needs to be there - the last one "D", as it
+contains all the changes from the previous transactions. In other words, we
+have one necessary copy in the log buffer, and three stale copies that are
+simply wasting space. When we are doing repeated operations on the same set of
+objects, these "stale objects" can be over 90% of the space used in the log
+buffers. It is clear that reducing the number of stale objects written to the
+log would greatly reduce the amount of metadata we write to the log, and this
+is the fundamental goal of delayed logging.
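+
+To put numbers on the stale copies, here is a small sketch that assumes every
+copy of an object lands in the same log buffer and that only the last copy is
+needed. The helper stale_fraction() is hypothetical and the equal copy sizes
+are a simplifying assumption.
+
```python
def stale_fraction(copy_sizes):
    """Fraction of logged bytes taken up by stale copies of one object.

    copy_sizes lists the size of each copy written to the log buffer,
    oldest first. Only the last copy is needed for recovery.
    """
    total = sum(copy_sizes)
    return (total - copy_sizes[-1]) / total


# Transactions A..D from the relogging example, assuming equal-sized copies:
four_copies = stale_fraction([1, 1, 1, 1])   # 3 of the 4 copies are stale
# Ten equal-sized copies of the same object in one buffer:
ten_copies = stale_fraction([1] * 10)        # 9 of the 10 copies are stale

print(f"{four_copies:.0%} stale after 4 copies, {ten_copies:.0%} after 10")
```
+
+Under these assumptions an object has to be relogged ten times within one
+buffer before stale copies reach 90% of the space it uses.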
+
+From a conceptual point of view, XFS is already doing relogging in memory (where
+memory == log buffer), only it is doing it extremely inefficiently. It is using
+logical to physical formatting to do the relogging because there is no
+infrastructure to keep track of logical changes in memory prior to physically
+formatting the changes in a transaction to the log buffer. Hence we cannot avoid
+accumulating stale objects in the log buffers.
+
+Delayed logging is the name we've given to keeping and tracking transactional
+changes to objects in memory outside the log buffer infrastructure. Because of
+the relogging concept fundamental to the XFS journalling subsystem, this is
+actually relatively easy to do - all the changes to logged items are already
+tracked in the current infrastructure. The big problem is how to accumulate
+them and get them to the log in a consistent, recoverable manner.
+Describing the problems and how they have been solved is the focus of this
+document.
+
+One of the key changes that delayed logging makes to the operation of the
+journalling subsystem is that it disassociates the amount of outstanding
+metadata changes from the size and number of log buffers available. In other
+words, instead of there only being a maximum of 2MB of transaction changes not
+written to the log at any point in time, there may be a much greater amount
+being accumulated in memory. Hence the potential for loss of metadata on a
+crash is much greater than for the existing logging mechanism.
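+
+The 2MB figure follows from the log buffer limits given earlier. The sketch
+below is plain arithmetic and assumes the number of log buffers stays at the
+default of 8 when the buffer size is raised.
+
```python
KB = 1024
LOG_BUFFERS = 8                  # default number of log buffers, from the text
DEFAULT_BUFFER_SIZE = 32 * KB    # default size of each log buffer
MAX_BUFFER_SIZE = 256 * KB       # largest size selectable by mount option

# Upper bound on transaction changes that can sit in log buffers at once.
default_in_flight = LOG_BUFFERS * DEFAULT_BUFFER_SIZE
max_in_flight = LOG_BUFFERS * MAX_BUFFER_SIZE

print(default_in_flight // KB, "kB by default,", max_in_flight // KB, "kB at most")
```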
+
+It should be noted that this does not change the guarantee that log recovery
+will result in a consistent filesystem. What it does mean is that as far as the
+recovered filesystem is concerned, there may be many thousands of transactions
+that simply did not occur as a result of the crash. This makes it even more
+important that applications that care about their data use fsync() where they
+need to ensure application level data integrity is maintained.
+
+It should be noted that delayed logging is not an innovative new concept that
+warrants rigorous proofs to determine whether it is correct or not. The method
+of accumulating changes in memory for some period before writing them to the
+log is used effectively in many filesystems including ext3 and ext4. Hence
+no time is spent in this document trying to convince the reader that the
+concept is sound. Instead it is simply considered a "solved problem" and as
+such implementing it in XFS is purely an exercise in software engineering.
+
+The fundamental requirements for delayed logging in XFS are simple:
+
+ 1. Reduce the amount of metadata written to the log by at least
+ an order of magnitude.
+ 2. Supply sufficient statistics to validate Requirement #1.
+ 3. Supply sufficient new tracing infrastructure to be able to debug
+ problems with the new code.
+ 4. No on-disk format change (metadata or log format).
+ 5. Enable and disable with a mount option.
+ 6. No performance regressions for synchronous transaction workloads.
+
+Delayed Logging: Design
+-----------------------
+
+Storing Changes
+
+The problem with accumulating changes at a logical level (i.e. just using the
+existing log item dirty region tracking) is that when it comes to writing the
+changes to the log buffers, we need to ensure that the object we are formatting
+is not changing while we do this. This requires locking the object to prevent
+concurrent modification. Hence flushing the logical changes to the log would
+require us to lock every object, format them, and then unlock them again.
+
+This introduces lots of scope for deadlocks with transactions that are already
+running. For example, a transaction has object A locked and modified, but needs
+the delayed logging tracking lock to commit the transaction. However, the
+flushing thread has the delayed logging tracking lock already held, and is
+trying to get the lock on object A to flush it to the log buffer. This appears
+to be an unsolvable deadlock condition, and it was solving this problem that
+was the barrier to implementing delayed logging for so long.
+
+The solution is relatively simple - it just took a long time to recognise it.
+Put simply, the current logging code formats the changes to each item into a
+vector array that points to the changed regions in the item. The log write code
+simply copies the memory these vectors point to into the log buffer during
+transaction commit while the item is locked in the transaction. Instead of
+using the log buffer as the destination of the formatting code, we can use an
+allocated memory buffer big enough to fit the formatted vector.
+
+If we then copy the vector into the memory buffer and rewrite the vector to
+point to the memory buffer rather than the object itself, we now have a copy of
+the changes in a format that is compatible with the log buffer writing code
+and that does not require us to lock the item to access. This formatting and
+rewriting can all be done while the object is locked during transaction commit,
+resulting in a vector that is transactionally consistent and can be accessed
+without needing to lock the owning item.
+
+Hence we avoid the need to lock items when we need to flush outstanding
+asynchronous transactions to the log. The differences between the existing
+formatting method and the delayed logging formatting can be seen in the
+diagram below.
+
+Current format log vector:
+
+Object    +---------------------------------------------+
+Vector 1      +----+
+Vector 2                    +----+
+Vector 3                                   +----------+
+
+After formatting:
+
+Log Buffer    +-V1-+-V2-+----V3----+
+
+Delayed logging vector:
+
+Object    +---------------------------------------------+
+Vector 1      +----+
+Vector 2                    +----+
+Vector 3                                   +----------+
+
+After formatting:
+
+Memory Buffer +-V1-+-V2-+----V3----+
+Vector 1      +----+
+Vector 2           +----+
+Vector 3                +----------+
+
+The memory buffer and associated vector need to be passed as a single object,
+but still need to be associated with the parent object so if the object is
+relogged we can replace the current memory buffer with a new memory buffer that
+contains the latest changes.
+
+The reason for keeping the vector around after we've formatted the memory
+buffer is to support splitting vectors across log buffer boundaries correctly.
+If we don't keep the vector around, we do not know where the region boundaries
+are in the item, so we'd need a new encapsulation method for regions in the log
+buffer writing (i.e. double encapsulation). This would be an on-disk format
+change and as such is not desirable. It also means we'd have to write the log
+region headers in the formatting stage, which is problematic as there is per
+region state that needs to be placed into the headers during the log write.
+
+Hence we need to keep the vector, but by attaching the memory buffer to it and
+rewriting the vector addresses to point at the memory buffer we end up with a
+self-describing object that can be passed to the log buffer write code to be
+handled in exactly the same manner as the existing log vectors are handled.
+Hence we avoid needing a new on-disk format to handle items that have been
+relogged in memory.
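+
+The formatting step can be sketched in Python as follows. This is only a model
+under stated assumptions: the item is a byte array, a vector is an
+(offset, length) pair, and format_item() and write_regions() are names made up
+for the example rather than XFS functions.
+
```python
def format_item(obj, vectors):
    """Copy the dirty regions of obj into a private memory buffer.

    obj     -- bytes-like stand-in for the in-core object
    vectors -- list of (offset, length) regions within obj
    Returns (buffer, new_vectors); new_vectors index into buffer, so the
    region boundaries survive without any reference to obj.
    """
    buf = bytearray()
    new_vectors = []
    for offset, length in vectors:
        new_vectors.append((len(buf), length))   # rewritten vector
        buf += obj[offset:offset + length]       # copy of the region
    return bytes(buf), new_vectors


def write_regions(source, vectors):
    """Stand-in for the log buffer write code: emit each region in order."""
    return b"".join(source[off:off + length] for off, length in vectors)


item = bytearray(b"0123456789abcdefghijklmnopqrstuvwxyz")
dirty = [(2, 4), (12, 4), (26, 10)]

# Done while the item is locked during transaction commit.
buf, vecs = format_item(item, dirty)
direct = write_regions(item, dirty)      # what the existing code would write

item[2:6] = b"XXXX"                      # item modified again after unlock
delayed = write_regions(buf, vecs)       # written later, without the lock
```
+
+Because the same writer runs over either pair, delayed equals direct even
+though the item changed in between.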
+
+
+Tracking Changes
+
+Now that we can record transactional changes in memory in a form that allows
+them to be used without limitations, we need to be able to track and accumulate
+them so that they can be written to the log at some later point in time. The
+log item is the natural place to store this vector and buffer, and also makes sense
+to be the object that is used to track committed objects as it will always
+exist once the object has been included in a transaction.
+
+The log item is already used to track the log items that have been written to
+the log but not yet written to disk. Such log items are considered "active"
+and as such are stored in the Active Item List (AIL) which is an LSN-ordered
+doubly linked list. Items are inserted into this list during log buffer IO
+completion, after which they are unpinned and can be written to disk. An object
+that is in the AIL can be relogged, which causes the object to be pinned again
+and then moved forward in the AIL when the log buffer IO completes for that
+transaction.
+
+Essentially, this shows that an item that is in the AIL can still be modified
+and relogged, so any tracking must be separate to the AIL infrastructure. As
+such, we cannot reuse the AIL list pointers for tracking committed items, nor
+can we store state in any field that is protected by the AIL lock. Hence the
+committed item tracking needs its own locks, lists and state fields in the log
+item.
+
+Similar to the AIL, tracking of committed items is done through a new list
+called the Committed Item List (CIL). The list tracks log items that have been
+committed and have formatted memory buffers attached to them. It tracks objects
+in transaction commit order, so when an object is relogged it is removed from
+its place in the list and re-inserted at the tail. This is entirely arbitrary
+and done to make it easy for debugging - the last items in the list are the
+ones that are most recently modified. Ordering of the CIL is not necessary for
+transactional integrity (as discussed in the next section) so the ordering is
+done for convenience/sanity of the developers.
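+
+A minimal sketch of this ordering rule, using Python's OrderedDict as a
+stand-in for the list. The class name and the string item keys are
+assumptions made for the example.
+
```python
from collections import OrderedDict


class CommittedItemList:
    """Toy CIL: maps each log item to its latest formatted buffer."""

    def __init__(self):
        self.items = OrderedDict()

    def commit(self, item, buffer):
        # A relogged item is removed from its place and re-inserted at the
        # tail with the new buffer replacing the old one.
        self.items.pop(item, None)
        self.items[item] = buffer


cil = CommittedItemList()
cil.commit("inode 1", b"A")
cil.commit("buffer 7", b"B")
cil.commit("inode 1", b"A+C")     # relog: moves to the tail

order = list(cil.items)
print(order)
```
+
+The most recently modified item is always last, which is the debugging
+convenience the text describes.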
+
+
+Delayed Logging: Checkpoints
+
+When we have a log synchronisation event, commonly known as a "log force",
+all the items in the CIL must be written into the log via the log buffers.
+We need to write these items in the order that they exist in the CIL, and they
+need to be written as an atomic transaction. The need for all the objects to be
+written as an atomic transaction comes from the requirements of relogging and
+log replay - all the changes in all the objects in a given transaction must
+either be completely replayed during log recovery, or not replayed at all. If
+a transaction is not replayed because it is not complete in the log, then
+no later transactions should be replayed, either.
+
+To fulfill this requirement, we need to write the entire CIL in a single log
+transaction. Fortunately, the XFS log code has no fixed limit on the size of a
+transaction, nor does the log replay code. The only fundamental limit is that
+the transaction cannot be larger than just under half the size of the log. The
+reason for this limit is that to find the head and tail of the log, there must
+be at least one complete transaction in the log at any given time. If a
+transaction is larger than half the log, then there is the possibility that a
+crash during the write of such a transaction could partially overwrite the
+only complete previous transaction in the log. This will result in a recovery
+failure and an inconsistent filesystem and hence we must enforce the maximum
+size of a checkpoint to be slightly less than half the log.
+
+Apart from this size requirement, a checkpoint transaction looks no different
+to any other transaction - it contains a transaction header, a series of
+formatted log items and a commit record at the tail. From a recovery
+perspective, the checkpoint transaction is also no different - just a lot
+bigger with a lot more items in it. The worst case effect of this is that we
+might need to tune the recovery transaction object hash size.
+
+Because the checkpoint is just another transaction and all the changes to log
+items are stored as log vectors, we can use the existing log buffer writing
+code to write the changes into the log. To do this efficiently, we need to
+minimise the time we hold the CIL locked while writing the checkpoint
+transaction. The current log write code enables us to do this easily with the
+way it separates the writing of the transaction contents (the log vectors) from
+the transaction commit record, but tracking this requires us to have a
+per-checkpoint context that travels through the log write process through to
+checkpoint completion.
+
+Hence a checkpoint has a context that tracks the state of the current
+checkpoint from initiation to checkpoint completion. A new context is initiated
+at the same time a checkpoint transaction is started. That is, when we remove
+all the current items from the CIL during a checkpoint operation, we move all
+those changes into the current checkpoint context. We then initialise a new
+context and attach that to the CIL for aggregation of new transactions.
+
+This allows us to unlock the CIL immediately after transfer of all the
+committed items and effectively allow new transactions to be issued while we
+are formatting the checkpoint into the log. It also allows concurrent
+checkpoints to be written into the log buffers in the case of log force heavy
+workloads, just like the existing transaction commit code does. This, however,
+requires that we strictly order the commit records in the log so that
+checkpoint sequence order is maintained during log replay.
+
+To ensure that we can be writing an item into a checkpoint transaction at
+the same time another transaction modifies the item and inserts the log item
+into the new CIL, the checkpoint transaction commit code cannot use log items
+to store the list of log vectors that need to be written into the transaction.
+Hence log vectors need to be able to be chained together to allow them to be
+detached from the log items. That is, when the CIL is flushed the memory
+buffer and log vector attached to each log item need to be attached to the
+checkpoint context so that the log item can be released. In diagrammatic form,
+the CIL would look like this before the flush:
+
+    CIL Head
+       |
+       V
+    Log Item <-> log vector 1    -> memory buffer
+       |                         -> vector array
+       V
+    Log Item <-> log vector 2    -> memory buffer
+       |                         -> vector array
+       V
+    ......
+       |
+       V
+    Log Item <-> log vector N-1  -> memory buffer
+       |                         -> vector array
+       V
+    Log Item <-> log vector N    -> memory buffer
+                                 -> vector array
+
+And after the flush the CIL head is empty, and the checkpoint context log
+vector list would look like:
+
+    Checkpoint Context
+       |
+       V
+    log vector 1    -> memory buffer
+       |            -> vector array
+       |            -> Log Item
+       V
+    log vector 2    -> memory buffer
+       |            -> vector array
+       |            -> Log Item
+       V
+    ......
+       |
+       V
+    log vector N-1  -> memory buffer
+       |            -> vector array
+       |            -> Log Item
+       V
+    log vector N    -> memory buffer
+                    -> vector array
+                    -> Log Item
+
+Once this transfer is done, the CIL can be unlocked and new transactions can
+start, while the checkpoint flush code works over the log vector chain to
+commit the checkpoint.
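+
+The transfer shown in the two diagrams can be modelled roughly as below. The
+classes and method names are hypothetical, locking is left out, and strings
+stand in for log items and log vectors.
+
```python
class CheckpointContext:
    def __init__(self):
        self.log_vectors = []    # chain of (log vector, owning log item)


class CIL:
    def __init__(self):
        self.items = {}                  # log item -> attached log vector
        self.ctx = CheckpointContext()   # context aggregating new commits

    def commit(self, item, log_vector):
        # Dicts keep insertion order, so re-inserting moves a relogged
        # item to the tail of the list.
        self.items.pop(item, None)
        self.items[item] = log_vector

    def push(self):
        """Detach every log vector into the context and start a new one.

        In the design this runs with the CIL locked; the caller can write
        the returned context to the log after the lock is dropped.
        """
        ctx = self.ctx
        ctx.log_vectors = [(vec, item) for item, vec in self.items.items()]
        self.items = {}                  # the CIL head is now empty
        self.ctx = CheckpointContext()   # new commits aggregate here
        return ctx


cil = CIL()
cil.commit("inode", "vec 1")
cil.commit("buffer", "vec 2")
flushing = cil.push()
cil.commit("inode", "vec 3")     # relogged while the checkpoint is written
```
+
+The relog of "inode" lands in the new CIL and does not disturb the chain held
+by the flushing context.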
+
+Once the checkpoint is written into the log buffers, the checkpoint context is
+attached to the log buffer that the commit record was written to along with a
+completion callback. Log IO completion will call that callback, which can then
+run transaction committed processing for the log items (i.e. insert into AIL
+and unpin) in the log vector chain and then free the log vector chain and
+checkpoint context.
+
+Discussion Point: I am uncertain as to whether the log item is the most
+efficient way to track vectors, even though it seems like the natural way to do
+it. The fact that we walk the log items (in the CIL) just to chain the log
+vectors and break the link between the log item and the log vector means that
+we take a cache line hit for the log item list modification, then another for
+the log vector chaining. If we track by the log vectors, then we only need to
+break the link between the log item and the log vector, which means we should
+dirty only the log item cachelines. Normally I wouldn't be concerned about one
+vs two dirty cachelines except for the fact I've seen upwards of 80,000 log
+vectors in one checkpoint transaction. I'd guess this is a "measure and
+compare" situation that can be done after a working and reviewed implementation
+is in the dev tree....
+
+Delayed Logging: Checkpoint Sequencing
+
+One of the key aspects of the XFS transaction subsystem is that it tags
+committed transactions with the log sequence number of the transaction commit.
+This allows transactions to be issued asynchronously even though there may be
+future operations that cannot be completed until that transaction is fully
+committed to the log. In the rare case that a dependent operation occurs (e.g.
+re-using a freed metadata extent for a data extent), a special, optimised log
+force can be issued to force the dependent transaction to disk immediately.
+
+To do this, transactions need to record the LSN of the commit record of the
+transaction. This LSN comes directly from the log buffer the transaction is
+written into. While this works just fine for the existing transaction
+mechanism, it does not work for delayed logging because transactions are not
+written directly into the log buffers. Hence some other method of sequencing
+transactions is required.
+
+As discussed in the checkpoint section, delayed logging uses per-checkpoint
+contexts, and as such it is simple to assign a sequence number to each
+checkpoint. Because the switching of checkpoint contexts must be done
+atomically, it is simple to ensure that each new context has a monotonically
+increasing sequence number assigned to it without the need for an external
+atomic counter - we can just take the current context sequence number and add
+one to it for the new context.
+
+Then, instead of assigning a log buffer LSN to the transaction commit LSN
+during the commit, we can assign the current checkpoint sequence. This allows
+operations that track transactions that have not yet completed to know what
+checkpoint sequence needs to be committed before they can continue. As a
+result, the code that forces the log to a specific LSN now needs to ensure that
+the log forces to a specific checkpoint.
+
+To ensure that we can do this, we need to track all the checkpoint contexts
+that are currently committing to the log. When we flush a checkpoint, the
+context gets added to a "committing" list which can be searched. When a
+checkpoint commit completes, it is removed from the committing list. Because
+the checkpoint context records the LSN of the commit record for the checkpoint,
+we can also wait on the log buffer that contains the commit record, thereby
+using the existing log force mechanisms to execute synchronous forces.
+
+It should be noted that the synchronous forces may need to be extended with
+mitigation algorithms similar to the current log buffer code to allow
+aggregation of multiple synchronous transactions if there are already
+synchronous transactions being flushed. Investigation of the performance of the
+current design is needed before making any decisions here.
+
+The main concern with log forces is to ensure that all the previous checkpoints
+are also committed to disk before the one we need to wait for. Therefore we
+need to check that all the prior contexts in the committing list are also
+complete before waiting on the one we need to complete. We do this
+synchronisation in the log force code so that we don't need to wait anywhere
+else for such serialisation - it only matters when we do a log force.
+
+The only remaining complexity is that a log force now also has to handle the
+case where the forcing sequence number is the same as the current context. That
+is, we need to flush the CIL and potentially wait for it to complete. This is a
+simple addition to the existing log forcing code to check the sequence numbers
+and push if required. Indeed, placing the current sequence checkpoint flush in
+the log force code enables the current mechanism for issuing synchronous
+transactions to remain untouched (i.e. commit an asynchronous transaction, then
+force the log at the LSN of that transaction) and so the higher level code
+behaves the same regardless of whether delayed logging is being used or not.
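+
+A sketch of the sequencing rules in this section, under the assumption that a
+checkpoint can be identified by its sequence number alone. CheckpointLog and
+its methods are invented for the illustration, and waiting is reduced to
+returning the list of sequences that a force would have to wait for.
+
```python
class CheckpointLog:
    def __init__(self):
        self.current_seq = 1     # sequence of the context attached to the CIL
        self.committing = []     # sequences of checkpoints being written

    def commit_transaction(self):
        # The transaction records the checkpoint sequence, not a buffer LSN.
        return self.current_seq

    def push(self):
        # Context switch: the new context is the old sequence plus one.
        seq = self.current_seq
        self.committing.append(seq)
        self.current_seq = seq + 1
        return seq

    def complete(self, seq):
        self.committing.remove(seq)

    def force(self, seq):
        """Return the sequences a log force to seq has to wait for."""
        if seq == self.current_seq:
            self.push()          # the CIL itself must be flushed first
        # All prior checkpoints must also be on disk before seq counts.
        return [s for s in self.committing if s <= seq]


log = CheckpointLog()
t1 = log.commit_transaction()
log.push()
t2 = log.commit_transaction()
log.push()
t3 = log.commit_transaction()

wait_a = log.force(t2)           # checkpoints 1 and 2 are both in flight
log.complete(1)
wait_b = log.force(t2)
wait_c = log.force(t3)           # sequence 3 is still in the CIL: push it
```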
+
+Delayed Logging: Checkpoint Log Space Accounting
+
+The big issue for a checkpoint transaction is the log space reservation for the
+transaction. We don't know how big a checkpoint transaction is going to be
+ahead of time, nor how many log buffers it will take to write out, nor the
+number of split log vector regions that are going to be used. We can track the
+amount of log space required as we add items to the commit item list, but we
+still need to reserve the space in the log for the checkpoint.
+
+A typical transaction reserves enough space in the log for the worst case space
+usage of the transaction. The reservation accounts for log record headers,
+transaction and region headers, headers for split regions, buffer tail padding,
+etc. as well as the actual space for all the changed metadata in the
+transaction. While some of this is fixed overhead, much of it is dependent on
+the size of the transaction and the number of regions being logged (the number
+of log vectors in the transaction).
+
+An example of the differences would be logging directory changes versus logging
+inode changes. If you modify lots of inode cores (e.g. chmod -R g+w *), then
+there are lots of transactions that only contain an inode core and an inode log
+format structure. That is, two vectors totaling roughly 150 bytes. If we modify
+10,000 inodes, we have about 1.5MB of metadata to write in 20,000 vectors. Each
+vector is 12 bytes, so the total to be logged is approximately 1.75MB. In
+comparison, if we are logging full directory buffers, they are typically 4KB
+each, so in 1.5MB of directory buffers we'd have roughly 400 buffers and a
+buffer format structure for each buffer - roughly 800 vectors or 1.51MB total
+space. From this, it should be obvious that a static log space reservation is
+not particularly flexible, and that it is difficult to select the "optimal
+value" for all workloads.
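+
+The arithmetic in this comparison can be checked with a few lines of Python.
+The figures are the rounded ones from the text (150 bytes per inode
+transaction, 12 bytes per vector, 1.5MB taken as 1,500,000 bytes); the helper
+name is made up.
+
```python
VECTOR_SIZE = 12    # bytes per vector, from the text


def logged_bytes(metadata_bytes, vectors):
    """Total bytes to log: the metadata itself plus one header per vector."""
    return metadata_bytes + vectors * VECTOR_SIZE


# 10,000 inode modifications: two vectors and roughly 150 bytes each.
inode_total = logged_bytes(10_000 * 150, 10_000 * 2)

# About 1.5MB of directory buffers: roughly 400 buffers, two vectors each.
dir_total = logged_bytes(1_500_000, 400 * 2)

print(f"inodes: {inode_total / 1e6:.2f}MB, directories: {dir_total / 1e6:.2f}MB")
```
+
+The same 1.5MB of metadata costs 240,000 bytes of vector overhead in the inode
+case but only 9,600 bytes in the directory case.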
+
+Further, if we are going to use a static reservation, which bit of the entire
+reservation does it cover? We account for space used by the transaction
+reservation by tracking the space currently used by the object in the CIL and
+then calculating the increase or decrease in space used as the object is
+relogged. This allows for a checkpoint reservation to only have to account for
+log buffer metadata used such as log header records.
+
+However, even using a static reservation for just the log metadata is
+problematic. Typically log record headers use at least 16KB of log space per
+1MB of log space consumed (512 bytes per 32k) and the reservation needs to be
+large enough to handle arbitrary sized checkpoint transactions. This
+reservation needs to be made before the checkpoint is started, and we need to
+be able to reserve the space without sleeping. For an 8MB checkpoint, we need a
+reservation of around 150KB, which is a non-trivial amount of space.
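+
+The record header overhead can be worked through as follows, assuming one
+512 byte header per 32KB of log space as stated above. The function name is
+hypothetical.
+
```python
LOG_RECORD_SIZE = 32 * 1024    # log space covered by one record header
RECORD_HEADER_SIZE = 512       # bytes per log record header
MB = 1024 * 1024


def record_header_space(log_bytes):
    """Header bytes needed to write log_bytes, in whole log records."""
    records = -(-log_bytes // LOG_RECORD_SIZE)    # ceiling division
    return records * RECORD_HEADER_SIZE


per_mb = record_header_space(1 * MB)
for_8mb = record_header_space(8 * MB)

print(per_mb // 1024, "KB per 1MB,", for_8mb // 1024, "KB for 8MB")
```
+
+Record headers alone come to 128KB for an 8MB checkpoint. The figure of around
+150KB is larger, presumably because it allows for overheads that this sketch
+does not model.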
+
+A static reservation needs to manipulate the log grant counters - we can take a
+permanent reservation on the space, but we still need to make sure we refresh
+the write reservation (the actual space available to the transaction) after
+every checkpoint transaction completion. Unfortunately, if this space is not
+available when required, then the regrant code will sleep waiting for it.
+
+The problem with this is that it can lead to deadlocks as we may need to commit
+checkpoints to be able to free up log space (refer back to the description of
+rolling transactions for an example of this). Hence we *must* always have
+space available in the log if we are to use static reservations, and that is
+very difficult and complex to arrange. It is possible to do, but there is a
+simpler way.
+
+The simpler way of doing this is tracking the entire log space used by the
+items in the CIL and using this to dynamically calculate the amount of log
+space required by the log metadata. If this log metadata space changes as a
+result of a transaction commit inserting a new memory buffer into the CIL, then
+the difference in space required is removed from the transaction that causes
+the change. Transactions at this level will *always* have enough space
+available in their reservation for this as they have already reserved the
+maximal amount of log metadata space they require, and such a delta reservation
+will always be less than or equal to the maximal amount in the reservation.
+
+Hence we can grow the checkpoint transaction reservation dynamically as items
+are added to the CIL and avoid the need for reserving and regranting log space
+up front. This avoids deadlocks and removes a blocking point from the
+checkpoint flush code.
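+
+A rough model of this dynamic accounting is sketched below. It assumes the
+only log metadata is one 512 byte record header per 32KB of tracked space, and
+it tracks nothing but sizes. CILSpace, insert() and the reservation numbers
+are all invented for the example.
+
```python
LOG_RECORD_SIZE = 32 * 1024
RECORD_HEADER_SIZE = 512


def header_space(used_bytes):
    """Record header bytes needed for used_bytes of checkpoint contents."""
    return -(-used_bytes // LOG_RECORD_SIZE) * RECORD_HEADER_SIZE


class CILSpace:
    def __init__(self):
        self.item_bytes = {}      # log item -> size of its current buffer
        self.used = 0             # total space used by items in the CIL
        self.header_reserved = 0  # log metadata space taken so far

    def insert(self, item, nbytes, tx_reservation):
        """Account for a commit; return what is left of the reservation."""
        # Only the change in the item's size matters when it is relogged.
        delta = nbytes - self.item_bytes.get(item, 0)
        self.item_bytes[item] = nbytes
        self.used += delta

        # Any growth in header space comes out of the committing transaction.
        steal = max(0, header_space(self.used) - self.header_reserved)
        if steal > tx_reservation:
            raise ValueError("transaction reservation too small")
        self.header_reserved += steal
        return tx_reservation - steal


space = CILSpace()
left_a = space.insert("inode", 20_000, tx_reservation=4096)   # first record
left_b = space.insert("buffer", 20_000, tx_reservation=4096)  # second record
left_c = space.insert("inode", 20_100, tx_reservation=4096)   # relog, no growth
```
+
+Only the commits that push the checkpoint into a new log record give up part
+of their reservation; the relog of "inode" costs nothing extra here.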
+
+As mentioned earlier, transactions can't grow to more than half the size of the
+log. Hence as part of the reservation growing, we need to also check the size
+of the reservation against the maximum allowed transaction size. If we reach
+the maximum threshold, we need to push the CIL to the log. This is effectively
+a "background flush" and is done on demand. This is identical to
+a CIL push triggered by a log force, except that there is no waiting for the
+checkpoint commit to complete. This background push is checked and executed by
+transaction commit code.
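
A minimal sketch of that check, assuming hypothetical helper names and taking
the "half the size of the log" limit directly from the text above, might look
like this. The real threshold and where it is evaluated are implementation
details that this sketch does not claim to reproduce.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Largest checkpoint allowed: half the log, as stated in the text. */
static size_t max_checkpoint_size(size_t log_size)
{
	assert(log_size > 0);
	return log_size / 2;
}

/*
 * Hypothetical check run at the end of transaction commit. A true result
 * means "start a CIL push now"; the committer does not wait for it.
 */
static bool cil_needs_background_push(size_t ctx_reservation, size_t log_size)
{
	return ctx_reservation >= max_checkpoint_size(log_size);
}
```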
+
+If the transaction subsystem goes idle while we still have items in the CIL,
+they will be flushed by the periodic log force issued by the xfssyncd. This log
+force will push the CIL to disk, and if the transaction subsystem stays idle,
+allow the idle log to be covered (effectively marked clean) in exactly the same
+manner that is done for the existing logging method. A discussion point is
+whether this log force needs to be done more frequently than the current rate,
+which is once every 30s.
+
+
+Delayed Logging: Log Item Pinning
+
+Currently log items are pinned during transaction commit while the items are
+still locked. This happens just after the items are formatted, though it could
+be done any time before the items are unlocked. The result of this mechanism is
+that items get pinned once for every transaction that is committed to the log
+buffers. Hence items that are relogged in the log buffers will have a pin count
+for every outstanding transaction they were dirtied in. When each of these
+transactions is completed, they will unpin the item once. As a result, the item
+only becomes unpinned when all the transactions complete and there are no
+pending transactions. Thus the pinning and unpinning of a log item is symmetric
+as there is a 1:1 relationship with transaction commit and log item completion.
+
+For delayed logging, however, we have an asymmetric transaction commit to
+completion relationship. Every time an object is relogged in the CIL it goes
+through the commit process without a corresponding completion being registered.
+That is, we now have a many-to-one relationship between transaction commit and
+log item completion. The result of this is that pinning and unpinning of the
+log items becomes unbalanced if we retain the "pin on transaction commit, unpin
+on transaction completion" model.
+
+To keep pin/unpin symmetry, the algorithm needs to change to a "pin on
+insertion into the CIL, unpin on checkpoint completion". In other words, the
+pinning and unpinning becomes symmetric around a checkpoint context. We have to
+pin the object the first time it is inserted into the CIL - if it is already in
+the CIL during a transaction commit, then we do not pin it again. Because there
+can be multiple outstanding checkpoint contexts, we can still see elevated pin
+counts, but as each checkpoint completes the pin count will retain the correct
+value according to its context.
+
+Just to make matters slightly more complex, this checkpoint level context
+for the pin count means that the pinning of an item must take place under the
+CIL commit/flush lock. If we pin the object outside this lock, we cannot
+guarantee which context the pin count is associated with. This is because
+pinning the item depends on whether the item is already present in the
+current CIL. If we don't lock the CIL first before we check and pin the
+object, we have a race with the CIL being flushed between the check and the pin
+(or not pinning, as the case may be). Hence we must hold the CIL flush/commit
+lock to guarantee that we pin the items correctly.
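
The "pin on insertion into the CIL, unpin on checkpoint completion" rule can be
illustrated with a small single-threaded model. The structure and function
names below are hypothetical, and the CIL commit/flush lock is only assumed to
be held by the caller rather than modelled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified log item; not the real XFS structure. */
struct log_item {
	int  pin_count;
	bool in_cil;	/* already present in the current CIL? */
};

/* Transaction commit. Caller is assumed to hold the CIL commit/flush lock. */
static void cil_insert_item(struct log_item *lip)
{
	if (!lip->in_cil) {
		/* First insertion into this CIL context: take one pin. */
		lip->pin_count++;
		lip->in_cil = true;
	}
	/* Relogging an item already in the CIL does not pin it again. */
}

/* CIL push: the item moves to a checkpoint context and keeps its pin. */
static void cil_push_item(struct log_item *lip)
{
	lip->in_cil = false;
}

/* Checkpoint completion: drop the single pin owned by that checkpoint. */
static void checkpoint_complete_item(struct log_item *lip)
{
	assert(lip->pin_count > 0);
	lip->pin_count--;
}
```

In this model an item relogged many times within one CIL context holds one
pin, and an item that is relogged after a push but before the earlier
checkpoint completes holds two, one per outstanding checkpoint context.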
+
+Delayed Logging: Concurrent Scalability
+
+A fundamental requirement for the CIL is that accesses through transaction
+commits must scale to many concurrent commits. The current transaction commit
+code does not break down even when there are transactions coming from 2048
+processors at once. The current transaction code does not go any faster than if
+there was only one CPU using it, but it does not slow down either.
+
+As a result, the delayed logging transaction commit code needs to be designed
+for concurrency from the ground up. It is obvious that there are serialisation
+points in the design - the three important ones are:
+
+ 1. Locking out new transaction commits while flushing the CIL
+ 2. Adding items to the CIL and updating item space accounting
+ 3. Checkpoint commit ordering
+
+Looking at the transaction commit and CIL flushing interactions, it is clear
+that we have a many-to-one interaction here. That is, the only restriction on
+the number of concurrent transactions that can be trying to commit at once is
+the amount of space available in the log for their reservations. The practical
+limit here is in the order of several hundred concurrent transactions for a
+128MB log, which means that it is generally one per CPU in a machine.
+
+The amount of time a transaction commit needs to hold out a flush is a
+relatively long period of time - the pinning of log items needs to be done
+while we are holding out a CIL flush, so at the moment that means it is held
+across the formatting of the objects into memory buffers (i.e. while memcpy()s
+are in progress). Ultimately a two pass algorithm where the formatting is done
+separately to the pinning of objects could be used to reduce the hold time of
+the transaction commit side.
+
+Because of the number of potential transaction commit side holders, the lock
+really needs to be a sleeping lock - if the CIL flush takes the lock, we do not
+want every other CPU in the machine spinning on the CIL lock. Given that
+flushing the CIL could involve walking a list of tens of thousands of log
+items, it will get held for a significant time and so spin contention is a
+significant concern. Preventing lots of CPUs spinning doing nothing is the
+main reason for choosing a sleeping lock even though nothing in either the
+transaction commit or CIL flush side sleeps with the lock held.
+
+It should also be noted that CIL flushing is a relatively rare operation
+compared to transaction commit for asynchronous transaction workloads - only
+time will tell if using a read-write semaphore for exclusion will limit
+transaction commit concurrency due to cache line bouncing of the lock on the
+read side.
+
+The second serialisation point is on the transaction commit side where items
+are inserted into the CIL. Because transactions can enter this code
+concurrently, the CIL needs to be protected separately from the above
+commit/flush exclusion. It also needs to be an exclusive lock but it is only
+held for a very short time and so a spin lock is appropriate here. It is
+possible that this lock will become a contention point, but given the short
+hold time once per transaction I think that contention is unlikely.
+
+The final serialisation point is the checkpoint commit record ordering code
+that is run as part of the checkpoint commit and log force sequencing. The code
+path that triggers a CIL flush (i.e. whatever triggers the log force) will enter
+an ordering loop after writing all the log vectors into the log buffers but
+before writing the commit record. This loop walks the list of committing
+checkpoints and needs to block waiting for checkpoints to complete their commit
+record write. As a result it needs a lock and a wait variable. Log force
+sequencing also requires the same lock, list walk, and blocking mechanism to
+ensure completion of checkpoints.
+
+These two sequencing operations can use the same mechanism even though the
+events they are waiting for are different. The checkpoint commit record
+sequencing needs to wait until checkpoint contexts contain a commit LSN
+(obtained through completion of a commit record write) while log force
+sequencing needs to wait until previous checkpoint contexts are removed from
+the committing list (i.e. they've completed). A simple wait variable and
+broadcast wakeups (thundering herds) have been used to implement these two
+serialisation queues. They use the same lock as the CIL, too. If we see too
+much contention on the CIL lock, or too many context switches as a result of
+the broadcast wakeups, these operations can be put under a new spinlock and
+given separate wait lists to reduce lock contention and the number of processes
+woken by the wrong event.
+
+
+Lifecycle Changes
+
+The existing log item life cycle is as follows:
+
+ 1. Transaction allocate
+ 2. Transaction reserve
+ 3. Lock item
+ 4. Join item to transaction
+ If not already attached,
+ Allocate log item
+ Attach log item to owner item
+ Attach log item to transaction
+ 5. Modify item
+ Record modifications in log item
+ 6. Transaction commit
+ Pin item in memory
+ Format item into log buffer
+ Write commit LSN into transaction
+ Unlock item
+ Attach transaction to log buffer
+
+ <log buffer IO dispatched>
+ <log buffer IO completes>
+
+ 7. Transaction completion
+ Mark log item committed
+ Insert log item into AIL
+ Write commit LSN into log item
+ Unpin log item
+ 8. AIL traversal
+ Lock item
+ Mark log item clean
+ Flush item to disk
+
+ <item IO completion>
+
+ 9. Log item removed from AIL
+ Moves log tail
+ Item unlocked
+
+Essentially, steps 1-6 operate independently from step 7, which is also
+independent of steps 8-9. An item can be locked in steps 1-6 or steps 8-9
+at the same time step 7 is occurring, but only one of steps 1-6 or 8-9 can
+occur at any given time. If the log item is in the AIL or between steps 6 and 7
+and steps 1-6 are re-entered, then the item is relogged. Only when steps 8-9
+are entered and completed is the object considered clean.
+
+With delayed logging, there are new steps inserted into the life cycle:
+
+ 1. Transaction allocate
+ 2. Transaction reserve
+ 3. Lock item
+ 4. Join item to transaction
+ If not already attached,
+ Allocate log item
+ Attach log item to owner item
+ Attach log item to transaction
+ 5. Modify item
+ Record modifications in log item
+ 6. Transaction commit
+ Pin item in memory if not pinned in CIL
+ Format item into log vector + buffer
+ Attach log vector and buffer to log item
+ Insert log item into CIL
+ Write CIL context sequence into transaction
+ Unlock item
+
+ <next log force>
+
+ 7. CIL push
+ Lock CIL flush
+ Chain log vectors and buffers together
+ Remove items from CIL
+ Unlock CIL flush
+ Write log vectors into log
+ Sequence commit records
+ Attach checkpoint context to log buffer
+
+ <log buffer IO dispatched>
+ <log buffer IO completes>
+
+ 8. Checkpoint completion
+ Mark log item committed
+ Insert item into AIL
+ Write commit LSN into log item
+ Unpin log item
+ 9. AIL traversal
+ Lock item
+ Mark log item clean
+ Flush item to disk
+ <item IO completion>
+ 10. Log item removed from AIL
+ Moves log tail
+ Item unlocked
+
+From this, it can be seen that the only life cycle differences between the two
+logging methods are in the middle of the life cycle - they still have the same
+beginning and end and execution constraints. The only differences are in the
+committing of the log items to the log itself and the completion processing.
+Hence delayed logging should not introduce any constraints on log item
+behaviour, allocation or freeing that don't already exist.
+
+As a result of this zero-impact "insertion" of delayed logging infrastructure
+and the design of the internal structures to avoid on disk format changes, we
+can basically switch between delayed logging and the existing mechanism with a
+mount option. Fundamentally, there is no reason why the log manager would not
+be able to swap methods automatically and transparently depending on load
+characteristics, but this should not be necessary if delayed logging works as
+designed.
+
+Roadmap:
+
+2.6.35 Inclusion in mainline as an experimental mount option
+ => approximately 2-3 months to merge window
+ => needs to be in xfs-dev tree in 4-6 weeks
+ => code is nearing readiness for review
+
+2.6.37 Remove experimental tag from mount option
+ => should be roughly 6 months after initial merge
+ => enough time to:
+ => gain confidence and fix problems reported by early
+ adopters (a.k.a. guinea pigs)
+ => address worst performance regressions and undesired
+ behaviours
+ => start tuning/optimising code for parallelism
+ => start tuning/optimising algorithms consuming
+ excessive CPU time
+
+2.6.39 Switch default mount option to use delayed logging
+ => should be roughly 12 months after initial merge
+ => enough time to shake out remaining problems before next round of
+ enterprise distro kernel rebases
advansys= [HW,SCSI]
See header of drivers/scsi/advansys.c.
- advwdt= [HW,WDT] Advantech WDT
- Format: <iostart>,<iostop>
-
aedsp16= [HW,OSS] Audio Excel DSP 16
Format: <io>,<irq>,<dma>,<mss_io>,<mpu_io>,<mpu_irq>
See also header of sound/oss/aedsp16.c.
This option is obsoleted by the "netdev=" option, which
has equivalent usage. See its documentation for details.
- eurwdt= [HW,WDT] Eurotech CPU-1220/1410 onboard watchdog.
- Format: <io>[,<irq>]
-
failslab=
fail_page_alloc=
fail_make_request=[KNL]
sched_debug [KNL] Enables verbose scheduler debug messages.
- sc1200wdt= [HW,WDT] SC1200 WDT (watchdog) driver
- Format: <io>[,<timeout>[,<isapnp>]]
-
scsi_debug_*= [SCSI]
See drivers/scsi/scsi_debug.c.
wd7000= [HW,SCSI]
See header of drivers/scsi/wd7000.c.
- wdt= [WDT] Watchdog
- See Documentation/watchdog/wdt.txt.
+ watchdog timers [HW,WDT] For information on watchdog timers,
+ see Documentation/watchdog/watchdog-parameters.txt
+ or other driver-specific files in the
+ Documentation/watchdog/ directory.
x2apic_phys [X86-64,APIC] Use x2apic physical mode instead of
default x2apic cluster mode on platforms
Currently, these files are in /proc/sys/vm:
- block_dump
+- compact_memory
- dirty_background_bytes
- dirty_background_ratio
- dirty_bytes
- dirty_ratio
- dirty_writeback_centisecs
- drop_caches
+- extfrag_threshold
- hugepages_treat_as_movable
- hugetlb_shm_group
- laptop_mode
==============================================================
+compact_memory
+
+Available only when CONFIG_COMPACTION is set. When 1 is written to the file,
+all zones are compacted such that free memory is available in contiguous
+blocks where possible. This can be important, for example, in the allocation
+of huge pages, although processes will also directly compact memory as
+required.
+
+==============================================================
+
dirty_background_bytes
Contains the amount of dirty memory at which the pdflush background writeback
==============================================================
+extfrag_threshold
+
+This parameter affects whether the kernel will compact memory or direct
+reclaim to satisfy a high-order allocation. /proc/extfrag_index shows what
+the fragmentation index for each order is in each zone in the system. Values
+tending towards 0 imply allocations would fail due to lack of memory,
+values towards 1000 imply failures are due to fragmentation and -1 implies
+that the allocation will succeed as long as watermarks are met.
+
+The kernel will not compact memory in a zone if the
+fragmentation index is <= extfrag_threshold. The default value is 500.
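
The rule above can be written as a small predicate. This is only a sketch of
the documented behaviour: the function name is hypothetical and the actual
kernel decision also depends on watermarks and other state not modelled here.

```c
#include <assert.h>
#include <stdbool.h>

#define EXTFRAG_THRESHOLD_DEFAULT 500	/* default stated in the text */

/*
 * Returns true if compaction would be attempted for a zone with the given
 * fragmentation index. The index ranges from 0 to 1000, or is -1 when the
 * allocation is expected to succeed as long as watermarks are met.
 */
static bool extfrag_allows_compaction(int fragindex, int threshold)
{
	assert(fragindex >= -1 && fragindex <= 1000);

	/* No compaction when index <= threshold; -1 also lands here. */
	return fragindex > threshold;
}
```

With the default threshold, an index of 501 or above leads to compaction,
while 500 or below (including -1) does not.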
+
+==============================================================
+
hugepages_treat_as_movable
This parameter is only useful when kernelcore= is specified at boot time to
00-INDEX
- this file.
+hpwdt.txt
+ - information on the HP iLO2 NMI watchdog
pcwd-watchdog.txt
- documentation for Berkshire Products PC Watchdog ISA cards.
src/
- directory holding watchdog related example programs.
watchdog-api.txt
- description of the Linux Watchdog driver API.
+watchdog-parameters.txt
+ - information on driver parameters (for drivers other than
+ the ones that have driver-specific files here)
wdt.txt
- description of the Watchdog Timer Interfaces for Linux.
--- /dev/null
+This file provides information on the module parameters of many of
+the Linux watchdog drivers. Watchdog driver parameter specs should
+be listed here unless the driver has its own driver-specific information
+file.
+
+
+See Documentation/kernel-parameters.txt for information on
+providing kernel parameters for builtin drivers versus loadable
+modules.
+
+
+-------------------------------------------------
+acquirewdt:
+wdt_stop: Acquire WDT 'stop' io port (default 0x43)
+wdt_start: Acquire WDT 'start' io port (default 0x443)
+nowayout: Watchdog cannot be stopped once started
+ (default=kernel config parameter)
+-------------------------------------------------
+advantechwdt:
+wdt_stop: Advantech WDT 'stop' io port (default 0x443)
+wdt_start: Advantech WDT 'start' io port (default 0x443)
+timeout: Watchdog timeout in seconds. 1<= timeout <=63, default=60.
+nowayout: Watchdog cannot be stopped once started
+ (default=kernel config parameter)
+-------------------------------------------------
+alim1535_wdt:
+timeout: Watchdog timeout in seconds. (0 < timeout < 18000, default=60)
+nowayout: Watchdog cannot be stopped once started
+ (default=kernel config parameter)
+-------------------------------------------------
+alim7101_wdt:
+timeout: Watchdog timeout in seconds. (1<=timeout<=3600, default=30)
+use_gpio: Use the gpio watchdog (required by old cobalt boards).
+ default=0/off/no
+nowayout: Watchdog cannot be stopped once started
+ (default=kernel config parameter)
+-------------------------------------------------
+ar7_wdt:
+margin: Watchdog margin in seconds (default=60)
+nowayout: Disable watchdog shutdown on close
+ (default=kernel config parameter)
+-------------------------------------------------
+at32ap700x_wdt:
+timeout: Timeout value. Limited to be 1 or 2 seconds. (default=2)
+nowayout: Watchdog cannot be stopped once started
+ (default=kernel config parameter)
+-------------------------------------------------
+at91rm9200_wdt:
+wdt_time: Watchdog time in seconds. (default=5)
+nowayout: Watchdog cannot be stopped once started
+ (default=kernel config parameter)
+-------------------------------------------------
+at91sam9_wdt:
+heartbeat: Watchdog heartbeats in seconds. (default = 15)
+nowayout: Watchdog cannot be stopped once started
+ (default=kernel config parameter)
+-------------------------------------------------
+bcm47xx_wdt:
+wdt_time: Watchdog time in seconds. (default=30)
+nowayout: Watchdog cannot be stopped once started
+ (default=kernel config parameter)
+-------------------------------------------------
+bfin_wdt:
+timeout: Watchdog timeout in seconds. (1<=timeout<=((2^32)/SCLK), default=20)
+nowayout: Watchdog cannot be stopped once started
+ (default=kernel config parameter)
+-------------------------------------------------
+coh901327_wdt:
+margin: Watchdog margin in seconds (default 60s)
+-------------------------------------------------
+cpu5wdt:
+port: base address of watchdog card, default is 0x91
+verbose: be verbose, default is 0 (no)
+ticks: count down ticks, default is 10000
+-------------------------------------------------
+cpwd:
+wd0_timeout: Default watchdog0 timeout in 1/10secs
+wd1_timeout: Default watchdog1 timeout in 1/10secs
+wd2_timeout: Default watchdog2 timeout in 1/10secs
+-------------------------------------------------
+davinci_wdt:
+heartbeat: Watchdog heartbeat period in seconds from 1 to 600, default 60
+-------------------------------------------------
+ep93xx_wdt:
+nowayout: Watchdog cannot be stopped once started
+timeout: Watchdog timeout in seconds. (1<=timeout<=3600, default=TBD)
+-------------------------------------------------
+eurotechwdt:
+nowayout: Watchdog cannot be stopped once started
+ (default=kernel config parameter)
+io: Eurotech WDT io port (default=0x3f0)
+irq: Eurotech WDT irq (default=10)
+ev: Eurotech WDT event type (default is `int')
+-------------------------------------------------
+gef_wdt:
+nowayout: Watchdog cannot be stopped once started
+ (default=kernel config parameter)
+-------------------------------------------------
+geodewdt:
+timeout: Watchdog timeout in seconds. 1<= timeout <=131, default=60.
+nowayout: Watchdog cannot be stopped once started
+ (default=kernel config parameter)
+-------------------------------------------------
+i6300esb:
+heartbeat: Watchdog heartbeat in seconds. (1<heartbeat<2046, default=30)
+nowayout: Watchdog cannot be stopped once started
+ (default=kernel config parameter)
+-------------------------------------------------
+iTCO_wdt:
+heartbeat: Watchdog heartbeat in seconds.
+ (2<heartbeat<39 (TCO v1) or 613 (TCO v2), default=30)
+nowayout: Watchdog cannot be stopped once started
+ (default=kernel config parameter)
+-------------------------------------------------
+iTCO_vendor_support:
+vendorsupport: iTCO vendor specific support mode, default=0 (none),
+ 1=SuperMicro Pent3, 2=SuperMicro Pent4+, 911=Broken SMI BIOS
+-------------------------------------------------
+ib700wdt:
+timeout: Watchdog timeout in seconds. 0<= timeout <=30, default=30.
+nowayout: Watchdog cannot be stopped once started
+ (default=kernel config parameter)
+-------------------------------------------------
+ibmasr:
+nowayout: Watchdog cannot be stopped once started
+ (default=kernel config parameter)
+-------------------------------------------------
+indydog:
+nowayout: Watchdog cannot be stopped once started
+ (default=kernel config parameter)
+-------------------------------------------------
+iop_wdt:
+nowayout: Watchdog cannot be stopped once started
+ (default=kernel config parameter)
+-------------------------------------------------
+it8712f_wdt:
+margin: Watchdog margin in seconds (default 60)
+nowayout: Disable watchdog shutdown on close
+ (default=kernel config parameter)
+-------------------------------------------------
+it87_wdt:
+nogameport: Forbid the activation of game port, default=0
+exclusive: Watchdog exclusive device open, default=1
+timeout: Watchdog timeout in seconds, default=60
+testmode: Watchdog test mode (1 = no reboot), default=0
+nowayout: Watchdog cannot be stopped once started
+ (default=kernel config parameter)
+-------------------------------------------------
+ixp2000_wdt:
+heartbeat: Watchdog heartbeat in seconds (default 60s)
+nowayout: Watchdog cannot be stopped once started
+ (default=kernel config parameter)
+-------------------------------------------------
+ixp4xx_wdt:
+heartbeat: Watchdog heartbeat in seconds (default 60s)
+nowayout: Watchdog cannot be stopped once started
+ (default=kernel config parameter)
+-------------------------------------------------
+ks8695_wdt:
+wdt_time: Watchdog time in seconds. (default=5)
+nowayout: Watchdog cannot be stopped once started
+ (default=kernel config parameter)
+-------------------------------------------------
+machzwd:
+nowayout: Watchdog cannot be stopped once started
+ (default=kernel config parameter)
+action: after watchdog resets, generate:
+ 0 = RESET(*) 1 = SMI 2 = NMI 3 = SCI
+-------------------------------------------------
+max63xx_wdt:
+heartbeat: Watchdog heartbeat period in seconds from 1 to 60, default 60
+nowayout: Watchdog cannot be stopped once started
+ (default=kernel config parameter)
+nodelay: Force selection of a timeout setting without initial delay
+ (max6373/74 only, default=0)
+-------------------------------------------------
+mixcomwd:
+nowayout: Watchdog cannot be stopped once started
+ (default=kernel config parameter)
+-------------------------------------------------
+mpc8xxx_wdt:
+timeout: Watchdog timeout in ticks. (0<timeout<65536, default=65535)
+reset: Watchdog Interrupt/Reset Mode. 0 = interrupt, 1 = reset
+nowayout: Watchdog cannot be stopped once started
+ (default=kernel config parameter)
+-------------------------------------------------
+mpcore_wdt:
+mpcore_margin: MPcore timer margin in seconds.
+ (0 < mpcore_margin < 65536, default=60)
+nowayout: Watchdog cannot be stopped once started
+ (default=kernel config parameter)
+mpcore_noboot: MPcore watchdog action, set to 1 to ignore reboots,
+ 0 to reboot (default=0)
+-------------------------------------------------
+mv64x60_wdt:
+nowayout: Watchdog cannot be stopped once started
+ (default=kernel config parameter)
+-------------------------------------------------
+nuc900_wdt:
+heartbeat: Watchdog heartbeats in seconds.
+ (default = 15)
+nowayout: Watchdog cannot be stopped once started
+ (default=kernel config parameter)
+-------------------------------------------------
+omap_wdt:
+timer_margin: initial watchdog timeout (in seconds)
+-------------------------------------------------
+orion_wdt:
+heartbeat: Initial watchdog heartbeat in seconds
+nowayout: Watchdog cannot be stopped once started
+ (default=kernel config parameter)
+-------------------------------------------------
+pc87413_wdt:
+io: pc87413 WDT I/O port (default: io).
+timeout: Watchdog timeout in minutes (default=timeout).
+nowayout: Watchdog cannot be stopped once started
+ (default=kernel config parameter)
+-------------------------------------------------
+pika_wdt:
+heartbeat: Watchdog heartbeats in seconds. (default = 15)
+nowayout: Watchdog cannot be stopped once started
+ (default=kernel config parameter)
+-------------------------------------------------
+pnx4008_wdt:
+heartbeat: Watchdog heartbeat period in seconds from 1 to 60, default 19
+nowayout: Set to 1 to keep watchdog running after device release
+-------------------------------------------------
+pnx833x_wdt:
+timeout: Watchdog timeout in MHz. (68MHz clock), default=2040000000 (30 seconds)
+nowayout: Watchdog cannot be stopped once started
+ (default=kernel config parameter)
+start_enabled: Watchdog is started on module insertion (default=1)
+-------------------------------------------------
+rc32434_wdt:
+timeout: Watchdog timeout value, in seconds (default=20)
+nowayout: Watchdog cannot be stopped once started
+ (default=kernel config parameter)
+-------------------------------------------------
+riowd:
+riowd_timeout: Watchdog timeout in minutes (default=1)
+-------------------------------------------------
+s3c2410_wdt:
+tmr_margin: Watchdog tmr_margin in seconds. (default=15)
+tmr_atboot: Watchdog is started at boot time if set to 1, default=0
+nowayout: Watchdog cannot be stopped once started
+ (default=kernel config parameter)
+soft_noboot: Watchdog action, set to 1 to ignore reboots, 0 to reboot
+debug: Watchdog debug, set to >1 for debug, (default 0)
+-------------------------------------------------
+sa1100_wdt:
+margin: Watchdog margin in seconds (default 60s)
+-------------------------------------------------
+sb_wdog:
+timeout: Watchdog timeout in microseconds (max/default 8388607 or 8.3ish secs)
+-------------------------------------------------
+sbc60xxwdt:
+wdt_stop: SBC60xx WDT 'stop' io port (default 0x45)
+wdt_start: SBC60xx WDT 'start' io port (default 0x443)
+timeout: Watchdog timeout in seconds. (1<=timeout<=3600, default=30)
+nowayout: Watchdog cannot be stopped once started
+ (default=kernel config parameter)
+-------------------------------------------------
+sbc7240_wdt:
+timeout: Watchdog timeout in seconds. (1<=timeout<=255, default=30)
+nowayout: Disable watchdog when closing device file
+-------------------------------------------------
+sbc8360:
+timeout: Index into timeout table (0-63) (default=27 (60s))
+nowayout: Watchdog cannot be stopped once started
+ (default=kernel config parameter)
+-------------------------------------------------
+sbc_epx_c3:
+nowayout: Watchdog cannot be stopped once started
+ (default=kernel config parameter)
+-------------------------------------------------
+sbc_fitpc2_wdt:
+margin: Watchdog margin in seconds (default 60s)
+nowayout: Watchdog cannot be stopped once started
+-------------------------------------------------
+sc1200wdt:
+isapnp: When set to 0 driver ISA PnP support will be disabled (default=1)
+io: io port
+timeout: range is 0-255 minutes, default is 1
+nowayout: Watchdog cannot be stopped once started
+ (default=kernel config parameter)
+-------------------------------------------------
+sc520_wdt:
+timeout: Watchdog timeout in seconds. (1 <= timeout <= 3600, default=30)
+nowayout: Watchdog cannot be stopped once started
+ (default=kernel config parameter)
+-------------------------------------------------
+sch311x_wdt:
+force_id: Override the detected device ID
+therm_trip: Should a ThermTrip trigger the reset generator
+timeout: Watchdog timeout in seconds. 1<= timeout <=15300, default=60
+nowayout: Watchdog cannot be stopped once started
+ (default=kernel config parameter)
+-------------------------------------------------
+scx200_wdt:
+margin: Watchdog margin in seconds
+nowayout: Disable watchdog shutdown on close
+-------------------------------------------------
+shwdt:
+clock_division_ratio: Clock division ratio. Valid ranges are from 0x5 (1.31ms)
+ to 0x7 (5.25ms). (default=7)
+heartbeat: Watchdog heartbeat in seconds. (1 <= heartbeat <= 3600, default=30)
+nowayout: Watchdog cannot be stopped once started
+ (default=kernel config parameter)
+-------------------------------------------------
+smsc37b787_wdt:
+timeout: range is 1-255 units, default is 60
+nowayout: Watchdog cannot be stopped once started
+ (default=kernel config parameter)
+-------------------------------------------------
+softdog:
+soft_margin: Watchdog soft_margin in seconds.
+ (0 < soft_margin < 65536, default=60)
+nowayout: Watchdog cannot be stopped once started
+ (default=kernel config parameter)
+soft_noboot: Softdog action, set to 1 to ignore reboots, 0 to reboot
+ (default=0)
+-------------------------------------------------
+stmp3xxx_wdt:
+heartbeat: Watchdog heartbeat period in seconds from 1 to 4194304, default 19
+-------------------------------------------------
+ts72xx_wdt:
+timeout: Watchdog timeout in seconds. (1 <= timeout <= 8, default=8)
+nowayout: Disable watchdog shutdown on close
+-------------------------------------------------
+twl4030_wdt:
+nowayout: Watchdog cannot be stopped once started
+ (default=kernel config parameter)
+-------------------------------------------------
+txx9wdt:
+timeout: Watchdog timeout in seconds. (0<timeout<N, default=60)
+nowayout: Watchdog cannot be stopped once started
+ (default=kernel config parameter)
+-------------------------------------------------
+w83627hf_wdt:
+wdt_io: w83627hf/thf WDT io port (default 0x2E)
+timeout: Watchdog timeout in seconds. 1 <= timeout <= 255, default=60.
+nowayout: Watchdog cannot be stopped once started
+ (default=kernel config parameter)
+-------------------------------------------------
+w83697hf_wdt:
+wdt_io: w83697hf/hg WDT io port (default 0x2e, 0 = autodetect)
+timeout: Watchdog timeout in seconds. 1<= timeout <=255 (default=60)
+nowayout: Watchdog cannot be stopped once started
+ (default=kernel config parameter)
+early_disable: Watchdog gets disabled at boot time (default=1)
+-------------------------------------------------
+w83697ug_wdt:
+wdt_io: w83697ug/uf WDT io port (default 0x2e)
+timeout: Watchdog timeout in seconds. 1<= timeout <=255 (default=60)
+nowayout: Watchdog cannot be stopped once started
+ (default=kernel config parameter)
+-------------------------------------------------
+w83877f_wdt:
+timeout: Watchdog timeout in seconds. (1<=timeout<=3600, default=30)
+nowayout: Watchdog cannot be stopped once started
+ (default=kernel config parameter)
+-------------------------------------------------
+w83977f_wdt:
+timeout: Watchdog timeout in seconds (15..7635, default=45)
+testmode: Watchdog testmode (1 = no reboot), default=0
+nowayout: Watchdog cannot be stopped once started
+ (default=kernel config parameter)
+-------------------------------------------------
+wafer5823wdt:
+timeout: Watchdog timeout in seconds. 1 <= timeout <= 255, default=60.
+nowayout: Watchdog cannot be stopped once started
+ (default=kernel config parameter)
+-------------------------------------------------
+wdt285:
+soft_margin: Watchdog timeout in seconds (default=60)
+-------------------------------------------------
+wdt977:
+timeout: Watchdog timeout in seconds (60..15300, default=60)
+testmode: Watchdog testmode (1 = no reboot), default=0
+nowayout: Watchdog cannot be stopped once started
+ (default=kernel config parameter)
+-------------------------------------------------
+wm831x_wdt:
+nowayout: Watchdog cannot be stopped once started
+ (default=kernel config parameter)
+-------------------------------------------------
+wm8350_wdt:
+nowayout: Watchdog cannot be stopped once started
+ (default=kernel config parameter)
+-------------------------------------------------
boards physically pull the machine down off their own onboard timers and
will reboot from almost anything.
-A second temperature monitoring interface is available on the WDT501P cards
+A second temperature monitoring interface is available on the WDT501P cards.
This provides /dev/temperature. This is the machine internal temperature in
degrees Fahrenheit. Each read returns a single byte giving the temperature.
The third interface logs kernel messages on additional alert events.
-The wdt card cannot be safely probed for. Instead you need to pass
-wdt=ioaddr,irq as a boot parameter - eg "wdt=0x240,11".
+The ICS ISA-bus wdt card cannot be safely probed for. Instead you need to
+pass IO address and IRQ boot parameters. E.g.:
+ wdt.io=0x240 wdt.irq=11
+
+Other "wdt" driver parameters are:
+ heartbeat Watchdog heartbeat in seconds (default 60)
+ nowayout Watchdog cannot be stopped once started (kernel
+ build parameter)
+ tachometer WDT501-P Fan Tachometer support (0=disable, default=0)
+ type WDT501-P Card type (500 or 501, default=500)
Features
--------
Example Watchdog Driver: see Documentation/watchdog/src/watchdog-simple.c
-
#define UDIV_NEEDS_NORMALIZATION 1
#define abort() goto bad_insn
-
-#ifndef __LITTLE_ENDIAN
-#define __LITTLE_ENDIAN -1
-#endif
-#define __BYTE_ORDER __LITTLE_ENDIAN
#define S3C2410_RTCCON_CLKSEL (1<<1)
#define S3C2410_RTCCON_CNTSEL (1<<2)
#define S3C2410_RTCCON_CLKRST (1<<3)
+#define S3C64XX_RTCCON_TICEN (1<<8)
+
+#define S3C64XX_RTCCON_TICMSK (0xF<<7)
+#define S3C64XX_RTCCON_TICSHT (7)
#define S3C2410_TICNT S3C2410_RTCREG(0x44)
#define S3C2410_TICNT_ENABLE (1<<7)
#define L1_CACHE_SHIFT (CONFIG_FRV_L1_CACHE_SHIFT)
#define L1_CACHE_BYTES (1 << L1_CACHE_SHIFT)
+#define ARCH_KMALLOC_MINALIGN L1_CACHE_BYTES
+
#define __cacheline_aligned __attribute__((aligned(L1_CACHE_BYTES)))
#define ____cacheline_aligned __attribute__((aligned(L1_CACHE_BYTES)))
#ifndef __ASM_GDB_STUB_H
#define __ASM_GDB_STUB_H
+#undef GDBSTUB_DEBUG_IO
#undef GDBSTUB_DEBUG_PROTOCOL
#include <asm/ptrace.h>
extern void debug_to_serial(const char *p, int n);
extern void console_set_baud(unsigned baud);
+#ifdef GDBSTUB_DEBUG_IO
+#define gdbstub_io(FMT,...) gdbstub_printk(FMT, ##__VA_ARGS__)
+#else
+#define gdbstub_io(FMT,...) ({ 0; })
+#endif
+
#ifdef GDBSTUB_DEBUG_PROTOCOL
#define gdbstub_proto(FMT,...) gdbstub_printk(FMT,##__VA_ARGS__)
#else
return -EINTR;
}
else if (st & (UART_LSR_FE|UART_LSR_OE|UART_LSR_PE)) {
- gdbstub_proto("### GDB Rx Error (st=%02x) ###\n",st);
+ gdbstub_io("### GDB Rx Error (st=%02x) ###\n",st);
return -EIO;
}
else {
- gdbstub_proto("### GDB Rx %02x (st=%02x) ###\n",ch,st);
+ gdbstub_io("### GDB Rx %02x (st=%02x) ###\n",ch,st);
*_ch = ch & 0x7f;
return 0;
}
} /* end gdbstub_get_mmu_state() */
+/*
+ * handle general query commands of the form 'qXXXXX'
+ */
+static void gdbstub_handle_query(void)
+{
+ if (strcmp(input_buffer, "qAttached") == 0) {
+ /* report that we attached to an existing process */
+ sprintf(output_buffer, "1");
+ return;
+ }
+
+ if (strcmp(input_buffer, "qC") == 0) {
+ /* return current thread ID */
+ sprintf(output_buffer, "QC 0");
+ return;
+ }
+
+ if (strcmp(input_buffer, "qOffsets") == 0) {
+ /* return relocation offset of text and data segments */
+ sprintf(output_buffer, "Text=0;Data=0;Bss=0");
+ return;
+ }
+
+ if (strcmp(input_buffer, "qSymbol::") == 0) {
+ sprintf(output_buffer, "OK");
+ return;
+ }
+
+ if (strcmp(input_buffer, "qSupported") == 0) {
+ /* query of supported features */
+ sprintf(output_buffer, "PacketSize=%zu;ReverseContinue-;ReverseStep-",
+ sizeof(input_buffer));
+ return;
+ }
+
+ gdbstub_strcpy(output_buffer,"E01");
+}
+
/*****************************************************************************/
/*
* handle event interception and GDB remote protocol processing
case 'k' :
goto done; /* just continue */
+ /* detach */
+ case 'D':
+ gdbstub_strcpy(output_buffer, "OK");
+ break;
/* reset the whole machine (FIXME: system dependent) */
case 'r':
__debug_status.dcr |= DCR_SE;
goto done;
+ /* extended command */
+ case 'v':
+ if (strcmp(input_buffer, "vCont?") == 0) {
+ output_buffer[0] = 0;
+ break;
+ }
+ goto unsupported_cmd;
+
/* set baud rate (bBB) */
case 'b':
ptr = &input_buffer[1];
gdbstub_strcpy(output_buffer,"OK");
break;
+ /* Thread-setting packet */
+ case 'H':
+ gdbstub_strcpy(output_buffer, "OK");
+ break;
+
+ case 'q':
+ gdbstub_handle_query();
+ break;
+
default:
+ unsupported_cmd:
gdbstub_proto("### GDB Unsupported Cmd '%s'\n",input_buffer);
+ gdbstub_strcpy(output_buffer,"E01");
break;
}
-/* MN10300 Atomic counter operations
- *
- * Copyright (C) 2007 Red Hat, Inc. All Rights Reserved.
- * Written by David Howells (dhowells@redhat.com)
- *
- * This program is free software; you can redistribute it and/or
- * modify it under the terms of the GNU General Public Licence
- * as published by the Free Software Foundation; either version
- * 2 of the Licence, or (at your option) any later version.
- */
-#ifndef _ASM_ATOMIC_H
-#define _ASM_ATOMIC_H
-
-#ifdef CONFIG_SMP
-#error not SMP safe
-#endif
-
-/*
- * Atomic operations that C can't guarantee us. Useful for
- * resource counting etc..
- */
-
-#define ATOMIC_INIT(i) { (i) }
-
-#ifdef __KERNEL__
-
-/**
- * atomic_read - read atomic variable
- * @v: pointer of type atomic_t
- *
- * Atomically reads the value of @v. Note that the guaranteed
- * useful range of an atomic_t is only 24 bits.
- */
-#define atomic_read(v) (*(volatile int *)&(v)->counter)
-
-/**
- * atomic_set - set atomic variable
- * @v: pointer of type atomic_t
- * @i: required value
- *
- * Atomically sets the value of @v to @i. Note that the guaranteed
- * useful range of an atomic_t is only 24 bits.
- */
-#define atomic_set(v, i) (((v)->counter) = (i))
-
-#include <asm/system.h>
-
-/**
- * atomic_add_return - add integer to atomic variable
- * @i: integer value to add
- * @v: pointer of type atomic_t
- *
- * Atomically adds @i to @v and returns the result
- * Note that the guaranteed useful range of an atomic_t is only 24 bits.
- */
-static inline int atomic_add_return(int i, atomic_t *v)
-{
- unsigned long flags;
- int temp;
-
- local_irq_save(flags);
- temp = v->counter;
- temp += i;
- v->counter = temp;
- local_irq_restore(flags);
-
- return temp;
-}
-
-/**
- * atomic_sub_return - subtract integer from atomic variable
- * @i: integer value to subtract
- * @v: pointer of type atomic_t
- *
- * Atomically subtracts @i from @v and returns the result
- * Note that the guaranteed useful range of an atomic_t is only 24 bits.
- */
-static inline int atomic_sub_return(int i, atomic_t *v)
-{
- unsigned long flags;
- int temp;
-
- local_irq_save(flags);
- temp = v->counter;
- temp -= i;
- v->counter = temp;
- local_irq_restore(flags);
-
- return temp;
-}
-
-static inline int atomic_add_negative(int i, atomic_t *v)
-{
- return atomic_add_return(i, v) < 0;
-}
-
-static inline void atomic_add(int i, atomic_t *v)
-{
- atomic_add_return(i, v);
-}
-
-static inline void atomic_sub(int i, atomic_t *v)
-{
- atomic_sub_return(i, v);
-}
-
-static inline void atomic_inc(atomic_t *v)
-{
- atomic_add_return(1, v);
-}
-
-static inline void atomic_dec(atomic_t *v)
-{
- atomic_sub_return(1, v);
-}
-
-#define atomic_dec_return(v) atomic_sub_return(1, (v))
-#define atomic_inc_return(v) atomic_add_return(1, (v))
-
-#define atomic_sub_and_test(i, v) (atomic_sub_return((i), (v)) == 0)
-#define atomic_dec_and_test(v) (atomic_sub_return(1, (v)) == 0)
-#define atomic_inc_and_test(v) (atomic_add_return(1, (v)) == 0)
-
-#define atomic_add_unless(v, a, u) \
-({ \
- int c, old; \
- c = atomic_read(v); \
- while (c != (u) && (old = atomic_cmpxchg((v), c, c + (a))) != c) \
- c = old; \
- c != (u); \
-})
-
-#define atomic_inc_not_zero(v) atomic_add_unless((v), 1, 0)
-
-static inline void atomic_clear_mask(unsigned long mask, unsigned long *addr)
-{
- unsigned long flags;
-
- mask = ~mask;
- local_irq_save(flags);
- *addr &= mask;
- local_irq_restore(flags);
-}
-
-#define atomic_xchg(ptr, v) (xchg(&(ptr)->counter, (v)))
-#define atomic_cmpxchg(v, old, new) (cmpxchg(&((v)->counter), (old), (new)))
-
-/* Atomic operations are already serializing on MN10300??? */
-#define smp_mb__before_atomic_dec() barrier()
-#define smp_mb__after_atomic_dec() barrier()
-#define smp_mb__before_atomic_inc() barrier()
-#define smp_mb__after_atomic_inc() barrier()
-
-#include <asm-generic/atomic-long.h>
-
-#endif /* __KERNEL__ */
-#endif /* _ASM_ATOMIC_H */
+#include <asm-generic/atomic.h>
#define L1_CACHE_DISPARITY L1_CACHE_NENTRIES * L1_CACHE_BYTES
#endif
+#define ARCH_KMALLOC_MINALIGN L1_CACHE_BYTES
+
/* data cache purge registers
* - read from the register to unconditionally purge that cache line
* - write address & 0xffffff00 to conditionally purge that cache line
#define abort() \
return 0
-#ifdef __BIG_ENDIAN
-#define __BYTE_ORDER __BIG_ENDIAN
-#else
-#define __BYTE_ORDER __LITTLE_ENDIAN
-#endif
-
/* Exception flags. */
#define EFLAG_INVALID (1 << (31 - 2))
#define EFLAG_OVERFLOW (1 << (31 - 3))
#define UDIV_NEEDS_NORMALIZATION 0
#define abort() return 0
-
-#define __BYTE_ORDER __BIG_ENDIAN
} while (0)
#define abort() return 0
-
-#define __BYTE_ORDER __LITTLE_ENDIAN
-
-
#define abort() \
return 0
-
-#ifdef __BIG_ENDIAN
-#define __BYTE_ORDER __BIG_ENDIAN
-#else
-#define __BYTE_ORDER __LITTLE_ENDIAN
-#endif
#define abort() \
return 0
-
-#ifdef __BIG_ENDIAN
-#define __BYTE_ORDER __BIG_ENDIAN
-#else
-#define __BYTE_ORDER __LITTLE_ENDIAN
-#endif
-#if BYTE_ORDER == LITTLE_ENDIAN
+#if __BYTE_ORDER == __LITTLE_ENDIAN
#define le16_to_cpu(val) (val)
#define le32_to_cpu(val) (val)
#endif
-#if BYTE_ORDER == BIG_ENDIAN
+#if __BYTE_ORDER == __BIG_ENDIAN
#define le16_to_cpu(val) bswap_16(val)
#define le32_to_cpu(val) bswap_32(val)
#endif
#define MSR_IA32_MISC_ENABLE 0x000001a0
+#define MSR_IA32_TEMPERATURE_TARGET 0x000001a2
+
/* MISC_ENABLE bits: architectural */
#define MSR_IA32_MISC_ENABLE_FAST_STRING (1ULL << 0)
#define MSR_IA32_MISC_ENABLE_TCC (1ULL << 1)
# define CACHE_WAY_SIZE ICACHE_WAY_SIZE
#endif
+#define ARCH_KMALLOC_MINALIGN L1_CACHE_BYTES
#endif /* _XTENSA_CACHE_H */
#ifndef _XTENSA_HARDIRQ_H
#define _XTENSA_HARDIRQ_H
-#include <linux/cache.h>
-#include <asm/irq.h>
-
-/* headers.S is sensitive to the offsets of these fields */
-typedef struct {
- unsigned int __softirq_pending;
- unsigned int __syscall_count;
- struct task_struct * __ksoftirqd_task; /* waitqueue is too large */
- unsigned int __nmi_count; /* arch dependent */
-} ____cacheline_aligned irq_cpustat_t;
-
void ack_bad_irq(unsigned int irq);
-#include <linux/irq_cpustat.h> /* Standard mappings for irq_cpustat_t above */
+#define ack_bad_irq ack_bad_irq
+
+#include <asm-generic/hardirq.h>
#endif /* _XTENSA_HARDIRQ_H */
atomic_t irq_err_count;
-/*
- * 'what should we do if we get a hw irq event on an illegal vector'.
- * each architecture has to answer this themselves.
- */
-void ack_bad_irq(unsigned int irq)
-{
- printk("unexpected IRQ trap at vector %02x\n", irq);
-}
-
/*
* do_IRQ handles all normal device IRQ's (the special
* SMP cross-CPU interrupts have their own specific
#include <linux/linkage.h>
#include <asm/ptrace.h>
-#include <asm/ptrace.h>
#include <asm/current.h>
#include <asm/asm-offsets.h>
#include <asm/pgtable.h>
#include <asm/processor.h>
#include <asm/page.h>
#include <asm/thread_info.h>
-#include <asm/processor.h>
#define WINDOW_VECTORS_SIZE 0x180
printk("\n");
}
-static u8 hex_val(unsigned char c)
-{
- return isdigit(c) ? c - '0' : toupper(c) - 'A' + 10;
-}
-
static acpi_status acpi_str_to_uuid(char *str, u8 *uuid)
{
int i;
return AE_BAD_PARAMETER;
}
for (i = 0; i < 16; i++) {
- uuid[i] = hex_val(str[opc_map_to_uuid[i]]) << 4;
- uuid[i] |= hex_val(str[opc_map_to_uuid[i] + 1]);
+ uuid[i] = hex_to_bin(str[opc_map_to_uuid[i]]) << 4;
+ uuid[i] |= hex_to_bin(str[opc_map_to_uuid[i] + 1]);
}
return AE_OK;
}
#define CFAG12864BFB_NAME "cfag12864bfb"
-static struct fb_fix_screeninfo cfag12864bfb_fix __initdata = {
+static struct fb_fix_screeninfo cfag12864bfb_fix __devinitdata = {
.id = "cfag12864b",
.type = FB_TYPE_PACKED_PIXELS,
.visual = FB_VISUAL_MONO10,
.accel = FB_ACCEL_NONE,
};
-static struct fb_var_screeninfo cfag12864bfb_var __initdata = {
+static struct fb_var_screeninfo cfag12864bfb_var __devinitdata = {
.xres = CFAG12864B_WIDTH,
.yres = CFAG12864B_HEIGHT,
.xres_virtual = CFAG12864B_WIDTH,
return ret;
}
-static int cfag12864bfb_remove(struct platform_device *device)
+static int __devexit cfag12864bfb_remove(struct platform_device *device)
{
struct fb_info *info = platform_get_drvdata(device);
static struct platform_driver cfag12864bfb_driver = {
.probe = cfag12864bfb_probe,
- .remove = cfag12864bfb_remove,
+ .remove = __devexit_p(cfag12864bfb_remove),
.driver = {
.name = CFAG12864BFB_NAME,
},
#include <linux/memory.h>
#include <linux/node.h>
#include <linux/hugetlb.h>
+#include <linux/compaction.h>
#include <linux/cpumask.h>
#include <linux/topology.h>
#include <linux/nodemask.h>
scan_unevictable_register_node(node);
hugetlb_register_node(node);
+
+ compaction_register_node(node);
}
return error;
}
#include <asm/uaccess.h>
#include <linux/sysrq.h>
#include <linux/timer.h>
+#include <linux/time.h>
-#define VERSION_STR "0.9.0"
+#define VERSION_STR "0.9.1"
#define DEFAULT_IOFENCE_MARGIN 60 /* Default fudge factor, in seconds */
#define DEFAULT_IOFENCE_TICK 180 /* Default timer timeout, in seconds */
#if defined(CONFIG_S390)
# define HAVE_MONOTONIC
# define TIMER_FREQ 1000000000ULL
-#elif defined(CONFIG_IA64)
-# define TIMER_FREQ ((unsigned long long)local_cpu_data->itc_freq)
#else
-# define TIMER_FREQ (HZ*loops_per_jiffy)
+# define TIMER_FREQ 1000000000ULL
#endif
#ifdef HAVE_MONOTONIC
#else
static inline unsigned long long monotonic_clock(void)
{
- return get_cycles();
+ struct timespec ts;
+ getrawmonotonic(&ts);
+ return timespec_to_ns(&ts);
}
#endif /* HAVE_MONOTONIC */
printk(KERN_CRIT "Hangcheck: hangcheck value past margin!\n");
}
}
+#if 0
+ /*
+ * Enable to investigate delays in detail
+ */
+ printk("Hangcheck: called %Ld ns since last time (%Ld ns overshoot)\n",
+ tsc_diff, tsc_diff - hangcheck_tick*TIMER_FREQ);
+#endif
mod_timer(&hangcheck_ticktock, jiffies + (hangcheck_tick*HZ));
hangcheck_tsc = monotonic_clock();
}
#if defined (HAVE_MONOTONIC)
printk("Hangcheck: Using monotonic_clock().\n");
#else
- printk("Hangcheck: Using get_cycles().\n");
+ printk("Hangcheck: Using getrawmonotonic().\n");
#endif /* HAVE_MONOTONIC */
hangcheck_tsc_margin =
(unsigned long long)(hangcheck_margin + hangcheck_tick);
"HVSI_WAIT_FOR_MCTRL_RESPONSE",
"HVSI_FSP_DIED",
};
- const char *name = state_names[hp->state];
-
- if (hp->state > ARRAY_SIZE(state_names))
- name = "UNKNOWN";
+ const char *name = (hp->state < ARRAY_SIZE(state_names))
+ ? state_names[hp->state] : "UNKNOWN";
pr_debug("hvsi%i: state = %s\n", hp->index, name);
#endif /* DEBUG */
old_fops = file->f_op;
file->f_op = new_fops;
if (file->f_op->open) {
+ file->private_data = c;
err=file->f_op->open(inode,file);
if (err) {
fops_put(file->f_op);
#include <linux/math64.h>
#define BUCKETS 12
+#define INTERVALS 8
#define RESOLUTION 1024
-#define DECAY 4
+#define DECAY 8
#define MAX_INTERESTING 50000
+#define STDDEV_THRESH 400
+
/*
* Concepts and ideas behind the menu governor
* indexed based on the magnitude of the expected duration as well as the
* "is IO outstanding" property.
*
+ * Repeatable-interval-detector
+ * ----------------------------
+ * There are some cases where "next timer" is a completely unusable predictor:
+ * Those cases where the interval is fixed, for example due to hardware
+ * interrupt mitigation, but also due to fixed transfer rate devices such as
+ * mice.
+ * For this, we use a different predictor: We track the duration of the last 8
+ * intervals and if the standard deviation of these 8 intervals is below a
+ * threshold value, we use the average of these intervals as prediction.
+ *
* Limiting Performance Impact
* ---------------------------
* C states, especially those with large exit latencies, can have a real
unsigned int exit_us;
unsigned int bucket;
u64 correction_factor[BUCKETS];
+ u32 intervals[INTERVALS];
+ int interval_ptr;
};
return div_u64(dividend + (divisor / 2), divisor);
}
+/*
+ * Try detecting repeating patterns by keeping track of the last 8
+ * intervals, and checking if the standard deviation of that set
+ * of points is below a threshold. If it is... then use the
+ * average of these 8 points as the estimated value.
+ */
+static void detect_repeating_patterns(struct menu_device *data)
+{
+ int i;
+ uint64_t avg = 0;
+ uint64_t stddev = 0; /* contains the square of the std deviation */
+
+ /* first calculate average and standard deviation of the past */
+ for (i = 0; i < INTERVALS; i++)
+ avg += data->intervals[i];
+ avg = avg / INTERVALS;
+
+ /* if the avg is beyond the known next tick, it's worthless */
+ if (avg > data->expected_us)
+ return;
+
+ for (i = 0; i < INTERVALS; i++)
+ stddev += (data->intervals[i] - avg) *
+ (data->intervals[i] - avg);
+
+ stddev = stddev / INTERVALS;
+
+ /*
+ * now.. if stddev is small.. then assume we have a
+ * repeating pattern and predict we keep doing this.
+ */
+
+ if (avg && stddev < STDDEV_THRESH)
+ data->predicted_us = avg;
+}
+
/**
* menu_select - selects the next idle state to enter
* @dev: the CPU
data->predicted_us = div_round64(data->expected_us * data->correction_factor[data->bucket],
RESOLUTION * DECAY);
+ detect_repeating_patterns(data);
+
/*
* We want to default to C1 (hlt), not to busy polling
* unless the timer is happening really really soon.
new_factor = 1;
data->correction_factor[data->bucket] = new_factor;
+
+ /* update the repeating-pattern data */
+ data->intervals[data->interval_ptr++] = last_idle_us;
+ if (data->interval_ptr >= INTERVALS)
+ data->interval_ptr = 0;
}
/**
static int td_fill_desc(struct timb_dma_chan *td_chan, u8 *dma_desc,
struct scatterlist *sg, bool last)
{
- if (sg_dma_len(sg) > USHORT_MAX) {
+ if (sg_dma_len(sg) > USHRT_MAX) {
dev_err(chan2dev(&td_chan->chan), "Too big sg element\n");
return -EINVAL;
}
This driver can also be built as a module. If so, the module
will be called ads7828.
+config SENSORS_ADS7871
+ tristate "Texas Instruments ADS7871 A/D converter"
+ depends on SPI
+ help
+ If you say yes here you get support for TI ADS7871 & ADS7870
+
+ This driver can also be built as a module. If so, the module
+ will be called ads7871.
+
config SENSORS_AMC6821
tristate "Texas Instruments AMC6821"
depends on I2C && EXPERIMENTAL
obj-$(CONFIG_SENSORS_ADM1031) += adm1031.o
obj-$(CONFIG_SENSORS_ADM9240) += adm9240.o
obj-$(CONFIG_SENSORS_ADS7828) += ads7828.o
+obj-$(CONFIG_SENSORS_ADS7871) += ads7871.o
obj-$(CONFIG_SENSORS_ADT7411) += adt7411.o
obj-$(CONFIG_SENSORS_ADT7462) += adt7462.o
obj-$(CONFIG_SENSORS_ADT7470) += adt7470.o
--- /dev/null
+/*
+ * ads7871 - driver for TI ADS7871 A/D converter
+ *
+ * Copyright (c) 2010 Paul Thomas <pthomas8589@gmail.com>
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 or
+ * later as published by the Free Software Foundation.
+ *
+ * You need to have something like this in struct spi_board_info
+ * {
+ * .modalias = "ads7871",
+ * .max_speed_hz = 2*1000*1000,
+ * .chip_select = 0,
+ * .bus_num = 1,
+ * },
+ */
+
+/*From figure 18 in the datasheet*/
+/*Register addresses*/
+#define REG_LS_BYTE 0 /*A/D Output Data, LS Byte*/
+#define REG_MS_BYTE 1 /*A/D Output Data, MS Byte*/
+#define REG_PGA_VALID 2 /*PGA Valid Register*/
+#define REG_AD_CONTROL 3 /*A/D Control Register*/
+#define REG_GAIN_MUX 4 /*Gain/Mux Register*/
+#define REG_IO_STATE 5 /*Digital I/O State Register*/
+#define REG_IO_CONTROL 6 /*Digital I/O Control Register*/
+#define REG_OSC_CONTROL 7 /*Rev/Oscillator Control Register*/
+#define REG_SER_CONTROL 24 /*Serial Interface Control Register*/
+#define REG_ID 31 /*ID Register*/
+
+/*From figure 17 in the datasheet
+* These bits get ORed with the address to form
+* the instruction byte */
+/*Instruction Bit masks*/
+#define INST_MODE_bm (1<<7)
+#define INST_READ_bm (1<<6)
+#define INST_16BIT_bm (1<<5)
+
+/*From figure 18 in the datasheet*/
+/*bit masks for Gain/Mux Register*/
+#define MUX_CNV_bv 7
+#define MUX_CNV_bm (1<<MUX_CNV_bv)
+#define MUX_M3_bm (1<<3) /*M3 selects single ended*/
+#define MUX_G_bv 4 /*allows for reg = (gain << MUX_G_bv) | ...*/
+
+/*From figure 18 in the datasheet*/
+/*bit masks for Rev/Oscillator Control Register*/
+#define OSC_OSCR_bm (1<<5)
+#define OSC_OSCE_bm (1<<4)
+#define OSC_REFE_bm (1<<3)
+#define OSC_BUFE_bm (1<<2)
+#define OSC_R2V_bm (1<<1)
+#define OSC_RBG_bm (1<<0)
+
+#include <linux/module.h>
+#include <linux/init.h>
+#include <linux/spi/spi.h>
+#include <linux/hwmon.h>
+#include <linux/hwmon-sysfs.h>
+#include <linux/err.h>
+#include <linux/mutex.h>
+#include <linux/delay.h>
+
+#define DEVICE_NAME "ads7871"
+
+struct ads7871_data {
+ struct device *hwmon_dev;
+ struct mutex update_lock;
+};
+
+static int ads7871_read_reg8(struct spi_device *spi, int reg)
+{
+ int ret;
+ reg = reg | INST_READ_bm;
+ ret = spi_w8r8(spi, reg);
+ return ret;
+}
+
+static int ads7871_read_reg16(struct spi_device *spi, int reg)
+{
+ int ret;
+ reg = reg | INST_READ_bm | INST_16BIT_bm;
+ ret = spi_w8r16(spi, reg);
+ return ret;
+}
+
+static int ads7871_write_reg8(struct spi_device *spi, int reg, u8 val)
+{
+ u8 tmp[2] = {reg, val};
+ return spi_write(spi, tmp, sizeof(tmp));
+}
+
+static ssize_t show_voltage(struct device *dev,
+ struct device_attribute *da, char *buf)
+{
+ struct spi_device *spi = to_spi_device(dev);
+ struct sensor_device_attribute *attr = to_sensor_dev_attr(da);
+ int ret, val, i = 0;
+ uint8_t channel, mux_cnv;
+
+ channel = attr->index;
+ /*TODO: add support for conversions
+ *other than single ended with a gain of 1*/
+ /*MUX_M3_bm forces single ended*/
+ /*This is also where the gain of the PGA would be set*/
+ ads7871_write_reg8(spi, REG_GAIN_MUX,
+ (MUX_CNV_bm | MUX_M3_bm | channel));
+
+ ret = ads7871_read_reg8(spi, REG_GAIN_MUX);
+ mux_cnv = ((ret & MUX_CNV_bm)>>MUX_CNV_bv);
+ /*on 400MHz arm9 platform the conversion
+ *is already done when we do this test*/
+ while ((i < 2) && mux_cnv) {
+ i++;
+ ret = ads7871_read_reg8(spi, REG_GAIN_MUX);
+ mux_cnv = ((ret & MUX_CNV_bm)>>MUX_CNV_bv);
+ msleep_interruptible(1);
+ }
+
+ if (mux_cnv == 0) {
+ val = ads7871_read_reg16(spi, REG_LS_BYTE);
+ /*result in volts*10000 = (val/8192)*2.5*10000*/
+ val = ((val>>2) * 25000) / 8192;
+ return sprintf(buf, "%d\n", val);
+ } else {
+ /*conversion did not complete in time*/
+ return -EIO;
+ }
+}
+
+static SENSOR_DEVICE_ATTR(in0_input, S_IRUGO, show_voltage, NULL, 0);
+static SENSOR_DEVICE_ATTR(in1_input, S_IRUGO, show_voltage, NULL, 1);
+static SENSOR_DEVICE_ATTR(in2_input, S_IRUGO, show_voltage, NULL, 2);
+static SENSOR_DEVICE_ATTR(in3_input, S_IRUGO, show_voltage, NULL, 3);
+static SENSOR_DEVICE_ATTR(in4_input, S_IRUGO, show_voltage, NULL, 4);
+static SENSOR_DEVICE_ATTR(in5_input, S_IRUGO, show_voltage, NULL, 5);
+static SENSOR_DEVICE_ATTR(in6_input, S_IRUGO, show_voltage, NULL, 6);
+static SENSOR_DEVICE_ATTR(in7_input, S_IRUGO, show_voltage, NULL, 7);
+
+static struct attribute *ads7871_attributes[] = {
+ &sensor_dev_attr_in0_input.dev_attr.attr,
+ &sensor_dev_attr_in1_input.dev_attr.attr,
+ &sensor_dev_attr_in2_input.dev_attr.attr,
+ &sensor_dev_attr_in3_input.dev_attr.attr,
+ &sensor_dev_attr_in4_input.dev_attr.attr,
+ &sensor_dev_attr_in5_input.dev_attr.attr,
+ &sensor_dev_attr_in6_input.dev_attr.attr,
+ &sensor_dev_attr_in7_input.dev_attr.attr,
+ NULL
+};
+
+static const struct attribute_group ads7871_group = {
+ .attrs = ads7871_attributes,
+};
+
+static int __devinit ads7871_probe(struct spi_device *spi)
+{
+ int status, ret, err = 0;
+ uint8_t val;
+ struct ads7871_data *pdata;
+
+ dev_dbg(&spi->dev, "probe\n");
+
+ pdata = kzalloc(sizeof(struct ads7871_data), GFP_KERNEL);
+ if (!pdata) {
+ err = -ENOMEM;
+ goto exit;
+ }
+
+ status = sysfs_create_group(&spi->dev.kobj, &ads7871_group);
+ if (status < 0) {
+ /*propagate the failure instead of returning 0*/
+ err = status;
+ goto error_free;
+ }
+
+ pdata->hwmon_dev = hwmon_device_register(&spi->dev);
+ if (IS_ERR(pdata->hwmon_dev)) {
+ err = PTR_ERR(pdata->hwmon_dev);
+ goto error_remove;
+ }
+
+ spi_set_drvdata(spi, pdata);
+
+ /* Configure the SPI bus */
+ spi->mode = (SPI_MODE_0);
+ spi->bits_per_word = 8;
+ spi_setup(spi);
+
+ ads7871_write_reg8(spi, REG_SER_CONTROL, 0);
+ ads7871_write_reg8(spi, REG_AD_CONTROL, 0);
+
+ val = (OSC_OSCR_bm | OSC_OSCE_bm | OSC_REFE_bm | OSC_BUFE_bm);
+ ads7871_write_reg8(spi, REG_OSC_CONTROL, val);
+ ret = ads7871_read_reg8(spi, REG_OSC_CONTROL);
+
+ dev_dbg(&spi->dev, "REG_OSC_CONTROL write:%x, read:%x\n", val, ret);
+ /*because there is no other error checking on an SPI bus
+ we need to make sure we really have a chip*/
+ if (val != ret) {
+ err = -ENODEV;
+ /*undo the hwmon registration done above*/
+ hwmon_device_unregister(pdata->hwmon_dev);
+ goto error_remove;
+ }
+
+ return 0;
+
+error_remove:
+ sysfs_remove_group(&spi->dev.kobj, &ads7871_group);
+error_free:
+ kfree(pdata);
+exit:
+ return err;
+}
+
+static int __devexit ads7871_remove(struct spi_device *spi)
+{
+ struct ads7871_data *pdata = spi_get_drvdata(spi);
+
+ hwmon_device_unregister(pdata->hwmon_dev);
+ sysfs_remove_group(&spi->dev.kobj, &ads7871_group);
+ kfree(pdata);
+ return 0;
+}
+
+static struct spi_driver ads7871_driver = {
+ .driver = {
+ .name = DEVICE_NAME,
+ .bus = &spi_bus_type,
+ .owner = THIS_MODULE,
+ },
+
+ .probe = ads7871_probe,
+ .remove = __devexit_p(ads7871_remove),
+};
+
+static int __init ads7871_init(void)
+{
+ return spi_register_driver(&ads7871_driver);
+}
+
+static void __exit ads7871_exit(void)
+{
+ spi_unregister_driver(&ads7871_driver);
+}
+
+module_init(ads7871_init);
+module_exit(ads7871_exit);
+
+MODULE_AUTHOR("Paul Thomas <pthomas8589@gmail.com>");
+MODULE_DESCRIPTION("TI ADS7871 A/D driver");
+MODULE_LICENSE("GPL");
return tjmax;
}
+static int __devinit get_tjmax(struct cpuinfo_x86 *c, u32 id,
+ struct device *dev)
+{
+ /* The 100C is default for both mobile and non mobile CPUs */
+ int err;
+ u32 eax, edx;
+ u32 val;
+
+ /* A new feature of current Intel(R) processors, the
+ IA32_TEMPERATURE_TARGET contains the TjMax value */
+ err = rdmsr_safe_on_cpu(id, MSR_IA32_TEMPERATURE_TARGET, &eax, &edx);
+ if (err) {
+ dev_warn(dev, "Unable to read TjMax from CPU.\n");
+ } else {
+ val = (eax >> 16) & 0xff;
+ /*
+ * If the TjMax is not plausible, an assumption
+ * will be used
+ */
+ if ((val > 80) && (val < 120)) {
+ dev_info(dev, "TjMax is %d C.\n", val);
+ return val * 1000;
+ }
+ }
+
+ /*
+ * An assumption is made for early CPUs and unreadable MSR.
+ * NOTE: the given value may not be correct.
+ */
+
+ switch (c->x86_model) {
+ case 0xe:
+ case 0xf:
+ case 0x16:
+ case 0x1a:
+ dev_warn(dev, "TjMax is assumed as 100 C!\n");
+ return 100000;
+ break;
+ case 0x17:
+ case 0x1c: /* Atom CPUs */
+ return adjust_tjmax(c, id, dev);
+ break;
+ default:
+ dev_warn(dev, "CPU (model=0x%x) is not supported yet,"
+ " using default TjMax of 100C.\n", c->x86_model);
+ return 100000;
+ }
+}
+
static int __devinit coretemp_probe(struct platform_device *pdev)
{
struct coretemp_data *data;
}
}
- data->tjmax = adjust_tjmax(c, data->id, &pdev->dev);
+ data->tjmax = get_tjmax(c, data->id, &pdev->dev);
platform_set_drvdata(pdev, data);
- /* read the still undocumented IA32_TEMPERATURE_TARGET it exists
- on older CPUs but not in this register, Atoms don't have it either */
+ /*
+ * read the still undocumented IA32_TEMPERATURE_TARGET. It exists
+ * on older CPUs but not in this register,
+ * Atoms don't have it either.
+ */
if ((c->x86_model > 0xe) && (c->x86_model != 0x1c)) {
- err = rdmsr_safe_on_cpu(data->id, 0x1a2, &eax, &edx);
+ err = rdmsr_safe_on_cpu(data->id, MSR_IA32_TEMPERATURE_TARGET,
+ &eax, &edx);
if (err) {
dev_warn(&pdev->dev, "Unable to read"
" IA32_TEMPERATURE_TARGET MSR\n");
for_each_online_cpu(i) {
struct cpuinfo_x86 *c = &cpu_data(i);
+ /*
+ * CPUID.06H.EAX[0] indicates whether the CPU has thermal
+ * sensors. We check only this bit, so early CPUs without
+ * thermal sensors are filtered out.
+ */
+ if (c->cpuid_level >= 6 && (cpuid_eax(0x06) & 0x01)) {
+ err = coretemp_device_add(i);
+ if (err)
+ goto exit_devices_unreg;
- /* check if family 6, models 0xe (Pentium M DC),
- 0xf (Core 2 DC 65nm), 0x16 (Core 2 SC 65nm),
- 0x17 (Penryn 45nm), 0x1a (Nehalem), 0x1c (Atom),
- 0x1e (Lynnfield) */
- if ((c->cpuid_level < 0) || (c->x86 != 0x6) ||
- !((c->x86_model == 0xe) || (c->x86_model == 0xf) ||
- (c->x86_model == 0x16) || (c->x86_model == 0x17) ||
- (c->x86_model == 0x1a) || (c->x86_model == 0x1c) ||
- (c->x86_model == 0x1e))) {
-
- /* supported CPU not found, but report the unknown
- family 6 CPU */
- if ((c->x86 == 0x6) && (c->x86_model > 0xf))
- printk(KERN_WARNING DRVNAME ": Unknown CPU "
- "model 0x%x\n", c->x86_model);
- continue;
+ } else {
+ printk(KERN_INFO DRVNAME ": CPU (model=0x%x)"
+ " has no thermal sensor.\n", c->x86_model);
}
-
- err = coretemp_device_add(i);
- if (err)
- goto exit_devices_unreg;
}
if (list_empty(&pdev_list)) {
err = -ENODEV;
/* joystick device poll interval in milliseconds */
#define MDPS_POLL_INTERVAL 50
+#define MDPS_POLL_MIN 0
+#define MDPS_POLL_MAX 2000
/*
* The sensor can also generate interrupts (DRDY) but it's pretty pointless
* because they are generated even if the data do not change. So it's better
int position[3];
int i;
- mutex_lock(&lis3->mutex);
position[0] = lis3->read_data(lis3, OUTX);
position[1] = lis3->read_data(lis3, OUTY);
position[2] = lis3->read_data(lis3, OUTZ);
- mutex_unlock(&lis3->mutex);
for (i = 0; i < 3; i++)
position[i] = (position[i] * lis3->scale) / LIS3_ACCURACY;
EXPORT_SYMBOL_GPL(lis3lv02d_poweron);
+static void lis3lv02d_joystick_poll(struct input_polled_dev *pidev)
+{
+ int x, y, z;
+
+ mutex_lock(&lis3_dev.mutex);
+ lis3lv02d_get_xyz(&lis3_dev, &x, &y, &z);
+ input_report_abs(pidev->input, ABS_X, x);
+ input_report_abs(pidev->input, ABS_Y, y);
+ input_report_abs(pidev->input, ABS_Z, z);
+ input_sync(pidev->input);
+ mutex_unlock(&lis3_dev.mutex);
+}
+
static irqreturn_t lis302dl_interrupt(int irq, void *dummy)
{
+ if (!test_bit(0, &lis3_dev.misc_opened))
+ goto out;
+
/*
* Be careful: on some HP laptops the bios force DD when on battery and
* the lid is closed. This leads to interrupts as soon as a little move
wake_up_interruptible(&lis3_dev.misc_wait);
kill_fasync(&lis3_dev.async_queue, SIGIO, POLL_IN);
+out:
+ if (lis3_dev.whoami == WAI_8B && lis3_dev.idev &&
+ lis3_dev.idev->input->users)
+ return IRQ_WAKE_THREAD;
return IRQ_HANDLED;
}
-static int lis3lv02d_misc_open(struct inode *inode, struct file *file)
+static void lis302dl_interrupt_handle_click(struct lis3lv02d *lis3)
{
- int ret;
+ struct input_dev *dev = lis3->idev->input;
+ u8 click_src;
- if (test_and_set_bit(0, &lis3_dev.misc_opened))
- return -EBUSY; /* already open */
+ mutex_lock(&lis3->mutex);
+ lis3->read(lis3, CLICK_SRC, &click_src);
- atomic_set(&lis3_dev.count, 0);
+ if (click_src & CLICK_SINGLE_X) {
+ input_report_key(dev, lis3->mapped_btns[0], 1);
+ input_report_key(dev, lis3->mapped_btns[0], 0);
+ }
- /*
- * The sensor can generate interrupts for free-fall and direction
- * detection (distinguishable with FF_WU_SRC and DD_SRC) but to keep
- * the things simple and _fast_ we activate it only for free-fall, so
- * no need to read register (very slow with ACPI). For the same reason,
- * we forbid shared interrupts.
- *
- * IRQF_TRIGGER_RISING seems pointless on HP laptops because the
- * io-apic is not configurable (and generates a warning) but I keep it
- * in case of support for other hardware.
- */
- ret = request_irq(lis3_dev.irq, lis302dl_interrupt, IRQF_TRIGGER_RISING,
- DRIVER_NAME, &lis3_dev);
+ if (click_src & CLICK_SINGLE_Y) {
+ input_report_key(dev, lis3->mapped_btns[1], 1);
+ input_report_key(dev, lis3->mapped_btns[1], 0);
+ }
- if (ret) {
- clear_bit(0, &lis3_dev.misc_opened);
- printk(KERN_ERR DRIVER_NAME ": IRQ%d allocation failed\n", lis3_dev.irq);
- return -EBUSY;
+ if (click_src & CLICK_SINGLE_Z) {
+ input_report_key(dev, lis3->mapped_btns[2], 1);
+ input_report_key(dev, lis3->mapped_btns[2], 0);
}
+ input_sync(dev);
+ mutex_unlock(&lis3->mutex);
+}
+
+static void lis302dl_interrupt_handle_ff_wu(struct lis3lv02d *lis3)
+{
+ u8 wu1_src;
+ u8 wu2_src;
+
+ lis3->read(lis3, FF_WU_SRC_1, &wu1_src);
+ lis3->read(lis3, FF_WU_SRC_2, &wu2_src);
+
+ wu1_src = wu1_src & FF_WU_SRC_IA ? wu1_src : 0;
+ wu2_src = wu2_src & FF_WU_SRC_IA ? wu2_src : 0;
+
+ /* joystick poll is internally protected by the lis3->mutex. */
+ if (wu1_src || wu2_src)
+ lis3lv02d_joystick_poll(lis3_dev.idev);
+}
+
+static irqreturn_t lis302dl_interrupt_thread1_8b(int irq, void *data)
+{
+
+ struct lis3lv02d *lis3 = data;
+
+ if ((lis3->pdata->irq_cfg & LIS3_IRQ1_MASK) == LIS3_IRQ1_CLICK)
+ lis302dl_interrupt_handle_click(lis3);
+ else
+ lis302dl_interrupt_handle_ff_wu(lis3);
+
+ return IRQ_HANDLED;
+}
+
+static irqreturn_t lis302dl_interrupt_thread2_8b(int irq, void *data)
+{
+
+ struct lis3lv02d *lis3 = data;
+
+ if ((lis3->pdata->irq_cfg & LIS3_IRQ2_MASK) == LIS3_IRQ2_CLICK)
+ lis302dl_interrupt_handle_click(lis3);
+ else
+ lis302dl_interrupt_handle_ff_wu(lis3);
+
+ return IRQ_HANDLED;
+}
+
+static int lis3lv02d_misc_open(struct inode *inode, struct file *file)
+{
+ if (test_and_set_bit(0, &lis3_dev.misc_opened))
+ return -EBUSY; /* already open */
+
+ atomic_set(&lis3_dev.count, 0);
return 0;
}
static int lis3lv02d_misc_release(struct inode *inode, struct file *file)
{
fasync_helper(-1, file, 0, &lis3_dev.async_queue);
- free_irq(lis3_dev.irq, &lis3_dev);
clear_bit(0, &lis3_dev.misc_opened); /* release the device */
return 0;
}
.fops = &lis3lv02d_misc_fops,
};
-static void lis3lv02d_joystick_poll(struct input_polled_dev *pidev)
-{
- int x, y, z;
-
- lis3lv02d_get_xyz(&lis3_dev, &x, &y, &z);
- input_report_abs(pidev->input, ABS_X, x);
- input_report_abs(pidev->input, ABS_Y, y);
- input_report_abs(pidev->input, ABS_Z, z);
- input_sync(pidev->input);
-}
-
int lis3lv02d_joystick_enable(void)
{
struct input_dev *input_dev;
int err;
int max_val, fuzz, flat;
+ int btns[] = {BTN_X, BTN_Y, BTN_Z};
if (lis3_dev.idev)
return -EINVAL;
lis3_dev.idev->poll = lis3lv02d_joystick_poll;
lis3_dev.idev->poll_interval = MDPS_POLL_INTERVAL;
+ lis3_dev.idev->poll_interval_min = MDPS_POLL_MIN;
+ lis3_dev.idev->poll_interval_max = MDPS_POLL_MAX;
input_dev = lis3_dev.idev->input;
input_dev->name = "ST LIS3LV02DL Accelerometer";
input_set_abs_params(input_dev, ABS_Y, -max_val, max_val, fuzz, flat);
input_set_abs_params(input_dev, ABS_Z, -max_val, max_val, fuzz, flat);
+ lis3_dev.mapped_btns[0] = lis3lv02d_get_axis(abs(lis3_dev.ac.x), btns);
+ lis3_dev.mapped_btns[1] = lis3lv02d_get_axis(abs(lis3_dev.ac.y), btns);
+ lis3_dev.mapped_btns[2] = lis3lv02d_get_axis(abs(lis3_dev.ac.z), btns);
+
err = input_register_polled_device(lis3_dev.idev);
if (err) {
input_free_polled_device(lis3_dev.idev);
void lis3lv02d_joystick_disable(void)
{
+ if (lis3_dev.irq)
+ free_irq(lis3_dev.irq, &lis3_dev);
+ if (lis3_dev.pdata && lis3_dev.pdata->irq2)
+ free_irq(lis3_dev.pdata->irq2, &lis3_dev);
+
if (!lis3_dev.idev)
return;
{
int x, y, z;
+ mutex_lock(&lis3_dev.mutex);
lis3lv02d_get_xyz(&lis3_dev, &x, &y, &z);
+ mutex_unlock(&lis3_dev.mutex);
return sprintf(buf, "(%d,%d,%d)\n", x, y, z);
}
}
EXPORT_SYMBOL_GPL(lis3lv02d_remove_fs);
+static void lis3lv02d_8b_configure(struct lis3lv02d *dev,
+ struct lis3lv02d_platform_data *p)
+{
+ int err;
+ int ctrl2 = p->hipass_ctrl;
+
+ if (p->click_flags) {
+ dev->write(dev, CLICK_CFG, p->click_flags);
+ dev->write(dev, CLICK_TIMELIMIT, p->click_time_limit);
+ dev->write(dev, CLICK_LATENCY, p->click_latency);
+ dev->write(dev, CLICK_WINDOW, p->click_window);
+ dev->write(dev, CLICK_THSZ, p->click_thresh_z & 0xf);
+ dev->write(dev, CLICK_THSY_X,
+ (p->click_thresh_x & 0xf) |
+ (p->click_thresh_y << 4));
+
+ if (dev->idev) {
+ struct input_dev *input_dev = lis3_dev.idev->input;
+ input_set_capability(input_dev, EV_KEY, BTN_X);
+ input_set_capability(input_dev, EV_KEY, BTN_Y);
+ input_set_capability(input_dev, EV_KEY, BTN_Z);
+ }
+ }
+
+ if (p->wakeup_flags) {
+ dev->write(dev, FF_WU_CFG_1, p->wakeup_flags);
+ dev->write(dev, FF_WU_THS_1, p->wakeup_thresh & 0x7f);
+ /* default to 2.5ms for now */
+ dev->write(dev, FF_WU_DURATION_1, 1);
+ ctrl2 ^= HP_FF_WU1; /* XOR to stay compatible with old pdata */
+ }
+
+ if (p->wakeup_flags2) {
+ dev->write(dev, FF_WU_CFG_2, p->wakeup_flags2);
+ dev->write(dev, FF_WU_THS_2, p->wakeup_thresh2 & 0x7f);
+ /* default to 2.5ms for now */
+ dev->write(dev, FF_WU_DURATION_2, 1);
+ ctrl2 ^= HP_FF_WU2; /* XOR to stay compatible with old pdata */
+ }
+ /* Configure hipass filters */
+ dev->write(dev, CTRL_REG2, ctrl2);
+
+ if (p->irq2) {
+ err = request_threaded_irq(p->irq2,
+ NULL,
+ lis302dl_interrupt_thread2_8b,
+ IRQF_TRIGGER_RISING |
+ IRQF_ONESHOT,
+ DRIVER_NAME, &lis3_dev);
+ if (err < 0)
+ printk(KERN_ERR DRIVER_NAME
+ ": No second IRQ. Limited functionality\n");
+ }
+}
+
/*
* Initialise the accelerometer and the various subsystems.
* Should be rather independent of the bus system.
*/
int lis3lv02d_init_device(struct lis3lv02d *dev)
{
+ int err;
+ irq_handler_t thread_fn;
+
dev->whoami = lis3lv02d_read_8(dev, WHO_AM_I);
switch (dev->whoami) {
if (dev->pdata) {
struct lis3lv02d_platform_data *p = dev->pdata;
- if (p->click_flags && (dev->whoami == WAI_8B)) {
- dev->write(dev, CLICK_CFG, p->click_flags);
- dev->write(dev, CLICK_TIMELIMIT, p->click_time_limit);
- dev->write(dev, CLICK_LATENCY, p->click_latency);
- dev->write(dev, CLICK_WINDOW, p->click_window);
- dev->write(dev, CLICK_THSZ, p->click_thresh_z & 0xf);
- dev->write(dev, CLICK_THSY_X,
- (p->click_thresh_x & 0xf) |
- (p->click_thresh_y << 4));
- }
-
- if (p->wakeup_flags && (dev->whoami == WAI_8B)) {
- dev->write(dev, FF_WU_CFG_1, p->wakeup_flags);
- dev->write(dev, FF_WU_THS_1, p->wakeup_thresh & 0x7f);
- /* default to 2.5ms for now */
- dev->write(dev, FF_WU_DURATION_1, 1);
- /* enable high pass filter for both free-fall units */
- dev->write(dev, CTRL_REG2, HP_FF_WU1 | HP_FF_WU2);
- }
+ if (dev->whoami == WAI_8B)
+ lis3lv02d_8b_configure(dev, p);
if (p->irq_cfg)
dev->write(dev, CTRL_REG3, p->irq_cfg);
goto out;
}
+ /*
+ * The sensor can generate interrupts for free-fall and direction
+ * detection (distinguishable with FF_WU_SRC and DD_SRC), but to keep
+ * things simple and _fast_ we activate it only for free-fall, so
+ * there is no need to read a register (very slow with ACPI). For the
+ * same reason, we forbid shared interrupts.
+ *
+ * IRQF_TRIGGER_RISING seems pointless on HP laptops because the
+ * io-apic is not configurable (and generates a warning), but it is
+ * kept in case other hardware needs it.
+ */
+ if (dev->whoami == WAI_8B)
+ thread_fn = lis302dl_interrupt_thread1_8b;
+ else
+ thread_fn = NULL;
+
+ err = request_threaded_irq(dev->irq, lis302dl_interrupt,
+ thread_fn,
+ IRQF_TRIGGER_RISING | IRQF_ONESHOT,
+ DRIVER_NAME, &lis3_dev);
+
+ if (err < 0) {
+ printk(KERN_ERR DRIVER_NAME ": Cannot get IRQ\n");
+ goto out;
+ }
+
if (misc_register(&lis3lv02d_misc_device))
printk(KERN_ERR DRIVER_NAME ": misc_register failed\n");
out:
DD_SRC_IA = 0x40,
};
+enum lis3lv02d_click_src_8b {
+ CLICK_SINGLE_X = 0x01,
+ CLICK_DOUBLE_X = 0x02,
+ CLICK_SINGLE_Y = 0x04,
+ CLICK_DOUBLE_Y = 0x08,
+ CLICK_SINGLE_Z = 0x10,
+ CLICK_DOUBLE_Z = 0x20,
+ CLICK_IA = 0x40,
+};
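
Review note: the CLICK_SRC bits above pair up per axis (single, double), with CLICK_IA as the summary bit. The following user-space sketch shows how the single-click bits could be decoded into per-axis flags, which is what lis302dl_interrupt_handle_click() does before reporting key events. The helper name decode_single_clicks and the local enum name are assumptions made for illustration; only the bit values are taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Bit values copied from enum lis3lv02d_click_src_8b in the patch. */
enum click_src_8b {
	CLICK_SINGLE_X = 0x01,
	CLICK_DOUBLE_X = 0x02,
	CLICK_SINGLE_Y = 0x04,
	CLICK_DOUBLE_Y = 0x08,
	CLICK_SINGLE_Z = 0x10,
	CLICK_DOUBLE_Z = 0x20,
	CLICK_IA       = 0x40,
};

/*
 * Hypothetical helper: set clicked[0..2] (X, Y, Z) to 1 where the
 * matching single-click bit is set, and return how many axes fired.
 * Double-click bits and CLICK_IA are ignored, as in the handler.
 */
static int decode_single_clicks(uint8_t click_src, int clicked[3])
{
	static const uint8_t single_mask[3] = {
		CLICK_SINGLE_X, CLICK_SINGLE_Y, CLICK_SINGLE_Z
	};
	int i, n = 0;

	assert(clicked != 0);
	for (i = 0; i < 3; i++) {
		clicked[i] = (click_src & single_mask[i]) != 0;
		n += clicked[i];
	}
	return n;
}
```

In the driver each set flag becomes a press/release pair on the button stored in mapped_btns[] for that axis.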
+
struct axis_conversion {
s8 x;
s8 y;
struct platform_device *pdev; /* platform device */
atomic_t count; /* interrupt count after last read */
struct axis_conversion ac; /* hw -> logical axis */
+ int mapped_btns[3];
u32 irq; /* IRQ number */
struct fasync_struct *async_queue; /* queue for the misc device */
msgname, paramname);
}
-/*
- * convert hex to binary
- */
-static inline u8 hex2bin(char c)
-{
- int result = c & 0x0f;
- if (c & 0x40)
- result += 9;
- return result;
-}
-
/*
* convert an IE from Gigaset hex string to ETSI binary representation
* including length byte
while (*in) {
if (!isxdigit(in[0]) || !isxdigit(in[1]) || l >= maxlen)
return -1;
- out[++l] = (hex2bin(in[0]) << 4) + hex2bin(in[1]);
+ out[++l] = (hex_to_bin(in[0]) << 4) + hex_to_bin(in[1]);
in += 2;
}
out[0] = l;
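
Review note: the hunk above replaces the driver-private hex2bin() with the kernel's hex_to_bin() inside a loop that turns a hex string into a length-prefixed byte array. A minimal user-space sketch of that loop follows. The names hex_digit_value and hex_ie_to_bin are stand-ins chosen for this example, and the return value (payload length) is an assumption, since the end of the original function is not visible in this hunk.

```c
#include <assert.h>
#include <stddef.h>

/* User-space stand-in for hex_to_bin(): returns 0..15, or -1 if invalid. */
static int hex_digit_value(char c)
{
	if (c >= '0' && c <= '9')
		return c - '0';
	if (c >= 'a' && c <= 'f')
		return c - 'a' + 10;
	if (c >= 'A' && c <= 'F')
		return c - 'A' + 10;
	return -1;
}

/*
 * Decode a hex string into out[1..], storing the payload length in
 * out[0]. The caller must provide maxlen + 1 bytes. Returns the payload
 * length, or -1 on a bad digit, an odd digit count, or overflow.
 */
static int hex_ie_to_bin(const char *in, unsigned char *out, int maxlen)
{
	int l = 0;

	assert(in != NULL && out != NULL);
	while (*in) {
		int hi = hex_digit_value(in[0]);
		int lo = (hi < 0) ? -1 : hex_digit_value(in[1]);

		if (hi < 0 || lo < 0 || l >= maxlen)
			return -1;
		out[++l] = (unsigned char)((hi << 4) | lo);
		in += 2;
	}
	out[0] = (unsigned char)l;
	return l;
}
```

Unlike the removed hex2bin(), which masked bits without validating its input, this sketch reports invalid digits directly, so the separate isxdigit() guard is folded into the lookup.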
if MISC_DEVICES
config AD525X_DPOT
- tristate "Analog Devices AD525x Digital Potentiometers"
- depends on I2C && SYSFS
+ tristate "Analog Devices Digital Potentiometers"
+ depends on (I2C || SPI) && SYSFS
help
If you say yes here, you get support for the Analog Devices
- AD5258, AD5259, AD5251, AD5252, AD5253, AD5254 and AD5255
+ AD5258, AD5259, AD5251, AD5252, AD5253, AD5254, AD5255,
+ AD5160, AD5161, AD5162, AD5165, AD5200, AD5201, AD5203,
+ AD5204, AD5206, AD5207, AD5231, AD5232, AD5233, AD5235,
+ AD5260, AD5262, AD5263, AD5290, AD5291, AD5292, AD5293,
+ AD7376, AD8400, AD8402, AD8403, ADN2850, AD5241, AD5242,
+ AD5243, AD5245, AD5246, AD5247, AD5248, AD5280, AD5282,
+ ADN2860, AD5273, AD5171, AD5170, AD5172, AD5173
digital potentiometer chips.
See Documentation/misc-devices/ad525x_dpot.txt for the
This driver can also be built as a module. If so, the module
will be called ad525x_dpot.
+config AD525X_DPOT_I2C
+ tristate "support I2C bus connection"
+ depends on AD525X_DPOT && I2C
+ help
+ Say Y here if you have a digital potentiometer hooked to an I2C bus.
+
+ To compile this driver as a module, choose M here: the
+ module will be called ad525x_dpot-i2c.
+
+config AD525X_DPOT_SPI
+ tristate "support SPI bus connection"
+ depends on AD525X_DPOT && SPI_MASTER
+ help
+ Say Y here if you have a digital potentiometer hooked to an SPI bus.
+
+ If unsure, say N (but it's safe to say "Y").
+
+ To compile this driver as a module, choose M here: the
+ module will be called ad525x_dpot-spi.
+
config ATMEL_PWM
tristate "Atmel AT32/AT91 PWM support"
depends on AVR32 || ARCH_AT91SAM9263 || ARCH_AT91SAM9RL || ARCH_AT91CAP9
obj-$(CONFIG_IBM_ASM) += ibmasm/
obj-$(CONFIG_AD525X_DPOT) += ad525x_dpot.o
+obj-$(CONFIG_AD525X_DPOT_I2C) += ad525x_dpot-i2c.o
+obj-$(CONFIG_AD525X_DPOT_SPI) += ad525x_dpot-spi.o
obj-$(CONFIG_ATMEL_PWM) += atmel_pwm.o
obj-$(CONFIG_ATMEL_SSC) += atmel-ssc.o
obj-$(CONFIG_ATMEL_TCLIB) += atmel_tclib.o
--- /dev/null
+/*
+ * Driver for the Analog Devices digital potentiometers (I2C bus)
+ *
+ * Copyright (C) 2010 Michael Hennerich, Analog Devices Inc.
+ *
+ * Licensed under the GPL-2 or later.
+ */
+
+#include <linux/i2c.h>
+#include <linux/module.h>
+
+#include "ad525x_dpot.h"
+
+/* ------------------------------------------------------------------------- */
+/* I2C bus functions */
+static int write_d8(void *client, u8 val)
+{
+ return i2c_smbus_write_byte(client, val);
+}
+
+static int write_r8d8(void *client, u8 reg, u8 val)
+{
+ return i2c_smbus_write_byte_data(client, reg, val);
+}
+
+static int write_r8d16(void *client, u8 reg, u16 val)
+{
+ return i2c_smbus_write_word_data(client, reg, val);
+}
+
+static int read_d8(void *client)
+{
+ return i2c_smbus_read_byte(client);
+}
+
+static int read_r8d8(void *client, u8 reg)
+{
+ return i2c_smbus_read_byte_data(client, reg);
+}
+
+static int read_r8d16(void *client, u8 reg)
+{
+ return i2c_smbus_read_word_data(client, reg);
+}
+
+static const struct ad_dpot_bus_ops bops = {
+ .read_d8 = read_d8,
+ .read_r8d8 = read_r8d8,
+ .read_r8d16 = read_r8d16,
+ .write_d8 = write_d8,
+ .write_r8d8 = write_r8d8,
+ .write_r8d16 = write_r8d16,
+};
+
+static int __devinit ad_dpot_i2c_probe(struct i2c_client *client,
+ const struct i2c_device_id *id)
+{
+ struct ad_dpot_bus_data bdata = {
+ .client = client,
+ .bops = &bops,
+ };
+
+ struct ad_dpot_id dpot_id = {
+ .name = (char *) &id->name,
+ .devid = id->driver_data,
+ };
+
+ if (!i2c_check_functionality(client->adapter,
+ I2C_FUNC_SMBUS_WORD_DATA)) {
+ dev_err(&client->dev, "SMBUS Word Data not Supported\n");
+ return -EIO;
+ }
+
+ return ad_dpot_probe(&client->dev, &bdata, &dpot_id);
+}
+
+static int __devexit ad_dpot_i2c_remove(struct i2c_client *client)
+{
+ return ad_dpot_remove(&client->dev);
+}
+
+static const struct i2c_device_id ad_dpot_id[] = {
+ {"ad5258", AD5258_ID},
+ {"ad5259", AD5259_ID},
+ {"ad5251", AD5251_ID},
+ {"ad5252", AD5252_ID},
+ {"ad5253", AD5253_ID},
+ {"ad5254", AD5254_ID},
+ {"ad5255", AD5255_ID},
+ {"ad5241", AD5241_ID},
+ {"ad5242", AD5242_ID},
+ {"ad5243", AD5243_ID},
+ {"ad5245", AD5245_ID},
+ {"ad5246", AD5246_ID},
+ {"ad5247", AD5247_ID},
+ {"ad5248", AD5248_ID},
+ {"ad5280", AD5280_ID},
+ {"ad5282", AD5282_ID},
+ {"adn2860", ADN2860_ID},
+ {"ad5273", AD5273_ID},
+ {"ad5171", AD5171_ID},
+ {"ad5170", AD5170_ID},
+ {"ad5172", AD5172_ID},
+ {"ad5173", AD5173_ID},
+ {}
+};
+MODULE_DEVICE_TABLE(i2c, ad_dpot_id);
+
+static struct i2c_driver ad_dpot_i2c_driver = {
+ .driver = {
+ .name = "ad_dpot",
+ .owner = THIS_MODULE,
+ },
+ .probe = ad_dpot_i2c_probe,
+ .remove = __devexit_p(ad_dpot_i2c_remove),
+ .id_table = ad_dpot_id,
+};
+
+static int __init ad_dpot_i2c_init(void)
+{
+ return i2c_add_driver(&ad_dpot_i2c_driver);
+}
+module_init(ad_dpot_i2c_init);
+
+static void __exit ad_dpot_i2c_exit(void)
+{
+ i2c_del_driver(&ad_dpot_i2c_driver);
+}
+module_exit(ad_dpot_i2c_exit);
+
+MODULE_AUTHOR("Michael Hennerich <hennerich@blackfin.uclinux.org>");
+MODULE_DESCRIPTION("digital potentiometer I2C bus driver");
+MODULE_LICENSE("GPL");
+MODULE_ALIAS("i2c:ad_dpot");
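
Review note: the I2C file above plugs into the core driver through a table of function pointers (struct ad_dpot_bus_ops), so the core never calls a bus API directly. The sketch below illustrates that pattern in user space under simplifying assumptions: struct bus_ops carries only two of the six operations, and struct fake_chip, fake_bops, core_read and core_write are hypothetical names that do not appear in the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified stand-in for struct ad_dpot_bus_ops (two ops only). */
struct bus_ops {
	int (*read_r8d8)(void *client, uint8_t reg);
	int (*write_r8d8)(void *client, uint8_t reg, uint8_t val);
};

/* Simplified stand-in for struct ad_dpot_bus_data. */
struct bus_data {
	void *client;
	const struct bus_ops *bops;
};

/* Hypothetical fake device: a plain 256-byte register file. */
struct fake_chip {
	uint8_t regs[256];
};

static int fake_read_r8d8(void *client, uint8_t reg)
{
	struct fake_chip *chip = client;

	return chip->regs[reg];
}

static int fake_write_r8d8(void *client, uint8_t reg, uint8_t val)
{
	struct fake_chip *chip = client;

	chip->regs[reg] = val;
	return 0;
}

static const struct bus_ops fake_bops = {
	.read_r8d8  = fake_read_r8d8,
	.write_r8d8 = fake_write_r8d8,
};

/* Core-side helpers: they know only the ops table, not the bus. */
static int core_read(const struct bus_data *b, uint8_t reg)
{
	return b->bops->read_r8d8(b->client, reg);
}

static int core_write(const struct bus_data *b, uint8_t reg, uint8_t val)
{
	return b->bops->write_r8d8(b->client, reg, val);
}
```

Swapping fake_bops for a table backed by i2c_smbus_* or spi_* calls changes the transport without touching the core helpers, which is the point of the split into ad525x_dpot-i2c and ad525x_dpot-spi.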
--- /dev/null
+/*
+ * Driver for the Analog Devices digital potentiometers (SPI bus)
+ *
+ * Copyright (C) 2010 Michael Hennerich, Analog Devices Inc.
+ *
+ * Licensed under the GPL-2 or later.
+ */
+
+#include <linux/spi/spi.h>
+#include <linux/module.h>
+
+#include "ad525x_dpot.h"
+
+static const struct ad_dpot_id ad_dpot_spi_devlist[] = {
+ {.name = "ad5160", .devid = AD5160_ID},
+ {.name = "ad5161", .devid = AD5161_ID},
+ {.name = "ad5162", .devid = AD5162_ID},
+ {.name = "ad5165", .devid = AD5165_ID},
+ {.name = "ad5200", .devid = AD5200_ID},
+ {.name = "ad5201", .devid = AD5201_ID},
+ {.name = "ad5203", .devid = AD5203_ID},
+ {.name = "ad5204", .devid = AD5204_ID},
+ {.name = "ad5206", .devid = AD5206_ID},
+ {.name = "ad5207", .devid = AD5207_ID},
+ {.name = "ad5231", .devid = AD5231_ID},
+ {.name = "ad5232", .devid = AD5232_ID},
+ {.name = "ad5233", .devid = AD5233_ID},
+ {.name = "ad5235", .devid = AD5235_ID},
+ {.name = "ad5260", .devid = AD5260_ID},
+ {.name = "ad5262", .devid = AD5262_ID},
+ {.name = "ad5263", .devid = AD5263_ID},
+ {.name = "ad5290", .devid = AD5290_ID},
+ {.name = "ad5291", .devid = AD5291_ID},
+ {.name = "ad5292", .devid = AD5292_ID},
+ {.name = "ad5293", .devid = AD5293_ID},
+ {.name = "ad7376", .devid = AD7376_ID},
+ {.name = "ad8400", .devid = AD8400_ID},
+ {.name = "ad8402", .devid = AD8402_ID},
+ {.name = "ad8403", .devid = AD8403_ID},
+ {.name = "adn2850", .devid = ADN2850_ID},
+ {}
+};
+
+/* ------------------------------------------------------------------------- */
+
+/* SPI bus functions */
+static int write8(void *client, u8 val)
+{
+ u8 data = val;
+ return spi_write(client, &data, 1);
+}
+
+static int write16(void *client, u8 reg, u8 val)
+{
+ u8 data[2] = {reg, val};
+ return spi_write(client, data, 2);
+}
+
+static int write24(void *client, u8 reg, u16 val)
+{
+ u8 data[3] = {reg, val >> 8, val};
+ return spi_write(client, data, 3);
+}
+
+static int read8(void *client)
+{
+ int ret;
+ u8 data;
+ ret = spi_read(client, &data, 1);
+ if (ret < 0)
+ return ret;
+
+ return data;
+}
+
+static int read16(void *client, u8 reg)
+{
+ int ret;
+ u8 buf_rx[2];
+
+ write16(client, reg, 0);
+ ret = spi_read(client, buf_rx, 2);
+ if (ret < 0)
+ return ret;
+
+ return (buf_rx[0] << 8) | buf_rx[1];
+}
+
+static int read24(void *client, u8 reg)
+{
+ int ret;
+ u8 buf_rx[3];
+
+ write24(client, reg, 0);
+ ret = spi_read(client, buf_rx, 3);
+ if (ret < 0)
+ return ret;
+
+ return (buf_rx[1] << 8) | buf_rx[2];
+}
+
+static const struct ad_dpot_bus_ops bops = {
+ .read_d8 = read8,
+ .read_r8d8 = read16,
+ .read_r8d16 = read24,
+ .write_d8 = write8,
+ .write_r8d8 = write16,
+ .write_r8d16 = write24,
+};
+
+static const struct ad_dpot_id *dpot_match_id(const struct ad_dpot_id *id,
+ char *name)
+{
+ while (id->name && id->name[0]) {
+ if (strcmp(name, id->name) == 0)
+ return id;
+ id++;
+ }
+ return NULL;
+}
+
+static int __devinit ad_dpot_spi_probe(struct spi_device *spi)
+{
+ char *name = spi->dev.platform_data;
+ const struct ad_dpot_id *dpot_id;
+
+ struct ad_dpot_bus_data bdata = {
+ .client = spi,
+ .bops = &bops,
+ };
+
+ dpot_id = dpot_match_id(ad_dpot_spi_devlist, name);
+
+ if (dpot_id == NULL) {
+ dev_err(&spi->dev, "%s not in supported device list\n", name);
+ return -ENODEV;
+ }
+
+ return ad_dpot_probe(&spi->dev, &bdata, dpot_id);
+}
+
+static int __devexit ad_dpot_spi_remove(struct spi_device *spi)
+{
+ return ad_dpot_remove(&spi->dev);
+}
+
+static struct spi_driver ad_dpot_spi_driver = {
+ .driver = {
+ .name = "ad_dpot",
+ .bus = &spi_bus_type,
+ .owner = THIS_MODULE,
+ },
+ .probe = ad_dpot_spi_probe,
+ .remove = __devexit_p(ad_dpot_spi_remove),
+};
+
+static int __init ad_dpot_spi_init(void)
+{
+ return spi_register_driver(&ad_dpot_spi_driver);
+}
+module_init(ad_dpot_spi_init);
+
+static void __exit ad_dpot_spi_exit(void)
+{
+ spi_unregister_driver(&ad_dpot_spi_driver);
+}
+module_exit(ad_dpot_spi_exit);
+
+MODULE_AUTHOR("Michael Hennerich <hennerich@blackfin.uclinux.org>");
+MODULE_DESCRIPTION("digital potentiometer SPI bus driver");
+MODULE_LICENSE("GPL");
+MODULE_ALIAS("spi:ad_dpot");
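
Review note: the 24-bit SPI helpers above frame a transfer as one command/register byte followed by a 16-bit value, most significant byte first, and read24() takes its result from the last two bytes of a 3-byte reply. The user-space sketch below shows that framing in isolation. The names pack_r8d16 and unpack_r8d16 are hypothetical and do not appear in the patch; no SPI transfer is performed here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Build the 3-byte frame {reg, val >> 8, val} and return the number of
 * bytes that make up the frame.
 */
static size_t pack_r8d16(uint8_t reg, uint16_t val, uint8_t out[3])
{
	assert(out != NULL);
	out[0] = reg;
	out[1] = (uint8_t)(val >> 8);
	out[2] = (uint8_t)(val & 0xff);
	return 3;
}

/*
 * Decode a 3-byte reply the way read24() does: ignore the first byte
 * and combine the remaining two as a big-endian 16-bit value.
 */
static int unpack_r8d16(const uint8_t in[3])
{
	assert(in != NULL);
	return (in[1] << 8) | in[2];
}
```

Keeping the frame length next to the packing code makes it harder for the buffer size and the transfer length to drift apart.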
/*
- * ad525x_dpot: Driver for the Analog Devices AD525x digital potentiometers
- * Copyright (c) 2009 Analog Devices, Inc.
+ * ad525x_dpot: Driver for the Analog Devices digital potentiometers
+ * Copyright (c) 2009-2010 Analog Devices, Inc.
* Author: Michael Hennerich <hennerich@blackfin.uclinux.org>
*
* DEVID #Wipers #Positions Resistor Options (kOhm)
* AD5255 3 512 25, 250
* AD5253 4 64 1, 10, 50, 100
* AD5254 4 256 1, 10, 50, 100
+ * AD5160 1 256 5, 10, 50, 100
+ * AD5161 1 256 5, 10, 50, 100
+ * AD5162 2 256 2.5, 10, 50, 100
+ * AD5165 1 256 100
+ * AD5200 1 256 10, 50
+ * AD5201 1 33 10, 50
+ * AD5203 4 64 10, 100
+ * AD5204 4 256 10, 50, 100
+ * AD5206 6 256 10, 50, 100
+ * AD5207 2 256 10, 50, 100
+ * AD5231 1 1024 10, 50, 100
+ * AD5232 2 256 10, 50, 100
+ * AD5233 4 64 10, 50, 100
+ * AD5235 2 1024 25, 250
+ * AD5260 1 256 20, 50, 200
+ * AD5262 2 256 20, 50, 200
+ * AD5263 4 256 20, 50, 200
+ * AD5290 1 256 10, 50, 100
+ * AD5291 1 256 20
+ * AD5292 1 1024 20
+ * AD5293 1 1024 20
+ * AD7376 1 128 10, 50, 100, 1M
+ * AD8400 1 256 1, 10, 50, 100
+ * AD8402 2 256 1, 10, 50, 100
+ * AD8403 4 256 1, 10, 50, 100
+ * ADN2850 3 512 25, 250
+ * AD5241 1 256 10, 100, 1M
+ * AD5246 1 128 5, 10, 50, 100
+ * AD5247 1 128 5, 10, 50, 100
+ * AD5245 1 256 5, 10, 50, 100
+ * AD5243 2 256 2.5, 10, 50, 100
+ * AD5248 2 256 2.5, 10, 50, 100
+ * AD5242 2 256 20, 50, 200
+ * AD5280 1 256 20, 50, 200
+ * AD5282 2 256 20, 50, 200
+ * ADN2860 3 512 25, 250
+ * AD5273 1 64 1, 10, 50, 100 (OTP)
+ * AD5171 1 64 5, 10, 50, 100 (OTP)
+ * AD5170 1 256 2.5, 10, 50, 100 (OTP)
+ * AD5172 2 256 2.5, 10, 50, 100 (OTP)
+ * AD5173 2 256 2.5, 10, 50, 100 (OTP)
*
* See Documentation/misc-devices/ad525x_dpot.txt for more info.
*
#include <linux/device.h>
#include <linux/kernel.h>
#include <linux/init.h>
-#include <linux/slab.h>
-#include <linux/i2c.h>
#include <linux/delay.h>
+#include <linux/slab.h>
-#define DRIVER_NAME "ad525x_dpot"
-#define DRIVER_VERSION "0.1"
-
-enum dpot_devid {
- AD5258_ID,
- AD5259_ID,
- AD5251_ID,
- AD5252_ID,
- AD5253_ID,
- AD5254_ID,
- AD5255_ID,
-};
+#define DRIVER_VERSION "0.2"
-#define AD5258_MAX_POSITION 64
-#define AD5259_MAX_POSITION 256
-#define AD5251_MAX_POSITION 64
-#define AD5252_MAX_POSITION 256
-#define AD5253_MAX_POSITION 64
-#define AD5254_MAX_POSITION 256
-#define AD5255_MAX_POSITION 512
-
-#define AD525X_RDAC0 0
-#define AD525X_RDAC1 1
-#define AD525X_RDAC2 2
-#define AD525X_RDAC3 3
-
-#define AD525X_REG_TOL 0x18
-#define AD525X_TOL_RDAC0 (AD525X_REG_TOL | AD525X_RDAC0)
-#define AD525X_TOL_RDAC1 (AD525X_REG_TOL | AD525X_RDAC1)
-#define AD525X_TOL_RDAC2 (AD525X_REG_TOL | AD525X_RDAC2)
-#define AD525X_TOL_RDAC3 (AD525X_REG_TOL | AD525X_RDAC3)
-
-/* RDAC-to-EEPROM Interface Commands */
-#define AD525X_I2C_RDAC (0x00 << 5)
-#define AD525X_I2C_EEPROM (0x01 << 5)
-#define AD525X_I2C_CMD (0x80)
-
-#define AD525X_DEC_ALL_6DB (AD525X_I2C_CMD | (0x4 << 3))
-#define AD525X_INC_ALL_6DB (AD525X_I2C_CMD | (0x9 << 3))
-#define AD525X_DEC_ALL (AD525X_I2C_CMD | (0x6 << 3))
-#define AD525X_INC_ALL (AD525X_I2C_CMD | (0xB << 3))
-
-static s32 ad525x_read(struct i2c_client *client, u8 reg);
-static s32 ad525x_write(struct i2c_client *client, u8 reg, u8 value);
+#include "ad525x_dpot.h"
/*
* Client data (each client gets its own)
*/
struct dpot_data {
+ struct ad_dpot_bus_data bdata;
struct mutex update_lock;
unsigned rdac_mask;
unsigned max_pos;
- unsigned devid;
+ unsigned long devid;
+ unsigned uid;
+ unsigned feat;
+ unsigned wipers;
+ u16 rdac_cache[MAX_RDACS];
+ DECLARE_BITMAP(otp_en_mask, MAX_RDACS);
};
+static inline int dpot_read_d8(struct dpot_data *dpot)
+{
+ return dpot->bdata.bops->read_d8(dpot->bdata.client);
+}
+
+static inline int dpot_read_r8d8(struct dpot_data *dpot, u8 reg)
+{
+ return dpot->bdata.bops->read_r8d8(dpot->bdata.client, reg);
+}
+
+static inline int dpot_read_r8d16(struct dpot_data *dpot, u8 reg)
+{
+ return dpot->bdata.bops->read_r8d16(dpot->bdata.client, reg);
+}
+
+static inline int dpot_write_d8(struct dpot_data *dpot, u8 val)
+{
+ return dpot->bdata.bops->write_d8(dpot->bdata.client, val);
+}
+
+static inline int dpot_write_r8d8(struct dpot_data *dpot, u8 reg, u16 val)
+{
+ return dpot->bdata.bops->write_r8d8(dpot->bdata.client, reg, val);
+}
+
+static inline int dpot_write_r8d16(struct dpot_data *dpot, u8 reg, u16 val)
+{
+ return dpot->bdata.bops->write_r8d16(dpot->bdata.client, reg, val);
+}
+
+static s32 dpot_read_spi(struct dpot_data *dpot, u8 reg)
+{
+ unsigned ctrl = 0;
+
+ if (!(reg & (DPOT_ADDR_EEPROM | DPOT_ADDR_CMD))) {
+
+ if (dpot->feat & F_RDACS_WONLY)
+ return dpot->rdac_cache[reg & DPOT_RDAC_MASK];
+
+ if (dpot->uid == DPOT_UID(AD5291_ID) ||
+ dpot->uid == DPOT_UID(AD5292_ID) ||
+ dpot->uid == DPOT_UID(AD5293_ID))
+ return dpot_read_r8d8(dpot,
+ DPOT_AD5291_READ_RDAC << 2);
+
+ ctrl = DPOT_SPI_READ_RDAC;
+ } else if (reg & DPOT_ADDR_EEPROM) {
+ ctrl = DPOT_SPI_READ_EEPROM;
+ }
+
+ if (dpot->feat & F_SPI_16BIT)
+ return dpot_read_r8d8(dpot, ctrl);
+ else if (dpot->feat & F_SPI_24BIT)
+ return dpot_read_r8d16(dpot, ctrl);
+
+ return -EFAULT;
+}
+
+static s32 dpot_read_i2c(struct dpot_data *dpot, u8 reg)
+{
+ unsigned ctrl = 0;
+ switch (dpot->uid) {
+ case DPOT_UID(AD5246_ID):
+ case DPOT_UID(AD5247_ID):
+ return dpot_read_d8(dpot);
+ case DPOT_UID(AD5245_ID):
+ case DPOT_UID(AD5241_ID):
+ case DPOT_UID(AD5242_ID):
+ case DPOT_UID(AD5243_ID):
+ case DPOT_UID(AD5248_ID):
+ case DPOT_UID(AD5280_ID):
+ case DPOT_UID(AD5282_ID):
+ ctrl = ((reg & DPOT_RDAC_MASK) == DPOT_RDAC0) ?
+ 0 : DPOT_AD5291_RDAC_AB;
+ return dpot_read_r8d8(dpot, ctrl);
+ case DPOT_UID(AD5170_ID):
+ case DPOT_UID(AD5171_ID):
+ case DPOT_UID(AD5273_ID):
+ return dpot_read_d8(dpot);
+ case DPOT_UID(AD5172_ID):
+ case DPOT_UID(AD5173_ID):
+ ctrl = ((reg & DPOT_RDAC_MASK) == DPOT_RDAC0) ?
+ 0 : DPOT_AD5272_3_A0;
+ return dpot_read_r8d8(dpot, ctrl);
+ default:
+ if ((reg & DPOT_REG_TOL) || (dpot->max_pos > 256))
+ return dpot_read_r8d16(dpot, (reg & 0xF8) |
+ ((reg & 0x7) << 1));
+ else
+ return dpot_read_r8d8(dpot, reg);
+ }
+}
+
+static s32 dpot_read(struct dpot_data *dpot, u8 reg)
+{
+ if (dpot->feat & F_SPI)
+ return dpot_read_spi(dpot, reg);
+ else
+ return dpot_read_i2c(dpot, reg);
+}
+
+static s32 dpot_write_spi(struct dpot_data *dpot, u8 reg, u16 value)
+{
+ unsigned val = 0;
+
+ if (!(reg & (DPOT_ADDR_EEPROM | DPOT_ADDR_CMD))) {
+ if (dpot->feat & F_RDACS_WONLY)
+ dpot->rdac_cache[reg & DPOT_RDAC_MASK] = value;
+
+ if (dpot->feat & F_AD_APPDATA) {
+ if (dpot->feat & F_SPI_8BIT) {
+ val = ((reg & DPOT_RDAC_MASK) <<
+ DPOT_MAX_POS(dpot->devid)) |
+ value;
+ return dpot_write_d8(dpot, val);
+ } else if (dpot->feat & F_SPI_16BIT) {
+ val = ((reg & DPOT_RDAC_MASK) <<
+ DPOT_MAX_POS(dpot->devid)) |
+ value;
+ return dpot_write_r8d8(dpot, val >> 8,
+ val & 0xFF);
+ } else
+ BUG();
+ } else {
+ if (dpot->uid == DPOT_UID(AD5291_ID) ||
+ dpot->uid == DPOT_UID(AD5292_ID) ||
+ dpot->uid == DPOT_UID(AD5293_ID))
+ return dpot_write_r8d8(dpot,
+ (DPOT_AD5291_RDAC << 2) |
+ (value >> 8), value & 0xFF);
+
+ val = DPOT_SPI_RDAC | (reg & DPOT_RDAC_MASK);
+ }
+ } else if (reg & DPOT_ADDR_EEPROM) {
+ val = DPOT_SPI_EEPROM | (reg & DPOT_RDAC_MASK);
+ } else if (reg & DPOT_ADDR_CMD) {
+ switch (reg) {
+ case DPOT_DEC_ALL_6DB:
+ val = DPOT_SPI_DEC_ALL_6DB;
+ break;
+ case DPOT_INC_ALL_6DB:
+ val = DPOT_SPI_INC_ALL_6DB;
+ break;
+ case DPOT_DEC_ALL:
+ val = DPOT_SPI_DEC_ALL;
+ break;
+ case DPOT_INC_ALL:
+ val = DPOT_SPI_INC_ALL;
+ break;
+ }
+ } else
+ BUG();
+
+ if (dpot->feat & F_SPI_16BIT)
+ return dpot_write_r8d8(dpot, val, value);
+ else if (dpot->feat & F_SPI_24BIT)
+ return dpot_write_r8d16(dpot, val, value);
+
+ return -EFAULT;
+}
+
+static s32 dpot_write_i2c(struct dpot_data *dpot, u8 reg, u16 value)
+{
+ /* Only write the instruction byte for certain commands */
+ unsigned tmp = 0, ctrl = 0;
+
+ switch (dpot->uid) {
+ case DPOT_UID(AD5246_ID):
+ case DPOT_UID(AD5247_ID):
+ return dpot_write_d8(dpot, value);
+ break;
+
+ case DPOT_UID(AD5245_ID):
+ case DPOT_UID(AD5241_ID):
+ case DPOT_UID(AD5242_ID):
+ case DPOT_UID(AD5243_ID):
+ case DPOT_UID(AD5248_ID):
+ case DPOT_UID(AD5280_ID):
+ case DPOT_UID(AD5282_ID):
+ ctrl = ((reg & DPOT_RDAC_MASK) == DPOT_RDAC0) ?
+ 0 : DPOT_AD5291_RDAC_AB;
+ return dpot_write_r8d8(dpot, ctrl, value);
+ break;
+ case DPOT_UID(AD5171_ID):
+ case DPOT_UID(AD5273_ID):
+ if (reg & DPOT_ADDR_OTP) {
+ tmp = dpot_read_d8(dpot);
+ if (tmp >> 6) /* Ready to Program? */
+ return -EFAULT;
+ ctrl = DPOT_AD5273_FUSE;
+ }
+ return dpot_write_r8d8(dpot, ctrl, value);
+ break;
+ case DPOT_UID(AD5172_ID):
+ case DPOT_UID(AD5173_ID):
+ ctrl = ((reg & DPOT_RDAC_MASK) == DPOT_RDAC0) ?
+ 0 : DPOT_AD5272_3_A0;
+ if (reg & DPOT_ADDR_OTP) {
+ tmp = dpot_read_r8d16(dpot, ctrl);
+ if (tmp >> 14) /* Ready to Program? */
+ return -EFAULT;
+ ctrl |= DPOT_AD5270_2_3_FUSE;
+ }
+ return dpot_write_r8d8(dpot, ctrl, value);
+ break;
+ case DPOT_UID(AD5170_ID):
+ if (reg & DPOT_ADDR_OTP) {
+ tmp = dpot_read_r8d16(dpot, tmp);
+ if (tmp >> 14) /* Ready to Program? */
+ return -EFAULT;
+ ctrl = DPOT_AD5270_2_3_FUSE;
+ }
+ return dpot_write_r8d8(dpot, ctrl, value);
+ break;
+ default:
+ if (reg & DPOT_ADDR_CMD)
+ return dpot_write_d8(dpot, reg);
+
+ if (dpot->max_pos > 256)
+ return dpot_write_r8d16(dpot, (reg & 0xF8) |
+ ((reg & 0x7) << 1), value);
+ else
+ /* All other registers require instruction + data bytes */
+ return dpot_write_r8d8(dpot, reg, value);
+ }
+}
+
+
+static s32 dpot_write(struct dpot_data *dpot, u8 reg, u16 value)
+{
+ if (dpot->feat & F_SPI)
+ return dpot_write_spi(dpot, reg, value);
+ else
+ return dpot_write_i2c(dpot, reg, value);
+}
+
/* sysfs functions */
static ssize_t sysfs_show_reg(struct device *dev,
- struct device_attribute *attr, char *buf, u32 reg)
+ struct device_attribute *attr,
+ char *buf, u32 reg)
{
- struct i2c_client *client = to_i2c_client(dev);
- struct dpot_data *data = i2c_get_clientdata(client);
+ struct dpot_data *data = dev_get_drvdata(dev);
s32 value;
+ if (reg & DPOT_ADDR_OTP_EN)
+ return sprintf(buf, "%s\n",
+ test_bit(DPOT_RDAC_MASK & reg, data->otp_en_mask) ?
+ "enabled" : "disabled");
+
+
mutex_lock(&data->update_lock);
- value = ad525x_read(client, reg);
+ value = dpot_read(data, reg);
mutex_unlock(&data->update_lock);
if (value < 0)
* datasheet (Rev. A) for more details.
*/
- if (reg & AD525X_REG_TOL)
+ if (reg & DPOT_REG_TOL)
return sprintf(buf, "0x%04x\n", value & 0xFFFF);
else
return sprintf(buf, "%u\n", value & data->rdac_mask);
struct device_attribute *attr,
const char *buf, size_t count, u32 reg)
{
- struct i2c_client *client = to_i2c_client(dev);
- struct dpot_data *data = i2c_get_clientdata(client);
+ struct dpot_data *data = dev_get_drvdata(dev);
unsigned long value;
int err;
+ if (reg & DPOT_ADDR_OTP_EN) {
+ if (sysfs_streq(buf, "enabled"))
+ set_bit(DPOT_RDAC_MASK & reg, data->otp_en_mask);
+ else
+ clear_bit(DPOT_RDAC_MASK & reg, data->otp_en_mask);
+
+ return count;
+ }
+
+ if ((reg & DPOT_ADDR_OTP) &&
+ !test_bit(DPOT_RDAC_MASK & reg, data->otp_en_mask))
+ return -EPERM;
+
err = strict_strtoul(buf, 10, &value);
if (err)
return err;
value = data->rdac_mask;
mutex_lock(&data->update_lock);
- ad525x_write(client, reg, value);
- if (reg & AD525X_I2C_EEPROM)
+ dpot_write(data, reg, value);
+ if (reg & DPOT_ADDR_EEPROM)
msleep(26); /* Sleep while the EEPROM updates */
+ else if (reg & DPOT_ADDR_OTP)
+ msleep(400); /* Sleep while the OTP updates */
mutex_unlock(&data->update_lock);
return count;
struct device_attribute *attr,
const char *buf, size_t count, u32 reg)
{
- struct i2c_client *client = to_i2c_client(dev);
- struct dpot_data *data = i2c_get_clientdata(client);
+ struct dpot_data *data = dev_get_drvdata(dev);
mutex_lock(&data->update_lock);
- ad525x_write(client, reg, 0);
+ dpot_write(data, reg, 0);
mutex_unlock(&data->update_lock);
return count;
/* ------------------------------------------------------------------------- */
-static ssize_t show_rdac0(struct device *dev,
- struct device_attribute *attr, char *buf)
-{
- return sysfs_show_reg(dev, attr, buf, AD525X_I2C_RDAC | AD525X_RDAC0);
-}
-
-static ssize_t set_rdac0(struct device *dev,
- struct device_attribute *attr,
- const char *buf, size_t count)
-{
- return sysfs_set_reg(dev, attr, buf, count,
- AD525X_I2C_RDAC | AD525X_RDAC0);
-}
-
-static DEVICE_ATTR(rdac0, S_IWUSR | S_IRUGO, show_rdac0, set_rdac0);
-
-static ssize_t show_eeprom0(struct device *dev,
- struct device_attribute *attr, char *buf)
-{
- return sysfs_show_reg(dev, attr, buf, AD525X_I2C_EEPROM | AD525X_RDAC0);
-}
-
-static ssize_t set_eeprom0(struct device *dev,
- struct device_attribute *attr,
- const char *buf, size_t count)
-{
- return sysfs_set_reg(dev, attr, buf, count,
- AD525X_I2C_EEPROM | AD525X_RDAC0);
-}
-
-static DEVICE_ATTR(eeprom0, S_IWUSR | S_IRUGO, show_eeprom0, set_eeprom0);
-
-static ssize_t show_tolerance0(struct device *dev,
- struct device_attribute *attr, char *buf)
-{
- return sysfs_show_reg(dev, attr, buf,
- AD525X_I2C_EEPROM | AD525X_TOL_RDAC0);
-}
-
-static DEVICE_ATTR(tolerance0, S_IRUGO, show_tolerance0, NULL);
-
-/* ------------------------------------------------------------------------- */
-
-static ssize_t show_rdac1(struct device *dev,
- struct device_attribute *attr, char *buf)
-{
- return sysfs_show_reg(dev, attr, buf, AD525X_I2C_RDAC | AD525X_RDAC1);
-}
-
-static ssize_t set_rdac1(struct device *dev,
- struct device_attribute *attr,
- const char *buf, size_t count)
-{
- return sysfs_set_reg(dev, attr, buf, count,
- AD525X_I2C_RDAC | AD525X_RDAC1);
-}
-
-static DEVICE_ATTR(rdac1, S_IWUSR | S_IRUGO, show_rdac1, set_rdac1);
-
-static ssize_t show_eeprom1(struct device *dev,
- struct device_attribute *attr, char *buf)
-{
- return sysfs_show_reg(dev, attr, buf, AD525X_I2C_EEPROM | AD525X_RDAC1);
-}
-
-static ssize_t set_eeprom1(struct device *dev,
- struct device_attribute *attr,
- const char *buf, size_t count)
-{
- return sysfs_set_reg(dev, attr, buf, count,
- AD525X_I2C_EEPROM | AD525X_RDAC1);
-}
-
-static DEVICE_ATTR(eeprom1, S_IWUSR | S_IRUGO, show_eeprom1, set_eeprom1);
-
-static ssize_t show_tolerance1(struct device *dev,
- struct device_attribute *attr, char *buf)
-{
- return sysfs_show_reg(dev, attr, buf,
- AD525X_I2C_EEPROM | AD525X_TOL_RDAC1);
-}
-
-static DEVICE_ATTR(tolerance1, S_IRUGO, show_tolerance1, NULL);
-
-/* ------------------------------------------------------------------------- */
-
-static ssize_t show_rdac2(struct device *dev,
- struct device_attribute *attr, char *buf)
-{
- return sysfs_show_reg(dev, attr, buf, AD525X_I2C_RDAC | AD525X_RDAC2);
-}
-
-static ssize_t set_rdac2(struct device *dev,
- struct device_attribute *attr,
- const char *buf, size_t count)
-{
- return sysfs_set_reg(dev, attr, buf, count,
- AD525X_I2C_RDAC | AD525X_RDAC2);
-}
-
-static DEVICE_ATTR(rdac2, S_IWUSR | S_IRUGO, show_rdac2, set_rdac2);
-
-static ssize_t show_eeprom2(struct device *dev,
- struct device_attribute *attr, char *buf)
-{
- return sysfs_show_reg(dev, attr, buf, AD525X_I2C_EEPROM | AD525X_RDAC2);
-}
-
-static ssize_t set_eeprom2(struct device *dev,
- struct device_attribute *attr,
- const char *buf, size_t count)
-{
- return sysfs_set_reg(dev, attr, buf, count,
- AD525X_I2C_EEPROM | AD525X_RDAC2);
-}
-
-static DEVICE_ATTR(eeprom2, S_IWUSR | S_IRUGO, show_eeprom2, set_eeprom2);
-
-static ssize_t show_tolerance2(struct device *dev,
- struct device_attribute *attr, char *buf)
-{
- return sysfs_show_reg(dev, attr, buf,
- AD525X_I2C_EEPROM | AD525X_TOL_RDAC2);
-}
-
-static DEVICE_ATTR(tolerance2, S_IRUGO, show_tolerance2, NULL);
-
-/* ------------------------------------------------------------------------- */
-
-static ssize_t show_rdac3(struct device *dev,
- struct device_attribute *attr, char *buf)
-{
- return sysfs_show_reg(dev, attr, buf, AD525X_I2C_RDAC | AD525X_RDAC3);
-}
-
-static ssize_t set_rdac3(struct device *dev,
- struct device_attribute *attr,
- const char *buf, size_t count)
-{
- return sysfs_set_reg(dev, attr, buf, count,
- AD525X_I2C_RDAC | AD525X_RDAC3);
-}
-
-static DEVICE_ATTR(rdac3, S_IWUSR | S_IRUGO, show_rdac3, set_rdac3);
-
-static ssize_t show_eeprom3(struct device *dev,
- struct device_attribute *attr, char *buf)
-{
- return sysfs_show_reg(dev, attr, buf, AD525X_I2C_EEPROM | AD525X_RDAC3);
-}
-
-static ssize_t set_eeprom3(struct device *dev,
- struct device_attribute *attr,
- const char *buf, size_t count)
-{
- return sysfs_set_reg(dev, attr, buf, count,
- AD525X_I2C_EEPROM | AD525X_RDAC3);
-}
+#define DPOT_DEVICE_SHOW(_name, _reg) static ssize_t \
+show_##_name(struct device *dev, \
+ struct device_attribute *attr, char *buf) \
+{ \
+ return sysfs_show_reg(dev, attr, buf, _reg); \
+}
+
+#define DPOT_DEVICE_SET(_name, _reg) static ssize_t \
+set_##_name(struct device *dev, \
+ struct device_attribute *attr, \
+ const char *buf, size_t count) \
+{ \
+ return sysfs_set_reg(dev, attr, buf, count, _reg); \
+}
+
+#define DPOT_DEVICE_SHOW_SET(name, reg) \
+DPOT_DEVICE_SHOW(name, reg) \
+DPOT_DEVICE_SET(name, reg) \
+static DEVICE_ATTR(name, S_IWUSR | S_IRUGO, show_##name, set_##name);
+
+#define DPOT_DEVICE_SHOW_ONLY(name, reg) \
+DPOT_DEVICE_SHOW(name, reg) \
+static DEVICE_ATTR(name, S_IRUGO, show_##name, NULL);
+
+DPOT_DEVICE_SHOW_SET(rdac0, DPOT_ADDR_RDAC | DPOT_RDAC0);
+DPOT_DEVICE_SHOW_SET(eeprom0, DPOT_ADDR_EEPROM | DPOT_RDAC0);
+DPOT_DEVICE_SHOW_ONLY(tolerance0, DPOT_ADDR_EEPROM | DPOT_TOL_RDAC0);
+DPOT_DEVICE_SHOW_SET(otp0, DPOT_ADDR_OTP | DPOT_RDAC0);
+DPOT_DEVICE_SHOW_SET(otp0en, DPOT_ADDR_OTP_EN | DPOT_RDAC0);
+
+DPOT_DEVICE_SHOW_SET(rdac1, DPOT_ADDR_RDAC | DPOT_RDAC1);
+DPOT_DEVICE_SHOW_SET(eeprom1, DPOT_ADDR_EEPROM | DPOT_RDAC1);
+DPOT_DEVICE_SHOW_ONLY(tolerance1, DPOT_ADDR_EEPROM | DPOT_TOL_RDAC1);
+DPOT_DEVICE_SHOW_SET(otp1, DPOT_ADDR_OTP | DPOT_RDAC1);
+DPOT_DEVICE_SHOW_SET(otp1en, DPOT_ADDR_OTP_EN | DPOT_RDAC1);
+
+DPOT_DEVICE_SHOW_SET(rdac2, DPOT_ADDR_RDAC | DPOT_RDAC2);
+DPOT_DEVICE_SHOW_SET(eeprom2, DPOT_ADDR_EEPROM | DPOT_RDAC2);
+DPOT_DEVICE_SHOW_ONLY(tolerance2, DPOT_ADDR_EEPROM | DPOT_TOL_RDAC2);
+DPOT_DEVICE_SHOW_SET(otp2, DPOT_ADDR_OTP | DPOT_RDAC2);
+DPOT_DEVICE_SHOW_SET(otp2en, DPOT_ADDR_OTP_EN | DPOT_RDAC2);
+
+DPOT_DEVICE_SHOW_SET(rdac3, DPOT_ADDR_RDAC | DPOT_RDAC3);
+DPOT_DEVICE_SHOW_SET(eeprom3, DPOT_ADDR_EEPROM | DPOT_RDAC3);
+DPOT_DEVICE_SHOW_ONLY(tolerance3, DPOT_ADDR_EEPROM | DPOT_TOL_RDAC3);
+DPOT_DEVICE_SHOW_SET(otp3, DPOT_ADDR_OTP | DPOT_RDAC3);
+DPOT_DEVICE_SHOW_SET(otp3en, DPOT_ADDR_OTP_EN | DPOT_RDAC3);
+
+DPOT_DEVICE_SHOW_SET(rdac4, DPOT_ADDR_RDAC | DPOT_RDAC4);
+DPOT_DEVICE_SHOW_SET(eeprom4, DPOT_ADDR_EEPROM | DPOT_RDAC4);
+DPOT_DEVICE_SHOW_ONLY(tolerance4, DPOT_ADDR_EEPROM | DPOT_TOL_RDAC4);
+DPOT_DEVICE_SHOW_SET(otp4, DPOT_ADDR_OTP | DPOT_RDAC4);
+DPOT_DEVICE_SHOW_SET(otp4en, DPOT_ADDR_OTP_EN | DPOT_RDAC4);
+
+DPOT_DEVICE_SHOW_SET(rdac5, DPOT_ADDR_RDAC | DPOT_RDAC5);
+DPOT_DEVICE_SHOW_SET(eeprom5, DPOT_ADDR_EEPROM | DPOT_RDAC5);
+DPOT_DEVICE_SHOW_ONLY(tolerance5, DPOT_ADDR_EEPROM | DPOT_TOL_RDAC5);
+DPOT_DEVICE_SHOW_SET(otp5, DPOT_ADDR_OTP | DPOT_RDAC5);
+DPOT_DEVICE_SHOW_SET(otp5en, DPOT_ADDR_OTP_EN | DPOT_RDAC5);
+
+static const struct attribute *dpot_attrib_wipers[] = {
+ &dev_attr_rdac0.attr,
+ &dev_attr_rdac1.attr,
+ &dev_attr_rdac2.attr,
+ &dev_attr_rdac3.attr,
+ &dev_attr_rdac4.attr,
+ &dev_attr_rdac5.attr,
+ NULL
+};
-static DEVICE_ATTR(eeprom3, S_IWUSR | S_IRUGO, show_eeprom3, set_eeprom3);
+static const struct attribute *dpot_attrib_eeprom[] = {
+ &dev_attr_eeprom0.attr,
+ &dev_attr_eeprom1.attr,
+ &dev_attr_eeprom2.attr,
+ &dev_attr_eeprom3.attr,
+ &dev_attr_eeprom4.attr,
+ &dev_attr_eeprom5.attr,
+ NULL
+};
-static ssize_t show_tolerance3(struct device *dev,
- struct device_attribute *attr, char *buf)
-{
- return sysfs_show_reg(dev, attr, buf,
- AD525X_I2C_EEPROM | AD525X_TOL_RDAC3);
-}
+static const struct attribute *dpot_attrib_otp[] = {
+ &dev_attr_otp0.attr,
+ &dev_attr_otp1.attr,
+ &dev_attr_otp2.attr,
+ &dev_attr_otp3.attr,
+ &dev_attr_otp4.attr,
+ &dev_attr_otp5.attr,
+ NULL
+};
-static DEVICE_ATTR(tolerance3, S_IRUGO, show_tolerance3, NULL);
-
-static struct attribute *ad525x_attributes_wipers[4][4] = {
- {
- &dev_attr_rdac0.attr,
- &dev_attr_eeprom0.attr,
- &dev_attr_tolerance0.attr,
- NULL
- }, {
- &dev_attr_rdac1.attr,
- &dev_attr_eeprom1.attr,
- &dev_attr_tolerance1.attr,
- NULL
- }, {
- &dev_attr_rdac2.attr,
- &dev_attr_eeprom2.attr,
- &dev_attr_tolerance2.attr,
- NULL
- }, {
- &dev_attr_rdac3.attr,
- &dev_attr_eeprom3.attr,
- &dev_attr_tolerance3.attr,
- NULL
- }
+static const struct attribute *dpot_attrib_otp_en[] = {
+ &dev_attr_otp0en.attr,
+ &dev_attr_otp1en.attr,
+ &dev_attr_otp2en.attr,
+ &dev_attr_otp3en.attr,
+ &dev_attr_otp4en.attr,
+ &dev_attr_otp5en.attr,
+ NULL
};
-static const struct attribute_group ad525x_group_wipers[] = {
- {.attrs = ad525x_attributes_wipers[AD525X_RDAC0]},
- {.attrs = ad525x_attributes_wipers[AD525X_RDAC1]},
- {.attrs = ad525x_attributes_wipers[AD525X_RDAC2]},
- {.attrs = ad525x_attributes_wipers[AD525X_RDAC3]},
+static const struct attribute *dpot_attrib_tolerance[] = {
+ &dev_attr_tolerance0.attr,
+ &dev_attr_tolerance1.attr,
+ &dev_attr_tolerance2.attr,
+ &dev_attr_tolerance3.attr,
+ &dev_attr_tolerance4.attr,
+ &dev_attr_tolerance5.attr,
+ NULL
};
/* ------------------------------------------------------------------------- */
-static ssize_t set_inc_all(struct device *dev,
- struct device_attribute *attr,
- const char *buf, size_t count)
-{
- return sysfs_do_cmd(dev, attr, buf, count, AD525X_INC_ALL);
-}
+#define DPOT_DEVICE_DO_CMD(_name, _cmd) static ssize_t \
+set_##_name(struct device *dev, \
+ struct device_attribute *attr, \
+ const char *buf, size_t count) \
+{ \
+ return sysfs_do_cmd(dev, attr, buf, count, _cmd); \
+} \
+static DEVICE_ATTR(_name, S_IWUSR, NULL, set_##_name);
-static DEVICE_ATTR(inc_all, S_IWUSR, NULL, set_inc_all);
-
-static ssize_t set_dec_all(struct device *dev,
- struct device_attribute *attr,
- const char *buf, size_t count)
-{
- return sysfs_do_cmd(dev, attr, buf, count, AD525X_DEC_ALL);
-}
-
-static DEVICE_ATTR(dec_all, S_IWUSR, NULL, set_dec_all);
-
-static ssize_t set_inc_all_6db(struct device *dev,
- struct device_attribute *attr,
- const char *buf, size_t count)
-{
- return sysfs_do_cmd(dev, attr, buf, count, AD525X_INC_ALL_6DB);
-}
-
-static DEVICE_ATTR(inc_all_6db, S_IWUSR, NULL, set_inc_all_6db);
-
-static ssize_t set_dec_all_6db(struct device *dev,
- struct device_attribute *attr,
- const char *buf, size_t count)
-{
- return sysfs_do_cmd(dev, attr, buf, count, AD525X_DEC_ALL_6DB);
-}
-
-static DEVICE_ATTR(dec_all_6db, S_IWUSR, NULL, set_dec_all_6db);
+DPOT_DEVICE_DO_CMD(inc_all, DPOT_INC_ALL);
+DPOT_DEVICE_DO_CMD(dec_all, DPOT_DEC_ALL);
+DPOT_DEVICE_DO_CMD(inc_all_6db, DPOT_INC_ALL_6DB);
+DPOT_DEVICE_DO_CMD(dec_all_6db, DPOT_DEC_ALL_6DB);
static struct attribute *ad525x_attributes_commands[] = {
&dev_attr_inc_all.attr,
.attrs = ad525x_attributes_commands,
};
-/* ------------------------------------------------------------------------- */
-
-/* i2c device functions */
+__devinit int ad_dpot_add_files(struct device *dev,
+ unsigned features, unsigned rdac)
+{
+ int err = sysfs_create_file(&dev->kobj,
+ dpot_attrib_wipers[rdac]);
+ if (features & F_CMD_EEP)
+ err |= sysfs_create_file(&dev->kobj,
+ dpot_attrib_eeprom[rdac]);
+ if (features & F_CMD_TOL)
+ err |= sysfs_create_file(&dev->kobj,
+ dpot_attrib_tolerance[rdac]);
+ if (features & F_CMD_OTP) {
+ err |= sysfs_create_file(&dev->kobj,
+ dpot_attrib_otp_en[rdac]);
+ err |= sysfs_create_file(&dev->kobj,
+ dpot_attrib_otp[rdac]);
+ }
-/**
- * ad525x_read - return the value contained in the specified register
- * on the AD5258 device.
- * @client: value returned from i2c_new_device()
- * @reg: the register to read
- *
- * If the tolerance register is specified, 2 bytes are returned.
- * Otherwise, 1 byte is returned. A negative value indicates an error
- * occurred while reading the register.
- */
-static s32 ad525x_read(struct i2c_client *client, u8 reg)
-{
- struct dpot_data *data = i2c_get_clientdata(client);
+ if (err)
+ dev_err(dev, "failed to register sysfs hooks for RDAC%d\n",
+ rdac);
- if ((reg & AD525X_REG_TOL) || (data->max_pos > 256))
- return i2c_smbus_read_word_data(client, (reg & 0xF8) |
- ((reg & 0x7) << 1));
- else
- return i2c_smbus_read_byte_data(client, reg);
+ return err;
}
-/**
- * ad525x_write - store the given value in the specified register on
- * the AD5258 device.
- * @client: value returned from i2c_new_device()
- * @reg: the register to write
- * @value: the byte to store in the register
- *
- * For certain instructions that do not require a data byte, "NULL"
- * should be specified for the "value" parameter. These instructions
- * include NOP, RESTORE_FROM_EEPROM, and STORE_TO_EEPROM.
- *
- * A negative return value indicates an error occurred while reading
- * the register.
- */
-static s32 ad525x_write(struct i2c_client *client, u8 reg, u8 value)
-{
- struct dpot_data *data = i2c_get_clientdata(client);
-
- /* Only write the instruction byte for certain commands */
- if (reg & AD525X_I2C_CMD)
- return i2c_smbus_write_byte(client, reg);
-
- if (data->max_pos > 256)
- return i2c_smbus_write_word_data(client, (reg & 0xF8) |
- ((reg & 0x7) << 1), value);
- else
- /* All other registers require instruction + data bytes */
- return i2c_smbus_write_byte_data(client, reg, value);
+inline void ad_dpot_remove_files(struct device *dev,
+ unsigned features, unsigned rdac)
+{
+ sysfs_remove_file(&dev->kobj,
+ dpot_attrib_wipers[rdac]);
+ if (features & F_CMD_EEP)
+ sysfs_remove_file(&dev->kobj,
+ dpot_attrib_eeprom[rdac]);
+ if (features & F_CMD_TOL)
+ sysfs_remove_file(&dev->kobj,
+ dpot_attrib_tolerance[rdac]);
+ if (features & F_CMD_OTP) {
+ sysfs_remove_file(&dev->kobj,
+ dpot_attrib_otp_en[rdac]);
+ sysfs_remove_file(&dev->kobj,
+ dpot_attrib_otp[rdac]);
+ }
}
-static int ad525x_probe(struct i2c_client *client,
- const struct i2c_device_id *id)
+__devinit int ad_dpot_probe(struct device *dev,
+ struct ad_dpot_bus_data *bdata, const struct ad_dpot_id *id)
{
- struct device *dev = &client->dev;
- struct dpot_data *data;
- int err = 0;
- dev_dbg(dev, "%s\n", __func__);
-
- if (!i2c_check_functionality(client->adapter, I2C_FUNC_SMBUS_BYTE)) {
- dev_err(dev, "missing I2C functionality for this driver\n");
- goto exit;
- }
+ struct dpot_data *data;
+ int i, err = 0;
data = kzalloc(sizeof(struct dpot_data), GFP_KERNEL);
if (!data) {
goto exit;
}
- i2c_set_clientdata(client, data);
+ dev_set_drvdata(dev, data);
mutex_init(&data->update_lock);
- switch (id->driver_data) {
- case AD5258_ID:
- data->max_pos = AD5258_MAX_POSITION;
- err = sysfs_create_group(&dev->kobj,
- &ad525x_group_wipers[AD525X_RDAC0]);
- break;
- case AD5259_ID:
- data->max_pos = AD5259_MAX_POSITION;
- err = sysfs_create_group(&dev->kobj,
- &ad525x_group_wipers[AD525X_RDAC0]);
- break;
- case AD5251_ID:
- data->max_pos = AD5251_MAX_POSITION;
- err = sysfs_create_group(&dev->kobj,
- &ad525x_group_wipers[AD525X_RDAC1]);
- err |= sysfs_create_group(&dev->kobj,
- &ad525x_group_wipers[AD525X_RDAC3]);
- err |= sysfs_create_group(&dev->kobj, &ad525x_group_commands);
- break;
- case AD5252_ID:
- data->max_pos = AD5252_MAX_POSITION;
- err = sysfs_create_group(&dev->kobj,
- &ad525x_group_wipers[AD525X_RDAC1]);
- err |= sysfs_create_group(&dev->kobj,
- &ad525x_group_wipers[AD525X_RDAC3]);
- err |= sysfs_create_group(&dev->kobj, &ad525x_group_commands);
- break;
- case AD5253_ID:
- data->max_pos = AD5253_MAX_POSITION;
- err = sysfs_create_group(&dev->kobj,
- &ad525x_group_wipers[AD525X_RDAC0]);
- err |= sysfs_create_group(&dev->kobj,
- &ad525x_group_wipers[AD525X_RDAC1]);
- err |= sysfs_create_group(&dev->kobj,
- &ad525x_group_wipers[AD525X_RDAC2]);
- err |= sysfs_create_group(&dev->kobj,
- &ad525x_group_wipers[AD525X_RDAC3]);
- err |= sysfs_create_group(&dev->kobj, &ad525x_group_commands);
- break;
- case AD5254_ID:
- data->max_pos = AD5254_MAX_POSITION;
- err = sysfs_create_group(&dev->kobj,
- &ad525x_group_wipers[AD525X_RDAC0]);
- err |= sysfs_create_group(&dev->kobj,
- &ad525x_group_wipers[AD525X_RDAC1]);
- err |= sysfs_create_group(&dev->kobj,
- &ad525x_group_wipers[AD525X_RDAC2]);
- err |= sysfs_create_group(&dev->kobj,
- &ad525x_group_wipers[AD525X_RDAC3]);
- err |= sysfs_create_group(&dev->kobj, &ad525x_group_commands);
- break;
- case AD5255_ID:
- data->max_pos = AD5255_MAX_POSITION;
- err = sysfs_create_group(&dev->kobj,
- &ad525x_group_wipers[AD525X_RDAC0]);
- err |= sysfs_create_group(&dev->kobj,
- &ad525x_group_wipers[AD525X_RDAC1]);
- err |= sysfs_create_group(&dev->kobj,
- &ad525x_group_wipers[AD525X_RDAC2]);
- err |= sysfs_create_group(&dev->kobj, &ad525x_group_commands);
- break;
- default:
- err = -ENODEV;
- goto exit_free;
- }
+ data->bdata = *bdata;
+ data->devid = id->devid;
+
+ data->max_pos = 1 << DPOT_MAX_POS(data->devid);
+ data->rdac_mask = data->max_pos - 1;
+ data->feat = DPOT_FEAT(data->devid);
+ data->uid = DPOT_UID(data->devid);
+ data->wipers = DPOT_WIPERS(data->devid);
+
+ for (i = DPOT_RDAC0; i < MAX_RDACS; i++)
+ if (data->wipers & (1 << i)) {
+ err = ad_dpot_add_files(dev, data->feat, i);
+ if (err)
+ goto exit_remove_files;
+ /* power-up midscale */
+ if (data->feat & F_RDACS_WONLY)
+ data->rdac_cache[i] = data->max_pos / 2;
+ }
+
+ if (data->feat & F_CMD_INC)
+ err = sysfs_create_group(&dev->kobj, &ad525x_group_commands);
if (err) {
dev_err(dev, "failed to register sysfs hooks\n");
goto exit_free;
}
- data->devid = id->driver_data;
- data->rdac_mask = data->max_pos - 1;
-
dev_info(dev, "%s %d-Position Digital Potentiometer registered\n",
id->name, data->max_pos);
return 0;
+exit_remove_files:
+ for (i = DPOT_RDAC0; i < MAX_RDACS; i++)
+ if (data->wipers & (1 << i))
+ ad_dpot_remove_files(dev, data->feat, i);
+
exit_free:
kfree(data);
- i2c_set_clientdata(client, NULL);
+ dev_set_drvdata(dev, NULL);
exit:
- dev_err(dev, "failed to create client\n");
+ dev_err(dev, "failed to create client for %s ID 0x%lX\n",
+ id->name, id->devid);
return err;
}
+EXPORT_SYMBOL(ad_dpot_probe);
-static int __devexit ad525x_remove(struct i2c_client *client)
+__devexit int ad_dpot_remove(struct device *dev)
{
- struct dpot_data *data = i2c_get_clientdata(client);
- struct device *dev = &client->dev;
-
- switch (data->devid) {
- case AD5258_ID:
- case AD5259_ID:
- sysfs_remove_group(&dev->kobj,
- &ad525x_group_wipers[AD525X_RDAC0]);
- break;
- case AD5251_ID:
- case AD5252_ID:
- sysfs_remove_group(&dev->kobj,
- &ad525x_group_wipers[AD525X_RDAC1]);
- sysfs_remove_group(&dev->kobj,
- &ad525x_group_wipers[AD525X_RDAC3]);
- sysfs_remove_group(&dev->kobj, &ad525x_group_commands);
- break;
- case AD5253_ID:
- case AD5254_ID:
- sysfs_remove_group(&dev->kobj,
- &ad525x_group_wipers[AD525X_RDAC0]);
- sysfs_remove_group(&dev->kobj,
- &ad525x_group_wipers[AD525X_RDAC1]);
- sysfs_remove_group(&dev->kobj,
- &ad525x_group_wipers[AD525X_RDAC2]);
- sysfs_remove_group(&dev->kobj,
- &ad525x_group_wipers[AD525X_RDAC3]);
- sysfs_remove_group(&dev->kobj, &ad525x_group_commands);
- break;
- case AD5255_ID:
- sysfs_remove_group(&dev->kobj,
- &ad525x_group_wipers[AD525X_RDAC0]);
- sysfs_remove_group(&dev->kobj,
- &ad525x_group_wipers[AD525X_RDAC1]);
- sysfs_remove_group(&dev->kobj,
- &ad525x_group_wipers[AD525X_RDAC2]);
- sysfs_remove_group(&dev->kobj, &ad525x_group_commands);
- break;
- }
+ struct dpot_data *data = dev_get_drvdata(dev);
+ int i;
+
+ for (i = DPOT_RDAC0; i < MAX_RDACS; i++)
+ if (data->wipers & (1 << i))
+ ad_dpot_remove_files(dev, data->feat, i);
+
+ if (data->feat & F_CMD_INC)
+ sysfs_remove_group(&dev->kobj, &ad525x_group_commands);
- i2c_set_clientdata(client, NULL);
kfree(data);
return 0;
}
+EXPORT_SYMBOL(ad_dpot_remove);
-static const struct i2c_device_id ad525x_idtable[] = {
- {"ad5258", AD5258_ID},
- {"ad5259", AD5259_ID},
- {"ad5251", AD5251_ID},
- {"ad5252", AD5252_ID},
- {"ad5253", AD5253_ID},
- {"ad5254", AD5254_ID},
- {"ad5255", AD5255_ID},
- {}
-};
-
-MODULE_DEVICE_TABLE(i2c, ad525x_idtable);
-
-static struct i2c_driver ad525x_driver = {
- .driver = {
- .owner = THIS_MODULE,
- .name = DRIVER_NAME,
- },
- .id_table = ad525x_idtable,
- .probe = ad525x_probe,
- .remove = __devexit_p(ad525x_remove),
-};
-
-static int __init ad525x_init(void)
-{
- return i2c_add_driver(&ad525x_driver);
-}
-
-module_init(ad525x_init);
-
-static void __exit ad525x_exit(void)
-{
- i2c_del_driver(&ad525x_driver);
-}
-
-module_exit(ad525x_exit);
MODULE_AUTHOR("Chris Verges <chrisv@cyberswitching.com>, "
- "Michael Hennerich <hennerich@blackfin.uclinux.org>, ");
-MODULE_DESCRIPTION("AD5258/9 digital potentiometer driver");
+ "Michael Hennerich <hennerich@blackfin.uclinux.org>");
+MODULE_DESCRIPTION("Digital potentiometer driver");
MODULE_LICENSE("GPL");
MODULE_VERSION(DRIVER_VERSION);
--- /dev/null
+/*
+ * Driver for the Analog Devices digital potentiometers
+ *
+ * Copyright (C) 2010 Michael Hennerich, Analog Devices Inc.
+ *
+ * Licensed under the GPL-2 or later.
+ */
+
+#ifndef _AD_DPOT_H_
+#define _AD_DPOT_H_
+
+#include <linux/types.h>
+
+#define DPOT_CONF(features, wipers, max_pos, uid) \
+ (((features) << 18) | (((wipers) & 0xFF) << 10) | \
+ (((max_pos) & 0xF) << 6) | ((uid) & 0x3F))
+
+#define DPOT_UID(conf) ((conf) & 0x3F)
+#define DPOT_MAX_POS(conf) (((conf) >> 6) & 0xF)
+#define DPOT_WIPERS(conf) (((conf) >> 10) & 0xFF)
+#define DPOT_FEAT(conf) ((conf) >> 18)
+
+#define BRDAC0 (1 << 0)
+#define BRDAC1 (1 << 1)
+#define BRDAC2 (1 << 2)
+#define BRDAC3 (1 << 3)
+#define BRDAC4 (1 << 4)
+#define BRDAC5 (1 << 5)
+#define MAX_RDACS 6
+
+#define F_CMD_INC (1 << 0) /* Features INC/DEC ALL, 6dB */
+#define F_CMD_EEP (1 << 1) /* Features EEPROM */
+#define F_CMD_OTP (1 << 2) /* Features OTP */
+#define F_CMD_TOL (1 << 3) /* RDACS feature Tolerance REG */
+#define F_RDACS_RW (1 << 4) /* RDACS are Read/Write */
+#define F_RDACS_WONLY (1 << 5) /* RDACS are Write only */
+#define F_AD_APPDATA (1 << 6) /* RDAC Address append to data */
+#define F_SPI_8BIT (1 << 7) /* All SPI XFERS are 8-bit */
+#define F_SPI_16BIT (1 << 8) /* All SPI XFERS are 16-bit */
+#define F_SPI_24BIT (1 << 9) /* All SPI XFERS are 24-bit */
+
+#define F_RDACS_RW_TOL (F_RDACS_RW | F_CMD_EEP | F_CMD_TOL)
+#define F_RDACS_RW_EEP (F_RDACS_RW | F_CMD_EEP)
+#define F_SPI (F_SPI_8BIT | F_SPI_16BIT | F_SPI_24BIT)
+
+enum dpot_devid {
+ AD5258_ID = DPOT_CONF(F_RDACS_RW_TOL, BRDAC0, 6, 0), /* I2C */
+ AD5259_ID = DPOT_CONF(F_RDACS_RW_TOL, BRDAC0, 8, 1),
+ AD5251_ID = DPOT_CONF(F_RDACS_RW_TOL | F_CMD_INC,
+ BRDAC0 | BRDAC3, 6, 2),
+ AD5252_ID = DPOT_CONF(F_RDACS_RW_TOL | F_CMD_INC,
+ BRDAC0 | BRDAC3, 8, 3),
+ AD5253_ID = DPOT_CONF(F_RDACS_RW_TOL | F_CMD_INC,
+ BRDAC0 | BRDAC1 | BRDAC2 | BRDAC3, 6, 4),
+ AD5254_ID = DPOT_CONF(F_RDACS_RW_TOL | F_CMD_INC,
+ BRDAC0 | BRDAC1 | BRDAC2 | BRDAC3, 8, 5),
+ AD5255_ID = DPOT_CONF(F_RDACS_RW_TOL | F_CMD_INC,
+ BRDAC0 | BRDAC1 | BRDAC2, 9, 6),
+ AD5160_ID = DPOT_CONF(F_RDACS_WONLY | F_AD_APPDATA | F_SPI_8BIT,
+ BRDAC0, 8, 7), /* SPI */
+ AD5161_ID = DPOT_CONF(F_RDACS_WONLY | F_AD_APPDATA | F_SPI_8BIT,
+ BRDAC0, 8, 8),
+ AD5162_ID = DPOT_CONF(F_RDACS_WONLY | F_AD_APPDATA | F_SPI_16BIT,
+ BRDAC0 | BRDAC1, 8, 9),
+ AD5165_ID = DPOT_CONF(F_RDACS_WONLY | F_AD_APPDATA | F_SPI_8BIT,
+ BRDAC0, 8, 10),
+ AD5200_ID = DPOT_CONF(F_RDACS_WONLY | F_AD_APPDATA | F_SPI_8BIT,
+ BRDAC0, 8, 11),
+ AD5201_ID = DPOT_CONF(F_RDACS_WONLY | F_AD_APPDATA | F_SPI_8BIT,
+ BRDAC0, 5, 12),
+ AD5203_ID = DPOT_CONF(F_RDACS_WONLY | F_AD_APPDATA | F_SPI_8BIT,
+ BRDAC0 | BRDAC1 | BRDAC2 | BRDAC3, 6, 13),
+ AD5204_ID = DPOT_CONF(F_RDACS_WONLY | F_AD_APPDATA | F_SPI_16BIT,
+ BRDAC0 | BRDAC1 | BRDAC2 | BRDAC3, 8, 14),
+ AD5206_ID = DPOT_CONF(F_RDACS_WONLY | F_AD_APPDATA | F_SPI_16BIT,
+ BRDAC0 | BRDAC1 | BRDAC2 | BRDAC3 | BRDAC4 | BRDAC5,
+ 8, 15),
+ AD5207_ID = DPOT_CONF(F_RDACS_WONLY | F_AD_APPDATA | F_SPI_16BIT,
+ BRDAC0 | BRDAC1, 8, 16),
+ AD5231_ID = DPOT_CONF(F_RDACS_RW_EEP | F_CMD_INC | F_SPI_24BIT,
+ BRDAC0, 10, 17),
+ AD5232_ID = DPOT_CONF(F_RDACS_RW_EEP | F_CMD_INC | F_SPI_16BIT,
+ BRDAC0 | BRDAC1, 8, 18),
+ AD5233_ID = DPOT_CONF(F_RDACS_RW_EEP | F_CMD_INC | F_SPI_16BIT,
+ BRDAC0 | BRDAC1 | BRDAC2 | BRDAC3, 6, 19),
+ AD5235_ID = DPOT_CONF(F_RDACS_RW_EEP | F_CMD_INC | F_SPI_24BIT,
+ BRDAC0 | BRDAC1, 10, 20),
+ AD5260_ID = DPOT_CONF(F_RDACS_WONLY | F_AD_APPDATA | F_SPI_8BIT,
+ BRDAC0, 8, 21),
+ AD5262_ID = DPOT_CONF(F_RDACS_WONLY | F_AD_APPDATA | F_SPI_16BIT,
+ BRDAC0 | BRDAC1, 8, 22),
+ AD5263_ID = DPOT_CONF(F_RDACS_WONLY | F_AD_APPDATA | F_SPI_16BIT,
+ BRDAC0 | BRDAC1 | BRDAC2 | BRDAC3, 8, 23),
+ AD5290_ID = DPOT_CONF(F_RDACS_WONLY | F_AD_APPDATA | F_SPI_8BIT,
+ BRDAC0, 8, 24),
+ AD5291_ID = DPOT_CONF(F_RDACS_RW | F_SPI_16BIT, BRDAC0, 8, 25),
+ AD5292_ID = DPOT_CONF(F_RDACS_RW | F_SPI_16BIT, BRDAC0, 10, 26),
+ AD5293_ID = DPOT_CONF(F_RDACS_RW | F_SPI_16BIT, BRDAC0, 10, 27),
+ AD7376_ID = DPOT_CONF(F_RDACS_WONLY | F_AD_APPDATA | F_SPI_8BIT,
+ BRDAC0, 7, 28),
+ AD8400_ID = DPOT_CONF(F_RDACS_WONLY | F_AD_APPDATA | F_SPI_8BIT,
+ BRDAC0, 8, 29),
+ AD8402_ID = DPOT_CONF(F_RDACS_WONLY | F_AD_APPDATA | F_SPI_16BIT,
+ BRDAC0 | BRDAC1, 8, 30),
+ AD8403_ID = DPOT_CONF(F_RDACS_WONLY | F_AD_APPDATA | F_SPI_16BIT,
+ BRDAC0 | BRDAC1 | BRDAC2, 8, 31),
+ ADN2850_ID = DPOT_CONF(F_RDACS_RW_EEP | F_CMD_INC | F_SPI_24BIT,
+ BRDAC0 | BRDAC1, 10, 32),
+ AD5241_ID = DPOT_CONF(F_RDACS_RW, BRDAC0, 8, 33),
+ AD5242_ID = DPOT_CONF(F_RDACS_RW, BRDAC0 | BRDAC1, 8, 34),
+ AD5243_ID = DPOT_CONF(F_RDACS_RW, BRDAC0 | BRDAC1, 8, 35),
+ AD5245_ID = DPOT_CONF(F_RDACS_RW, BRDAC0, 8, 36),
+ AD5246_ID = DPOT_CONF(F_RDACS_RW, BRDAC0, 7, 37),
+ AD5247_ID = DPOT_CONF(F_RDACS_RW, BRDAC0, 7, 38),
+ AD5248_ID = DPOT_CONF(F_RDACS_RW, BRDAC0 | BRDAC1, 8, 39),
+ AD5280_ID = DPOT_CONF(F_RDACS_RW, BRDAC0, 8, 40),
+ AD5282_ID = DPOT_CONF(F_RDACS_RW, BRDAC0 | BRDAC1, 8, 41),
+ ADN2860_ID = DPOT_CONF(F_RDACS_RW_TOL | F_CMD_INC,
+ BRDAC0 | BRDAC1 | BRDAC2, 9, 42),
+ AD5273_ID = DPOT_CONF(F_RDACS_RW | F_CMD_OTP, BRDAC0, 6, 43),
+ AD5171_ID = DPOT_CONF(F_RDACS_RW | F_CMD_OTP, BRDAC0, 6, 44),
+ AD5170_ID = DPOT_CONF(F_RDACS_RW | F_CMD_OTP, BRDAC0, 8, 45),
+ AD5172_ID = DPOT_CONF(F_RDACS_RW | F_CMD_OTP, BRDAC0 | BRDAC1, 8, 46),
+ AD5173_ID = DPOT_CONF(F_RDACS_RW | F_CMD_OTP, BRDAC0 | BRDAC1, 8, 47),
+};
+
+#define DPOT_RDAC0 0
+#define DPOT_RDAC1 1
+#define DPOT_RDAC2 2
+#define DPOT_RDAC3 3
+#define DPOT_RDAC4 4
+#define DPOT_RDAC5 5
+
+#define DPOT_RDAC_MASK 0x1F
+
+#define DPOT_REG_TOL 0x18
+#define DPOT_TOL_RDAC0 (DPOT_REG_TOL | DPOT_RDAC0)
+#define DPOT_TOL_RDAC1 (DPOT_REG_TOL | DPOT_RDAC1)
+#define DPOT_TOL_RDAC2 (DPOT_REG_TOL | DPOT_RDAC2)
+#define DPOT_TOL_RDAC3 (DPOT_REG_TOL | DPOT_RDAC3)
+#define DPOT_TOL_RDAC4 (DPOT_REG_TOL | DPOT_RDAC4)
+#define DPOT_TOL_RDAC5 (DPOT_REG_TOL | DPOT_RDAC5)
+
+/* RDAC-to-EEPROM Interface Commands */
+#define DPOT_ADDR_RDAC (0x0 << 5)
+#define DPOT_ADDR_EEPROM (0x1 << 5)
+#define DPOT_ADDR_OTP (0x1 << 6)
+#define DPOT_ADDR_CMD (0x1 << 7)
+#define DPOT_ADDR_OTP_EN (0x1 << 9)
+
+#define DPOT_DEC_ALL_6DB (DPOT_ADDR_CMD | (0x4 << 3))
+#define DPOT_INC_ALL_6DB (DPOT_ADDR_CMD | (0x9 << 3))
+#define DPOT_DEC_ALL (DPOT_ADDR_CMD | (0x6 << 3))
+#define DPOT_INC_ALL (DPOT_ADDR_CMD | (0xB << 3))
+
+#define DPOT_SPI_RDAC 0xB0
+#define DPOT_SPI_EEPROM 0x30
+#define DPOT_SPI_READ_RDAC 0xA0
+#define DPOT_SPI_READ_EEPROM 0x90
+#define DPOT_SPI_DEC_ALL_6DB 0x50
+#define DPOT_SPI_INC_ALL_6DB 0xD0
+#define DPOT_SPI_DEC_ALL 0x70
+#define DPOT_SPI_INC_ALL 0xF0
+
+/* AD5291/2/3 use special commands */
+#define DPOT_AD5291_RDAC 0x01
+#define DPOT_AD5291_READ_RDAC 0x02
+
+/* AD524x use special commands */
+#define DPOT_AD5291_RDAC_AB 0x80
+
+#define DPOT_AD5273_FUSE 0x80
+#define DPOT_AD5270_2_3_FUSE 0x20
+#define DPOT_AD5270_2_3_OW 0x08
+#define DPOT_AD5272_3_A0 0x08
+#define DPOT_AD5270_2FUSE 0x80
+
+struct dpot_data;
+
+struct ad_dpot_bus_ops {
+ int (*read_d8) (void *client);
+ int (*read_r8d8) (void *client, u8 reg);
+ int (*read_r8d16) (void *client, u8 reg);
+ int (*write_d8) (void *client, u8 val);
+ int (*write_r8d8) (void *client, u8 reg, u8 val);
+ int (*write_r8d16) (void *client, u8 reg, u16 val);
+};
+
+struct ad_dpot_bus_data {
+ void *client;
+ const struct ad_dpot_bus_ops *bops;
+};
+
+struct ad_dpot_id {
+ char *name;
+ unsigned long devid;
+};
+
+int ad_dpot_probe(struct device *dev, struct ad_dpot_bus_data *bdata, const struct ad_dpot_id *id);
+int ad_dpot_remove(struct device *dev);
+
+#endif
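
The device IDs in the header above pack four fields into one integer through DPOT_CONF(). The following is a minimal userspace sketch of that packing and unpacking, assuming the bit layout shown in the header. The macros are copied here only so the example compiles on its own, and the helper names example_ad5255_conf() and dpot_positions() are hypothetical, not part of the driver.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Same bit layout as the DPOT_CONF() family in the header, with every
 * macro argument parenthesized. Fields, least significant bit first:
 *   bits 0-5    uid      (6 bits, unique per part)
 *   bits 6-9    max_pos  (4 bits, log2 of the number of wiper positions)
 *   bits 10-17  wipers   (8-bit mask of populated RDACs)
 *   bits 18+    features (F_* flags)
 */
#define DPOT_CONF(features, wipers, max_pos, uid) \
	(((features) << 18) | (((wipers) & 0xFF) << 10) | \
	 (((max_pos) & 0xF) << 6) | ((uid) & 0x3F))

#define DPOT_UID(conf)		((conf) & 0x3F)
#define DPOT_MAX_POS(conf)	(((conf) >> 6) & 0xF)
#define DPOT_WIPERS(conf)	(((conf) >> 10) & 0xFF)
#define DPOT_FEAT(conf)		((conf) >> 18)

/* Subset of the flag and wiper bits from the header. */
#define BRDAC0		(1 << 0)
#define BRDAC1		(1 << 1)
#define BRDAC2		(1 << 2)
#define F_CMD_INC	(1 << 0)
#define F_CMD_EEP	(1 << 1)
#define F_CMD_TOL	(1 << 3)
#define F_RDACS_RW	(1 << 4)
#define F_RDACS_RW_TOL	(F_RDACS_RW | F_CMD_EEP | F_CMD_TOL)

/* Hypothetical helper: rebuild the AD5255_ID value from the enum. */
static unsigned long example_ad5255_conf(void)
{
	return DPOT_CONF(F_RDACS_RW_TOL | F_CMD_INC,
			 BRDAC0 | BRDAC1 | BRDAC2, 9, 6);
}

/* Hypothetical helper: the probe routine computes max_pos this way. */
static unsigned int dpot_positions(unsigned long conf)
{
	return 1u << DPOT_MAX_POS(conf);
}

/* Print each decoded field of a packed configuration word. */
static void dpot_conf_dump(unsigned long conf)
{
	printf("uid=%lu positions=%u wipers=0x%lx features=0x%lx\n",
	       DPOT_UID(conf), dpot_positions(conf),
	       DPOT_WIPERS(conf), DPOT_FEAT(conf));
}
```

Calling dpot_conf_dump(example_ad5255_conf()) from a test program decodes the four fields again. The probe routine relies on the same decoding to decide which per-wiper sysfs files to create.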
enable_MAC(ai, 1);
}
-static inline u8 hexVal(char c) {
- if (c>='0' && c<='9') return c -= '0';
- if (c>='a' && c<='f') return c -= 'a'-10;
- if (c>='A' && c<='F') return c -= 'A'-10;
- return 0;
-}
-
static void proc_APList_on_close( struct inode *inode, struct file *file ) {
struct proc_data *data = (struct proc_data *)file->private_data;
struct proc_dir_entry *dp = PDE(inode);
switch(j%3) {
case 0:
APList_rid.ap[i][j/3]=
- hexVal(data->wbuffer[j+i*6*3])<<4;
+ hex_to_bin(data->wbuffer[j+i*6*3])<<4;
break;
case 1:
APList_rid.ap[i][j/3]|=
- hexVal(data->wbuffer[j+i*6*3]);
+ hex_to_bin(data->wbuffer[j+i*6*3]);
break;
}
}
for( i = 0; i < 16*3 && data->wbuffer[i+j]; i++ ) {
switch(i%3) {
case 0:
- key[i/3] = hexVal(data->wbuffer[i+j])<<4;
+ key[i/3] = hex_to_bin(data->wbuffer[i+j])<<4;
break;
case 1:
- key[i/3] |= hexVal(data->wbuffer[i+j]);
+ key[i/3] |= hex_to_bin(data->wbuffer[i+j]);
break;
}
}
{
struct device *dev = container_of(kobj, struct device, kobj);
struct power_supply *psy = dev_get_drvdata(dev);
+ mode_t mode = S_IRUSR | S_IRGRP | S_IROTH;
int i;
+ if (attrno == POWER_SUPPLY_PROP_TYPE)
+ return mode;
+
for (i = 0; i < psy->num_properties; i++) {
int property = psy->properties[i];
if (property == attrno) {
- mode_t mode = S_IRUSR | S_IRGRP | S_IROTH;
-
if (psy->property_is_writeable &&
psy->property_is_writeable(psy, property) > 0)
mode |= S_IWUSR;
config RTC_DRV_S3C
tristate "Samsung S3C series SoC RTC"
- depends on ARCH_S3C2410
+ depends on ARCH_S3C2410 || ARCH_S3C64XX
help
RTC (Realtime Clock) driver for the clock inbuilt into the
Samsung S3C24XX series of SoCs. This can provide periodic
}
}
+ cmos_rtc.dev = dev;
+ dev_set_drvdata(dev, &cmos_rtc);
+
cmos_rtc.rtc = rtc_device_register(driver_name, dev,
&cmos_rtc_ops, THIS_MODULE);
if (IS_ERR(cmos_rtc.rtc)) {
goto cleanup0;
}
- cmos_rtc.dev = dev;
- dev_set_drvdata(dev, &cmos_rtc);
rename_region(ports, dev_name(&cmos_rtc.rtc->dev));
spin_lock_irq(&rtc_lock);
#include <linux/rtc.h>
#include <linux/io.h>
#include <linux/bcd.h>
-#include <asm/rtc.h>
#define DRV_NAME "rtc-ds1302"
#define DRV_VERSION "0.1.1"
#define RTC_ADDR_MIN 0x01 /* Address of minute register */
#define RTC_ADDR_SEC 0x00 /* Address of second register */
+#ifdef CONFIG_SH_SECUREEDGE5410
+#include <asm/rtc.h>
+#include <mach/snapgear.h>
+
#define RTC_RESET 0x1000
#define RTC_IODATA 0x0800
#define RTC_SCLK 0x0400
-#ifdef CONFIG_SH_SECUREEDGE5410
-#include <mach/snapgear.h>
#define set_dp(x) SECUREEDGE_WRITE_IOPORT(x, 0x1c00)
#define get_dp() SECUREEDGE_READ_IOPORT()
+#define ds1302_set_tx()
+#define ds1302_set_rx()
+
+static inline int ds1302_hw_init(void)
+{
+ return 0;
+}
+
+static inline void ds1302_reset(void)
+{
+ set_dp(get_dp() & ~(RTC_RESET | RTC_IODATA | RTC_SCLK));
+}
+
+static inline void ds1302_clock(void)
+{
+ set_dp(get_dp() | RTC_SCLK); /* clock high */
+ set_dp(get_dp() & ~RTC_SCLK); /* clock low */
+}
+
+static inline void ds1302_start(void)
+{
+ set_dp(get_dp() | RTC_RESET);
+}
+
+static inline void ds1302_stop(void)
+{
+ set_dp(get_dp() & ~RTC_RESET);
+}
+
+static inline void ds1302_txbit(int bit)
+{
+ set_dp((get_dp() & ~RTC_IODATA) | (bit ? RTC_IODATA : 0));
+}
+
+static inline int ds1302_rxbit(void)
+{
+ return !!(get_dp() & RTC_IODATA);
+}
+
#else
#error "Add support for your platform"
#endif
{
int i;
+ ds1302_set_tx();
+
for (i = 8; (i); i--, val >>= 1) {
- set_dp((get_dp() & ~RTC_IODATA) | ((val & 0x1) ?
- RTC_IODATA : 0));
- set_dp(get_dp() | RTC_SCLK); /* clock high */
- set_dp(get_dp() & ~RTC_SCLK); /* clock low */
+ ds1302_txbit(val & 0x1);
+ ds1302_clock();
}
}
unsigned int val;
int i;
+ ds1302_set_rx();
+
for (i = 0, val = 0; (i < 8); i++) {
- val |= (((get_dp() & RTC_IODATA) ? 1 : 0) << i);
- set_dp(get_dp() | RTC_SCLK); /* clock high */
- set_dp(get_dp() & ~RTC_SCLK); /* clock low */
+ val |= (ds1302_rxbit() << i);
+ ds1302_clock();
}
return val;
{
unsigned int val;
- set_dp(get_dp() & ~(RTC_RESET | RTC_IODATA | RTC_SCLK));
+ ds1302_reset();
- set_dp(get_dp() | RTC_RESET);
+ ds1302_start();
ds1302_sendbits(((addr & 0x3f) << 1) | RTC_CMD_READ);
val = ds1302_recvbits();
- set_dp(get_dp() & ~RTC_RESET);
+ ds1302_stop();
return val;
}
static void ds1302_writebyte(unsigned int addr, unsigned int val)
{
- set_dp(get_dp() & ~(RTC_RESET | RTC_IODATA | RTC_SCLK));
- set_dp(get_dp() | RTC_RESET);
+ ds1302_reset();
+
+ ds1302_start();
ds1302_sendbits(((addr & 0x3f) << 1) | RTC_CMD_WRITE);
ds1302_sendbits(val);
- set_dp(get_dp() & ~RTC_RESET);
+ ds1302_stop();
}
static int ds1302_rtc_read_time(struct device *dev, struct rtc_time *tm)
{
struct rtc_device *rtc;
+ if (ds1302_hw_init()) {
+ dev_err(&pdev->dev, "Failed to init communication channel\n");
+ return -EINVAL;
+ }
+
/* Reset */
- set_dp(get_dp() & ~(RTC_RESET | RTC_IODATA | RTC_SCLK));
+ ds1302_reset();
/* Write a magic value to the DS1302 RAM, and see if it sticks. */
ds1302_writebyte(RTC_ADDR_RAM0, 0x42);
- if (ds1302_readbyte(RTC_ADDR_RAM0) != 0x42)
+ if (ds1302_readbyte(RTC_ADDR_RAM0) != 0x42) {
+ dev_err(&pdev->dev, "Failed to probe\n");
return -ENODEV;
+ }
rtc = rtc_device_register("ds1302", &pdev->dev,
&ds1302_rtc_ops, THIS_MODULE);
static DEVICE_ATTR(usr, S_IRUGO | S_IWUSR, isl1208_sysfs_show_usr,
isl1208_sysfs_store_usr);
-static int
-isl1208_sysfs_register(struct device *dev)
-{
- int err;
-
- err = device_create_file(dev, &dev_attr_atrim);
- if (err)
- return err;
-
- err = device_create_file(dev, &dev_attr_dtrim);
- if (err) {
- device_remove_file(dev, &dev_attr_atrim);
- return err;
- }
-
- err = device_create_file(dev, &dev_attr_usr);
- if (err) {
- device_remove_file(dev, &dev_attr_atrim);
- device_remove_file(dev, &dev_attr_dtrim);
- }
-
- return 0;
-}
-
-static int
-isl1208_sysfs_unregister(struct device *dev)
-{
- device_remove_file(dev, &dev_attr_dtrim);
- device_remove_file(dev, &dev_attr_atrim);
- device_remove_file(dev, &dev_attr_usr);
+static struct attribute *isl1208_rtc_attrs[] = {
+ &dev_attr_atrim.attr,
+ &dev_attr_dtrim.attr,
+ &dev_attr_usr.attr,
+ NULL
+};
- return 0;
-}
+static const struct attribute_group isl1208_rtc_sysfs_files = {
+ .attrs = isl1208_rtc_attrs,
+};
static int
isl1208_probe(struct i2c_client *client, const struct i2c_device_id *id)
dev_warn(&client->dev, "rtc power failure detected, "
"please set clock.\n");
- rc = isl1208_sysfs_register(&client->dev);
+ rc = sysfs_create_group(&client->dev.kobj, &isl1208_rtc_sysfs_files);
if (rc)
goto exit_unregister;
{
struct rtc_device *rtc = i2c_get_clientdata(client);
- isl1208_sysfs_unregister(&client->dev);
+ sysfs_remove_group(&client->dev.kobj, &isl1208_rtc_sysfs_files);
rtc_device_unregister(rtc);
return 0;
static int __init mxc_rtc_probe(struct platform_device *pdev)
{
- struct clk *clk;
struct resource *res;
struct rtc_device *rtc;
struct rtc_plat_data *pdata = NULL;
pdata->ioaddr = devm_ioremap(&pdev->dev, res->start,
resource_size(res));
- clk = clk_get(&pdev->dev, "ckil");
- if (IS_ERR(clk)) {
- ret = PTR_ERR(clk);
+ pdata->clk = clk_get(&pdev->dev, "rtc");
+ if (IS_ERR(pdata->clk)) {
+ dev_err(&pdev->dev, "unable to get clock!\n");
+ ret = PTR_ERR(pdata->clk);
goto exit_free_pdata;
}
- rate = clk_get_rate(clk);
- clk_put(clk);
+ clk_enable(pdata->clk);
+ rate = clk_get_rate(pdata->clk);
if (rate == 32768)
reg = RTC_INPUT_CLK_32768HZ;
else {
dev_err(&pdev->dev, "rtc clock is not valid (%lu)\n", rate);
ret = -EINVAL;
- goto exit_free_pdata;
+ goto exit_put_clk;
}
reg |= RTC_ENABLE_BIT;
if (((readw(pdata->ioaddr + RTC_RTCCTL)) & RTC_ENABLE_BIT) == 0) {
dev_err(&pdev->dev, "hardware module can't be enabled!\n");
ret = -EIO;
- goto exit_free_pdata;
- }
-
- pdata->clk = clk_get(&pdev->dev, "rtc");
- if (IS_ERR(pdata->clk)) {
- dev_err(&pdev->dev, "unable to get clock!\n");
- ret = PTR_ERR(pdata->clk);
- goto exit_free_pdata;
+ goto exit_put_clk;
}
- clk_enable(pdata->clk);
-
rtc = rtc_device_register(pdev->name, &pdev->dev, &mxc_rtc_ops,
THIS_MODULE);
if (IS_ERR(rtc)) {
#include <asm/irq.h>
#include <plat/regs-rtc.h>
+enum s3c_cpu_type {
+ TYPE_S3C2410,
+ TYPE_S3C64XX,
+};
+
/* I have yet to find an S3C implementation with more than one
* of these rtc blocks in */
static void __iomem *s3c_rtc_base;
static int s3c_rtc_alarmno = NO_IRQ;
static int s3c_rtc_tickno = NO_IRQ;
+static enum s3c_cpu_type s3c_rtc_cpu_type;
static DEFINE_SPINLOCK(s3c_rtc_pie_lock);
pr_debug("%s: pie=%d\n", __func__, enabled);
spin_lock_irq(&s3c_rtc_pie_lock);
- tmp = readb(s3c_rtc_base + S3C2410_TICNT) & ~S3C2410_TICNT_ENABLE;
- if (enabled)
- tmp |= S3C2410_TICNT_ENABLE;
+ if (s3c_rtc_cpu_type == TYPE_S3C64XX) {
+ tmp = readb(s3c_rtc_base + S3C2410_RTCCON);
+ tmp &= ~S3C64XX_RTCCON_TICEN;
+
+ if (enabled)
+ tmp |= S3C64XX_RTCCON_TICEN;
+
+ writeb(tmp, s3c_rtc_base + S3C2410_RTCCON);
+ } else {
+ tmp = readb(s3c_rtc_base + S3C2410_TICNT);
+ tmp &= ~S3C2410_TICNT_ENABLE;
+
+ if (enabled)
+ tmp |= S3C2410_TICNT_ENABLE;
+
+ writeb(tmp, s3c_rtc_base + S3C2410_TICNT);
+ }
- writeb(tmp, s3c_rtc_base + S3C2410_TICNT);
spin_unlock_irq(&s3c_rtc_pie_lock);
return 0;
static int s3c_rtc_setfreq(struct device *dev, int freq)
{
- unsigned int tmp;
+ struct platform_device *pdev = to_platform_device(dev);
+ struct rtc_device *rtc_dev = platform_get_drvdata(pdev);
+ unsigned int tmp = 0;
if (!is_power_of_2(freq))
return -EINVAL;
spin_lock_irq(&s3c_rtc_pie_lock);
- tmp = readb(s3c_rtc_base + S3C2410_TICNT) & S3C2410_TICNT_ENABLE;
- tmp |= (128 / freq)-1;
+ if (s3c_rtc_cpu_type == TYPE_S3C2410) {
+ tmp = readb(s3c_rtc_base + S3C2410_TICNT);
+ tmp &= S3C2410_TICNT_ENABLE;
+ }
+
+ tmp |= (rtc_dev->max_user_freq / freq)-1;
writeb(tmp, s3c_rtc_base + S3C2410_TICNT);
spin_unlock_irq(&s3c_rtc_pie_lock);
static int s3c_rtc_proc(struct device *dev, struct seq_file *seq)
{
- unsigned int ticnt = readb(s3c_rtc_base + S3C2410_TICNT);
+ unsigned int ticnt;
- seq_printf(seq, "periodic_IRQ\t: %s\n",
- (ticnt & S3C2410_TICNT_ENABLE) ? "yes" : "no" );
+ if (s3c_rtc_cpu_type == TYPE_S3C64XX) {
+ ticnt = readb(s3c_rtc_base + S3C2410_RTCCON);
+ ticnt &= S3C64XX_RTCCON_TICEN;
+ } else {
+ ticnt = readb(s3c_rtc_base + S3C2410_TICNT);
+ ticnt &= S3C2410_TICNT_ENABLE;
+ }
+
+ seq_printf(seq, "periodic_IRQ\t: %s\n", ticnt ? "yes" : "no");
return 0;
}
if (!en) {
tmp = readb(base + S3C2410_RTCCON);
- writeb(tmp & ~S3C2410_RTCCON_RTCEN, base + S3C2410_RTCCON);
-
- tmp = readb(base + S3C2410_TICNT);
- writeb(tmp & ~S3C2410_TICNT_ENABLE, base + S3C2410_TICNT);
+ if (s3c_rtc_cpu_type == TYPE_S3C64XX)
+ tmp &= ~S3C64XX_RTCCON_TICEN;
+ tmp &= ~S3C2410_RTCCON_RTCEN;
+ writeb(tmp, base + S3C2410_RTCCON);
+
+ if (s3c_rtc_cpu_type == TYPE_S3C2410) {
+ tmp = readb(base + S3C2410_TICNT);
+ tmp &= ~S3C2410_TICNT_ENABLE;
+ writeb(tmp, base + S3C2410_TICNT);
+ }
} else {
/* re-enable the device, and check it is ok */
goto err_nortc;
}
- rtc->max_user_freq = 128;
+ s3c_rtc_cpu_type = platform_get_device_id(pdev)->driver_data;
+
+ if (s3c_rtc_cpu_type == TYPE_S3C64XX)
+ rtc->max_user_freq = 32768;
+ else
+ rtc->max_user_freq = 128;
+
platform_set_drvdata(pdev, rtc);
return 0;
/* RTC Power management control */
-static int ticnt_save;
+static int ticnt_save, ticnt_en_save;
static int s3c_rtc_suspend(struct platform_device *pdev, pm_message_t state)
{
/* save TICNT for anyone using periodic interrupts */
ticnt_save = readb(s3c_rtc_base + S3C2410_TICNT);
+ if (s3c_rtc_cpu_type == TYPE_S3C64XX) {
+ ticnt_en_save = readb(s3c_rtc_base + S3C2410_RTCCON);
+ ticnt_en_save &= S3C64XX_RTCCON_TICEN;
+ }
s3c_rtc_enable(pdev, 0);
return 0;
}
static int s3c_rtc_resume(struct platform_device *pdev)
{
+ unsigned int tmp;
+
s3c_rtc_enable(pdev, 1);
writeb(ticnt_save, s3c_rtc_base + S3C2410_TICNT);
+ if (s3c_rtc_cpu_type == TYPE_S3C64XX && ticnt_en_save) {
+ tmp = readb(s3c_rtc_base + S3C2410_RTCCON);
+ writeb(tmp | ticnt_en_save, s3c_rtc_base + S3C2410_RTCCON);
+ }
return 0;
}
#else
#define s3c_rtc_resume NULL
#endif
-static struct platform_driver s3c2410_rtc_driver = {
+static struct platform_device_id s3c_rtc_driver_ids[] = {
+ {
+ .name = "s3c2410-rtc",
+ .driver_data = TYPE_S3C2410,
+ }, {
+ .name = "s3c64xx-rtc",
+ .driver_data = TYPE_S3C64XX,
+ },
+ { }
+};
+
+MODULE_DEVICE_TABLE(platform, s3c_rtc_driver_ids);
+
+static struct platform_driver s3c_rtc_driver = {
.probe = s3c_rtc_probe,
.remove = __devexit_p(s3c_rtc_remove),
.suspend = s3c_rtc_suspend,
.resume = s3c_rtc_resume,
+ .id_table = s3c_rtc_driver_ids,
.driver = {
- .name = "s3c2410-rtc",
+ .name = "s3c-rtc",
.owner = THIS_MODULE,
},
};
static int __init s3c_rtc_init(void)
{
printk(banner);
- return platform_driver_register(&s3c2410_rtc_driver);
+ return platform_driver_register(&s3c_rtc_driver);
}
static void __exit s3c_rtc_exit(void)
{
- platform_driver_unregister(&s3c2410_rtc_driver);
+ platform_driver_unregister(&s3c_rtc_driver);
}
module_init(s3c_rtc_init);
goto err;
}
- ret = wm831x_request_irq(wm831x, per_irq, wm831x_per_irq,
- IRQF_TRIGGER_RISING, "wm831x_rtc_per",
- wm831x_rtc);
+ ret = request_threaded_irq(per_irq, NULL, wm831x_per_irq,
+ IRQF_TRIGGER_RISING, "RTC period",
+ wm831x_rtc);
if (ret != 0) {
dev_err(&pdev->dev, "Failed to request periodic IRQ %d: %d\n",
per_irq, ret);
}
- ret = wm831x_request_irq(wm831x, alm_irq, wm831x_alm_irq,
- IRQF_TRIGGER_RISING, "wm831x_rtc_alm",
- wm831x_rtc);
+ ret = request_threaded_irq(alm_irq, NULL, wm831x_alm_irq,
+ IRQF_TRIGGER_RISING, "RTC alarm",
+ wm831x_rtc);
if (ret != 0) {
dev_err(&pdev->dev, "Failed to request alarm IRQ %d: %d\n",
alm_irq, ret);
int per_irq = platform_get_irq_byname(pdev, "PER");
int alm_irq = platform_get_irq_byname(pdev, "ALM");
- wm831x_free_irq(wm831x_rtc->wm831x, alm_irq, wm831x_rtc);
- wm831x_free_irq(wm831x_rtc->wm831x, per_irq, wm831x_rtc);
+ free_irq(alm_irq, wm831x_rtc);
+ free_irq(per_irq, wm831x_rtc);
rtc_device_unregister(wm831x_rtc->rtc);
kfree(wm831x_rtc);
}
if (!lport->vport)
- fc_host_max_npiv_vports(lport->host) = USHORT_MAX;
+ fc_host_max_npiv_vports(lport->host) = USHRT_MAX;
snprintf(fc_host_symbolic_name(lport->host), FC_SYMBOLIC_NAME_SIZE,
"%s v%s over %s", FCOE_NAME, FCOE_VERSION,
if (ioc->config_cmds.status & MPT2_CMD_PENDING) {
ioc->config_cmds.status |= MPT2_CMD_RESET;
mpt2sas_base_free_smid(ioc, ioc->config_cmds.smid);
- ioc->config_cmds.smid = USHORT_MAX;
+ ioc->config_cmds.smid = USHRT_MAX;
complete(&ioc->config_cmds.done);
}
break;
#ifdef CONFIG_SCSI_MPT2SAS_LOGGING
_config_display_some_debug(ioc, smid, "config_done", mpi_reply);
#endif
- ioc->config_cmds.smid = USHORT_MAX;
+ ioc->config_cmds.smid = USHRT_MAX;
complete(&ioc->config_cmds.done);
return 1;
}
for (i = 0; i < ARRAY_SIZE(baud_table); i++)
if (baud_table[i] == n)
break;
- if (i < BAUD_TABLE_SIZE) {
+ if (i < ARRAY_SIZE(baud_table)) {
m68328_console_baud = n;
m68328_console_cbaud = 0;
if (i > 15) {
}
/* IRQL = PASSIVE_LEVEL */
-u8 BtoH(char ch)
-{
- if (ch >= '0' && ch <= '9')
- return (ch - '0'); /* Handle numerals */
- if (ch >= 'A' && ch <= 'F')
- return (ch - 'A' + 0xA); /* Handle capitol hex digits */
- if (ch >= 'a' && ch <= 'f')
- return (ch - 'a' + 0xA); /* Handle small hex digits */
- return (255);
-}
-
/* */
/* FUNCTION: AtoH(char *, u8 *, int) */
/* */
destTemp = (u8 *)dest;
while (destlen--) {
- *destTemp = BtoH(*srcptr++) << 4; /* Put 1st ascii byte in upper nibble. */
- *destTemp += BtoH(*srcptr++); /* Add 2nd ascii byte to above. */
+ *destTemp = hex_to_bin(*srcptr++) << 4; /* Put 1st ascii byte in upper nibble. */
+ *destTemp += hex_to_bin(*srcptr++); /* Add 2nd ascii byte to above. */
destTemp++;
}
}
void AtoH(char *src, u8 *dest, int destlen);
-u8 BtoH(char ch);
-
void RTMPPatchMacBbpBug(struct rt_rtmp_adapter *pAd);
void RTMPInitTimer(struct rt_rtmp_adapter *pAd,
#define ENDPOINT_ISOC_DATA 0x07
#define ENDPOINT_FIRMWARE 0x05
-#define hex2int(c) ( (c >= '0') && (c <= '9') ? (c - '0') : ((c & 0xf) + 9) )
-
struct speedtch_params {
unsigned int altsetting;
unsigned int BMaxDSL;
memset(atm_dev->esi, 0, sizeof(atm_dev->esi));
if (usb_string(usb_dev, usb_dev->descriptor.iSerialNumber, mac_str, sizeof(mac_str)) == 12) {
for (i = 0; i < 6; i++)
- atm_dev->esi[i] = (hex2int(mac_str[i * 2]) * 16) + (hex2int(mac_str[i * 2 + 1]));
+ atm_dev->esi[i] = (hex_to_bin(mac_str[i * 2]) << 4) +
+ hex_to_bin(mac_str[i * 2 + 1]);
}
/* Start modem synchronisation */
count = indirect->len / sizeof desc;
/* Buffers are chained via a 16 bit next field, so
* we can have at most 2^16 of these. */
- if (count > USHORT_MAX + 1) {
+ if (count > USHRT_MAX + 1) {
vq_err(vq, "Indirect buffer length too big: %d\n",
indirect->len);
return -E2BIG;
spinlock_t lock;
};
-static struct fb_fix_screeninfo arcfb_fix __initdata = {
+static struct fb_fix_screeninfo arcfb_fix __devinitdata = {
.id = "arcfb",
.type = FB_TYPE_PACKED_PIXELS,
.visual = FB_VISUAL_MONO01,
.accel = FB_ACCEL_NONE,
};
-static struct fb_var_screeninfo arcfb_var __initdata = {
+static struct fb_var_screeninfo arcfb_var __devinitdata = {
.xres = 128,
.yres = 64,
.xres_virtual = 128,
return retval;
}
-static int arcfb_remove(struct platform_device *dev)
+static int __devexit arcfb_remove(struct platform_device *dev)
{
struct fb_info *info = platform_get_drvdata(dev);
static struct platform_driver arcfb_driver = {
.probe = arcfb_probe,
- .remove = arcfb_remove,
+ .remove = __devexit_p(arcfb_remove),
.driver = {
.name = "arcfb",
},
#define ATYIO_FEATW 0x41545903 /* ATY\03 */
#endif
-#ifndef FBIO_WAITFORVSYNC
-#define FBIO_WAITFORVSYNC _IOW('F', 0x20, __u32)
-#endif
-
static int atyfb_ioctl(struct fb_info *info, u_int cmd, u_long arg)
{
struct atyfb_par *par = (struct atyfb_par *) info->par;
#define LCD_X_RES 320 /* Horizontal Resolution */
#define LCD_Y_RES 240 /* Vertical Resolution */
#define DMA_BUS_SIZE 16
+#define U_LINE 4 /* Blanking Lines */
-#define USE_RGB565_16_BIT_PPI
-
-#ifdef USE_RGB565_16_BIT_PPI
-#define LCD_BPP 16 /* Bit Per Pixel */
-#define CLOCKS_PER_PIX 1
-#define CPLD_PIPELINE_DELAY_COR 0 /* NO CPLB */
-#endif
/* Interface 16/18-bit TFT over an 8-bit wide PPI using a small Programmable Logic Device (CPLD)
* http://blackfin.uclinux.org/gf/project/stamp/frs/?action=FrsReleaseBrowse&frs_package_id=165
*/
-#ifdef USE_RGB565_8_BIT_PPI
-#define LCD_BPP 16 /* Bit Per Pixel */
-#define CLOCKS_PER_PIX 2
-#define CPLD_PIPELINE_DELAY_COR 3 /* RGB565 */
-#endif
-
-#ifdef USE_RGB888_8_BIT_PPI
-#define LCD_BPP 24 /* Bit Per Pixel */
-#define CLOCKS_PER_PIX 3
-#define CPLD_PIPELINE_DELAY_COR 5 /* RGB888 */
-#endif
-
- /*
- * HS and VS timing parameters (all in number of PPI clk ticks)
- */
-
-#define U_LINE 4 /* Blanking Lines */
-
-#define H_ACTPIX (LCD_X_RES * CLOCKS_PER_PIX) /* active horizontal pixel */
-#define H_PERIOD (336 * CLOCKS_PER_PIX) /* HS period */
-#define H_PULSE (2 * CLOCKS_PER_PIX) /* HS pulse width */
-#define H_START (7 * CLOCKS_PER_PIX + CPLD_PIPELINE_DELAY_COR) /* first valid pixel */
-
-#define V_LINES (LCD_Y_RES + U_LINE) /* total vertical lines */
-#define V_PULSE (2 * CLOCKS_PER_PIX) /* VS pulse width (1-5 H_PERIODs) */
-#define V_PERIOD (H_PERIOD * V_LINES) /* VS period */
-
-#define ACTIVE_VIDEO_MEM_OFFSET ((U_LINE / 2) * LCD_X_RES * (LCD_BPP / 8))
#define BFIN_LCD_NBR_PALETTE_ENTRIES 256
#define PPI_PORT_CFG_01 0x10
#define PPI_POLS_1 0x8000
-#if (CLOCKS_PER_PIX > 1)
-#define PPI_PMODE (DLEN_8 | PACK_EN)
-#else
-#define PPI_PMODE (DLEN_16)
-#endif
-
#define LQ035_INDEX 0x74
#define LQ035_DATA 0x76
int irq;
spinlock_t lock; /* lock */
u32 pseudo_pal[16];
+
+ u32 lcd_bpp;
+ u32 h_actpix;
+ u32 h_period;
+ u32 h_pulse;
+ u32 h_start;
+ u32 v_lines;
+ u32 v_pulse;
+ u32 v_period;
};
static int nocursor;
return 0;
}
+static int bfin_lq035q1_calc_timing(struct bfin_lq035q1fb_info *fbi)
+{
+ unsigned long clocks_per_pix, cpld_pipeline_delay_cor;
+
+ /*
+ * Interface 16/18-bit TFT over an 8-bit wide PPI using a small
+ * Programmable Logic Device (CPLD)
+ * http://blackfin.uclinux.org/gf/project/stamp/frs/?action=FrsReleaseBrowse&frs_package_id=165
+ */
+
+ switch (fbi->disp_info->ppi_mode) {
+ case USE_RGB565_16_BIT_PPI:
+ fbi->lcd_bpp = 16;
+ clocks_per_pix = 1;
+ cpld_pipeline_delay_cor = 0;
+ break;
+ case USE_RGB565_8_BIT_PPI:
+ fbi->lcd_bpp = 16;
+ clocks_per_pix = 2;
+ cpld_pipeline_delay_cor = 3;
+ break;
+ case USE_RGB888_8_BIT_PPI:
+ fbi->lcd_bpp = 24;
+ clocks_per_pix = 3;
+ cpld_pipeline_delay_cor = 5;
+ break;
+ default:
+ return -EINVAL;
+ }
+
+ /*
+ * HS and VS timing parameters (all in number of PPI clk ticks)
+ */
+
+ fbi->h_actpix = (LCD_X_RES * clocks_per_pix); /* active horizontal pixel */
+ fbi->h_period = (336 * clocks_per_pix); /* HS period */
+ fbi->h_pulse = (2 * clocks_per_pix); /* HS pulse width */
+ fbi->h_start = (7 * clocks_per_pix + cpld_pipeline_delay_cor); /* first valid pixel */
+
+ fbi->v_lines = (LCD_Y_RES + U_LINE); /* total vertical lines */
+ fbi->v_pulse = (2 * clocks_per_pix); /* VS pulse width (1-5 H_PERIODs) */
+ fbi->v_period = (fbi->h_period * fbi->v_lines); /* VS period */
+
+ return 0;
+}
+
static void bfin_lq035q1_config_ppi(struct bfin_lq035q1fb_info *fbi)
{
- bfin_write_PPI_DELAY(H_START);
- bfin_write_PPI_COUNT(H_ACTPIX - 1);
- bfin_write_PPI_FRAME(V_LINES);
+ unsigned ppi_pmode;
+
+ if (fbi->disp_info->ppi_mode == USE_RGB565_16_BIT_PPI)
+ ppi_pmode = DLEN_16;
+ else
+ ppi_pmode = (DLEN_8 | PACK_EN);
+
+ bfin_write_PPI_DELAY(fbi->h_start);
+ bfin_write_PPI_COUNT(fbi->h_actpix - 1);
+ bfin_write_PPI_FRAME(fbi->v_lines);
bfin_write_PPI_CONTROL(PPI_TX_MODE | /* output mode , PORT_DIR */
PPI_XFER_TYPE_11 | /* sync mode XFR_TYPE */
PPI_PORT_CFG_01 | /* two frame sync PORT_CFG */
- PPI_PMODE | /* 8/16 bit data length / PACK_EN? */
+ ppi_pmode | /* 8/16 bit data length / PACK_EN? */
PPI_POLS_1); /* faling edge syncs POLS */
}
}
-static void bfin_lq035q1_init_timers(void)
+static void bfin_lq035q1_init_timers(struct bfin_lq035q1fb_info *fbi)
{
bfin_lq035q1_stop_timers();
- set_gptimer_period(TIMER_HSYNC_id, H_PERIOD);
- set_gptimer_pwidth(TIMER_HSYNC_id, H_PULSE);
+ set_gptimer_period(TIMER_HSYNC_id, fbi->h_period);
+ set_gptimer_pwidth(TIMER_HSYNC_id, fbi->h_pulse);
set_gptimer_config(TIMER_HSYNC_id, TIMER_MODE_PWM | TIMER_PERIOD_CNT |
TIMER_TIN_SEL | TIMER_CLK_SEL|
TIMER_EMU_RUN);
- set_gptimer_period(TIMER_VSYNC_id, V_PERIOD);
- set_gptimer_pwidth(TIMER_VSYNC_id, V_PULSE);
+ set_gptimer_period(TIMER_VSYNC_id, fbi->v_period);
+ set_gptimer_pwidth(TIMER_VSYNC_id, fbi->v_pulse);
set_gptimer_config(TIMER_VSYNC_id, TIMER_MODE_PWM | TIMER_PERIOD_CNT |
TIMER_TIN_SEL | TIMER_CLK_SEL |
TIMER_EMU_RUN);
static void bfin_lq035q1_config_dma(struct bfin_lq035q1fb_info *fbi)
{
set_dma_config(CH_PPI,
set_bfin_dma_config(DIR_READ, DMA_FLOW_AUTO,
INTR_DISABLE, DIMENSION_2D,
DATA_SIZE_16,
DMA_NOSYNC_KEEP_DMA_BUF));
- set_dma_x_count(CH_PPI, (LCD_X_RES * LCD_BPP) / DMA_BUS_SIZE);
+ set_dma_x_count(CH_PPI, (LCD_X_RES * fbi->lcd_bpp) / DMA_BUS_SIZE);
set_dma_x_modify(CH_PPI, DMA_BUS_SIZE / 8);
- set_dma_y_count(CH_PPI, V_LINES);
+ set_dma_y_count(CH_PPI, fbi->v_lines);
set_dma_y_modify(CH_PPI, DMA_BUS_SIZE / 8);
set_dma_start_addr(CH_PPI, (unsigned long)fbi->fb_buffer);
}
-#if (CLOCKS_PER_PIX == 1)
static const u16 ppi0_req_16[] = {P_PPI0_CLK, P_PPI0_FS1, P_PPI0_FS2,
P_PPI0_D0, P_PPI0_D1, P_PPI0_D2,
P_PPI0_D3, P_PPI0_D4, P_PPI0_D5,
P_PPI0_D9, P_PPI0_D10, P_PPI0_D11,
P_PPI0_D12, P_PPI0_D13, P_PPI0_D14,
P_PPI0_D15, 0};
-#else
-static const u16 ppi0_req_16[] = {P_PPI0_CLK, P_PPI0_FS1, P_PPI0_FS2,
+
+static const u16 ppi0_req_8[] = {P_PPI0_CLK, P_PPI0_FS1, P_PPI0_FS2,
P_PPI0_D0, P_PPI0_D1, P_PPI0_D2,
P_PPI0_D3, P_PPI0_D4, P_PPI0_D5,
P_PPI0_D6, P_PPI0_D7, 0};
-#endif
-static inline void bfin_lq035q1_free_ports(void)
+static inline void bfin_lq035q1_free_ports(unsigned ppi16)
{
- peripheral_free_list(ppi0_req_16);
+ if (ppi16)
+ peripheral_free_list(ppi0_req_16);
+ else
+ peripheral_free_list(ppi0_req_8);
+
if (ANOMALY_05000400)
gpio_free(P_IDENT(P_PPI0_FS3));
}
-static int __devinit bfin_lq035q1_request_ports(struct platform_device *pdev)
+static int __devinit bfin_lq035q1_request_ports(struct platform_device *pdev,
+ unsigned ppi16)
{
+ int ret;
/* ANOMALY_05000400 - PPI Does Not Start Properly In Specific Mode:
* Drive PPI_FS3 Low
*/
gpio_direction_output(P_IDENT(P_PPI0_FS3), 0);
}
- if (peripheral_request_list(ppi0_req_16, DRIVER_NAME)) {
+ if (ppi16)
+ ret = peripheral_request_list(ppi0_req_16, DRIVER_NAME);
+ else
+ ret = peripheral_request_list(ppi0_req_8, DRIVER_NAME);
+
+ if (ret) {
dev_err(&pdev->dev, "requesting peripherals failed\n");
return -EFAULT;
}
bfin_lq035q1_config_dma(fbi);
bfin_lq035q1_config_ppi(fbi);
- bfin_lq035q1_init_timers();
+ bfin_lq035q1_init_timers(fbi);
/* start dma */
enable_dma(CH_PPI);
static int bfin_lq035q1_fb_check_var(struct fb_var_screeninfo *var,
struct fb_info *info)
{
- switch (var->bits_per_pixel) {
-#if (LCD_BPP == 24)
- case 24:/* TRUECOLOUR, 16m */
-#else
- case 16:/* DIRECTCOLOUR, 64k */
-#endif
+ struct bfin_lq035q1fb_info *fbi = info->par;
+
+ if (var->bits_per_pixel == fbi->lcd_bpp) {
var->red.offset = info->var.red.offset;
var->green.offset = info->var.green.offset;
var->blue.offset = info->var.blue.offset;
var->red.msb_right = 0;
var->green.msb_right = 0;
var->blue.msb_right = 0;
- break;
- default:
+ } else {
pr_debug("%s: depth not supported: %u BPP\n", __func__,
var->bits_per_pixel);
return -EINVAL;
{
struct bfin_lq035q1fb_info *info;
struct fb_info *fbinfo;
+ u32 active_video_mem_offset;
int ret;
ret = request_dma(CH_PPI, DRIVER_NAME"_CH_PPI");
platform_set_drvdata(pdev, fbinfo);
+ ret = bfin_lq035q1_calc_timing(info);
+ if (ret < 0) {
+ dev_err(&pdev->dev, "Unsupported PPI mode\n");
+ goto out3;
+ }
+
strcpy(fbinfo->fix.id, DRIVER_NAME);
fbinfo->fix.type = FB_TYPE_PACKED_PIXELS;
fbinfo->var.xres_virtual = LCD_X_RES;
fbinfo->var.yres = LCD_Y_RES;
fbinfo->var.yres_virtual = LCD_Y_RES;
- fbinfo->var.bits_per_pixel = LCD_BPP;
+ fbinfo->var.bits_per_pixel = info->lcd_bpp;
if (info->disp_info->mode & LQ035_BGR) {
-#if (LCD_BPP == 24)
- fbinfo->var.red.offset = 0;
- fbinfo->var.green.offset = 8;
- fbinfo->var.blue.offset = 16;
-#else
- fbinfo->var.red.offset = 0;
- fbinfo->var.green.offset = 5;
- fbinfo->var.blue.offset = 11;
-#endif
+ if (info->lcd_bpp == 24) {
+ fbinfo->var.red.offset = 0;
+ fbinfo->var.green.offset = 8;
+ fbinfo->var.blue.offset = 16;
+ } else {
+ fbinfo->var.red.offset = 0;
+ fbinfo->var.green.offset = 5;
+ fbinfo->var.blue.offset = 11;
+ }
} else {
-#if (LCD_BPP == 24)
- fbinfo->var.red.offset = 16;
- fbinfo->var.green.offset = 8;
- fbinfo->var.blue.offset = 0;
-#else
- fbinfo->var.red.offset = 11;
- fbinfo->var.green.offset = 5;
- fbinfo->var.blue.offset = 0;
-#endif
+ if (info->lcd_bpp == 24) {
+ fbinfo->var.red.offset = 16;
+ fbinfo->var.green.offset = 8;
+ fbinfo->var.blue.offset = 0;
+ } else {
+ fbinfo->var.red.offset = 11;
+ fbinfo->var.green.offset = 5;
+ fbinfo->var.blue.offset = 0;
+ }
}
fbinfo->var.transp.offset = 0;
-#if (LCD_BPP == 24)
- fbinfo->var.red.length = 8;
- fbinfo->var.green.length = 8;
- fbinfo->var.blue.length = 8;
-#else
- fbinfo->var.red.length = 5;
- fbinfo->var.green.length = 6;
- fbinfo->var.blue.length = 5;
-#endif
+ if (info->lcd_bpp == 24) {
+ fbinfo->var.red.length = 8;
+ fbinfo->var.green.length = 8;
+ fbinfo->var.blue.length = 8;
+ } else {
+ fbinfo->var.red.length = 5;
+ fbinfo->var.green.length = 6;
+ fbinfo->var.blue.length = 5;
+ }
fbinfo->var.transp.length = 0;
- fbinfo->fix.smem_len = LCD_X_RES * LCD_Y_RES * LCD_BPP / 8
- + ACTIVE_VIDEO_MEM_OFFSET;
+ active_video_mem_offset = ((U_LINE / 2) * LCD_X_RES * (info->lcd_bpp / 8));
+
+ fbinfo->fix.smem_len = LCD_X_RES * LCD_Y_RES * info->lcd_bpp / 8
+ + active_video_mem_offset;
fbinfo->fix.line_length = fbinfo->var.xres_virtual *
fbinfo->var.bits_per_pixel / 8;
goto out3;
}
- fbinfo->screen_base = (void *)info->fb_buffer + ACTIVE_VIDEO_MEM_OFFSET;
- fbinfo->fix.smem_start = (int)info->fb_buffer + ACTIVE_VIDEO_MEM_OFFSET;
+ fbinfo->screen_base = (void *)info->fb_buffer + active_video_mem_offset;
+ fbinfo->fix.smem_start = (int)info->fb_buffer + active_video_mem_offset;
fbinfo->fbops = &bfin_lq035q1_fb_ops;
goto out4;
}
- ret = bfin_lq035q1_request_ports(pdev);
+ ret = bfin_lq035q1_request_ports(pdev,
+ info->disp_info->ppi_mode == USE_RGB565_16_BIT_PPI);
if (ret) {
dev_err(&pdev->dev, "couldn't request gpio port\n");
goto out6;
}
dev_info(&pdev->dev, "%dx%d %d-bit RGB FrameBuffer initialized\n",
- LCD_X_RES, LCD_Y_RES, LCD_BPP);
+ LCD_X_RES, LCD_Y_RES, info->lcd_bpp);
return 0;
out8:
free_irq(info->irq, info);
out7:
- bfin_lq035q1_free_ports();
+ bfin_lq035q1_free_ports(info->disp_info->ppi_mode ==
+ USE_RGB565_16_BIT_PPI);
out6:
fb_dealloc_cmap(&fbinfo->cmap);
out4:
fb_dealloc_cmap(&fbinfo->cmap);
- bfin_lq035q1_free_ports();
+ bfin_lq035q1_free_ports(info->disp_info->ppi_mode ==
+ USE_RGB565_16_BIT_PPI);
platform_set_drvdata(pdev, NULL);
framebuffer_release(fbinfo);
bfin_lq035q1_config_dma(info);
bfin_lq035q1_config_ppi(info);
- bfin_lq035q1_init_timers();
+ bfin_lq035q1_init_timers(info);
/* start dma */
enable_dma(CH_PPI);
#define DRIVER_NAME "da8xx_lcdc"
/* LCD Status Register */
+#define LCD_END_OF_FRAME1 BIT(9)
#define LCD_END_OF_FRAME0 BIT(8)
+#define LCD_PL_LOAD_DONE BIT(6)
#define LCD_FIFO_UNDERFLOW BIT(5)
#define LCD_SYNC_LOST BIT(2)
#define LCD_PALETTE_LOAD_MODE(x) ((x) << 20)
#define PALETTE_AND_DATA 0x00
#define PALETTE_ONLY 0x01
+#define DATA_ONLY 0x02
#define LCD_MONO_8BIT_MODE BIT(9)
#define LCD_RASTER_ORDER BIT(8)
#define LCD_TFT_MODE BIT(7)
#define LCD_UNDERFLOW_INT_ENA BIT(6)
+#define LCD_PL_ENABLE BIT(4)
#define LCD_MONOCHROME_MODE BIT(1)
#define LCD_RASTER_ENABLE BIT(0)
#define LCD_TFT_ALT_ENABLE BIT(23)
#define LCD_DMA_CTRL_REG 0x40
#define LCD_DMA_FRM_BUF_BASE_ADDR_0_REG 0x44
#define LCD_DMA_FRM_BUF_CEILING_ADDR_0_REG 0x48
+#define LCD_DMA_FRM_BUF_BASE_ADDR_1_REG 0x4C
+#define LCD_DMA_FRM_BUF_CEILING_ADDR_1_REG 0x50
+
+#define LCD_NUM_BUFFERS 2
#define WSI_TIMEOUT 50
#define PALETTE_SIZE 256
struct da8xx_fb_par {
resource_size_t p_palette_base;
unsigned char *v_palette_base;
+ dma_addr_t vram_phys;
+ unsigned long vram_size;
+ void *vram_virt;
+ unsigned int dma_start;
+ unsigned int dma_end;
struct clk *lcdc_clk;
int irq;
unsigned short pseudo_palette[16];
- unsigned int databuf_sz;
unsigned int palette_sz;
unsigned int pxl_clk;
int blank;
+ wait_queue_head_t vsync_wait;
+ int vsync_flag;
+ int vsync_timeout;
#ifdef CONFIG_CPU_FREQ
struct notifier_block freq_transition;
#endif
.type = FB_TYPE_PACKED_PIXELS,
.type_aux = 0,
.visual = FB_VISUAL_PSEUDOCOLOR,
- .xpanstep = 1,
+ .xpanstep = 0,
.ypanstep = 1,
- .ywrapstep = 1,
+ .ywrapstep = 0,
.accel = FB_ACCEL_NONE
};
static void lcd_blit(int load_mode, struct da8xx_fb_par *par)
{
- u32 tmp = par->p_palette_base + par->databuf_sz - 4;
- u32 reg;
+ u32 start;
+ u32 end;
+ u32 reg_ras;
+ u32 reg_dma;
+
+ /* init reg to clear PLM (loading mode) fields */
+ reg_ras = lcdc_read(LCD_RASTER_CTRL_REG);
+ reg_ras &= ~(3 << 20);
+
+ reg_dma = lcdc_read(LCD_DMA_CTRL_REG);
+
+ if (load_mode == LOAD_DATA) {
+ start = par->dma_start;
+ end = par->dma_end;
+
+ reg_ras |= LCD_PALETTE_LOAD_MODE(DATA_ONLY);
+ reg_dma |= LCD_END_OF_FRAME_INT_ENA;
+ reg_dma |= LCD_DUAL_FRAME_BUFFER_ENABLE;
+
+ lcdc_write(start, LCD_DMA_FRM_BUF_BASE_ADDR_0_REG);
+ lcdc_write(end, LCD_DMA_FRM_BUF_CEILING_ADDR_0_REG);
+ lcdc_write(start, LCD_DMA_FRM_BUF_BASE_ADDR_1_REG);
+ lcdc_write(end, LCD_DMA_FRM_BUF_CEILING_ADDR_1_REG);
+ } else if (load_mode == LOAD_PALETTE) {
+ start = par->p_palette_base;
+ end = start + par->palette_sz - 1;
+
+ reg_ras |= LCD_PALETTE_LOAD_MODE(PALETTE_ONLY);
+ reg_ras |= LCD_PL_ENABLE;
+
+ lcdc_write(start, LCD_DMA_FRM_BUF_BASE_ADDR_0_REG);
+ lcdc_write(end, LCD_DMA_FRM_BUF_CEILING_ADDR_0_REG);
+ }
- /* Update the databuf in the hw. */
- lcdc_write(par->p_palette_base, LCD_DMA_FRM_BUF_BASE_ADDR_0_REG);
- lcdc_write(tmp, LCD_DMA_FRM_BUF_CEILING_ADDR_0_REG);
+ lcdc_write(reg_dma, LCD_DMA_CTRL_REG);
+ lcdc_write(reg_ras, LCD_RASTER_CTRL_REG);
- /* Start the DMA. */
- reg = lcdc_read(LCD_RASTER_CTRL_REG);
- reg &= ~(3 << 20);
- if (load_mode == LOAD_DATA)
- reg |= LCD_PALETTE_LOAD_MODE(PALETTE_AND_DATA);
- else if (load_mode == LOAD_PALETTE)
- reg |= LCD_PALETTE_LOAD_MODE(PALETTE_ONLY);
-
- lcdc_write(reg, LCD_RASTER_CTRL_REG);
+ /*
+ * The Raster enable bit must be set after all other control fields are
+ * set.
+ */
+ lcd_enable_raster();
}
/* Configure the Burst Size of DMA */
static int lcd_cfg_frame_buffer(struct da8xx_fb_par *par, u32 width, u32 height,
u32 bpp, u32 raster_order)
{
- u32 bpl, reg;
+ u32 reg;
- /* Disable Dual Frame Buffer. */
- reg = lcdc_read(LCD_DMA_CTRL_REG);
- lcdc_write(reg & ~LCD_DUAL_FRAME_BUFFER_ENABLE,
- LCD_DMA_CTRL_REG);
/* Set the Panel Width */
/* Pixels per line = (PPL + 1)*16 */
/*0x3F in bits 4..9 gives max horisontal resolution = 1024 pixels*/
return -EINVAL;
}
- bpl = width * bpp / 8;
- par->databuf_sz = height * bpl + par->palette_sz;
-
return 0;
}
struct fb_info *info)
{
struct da8xx_fb_par *par = info->par;
- unsigned short *palette = (unsigned short *)par->v_palette_base;
+ unsigned short *palette = (unsigned short *) par->v_palette_base;
u_short pal;
+ int update_hw = 0;
if (regno > 255)
return 1;
pal |= (green & 0x00f0);
pal |= (blue & 0x000f);
- palette[regno] = pal;
-
+ if (palette[regno] != pal) {
+ update_hw = 1;
+ palette[regno] = pal;
+ }
} else if ((info->var.bits_per_pixel == 16) && regno < 16) {
red >>= (16 - info->var.red.length);
red <<= info->var.red.offset;
par->pseudo_palette[regno] = red | green | blue;
- palette[0] = 0x4000;
+ if (palette[0] != 0x4000) {
+ update_hw = 1;
+ palette[0] = 0x4000;
+ }
}
+ /* Update the palette in the h/w as needed. */
+ if (update_hw)
+ lcd_blit(LOAD_PALETTE, par);
+
return 0;
}
static irqreturn_t lcdc_irq_handler(int irq, void *arg)
{
+ struct da8xx_fb_par *par = arg;
u32 stat = lcdc_read(LCD_STAT_REG);
+ u32 reg_ras;
if ((stat & LCD_SYNC_LOST) && (stat & LCD_FIFO_UNDERFLOW)) {
lcd_disable_raster();
lcdc_write(stat, LCD_STAT_REG);
lcd_enable_raster();
- } else
+ } else if (stat & LCD_PL_LOAD_DONE) {
+ /*
+ * The raster must be disabled before the state of any control
+ * bit is changed. It must also be disabled before the PL load
+ * interrupt is cleared by the following write to the status
+ * register; otherwise multiple PL done interrupts are raised.
+ */
+ lcd_disable_raster();
+
lcdc_write(stat, LCD_STAT_REG);
+ /* Disable PL completion interrupt */
+ reg_ras = lcdc_read(LCD_RASTER_CTRL_REG);
+ reg_ras &= ~LCD_PL_ENABLE;
+ lcdc_write(reg_ras, LCD_RASTER_CTRL_REG);
+
+ /* Setup and start data loading mode */
+ lcd_blit(LOAD_DATA, par);
+ } else {
+ lcdc_write(stat, LCD_STAT_REG);
+
+ if (stat & LCD_END_OF_FRAME0) {
+ lcdc_write(par->dma_start,
+ LCD_DMA_FRM_BUF_BASE_ADDR_0_REG);
+ lcdc_write(par->dma_end,
+ LCD_DMA_FRM_BUF_CEILING_ADDR_0_REG);
+ par->vsync_flag = 1;
+ wake_up_interruptible(&par->vsync_wait);
+ }
+
+ if (stat & LCD_END_OF_FRAME1) {
+ lcdc_write(par->dma_start,
+ LCD_DMA_FRM_BUF_BASE_ADDR_1_REG);
+ lcdc_write(par->dma_end,
+ LCD_DMA_FRM_BUF_CEILING_ADDR_1_REG);
+ par->vsync_flag = 1;
+ wake_up_interruptible(&par->vsync_wait);
+ }
+ }
+
return IRQ_HANDLED;
}
unregister_framebuffer(info);
fb_dealloc_cmap(&info->cmap);
- dma_free_coherent(NULL, par->databuf_sz + PAGE_SIZE,
- info->screen_base - PAGE_SIZE,
- info->fix.smem_start);
+ dma_free_coherent(NULL, PALETTE_SIZE, par->v_palette_base,
+ par->p_palette_base);
+ dma_free_coherent(NULL, par->vram_size, par->vram_virt,
+ par->vram_phys);
free_irq(par->irq, par);
clk_disable(par->lcdc_clk);
clk_put(par->lcdc_clk);
return 0;
}
+/*
+ * Function to wait for vertical sync which for this LCD peripheral
+ * translates into waiting for the current raster frame to complete.
+ */
+static int fb_wait_for_vsync(struct fb_info *info)
+{
+ struct da8xx_fb_par *par = info->par;
+ int ret;
+
+ /*
+ * Set flag to 0 and wait for isr to set to 1. It would seem there is a
+ * race condition here where the ISR could have occurred just before or
+ * just after this set. But since we are just coarsely waiting for
+ * a frame to complete then that's OK. i.e. if the frame completed
+ * just before this code executed then we have to wait another full
+ * frame time but there is no way to avoid such a situation. On the
+ * other hand if the frame completed just after then we don't need
+ * to wait long at all. Either way we are guaranteed to return to the
+ * user immediately after a frame completion which is all that is
+ * required.
+ */
+ par->vsync_flag = 0;
+ ret = wait_event_interruptible_timeout(par->vsync_wait,
+ par->vsync_flag != 0,
+ par->vsync_timeout);
+ if (ret < 0)
+ return ret;
+ if (ret == 0)
+ return -ETIMEDOUT;
+
+ return 0;
+}
+
static int fb_ioctl(struct fb_info *info, unsigned int cmd,
unsigned long arg)
{
sync_arg.pulse_width,
sync_arg.front_porch);
break;
+ case FBIO_WAITFORVSYNC:
+ return fb_wait_for_vsync(info);
default:
return -EINVAL;
}
return ret;
}
+/*
+ * Set new x,y offsets in the virtual display for the visible area and switch
+ * to the new mode.
+ */
+static int da8xx_pan_display(struct fb_var_screeninfo *var,
+ struct fb_info *fbi)
+{
+ int ret = 0;
+ struct fb_var_screeninfo new_var;
+ struct da8xx_fb_par *par = fbi->par;
+ struct fb_fix_screeninfo *fix = &fbi->fix;
+ unsigned int end;
+ unsigned int start;
+
+ if (var->xoffset != fbi->var.xoffset ||
+ var->yoffset != fbi->var.yoffset) {
+ memcpy(&new_var, &fbi->var, sizeof(new_var));
+ new_var.xoffset = var->xoffset;
+ new_var.yoffset = var->yoffset;
+ if (fb_check_var(&new_var, fbi))
+ ret = -EINVAL;
+ else {
+ memcpy(&fbi->var, &new_var, sizeof(new_var));
+
+ start = fix->smem_start +
+ new_var.yoffset * fix->line_length +
+ new_var.xoffset * var->bits_per_pixel / 8;
+ end = start + var->yres * fix->line_length - 1;
+ par->dma_start = start;
+ par->dma_end = end;
+ }
+ }
+
+ return ret;
+}
+
static struct fb_ops da8xx_fb_ops = {
.owner = THIS_MODULE,
.fb_check_var = fb_check_var,
.fb_setcolreg = fb_setcolreg,
+ .fb_pan_display = da8xx_pan_display,
.fb_ioctl = fb_ioctl,
.fb_fillrect = cfb_fillrect,
.fb_copyarea = cfb_copyarea,
}
/* allocate frame buffer */
- da8xx_fb_info->screen_base = dma_alloc_coherent(NULL,
- par->databuf_sz + PAGE_SIZE,
- (resource_size_t *)
- &da8xx_fb_info->fix.smem_start,
- GFP_KERNEL | GFP_DMA);
-
- if (!da8xx_fb_info->screen_base) {
+ par->vram_size = lcdc_info->width * lcdc_info->height * lcd_cfg->bpp;
+ par->vram_size = PAGE_ALIGN(par->vram_size/8);
+ par->vram_size = par->vram_size * LCD_NUM_BUFFERS;
+
+ par->vram_virt = dma_alloc_coherent(NULL,
+ par->vram_size,
+ (resource_size_t *) &par->vram_phys,
+ GFP_KERNEL | GFP_DMA);
+ if (!par->vram_virt) {
dev_err(&device->dev,
"GLCD: kmalloc for frame buffer failed\n");
ret = -EINVAL;
goto err_release_fb;
}
- /* move palette base pointer by (PAGE_SIZE - palette_sz) bytes */
- par->v_palette_base = da8xx_fb_info->screen_base +
- (PAGE_SIZE - par->palette_sz);
- par->p_palette_base = da8xx_fb_info->fix.smem_start +
- (PAGE_SIZE - par->palette_sz);
-
- /* the rest of the frame buffer is pixel data */
- da8xx_fb_info->screen_base = par->v_palette_base + par->palette_sz;
- da8xx_fb_fix.smem_start = par->p_palette_base + par->palette_sz;
- da8xx_fb_fix.smem_len = par->databuf_sz - par->palette_sz;
- da8xx_fb_fix.line_length = (lcdc_info->width * lcd_cfg->bpp) / 8;
+ da8xx_fb_info->screen_base = (char __iomem *) par->vram_virt;
+ da8xx_fb_fix.smem_start = par->vram_phys;
+ da8xx_fb_fix.smem_len = par->vram_size;
+ da8xx_fb_fix.line_length = (lcdc_info->width * lcd_cfg->bpp) / 8;
+
+ par->dma_start = par->vram_phys;
+ par->dma_end = par->dma_start + lcdc_info->height *
+ da8xx_fb_fix.line_length - 1;
+
+ /* allocate palette buffer */
+ par->v_palette_base = dma_alloc_coherent(NULL,
+ PALETTE_SIZE,
+ (resource_size_t *)
+ &par->p_palette_base,
+ GFP_KERNEL | GFP_DMA);
+ if (!par->v_palette_base) {
+ dev_err(&device->dev,
+ "GLCD: DMA allocation for palette buffer failed\n");
+ ret = -EINVAL;
+ goto err_release_fb_mem;
+ }
+ memset(par->v_palette_base, 0, PALETTE_SIZE);
par->irq = platform_get_irq(device, 0);
if (par->irq < 0) {
ret = -ENOENT;
- goto err_release_fb_mem;
+ goto err_release_pl_mem;
}
ret = request_irq(par->irq, lcdc_irq_handler, 0, DRIVER_NAME, par);
if (ret)
- goto err_release_fb_mem;
+ goto err_release_pl_mem;
/* Initialize par */
da8xx_fb_info->var.bits_per_pixel = lcd_cfg->bpp;
da8xx_fb_var.xres = lcdc_info->width;
da8xx_fb_var.xres_virtual = lcdc_info->width;
- da8xx_fb_var.yres = lcdc_info->height;
- da8xx_fb_var.yres_virtual = lcdc_info->height;
+ da8xx_fb_var.yres = lcdc_info->height;
+ da8xx_fb_var.yres_virtual = lcdc_info->height * LCD_NUM_BUFFERS;
da8xx_fb_var.grayscale =
lcd_cfg->p_disp_panel->panel_shade == MONOCHROME ? 1 : 0;
ret = fb_alloc_cmap(&da8xx_fb_info->cmap, PALETTE_SIZE, 0);
if (ret)
goto err_free_irq;
-
- /* First palette_sz byte of the frame buffer is the palette */
da8xx_fb_info->cmap.len = par->palette_sz;
- /* Flush the buffer to the screen. */
- lcd_blit(LOAD_DATA, par);
-
/* initialize var_screeninfo */
da8xx_fb_var.activate = FB_ACTIVATE_FORCE;
fb_set_var(da8xx_fb_info, &da8xx_fb_var);
dev_set_drvdata(&device->dev, da8xx_fb_info);
+
+ /* initialize the vsync wait queue */
+ init_waitqueue_head(&par->vsync_wait);
+ par->vsync_timeout = HZ / 5;
+
/* Register the Frame Buffer */
if (register_framebuffer(da8xx_fb_info) < 0) {
dev_err(&device->dev,
goto err_cpu_freq;
}
#endif
-
- /* enable raster engine */
- lcd_enable_raster();
-
return 0;
#ifdef CONFIG_CPU_FREQ
err_free_irq:
free_irq(par->irq, par);
+err_release_pl_mem:
+ dma_free_coherent(NULL, PALETTE_SIZE, par->v_palette_base,
+ par->p_palette_base);
+
err_release_fb_mem:
- dma_free_coherent(NULL, par->databuf_sz + PAGE_SIZE,
- da8xx_fb_info->screen_base - PAGE_SIZE,
- da8xx_fb_info->fix.smem_start);
+ dma_free_coherent(NULL, par->vram_size, par->vram_virt, par->vram_phys);
err_release_fb:
framebuffer_release(da8xx_fb_info);
{
struct fb_info *info = container_of(work, struct fb_info,
deferred_work.work);
- struct list_head *node, *next;
- struct page *cur;
struct fb_deferred_io *fbdefio = info->fbdefio;
+ struct page *page, *tmp_page;
+ struct list_head *node, *tmp_node;
+ struct list_head non_dirty;
+
+ INIT_LIST_HEAD(&non_dirty);
/* here we mkclean the pages, then do all deferred IO */
mutex_lock(&fbdefio->lock);
- list_for_each_entry(cur, &fbdefio->pagelist, lru) {
- lock_page(cur);
- page_mkclean(cur);
- unlock_page(cur);
+ list_for_each_entry_safe(page, tmp_page, &fbdefio->pagelist, lru) {
+ lock_page(page);
+ /*
+ * The workqueue callback can be triggered after a
+ * ->page_mkwrite() call but before the PTE has been marked
+ * dirty. In this case page_mkclean() won't "rearm" the page.
+ *
+ * To avoid this, remove those "non-dirty" pages from the
+ * pagelist before calling the driver's callback, then add
+ * them back to get processed on the next work iteration.
+ * At that time, their PTEs will hopefully be dirty for real.
+ */
+ if (!page_mkclean(page))
+ list_move_tail(&page->lru, &non_dirty);
+ unlock_page(page);
}
/* driver's callback with pagelist */
fbdefio->deferred_io(info, &fbdefio->pagelist);
- /* clear the list */
- list_for_each_safe(node, next, &fbdefio->pagelist) {
+ /* clear the list... */
+ list_for_each_safe(node, tmp_node, &fbdefio->pagelist) {
list_del(node);
}
+ /* ... and add back the "non-dirty" pages to the list */
+ list_splice_tail(&non_dirty, &fbdefio->pagelist);
mutex_unlock(&fbdefio->lock);
}
void fb_deferred_io_cleanup(struct fb_info *info)
{
struct fb_deferred_io *fbdefio = info->fbdefio;
+ struct list_head *node, *tmp_node;
struct page *page;
int i;
cancel_delayed_work(&info->deferred_work);
flush_scheduled_work();
+ /* the list may still have some non-dirty pages at this point */
+ mutex_lock(&fbdefio->lock);
+ list_for_each_safe(node, tmp_node, &fbdefio->pagelist) {
+ list_del(node);
+ }
+ mutex_unlock(&fbdefio->lock);
+
/* clear out the mapping that we setup */
for (i = 0 ; i < info->fix.smem_len; i += PAGE_SIZE) {
page = fb_deferred_io_page(info, i);
/* Framebuffer driver structures */
-static struct fb_var_screeninfo __initdata hga_default_var = {
+static struct fb_var_screeninfo hga_default_var __devinitdata = {
.xres = 720,
.yres = 348,
.xres_virtual = 720,
.width = -1,
};
-static struct fb_fix_screeninfo __initdata hga_fix = {
+static struct fb_fix_screeninfo hga_fix __devinitdata = {
.id = "HGA",
.type = FB_TYPE_PACKED_PIXELS, /* (not sure) */
.visual = FB_VISUAL_MONO10,
spin_unlock_irqrestore(&hga_reg_lock, flags);
}
-static int __init hga_card_detect(void)
+static int __devinit hga_card_detect(void)
{
int count = 0;
void __iomem *p, *q;
return 0;
}
-static int hgafb_remove(struct platform_device *pdev)
+static int __devexit hgafb_remove(struct platform_device *pdev)
{
struct fb_info *info = platform_get_drvdata(pdev);
static struct platform_driver hgafb_driver = {
.probe = hgafb_probe,
- .remove = hgafb_remove,
+ .remove = __devexit_p(hgafb_remove),
.driver = {
.name = "hgafb",
},
#define WIDTH 640
-static struct fb_var_screeninfo hitfb_var __initdata = {
+static struct fb_var_screeninfo hitfb_var __devinitdata = {
.activate = FB_ACTIVATE_NOW,
.height = -1,
.width = -1,
.vmode = FB_VMODE_NONINTERLACED,
};
-static struct fb_fix_screeninfo hitfb_fix __initdata = {
+static struct fb_fix_screeninfo hitfb_fix __devinitdata = {
.id = "Hitachi HD64461",
.type = FB_TYPE_PACKED_PIXELS,
.accel = FB_ACCEL_NONE,
return ret;
}
-static int __exit hitfb_remove(struct platform_device *dev)
+static int __devexit hitfb_remove(struct platform_device *dev)
{
struct fb_info *info = platform_get_drvdata(dev);
static struct platform_driver hitfb_driver = {
.probe = hitfb_probe,
- .remove = __exit_p(hitfb_remove),
+ .remove = __devexit_p(hitfb_remove),
.driver = {
.name = "hitfb",
.owner = THIS_MODULE,
((dinfo)->chipset == INTEL_965G) || \
((dinfo)->chipset == INTEL_965GM))
-#ifndef FBIO_WAITFORVSYNC
-#define FBIO_WAITFORVSYNC _IOW('F', 0x20, __u32)
-#endif
-
/*** function prototypes ***/
extern int intelfb_var_to_depth(const struct fb_var_screeninfo *var);
release_regs:
iounmap(fbi->io);
release_mem_region:
- release_mem_region((unsigned long)fbi->mem, size);
+ release_mem_region(res->start, size);
free_fb:
framebuffer_release(fbinfo);
return ret;
* cache. Once this area is remapped, all virtual memory
* access to the video memory should occur at the new region.
*/
-static int __init s3c2410fb_map_video_memory(struct fb_info *info)
+static int __devinit s3c2410fb_map_video_memory(struct fb_info *info)
{
struct s3c2410fb_info *fbi = info->par;
dma_addr_t map_dma;
static char driver_name[] = "s3c2410fb";
-static int __init s3c24xxfb_probe(struct platform_device *pdev,
+static int __devinit s3c24xxfb_probe(struct platform_device *pdev,
enum s3c_drv_type drv_type)
{
struct s3c2410fb_info *info;
/*
* Cleanup
*/
-static int s3c2410fb_remove(struct platform_device *pdev)
+static int __devexit s3c2410fb_remove(struct platform_device *pdev)
{
struct fb_info *fbinfo = platform_get_drvdata(pdev);
struct s3c2410fb_info *info = fbinfo->par;
static struct platform_driver s3c2410fb_driver = {
.probe = s3c2410fb_probe,
- .remove = s3c2410fb_remove,
+ .remove = __devexit_p(s3c2410fb_remove),
.suspend = s3c2410fb_suspend,
.resume = s3c2410fb_resume,
.driver = {
static struct platform_driver s3c2412fb_driver = {
.probe = s3c2412fb_probe,
- .remove = s3c2410fb_remove,
+ .remove = __devexit_p(s3c2410fb_remove),
.suspend = s3c2410fb_suspend,
.resume = s3c2410fb_resume,
.driver = {
static int flatpanel_id = -1;
-static struct fb_fix_screeninfo sgivwfb_fix __initdata = {
+static struct fb_fix_screeninfo sgivwfb_fix __devinitdata = {
.id = "SGI Vis WS FB",
.type = FB_TYPE_PACKED_PIXELS,
.visual = FB_VISUAL_PSEUDOCOLOR,
.line_length = 640,
};
-static struct fb_var_screeninfo sgivwfb_var __initdata = {
+static struct fb_var_screeninfo sgivwfb_var __devinitdata = {
/* 640x480, 8 bpp */
.xres = 640,
.yres = 480,
.vmode = FB_VMODE_NONINTERLACED
};
-static struct fb_var_screeninfo sgivwfb_var1600sw __initdata = {
+static struct fb_var_screeninfo sgivwfb_var1600sw __devinitdata = {
/* 1600x1024, 8 bpp */
.xres = 1600,
.yres = 1024,
return -ENXIO;
}
-static int sgivwfb_remove(struct platform_device *dev)
+static int __devexit sgivwfb_remove(struct platform_device *dev)
{
struct fb_info *info = platform_get_drvdata(dev);
static struct platform_driver sgivwfb_driver = {
.probe = sgivwfb_probe,
- .remove = sgivwfb_remove,
+ .remove = __devexit_p(sgivwfb_remove),
.driver = {
.name = "sgivwfb",
},
memset(fix, 0, sizeof(struct fb_fix_screeninfo));
- strcpy(fix->id, ivideo->myid);
+ strlcpy(fix->id, ivideo->myid, sizeof(fix->id));
mutex_lock(&info->mm_lock);
fix->smem_start = ivideo->video_base + ivideo->video_offset;
vfree(mem);
}
-static struct fb_var_screeninfo vfb_default __initdata = {
+static struct fb_var_screeninfo vfb_default __devinitdata = {
.xres = 640,
.yres = 480,
.xres_virtual = 640,
.vmode = FB_VMODE_NONINTERLACED,
};
-static struct fb_fix_screeninfo vfb_fix __initdata = {
+static struct fb_fix_screeninfo vfb_fix __devinitdata = {
.id = "Virtual FB",
.type = FB_TYPE_PACKED_PIXELS,
.visual = FB_VISUAL_PSEUDOCOLOR,
/* --------------------------------------------------------------------- */
-static struct fb_var_screeninfo vga16fb_defined __initdata = {
+static struct fb_var_screeninfo vga16fb_defined __devinitdata = {
.xres = 640,
.yres = 480,
.xres_virtual = 640,
};
/* name should not depend on EGA/VGA */
-static struct fb_fix_screeninfo vga16fb_fix __initdata = {
+static struct fb_fix_screeninfo vga16fb_fix __devinitdata = {
.id = "VGA16 VGA",
.smem_start = VGA_FB_PHYS,
.smem_len = VGA_FB_PHYS_LEN,
};
#ifndef MODULE
-static int vga16fb_setup(char *options)
+static int __init vga16fb_setup(char *options)
{
char *this_opt;
return ret;
}
-static int vga16fb_remove(struct platform_device *dev)
+static int __devexit vga16fb_remove(struct platform_device *dev)
{
struct fb_info *info = platform_get_drvdata(dev);
static struct platform_driver vga16fb_driver = {
.probe = vga16fb_probe,
- .remove = vga16fb_remove,
+ .remove = __devexit_p(vga16fb_remove),
.driver = {
.name = "vga16fb",
},
static void w100_update_disable(void);
static void calc_hsync(struct w100fb_par *par);
static void w100_init_graphic_engine(struct w100fb_par *par);
-struct w100_pll_info *w100_get_xtal_table(unsigned int freq);
+struct w100_pll_info *w100_get_xtal_table(unsigned int freq) __devinit;
/* Pseudo palette size */
#define MAX_PALETTES 16
}
-static int w100fb_remove(struct platform_device *pdev)
+static int __devexit w100fb_remove(struct platform_device *pdev)
{
struct fb_info *info = platform_get_drvdata(pdev);
struct w100fb_par *par=info->par;
{ 0 },
};
-struct w100_pll_info *w100_get_xtal_table(unsigned int freq)
+struct w100_pll_info __devinit *w100_get_xtal_table(unsigned int freq)
{
struct pll_entries *pll_entry = w100_pll_tables;
static struct platform_driver w100fb_driver = {
.probe = w100fb_probe,
- .remove = w100fb_remove,
+ .remove = __devexit_p(w100fb_remove),
.suspend = w100fb_suspend,
.resume = w100fb_resume,
.driver = {
},
};
-int __devinit w100fb_init(void)
+int __init w100fb_init(void)
{
return platform_driver_register(&w100fb_driver);
}
Watchdog timer embedded into KS8695 processor. This will reboot your
system when the timeout is reached.
+config HAVE_S3C2410_WATCHDOG
+ bool
+ help
+ This will include watchdog timer support for Samsung SoCs. If
+ you want to include watchdog support for any machine, kindly
+ select this in the respective mach-XXXX/Kconfig file.
+
config S3C2410_WATCHDOG
tristate "S3C2410 Watchdog"
- depends on ARCH_S3C2410
+ depends on ARCH_S3C2410 || HAVE_S3C2410_WATCHDOG
help
- Watchdog timer block in the Samsung S3C2410 chips. This will
- reboot the system when the timer expires with the watchdog
- enabled.
+ Watchdog timer block in the Samsung SoCs. This will reboot
+ the system when the timer expires with the watchdog enabled.
The driver is limited by the speed of the system's PCLK
signal, so with reasonably fast systems (PCLK around 50-66MHz)
help
Support for memory mapped max63{69,70,71,72,73,74} watchdog timer.
+config IMX2_WDT
+ tristate "IMX2+ Watchdog"
+ depends on ARCH_MX2 || ARCH_MX25 || ARCH_MX3 || ARCH_MX5
+ help
+ This is the driver for the hardware watchdog
+ on the Freescale IMX2 and later processors.
+ If you have one of these processors and wish to have
+ watchdog support enabled, say Y, otherwise say N.
+
+ To compile this driver as a module, choose M here: the
+ module will be called imx2_wdt.
+
# AVR32 Architecture
config AT32AP700X_WDT
obj-$(CONFIG_NUC900_WATCHDOG) += nuc900_wdt.o
obj-$(CONFIG_ADX_WATCHDOG) += adx_wdt.o
obj-$(CONFIG_TS72XX_WATCHDOG) += ts72xx_wdt.o
+obj-$(CONFIG_IMX2_WDT) += imx2_wdt.o
# AVR32 Architecture
obj-$(CONFIG_AT32AP700X_WDT) += at32ap700x_wdt.o
#include <linux/interrupt.h>
#include <linux/uaccess.h>
#include <asm/blackfin.h>
+#include <asm/bfin_watchdog.h>
#define stamp(fmt, args...) \
pr_debug("%s:%i: " fmt "\n", __func__, __LINE__, ## args)
# define bfin_write_WDOG_STAT(x) bfin_write_WDOGA_STAT(x)
#endif
-/* Bit in SWRST that indicates boot caused by watchdog */
-#define SWRST_RESET_WDOG 0x4000
-
-/* Bit in WDOG_CTL that indicates watchdog has expired (WDR0) */
-#define WDOG_EXPIRED 0x8000
-
-/* Masks for WDEV field in WDOG_CTL register */
-#define ICTL_RESET 0x0
-#define ICTL_NMI 0x2
-#define ICTL_GPI 0x4
-#define ICTL_NONE 0x6
-#define ICTL_MASK 0x6
-
-/* Masks for WDEN field in WDOG_CTL register */
-#define WDEN_MASK 0x0FF0
-#define WDEN_ENABLE 0x0000
-#define WDEN_DISABLE 0x0AD0
-
/* some defaults */
#define WATCHDOG_TIMEOUT 20
if (copy_to_user((void *)arg, &ident, sizeof(ident)))
return -EFAULT;
case WDIOC_GETSTATUS:
- return put_user(ident.options, p);
+ return put_user(0, p);
case WDIOC_GETBOOTSTATUS:
/* XXX: something is clearing TSR */
tmp = mfspr(SPRN_TSR) & TSR_WRS(3);
- /* returns 1 if last reset was caused by the WDT */
- return (tmp ? 1 : 0);
+ /* returns CARDRESET if last reset was caused by the WDT */
+ return (tmp ? WDIOF_CARDRESET : 0);
case WDIOC_SETOPTIONS:
if (get_user(tmp, p))
return -EINVAL;
/*
* You must set these - there is no sane way to probe for this board.
- * You can use eurwdt=x,y to set these now.
*/
static int io = 0x3f0;
outl(val32, SMI_EN); /* Needed to deactivate watchdog */
}
-static void supermicro_old_pre_keepalive(unsigned long acpibase)
-{
- /* Reload TCO Timer (done in iTCO_wdt_keepalive) + */
- /* Clear "Expire Flag" (Bit 3 of TC01_STS register) */
- outb(0x08, TCO1_STS);
-}
-
/*
* Vendor Support: 2
* Board: Super Micro Computer Inc. P4SBx, P4DPx
void iTCO_vendor_pre_keepalive(unsigned long acpibase, unsigned int heartbeat)
{
- if (vendorsupport == SUPERMICRO_OLD_BOARD)
- supermicro_old_pre_keepalive(acpibase);
- else if (vendorsupport == SUPERMICRO_NEW_BOARD)
+ if (vendorsupport == SUPERMICRO_NEW_BOARD)
supermicro_new_pre_set_heartbeat(heartbeat);
}
EXPORT_SYMBOL(iTCO_vendor_pre_keepalive);
/* Module and version information */
#define DRV_NAME "iTCO_wdt"
-#define DRV_VERSION "1.05"
+#define DRV_VERSION "1.06"
#define PFX DRV_NAME ": "
/* Includes */
#define WATCHDOG_HEARTBEAT 30 /* 30 sec default heartbeat */
static int heartbeat = WATCHDOG_HEARTBEAT; /* in seconds */
module_param(heartbeat, int, 0);
-MODULE_PARM_DESC(heartbeat, "Watchdog heartbeat in seconds. "
- "(2<heartbeat<39 (TCO v1) or 613 (TCO v2), default="
+MODULE_PARM_DESC(heartbeat, "Watchdog timeout in seconds. "
+ "5..76 (TCO v1) or 3..614 (TCO v2), default="
__MODULE_STRING(WATCHDOG_HEARTBEAT) ")");
static int nowayout = WATCHDOG_NOWAYOUT;
/* Reload the timer by writing to the TCO Timer Counter register */
if (iTCO_wdt_private.iTCO_version == 2)
outw(0x01, TCO_RLD);
- else if (iTCO_wdt_private.iTCO_version == 1)
+ else if (iTCO_wdt_private.iTCO_version == 1) {
+ /* Reset the timeout status bit so that the timer
+ * needs to count down twice again before rebooting */
+ outw(0x0008, TCO1_STS); /* write 1 to clear bit */
+
outb(0x01, TCO_RLD);
+ }
spin_unlock(&iTCO_wdt_private.io_lock);
return 0;
unsigned int tmrval;
tmrval = seconds_to_ticks(t);
+
+ /* For TCO v1 the timer counts down twice before rebooting */
+ if (iTCO_wdt_private.iTCO_version == 1)
+ tmrval /= 2;
+
/* from the specs: */
/* "Values of 0h-3h are ignored and should not be attempted" */
if (tmrval < 0x04)
spin_lock(&iTCO_wdt_private.io_lock);
val8 = inb(TCO_RLD);
val8 &= 0x3f;
+ if (!(inw(TCO1_STS) & 0x0008))
+ val8 += (inb(TCOv1_TMR) & 0x3f);
spin_unlock(&iTCO_wdt_private.io_lock);
*time_left = (val8 * 6) / 10;
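The TCO v1 handling in the iTCO_wdt hunks above comes down to a little tick arithmetic. The sketch below is a user-space illustration only. It assumes a 0.6 second tick, which is the inverse of the (val8 * 6) / 10 conversion shown in the patch; tco_timer_value() and tco_v1_time_left() are hypothetical helper names, not functions from the driver.

```c
#include <assert.h>

/* Assumed 0.6 s tick: the inverse of the (val8 * 6) / 10 conversion. */
static unsigned int seconds_to_ticks(unsigned int secs)
{
	return (secs * 10) / 6;
}

/* Timer value to program for a requested timeout (hypothetical helper). */
static unsigned int tco_timer_value(unsigned int secs, int tco_version)
{
	unsigned int tmrval = seconds_to_ticks(secs);

	assert(tco_version == 1 || tco_version == 2);

	/* TCO v1 counts down twice before rebooting, so program half. */
	if (tco_version == 1)
		tmrval /= 2;
	return tmrval;
}

/*
 * Remaining seconds on TCO v1 (hypothetical helper).  While the timeout
 * status bit is still clear, a full second countdown is pending, so the
 * programmed timer value is added to the current reload counter.
 */
static unsigned int tco_v1_time_left(unsigned int rld, unsigned int tmr,
				     int timeout_status_set)
{
	unsigned int ticks = rld & 0x3f;

	if (!timeout_status_set)
		ticks += tmr & 0x3f;
	return (ticks * 6) / 10;
}
```

Under these assumptions a 30 second timeout is programmed as 50 ticks on TCO v2 and 25 ticks on TCO v1, where the second countdown supplies the other half.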
TCOBASE);
/* Clear out the (probably old) status */
- outb(8, TCO1_STS); /* Clear the Time Out Status bit */
- outb(2, TCO2_STS); /* Clear SECOND_TO_STS bit */
- outb(4, TCO2_STS); /* Clear BOOT_STS bit */
+ outw(0x0008, TCO1_STS); /* Clear the Time Out Status bit */
+ outw(0x0002, TCO2_STS); /* Clear SECOND_TO_STS bit */
+ outw(0x0004, TCO2_STS); /* Clear BOOT_STS bit */
/* Make sure the watchdog is not running */
iTCO_wdt_stop();
if (iTCO_wdt_set_heartbeat(heartbeat)) {
iTCO_wdt_set_heartbeat(WATCHDOG_HEARTBEAT);
printk(KERN_INFO PFX
- "heartbeat value must be 2 < heartbeat < 39 (TCO v1) "
- "or 613 (TCO v2), using %d\n", heartbeat);
+ "timeout value out of range, using %d\n", heartbeat);
}
ret = misc_register(&iTCO_wdt_miscdev);
--- /dev/null
+/*
+ * Watchdog driver for IMX2 and later processors
+ *
+ * Copyright (C) 2010 Wolfram Sang, Pengutronix e.K. <w.sang@pengutronix.de>
+ *
+ * some parts adapted from similar drivers by Darius Augulis and Vladimir
+ * Zapolskiy, additional improvements by Wim Van Sebroeck.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published by
+ * the Free Software Foundation.
+ *
+ * NOTE: MX1 has a slightly different Watchdog than MX2 and later:
+ *
+ * MX1: MX2+:
+ * ---- -----
+ * Registers: 32-bit 16-bit
+ * Stoppable timer: Yes No
+ * Need to enable clk: No Yes
+ * Halt on suspend: Manual Can be automatic
+ */
+
+#include <linux/init.h>
+#include <linux/kernel.h>
+#include <linux/miscdevice.h>
+#include <linux/module.h>
+#include <linux/moduleparam.h>
+#include <linux/platform_device.h>
+#include <linux/watchdog.h>
+#include <linux/clk.h>
+#include <linux/fs.h>
+#include <linux/io.h>
+#include <linux/uaccess.h>
+#include <linux/timer.h>
+#include <linux/jiffies.h>
+#include <mach/hardware.h>
+
+#define DRIVER_NAME "imx2-wdt"
+
+#define IMX2_WDT_WCR 0x00 /* Control Register */
+#define IMX2_WDT_WCR_WT (0xFF << 8) /* -> Watchdog Timeout Field */
+#define IMX2_WDT_WCR_WRE (1 << 3) /* -> WDOG Reset Enable */
+#define IMX2_WDT_WCR_WDE (1 << 2) /* -> Watchdog Enable */
+
+#define IMX2_WDT_WSR 0x02 /* Service Register */
+#define IMX2_WDT_SEQ1 0x5555 /* -> service sequence 1 */
+#define IMX2_WDT_SEQ2 0xAAAA /* -> service sequence 2 */
+
+#define IMX2_WDT_MAX_TIME 128
+#define IMX2_WDT_DEFAULT_TIME 60 /* in seconds */
+
+#define WDOG_SEC_TO_COUNT(s) ((((s) * 2) - 1) << 8)
+
+#define IMX2_WDT_STATUS_OPEN 0
+#define IMX2_WDT_STATUS_STARTED 1
+#define IMX2_WDT_EXPECT_CLOSE 2
+
+static struct {
+ struct clk *clk;
+ void __iomem *base;
+ unsigned timeout;
+ unsigned long status;
+ struct timer_list timer; /* Pings the watchdog when closed */
+} imx2_wdt;
+
+static struct miscdevice imx2_wdt_miscdev;
+
+static int nowayout = WATCHDOG_NOWAYOUT;
+module_param(nowayout, int, 0);
+MODULE_PARM_DESC(nowayout, "Watchdog cannot be stopped once started (default="
+ __MODULE_STRING(WATCHDOG_NOWAYOUT) ")");
+
+
+static unsigned timeout = IMX2_WDT_DEFAULT_TIME;
+module_param(timeout, uint, 0);
+MODULE_PARM_DESC(timeout, "Watchdog timeout in seconds (default="
+ __MODULE_STRING(IMX2_WDT_DEFAULT_TIME) ")");
+
+static const struct watchdog_info imx2_wdt_info = {
+ .identity = "imx2+ watchdog",
+ .options = WDIOF_KEEPALIVEPING | WDIOF_SETTIMEOUT | WDIOF_MAGICCLOSE,
+};
+
+static inline void imx2_wdt_setup(void)
+{
+ u16 val = __raw_readw(imx2_wdt.base + IMX2_WDT_WCR);
+
+ /* Strip the old watchdog Time-Out value */
+ val &= ~IMX2_WDT_WCR_WT;
+ /* Generate reset if WDOG times out */
+ val &= ~IMX2_WDT_WCR_WRE;
+ /* Keep Watchdog Disabled */
+ val &= ~IMX2_WDT_WCR_WDE;
+ /* Set the watchdog's Time-Out value */
+ val |= WDOG_SEC_TO_COUNT(imx2_wdt.timeout);
+
+ __raw_writew(val, imx2_wdt.base + IMX2_WDT_WCR);
+
+ /* enable the watchdog */
+ val |= IMX2_WDT_WCR_WDE;
+ __raw_writew(val, imx2_wdt.base + IMX2_WDT_WCR);
+}
+
+static inline void imx2_wdt_ping(void)
+{
+ __raw_writew(IMX2_WDT_SEQ1, imx2_wdt.base + IMX2_WDT_WSR);
+ __raw_writew(IMX2_WDT_SEQ2, imx2_wdt.base + IMX2_WDT_WSR);
+}
+
+static void imx2_wdt_timer_ping(unsigned long arg)
+{
+ /* ping it every imx2_wdt.timeout / 2 seconds to prevent reboot */
+ imx2_wdt_ping();
+ mod_timer(&imx2_wdt.timer, jiffies + imx2_wdt.timeout * HZ / 2);
+}
+
+static void imx2_wdt_start(void)
+{
+ if (!test_and_set_bit(IMX2_WDT_STATUS_STARTED, &imx2_wdt.status)) {
+ /* at our first start we enable clock and do initialisations */
+ clk_enable(imx2_wdt.clk);
+
+ imx2_wdt_setup();
+ } else /* delete the timer that pings the watchdog after close */
+ del_timer_sync(&imx2_wdt.timer);
+
+ /* Watchdog is enabled - time to reload the timeout value */
+ imx2_wdt_ping();
+}
+
+static void imx2_wdt_stop(void)
+{
+ /* we don't need a clk_disable, it cannot be disabled once started.
+ * We use a timer to ping the watchdog while /dev/watchdog is closed */
+ imx2_wdt_timer_ping(0);
+}
+
+static void imx2_wdt_set_timeout(int new_timeout)
+{
+ u16 val = __raw_readw(imx2_wdt.base + IMX2_WDT_WCR);
+
+ /* set the new timeout value in the WCR */
+ val &= ~IMX2_WDT_WCR_WT;
+ val |= WDOG_SEC_TO_COUNT(new_timeout);
+ __raw_writew(val, imx2_wdt.base + IMX2_WDT_WCR);
+}
+
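The read-modify-write that imx2_wdt_set_timeout() performs on the control register can be tried outside the kernel. This is a minimal sketch under stated assumptions: the WT field occupies bits 15..8 and counts in half-second steps, as the definitions above imply. WCR_WT, MAX_TIME, SEC_TO_COUNT() and wcr_with_timeout() are local names for illustration, not the driver's own.

```c
#include <assert.h>
#include <stdint.h>

#define WCR_WT		(0xFFu << 8)	/* timeout field, bits 15..8 */
#define MAX_TIME	128		/* seconds, as in the driver */

/* Half-second steps; the argument is parenthesised so expressions work. */
#define SEC_TO_COUNT(s)	((((unsigned int)(s)) * 2 - 1) << 8)

/* Return the control register value with a new timeout, other bits kept. */
static uint16_t wcr_with_timeout(uint16_t wcr, unsigned int seconds)
{
	assert(seconds >= 1 && seconds <= MAX_TIME);

	wcr &= (uint16_t)~WCR_WT;		/* strip the old timeout */
	wcr |= (uint16_t)SEC_TO_COUNT(seconds);	/* insert the new one */
	return wcr;
}
```

The limits line up with the 8-bit field: 1 second maps to a count of 1 and 128 seconds to 255. The enable bits in the low byte are left untouched.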
+static int imx2_wdt_open(struct inode *inode, struct file *file)
+{
+ if (test_and_set_bit(IMX2_WDT_STATUS_OPEN, &imx2_wdt.status))
+ return -EBUSY;
+
+ imx2_wdt_start();
+ return nonseekable_open(inode, file);
+}
+
+static int imx2_wdt_close(struct inode *inode, struct file *file)
+{
+ if (test_bit(IMX2_WDT_EXPECT_CLOSE, &imx2_wdt.status) && !nowayout)
+ imx2_wdt_stop();
+ else {
+ dev_crit(imx2_wdt_miscdev.parent,
+ "Unexpected close: Expect reboot!\n");
+ imx2_wdt_ping();
+ }
+
+ clear_bit(IMX2_WDT_EXPECT_CLOSE, &imx2_wdt.status);
+ clear_bit(IMX2_WDT_STATUS_OPEN, &imx2_wdt.status);
+ return 0;
+}
+
+static long imx2_wdt_ioctl(struct file *file, unsigned int cmd,
+ unsigned long arg)
+{
+ void __user *argp = (void __user *)arg;
+ int __user *p = argp;
+ int new_value;
+
+ switch (cmd) {
+ case WDIOC_GETSUPPORT:
+ return copy_to_user(argp, &imx2_wdt_info,
+ sizeof(struct watchdog_info)) ? -EFAULT : 0;
+
+ case WDIOC_GETSTATUS:
+ case WDIOC_GETBOOTSTATUS:
+ return put_user(0, p);
+
+ case WDIOC_KEEPALIVE:
+ imx2_wdt_ping();
+ return 0;
+
+ case WDIOC_SETTIMEOUT:
+ if (get_user(new_value, p))
+ return -EFAULT;
+ if ((new_value < 1) || (new_value > IMX2_WDT_MAX_TIME))
+ return -EINVAL;
+ imx2_wdt_set_timeout(new_value);
+ imx2_wdt.timeout = new_value;
+ imx2_wdt_ping();
+
+ /* Fallthrough to return current value */
+ case WDIOC_GETTIMEOUT:
+ return put_user(imx2_wdt.timeout, p);
+
+ default:
+ return -ENOTTY;
+ }
+}
+
+static ssize_t imx2_wdt_write(struct file *file, const char __user *data,
+ size_t len, loff_t *ppos)
+{
+ size_t i;
+ char c;
+
+ if (len == 0) /* Can we even see this? */
+ return 0;
+
+ clear_bit(IMX2_WDT_EXPECT_CLOSE, &imx2_wdt.status);
+ /* scan to see whether or not we got the magic character */
+ for (i = 0; i != len; i++) {
+ if (get_user(c, data + i))
+ return -EFAULT;
+ if (c == 'V')
+ set_bit(IMX2_WDT_EXPECT_CLOSE, &imx2_wdt.status);
+ }
+
+ imx2_wdt_ping();
+ return len;
+}
+
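The magic-close rule implemented by imx2_wdt_write() is easy to model in user space. The sketch below assumes the same behaviour as the function above: a zero-length write leaves the state alone, every other write first clears the flag, and any 'V' in the buffer sets it again. expect_close_after_write() is a hypothetical name used only here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Model of the "expect close" flag after one write to /dev/watchdog.
 * 'previous' is the flag value before the write.
 */
static bool expect_close_after_write(const char *data, size_t len,
				     bool previous)
{
	bool expect;
	size_t i;

	assert(data != NULL || len == 0);

	if (len == 0)		/* nothing written: state is unchanged */
		return previous;

	expect = false;		/* each write starts by clearing the flag */
	for (i = 0; i != len; i++) {
		if (data[i] == 'V')
			expect = true;
	}
	return expect;
}
```

A daemon that wants a clean stop therefore has to make a 'V' part of its final write before closing the device. Any later write without a 'V' rearms the reboot-on-close behaviour.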
+static const struct file_operations imx2_wdt_fops = {
+ .owner = THIS_MODULE,
+ .llseek = no_llseek,
+ .unlocked_ioctl = imx2_wdt_ioctl,
+ .open = imx2_wdt_open,
+ .release = imx2_wdt_close,
+ .write = imx2_wdt_write,
+};
+
+static struct miscdevice imx2_wdt_miscdev = {
+ .minor = WATCHDOG_MINOR,
+ .name = "watchdog",
+ .fops = &imx2_wdt_fops,
+};
+
+static int __init imx2_wdt_probe(struct platform_device *pdev)
+{
+ int ret;
+ int res_size;
+ struct resource *res;
+
+ res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
+ if (!res) {
+ dev_err(&pdev->dev, "can't get device resources\n");
+ return -ENODEV;
+ }
+
+ res_size = resource_size(res);
+ if (!devm_request_mem_region(&pdev->dev, res->start, res_size,
+ res->name)) {
+ dev_err(&pdev->dev, "can't allocate %d bytes at address %#lx\n",
+ res_size, (unsigned long)res->start);
+ return -ENOMEM;
+ }
+
+ imx2_wdt.base = devm_ioremap_nocache(&pdev->dev, res->start, res_size);
+ if (!imx2_wdt.base) {
+ dev_err(&pdev->dev, "ioremap failed\n");
+ return -ENOMEM;
+ }
+
+ imx2_wdt.clk = clk_get_sys("imx-wdt.0", NULL);
+ if (IS_ERR(imx2_wdt.clk)) {
+ dev_err(&pdev->dev, "can't get Watchdog clock\n");
+ return PTR_ERR(imx2_wdt.clk);
+ }
+
+ imx2_wdt.timeout = clamp_t(unsigned, timeout, 1, IMX2_WDT_MAX_TIME);
+ if (imx2_wdt.timeout != timeout)
+ dev_warn(&pdev->dev, "Initial timeout out of range! "
+ "Clamped from %u to %u\n", timeout, imx2_wdt.timeout);
+
+ setup_timer(&imx2_wdt.timer, imx2_wdt_timer_ping, 0);
+
+ imx2_wdt_miscdev.parent = &pdev->dev;
+ ret = misc_register(&imx2_wdt_miscdev);
+ if (ret)
+ goto fail;
+
+ dev_info(&pdev->dev,
+ "IMX2+ Watchdog Timer enabled. timeout=%ds (nowayout=%d)\n",
+ imx2_wdt.timeout, nowayout);
+ return 0;
+
+fail:
+ imx2_wdt_miscdev.parent = NULL;
+ clk_put(imx2_wdt.clk);
+ return ret;
+}
+
+static int __exit imx2_wdt_remove(struct platform_device *pdev)
+{
+ misc_deregister(&imx2_wdt_miscdev);
+
+ if (test_bit(IMX2_WDT_STATUS_STARTED, &imx2_wdt.status)) {
+ del_timer_sync(&imx2_wdt.timer);
+
+ dev_crit(imx2_wdt_miscdev.parent,
+ "Device removed: Expect reboot!\n");
+ } else
+ clk_put(imx2_wdt.clk);
+
+ imx2_wdt_miscdev.parent = NULL;
+ return 0;
+}
+
+static void imx2_wdt_shutdown(struct platform_device *pdev)
+{
+ if (test_bit(IMX2_WDT_STATUS_STARTED, &imx2_wdt.status)) {
+ /* we are running, we need to delete the timer but will give
+ * max timeout before reboot will take place */
+ del_timer_sync(&imx2_wdt.timer);
+ imx2_wdt_set_timeout(IMX2_WDT_MAX_TIME);
+ imx2_wdt_ping();
+
+ dev_crit(imx2_wdt_miscdev.parent,
+ "Device shutdown: Expect reboot!\n");
+ }
+}
+
+static struct platform_driver imx2_wdt_driver = {
+ .probe = imx2_wdt_probe,
+ .remove = __exit_p(imx2_wdt_remove),
+ .shutdown = imx2_wdt_shutdown,
+ .driver = {
+ .name = DRIVER_NAME,
+ .owner = THIS_MODULE,
+ },
+};
+
+static int __init imx2_wdt_init(void)
+{
+ return platform_driver_probe(&imx2_wdt_driver, imx2_wdt_probe);
+}
+module_init(imx2_wdt_init);
+
+static void __exit imx2_wdt_exit(void)
+{
+ platform_driver_unregister(&imx2_wdt_driver);
+}
+module_exit(imx2_wdt_exit);
+
+MODULE_AUTHOR("Wolfram Sang");
+MODULE_DESCRIPTION("Watchdog driver for IMX2 and later");
+MODULE_LICENSE("GPL v2");
+MODULE_ALIAS_MISCDEV(WATCHDOG_MINOR);
+MODULE_ALIAS("platform:" DRIVER_NAME);
static u16 timeout = 0xffff;
module_param(timeout, ushort, 0);
MODULE_PARM_DESC(timeout,
- "Watchdog timeout in ticks. (0<timeout<65536, default=65535");
+ "Watchdog timeout in ticks. (0<timeout<65536, default=65535)");
static int reset = 1;
module_param(reset, bool, 0);
#define WDTO 0x11 /* Watchdog timeout register */
#define WDCFG 0x12 /* Watchdog config register */
-static int io = 0x2E; /* Address used on Portwell Boards */
+#define IO_DEFAULT 0x2E /* Address used on Portwell Boards */
+
+static int io = IO_DEFAULT;
static int timeout = DEFAULT_TIMEOUT; /* timeout value */
static unsigned long timer_enabled; /* is the timer enabled? */
MODULE_ALIAS_MISCDEV(WATCHDOG_MINOR);
module_param(io, int, 0);
-MODULE_PARM_DESC(io, MODNAME " I/O port (default: " __MODULE_STRING(io) ").");
+MODULE_PARM_DESC(io, MODNAME " I/O port (default: "
+ __MODULE_STRING(IO_DEFAULT) ").");
module_param(timeout, int, 0);
MODULE_PARM_DESC(timeout,
"Watchdog timeout in minutes (default="
- __MODULE_STRING(timeout) ").");
+ __MODULE_STRING(DEFAULT_TIMEOUT) ").");
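The MODULE_PARM_DESC changes in this part of the patch all fix the same preprocessor pitfall: stringifying a variable name yields the name, not its value. The sketch below reproduces it with a local two-level STRINGIFY macro, assuming __MODULE_STRING expands its argument before stringifying in the same way. All names are local to the example.

```c
#include <assert.h>
#include <string.h>

#define STRINGIFY_1(x)	#x
#define STRINGIFY(x)	STRINGIFY_1(x)	/* expands macros, then stringifies */

#define IO_DEFAULT	0x2E
static int io = IO_DEFAULT;

#define SECONDS		30
#define FREQ		68000000U
#define TIMEOUT_EXPR	(SECONDS * FREQ)
#define TIMEOUT_LITERAL	2040000000U

/* A variable is not a macro, so only its name is stringified. */
static const char *const io_var_text = STRINGIFY(io);
/* A macro is expanded first, so the default value appears. */
static const char *const io_macro_text = STRINGIFY(IO_DEFAULT);
/* An expression is expanded but never evaluated by the preprocessor. */
static const char *const expr_text = STRINGIFY(TIMEOUT_EXPR);
/* A plain literal gives readable help text. */
static const char *const literal_text = STRINGIFY(TIMEOUT_LITERAL);

int current_io(void)
{
	return io;
}
```

This is presumably why the pnx833x hunk further down adds a separate literal for the help text next to the computed default: the preprocessor can substitute tokens but cannot multiply them.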
module_param(nowayout, int, 0);
MODULE_PARM_DESC(nowayout,
#define PFX "pnx833x: "
#define WATCHDOG_TIMEOUT 30 /* 30 sec Maximum timeout */
#define WATCHDOG_COUNT_FREQUENCY 68000000U /* Watchdog counts at 68MHZ. */
+#define PNX_WATCHDOG_TIMEOUT (WATCHDOG_TIMEOUT * WATCHDOG_COUNT_FREQUENCY)
+#define PNX_TIMEOUT_VALUE 2040000000U
/** CONFIG block */
#define PNX833X_CONFIG (0x07000U)
static int pnx833x_wdt_alive;
/* Set default timeout in MHZ.*/
-static int pnx833x_wdt_timeout = (WATCHDOG_TIMEOUT * WATCHDOG_COUNT_FREQUENCY);
+static int pnx833x_wdt_timeout = PNX_WATCHDOG_TIMEOUT;
module_param(pnx833x_wdt_timeout, int, 0);
MODULE_PARM_DESC(timeout, "Watchdog timeout in Mhz. (68Mhz clock), default="
- __MODULE_STRING(pnx833x_wdt_timeout) "(30 seconds).");
+ __MODULE_STRING(PNX_TIMEOUT_VALUE) "(30 seconds).");
static int nowayout = WATCHDOG_NOWAYOUT;
module_param(nowayout, int, 0);
MODULE_PARM_DESC(nowayout, "Watchdog cannot be stopped once started (default="
__MODULE_STRING(WATCHDOG_NOWAYOUT) ")");
-static int start_enabled = 1;
+#define START_DEFAULT 1
+static int start_enabled = START_DEFAULT;
module_param(start_enabled, int, 0);
MODULE_PARM_DESC(start_enabled, "Watchdog is started on module insertion "
- "(default=" __MODULE_STRING(start_enabled) ")");
+ "(default=" __MODULE_STRING(START_DEFAULT) ")");
static void pnx833x_wdt_start(void)
{
module_param(soft_noboot, int, 0);
module_param(debug, int, 0);
-MODULE_PARM_DESC(tmr_margin, "Watchdog tmr_margin in seconds. default="
+MODULE_PARM_DESC(tmr_margin, "Watchdog tmr_margin in seconds. (default="
__MODULE_STRING(CONFIG_S3C2410_WATCHDOG_DEFAULT_TIME) ")");
MODULE_PARM_DESC(tmr_atboot,
"Watchdog is started at boot time if set to 1, default="
MODULE_PARM_DESC(nowayout, "Watchdog cannot be stopped once started (default="
__MODULE_STRING(WATCHDOG_NOWAYOUT) ")");
MODULE_PARM_DESC(soft_noboot, "Watchdog action, set to 1 to ignore reboots, "
- "0 to reboot (default depends on ONLY_TESTING)");
-MODULE_PARM_DESC(debug, "Watchdog debug, set to >1 for debug, (default 0)");
+ "0 to reboot (default 0)");
+MODULE_PARM_DESC(debug, "Watchdog debug, set to >1 for debug (default 0)");
static unsigned long open_lock;
static struct device *wdt_dev; /* platform device attached to */
wdt_mem = request_mem_region(res->start, size, pdev->name);
if (wdt_mem == NULL) {
dev_err(dev, "failed to get memory region\n");
- ret = -ENOENT;
- goto err_req;
+ return -EBUSY;
}
wdt_base = ioremap(res->start, size);
module_param(clock_division_ratio, int, 0);
MODULE_PARM_DESC(clock_division_ratio,
"Clock division ratio. Valid ranges are from 0x5 (1.31ms) "
- "to 0x7 (5.25ms). (default=" __MODULE_STRING(clock_division_ratio) ")");
+ "to 0x7 (5.25ms). (default=" __MODULE_STRING(WTCSR_CKS_4096) ")");
module_param(heartbeat, int, 0);
MODULE_PARM_DESC(heartbeat,
twl4030_wdt_dev = pdev;
+ twl4030_wdt_disable(wdt);
+
ret = misc_register(&wdt->miscdev);
if (ret) {
dev_err(wdt->miscdev.parent,
static int type = 500;
module_param(type, int, 0);
MODULE_PARM_DESC(type,
- "WDT501-P Card type (500 or 501 , default=500)");
+ "WDT501-P Card type (500 or 501, default=500)");
/*
* Programming support
static DEFINE_SPINLOCK(spinlock);
module_param(timeout, int, 0);
-MODULE_PARM_DESC(timeout, "Watchdog timeout in seconds (60..15300), default="
+MODULE_PARM_DESC(timeout, "Watchdog timeout in seconds (60..15300, default="
__MODULE_STRING(DEFAULT_TIMEOUT) ")");
module_param(testmode, int, 0);
MODULE_PARM_DESC(testmode, "Watchdog testmode (1 = no reboot), default=0");
kfree(str);
}
+#ifdef CONFIG_MAGIC_SYSRQ
static void sysrq_handler(struct xenbus_watch *watch, const char **vec,
unsigned int len)
{
handle_sysrq(sysrq_key, NULL);
}
-static struct xenbus_watch shutdown_watch = {
- .node = "control/shutdown",
- .callback = shutdown_handler
-};
-
static struct xenbus_watch sysrq_watch = {
.node = "control/sysrq",
.callback = sysrq_handler
};
+#endif
+
+static struct xenbus_watch shutdown_watch = {
+ .node = "control/shutdown",
+ .callback = shutdown_handler
+};
static int setup_shutdown_watcher(void)
{
return err;
}
+#ifdef CONFIG_MAGIC_SYSRQ
err = register_xenbus_watch(&sysrq_watch);
if (err) {
printk(KERN_ERR "Failed to set sysrq watcher\n");
return err;
}
+#endif
return 0;
}
* use STACK_TOP because that can depend on attributes which aren't
* configured yet.
*/
+ BUG_ON(VM_STACK_FLAGS & VM_STACK_INCOMPLETE_SETUP);
vma->vm_end = STACK_TOP_MAX;
vma->vm_start = vma->vm_end - PAGE_SIZE;
- vma->vm_flags = VM_STACK_FLAGS;
+ vma->vm_flags = VM_STACK_FLAGS | VM_STACK_INCOMPLETE_SETUP;
vma->vm_page_prot = vm_get_page_prot(vma->vm_flags);
INIT_LIST_HEAD(&vma->anon_vma_chain);
err = insert_vm_struct(mm, vma);
else if (executable_stack == EXSTACK_DISABLE_X)
vm_flags &= ~VM_EXEC;
vm_flags |= mm->def_flags;
+ vm_flags |= VM_STACK_INCOMPLETE_SETUP;
ret = mprotect_fixup(vma, &prev, vma->vm_start, vma->vm_end,
vm_flags);
goto out_unlock;
}
+ /* mprotect_fixup is overkill to remove the temporary stack flags */
+ vma->vm_flags &= ~VM_STACK_INCOMPLETE_SETUP;
+
stack_expand = 131072UL; /* randomly 32*4k (or 2*64k) pages */
stack_size = vma->vm_end - vma->vm_start;
/*
while (*fclus < cluster) {
/* prevent the infinite loop of cluster chain */
if (*fclus > limit) {
- fat_fs_error(sb, "%s: detected the cluster chain loop"
- " (i_pos %lld)", __func__,
- MSDOS_I(inode)->i_pos);
+ fat_fs_error_ratelimit(sb,
+ "%s: detected the cluster chain loop"
+ " (i_pos %lld)", __func__,
+ MSDOS_I(inode)->i_pos);
nr = -EIO;
goto out;
}
if (nr < 0)
goto out;
else if (nr == FAT_ENT_FREE) {
- fat_fs_error(sb, "%s: invalid cluster chain"
- " (i_pos %lld)", __func__,
- MSDOS_I(inode)->i_pos);
+ fat_fs_error_ratelimit(sb, "%s: invalid cluster chain"
+ " (i_pos %lld)", __func__,
+ MSDOS_I(inode)->i_pos);
nr = -EIO;
goto out;
} else if (nr == FAT_ENT_EOF) {
#include <linux/nls.h>
#include <linux/fs.h>
#include <linux/mutex.h>
+#include <linux/ratelimit.h>
#include <linux/msdos_fs.h>
/*
struct fatent_operations *fatent_ops;
struct inode *fat_inode;
+ struct ratelimit_state ratelimit;
+
spinlock_t inode_hash_lock;
struct hlist_head inode_hashtable[FAT_HASH_SIZE];
};
extern int fat_flush_inodes(struct super_block *sb, struct inode *i1,
struct inode *i2);
/* fat/misc.c */
-extern void fat_fs_error(struct super_block *s, const char *fmt, ...)
- __attribute__ ((format (printf, 2, 3))) __cold;
+extern void
+__fat_fs_error(struct super_block *s, int report, const char *fmt, ...)
+ __attribute__ ((format (printf, 3, 4))) __cold;
+#define fat_fs_error(s, fmt, args...) \
+ __fat_fs_error(s, 1, fmt , ## args)
+#define fat_fs_error_ratelimit(s, fmt, args...) \
+ __fat_fs_error(s, __ratelimit(&MSDOS_SB(s)->ratelimit), fmt , ## args)
extern int fat_clusters_flush(struct super_block *sb);
extern int fat_chain_add(struct inode *inode, int new_dclus, int nr_cluster);
extern void fat_time_fat2unix(struct msdos_sb_info *sbi, struct timespec *ts,
sb->s_op = &fat_sops;
sb->s_export_op = &fat_export_ops;
sbi->dir_ops = fs_dir_inode_ops;
+ ratelimit_state_init(&sbi->ratelimit, DEFAULT_RATELIMIT_INTERVAL,
+ DEFAULT_RATELIMIT_BURST);
error = parse_options(data, isvfat, silent, &debug, &sbi->options);
if (error)
* In case the file system is remounted read-only, it can be made writable
* again by remounting it.
*/
-void fat_fs_error(struct super_block *s, const char *fmt, ...)
+void __fat_fs_error(struct super_block *s, int report, const char *fmt, ...)
{
struct fat_mount_options *opts = &MSDOS_SB(s)->options;
va_list args;
- printk(KERN_ERR "FAT: Filesystem error (dev %s)\n", s->s_id);
+ if (report) {
+ printk(KERN_ERR "FAT: Filesystem error (dev %s)\n", s->s_id);
- printk(KERN_ERR " ");
- va_start(args, fmt);
- vprintk(fmt, args);
- va_end(args);
- printk("\n");
+ printk(KERN_ERR " ");
+ va_start(args, fmt);
+ vprintk(fmt, args);
+ va_end(args);
+ printk("\n");
+ }
if (opts->errors == FAT_ERRORS_PANIC)
- panic(" FAT fs panic from previous error\n");
+ panic("FAT: fs panic from previous error\n");
else if (opts->errors == FAT_ERRORS_RO && !(s->s_flags & MS_RDONLY)) {
s->s_flags |= MS_RDONLY;
- printk(KERN_ERR " File system has been set read-only\n");
+ printk(KERN_ERR "FAT: Filesystem has been set read-only\n");
}
}
-EXPORT_SYMBOL_GPL(fat_fs_error);
+EXPORT_SYMBOL_GPL(__fat_fs_error);
/* Flushes the number of free clusters on FAT32 */
/* XXX: Need to write one per FSINFO block. Currently only writes 1 */
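The fat_fs_error_ratelimit() change above passes the result of __ratelimit() as the report argument, so repeated corruption reports on the same superblock are throttled. The following is only a rough userspace sketch of an interval/burst limiter in that spirit. The names rl_sketch, rl_allow and rl_count_allowed are hypothetical and are not the kernel's struct ratelimit_state or __ratelimit(); time is passed in explicitly as abstract ticks so the behaviour is easy to follow.

```c
/* Hypothetical userspace model; not the kernel's struct ratelimit_state. */
struct rl_sketch {
	long interval;	/* window length, in arbitrary ticks */
	int burst;	/* messages allowed per window */
	long begin;	/* start of the current window */
	int printed;	/* messages already allowed in this window */
	int started;	/* nonzero once the first window has begun */
};

/* Return 1 if a message at time 'now' may be printed, 0 if suppressed. */
static int rl_allow(struct rl_sketch *rs, long now)
{
	/* Open a new window on first use or once the interval has elapsed. */
	if (!rs->started || now - rs->begin >= rs->interval) {
		rs->begin = now;
		rs->printed = 0;
		rs->started = 1;
	}
	if (rs->printed < rs->burst) {
		rs->printed++;
		return 1;
	}
	return 0;
}

/* Simulate n_events errors spaced 'step' ticks apart; count those reported. */
static int rl_count_allowed(long interval, int burst, int n_events, long step)
{
	struct rl_sketch rs = { interval, burst, 0, 0, 0 };
	int i, allowed = 0;

	for (i = 0; i < n_events; i++)
		allowed += rl_allow(&rs, i * step);
	return allowed;
}
```

Note that in the patch only the printk() calls are gated by report; the panic and remount-read-only handling in __fat_fs_error() still runs on every call, whether or not the message was suppressed.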
wait_queue_head_t *wqh;
wqh = bit_waitqueue(&inode->i_state, __I_SYNC);
- do {
+ while (inode->i_state & I_SYNC) {
spin_unlock(&inode_lock);
__wait_on_bit(wqh, &wq, inode_wait, TASK_UNINTERRUPTIBLE);
spin_lock(&inode_lock);
- } while (inode->i_state & I_SYNC);
+ }
}
/*
void *buffer, size_t size, int xtype)
{
struct inode *inode = dentry->d_inode;
+ struct gfs2_sbd *sdp = GFS2_SB(inode);
struct posix_acl *acl;
int type;
int error;
+ if (!sdp->sd_args.ar_posix_acl)
+ return -EOPNOTSUPP;
+
type = gfs2_acl_type(name);
if (type < 0)
return type;
if (error)
goto out_drop_write;
+ error = -EACCES;
+ if (!is_owner_or_cap(inode))
+ goto out;
+
+ error = 0;
flags = ip->i_diskflags;
new_flags = (flags & ~mask) | (reqflags & mask);
if ((new_flags ^ flags) == 0)
{
struct inode *inode = filp->f_path.dentry->d_inode;
u32 fsflags, gfsflags;
+
if (get_user(fsflags, ptr))
return -EFAULT;
+
gfsflags = fsflags_cvt(fsflags_to_gfs2, fsflags);
if (!S_ISDIR(inode->i_mode)) {
if (gfsflags & GFS2_DIF_INHERIT_JDATA)
}
/**
- * gfs2_unlinked_inode_lookup - Lookup an unlinked inode for reclamation
+ * gfs2_process_unlinked_inode - Lookup an unlinked inode for reclamation
+ * and try to reclaim it by doing iput.
+ *
+ * This function assumes no rgrp locks are currently held.
+ *
* @sb: The super block
* no_addr: The inode number
- * @@inode: A pointer to the inode found, if any
*
- * Returns: 0 and *inode if no errors occurred. If an error occurs,
- * the resulting *inode may or may not be NULL.
*/
-int gfs2_unlinked_inode_lookup(struct super_block *sb, u64 no_addr,
- struct inode **inode)
+void gfs2_process_unlinked_inode(struct super_block *sb, u64 no_addr)
{
struct gfs2_sbd *sdp;
struct gfs2_inode *ip;
struct gfs2_glock *io_gl;
int error;
struct gfs2_holder gh;
+ struct inode *inode;
- *inode = gfs2_iget_skip(sb, no_addr);
+ inode = gfs2_iget_skip(sb, no_addr);
- if (!(*inode))
- return -ENOBUFS;
+ if (!inode)
+ return;
- if (!((*inode)->i_state & I_NEW))
- return -ENOBUFS;
+ /* If it's not a new inode, someone's using it, so leave it alone. */
+ if (!(inode->i_state & I_NEW)) {
+ iput(inode);
+ return;
+ }
- ip = GFS2_I(*inode);
- sdp = GFS2_SB(*inode);
+ ip = GFS2_I(inode);
+ sdp = GFS2_SB(inode);
ip->i_no_formal_ino = -1;
error = gfs2_glock_get(sdp, no_addr, &gfs2_inode_glops, CREATE, &ip->i_gl);
set_bit(GIF_INVALID, &ip->i_flags);
error = gfs2_glock_nq_init(io_gl, LM_ST_SHARED, LM_FLAG_TRY | GL_EXACT,
&ip->i_iopen_gh);
- if (unlikely(error)) {
- if (error == GLR_TRYFAILED)
- error = 0;
+ if (unlikely(error))
goto fail_iopen;
- }
+
ip->i_iopen_gh.gh_gl->gl_object = ip;
gfs2_glock_put(io_gl);
- (*inode)->i_mode = DT2IF(DT_UNKNOWN);
+ inode->i_mode = DT2IF(DT_UNKNOWN);
/*
* We must read the inode in order to work out its type in
*/
error = gfs2_glock_nq_init(ip->i_gl, LM_ST_EXCLUSIVE, LM_FLAG_TRY,
&gh);
- if (unlikely(error)) {
- if (error == GLR_TRYFAILED)
- error = 0;
+ if (unlikely(error))
goto fail_glock;
- }
+
/* Inode is now uptodate */
gfs2_glock_dq_uninit(&gh);
- gfs2_set_iop(*inode);
+ gfs2_set_iop(inode);
+
+ /* The iput will cause it to be deleted. */
+ iput(inode);
+ return;
- return 0;
fail_glock:
gfs2_glock_dq(&ip->i_iopen_gh);
fail_iopen:
ip->i_gl->gl_object = NULL;
gfs2_glock_put(ip->i_gl);
fail:
- return error;
+ iget_failed(inode);
+ return;
}
static int gfs2_dinode_in(struct gfs2_inode *ip, const void *buf)
extern void gfs2_set_iop(struct inode *inode);
extern struct inode *gfs2_inode_lookup(struct super_block *sb, unsigned type,
u64 no_addr, u64 no_formal_ino);
-extern int gfs2_unlinked_inode_lookup(struct super_block *sb, u64 no_addr,
- struct inode **inode);
+extern void gfs2_process_unlinked_inode(struct super_block *sb, u64 no_addr);
extern struct inode *gfs2_ilookup(struct super_block *sb, u64 no_addr);
extern int gfs2_inode_refresh(struct gfs2_inode *ip);
*
*/
-void __gfs2_log_flush(struct gfs2_sbd *sdp, struct gfs2_glock *gl)
+void gfs2_log_flush(struct gfs2_sbd *sdp, struct gfs2_glock *gl)
{
struct gfs2_ail *ai;
sdp->sd_log_head = sdp->sd_log_tail = value;
}
-unsigned int gfs2_struct2blk(struct gfs2_sbd *sdp, unsigned int nstruct,
+extern unsigned int gfs2_struct2blk(struct gfs2_sbd *sdp, unsigned int nstruct,
unsigned int ssize);
-int gfs2_log_reserve(struct gfs2_sbd *sdp, unsigned int blks);
-void gfs2_log_incr_head(struct gfs2_sbd *sdp);
+extern int gfs2_log_reserve(struct gfs2_sbd *sdp, unsigned int blks);
+extern void gfs2_log_incr_head(struct gfs2_sbd *sdp);
-struct buffer_head *gfs2_log_get_buf(struct gfs2_sbd *sdp);
-struct buffer_head *gfs2_log_fake_buf(struct gfs2_sbd *sdp,
+extern struct buffer_head *gfs2_log_get_buf(struct gfs2_sbd *sdp);
+extern struct buffer_head *gfs2_log_fake_buf(struct gfs2_sbd *sdp,
struct buffer_head *real);
-void __gfs2_log_flush(struct gfs2_sbd *sdp, struct gfs2_glock *gl);
+extern void gfs2_log_flush(struct gfs2_sbd *sdp, struct gfs2_glock *gl);
+extern void gfs2_log_commit(struct gfs2_sbd *sdp, struct gfs2_trans *trans);
+extern void gfs2_remove_from_ail(struct gfs2_bufdata *bd);
-static inline void gfs2_log_flush(struct gfs2_sbd *sbd, struct gfs2_glock *gl)
-{
- if (!gl || test_bit(GLF_LFLUSH, &gl->gl_flags))
- __gfs2_log_flush(sbd, gl);
-}
-
-void gfs2_log_commit(struct gfs2_sbd *sdp, struct gfs2_trans *trans);
-void gfs2_remove_from_ail(struct gfs2_bufdata *bd);
-
-void gfs2_log_shutdown(struct gfs2_sbd *sdp);
-void gfs2_meta_syncfs(struct gfs2_sbd *sdp);
-int gfs2_logd(void *data);
+extern void gfs2_log_shutdown(struct gfs2_sbd *sdp);
+extern void gfs2_meta_syncfs(struct gfs2_sbd *sdp);
+extern int gfs2_logd(void *data);
#endif /* __LOG_DOT_H__ */
{
struct gfs2_sbd *sdp = GFS2_SB(&ip->i_inode);
struct gfs2_alloc *al = ip->i_alloc;
- struct inode *inode;
int error = 0;
u64 last_unlinked = NO_BLOCK, unlinked;
if (error)
return error;
+ /* Find an rgrp suitable for allocation. If it encounters any unlinked
+ dinodes along the way, error will equal -EAGAIN and unlinked will
+ contain its block address. We then need to look up that inode and
+ try to free it, and try the allocation again. */
error = get_local_rgrp(ip, &unlinked, &last_unlinked);
if (error) {
if (ip != GFS2_I(sdp->sd_rindex))
gfs2_glock_dq_uninit(&al->al_ri_gh);
if (error != -EAGAIN)
return error;
- error = gfs2_unlinked_inode_lookup(ip->i_inode.i_sb,
- unlinked, &inode);
- if (inode)
- iput(inode);
+
+ gfs2_process_unlinked_inode(ip->i_inode.i_sb, unlinked);
+ /* regardless of whether or not gfs2_process_unlinked_inode
+ was successful, we don't want to repeat it again. */
+ last_unlinked = unlinked;
gfs2_log_flush(sdp, NULL);
- if (error == GLR_TRYFAILED)
- error = 0;
+ error = 0;
+
goto try_again;
}
-
+ /* no error, so we have the rgrp set in the inode's allocation. */
al->al_file = file;
al->al_line = line;
goto out_nomem;
rc = strict_strtoul(string, 10, &option);
kfree(string);
- if (rc != 0 || option > USHORT_MAX)
+ if (rc != 0 || option > USHRT_MAX)
goto out_invalid_value;
mnt->nfs_server.port = option;
break;
goto out_nomem;
rc = strict_strtoul(string, 10, &option);
kfree(string);
- if (rc != 0 || option > USHORT_MAX)
+ if (rc != 0 || option > USHRT_MAX)
goto out_invalid_value;
mnt->mount_server.port = option;
break;
if (sscanf(buf, "%15s %4u", transport, &port) != 2)
return -EINVAL;
- if (port < 1 || port > USHORT_MAX)
+ if (port < 1 || port > USHRT_MAX)
return -EINVAL;
err = nfsd_create_serv();
if (sscanf(&buf[1], "%15s %4u", transport, &port) != 2)
return -EINVAL;
- if (port < 1 || port > USHORT_MAX || nfsd_serv == NULL)
+ if (port < 1 || port > USHRT_MAX || nfsd_serv == NULL)
return -EINVAL;
xprt = svc_find_xprt(nfsd_serv, transport, AF_UNSPEC, port);
* the page at all. For a more detailed explanation see ntfs_truncate() in
* fs/ntfs/inode.c.
*
- * @cached_page and @lru_pvec are just optimizations for dealing with multiple
- * pages.
- *
* Return 0 on success and -errno on error. In the case that an error is
* encountered it is possible that the initialized size will already have been
* incremented some way towards @new_init_size but it is guaranteed that if
* Locking: i_mutex on the vfs inode corrseponsind to the ntfs inode @ni must be
* held by the caller.
*/
-static int ntfs_attr_extend_initialized(ntfs_inode *ni, const s64 new_init_size,
- struct page **cached_page, struct pagevec *lru_pvec)
+static int ntfs_attr_extend_initialized(ntfs_inode *ni, const s64 new_init_size)
{
s64 old_init_size;
loff_t old_i_size;
* Obtain @nr_pages locked page cache pages from the mapping @mapping and
* starting at index @index.
*
- * If a page is newly created, increment its refcount and add it to the
- * caller's lru-buffering pagevec @lru_pvec.
- *
- * This is the same as mm/filemap.c::__grab_cache_page(), except that @nr_pages
- * are obtained at once instead of just one page and that 0 is returned on
- * success and -errno on error.
+ * If a page is newly created, add it to the LRU list.
*
* Note, the page locks are obtained in ascending page index order.
*/
static inline int __ntfs_grab_cache_pages(struct address_space *mapping,
pgoff_t index, const unsigned nr_pages, struct page **pages,
- struct page **cached_page, struct pagevec *lru_pvec)
+ struct page **cached_page)
{
int err, nr;
goto err_out;
}
}
- err = add_to_page_cache(*cached_page, mapping, index,
+ err = add_to_page_cache_lru(*cached_page, mapping, index,
GFP_KERNEL);
if (unlikely(err)) {
if (err == -EEXIST)
goto err_out;
}
pages[nr] = *cached_page;
- page_cache_get(*cached_page);
- if (unlikely(!pagevec_add(lru_pvec, *cached_page)))
- __pagevec_lru_add_file(lru_pvec);
*cached_page = NULL;
}
index++;
ssize_t status, written;
unsigned nr_pages;
int err;
- struct pagevec lru_pvec;
ntfs_debug("Entering for i_ino 0x%lx, attribute type 0x%x, "
"pos 0x%llx, count 0x%lx.",
}
}
}
- pagevec_init(&lru_pvec, 0);
written = 0;
/*
* If the write starts beyond the initialized size, extend it up to the
ll = ni->initialized_size;
read_unlock_irqrestore(&ni->size_lock, flags);
if (pos > ll) {
- err = ntfs_attr_extend_initialized(ni, pos, &cached_page,
- &lru_pvec);
+ err = ntfs_attr_extend_initialized(ni, pos);
if (err < 0) {
ntfs_error(vol->sb, "Cannot perform write to inode "
"0x%lx, attribute type 0x%x, because "
ntfs_fault_in_pages_readable_iovec(iov, iov_ofs, bytes);
/* Get and lock @do_pages starting at index @start_idx. */
status = __ntfs_grab_cache_pages(mapping, start_idx, do_pages,
- pages, &cached_page, &lru_pvec);
+ pages, &cached_page);
if (unlikely(status))
break;
/*
*ppos = pos;
if (cached_page)
page_cache_release(cached_page);
- pagevec_lru_add_file(&lru_pvec);
ntfs_debug("Done. Returning %s (written 0x%lx, status %li).",
written ? "written" : "status", (unsigned long)written,
(long)status);
* No ecc'd ocfs2 structure is larger than 4K, so ecc will be no
* larger than 16 bits.
*/
- BUG_ON(ecc > USHORT_MAX);
+ BUG_ON(ecc > USHRT_MAX);
bc->bc_crc32e = cpu_to_le32(crc);
bc->bc_ecc = cpu_to_le16((u16)ecc);
* No ecc'd ocfs2 structure is larger than 4K, so ecc will be no
* larger than 16 bits.
*/
- BUG_ON(ecc > USHORT_MAX);
+ BUG_ON(ecc > USHRT_MAX);
bc->bc_crc32e = cpu_to_le32(crc);
bc->bc_ecc = cpu_to_le16((u16)ecc);
#include <linux/slab.h>
#include <linux/pagemap.h>
#include <linux/stringify.h>
+#include <linux/kernel.h>
#include "ldm.h"
#include "check.h"
#include "msdos.h"
int h;
/* high part */
- if ((x = src[0] - '0') <= '9'-'0') h = x;
- else if ((x = src[0] - 'a') <= 'f'-'a') h = x+10;
- else if ((x = src[0] - 'A') <= 'F'-'A') h = x+10;
- else return -1;
- h <<= 4;
+ x = h = hex_to_bin(src[0]);
+ if (h < 0)
+ return -1;
/* low part */
- if ((x = src[1] - '0') <= '9'-'0') return h | x;
- if ((x = src[1] - 'a') <= 'f'-'a') return h | (x+10);
- if ((x = src[1] - 'A') <= 'F'-'A') return h | (x+10);
- return -1;
+ h = hex_to_bin(src[1]);
+ if (h < 0)
+ return -1;
+
+ return (x << 4) + h;
}
/**
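The hunk above replaces the open-coded digit comparisons with two calls to hex_to_bin(), one per nibble, and combines them as (high << 4) + low. A minimal userspace sketch of the same idea follows. hex_to_bin_sketch() is a hypothetical stand-in written for this illustration, assuming only the contract the patch relies on (a value 0..15 for a hex digit, negative otherwise); parse_hexbyte_sketch() is likewise an illustrative name, not the function in the patched file.

```c
#include <ctype.h>

/*
 * Hypothetical stand-in for the kernel's hex_to_bin(): returns 0..15 for a
 * hexadecimal digit and -1 for anything else.
 */
static int hex_to_bin_sketch(char ch)
{
	if (ch >= '0' && ch <= '9')
		return ch - '0';
	ch = (char)tolower((unsigned char)ch);
	if (ch >= 'a' && ch <= 'f')
		return ch - 'a' + 10;
	return -1;
}

/* Convert two hex characters at src into a byte value, or -1 on error. */
static int parse_hexbyte_sketch(const unsigned char *src)
{
	int hi, lo;

	/* high nibble */
	hi = hex_to_bin_sketch((char)src[0]);
	if (hi < 0)
		return -1;

	/* low nibble */
	lo = hex_to_bin_sketch((char)src[1]);
	if (lo < 0)
		return -1;

	return (hi << 4) + lo;
}
```

Checking each nibble separately before combining them keeps the error path explicit, which is the same structure the patched function uses.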
return err;
}
+#ifdef CONFIG_HUGETLB_PAGE
static u64 huge_pte_to_pagemap_entry(pte_t pte, int offset)
{
u64 pme = 0;
return err;
}
+#endif /* CONFIG_HUGETLB_PAGE */
/*
* /proc/pid/pagemap - an array mapping virtual pages to pfns
pagemap_walk.pmd_entry = pagemap_pte_range;
pagemap_walk.pte_hole = pagemap_pte_hole;
+#ifdef CONFIG_HUGETLB_PAGE
pagemap_walk.hugetlb_entry = pagemap_hugetlb_range;
+#endif
pagemap_walk.mm = mm;
pagemap_walk.private = &pm;
#include <linux/pagemap.h>
#include <linux/net.h>
#include <linux/namei.h>
-#include <linux/slab.h>
#include <asm/uaccess.h>
#include <asm/system.h>
xfs_itable.o \
xfs_dfrag.o \
xfs_log.o \
+ xfs_log_cil.o \
xfs_log_recover.o \
xfs_mount.o \
xfs_mru_cache.o \
#include "xfs_sb.h"
#include "xfs_inum.h"
+#include "xfs_log.h"
#include "xfs_ag.h"
#include "xfs_dmapi.h"
#include "xfs_mount.h"
* Note that this in no way locks the underlying pages, so it is only
* useful for synchronizing concurrent use of buffer objects, not for
* synchronizing independent access to the underlying pages.
+ *
+ * If we come across a stale, pinned, locked buffer, we know that we
+ * are being asked to lock a buffer that has been reallocated. Because
+ * it is pinned, we know that the log has not been pushed to disk and
+ * hence it will still be locked. Rather than sleeping until someone
+ * else pushes the log, push it ourselves before trying to get the lock.
*/
void
xfs_buf_lock(
{
trace_xfs_buf_lock(bp, _RET_IP_);
+ if (atomic_read(&bp->b_pin_count) && (bp->b_flags & XBF_STALE))
+ xfs_log_force(bp->b_mount, 0);
if (atomic_read(&bp->b_io_remaining))
blk_run_address_space(bp->b_target->bt_mapping);
down(&bp->b_sema);
#include "xfs_dmapi.h"
#include "xfs_sb.h"
#include "xfs_inum.h"
+#include "xfs_log.h"
#include "xfs_ag.h"
#include "xfs_mount.h"
#include "xfs_quota.h"
#define MNTOPT_DMAPI "dmapi" /* DMI enabled (DMAPI / XDSM) */
#define MNTOPT_XDSM "xdsm" /* DMI enabled (DMAPI / XDSM) */
#define MNTOPT_DMI "dmi" /* DMI enabled (DMAPI / XDSM) */
+#define MNTOPT_DELAYLOG "delaylog" /* Delayed logging enabled */
+#define MNTOPT_NODELAYLOG "nodelaylog" /* Delayed logging disabled */
/*
* Table driven mount option parser.
mp->m_flags |= XFS_MOUNT_DMAPI;
} else if (!strcmp(this_char, MNTOPT_DMI)) {
mp->m_flags |= XFS_MOUNT_DMAPI;
+ } else if (!strcmp(this_char, MNTOPT_DELAYLOG)) {
+ mp->m_flags |= XFS_MOUNT_DELAYLOG;
+ cmn_err(CE_WARN,
+ "Enabling EXPERIMENTAL delayed logging feature "
+ "- use at your own risk.\n");
+ } else if (!strcmp(this_char, MNTOPT_NODELAYLOG)) {
+ mp->m_flags &= ~XFS_MOUNT_DELAYLOG;
} else if (!strcmp(this_char, "ihashsize")) {
cmn_err(CE_WARN,
"XFS: ihashsize no longer used, option is deprecated.");
{ XFS_MOUNT_FILESTREAMS, "," MNTOPT_FILESTREAM },
{ XFS_MOUNT_DMAPI, "," MNTOPT_DMAPI },
{ XFS_MOUNT_GRPID, "," MNTOPT_GRPID },
+ { XFS_MOUNT_DELAYLOG, "," MNTOPT_DELAYLOG },
{ 0, NULL }
};
static struct proc_xfs_info xfs_info_unset[] = {
* but it is much faster.
*/
xfs_buf_item_zone = kmem_zone_init((sizeof(xfs_buf_log_item_t) +
- (((XFS_MAX_BLOCKSIZE / XFS_BLI_CHUNK) /
+ (((XFS_MAX_BLOCKSIZE / XFS_BLF_CHUNK) /
NBWORD) * sizeof(int))), "xfs_buf_item");
if (!xfs_buf_item_zone)
goto out_destroy_trans_zone;
);
+#define XFS_BUSY_SYNC \
+ { 0, "async" }, \
+ { 1, "sync" }
+
TRACE_EVENT(xfs_alloc_busy,
- TP_PROTO(struct xfs_mount *mp, xfs_agnumber_t agno, xfs_agblock_t agbno,
- xfs_extlen_t len, int slot),
- TP_ARGS(mp, agno, agbno, len, slot),
+ TP_PROTO(struct xfs_trans *trans, xfs_agnumber_t agno,
+ xfs_agblock_t agbno, xfs_extlen_t len, int sync),
+ TP_ARGS(trans, agno, agbno, len, sync),
TP_STRUCT__entry(
__field(dev_t, dev)
+ __field(struct xfs_trans *, tp)
+ __field(int, tid)
__field(xfs_agnumber_t, agno)
__field(xfs_agblock_t, agbno)
__field(xfs_extlen_t, len)
- __field(int, slot)
+ __field(int, sync)
),
TP_fast_assign(
- __entry->dev = mp->m_super->s_dev;
+ __entry->dev = trans->t_mountp->m_super->s_dev;
+ __entry->tp = trans;
+ __entry->tid = trans->t_ticket->t_tid;
__entry->agno = agno;
__entry->agbno = agbno;
__entry->len = len;
- __entry->slot = slot;
+ __entry->sync = sync;
),
- TP_printk("dev %d:%d agno %u agbno %u len %u slot %d",
+ TP_printk("dev %d:%d trans 0x%p tid 0x%x agno %u agbno %u len %u %s",
MAJOR(__entry->dev), MINOR(__entry->dev),
+ __entry->tp,
+ __entry->tid,
__entry->agno,
__entry->agbno,
__entry->len,
- __entry->slot)
+ __print_symbolic(__entry->sync, XFS_BUSY_SYNC))
);
-#define XFS_BUSY_STATES \
- { 0, "found" }, \
- { 1, "missing" }
-
TRACE_EVENT(xfs_alloc_unbusy,
TP_PROTO(struct xfs_mount *mp, xfs_agnumber_t agno,
- int slot, int found),
- TP_ARGS(mp, agno, slot, found),
+ xfs_agblock_t agbno, xfs_extlen_t len),
+ TP_ARGS(mp, agno, agbno, len),
TP_STRUCT__entry(
__field(dev_t, dev)
__field(xfs_agnumber_t, agno)
- __field(int, slot)
- __field(int, found)
+ __field(xfs_agblock_t, agbno)
+ __field(xfs_extlen_t, len)
),
TP_fast_assign(
__entry->dev = mp->m_super->s_dev;
__entry->agno = agno;
- __entry->slot = slot;
- __entry->found = found;
+ __entry->agbno = agbno;
+ __entry->len = len;
),
- TP_printk("dev %d:%d agno %u slot %d %s",
+ TP_printk("dev %d:%d agno %u agbno %u len %u",
MAJOR(__entry->dev), MINOR(__entry->dev),
__entry->agno,
- __entry->slot,
- __print_symbolic(__entry->found, XFS_BUSY_STATES))
+ __entry->agbno,
+ __entry->len)
);
+#define XFS_BUSY_STATES \
+ { 0, "missing" }, \
+ { 1, "found" }
+
TRACE_EVENT(xfs_alloc_busysearch,
- TP_PROTO(struct xfs_mount *mp, xfs_agnumber_t agno, xfs_agblock_t agbno,
- xfs_extlen_t len, xfs_lsn_t lsn),
- TP_ARGS(mp, agno, agbno, len, lsn),
+ TP_PROTO(struct xfs_mount *mp, xfs_agnumber_t agno,
+ xfs_agblock_t agbno, xfs_extlen_t len, int found),
+ TP_ARGS(mp, agno, agbno, len, found),
TP_STRUCT__entry(
__field(dev_t, dev)
__field(xfs_agnumber_t, agno)
__field(xfs_agblock_t, agbno)
__field(xfs_extlen_t, len)
- __field(xfs_lsn_t, lsn)
+ __field(int, found)
),
TP_fast_assign(
__entry->dev = mp->m_super->s_dev;
__entry->agno = agno;
__entry->agbno = agbno;
__entry->len = len;
- __entry->lsn = lsn;
+ __entry->found = found;
),
- TP_printk("dev %d:%d agno %u agbno %u len %u force lsn 0x%llx",
+ TP_printk("dev %d:%d agno %u agbno %u len %u %s",
MAJOR(__entry->dev), MINOR(__entry->dev),
__entry->agno,
__entry->agbno,
__entry->len,
+ __print_symbolic(__entry->found, XFS_BUSY_STATES))
+);
+
+TRACE_EVENT(xfs_trans_commit_lsn,
+ TP_PROTO(struct xfs_trans *trans),
+ TP_ARGS(trans),
+ TP_STRUCT__entry(
+ __field(dev_t, dev)
+ __field(struct xfs_trans *, tp)
+ __field(xfs_lsn_t, lsn)
+ ),
+ TP_fast_assign(
+ __entry->dev = trans->t_mountp->m_super->s_dev;
+ __entry->tp = trans;
+ __entry->lsn = trans->t_commit_lsn;
+ ),
+ TP_printk("dev %d:%d trans 0x%p commit_lsn 0x%llx",
+ MAJOR(__entry->dev), MINOR(__entry->dev),
+ __entry->tp,
__entry->lsn)
);
for (i = 0; i < q->qi_dqperchunk; i++, d++, curid++)
xfs_qm_dqinit_core(curid, type, d);
xfs_trans_dquot_buf(tp, bp,
- (type & XFS_DQ_USER ? XFS_BLI_UDQUOT_BUF :
- ((type & XFS_DQ_PROJ) ? XFS_BLI_PDQUOT_BUF :
- XFS_BLI_GDQUOT_BUF)));
+ (type & XFS_DQ_USER ? XFS_BLF_UDQUOT_BUF :
+ ((type & XFS_DQ_PROJ) ? XFS_BLF_PDQUOT_BUF :
+ XFS_BLF_GDQUOT_BUF)));
xfs_trans_log_buf(tp, bp, 0, BBTOB(q->qi_dqchunklen) - 1);
}
} xfs_agfl_t;
/*
- * Busy block/extent entry. Used in perag to mark blocks that have been freed
- * but whose transactions aren't committed to disk yet.
+ * Busy block/extent entry. Indexed by a rbtree in perag to mark blocks that
+ * have been freed but whose transactions aren't committed to disk yet.
+ *
+ * Note that we use the transaction ID to record the transaction, not the
+ * transaction structure itself. See xfs_alloc_busy_insert() for details.
*/
-typedef struct xfs_perag_busy {
- xfs_agblock_t busy_start;
- xfs_extlen_t busy_length;
- struct xfs_trans *busy_tp; /* transaction that did the free */
-} xfs_perag_busy_t;
+struct xfs_busy_extent {
+ struct rb_node rb_node; /* ag by-bno indexed search tree */
+ struct list_head list; /* transaction busy extent list */
+ xfs_agnumber_t agno;
+ xfs_agblock_t bno;
+ xfs_extlen_t length;
+ xlog_tid_t tid; /* transaction that created this */
+};
/*
* Per-ag incore structure, copies of information in agf and agi,
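The busy extent records introduced above are looked up by block range: an allocation of [bno, bno + len) must be treated as busy if it overlaps any recorded extent. The sketch below illustrates only that overlap test. It assumes a plain array and a linear scan in place of the rbtree used by the patch, and busy_extent_sketch, busy_search_sketch and demo_busy are hypothetical names and sample data invented for this illustration.

```c
#include <stddef.h>

/* Hypothetical, simplified record: start block and length only. */
struct busy_extent_sketch {
	unsigned int bno;	/* first block of the busy extent */
	unsigned int length;	/* number of blocks in the extent */
};

/* Sample data for illustration: busy ranges [10, 14) and [20, 22). */
static const struct busy_extent_sketch demo_busy[] = {
	{ 10, 4 },
	{ 20, 2 },
};

/*
 * Return 1 if [bno, bno + len) overlaps any busy extent, 0 otherwise.
 * A linear scan stands in for the rbtree walk; the overlap condition is
 * the part being illustrated.
 */
static int busy_search_sketch(const struct busy_extent_sketch *busy, size_t n,
			      unsigned int bno, unsigned int len)
{
	size_t i;

	for (i = 0; i < n; i++) {
		unsigned int start = busy[i].bno;
		unsigned int end = start + busy[i].length;

		/* Two half-open ranges overlap if each starts before the other ends. */
		if (bno < end && start < bno + len)
			return 1;
	}
	return 0;
}
```

In the patch, a positive result from the search makes the caller mark its transaction synchronous rather than force the log on the spot.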
xfs_agino_t pagl_leftrec;
xfs_agino_t pagl_rightrec;
#ifdef __KERNEL__
- spinlock_t pagb_lock; /* lock for pagb_list */
+ spinlock_t pagb_lock; /* lock for pagb_tree */
+ struct rb_root pagb_tree; /* ordered tree of busy extents */
atomic_t pagf_fstrms; /* # of filestreams active in this AG */
int pag_ici_reclaimable; /* reclaimable inodes */
#endif
int pagb_count; /* pagb slots in use */
- xfs_perag_busy_t pagb_list[XFS_PAGB_NUM_SLOTS]; /* unstable blocks */
} xfs_perag_t;
/*
#define XFSA_FIXUP_BNO_OK 1
#define XFSA_FIXUP_CNT_OK 2
-STATIC void
-xfs_alloc_search_busy(xfs_trans_t *tp,
- xfs_agnumber_t agno,
- xfs_agblock_t bno,
- xfs_extlen_t len);
+static int
+xfs_alloc_busy_search(struct xfs_mount *mp, xfs_agnumber_t agno,
+ xfs_agblock_t bno, xfs_extlen_t len);
/*
* Prototypes for per-ag allocation routines
be32_to_cpu(agf->agf_length));
xfs_alloc_log_agf(args->tp, args->agbp,
XFS_AGF_FREEBLKS);
- /* search the busylist for these blocks */
- xfs_alloc_search_busy(args->tp, args->agno,
- args->agbno, args->len);
+ /*
+ * Search the busylist for these blocks and mark the
+ * transaction as synchronous if blocks are found. This
+ * avoids the need to block due to a synchronous log
+ * force to ensure correct ordering as the synchronous
+ * transaction will guarantee that for us.
+ */
+ if (xfs_alloc_busy_search(args->mp, args->agno,
+ args->agbno, args->len))
+ xfs_trans_set_sync(args->tp);
}
if (!args->isfl)
xfs_trans_mod_sb(args->tp,
* when the iclog commits to disk. If a busy block is allocated,
* the iclog is pushed up to the LSN that freed the block.
*/
- xfs_alloc_mark_busy(tp, agno, bno, len);
+ xfs_alloc_busy_insert(tp, agno, bno, len);
return 0;
error0:
*bnop = bno;
/*
- * As blocks are freed, they are added to the per-ag busy list
- * and remain there until the freeing transaction is committed to
- * disk. Now that we have allocated blocks, this list must be
- * searched to see if a block is being reused. If one is, then
- * the freeing transaction must be pushed to disk NOW by forcing
- * to disk all iclogs up that transaction's LSN.
+ * As blocks are freed, they are added to the per-ag busy list and
+ * remain there until the freeing transaction is committed to disk.
+ * Now that we have allocated blocks, this list must be searched to see
+ * if a block is being reused. If one is, then the freeing transaction
+ * must be pushed to disk before this transaction.
+ *
+ * We do this by setting the current transaction to a sync transaction
+ * which guarantees that the freeing transaction is on disk before this
+ * transaction. This is done instead of a synchronous log force here so
+ * that we don't sit and wait with the AGF locked in the transaction
+ * during the log force.
*/
- xfs_alloc_search_busy(tp, be32_to_cpu(agf->agf_seqno), bno, 1);
+ if (xfs_alloc_busy_search(mp, be32_to_cpu(agf->agf_seqno), bno, 1))
+ xfs_trans_set_sync(tp);
return 0;
}
be32_to_cpu(agf->agf_levels[XFS_BTNUM_CNTi]);
spin_lock_init(&pag->pagb_lock);
pag->pagb_count = 0;
- memset(pag->pagb_list, 0, sizeof(pag->pagb_list));
+ pag->pagb_tree = RB_ROOT;
pag->pagf_init = 1;
}
#ifdef DEBUG
* list is reused, the transaction that freed it must be forced to disk
* before continuing to use the block.
*
- * xfs_alloc_mark_busy - add to the per-ag busy list
- * xfs_alloc_clear_busy - remove an item from the per-ag busy list
+ * xfs_alloc_busy_insert - add to the per-ag busy list
+ * xfs_alloc_busy_clear - remove an item from the per-ag busy list
+ * xfs_alloc_busy_search - search for a busy extent
+ */
+
+/*
+ * Insert a new extent into the busy tree.
+ *
+ * The busy extent tree is indexed by the start block of the busy extent.
+ * There can be multiple overlapping ranges in the busy extent tree but only
+ * ever one entry at a given start block. The reason for this is that
+ * multi-block extents can be freed, then smaller chunks of that extent
+ * allocated and freed again before the first transaction commit is on disk.
+ * If the exact same start block is freed a second time, we have to wait for
+ * that busy extent to pass out of the tree before the new extent is inserted.
+ * There are two main cases we have to handle here.
+ *
+ * The first case is a transaction that triggers a "free - allocate - free"
+ * cycle. This can occur during btree manipulations as a btree block is freed
+ * to the freelist, then allocated from the free list, then freed again. In
+ * this case, the second extent free is what triggers the duplicate and as
+ * such the transaction IDs should match. Because the extent was allocated in
+ * this transaction, the transaction must be marked as synchronous. This is
+ * true for all cases where the free/alloc/free occurs in the one transaction,
+ * hence the addition of the ASSERT(tp->t_flags & XFS_TRANS_SYNC) to this case.
+ * This serves to catch violations of the second case quite effectively.
+ *
+ * The second case is where the free/alloc/free occur in different
+ * transactions. In this case, the thread freeing the extent the second time
+ * can't mark the extent busy immediately because it is already tracked in a
+ * transaction that may be committing. When the log commit for the existing
+ * busy extent completes, the busy extent will be removed from the tree. If we
+ * allow the second busy insert to continue using that busy extent structure,
+ * it can be freed before this transaction is safely in the log. Hence our
+ * only option in this case is to force the log to remove the existing busy
+ * extent from the list before we insert the new one with the current
+ * transaction ID.
+ *
+ * The problem we are trying to avoid in the free-alloc-free in separate
+ * transactions is most easily described with a timeline:
+ *
+ * Thread 1 Thread 2 Thread 3 xfslogd
+ * xact alloc
+ * free X
+ * mark busy
+ * commit xact
+ * free xact
+ * xact alloc
+ * alloc X
+ * busy search
+ * mark xact sync
+ * commit xact
+ * free xact
+ * force log
+ * checkpoint starts
+ * ....
+ * xact alloc
+ * free X
+ * mark busy
+ * finds match
+ * *** KABOOM! ***
+ * ....
+ * log IO completes
+ * unbusy X
+ * checkpoint completes
+ *
+ * By issuing a log force in thread 3 @ "KABOOM", the thread will block until
+ * the checkpoint completes, and the busy extent it matched will have been
+ * removed from the tree when it is woken. Hence it can then continue safely.
+ *
+ * However, to ensure this matching process is robust, we need to use the
+ * transaction ID for identifying the transaction, as delayed logging results
+ * in the busy extent and transaction lifecycles being different. i.e. the busy
+ * extent is active for a lot longer than the transaction. Hence the
+ * transaction structure can be freed and reallocated, and the new transaction
+ * can then mark the same extent busy again. In this case the new transaction
+ * will have a different tid but can have the same address, and hence we need
+ * to check against the tid.
+ *
+ * Future: for delayed logging, we could avoid the log force if the extent was
+ * first freed in the current checkpoint sequence. This, however, requires the
+ * ability to pin the current checkpoint in memory until this transaction
+ * commits to ensure that both the original free and the current one combine
+ * logically into the one checkpoint. If the checkpoint sequences are
+ * different, however, we still need to wait on a log force.
*/
void
-xfs_alloc_mark_busy(xfs_trans_t *tp,
- xfs_agnumber_t agno,
- xfs_agblock_t bno,
- xfs_extlen_t len)
+xfs_alloc_busy_insert(
+ struct xfs_trans *tp,
+ xfs_agnumber_t agno,
+ xfs_agblock_t bno,
+ xfs_extlen_t len)
{
- xfs_perag_busy_t *bsy;
+ struct xfs_busy_extent *new;
+ struct xfs_busy_extent *busyp;
struct xfs_perag *pag;
- int n;
+ struct rb_node **rbp;
+ struct rb_node *parent;
+ int match;
- pag = xfs_perag_get(tp->t_mountp, agno);
- spin_lock(&pag->pagb_lock);
- /* search pagb_list for an open slot */
- for (bsy = pag->pagb_list, n = 0;
- n < XFS_PAGB_NUM_SLOTS;
- bsy++, n++) {
- if (bsy->busy_tp == NULL) {
- break;
- }
+ new = kmem_zalloc(sizeof(struct xfs_busy_extent), KM_MAYFAIL);
+ if (!new) {
+ /*
+ * No Memory! Since it is now not possible to track the free
+ * block, make this a synchronous transaction to ensure that
+ * the block is not reused before this transaction commits.
+ */
+ trace_xfs_alloc_busy(tp, agno, bno, len, 1);
+ xfs_trans_set_sync(tp);
+ return;
}
- trace_xfs_alloc_busy(tp->t_mountp, agno, bno, len, n);
+ new->agno = agno;
+ new->bno = bno;
+ new->length = len;
+ new->tid = xfs_log_get_trans_ident(tp);
- if (n < XFS_PAGB_NUM_SLOTS) {
- bsy = &pag->pagb_list[n];
- pag->pagb_count++;
- bsy->busy_start = bno;
- bsy->busy_length = len;
- bsy->busy_tp = tp;
- xfs_trans_add_busy(tp, agno, n);
- } else {
+ INIT_LIST_HEAD(&new->list);
+
+ /* trace before insert to be able to see failed inserts */
+ trace_xfs_alloc_busy(tp, agno, bno, len, 0);
+
+ pag = xfs_perag_get(tp->t_mountp, new->agno);
+restart:
+ spin_lock(&pag->pagb_lock);
+ rbp = &pag->pagb_tree.rb_node;
+ parent = NULL;
+ busyp = NULL;
+ match = 0;
+ while (*rbp && match >= 0) {
+ parent = *rbp;
+ busyp = rb_entry(parent, struct xfs_busy_extent, rb_node);
+
+ if (new->bno < busyp->bno) {
+ /* may overlap, but exact start block is lower */
+ rbp = &(*rbp)->rb_left;
+ if (new->bno + new->length > busyp->bno)
+ match = busyp->tid == new->tid ? 1 : -1;
+ } else if (new->bno > busyp->bno) {
+ /* may overlap, but exact start block is higher */
+ rbp = &(*rbp)->rb_right;
+ if (bno < busyp->bno + busyp->length)
+ match = busyp->tid == new->tid ? 1 : -1;
+ } else {
+ match = busyp->tid == new->tid ? 1 : -1;
+ break;
+ }
+ }
+ if (match < 0) {
+ /* overlap marked busy in different transaction */
+ spin_unlock(&pag->pagb_lock);
+ xfs_log_force(tp->t_mountp, XFS_LOG_SYNC);
+ goto restart;
+ }
+ if (match > 0) {
/*
- * The busy list is full! Since it is now not possible to
- * track the free block, make this a synchronous transaction
- * to insure that the block is not reused before this
- * transaction commits.
+ * overlap marked busy in same transaction. Update if exact
+ * start block match, otherwise combine the busy extents into
+ * a single range.
*/
- xfs_trans_set_sync(tp);
- }
+ if (busyp->bno == new->bno) {
+ busyp->length = max(busyp->length, new->length);
+ spin_unlock(&pag->pagb_lock);
+ ASSERT(tp->t_flags & XFS_TRANS_SYNC);
+ xfs_perag_put(pag);
+ kmem_free(new);
+ return;
+ }
+ rb_erase(&busyp->rb_node, &pag->pagb_tree);
+ new->length = max(busyp->bno + busyp->length,
+ new->bno + new->length) -
+ min(busyp->bno, new->bno);
+ new->bno = min(busyp->bno, new->bno);
+ } else
+ busyp = NULL;
+ rb_link_node(&new->rb_node, parent, rbp);
+ rb_insert_color(&new->rb_node, &pag->pagb_tree);
+
+ list_add(&new->list, &tp->t_busy);
spin_unlock(&pag->pagb_lock);
xfs_perag_put(pag);
+ kmem_free(busyp);
}
-void
-xfs_alloc_clear_busy(xfs_trans_t *tp,
- xfs_agnumber_t agno,
- int idx)
+/*
+ * Search for a busy extent within the range of the extent we are about to
+ * allocate. You need to be holding the busy extent tree lock when calling
+ * xfs_alloc_busy_search(). This function returns 0 for no overlapping busy
+ * extent, -1 for an overlapping but not exact busy extent, and 1 for an exact
+ * match. This is done so that a non-zero return indicates an overlap that
+ * will require a synchronous transaction, but it can still be
+ * used to distinguish between a partial and an exact match.
+ */
+static int
+xfs_alloc_busy_search(
+ struct xfs_mount *mp,
+ xfs_agnumber_t agno,
+ xfs_agblock_t bno,
+ xfs_extlen_t len)
{
struct xfs_perag *pag;
- xfs_perag_busy_t *list;
+ struct rb_node *rbp;
+ struct xfs_busy_extent *busyp;
+ int match = 0;
- ASSERT(idx < XFS_PAGB_NUM_SLOTS);
- pag = xfs_perag_get(tp->t_mountp, agno);
+ pag = xfs_perag_get(mp, agno);
spin_lock(&pag->pagb_lock);
- list = pag->pagb_list;
- trace_xfs_alloc_unbusy(tp->t_mountp, agno, idx, list[idx].busy_tp == tp);
-
- if (list[idx].busy_tp == tp) {
- list[idx].busy_tp = NULL;
- pag->pagb_count--;
+ rbp = pag->pagb_tree.rb_node;
+
+ /* find closest start bno overlap */
+ while (rbp) {
+ busyp = rb_entry(rbp, struct xfs_busy_extent, rb_node);
+ if (bno < busyp->bno) {
+ /* may overlap, but exact start block is lower */
+ if (bno + len > busyp->bno)
+ match = -1;
+ rbp = rbp->rb_left;
+ } else if (bno > busyp->bno) {
+ /* may overlap, but exact start block is higher */
+ if (bno < busyp->bno + busyp->length)
+ match = -1;
+ rbp = rbp->rb_right;
+ } else {
+ /* bno matches busyp, length determines exact match */
+ match = (busyp->length == len) ? 1 : -1;
+ break;
+ }
}
-
spin_unlock(&pag->pagb_lock);
+ trace_xfs_alloc_busysearch(mp, agno, bno, len, !!match);
xfs_perag_put(pag);
+ return match;
}
-
-/*
- * If we find the extent in the busy list, force the log out to get the
- * extent out of the busy list so the caller can use it straight away.
- */
-STATIC void
-xfs_alloc_search_busy(xfs_trans_t *tp,
- xfs_agnumber_t agno,
- xfs_agblock_t bno,
- xfs_extlen_t len)
+void
+xfs_alloc_busy_clear(
+ struct xfs_mount *mp,
+ struct xfs_busy_extent *busyp)
{
struct xfs_perag *pag;
- xfs_perag_busy_t *bsy;
- xfs_agblock_t uend, bend;
- xfs_lsn_t lsn = 0;
- int cnt;
- pag = xfs_perag_get(tp->t_mountp, agno);
- spin_lock(&pag->pagb_lock);
- cnt = pag->pagb_count;
+ trace_xfs_alloc_unbusy(mp, busyp->agno, busyp->bno,
+ busyp->length);
- /*
- * search pagb_list for this slot, skipping open slots. We have to
- * search the entire array as there may be multiple overlaps and
- * we have to get the most recent LSN for the log force to push out
- * all the transactions that span the range.
- */
- uend = bno + len - 1;
- for (cnt = 0; cnt < pag->pagb_count; cnt++) {
- bsy = &pag->pagb_list[cnt];
- if (!bsy->busy_tp)
- continue;
+ ASSERT(xfs_alloc_busy_search(mp, busyp->agno, busyp->bno,
+ busyp->length) == 1);
- bend = bsy->busy_start + bsy->busy_length - 1;
- if (bno > bend || uend < bsy->busy_start)
- continue;
+ list_del_init(&busyp->list);
- /* (start1,length1) within (start2, length2) */
- if (XFS_LSN_CMP(bsy->busy_tp->t_commit_lsn, lsn) > 0)
- lsn = bsy->busy_tp->t_commit_lsn;
- }
+ pag = xfs_perag_get(mp, busyp->agno);
+ spin_lock(&pag->pagb_lock);
+ rb_erase(&busyp->rb_node, &pag->pagb_tree);
spin_unlock(&pag->pagb_lock);
xfs_perag_put(pag);
- trace_xfs_alloc_busysearch(tp->t_mountp, agno, bno, len, lsn);
- /*
- * If a block was found, force the log through the LSN of the
- * transaction that freed the block
- */
- if (lsn)
- xfs_log_force_lsn(tp->t_mountp, lsn, XFS_LOG_SYNC);
+ kmem_free(busyp);
}
struct xfs_mount;
struct xfs_perag;
struct xfs_trans;
+struct xfs_busy_extent;
/*
* Freespace allocation types. Argument to xfs_alloc_[v]extent.
#ifdef __KERNEL__
void
-xfs_alloc_mark_busy(xfs_trans_t *tp,
+xfs_alloc_busy_insert(xfs_trans_t *tp,
xfs_agnumber_t agno,
xfs_agblock_t bno,
xfs_extlen_t len);
void
-xfs_alloc_clear_busy(xfs_trans_t *tp,
- xfs_agnumber_t ag,
- int idx);
+xfs_alloc_busy_clear(struct xfs_mount *mp, struct xfs_busy_extent *busyp);
#endif /* __KERNEL__ */
* disk. If a busy block is allocated, the iclog is pushed up to the
* LSN that freed the block.
*/
- xfs_alloc_mark_busy(cur->bc_tp, be32_to_cpu(agf->agf_seqno), bno, 1);
+ xfs_alloc_busy_insert(cur->bc_tp, be32_to_cpu(agf->agf_seqno), bno, 1);
xfs_trans_agbtree_delta(cur->bc_tp, -1);
return 0;
}
nbytes = last - first + 1;
bfset(bip->bli_logged, first, nbytes);
for (x = 0; x < nbytes; x++) {
- chunk_num = byte >> XFS_BLI_SHIFT;
+ chunk_num = byte >> XFS_BLF_SHIFT;
word_num = chunk_num >> BIT_TO_WORD_SHIFT;
bit_num = chunk_num & (NBWORD - 1);
wordp = &(bip->bli_format.blf_data_map[word_num]);
* cancel flag in it.
*/
trace_xfs_buf_item_size_stale(bip);
- ASSERT(bip->bli_format.blf_flags & XFS_BLI_CANCEL);
+ ASSERT(bip->bli_format.blf_flags & XFS_BLF_CANCEL);
return 1;
}
} else if (next_bit != last_bit + 1) {
last_bit = next_bit;
nvecs++;
- } else if (xfs_buf_offset(bp, next_bit * XFS_BLI_CHUNK) !=
- (xfs_buf_offset(bp, last_bit * XFS_BLI_CHUNK) +
- XFS_BLI_CHUNK)) {
+ } else if (xfs_buf_offset(bp, next_bit * XFS_BLF_CHUNK) !=
+ (xfs_buf_offset(bp, last_bit * XFS_BLF_CHUNK) +
+ XFS_BLF_CHUNK)) {
last_bit = next_bit;
nvecs++;
} else {
vecp++;
nvecs = 1;
+ /*
+ * If it is an inode buffer, transfer the in-memory state to the
+ * format flags and clear the in-memory state. We do not transfer
+ * this state if the inode buffer allocation has not yet been committed
+ * to the log as setting the XFS_BLF_INODE_BUF flag will prevent
+ * correct replay of the inode allocation.
+ */
+ if (bip->bli_flags & XFS_BLI_INODE_BUF) {
+ if (!((bip->bli_flags & XFS_BLI_INODE_ALLOC_BUF) &&
+ xfs_log_item_in_current_chkpt(&bip->bli_item)))
+ bip->bli_format.blf_flags |= XFS_BLF_INODE_BUF;
+ bip->bli_flags &= ~XFS_BLI_INODE_BUF;
+ }
+
if (bip->bli_flags & XFS_BLI_STALE) {
/*
* The buffer is stale, so all we need to log
* cancel flag in it.
*/
trace_xfs_buf_item_format_stale(bip);
- ASSERT(bip->bli_format.blf_flags & XFS_BLI_CANCEL);
+ ASSERT(bip->bli_format.blf_flags & XFS_BLF_CANCEL);
bip->bli_format.blf_size = nvecs;
return;
}
* keep counting and scanning.
*/
if (next_bit == -1) {
- buffer_offset = first_bit * XFS_BLI_CHUNK;
+ buffer_offset = first_bit * XFS_BLF_CHUNK;
vecp->i_addr = xfs_buf_offset(bp, buffer_offset);
- vecp->i_len = nbits * XFS_BLI_CHUNK;
+ vecp->i_len = nbits * XFS_BLF_CHUNK;
vecp->i_type = XLOG_REG_TYPE_BCHUNK;
nvecs++;
break;
} else if (next_bit != last_bit + 1) {
- buffer_offset = first_bit * XFS_BLI_CHUNK;
+ buffer_offset = first_bit * XFS_BLF_CHUNK;
vecp->i_addr = xfs_buf_offset(bp, buffer_offset);
- vecp->i_len = nbits * XFS_BLI_CHUNK;
+ vecp->i_len = nbits * XFS_BLF_CHUNK;
vecp->i_type = XLOG_REG_TYPE_BCHUNK;
nvecs++;
vecp++;
first_bit = next_bit;
last_bit = next_bit;
nbits = 1;
- } else if (xfs_buf_offset(bp, next_bit << XFS_BLI_SHIFT) !=
- (xfs_buf_offset(bp, last_bit << XFS_BLI_SHIFT) +
- XFS_BLI_CHUNK)) {
- buffer_offset = first_bit * XFS_BLI_CHUNK;
+ } else if (xfs_buf_offset(bp, next_bit << XFS_BLF_SHIFT) !=
+ (xfs_buf_offset(bp, last_bit << XFS_BLF_SHIFT) +
+ XFS_BLF_CHUNK)) {
+ buffer_offset = first_bit * XFS_BLF_CHUNK;
vecp->i_addr = xfs_buf_offset(bp, buffer_offset);
- vecp->i_len = nbits * XFS_BLI_CHUNK;
+ vecp->i_len = nbits * XFS_BLF_CHUNK;
vecp->i_type = XLOG_REG_TYPE_BCHUNK;
/* You would think we need to bump the nvecs here too, but we do not
* this number is used by recovery, and it gets confused by the boundary
}
/*
- * This is called to pin the buffer associated with the buf log
- * item in memory so it cannot be written out. Simply call bpin()
- * on the buffer to do this.
+ * This is called to pin the buffer associated with the buf log item in memory
+ * so it cannot be written out. Simply call bpin() on the buffer to do this.
+ *
+ * We also always take a reference to the buffer log item here so that the bli
+ * is held while the item is pinned in memory. This means that we can
+ * unconditionally drop the reference count a transaction holds when the
+ * transaction is completed.
*/
+
STATIC void
xfs_buf_item_pin(
xfs_buf_log_item_t *bip)
ASSERT(atomic_read(&bip->bli_refcount) > 0);
ASSERT((bip->bli_flags & XFS_BLI_LOGGED) ||
(bip->bli_flags & XFS_BLI_STALE));
+ atomic_inc(&bip->bli_refcount);
trace_xfs_buf_item_pin(bip);
xfs_bpin(bp);
}
ASSERT(XFS_BUF_VALUSEMA(bp) <= 0);
ASSERT(!(XFS_BUF_ISDELAYWRITE(bp)));
ASSERT(XFS_BUF_ISSTALE(bp));
- ASSERT(bip->bli_format.blf_flags & XFS_BLI_CANCEL);
+ ASSERT(bip->bli_format.blf_flags & XFS_BLF_CANCEL);
trace_xfs_buf_item_unpin_stale(bip);
/*
}
/*
- * Release the buffer associated with the buf log item.
- * If there is no dirty logged data associated with the
- * buffer recorded in the buf log item, then free the
- * buf log item and remove the reference to it in the
- * buffer.
+ * Release the buffer associated with the buf log item. If there is no dirty
+ * logged data associated with the buffer recorded in the buf log item, then
+ * free the buf log item and remove the reference to it in the buffer.
+ *
+ * This call ignores the recursion count. It is only called when the buffer
+ * should REALLY be unlocked, regardless of the recursion count.
*
- * This call ignores the recursion count. It is only called
- * when the buffer should REALLY be unlocked, regardless
- * of the recursion count.
+ * We unconditionally drop the transaction's reference to the log item. If the
+ * item was logged, then another reference was taken when it was pinned, so we
+ * can safely drop the transaction reference now. This also allows us to avoid
+ * potential races with the unpin code freeing the bli by not referencing the
+ * bli after we've dropped the reference count.
*
- * If the XFS_BLI_HOLD flag is set in the buf log item, then
- * free the log item if necessary but do not unlock the buffer.
- * This is for support of xfs_trans_bhold(). Make sure the
- * XFS_BLI_HOLD field is cleared if we don't free the item.
+ * If the XFS_BLI_HOLD flag is set in the buf log item, then free the log item
+ * if necessary but do not unlock the buffer. This is for support of
+ * xfs_trans_bhold(). Make sure the XFS_BLI_HOLD field is cleared if we don't
+ * free the item.
*/
STATIC void
xfs_buf_item_unlock(
bp = bip->bli_buf;
- /*
- * Clear the buffer's association with this transaction.
- */
+ /* Clear the buffer's association with this transaction. */
XFS_BUF_SET_FSPRIVATE2(bp, NULL);
/*
- * If this is a transaction abort, don't return early.
- * Instead, allow the brelse to happen.
- * Normally it would be done for stale (cancelled) buffers
- * at unpin time, but we'll never go through the pin/unpin
- * cycle if we abort inside commit.
+ * If this is a transaction abort, don't return early. Instead, allow
+ * the brelse to happen. Normally it would be done for stale
+ * (cancelled) buffers at unpin time, but we'll never go through the
+ * pin/unpin cycle if we abort inside commit.
*/
aborted = (bip->bli_item.li_flags & XFS_LI_ABORTED) != 0;
/*
- * If the buf item is marked stale, then don't do anything.
- * We'll unlock the buffer and free the buf item when the
- * buffer is unpinned for the last time.
+ * Before possibly freeing the buf item, determine if we should
+ * release the buffer at the end of this routine.
*/
- if (bip->bli_flags & XFS_BLI_STALE) {
- bip->bli_flags &= ~XFS_BLI_LOGGED;
- trace_xfs_buf_item_unlock_stale(bip);
- ASSERT(bip->bli_format.blf_flags & XFS_BLI_CANCEL);
- if (!aborted)
- return;
- }
+ hold = bip->bli_flags & XFS_BLI_HOLD;
+
+ /* Clear the per transaction state. */
+ bip->bli_flags &= ~(XFS_BLI_LOGGED | XFS_BLI_HOLD);
/*
- * Drop the transaction's reference to the log item if
- * it was not logged as part of the transaction. Otherwise
- * we'll drop the reference in xfs_buf_item_unpin() when
- * the transaction is really through with the buffer.
+ * If the buf item is marked stale, then don't do anything. We'll
+ * unlock the buffer and free the buf item when the buffer is unpinned
+ * for the last time.
*/
- if (!(bip->bli_flags & XFS_BLI_LOGGED)) {
- atomic_dec(&bip->bli_refcount);
- } else {
- /*
- * Clear the logged flag since this is per
- * transaction state.
- */
- bip->bli_flags &= ~XFS_BLI_LOGGED;
+ if (bip->bli_flags & XFS_BLI_STALE) {
+ trace_xfs_buf_item_unlock_stale(bip);
+ ASSERT(bip->bli_format.blf_flags & XFS_BLF_CANCEL);
+ if (!aborted) {
+ atomic_dec(&bip->bli_refcount);
+ return;
+ }
}
- /*
- * Before possibly freeing the buf item, determine if we should
- * release the buffer at the end of this routine.
- */
- hold = bip->bli_flags & XFS_BLI_HOLD;
trace_xfs_buf_item_unlock(bip);
/*
- * If the buf item isn't tracking any data, free it.
- * Otherwise, if XFS_BLI_HOLD is set clear it.
+ * If the buf item isn't tracking any data, free it, otherwise drop the
+ * reference we hold to it.
*/
if (xfs_bitmap_empty(bip->bli_format.blf_data_map,
- bip->bli_format.blf_map_size)) {
+ bip->bli_format.blf_map_size))
xfs_buf_item_relse(bp);
- } else if (hold) {
- bip->bli_flags &= ~XFS_BLI_HOLD;
- }
+ else
+ atomic_dec(&bip->bli_refcount);
- /*
- * Release the buffer if XFS_BLI_HOLD was not set.
- */
- if (!hold) {
+ if (!hold)
xfs_buf_relse(bp);
- }
}
/*
}
/*
- * chunks is the number of XFS_BLI_CHUNK size pieces
+ * chunks is the number of XFS_BLF_CHUNK size pieces
* the buffer can be divided into. Make sure not to
* truncate any pieces. map_size is the size of the
* bitmap needed to describe the chunks of the buffer.
*/
- chunks = (int)((XFS_BUF_COUNT(bp) + (XFS_BLI_CHUNK - 1)) >> XFS_BLI_SHIFT);
+ chunks = (int)((XFS_BUF_COUNT(bp) + (XFS_BLF_CHUNK - 1)) >> XFS_BLF_SHIFT);
map_size = (int)((chunks + NBWORD) >> BIT_TO_WORD_SHIFT);
bip = (xfs_buf_log_item_t*)kmem_zone_zalloc(xfs_buf_item_zone,
/*
* Convert byte offsets to bit numbers.
*/
- first_bit = first >> XFS_BLI_SHIFT;
- last_bit = last >> XFS_BLI_SHIFT;
+ first_bit = first >> XFS_BLF_SHIFT;
+ last_bit = last >> XFS_BLF_SHIFT;
/*
* Calculate the total number of bits to be set.
* This flag indicates that the buffer contains on disk inodes
* and requires special recovery handling.
*/
-#define XFS_BLI_INODE_BUF 0x1
+#define XFS_BLF_INODE_BUF 0x1
/*
* This flag indicates that the buffer should not be replayed
* during recovery because its blocks are being freed.
*/
-#define XFS_BLI_CANCEL 0x2
+#define XFS_BLF_CANCEL 0x2
/*
* This flag indicates that the buffer contains on disk
* user or group dquots and may require special recovery handling.
*/
-#define XFS_BLI_UDQUOT_BUF 0x4
-#define XFS_BLI_PDQUOT_BUF 0x8
-#define XFS_BLI_GDQUOT_BUF 0x10
+#define XFS_BLF_UDQUOT_BUF 0x4
+#define XFS_BLF_PDQUOT_BUF 0x8
+#define XFS_BLF_GDQUOT_BUF 0x10
-#define XFS_BLI_CHUNK 128
-#define XFS_BLI_SHIFT 7
+#define XFS_BLF_CHUNK 128
+#define XFS_BLF_SHIFT 7
#define BIT_TO_WORD_SHIFT 5
#define NBWORD (NBBY * sizeof(unsigned int))
#define XFS_BLI_LOGGED 0x08
#define XFS_BLI_INODE_ALLOC_BUF 0x10
#define XFS_BLI_STALE_INODE 0x20
+#define XFS_BLI_INODE_BUF 0x40
#define XFS_BLI_FLAGS \
{ XFS_BLI_HOLD, "HOLD" }, \
{ XFS_BLI_STALE, "STALE" }, \
{ XFS_BLI_LOGGED, "LOGGED" }, \
{ XFS_BLI_INODE_ALLOC_BUF, "INODE_ALLOC" }, \
- { XFS_BLI_STALE_INODE, "STALE_INODE" }
+ { XFS_BLI_STALE_INODE, "STALE_INODE" }, \
+ { XFS_BLI_INODE_BUF, "INODE_BUF" }
#ifdef __KERNEL__
va_list ap;
#ifdef DEBUG
- xfs_panic_mask |= XFS_PTAG_SHUTDOWN_CORRUPT;
+ xfs_panic_mask |= (XFS_PTAG_SHUTDOWN_CORRUPT | XFS_PTAG_LOGRES);
#endif
if (xfs_panic_mask && (xfs_panic_mask & panic_tag)
STATIC int xlog_space_left(xlog_t *log, int cycle, int bytes);
STATIC int xlog_sync(xlog_t *log, xlog_in_core_t *iclog);
STATIC void xlog_dealloc_log(xlog_t *log);
-STATIC int xlog_write(struct log *log, struct xfs_log_vec *log_vector,
- struct xlog_ticket *tic, xfs_lsn_t *start_lsn,
- xlog_in_core_t **commit_iclog, uint flags);
/* local state machine functions */
STATIC void xlog_state_done_syncing(xlog_in_core_t *iclog, int);
STATIC void xlog_ungrant_log_space(xlog_t *log,
xlog_ticket_t *ticket);
-
-/* local ticket functions */
-STATIC xlog_ticket_t *xlog_ticket_alloc(xlog_t *log,
- int unit_bytes,
- int count,
- char clientid,
- uint flags);
-
#if defined(DEBUG)
STATIC void xlog_verify_dest_ptr(xlog_t *log, char *ptr);
STATIC void xlog_verify_grant_head(xlog_t *log, int equals);
ASSERT(flags & XFS_LOG_PERM_RESERV);
internal_ticket = *ticket;
+ /*
+ * this is a new transaction on the ticket, so we need to
+ * change the transaction ID so that the next transaction has a
+ * different TID in the log. Just add one to the existing tid
+ * so that we can see chains of rolling transactions in the log
+ * easily.
+ */
+ internal_ticket->t_tid++;
+
trace_xfs_log_reserve(log, internal_ticket);
xlog_grant_push_ail(mp, internal_ticket->t_unit_res);
} else {
/* may sleep if need to allocate more tickets */
internal_ticket = xlog_ticket_alloc(log, unit_bytes, cnt,
- client, flags);
+ client, flags,
+ KM_SLEEP|KM_MAYFAIL);
if (!internal_ticket)
return XFS_ERROR(ENOMEM);
internal_ticket->t_trans_type = t_type;
/* Normal transactions can now occur */
mp->m_log->l_flags &= ~XLOG_ACTIVE_RECOVERY;
+ /*
+ * Now the log has been fully initialised and we know where our
+ * space grant counters are, we can initialise the permanent ticket
+ * needed for delayed logging to work.
+ */
+ xlog_cil_init_post_recovery(mp->m_log);
+
return 0;
out_destroy_ail:
item->li_ailp = mp->m_ail;
item->li_type = type;
item->li_ops = ops;
+ item->li_lv = NULL;
+
+ INIT_LIST_HEAD(&item->li_ail);
+ INIT_LIST_HEAD(&item->li_cil);
}
/*
*iclogp = log->l_iclog; /* complete ring */
log->l_iclog->ic_prev = prev_iclog; /* re-write 1st prev ptr */
+ error = xlog_cil_init(log);
+ if (error)
+ goto out_free_iclog;
return log;
out_free_iclog:
xlog_in_core_t *iclog, *next_iclog;
int i;
+ xlog_cil_destroy(log);
+
iclog = log->l_iclog;
for (i=0; i<log->l_iclog_bufs; i++) {
sv_destroy(&iclog->ic_force_wait);
* print out info relating to regions written which consume
* the reservation
*/
-STATIC void
-xlog_print_tic_res(xfs_mount_t *mp, xlog_ticket_t *ticket)
+void
+xlog_print_tic_res(
+ struct xfs_mount *mp,
+ struct xlog_ticket *ticket)
{
uint i;
uint ophdr_spc = ticket->t_res_num_ophdrs * (uint)sizeof(xlog_op_header_t);
"bad-rtype" : res_type_str[r_type-1]),
ticket->t_res_arr[i].r_len);
}
+
+ xfs_cmn_err(XFS_PTAG_LOGRES, CE_ALERT, mp,
+ "xfs_log_write: reservation ran out. Need to up reservation");
+ xfs_force_shutdown(mp, SHUTDOWN_CORRUPT_INCORE);
}
/*
* we don't update ic_offset until the end when we know exactly how many
* bytes have been written out.
*/
-STATIC int
+int
xlog_write(
struct log *log,
struct xfs_log_vec *log_vector,
*start_lsn = 0;
len = xlog_write_calc_vec_length(ticket, log_vector);
- if (ticket->t_curr_res < len) {
- xlog_print_tic_res(log->l_mp, ticket);
-#ifdef DEBUG
- xlog_panic(
- "xfs_log_write: reservation ran out. Need to up reservation");
-#else
- /* Customer configurable panic */
- xfs_cmn_err(XFS_PTAG_LOGRES, CE_ALERT, log->l_mp,
- "xfs_log_write: reservation ran out. Need to up reservation");
+ if (log->l_cilp) {
+ /*
+ * Region headers and bytes are already accounted for.
+ * We only need to take into account start records and
+ * split regions in this function.
+ */
+ if (ticket->t_flags & XLOG_TIC_INITED)
+ ticket->t_curr_res -= sizeof(xlog_op_header_t);
- /* If we did not panic, shutdown the filesystem */
- xfs_force_shutdown(log->l_mp, SHUTDOWN_CORRUPT_INCORE);
-#endif
- }
+ /*
+ * Commit record headers need to be accounted for. These
+ * come in as separate writes so are easy to detect.
+ */
+ if (flags & (XLOG_COMMIT_TRANS | XLOG_UNMOUNT_TRANS))
+ ticket->t_curr_res -= sizeof(xlog_op_header_t);
+ } else
+ ticket->t_curr_res -= len;
- ticket->t_curr_res -= len;
+ if (ticket->t_curr_res < 0)
+ xlog_print_tic_res(log->l_mp, ticket);
index = 0;
lv = log_vector;
XFS_STATS_INC(xs_log_force);
+ xlog_cil_push(log, 1);
+
spin_lock(&log->l_icloglock);
iclog = log->l_iclog;
XFS_STATS_INC(xs_log_force);
+ if (log->l_cilp) {
+ lsn = xlog_cil_push_lsn(log, lsn);
+ if (lsn == NULLCOMMITLSN)
+ return 0;
+ }
+
try_again:
spin_lock(&log->l_icloglock);
iclog = log->l_iclog;
return ticket;
}
+xlog_tid_t
+xfs_log_get_trans_ident(
+ struct xfs_trans *tp)
+{
+ return tp->t_ticket->t_tid;
+}
+
/*
* Allocate and initialise a new log ticket.
*/
-STATIC xlog_ticket_t *
+xlog_ticket_t *
xlog_ticket_alloc(
struct log *log,
int unit_bytes,
int cnt,
char client,
- uint xflags)
+ uint xflags,
+ int alloc_flags)
{
struct xlog_ticket *tic;
uint num_headers;
int iclog_space;
- tic = kmem_zone_zalloc(xfs_log_ticket_zone, KM_SLEEP|KM_MAYFAIL);
+ tic = kmem_zone_zalloc(xfs_log_ticket_zone, alloc_flags);
if (!tic)
return NULL;
* c. nothing new gets queued up after (a) and (b) are done.
* d. if !logerror, flush the iclogs to disk, then seal them off
* for business.
+ *
+ * Note: for delayed logging the !logerror case needs to flush the regions
+ * held in memory out to the iclogs before flushing them to disk. This needs
+ * to be done before the log is marked as shutdown, otherwise the flush to the
+ * iclogs will fail.
*/
int
xfs_log_force_umount(
return 1;
}
retval = 0;
+
+ /*
+ * Flush the in memory commit item list before marking the log as
+ * being shut down. We need to do it in this order to ensure all the
+ * completed transactions are flushed to disk with the xfs_log_force()
+ * call below.
+ */
+ if (!logerror && (mp->m_flags & XFS_MOUNT_DELAYLOG))
+ xlog_cil_push(log, 1);
+
/*
* We must hold both the GRANT lock and the LOG lock,
* before we mark the filesystem SHUTDOWN and wake
#define __XFS_LOG_H__
/* get lsn fields */
-
#define CYCLE_LSN(lsn) ((uint)((lsn)>>32))
#define BLOCK_LSN(lsn) ((uint)(lsn))
struct xfs_log_vec *lv_next; /* next lv in build list */
int lv_niovecs; /* number of iovecs in lv */
struct xfs_log_iovec *lv_iovecp; /* iovec array */
+ struct xfs_log_item *lv_item; /* owner */
+ char *lv_buf; /* formatted buffer */
+ int lv_buf_len; /* size of formatted buffer */
};
/*
struct xlog_ticket;
struct xfs_log_item;
struct xfs_item_ops;
+struct xfs_trans;
void xfs_log_item_init(struct xfs_mount *mp,
struct xfs_log_item *item,
void xlog_iodone(struct xfs_buf *);
-struct xlog_ticket * xfs_log_ticket_get(struct xlog_ticket *ticket);
+struct xlog_ticket *xfs_log_ticket_get(struct xlog_ticket *ticket);
void xfs_log_ticket_put(struct xlog_ticket *ticket);
+xlog_tid_t xfs_log_get_trans_ident(struct xfs_trans *tp);
+
+int xfs_log_commit_cil(struct xfs_mount *mp, struct xfs_trans *tp,
+ struct xfs_log_vec *log_vector,
+ xfs_lsn_t *commit_lsn, int flags);
+bool xfs_log_item_in_current_chkpt(struct xfs_log_item *lip);
+
#endif
--- /dev/null
+/*
+ * Copyright (c) 2010 Red Hat, Inc. All Rights Reserved.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#include "xfs.h"
+#include "xfs_fs.h"
+#include "xfs_types.h"
+#include "xfs_bit.h"
+#include "xfs_log.h"
+#include "xfs_inum.h"
+#include "xfs_trans.h"
+#include "xfs_trans_priv.h"
+#include "xfs_log_priv.h"
+#include "xfs_sb.h"
+#include "xfs_ag.h"
+#include "xfs_dir2.h"
+#include "xfs_dmapi.h"
+#include "xfs_mount.h"
+#include "xfs_error.h"
+#include "xfs_alloc.h"
+
+/*
+ * Perform initial CIL structure initialisation. If the CIL is not
+ * enabled in this filesystem, ensure the log->l_cilp is null so
+ * we can check this conditional to determine if we are doing delayed
+ * logging or not.
+ */
+int
+xlog_cil_init(
+ struct log *log)
+{
+ struct xfs_cil *cil;
+ struct xfs_cil_ctx *ctx;
+
+ log->l_cilp = NULL;
+ if (!(log->l_mp->m_flags & XFS_MOUNT_DELAYLOG))
+ return 0;
+
+ cil = kmem_zalloc(sizeof(*cil), KM_SLEEP|KM_MAYFAIL);
+ if (!cil)
+ return ENOMEM;
+
+ ctx = kmem_zalloc(sizeof(*ctx), KM_SLEEP|KM_MAYFAIL);
+ if (!ctx) {
+ kmem_free(cil);
+ return ENOMEM;
+ }
+
+ INIT_LIST_HEAD(&cil->xc_cil);
+ INIT_LIST_HEAD(&cil->xc_committing);
+ spin_lock_init(&cil->xc_cil_lock);
+ init_rwsem(&cil->xc_ctx_lock);
+ sv_init(&cil->xc_commit_wait, SV_DEFAULT, "cilwait");
+
+ INIT_LIST_HEAD(&ctx->committing);
+ INIT_LIST_HEAD(&ctx->busy_extents);
+ ctx->sequence = 1;
+ ctx->cil = cil;
+ cil->xc_ctx = ctx;
+
+ cil->xc_log = log;
+ log->l_cilp = cil;
+ return 0;
+}
+
+void
+xlog_cil_destroy(
+ struct log *log)
+{
+ if (!log->l_cilp)
+ return;
+
+ if (log->l_cilp->xc_ctx) {
+ if (log->l_cilp->xc_ctx->ticket)
+ xfs_log_ticket_put(log->l_cilp->xc_ctx->ticket);
+ kmem_free(log->l_cilp->xc_ctx);
+ }
+
+ ASSERT(list_empty(&log->l_cilp->xc_cil));
+ kmem_free(log->l_cilp);
+}
+
+/*
+ * Allocate a new ticket. Failing to get a new ticket makes it really hard to
+ * recover, so we don't allow failure here. Also, we allocate in a context that
+ * we don't want to be issuing transactions from, so we need to tell the
+ * allocation code this as well.
+ *
+ * We don't reserve any space for the ticket - we are going to steal whatever
+ * space we require from transactions as they commit. To ensure we reserve all
+ * the space required, we need to set the current reservation of the ticket to
+ * zero so that we know to steal the initial transaction overhead from the
+ * first transaction commit.
+ */
+static struct xlog_ticket *
+xlog_cil_ticket_alloc(
+ struct log *log)
+{
+ struct xlog_ticket *tic;
+
+ tic = xlog_ticket_alloc(log, 0, 1, XFS_TRANSACTION, 0,
+ KM_SLEEP|KM_NOFS);
+ tic->t_trans_type = XFS_TRANS_CHECKPOINT;
+
+ /*
+ * set the current reservation to zero so we know to steal the basic
+ * transaction overhead reservation from the first transaction commit.
+ */
+ tic->t_curr_res = 0;
+ return tic;
+}
+
+/*
+ * After the first stage of log recovery is done, we know where the head and
+ * tail of the log are. We need this log initialisation done before we can
+ * initialise the first CIL checkpoint context.
+ *
+ * Here we allocate a log ticket to track space usage during a CIL push. This
+ * ticket is passed to xlog_write() directly so that we don't slowly leak log
+ * space by failing to account for space used by log headers and additional
+ * region headers for split regions.
+ */
+void
+xlog_cil_init_post_recovery(
+ struct log *log)
+{
+ if (!log->l_cilp)
+ return;
+
+ log->l_cilp->xc_ctx->ticket = xlog_cil_ticket_alloc(log);
+ log->l_cilp->xc_ctx->sequence = 1;
+ log->l_cilp->xc_ctx->commit_lsn = xlog_assign_lsn(log->l_curr_cycle,
+ log->l_curr_block);
+}
+
+/*
+ * Insert the log item into the CIL and calculate the difference in space
+ * consumed by the item. Add the space to the checkpoint ticket and calculate
+ * if the change requires additional log metadata. If it does, take that space
+ * as well. Remove the amount of space we added to the checkpoint ticket from
+ * the current transaction ticket so that the accounting works out correctly.
+ *
+ * If this is the first time the item is being placed into the CIL in this
+ * context, pin it so it can't be written to disk until the CIL is flushed to
+ * the iclog and the iclog written to disk.
+ */
+static void
+xlog_cil_insert(
+ struct log *log,
+ struct xlog_ticket *ticket,
+ struct xfs_log_item *item,
+ struct xfs_log_vec *lv)
+{
+ struct xfs_cil *cil = log->l_cilp;
+ struct xfs_log_vec *old = lv->lv_item->li_lv;
+ struct xfs_cil_ctx *ctx = cil->xc_ctx;
+ int len;
+ int diff_iovecs;
+ int iclog_space;
+
+ if (old) {
+ /* existing lv on log item, space used is a delta */
+ ASSERT(!list_empty(&item->li_cil));
+ ASSERT(old->lv_buf && old->lv_buf_len && old->lv_niovecs);
+
+ len = lv->lv_buf_len - old->lv_buf_len;
+ diff_iovecs = lv->lv_niovecs - old->lv_niovecs;
+ kmem_free(old->lv_buf);
+ kmem_free(old);
+ } else {
+ /* new lv, must pin the log item */
+ ASSERT(!lv->lv_item->li_lv);
+ ASSERT(list_empty(&item->li_cil));
+
+ len = lv->lv_buf_len;
+ diff_iovecs = lv->lv_niovecs;
+ IOP_PIN(lv->lv_item);
+
+ }
+ len += diff_iovecs * sizeof(xlog_op_header_t);
+
+ /* attach new log vector to log item */
+ lv->lv_item->li_lv = lv;
+
+ spin_lock(&cil->xc_cil_lock);
+ list_move_tail(&item->li_cil, &cil->xc_cil);
+ ctx->nvecs += diff_iovecs;
+
+ /*
+ * If this is the first time the item is being committed to the CIL,
+ * store the sequence number on the log item so we can tell
+ * in future commits whether this is the first checkpoint the item is
+ * being committed into.
+ */
+ if (!item->li_seq)
+ item->li_seq = ctx->sequence;
+
+ /*
+ * Now transfer enough transaction reservation to the context ticket
+ * for the checkpoint. The context ticket is special - the unit
+ * reservation has to grow as well as the current reservation as we
+ * steal from tickets so we can correctly determine the space used
+ * during the transaction commit.
+ */
+ if (ctx->ticket->t_curr_res == 0) {
+ /* first commit in checkpoint, steal the header reservation */
+ ASSERT(ticket->t_curr_res >= ctx->ticket->t_unit_res + len);
+ ctx->ticket->t_curr_res = ctx->ticket->t_unit_res;
+ ticket->t_curr_res -= ctx->ticket->t_unit_res;
+ }
+
+ /* do we need space for more log record headers? */
+ iclog_space = log->l_iclog_size - log->l_iclog_hsize;
+ if (len > 0 && (ctx->space_used / iclog_space !=
+ (ctx->space_used + len) / iclog_space)) {
+ int hdrs;
+
+ hdrs = (len + iclog_space - 1) / iclog_space;
+ /* need to take into account split region headers, too */
+ hdrs *= log->l_iclog_hsize + sizeof(struct xlog_op_header);
+ ctx->ticket->t_unit_res += hdrs;
+ ctx->ticket->t_curr_res += hdrs;
+ ticket->t_curr_res -= hdrs;
+ ASSERT(ticket->t_curr_res >= len);
+ }
+ ticket->t_curr_res -= len;
+ ctx->space_used += len;
+
+ spin_unlock(&cil->xc_cil_lock);
+}
+
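Note: the record-header accounting in xlog_cil_insert() above can be modelled outside the kernel. The following is a minimal sketch under stated assumptions, not the patch's code. struct sketch_ctx, sketch_steal() and the hdr_size parameter are hypothetical names; hdr_size stands in for l_iclog_hsize + sizeof(struct xlog_op_header), and iclog_space for l_iclog_size - l_iclog_hsize.

```c
#include <assert.h>

/* Hypothetical stand-in for the checkpoint context and its ticket. */
struct sketch_ctx {
	int space_used;	/* bytes aggregated into the checkpoint so far */
	int unit_res;	/* checkpoint ticket unit reservation */
	int curr_res;	/* checkpoint ticket current reservation */
};

/*
 * Move len bytes of reservation from a transaction ticket (*trans_res)
 * to the checkpoint. If the new data crosses an iclog boundary, also
 * steal space for the extra record headers. Returns the header bytes
 * stolen (0 if no boundary was crossed).
 */
static int
sketch_steal(struct sketch_ctx *ctx, int *trans_res, int len,
	     int iclog_space, int hdr_size)
{
	int hdrs = 0;

	assert(iclog_space > 0);
	if (len > 0 && ctx->space_used / iclog_space !=
		       (ctx->space_used + len) / iclog_space) {
		/* round up: one header per iclog the new data may touch */
		hdrs = (len + iclog_space - 1) / iclog_space;
		hdrs *= hdr_size;
		ctx->unit_res += hdrs;
		ctx->curr_res += hdrs;
		*trans_res -= hdrs;
	}
	*trans_res -= len;
	ctx->space_used += len;
	return hdrs;
}
```

With iclog_space = 1000 and hdr_size = 100, a 400 byte commit into an empty context steals no header space, while a following 700 byte commit crosses the first boundary and steals one header's worth.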
+/*
+ * Format log items into flat buffers
+ *
+ * For delayed logging, we need to hold a formatted buffer containing all the
+ * changes on the log item. This enables us to relog the item in memory and
+ * write it out asynchronously without needing to relock the object that was
+ * modified at the time it gets written into the iclog.
+ *
+ * This function builds a vector for the changes in each log item in the
+ * transaction. It then works out the length of the buffer needed for each log
+ * item, allocates them and formats the vector for the item into the buffer.
+ * The buffer is then attached to the log item, which is then inserted into the
+ * Committed Item List for tracking until the next checkpoint is written out.
+ *
+ * We don't set up region headers during this process; we simply copy the
+ * regions into the flat buffer. We can do this because we still have to do a
+ * formatting step to write the regions into the iclog buffer. Writing the
+ * ophdrs during the iclog write means that we can support splitting large
+ * regions across iclog boundaries without needing a change in the format of the
+ * item/region encapsulation.
+ *
+ * Hence what we need to do now is rewrite the vector array to point
+ * to the copied region inside the buffer we just allocated. This allows us to
+ * format the regions into the iclog as though they are being formatted
+ * directly out of the objects themselves.
+ */
+static void
+xlog_cil_format_items(
+ struct log *log,
+ struct xfs_log_vec *log_vector,
+ struct xlog_ticket *ticket,
+ xfs_lsn_t *start_lsn)
+{
+ struct xfs_log_vec *lv;
+
+ if (start_lsn)
+ *start_lsn = log->l_cilp->xc_ctx->sequence;
+
+ ASSERT(log_vector);
+ for (lv = log_vector; lv; lv = lv->lv_next) {
+ void *ptr;
+ int index;
+ int len = 0;
+
+ /* build the vector array and calculate its length */
+ IOP_FORMAT(lv->lv_item, lv->lv_iovecp);
+ for (index = 0; index < lv->lv_niovecs; index++)
+ len += lv->lv_iovecp[index].i_len;
+
+ lv->lv_buf_len = len;
+ lv->lv_buf = kmem_zalloc(lv->lv_buf_len, KM_SLEEP|KM_NOFS);
+ ptr = lv->lv_buf;
+
+ for (index = 0; index < lv->lv_niovecs; index++) {
+ struct xfs_log_iovec *vec = &lv->lv_iovecp[index];
+
+ memcpy(ptr, vec->i_addr, vec->i_len);
+ vec->i_addr = ptr;
+ ptr += vec->i_len;
+ }
+ ASSERT(ptr == lv->lv_buf + lv->lv_buf_len);
+
+ xlog_cil_insert(log, ticket, lv->lv_item, lv);
+ }
+}
+
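Note: the copy-and-repoint step in xlog_cil_format_items() above can be tried in userspace. This is a minimal sketch under stated assumptions, not the kernel code: struct sketch_iovec and sketch_flatten() are hypothetical names, and calloc() stands in for kmem_zalloc().

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for struct xfs_log_iovec. */
struct sketch_iovec {
	void	*addr;	/* start of the region */
	int	len;	/* length of the region in bytes */
};

/*
 * Copy every region into one freshly allocated flat buffer and rewrite
 * each vector to point at its copy. Returns the buffer (caller frees it)
 * or NULL on allocation failure; the total length is stored in *out_len.
 */
static char *
sketch_flatten(struct sketch_iovec *vec, int nvecs, int *out_len)
{
	char	*buf, *ptr;
	int	i, len = 0;

	for (i = 0; i < nvecs; i++)
		len += vec[i].len;

	buf = calloc(1, len ? len : 1);
	if (!buf)
		return NULL;

	ptr = buf;
	for (i = 0; i < nvecs; i++) {
		memcpy(ptr, vec[i].addr, vec[i].len);
		vec[i].addr = ptr;	/* vector now refers to the copy */
		ptr += vec[i].len;
	}
	assert(ptr == buf + len);
	*out_len = len;
	return buf;
}
```

Once the vectors point into the flat buffer, the original object can be modified or relocked without affecting what will later be written, which is the property the comment above relies on.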
+static void
+xlog_cil_free_logvec(
+ struct xfs_log_vec *log_vector)
+{
+ struct xfs_log_vec *lv;
+
+ for (lv = log_vector; lv; ) {
+ struct xfs_log_vec *next = lv->lv_next;
+ kmem_free(lv->lv_buf);
+ kmem_free(lv);
+ lv = next;
+ }
+}
+
+/*
+ * Commit a transaction with the given vector to the Committed Item List.
+ *
+ * To do this, we need to format the item, pin it in memory if required and
+ * account for the space used by the transaction. Once we have done that we
+ * need to release the unused reservation for the transaction, attach the
+ * transaction to the checkpoint context so we carry the busy extents through
+ * to checkpoint completion, and then unlock all the items in the transaction.
+ *
+ * For more specific information about the order of operations in
+ * xfs_log_commit_cil() please refer to the comments in
+ * xfs_trans_commit_iclog().
+ *
+ * The context lock is taken here in read mode to lock out background commit;
+ * the function returns without it held once background commits are allowed
+ * again.
+ */
+int
+xfs_log_commit_cil(
+ struct xfs_mount *mp,
+ struct xfs_trans *tp,
+ struct xfs_log_vec *log_vector,
+ xfs_lsn_t *commit_lsn,
+ int flags)
+{
+ struct log *log = mp->m_log;
+ int log_flags = 0;
+ int push = 0;
+
+ if (flags & XFS_TRANS_RELEASE_LOG_RES)
+ log_flags = XFS_LOG_REL_PERM_RESERV;
+
+ if (XLOG_FORCED_SHUTDOWN(log)) {
+ xlog_cil_free_logvec(log_vector);
+ return XFS_ERROR(EIO);
+ }
+
+ /* lock out background commit */
+ down_read(&log->l_cilp->xc_ctx_lock);
+ xlog_cil_format_items(log, log_vector, tp->t_ticket, commit_lsn);
+
+ /* check we didn't blow the reservation */
+ if (tp->t_ticket->t_curr_res < 0)
+ xlog_print_tic_res(log->l_mp, tp->t_ticket);
+
+ /* attach the transaction to the CIL if it has any busy extents */
+ if (!list_empty(&tp->t_busy)) {
+ spin_lock(&log->l_cilp->xc_cil_lock);
+ list_splice_init(&tp->t_busy,
+ &log->l_cilp->xc_ctx->busy_extents);
+ spin_unlock(&log->l_cilp->xc_cil_lock);
+ }
+
+ tp->t_commit_lsn = *commit_lsn;
+ xfs_log_done(mp, tp->t_ticket, NULL, log_flags);
+ xfs_trans_unreserve_and_mod_sb(tp);
+
+ /* check for background commit before unlock */
+ if (log->l_cilp->xc_ctx->space_used > XLOG_CIL_SPACE_LIMIT(log))
+ push = 1;
+ up_read(&log->l_cilp->xc_ctx_lock);
+
+ /*
+ * We need to push CIL every so often so we don't cache more than we
+ * can fit in the log. The limit really is that a checkpoint can't be
+ * more than half the log (the current checkpoint is not allowed to
+ * overwrite the previous checkpoint), but commit latency and memory
+ * usage limit this to a smaller size in most cases.
+ */
+ if (push)
+ xlog_cil_push(log, 0);
+ return 0;
+}
+
+/*
+ * Mark all items committed and clear busy extents. We free the log vector
+ * chains in a separate pass so that we unpin the log items as quickly as
+ * possible.
+ */
+static void
+xlog_cil_committed(
+ void *args,
+ int abort)
+{
+ struct xfs_cil_ctx *ctx = args;
+ struct xfs_log_vec *lv;
+ int abortflag = abort ? XFS_LI_ABORTED : 0;
+ struct xfs_busy_extent *busyp, *n;
+
+ /* unpin all the log items */
+ for (lv = ctx->lv_chain; lv; lv = lv->lv_next) {
+ xfs_trans_item_committed(lv->lv_item, ctx->start_lsn,
+ abortflag);
+ }
+
+ list_for_each_entry_safe(busyp, n, &ctx->busy_extents, list)
+ xfs_alloc_busy_clear(ctx->cil->xc_log->l_mp, busyp);
+
+ spin_lock(&ctx->cil->xc_cil_lock);
+ list_del(&ctx->committing);
+ spin_unlock(&ctx->cil->xc_cil_lock);
+
+ xlog_cil_free_logvec(ctx->lv_chain);
+ kmem_free(ctx);
+}
+
+/*
+ * Push the Committed Item List to the log. If the push_now flag is not set,
+ * then it is a background flush and so we can choose to ignore it.
+ */
+int
+xlog_cil_push(
+ struct log *log,
+ int push_now)
+{
+ struct xfs_cil *cil = log->l_cilp;
+ struct xfs_log_vec *lv;
+ struct xfs_cil_ctx *ctx;
+ struct xfs_cil_ctx *new_ctx;
+ struct xlog_in_core *commit_iclog;
+ struct xlog_ticket *tic;
+ int num_lv;
+ int num_iovecs;
+ int len;
+ int error = 0;
+ struct xfs_trans_header thdr;
+ struct xfs_log_iovec lhdr;
+ struct xfs_log_vec lvhdr = { NULL };
+ xfs_lsn_t commit_lsn;
+
+ if (!cil)
+ return 0;
+
+ new_ctx = kmem_zalloc(sizeof(*new_ctx), KM_SLEEP|KM_NOFS);
+ new_ctx->ticket = xlog_cil_ticket_alloc(log);
+
+ /* lock out transaction commit, but don't block on background push */
+ if (!down_write_trylock(&cil->xc_ctx_lock)) {
+ if (!push_now)
+ goto out_free_ticket;
+ down_write(&cil->xc_ctx_lock);
+ }
+ ctx = cil->xc_ctx;
+
+ /* check if we've anything to push */
+ if (list_empty(&cil->xc_cil))
+ goto out_skip;
+
+ /* check for spurious background flush */
+ if (!push_now && cil->xc_ctx->space_used < XLOG_CIL_SPACE_LIMIT(log))
+ goto out_skip;
+
+ /*
+ * pull all the log vectors off the items in the CIL, and
+ * remove the items from the CIL. We don't need the CIL lock
+ * here because it's only needed on the transaction commit
+ * side which is currently locked out by the flush lock.
+ */
+ lv = NULL;
+ num_lv = 0;
+ num_iovecs = 0;
+ len = 0;
+ while (!list_empty(&cil->xc_cil)) {
+ struct xfs_log_item *item;
+ int i;
+
+ item = list_first_entry(&cil->xc_cil,
+ struct xfs_log_item, li_cil);
+ list_del_init(&item->li_cil);
+ if (!ctx->lv_chain)
+ ctx->lv_chain = item->li_lv;
+ else
+ lv->lv_next = item->li_lv;
+ lv = item->li_lv;
+ item->li_lv = NULL;
+
+ num_lv++;
+ num_iovecs += lv->lv_niovecs;
+ for (i = 0; i < lv->lv_niovecs; i++)
+ len += lv->lv_iovecp[i].i_len;
+ }
+
+ /*
+ * initialise the new context and attach it to the CIL. Then attach
+ * the current context to the CIL committing list so it can be found
+ * during log forces to extract the commit lsn of the sequence that
+ * needs to be forced.
+ */
+ INIT_LIST_HEAD(&new_ctx->committing);
+ INIT_LIST_HEAD(&new_ctx->busy_extents);
+ new_ctx->sequence = ctx->sequence + 1;
+ new_ctx->cil = cil;
+ cil->xc_ctx = new_ctx;
+
+ /*
+ * The switch is now done, so we can drop the context lock and move out
+ * of a shared context. We can't just go straight to the commit record,
+ * though - we need to synchronise with previous and future commits so
+ * that the commit records are correctly ordered in the log to ensure
+ * that we process items during log IO completion in the correct order.
+ *
+ * For example, if we get an EFI in one checkpoint and the EFD in the
+ * next (e.g. due to log forces), we do not want the checkpoint with
+ * the EFD to be committed before the checkpoint with the EFI. Hence
+ * we must strictly order the commit records of the checkpoints so
+ * that: a) the checkpoint callbacks are attached to the iclogs in the
+ * correct order; and b) the checkpoints are replayed in correct order
+ * in log recovery.
+ *
+ * Hence we need to add this context to the committing context list so
+ * that higher sequences will wait for us to write out a commit record
+ * before they do.
+ */
+ spin_lock(&cil->xc_cil_lock);
+ list_add(&ctx->committing, &cil->xc_committing);
+ spin_unlock(&cil->xc_cil_lock);
+ up_write(&cil->xc_ctx_lock);
+
+ /*
+ * Build a checkpoint transaction header and write it to the log to
+ * begin the transaction. We need to account for the space used by the
+ * transaction header here as it is not accounted for in xlog_write().
+ *
+ * The LSN we need to pass to the log items on transaction commit is
+ * the LSN reported by the first log vector write. If we use the commit
+ * record lsn then we can move the tail beyond the grant write head.
+ */
+ tic = ctx->ticket;
+ thdr.th_magic = XFS_TRANS_HEADER_MAGIC;
+ thdr.th_type = XFS_TRANS_CHECKPOINT;
+ thdr.th_tid = tic->t_tid;
+ thdr.th_num_items = num_iovecs;
+ lhdr.i_addr = (xfs_caddr_t)&thdr;
+ lhdr.i_len = sizeof(xfs_trans_header_t);
+ lhdr.i_type = XLOG_REG_TYPE_TRANSHDR;
+ tic->t_curr_res -= lhdr.i_len + sizeof(xlog_op_header_t);
+
+ lvhdr.lv_niovecs = 1;
+ lvhdr.lv_iovecp = &lhdr;
+ lvhdr.lv_next = ctx->lv_chain;
+
+ error = xlog_write(log, &lvhdr, tic, &ctx->start_lsn, NULL, 0);
+ if (error)
+ goto out_abort;
+
+ /*
+ * now that we've written the checkpoint into the log, strictly
+ * order the commit records so replay will get them in the right order.
+ */
+restart:
+ spin_lock(&cil->xc_cil_lock);
+ list_for_each_entry(new_ctx, &cil->xc_committing, committing) {
+ /*
+ * Higher sequences will wait for this one so skip them.
+ * Don't wait for our own sequence, either.
+ */
+ if (new_ctx->sequence >= ctx->sequence)
+ continue;
+ if (!new_ctx->commit_lsn) {
+ /*
+ * It is still being pushed! Wait for the push to
+ * complete, then start again from the beginning.
+ */
+ sv_wait(&cil->xc_commit_wait, 0, &cil->xc_cil_lock, 0);
+ goto restart;
+ }
+ }
+ spin_unlock(&cil->xc_cil_lock);
+
+ commit_lsn = xfs_log_done(log->l_mp, tic, &commit_iclog, 0);
+ if (error || commit_lsn == -1)
+ goto out_abort;
+
+ /* attach all the transactions w/ busy extents to iclog */
+ ctx->log_cb.cb_func = xlog_cil_committed;
+ ctx->log_cb.cb_arg = ctx;
+ error = xfs_log_notify(log->l_mp, commit_iclog, &ctx->log_cb);
+ if (error)
+ goto out_abort;
+
+ /*
+ * now the checkpoint commit is complete and we've attached the
+ * callbacks to the iclog we can assign the commit LSN to the context
+ * and wake up anyone who is waiting for the commit to complete.
+ */
+ spin_lock(&cil->xc_cil_lock);
+ ctx->commit_lsn = commit_lsn;
+ sv_broadcast(&cil->xc_commit_wait);
+ spin_unlock(&cil->xc_cil_lock);
+
+ /* release the hounds! */
+ return xfs_log_release_iclog(log->l_mp, commit_iclog);
+
+out_skip:
+ up_write(&cil->xc_ctx_lock);
+out_free_ticket:
+ xfs_log_ticket_put(new_ctx->ticket);
+ kmem_free(new_ctx);
+ return 0;
+
+out_abort:
+ xlog_cil_committed(ctx, XFS_LI_ABORTED);
+ return XFS_ERROR(EIO);
+}
+
+/*
+ * Conditionally push the CIL based on the sequence passed in.
+ *
+ * We only need to push if we haven't already pushed the sequence
+ * number given. Hence the only time we will trigger a push here is
+ * if the push sequence is the same as the current context.
+ *
+ * We return the current commit lsn to allow the callers to determine if an
+ * iclog flush is necessary following this call.
+ *
+ * XXX: Initially, just push the CIL unconditionally and return whatever
+ * commit lsn is there. It'll be empty, so this is broken for now.
+ */
+xfs_lsn_t
+xlog_cil_push_lsn(
+ struct log *log,
+ xfs_lsn_t push_seq)
+{
+ struct xfs_cil *cil = log->l_cilp;
+ struct xfs_cil_ctx *ctx;
+ xfs_lsn_t commit_lsn = NULLCOMMITLSN;
+
+restart:
+ down_write(&cil->xc_ctx_lock);
+ ASSERT(push_seq <= cil->xc_ctx->sequence);
+
+ /* check to see if we need to force out the current context */
+ if (push_seq == cil->xc_ctx->sequence) {
+ up_write(&cil->xc_ctx_lock);
+ xlog_cil_push(log, 1);
+ goto restart;
+ }
+
+ /*
+ * See if we can find a previous sequence still committing.
+ * We can drop the flush lock as soon as we have the cil lock
+ * because we are now only comparing contexts protected by
+ * the cil lock.
+ *
+ * We need to wait for all previous sequence commits to complete
+ * before allowing the force of push_seq to go ahead. Hence block
+ * on commits for those as well.
+ */
+ spin_lock(&cil->xc_cil_lock);
+ up_write(&cil->xc_ctx_lock);
+ list_for_each_entry(ctx, &cil->xc_committing, committing) {
+ if (ctx->sequence > push_seq)
+ continue;
+ if (!ctx->commit_lsn) {
+ /*
+ * It is still being pushed! Wait for the push to
+ * complete, then start again from the beginning.
+ */
+ sv_wait(&cil->xc_commit_wait, 0, &cil->xc_cil_lock, 0);
+ goto restart;
+ }
+ if (ctx->sequence != push_seq)
+ continue;
+ /* found it! */
+ commit_lsn = ctx->commit_lsn;
+ }
+ spin_unlock(&cil->xc_cil_lock);
+ return commit_lsn;
+}
+
+/*
+ * Check if the current log item was first committed in this sequence.
+ * We can't rely on just the log item being in the CIL, we have to check
+ * the recorded commit sequence number.
+ *
+ * Note: for this to be used in a non-racy manner, it has to be called with
+ * CIL flushing locked out. As a result, it should only be used during the
+ * transaction commit process when deciding what to format into the item.
+ */
+bool
+xfs_log_item_in_current_chkpt(
+ struct xfs_log_item *lip)
+{
+ struct xfs_cil_ctx *ctx;
+
+ if (!(lip->li_mountp->m_flags & XFS_MOUNT_DELAYLOG))
+ return false;
+ if (list_empty(&lip->li_cil))
+ return false;
+
+ ctx = lip->li_mountp->m_log->l_cilp->xc_ctx;
+
+ /*
+ * li_seq is written on the first commit of a log item to record the
+ * first checkpoint it is written to. Hence if it is different to the
+ * current sequence, we're in a new checkpoint.
+ */
+ if (XFS_LSN_CMP(lip->li_seq, ctx->sequence) != 0)
+ return false;
+ return true;
+}
#define XLOG_RECOVERY_NEEDED 0x4 /* log was recovered */
#define XLOG_IO_ERROR 0x8 /* log hit an I/O error, and being
shutdown */
-typedef __uint32_t xlog_tid_t;
-
#ifdef __KERNEL__
/*
#define ic_header ic_data->hic_header
} xlog_in_core_t;
+/*
+ * The CIL context is used to aggregate per-transaction details as well as to
+ * be passed to the iclog for checkpoint post-commit processing. After being
+ * passed to the iclog, another context needs to be allocated for tracking the
+ * next set of transactions to be aggregated into a checkpoint.
+ */
+struct xfs_cil;
+
+struct xfs_cil_ctx {
+ struct xfs_cil *cil;
+ xfs_lsn_t sequence; /* chkpt sequence # */
+ xfs_lsn_t start_lsn; /* first LSN of chkpt commit */
+ xfs_lsn_t commit_lsn; /* chkpt commit record lsn */
+ struct xlog_ticket *ticket; /* chkpt ticket */
+ int nvecs; /* number of regions */
+ int space_used; /* aggregate size of regions */
+ struct list_head busy_extents; /* busy extents in chkpt */
+ struct xfs_log_vec *lv_chain; /* logvecs being pushed */
+ xfs_log_callback_t log_cb; /* completion callback hook. */
+ struct list_head committing; /* ctx committing list */
+};
+
+/*
+ * Committed Item List structure
+ *
+ * This structure is used to track log items that have been committed but not
+ * yet written into the log. It is used only when the delayed logging mount
+ * option is enabled.
+ *
+ * This structure tracks the list of committing checkpoint contexts so
+ * we can avoid the problem of having to hold out new transactions during a
+ * flush until we have the commit record LSN of the checkpoint. We can
+ * traverse the list of committing contexts in xlog_cil_push_lsn() to find a
+ * sequence match and extract the commit LSN directly from there. If the
+ * checkpoint is still in the process of committing, we can block waiting for
+ * the commit LSN to be determined as well. This should make synchronous
+ * operations almost as efficient as the old logging methods.
+ */
+struct xfs_cil {
+ struct log *xc_log;
+ struct list_head xc_cil;
+ spinlock_t xc_cil_lock;
+ struct xfs_cil_ctx *xc_ctx;
+ struct rw_semaphore xc_ctx_lock;
+ struct list_head xc_committing;
+ sv_t xc_commit_wait;
+};
+
+/*
+ * The amount of log space we should allow the CIL to aggregate is difficult to
+ * size. Whatever we choose, we have to make sure we can get a reservation for
+ * the log space effectively, that it is large enough to capture sufficient
+ * relogging to reduce log buffer IO significantly, and that it is not so large
+ * that it overruns the log or induces too much latency when writing out
+ * through the iclogs. We track both space consumed and the number of vectors
+ * in the checkpoint context, so we need to decide which to use for limiting.
+ *
+ * Every log buffer we write out during a push needs a header reserved, which
+ * is at least one sector and more for v2 logs. Hence we need a reservation of
+ * at least 512 bytes per 32k of log space just for the LR headers. That means
+ * 16KB of reservation per megabyte of delayed logging space we will consume,
+ * plus various headers. The number of headers will vary based on the num of
+ * io vectors, so limiting on a specific number of vectors is going to result
+ * in transactions of varying size. IOWs, it is more consistent to track and
+ * limit space consumed in the log rather than by the number of objects being
+ * logged in order to prevent checkpoint ticket overruns.
+ *
+ * Further, use of static reservations through the log grant mechanism is
+ * problematic. It introduces a lot of complexity (e.g. reserve grant vs write
+ * grant) and a significant deadlock potential because regranting write space
+ * can block on log pushes. Hence if we have to regrant log space during a log
+ * push, we can deadlock.
+ *
+ * However, we can avoid this by use of a dynamic "reservation stealing"
+ * technique during transaction commit whereby unused reservation space in the
+ * transaction ticket is transferred to the CIL ctx commit ticket to cover the
+ * space needed by the checkpoint transaction. This means that we never need to
+ * specifically reserve space for the CIL checkpoint transaction, nor do we
+ * need to regrant space once the checkpoint completes. This also means the
+ * checkpoint transaction ticket is specific to the checkpoint context, rather
+ * than the CIL itself.
+ *
+ * With dynamic reservations, we can basically make up arbitrary limits for the
+ * checkpoint size so long as they don't violate any other size rules. Hence
+ * the initial maximum size for the checkpoint transaction will be set to a
+ * quarter of the log or 8MB, whichever is smaller. 8MB is an arbitrary limit
+ * right now based on the latency of writing out a large amount of data through
+ * the circular iclog buffers.
+ */
+
+#define XLOG_CIL_SPACE_LIMIT(log) \
+ (min((log->l_logsize >> 2), (8 * 1024 * 1024)))
+
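Note: the arithmetic behind XLOG_CIL_SPACE_LIMIT and the "512 bytes per 32k" estimate in the comment above can be checked with a small userspace sketch. The helper names sketch_cil_space_limit() and sketch_lr_header_res() are hypothetical and exist only for illustration; the 32k iclog size and 512 byte header are the figures quoted in the comment, not values read from a real log.

```c
#include <assert.h>

/* Hypothetical mirror of XLOG_CIL_SPACE_LIMIT: min(logsize / 4, 8MB). */
static int
sketch_cil_space_limit(int logsize)
{
	int	quarter = logsize >> 2;
	int	cap = 8 * 1024 * 1024;

	assert(logsize >= 0);
	return quarter < cap ? quarter : cap;
}

/*
 * Log record header reservation needed to write `bytes` of checkpoint
 * data, assuming one header of hdr_size bytes per iclog_size bytes.
 */
static int
sketch_lr_header_res(int bytes, int iclog_size, int hdr_size)
{
	assert(iclog_size > 0);
	return (bytes + iclog_size - 1) / iclog_size * hdr_size;
}
```

For a 16MB log the limit is the 4MB quarter; for a 128MB log the 8MB cap applies. One megabyte of checkpoint data at 32k per iclog needs 32 headers, which at 512 bytes each is the 16KB quoted in the comment.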
/*
* The reservation head lsn is not made up of a cycle number and block number.
* Instead, it uses a cycle number and byte number. Logs don't expect to
/* The following fields don't need locking */
struct xfs_mount *l_mp; /* mount point */
struct xfs_ail *l_ailp; /* AIL log is working with */
+ struct xfs_cil *l_cilp; /* CIL log is working with */
struct xfs_buf *l_xbuf; /* extra buffer for log
* wrapping */
struct xfs_buftarg *l_targ; /* buftarg of log */
#define XLOG_FORCED_SHUTDOWN(log) ((log)->l_flags & XLOG_IO_ERROR)
-
/* common routines */
extern xfs_lsn_t xlog_assign_tail_lsn(struct xfs_mount *mp);
extern int xlog_recover(xlog_t *log);
extern int xlog_recover_finish(xlog_t *log);
extern void xlog_pack_data(xlog_t *log, xlog_in_core_t *iclog, int);
-extern kmem_zone_t *xfs_log_ticket_zone;
+extern kmem_zone_t *xfs_log_ticket_zone;
+struct xlog_ticket *xlog_ticket_alloc(struct log *log, int unit_bytes,
+ int count, char client, uint xflags,
+ int alloc_flags);
+
static inline void
xlog_write_adv_cnt(void **ptr, int *len, int *off, size_t bytes)
*off += bytes;
}
+void xlog_print_tic_res(struct xfs_mount *mp, struct xlog_ticket *ticket);
+int xlog_write(struct log *log, struct xfs_log_vec *log_vector,
+ struct xlog_ticket *tic, xfs_lsn_t *start_lsn,
+ xlog_in_core_t **commit_iclog, uint flags);
+
+/*
+ * Committed Item List interfaces
+ */
+int xlog_cil_init(struct log *log);
+void xlog_cil_init_post_recovery(struct log *log);
+void xlog_cil_destroy(struct log *log);
+
+int xlog_cil_push(struct log *log, int push_now);
+xfs_lsn_t xlog_cil_push_lsn(struct log *log, xfs_lsn_t push_sequence);
+
/*
* Unmount record type is used as a pseudo transaction type for the ticket.
* It's value must be outside the range of XFS_TRANS_* values.
switch (ITEM_TYPE(item)) {
case XFS_LI_BUF:
- if (!(buf_f->blf_flags & XFS_BLI_CANCEL)) {
+ if (!(buf_f->blf_flags & XFS_BLF_CANCEL)) {
trace_xfs_log_recover_item_reorder_head(log,
trans, item, pass);
list_move(&item->ri_list, &trans->r_itemq);
/*
* If this isn't a cancel buffer item, then just return.
*/
- if (!(flags & XFS_BLI_CANCEL)) {
+ if (!(flags & XFS_BLF_CANCEL)) {
trace_xfs_log_recover_buf_not_cancel(log, buf_f);
return;
}
* Check to see whether the buffer being recovered has a corresponding
* entry in the buffer cancel record table. If it does then return 1
* so that it will be cancelled, otherwise return 0. If the buffer is
- * actually a buffer cancel item (XFS_BLI_CANCEL is set), then decrement
+ * actually a buffer cancel item (XFS_BLF_CANCEL is set), then decrement
* the refcount on the entry in the table and remove it from the table
* if this is the last reference.
*
* There is nothing in the table built in pass one,
* so this buffer must not be cancelled.
*/
- ASSERT(!(flags & XFS_BLI_CANCEL));
+ ASSERT(!(flags & XFS_BLF_CANCEL));
return 0;
}
* There is no corresponding entry in the table built
* in pass one, so this buffer has not been cancelled.
*/
- ASSERT(!(flags & XFS_BLI_CANCEL));
+ ASSERT(!(flags & XFS_BLF_CANCEL));
return 0;
}
* one in the table and remove it if this is the
* last reference.
*/
- if (flags & XFS_BLI_CANCEL) {
+ if (flags & XFS_BLF_CANCEL) {
bcp->bc_refcount--;
if (bcp->bc_refcount == 0) {
if (prevp == NULL) {
* We didn't find a corresponding entry in the table, so
* return 0 so that the buffer is NOT cancelled.
*/
- ASSERT(!(flags & XFS_BLI_CANCEL));
+ ASSERT(!(flags & XFS_BLF_CANCEL));
return 0;
}
nbits = xfs_contig_bits(data_map, map_size,
bit);
ASSERT(nbits > 0);
- reg_buf_offset = bit << XFS_BLI_SHIFT;
- reg_buf_bytes = nbits << XFS_BLI_SHIFT;
+ reg_buf_offset = bit << XFS_BLF_SHIFT;
+ reg_buf_bytes = nbits << XFS_BLF_SHIFT;
item_index++;
}
}
ASSERT(item->ri_buf[item_index].i_addr != NULL);
- ASSERT((item->ri_buf[item_index].i_len % XFS_BLI_CHUNK) == 0);
+ ASSERT((item->ri_buf[item_index].i_len % XFS_BLF_CHUNK) == 0);
ASSERT((reg_buf_offset + reg_buf_bytes) <= XFS_BUF_COUNT(bp));
/*
nbits = xfs_contig_bits(data_map, map_size, bit);
ASSERT(nbits > 0);
ASSERT(item->ri_buf[i].i_addr != NULL);
- ASSERT(item->ri_buf[i].i_len % XFS_BLI_CHUNK == 0);
+ ASSERT(item->ri_buf[i].i_len % XFS_BLF_CHUNK == 0);
ASSERT(XFS_BUF_COUNT(bp) >=
- ((uint)bit << XFS_BLI_SHIFT)+(nbits<<XFS_BLI_SHIFT));
+ ((uint)bit << XFS_BLF_SHIFT)+(nbits<<XFS_BLF_SHIFT));
/*
* Do a sanity check if this is a dquot buffer. Just checking
*/
error = 0;
if (buf_f->blf_flags &
- (XFS_BLI_UDQUOT_BUF|XFS_BLI_PDQUOT_BUF|XFS_BLI_GDQUOT_BUF)) {
+ (XFS_BLF_UDQUOT_BUF|XFS_BLF_PDQUOT_BUF|XFS_BLF_GDQUOT_BUF)) {
if (item->ri_buf[i].i_addr == NULL) {
cmn_err(CE_ALERT,
"XFS: NULL dquot in %s.", __func__);
}
memcpy(xfs_buf_offset(bp,
- (uint)bit << XFS_BLI_SHIFT), /* dest */
+ (uint)bit << XFS_BLF_SHIFT), /* dest */
item->ri_buf[i].i_addr, /* source */
- nbits<<XFS_BLI_SHIFT); /* length */
+ nbits<<XFS_BLF_SHIFT); /* length */
next:
i++;
bit += nbits;
}
type = 0;
- if (buf_f->blf_flags & XFS_BLI_UDQUOT_BUF)
+ if (buf_f->blf_flags & XFS_BLF_UDQUOT_BUF)
type |= XFS_DQ_USER;
- if (buf_f->blf_flags & XFS_BLI_PDQUOT_BUF)
+ if (buf_f->blf_flags & XFS_BLF_PDQUOT_BUF)
type |= XFS_DQ_PROJ;
- if (buf_f->blf_flags & XFS_BLI_GDQUOT_BUF)
+ if (buf_f->blf_flags & XFS_BLF_GDQUOT_BUF)
type |= XFS_DQ_GROUP;
/*
* This type of quotas was turned off, so ignore this buffer
* here which overlaps that may be stale.
*
* When meta-data buffers are freed at run time we log a buffer item
- * with the XFS_BLI_CANCEL bit set to indicate that previous copies
+ * with the XFS_BLF_CANCEL bit set to indicate that previous copies
* of the buffer in the log should not be replayed at recovery time.
* This is so that if the blocks covered by the buffer are reused for
* file data before we crash we don't end up replaying old, freed
if (pass == XLOG_RECOVER_PASS1) {
/*
* In this pass we're only looking for buf items
- * with the XFS_BLI_CANCEL bit set.
+ * with the XFS_BLF_CANCEL bit set.
*/
xlog_recover_do_buffer_pass1(log, buf_f);
return 0;
mp = log->l_mp;
buf_flags = XBF_LOCK;
- if (!(flags & XFS_BLI_INODE_BUF))
+ if (!(flags & XFS_BLF_INODE_BUF))
buf_flags |= XBF_MAPPED;
bp = xfs_buf_read(mp->m_ddev_targp, blkno, len, buf_flags);
}
error = 0;
- if (flags & XFS_BLI_INODE_BUF) {
+ if (flags & XFS_BLF_INODE_BUF) {
error = xlog_recover_do_inode_buffer(mp, item, bp, buf_f);
} else if (flags &
- (XFS_BLI_UDQUOT_BUF|XFS_BLI_PDQUOT_BUF|XFS_BLI_GDQUOT_BUF)) {
+ (XFS_BLF_UDQUOT_BUF|XFS_BLF_PDQUOT_BUF|XFS_BLF_GDQUOT_BUF)) {
xlog_recover_do_dquot_buffer(mp, log, item, bp, buf_f);
} else {
xlog_recover_do_reg_buffer(mp, item, bp, buf_f);
#define XLOG_RHASH(tid) \
((((__uint32_t)tid)>>XLOG_RHASH_SHIFT) & (XLOG_RHASH_SIZE-1))
-#define XLOG_MAX_REGIONS_IN_ITEM (XFS_MAX_BLOCKSIZE / XFS_BLI_CHUNK / 2 + 1)
+#define XLOG_MAX_REGIONS_IN_ITEM (XFS_MAX_BLOCKSIZE / XFS_BLF_CHUNK / 2 + 1)
/*
#define XFS_MOUNT_WSYNC (1ULL << 0) /* for nfs - all metadata ops
must be synchronous except
for space allocations */
+#define XFS_MOUNT_DELAYLOG (1ULL << 1) /* delayed logging is enabled */
#define XFS_MOUNT_DMAPI (1ULL << 2) /* dmapi is enabled */
#define XFS_MOUNT_WAS_CLEAN (1ULL << 3)
#define XFS_MOUNT_FS_SHUTDOWN (1ULL << 4) /* atomic stop of all filesystem
#include "xfs_trans_priv.h"
#include "xfs_trans_space.h"
#include "xfs_inode_item.h"
+#include "xfs_trace.h"
kmem_zone_t *xfs_trans_zone;
tp->t_type = type;
tp->t_mountp = mp;
tp->t_items_free = XFS_LIC_NUM_SLOTS;
- tp->t_busy_free = XFS_LBC_NUM_SLOTS;
xfs_lic_init(&(tp->t_items));
- XFS_LBC_INIT(&(tp->t_busy));
+ INIT_LIST_HEAD(&tp->t_busy);
return tp;
}
*/
STATIC void
xfs_trans_free(
- xfs_trans_t *tp)
+ struct xfs_trans *tp)
{
+ struct xfs_busy_extent *busyp, *n;
+
+ list_for_each_entry_safe(busyp, n, &tp->t_busy, list)
+ xfs_alloc_busy_clear(tp->t_mountp, busyp);
+
atomic_dec(&tp->t_mountp->m_active_trans);
xfs_trans_free_dqinfo(tp);
kmem_zone_free(xfs_trans_zone, tp);
ntp->t_type = tp->t_type;
ntp->t_mountp = tp->t_mountp;
ntp->t_items_free = XFS_LIC_NUM_SLOTS;
- ntp->t_busy_free = XFS_LBC_NUM_SLOTS;
xfs_lic_init(&(ntp->t_items));
- XFS_LBC_INIT(&(ntp->t_busy));
+ INIT_LIST_HEAD(&ntp->t_busy);
ASSERT(tp->t_flags & XFS_TRANS_PERM_LOG_RES);
ASSERT(tp->t_ticket != NULL);
return error;
}
-
/*
* Record the indicated change to the given field for application
* to the file system's superblock when the transaction commits.
* XFS_TRANS_SB_DIRTY will not be set when the transaction is updated but we
* still need to update the incore superblock with the changes.
*/
-STATIC void
+void
xfs_trans_unreserve_and_mod_sb(
xfs_trans_t *tp)
{
* they could be immediately flushed and we'd have to race with the flusher
* trying to pull the item from the AIL as we add it.
*/
-static void
+void
xfs_trans_item_committed(
struct xfs_log_item *lip,
xfs_lsn_t commit_lsn,
IOP_UNPIN(lip);
}
-/* Clear all the per-AG busy list items listed in this transaction */
-static void
-xfs_trans_clear_busy_extents(
- struct xfs_trans *tp)
-{
- xfs_log_busy_chunk_t *lbcp;
- xfs_log_busy_slot_t *lbsp;
- int i;
-
- for (lbcp = &tp->t_busy; lbcp != NULL; lbcp = lbcp->lbc_next) {
- i = 0;
- for (lbsp = lbcp->lbc_busy; i < lbcp->lbc_unused; i++, lbsp++) {
- if (XFS_LBC_ISFREE(lbcp, i))
- continue;
- xfs_alloc_clear_busy(tp, lbsp->lbc_ag, lbsp->lbc_idx);
- }
- }
- xfs_trans_free_busy(tp);
-}
-
/*
* This is typically called by the LM when a transaction has been fully
* committed to disk. It needs to unpin the items which have
kmem_free(licp);
}
- xfs_trans_clear_busy_extents(tp);
xfs_trans_free(tp);
}
xfs_trans_unreserve_and_mod_sb(tp);
xfs_trans_unreserve_and_mod_dquots(tp);
- xfs_trans_free_items(tp, flags);
- xfs_trans_free_busy(tp);
+ xfs_trans_free_items(tp, NULLCOMMITLSN, flags);
xfs_trans_free(tp);
}
*commit_lsn = xfs_log_done(mp, tp->t_ticket, &commit_iclog, log_flags);
tp->t_commit_lsn = *commit_lsn;
+ trace_xfs_trans_commit_lsn(tp);
+
if (nvec > XFS_TRANS_LOGVEC_COUNT)
kmem_free(log_vector);
return xfs_log_release_iclog(mp, commit_iclog);
}
+/*
+ * Walk the log items and allocate log vector structures for
+ * each item large enough to fit all the vectors they require.
+ * Note that this format differs from the old log vector format in
+ * that there is no transaction header in these log vectors.
+ */
+STATIC struct xfs_log_vec *
+xfs_trans_alloc_log_vecs(
+ xfs_trans_t *tp)
+{
+ xfs_log_item_desc_t *lidp;
+ struct xfs_log_vec *lv = NULL;
+ struct xfs_log_vec *ret_lv = NULL;
+
+ lidp = xfs_trans_first_item(tp);
+
+ /* Bail out if we didn't find a log item. */
+ if (!lidp) {
+ ASSERT(0);
+ return NULL;
+ }
+
+ while (lidp != NULL) {
+ struct xfs_log_vec *new_lv;
+
+ /* Skip items which aren't dirty in this transaction. */
+ if (!(lidp->lid_flags & XFS_LID_DIRTY)) {
+ lidp = xfs_trans_next_item(tp, lidp);
+ continue;
+ }
+
+ /* Skip items that do not have any vectors for writing */
+ lidp->lid_size = IOP_SIZE(lidp->lid_item);
+ if (!lidp->lid_size) {
+ lidp = xfs_trans_next_item(tp, lidp);
+ continue;
+ }
+
+ new_lv = kmem_zalloc(sizeof(*new_lv) +
+ lidp->lid_size * sizeof(struct xfs_log_iovec),
+ KM_SLEEP);
+
+ /* The allocated iovec region lies beyond the log vector. */
+ new_lv->lv_iovecp = (struct xfs_log_iovec *)&new_lv[1];
+ new_lv->lv_niovecs = lidp->lid_size;
+ new_lv->lv_item = lidp->lid_item;
+ if (!ret_lv)
+ ret_lv = new_lv;
+ else
+ lv->lv_next = new_lv;
+ lv = new_lv;
+ lidp = xfs_trans_next_item(tp, lidp);
+ }
+
+ return ret_lv;
+}
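The allocation in xfs_trans_alloc_log_vecs() above places the iovec array directly behind the log vector header, so a single allocation and a single free cover both. A minimal userspace sketch of that layout follows, assuming calloc() in place of kmem_zalloc(); the `demo_` types and `demo_alloc_log_vec()` are hypothetical stand-ins and not part of the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for struct xfs_log_iovec and struct xfs_log_vec. */
struct demo_iovec {
	void	*i_addr;
	int	i_len;
};

struct demo_log_vec {
	struct demo_log_vec	*lv_next;
	int			lv_niovecs;
	struct demo_iovec	*lv_iovecp;
};

/*
 * Allocate the header plus niovecs zeroed iovecs as one block. The
 * iovec array starts at the first byte past the header, which is
 * exactly what &lv[1] points at.
 */
static inline struct demo_log_vec *demo_alloc_log_vec(int niovecs)
{
	struct demo_log_vec *lv;

	assert(niovecs > 0);
	lv = calloc(1, sizeof(*lv) + niovecs * sizeof(struct demo_iovec));
	if (!lv)
		return NULL;

	lv->lv_iovecp = (struct demo_iovec *)&lv[1];
	lv->lv_niovecs = niovecs;
	return lv;
}
```

Because the header and the array share one block, a caller of this sketch releases both with a single free() on the returned pointer.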
+
+static int
+xfs_trans_commit_cil(
+ struct xfs_mount *mp,
+ struct xfs_trans *tp,
+ xfs_lsn_t *commit_lsn,
+ int flags)
+{
+ struct xfs_log_vec *log_vector;
+ int error;
+
+ /*
+ * Get each log item to allocate a vector structure for
+ * the log item to pass to the log write code. The
+ * CIL commit code will format the vector and save it away.
+ */
+ log_vector = xfs_trans_alloc_log_vecs(tp);
+ if (!log_vector)
+ return ENOMEM;
+
+ error = xfs_log_commit_cil(mp, tp, log_vector, commit_lsn, flags);
+ if (error)
+ return error;
+
+ current_restore_flags_nested(&tp->t_pflags, PF_FSTRANS);
+
+ /* xfs_trans_free_items() unlocks them first */
+ xfs_trans_free_items(tp, *commit_lsn, 0);
+ xfs_trans_free(tp);
+ return 0;
+}
/*
* xfs_trans_commit
xfs_trans_apply_sb_deltas(tp);
xfs_trans_apply_dquot_deltas(tp);
- error = xfs_trans_commit_iclog(mp, tp, &commit_lsn, flags);
+ if (mp->m_flags & XFS_MOUNT_DELAYLOG)
+ error = xfs_trans_commit_cil(mp, tp, &commit_lsn, flags);
+ else
+ error = xfs_trans_commit_iclog(mp, tp, &commit_lsn, flags);
+
if (error == ENOMEM) {
xfs_force_shutdown(mp, SHUTDOWN_LOG_IO_ERROR);
error = XFS_ERROR(EIO);
error = XFS_ERROR(EIO);
}
current_restore_flags_nested(&tp->t_pflags, PF_FSTRANS);
- xfs_trans_free_items(tp, error ? XFS_TRANS_ABORT : 0);
- xfs_trans_free_busy(tp);
+ xfs_trans_free_items(tp, NULLCOMMITLSN, error ? XFS_TRANS_ABORT : 0);
xfs_trans_free(tp);
XFS_STATS_INC(xs_trans_empty);
/* mark this thread as no longer being in a transaction */
current_restore_flags_nested(&tp->t_pflags, PF_FSTRANS);
- xfs_trans_free_items(tp, flags);
- xfs_trans_free_busy(tp);
+ xfs_trans_free_items(tp, NULLCOMMITLSN, flags);
xfs_trans_free(tp);
}
#define XFS_TRANS_GROWFSRT_FREE 39
#define XFS_TRANS_SWAPEXT 40
#define XFS_TRANS_SB_COUNT 41
-#define XFS_TRANS_TYPE_MAX 41
+#define XFS_TRANS_CHECKPOINT 42
+#define XFS_TRANS_TYPE_MAX 42
/* new transaction types need to be reflected in xfs_logprint(8) */
#define XFS_TRANS_TYPES \
{ XFS_TRANS_GROWFSRT_FREE, "GROWFSRT_FREE" }, \
{ XFS_TRANS_SWAPEXT, "SWAPEXT" }, \
{ XFS_TRANS_SB_COUNT, "SB_COUNT" }, \
+ { XFS_TRANS_CHECKPOINT, "CHECKPOINT" }, \
{ XFS_TRANS_DUMMY1, "DUMMY1" }, \
{ XFS_TRANS_DUMMY2, "DUMMY2" }, \
{ XLOG_UNMOUNT_REC_TYPE, "UNMOUNT" }
struct xfs_mount;
struct xfs_trans;
struct xfs_dquot_acct;
+struct xfs_busy_extent;
typedef struct xfs_log_item {
struct list_head li_ail; /* AIL pointers */
/* buffer item iodone */
/* callback func */
struct xfs_item_ops *li_ops; /* function list */
+
+ /* delayed logging */
+ struct list_head li_cil; /* CIL pointers */
+ struct xfs_log_vec *li_lv; /* active log vector */
+ xfs_lsn_t li_seq; /* CIL commit seq */
} xfs_log_item_t;
#define XFS_LI_IN_AIL 0x1
#define XFS_ITEM_LOCKED 2
#define XFS_ITEM_PUSHBUF 3
-/*
- * This structure is used to maintain a list of block ranges that have been
- * freed in the transaction. The ranges are listed in the perag[] busy list
- * between when they're freed and the transaction is committed to disk.
- */
-
-typedef struct xfs_log_busy_slot {
- xfs_agnumber_t lbc_ag;
- ushort lbc_idx; /* index in perag.busy[] */
-} xfs_log_busy_slot_t;
-
-#define XFS_LBC_NUM_SLOTS 31
-typedef struct xfs_log_busy_chunk {
- struct xfs_log_busy_chunk *lbc_next;
- uint lbc_free; /* free slots bitmask */
- ushort lbc_unused; /* first unused */
- xfs_log_busy_slot_t lbc_busy[XFS_LBC_NUM_SLOTS];
-} xfs_log_busy_chunk_t;
-
-#define XFS_LBC_MAX_SLOT (XFS_LBC_NUM_SLOTS - 1)
-#define XFS_LBC_FREEMASK ((1U << XFS_LBC_NUM_SLOTS) - 1)
-
-#define XFS_LBC_INIT(cp) ((cp)->lbc_free = XFS_LBC_FREEMASK)
-#define XFS_LBC_CLAIM(cp, slot) ((cp)->lbc_free &= ~(1 << (slot)))
-#define XFS_LBC_SLOT(cp, slot) (&((cp)->lbc_busy[(slot)]))
-#define XFS_LBC_VACANCY(cp) (((cp)->lbc_free) & XFS_LBC_FREEMASK)
-#define XFS_LBC_ISFREE(cp, slot) ((cp)->lbc_free & (1 << (slot)))
-
/*
* This is the type of function which can be given to xfs_trans_callback()
* to be called upon the transaction's commit to disk.
unsigned int t_items_free; /* log item descs free */
xfs_log_item_chunk_t t_items; /* first log item desc chunk */
xfs_trans_header_t t_header; /* header for in-log trans */
- unsigned int t_busy_free; /* busy descs free */
- xfs_log_busy_chunk_t t_busy; /* busy/async free blocks */
+ struct list_head t_busy; /* list of busy extents */
unsigned long t_pflags; /* saved process flags state */
} xfs_trans_t;
void xfs_trans_cancel(xfs_trans_t *, int);
int xfs_trans_ail_init(struct xfs_mount *);
void xfs_trans_ail_destroy(struct xfs_mount *);
-xfs_log_busy_slot_t *xfs_trans_add_busy(xfs_trans_t *tp,
- xfs_agnumber_t ag,
- xfs_extlen_t idx);
extern kmem_zone_t *xfs_trans_zone;
xfs_buf_item_init(bp, tp->t_mountp);
bip = XFS_BUF_FSPRIVATE(bp, xfs_buf_log_item_t *);
ASSERT(!(bip->bli_flags & XFS_BLI_STALE));
- ASSERT(!(bip->bli_format.blf_flags & XFS_BLI_CANCEL));
+ ASSERT(!(bip->bli_format.blf_flags & XFS_BLF_CANCEL));
ASSERT(!(bip->bli_flags & XFS_BLI_LOGGED));
if (reset_recur)
bip->bli_recur = 0;
bip = XFS_BUF_FSPRIVATE(bp, xfs_buf_log_item_t *);
ASSERT(bip->bli_item.li_type == XFS_LI_BUF);
ASSERT(!(bip->bli_flags & XFS_BLI_STALE));
- ASSERT(!(bip->bli_format.blf_flags & XFS_BLI_CANCEL));
+ ASSERT(!(bip->bli_format.blf_flags & XFS_BLF_CANCEL));
ASSERT(atomic_read(&bip->bli_refcount) > 0);
/*
bip = XFS_BUF_FSPRIVATE(bp, xfs_buf_log_item_t *);
ASSERT(!(bip->bli_flags & XFS_BLI_STALE));
- ASSERT(!(bip->bli_format.blf_flags & XFS_BLI_CANCEL));
+ ASSERT(!(bip->bli_format.blf_flags & XFS_BLF_CANCEL));
ASSERT(atomic_read(&bip->bli_refcount) > 0);
bip->bli_flags |= XFS_BLI_HOLD;
trace_xfs_trans_bhold(bip);
bip = XFS_BUF_FSPRIVATE(bp, xfs_buf_log_item_t *);
ASSERT(!(bip->bli_flags & XFS_BLI_STALE));
- ASSERT(!(bip->bli_format.blf_flags & XFS_BLI_CANCEL));
+ ASSERT(!(bip->bli_format.blf_flags & XFS_BLF_CANCEL));
ASSERT(atomic_read(&bip->bli_refcount) > 0);
ASSERT(bip->bli_flags & XFS_BLI_HOLD);
bip->bli_flags &= ~XFS_BLI_HOLD;
bip->bli_flags &= ~XFS_BLI_STALE;
ASSERT(XFS_BUF_ISSTALE(bp));
XFS_BUF_UNSTALE(bp);
- bip->bli_format.blf_flags &= ~XFS_BLI_CANCEL;
+ bip->bli_format.blf_flags &= ~XFS_BLF_CANCEL;
}
lidp = xfs_trans_find_item(tp, (xfs_log_item_t*)bip);
ASSERT(!(XFS_BUF_ISDELAYWRITE(bp)));
ASSERT(XFS_BUF_ISSTALE(bp));
ASSERT(!(bip->bli_flags & (XFS_BLI_LOGGED | XFS_BLI_DIRTY)));
- ASSERT(!(bip->bli_format.blf_flags & XFS_BLI_INODE_BUF));
- ASSERT(bip->bli_format.blf_flags & XFS_BLI_CANCEL);
+ ASSERT(!(bip->bli_format.blf_flags & XFS_BLF_INODE_BUF));
+ ASSERT(bip->bli_format.blf_flags & XFS_BLF_CANCEL);
ASSERT(lidp->lid_flags & XFS_LID_DIRTY);
ASSERT(tp->t_flags & XFS_TRANS_DIRTY);
return;
* in the buf log item. The STALE flag will be used in
* xfs_buf_item_unpin() to determine if it should clean up
* when the last reference to the buf item is given up.
- * We set the XFS_BLI_CANCEL flag in the buf log format structure
+ * We set the XFS_BLF_CANCEL flag in the buf log format structure
* and log the buf item. This will be used at recovery time
* to determine that copies of the buffer in the log before
* this should not be replayed.
XFS_BUF_UNDELAYWRITE(bp);
XFS_BUF_STALE(bp);
bip->bli_flags |= XFS_BLI_STALE;
- bip->bli_flags &= ~(XFS_BLI_LOGGED | XFS_BLI_DIRTY);
- bip->bli_format.blf_flags &= ~XFS_BLI_INODE_BUF;
- bip->bli_format.blf_flags |= XFS_BLI_CANCEL;
+ bip->bli_flags &= ~(XFS_BLI_INODE_BUF | XFS_BLI_LOGGED | XFS_BLI_DIRTY);
+ bip->bli_format.blf_flags &= ~XFS_BLF_INODE_BUF;
+ bip->bli_format.blf_flags |= XFS_BLF_CANCEL;
memset((char *)(bip->bli_format.blf_data_map), 0,
(bip->bli_format.blf_map_size * sizeof(uint)));
lidp->lid_flags |= XFS_LID_DIRTY;
}
/*
- * This call is used to indicate that the buffer contains on-disk
- * inodes which must be handled specially during recovery. They
- * require special handling because only the di_next_unlinked from
- * the inodes in the buffer should be recovered. The rest of the
- * data in the buffer is logged via the inodes themselves.
+ * This call is used to indicate that the buffer contains on-disk inodes which
+ * must be handled specially during recovery. They require special handling
+ * because only the di_next_unlinked from the inodes in the buffer should be
+ * recovered. The rest of the data in the buffer is logged via the inodes
+ * themselves.
*
- * All we do is set the XFS_BLI_INODE_BUF flag in the buffer's log
- * format structure so that we'll know what to do at recovery time.
+ * All we do is set the XFS_BLI_INODE_BUF flag in the item's flags so it can be
+ * transferred to the buffer's log format structure so that we'll know what to
+ * do at recovery time.
*/
-/* ARGSUSED */
void
xfs_trans_inode_buf(
xfs_trans_t *tp,
bip = XFS_BUF_FSPRIVATE(bp, xfs_buf_log_item_t *);
ASSERT(atomic_read(&bip->bli_refcount) > 0);
- bip->bli_format.blf_flags |= XFS_BLI_INODE_BUF;
+ bip->bli_flags |= XFS_BLI_INODE_BUF;
}
/*
ASSERT(XFS_BUF_ISBUSY(bp));
ASSERT(XFS_BUF_FSPRIVATE2(bp, xfs_trans_t *) == tp);
ASSERT(XFS_BUF_FSPRIVATE(bp, void *) != NULL);
- ASSERT(type == XFS_BLI_UDQUOT_BUF ||
- type == XFS_BLI_PDQUOT_BUF ||
- type == XFS_BLI_GDQUOT_BUF);
+ ASSERT(type == XFS_BLF_UDQUOT_BUF ||
+ type == XFS_BLF_PDQUOT_BUF ||
+ type == XFS_BLF_GDQUOT_BUF);
bip = XFS_BUF_FSPRIVATE(bp, xfs_buf_log_item_t *);
ASSERT(atomic_read(&bip->bli_refcount) > 0);
void
xfs_trans_free_items(
xfs_trans_t *tp,
+ xfs_lsn_t commit_lsn,
int flags)
{
xfs_log_item_chunk_t *licp;
* Special case the embedded chunk so we don't free it below.
*/
if (!xfs_lic_are_all_free(licp)) {
- (void) xfs_trans_unlock_chunk(licp, 1, abort, NULLCOMMITLSN);
+ (void) xfs_trans_unlock_chunk(licp, 1, abort, commit_lsn);
xfs_lic_all_free(licp);
licp->lic_unused = 0;
}
*/
while (licp != NULL) {
ASSERT(!xfs_lic_are_all_free(licp));
- (void) xfs_trans_unlock_chunk(licp, 1, abort, NULLCOMMITLSN);
+ (void) xfs_trans_unlock_chunk(licp, 1, abort, commit_lsn);
next_licp = licp->lic_next;
kmem_free(licp);
licp = next_licp;
return freed;
}
-
-
-/*
- * This is called to add the given busy item to the transaction's
- * list of busy items. It must find a free busy item descriptor
- * or allocate a new one and add the item to that descriptor.
- * The function returns a pointer to busy descriptor used to point
- * to the new busy entry. The log busy entry will now point to its new
- * descriptor with its ???? field.
- */
-xfs_log_busy_slot_t *
-xfs_trans_add_busy(xfs_trans_t *tp, xfs_agnumber_t ag, xfs_extlen_t idx)
-{
- xfs_log_busy_chunk_t *lbcp;
- xfs_log_busy_slot_t *lbsp;
- int i=0;
-
- /*
- * If there are no free descriptors, allocate a new chunk
- * of them and put it at the front of the chunk list.
- */
- if (tp->t_busy_free == 0) {
- lbcp = (xfs_log_busy_chunk_t*)
- kmem_alloc(sizeof(xfs_log_busy_chunk_t), KM_SLEEP);
- ASSERT(lbcp != NULL);
- /*
- * Initialize the chunk, and then
- * claim the first slot in the newly allocated chunk.
- */
- XFS_LBC_INIT(lbcp);
- XFS_LBC_CLAIM(lbcp, 0);
- lbcp->lbc_unused = 1;
- lbsp = XFS_LBC_SLOT(lbcp, 0);
-
- /*
- * Link in the new chunk and update the free count.
- */
- lbcp->lbc_next = tp->t_busy.lbc_next;
- tp->t_busy.lbc_next = lbcp;
- tp->t_busy_free = XFS_LIC_NUM_SLOTS - 1;
-
- /*
- * Initialize the descriptor and the generic portion
- * of the log item.
- *
- * Point the new slot at this item and return it.
- * Also point the log item at its currently active
- * descriptor and set the item's mount pointer.
- */
- lbsp->lbc_ag = ag;
- lbsp->lbc_idx = idx;
- return lbsp;
- }
-
- /*
- * Find the free descriptor. It is somewhere in the chunklist
- * of descriptors.
- */
- lbcp = &tp->t_busy;
- while (lbcp != NULL) {
- if (XFS_LBC_VACANCY(lbcp)) {
- if (lbcp->lbc_unused <= XFS_LBC_MAX_SLOT) {
- i = lbcp->lbc_unused;
- break;
- } else {
- /* out-of-order vacancy */
- cmn_err(CE_DEBUG, "OOO vacancy lbcp 0x%p\n", lbcp);
- ASSERT(0);
- }
- }
- lbcp = lbcp->lbc_next;
- }
- ASSERT(lbcp != NULL);
- /*
- * If we find a free descriptor, claim it,
- * initialize it, and return it.
- */
- XFS_LBC_CLAIM(lbcp, i);
- if (lbcp->lbc_unused <= i) {
- lbcp->lbc_unused = i + 1;
- }
- lbsp = XFS_LBC_SLOT(lbcp, i);
- tp->t_busy_free--;
- lbsp->lbc_ag = ag;
- lbsp->lbc_idx = idx;
- return lbsp;
-}
-
-
-/*
- * xfs_trans_free_busy
- * Free all of the busy lists from a transaction
- */
-void
-xfs_trans_free_busy(xfs_trans_t *tp)
-{
- xfs_log_busy_chunk_t *lbcp;
- xfs_log_busy_chunk_t *lbcq;
-
- lbcp = tp->t_busy.lbc_next;
- while (lbcp != NULL) {
- lbcq = lbcp->lbc_next;
- kmem_free(lbcp);
- lbcp = lbcq;
- }
-
- XFS_LBC_INIT(&tp->t_busy);
- tp->t_busy.lbc_unused = 0;
-}
struct xfs_log_item_desc *xfs_trans_first_item(struct xfs_trans *);
struct xfs_log_item_desc *xfs_trans_next_item(struct xfs_trans *,
struct xfs_log_item_desc *);
-void xfs_trans_free_items(struct xfs_trans *, int);
-void xfs_trans_unlock_items(struct xfs_trans *,
- xfs_lsn_t);
-void xfs_trans_free_busy(xfs_trans_t *tp);
-xfs_log_busy_slot_t *xfs_trans_add_busy(xfs_trans_t *tp,
- xfs_agnumber_t ag,
- xfs_extlen_t idx);
+
+void xfs_trans_unlock_items(struct xfs_trans *tp, xfs_lsn_t commit_lsn);
+void xfs_trans_free_items(struct xfs_trans *tp, xfs_lsn_t commit_lsn,
+ int flags);
+
+void xfs_trans_item_committed(struct xfs_log_item *lip,
+ xfs_lsn_t commit_lsn, int aborted);
+void xfs_trans_unreserve_and_mod_sb(struct xfs_trans *tp);
/*
* AIL traversal cursor.
typedef __uint16_t xfs_prid_t; /* prid_t truncated to 16bits in XFS */
+typedef __uint32_t xlog_tid_t; /* transaction ID type */
+
/*
* These types are 64 bits on disk but are either 32 or 64 bits in memory.
* Disk based types:
* atomic_read - read atomic variable
* @v: pointer of type atomic_t
*
- * Atomically reads the value of @v. Note that the guaranteed
- * useful range of an atomic_t is only 24 bits.
+ * Atomically reads the value of @v.
*/
#define atomic_read(v) (*(volatile int *)&(v)->counter)
* @v: pointer of type atomic_t
* @i: required value
*
- * Atomically sets the value of @v to @i. Note that the guaranteed
- * useful range of an atomic_t is only 24 bits.
+ * Atomically sets the value of @v to @i.
*/
#define atomic_set(v, i) (((v)->counter) = (i))
* @v: pointer of type atomic_t
*
* Atomically adds @i to @v and returns the result
- * Note that the guaranteed useful range of an atomic_t is only 24 bits.
*/
static inline int atomic_add_return(int i, atomic_t *v)
{
* @v: pointer of type atomic_t
*
* Atomically subtracts @i from @v and returns the result
- * Note that the guaranteed useful range of an atomic_t is only 24 bits.
*/
static inline int atomic_sub_return(int i, atomic_t *v)
{
KMAP_D(17) KM_NMI,
KMAP_D(18) KM_NMI_PTE,
KMAP_D(19) KM_KDB,
+/*
+ * Remember to update debug_kmap_atomic() when adding new kmap types!
+ */
KMAP_D(20) KM_TYPE_NR
};
#ifndef __BIG_ENDIAN_BITFIELD
#define __BIG_ENDIAN_BITFIELD
#endif
+#ifndef __BYTE_ORDER
+#define __BYTE_ORDER __BIG_ENDIAN
+#endif
#include <linux/types.h>
#include <linux/swab.h>
#ifndef __LITTLE_ENDIAN_BITFIELD
#define __LITTLE_ENDIAN_BITFIELD
#endif
+#ifndef __BYTE_ORDER
+#define __BYTE_ORDER __LITTLE_ENDIAN
+#endif
#include <linux/types.h>
#include <linux/swab.h>
--- /dev/null
+#ifndef _LINUX_COMPACTION_H
+#define _LINUX_COMPACTION_H
+
+/* Return values for compact_zone() and try_to_compact_pages() */
+/* compaction didn't start as it was not possible or direct reclaim was more suitable */
+#define COMPACT_SKIPPED 0
+/* compaction should continue to another pageblock */
+#define COMPACT_CONTINUE 1
+/* direct compaction partially compacted a zone and there are suitable pages */
+#define COMPACT_PARTIAL 2
+/* The full zone was compacted */
+#define COMPACT_COMPLETE 3
+
+#ifdef CONFIG_COMPACTION
+extern int sysctl_compact_memory;
+extern int sysctl_compaction_handler(struct ctl_table *table, int write,
+ void __user *buffer, size_t *length, loff_t *ppos);
+extern int sysctl_extfrag_threshold;
+extern int sysctl_extfrag_handler(struct ctl_table *table, int write,
+ void __user *buffer, size_t *length, loff_t *ppos);
+
+extern int fragmentation_index(struct zone *zone, unsigned int order);
+extern unsigned long try_to_compact_pages(struct zonelist *zonelist,
+ int order, gfp_t gfp_mask, nodemask_t *mask);
+
+/* Do not skip compaction more than 64 times */
+#define COMPACT_MAX_DEFER_SHIFT 6
+
+/*
+ * Compaction is deferred when compaction fails to result in a page
+ * allocation success. 1 << compact_defer_shift compactions are skipped up
+ * to a limit of 1 << COMPACT_MAX_DEFER_SHIFT
+ */
+static inline void defer_compaction(struct zone *zone)
+{
+ zone->compact_considered = 0;
+ zone->compact_defer_shift++;
+
+ if (zone->compact_defer_shift > COMPACT_MAX_DEFER_SHIFT)
+ zone->compact_defer_shift = COMPACT_MAX_DEFER_SHIFT;
+}
+
+/* Returns true if compaction should be skipped this time */
+static inline bool compaction_deferred(struct zone *zone)
+{
+ unsigned long defer_limit = 1UL << zone->compact_defer_shift;
+
+ /* Avoid possible overflow */
+ if (++zone->compact_considered > defer_limit)
+ zone->compact_considered = defer_limit;
+
+ return zone->compact_considered < defer_limit;
+}
+
+#else
+static inline unsigned long try_to_compact_pages(struct zonelist *zonelist,
+ int order, gfp_t gfp_mask, nodemask_t *nodemask)
+{
+ return COMPACT_CONTINUE;
+}
+
+static inline void defer_compaction(struct zone *zone)
+{
+}
+
+static inline bool compaction_deferred(struct zone *zone)
+{
+ return 1;
+}
+
+#endif /* CONFIG_COMPACTION */
+
+#if defined(CONFIG_COMPACTION) && defined(CONFIG_SYSFS) && defined(CONFIG_NUMA)
+extern int compaction_register_node(struct node *node);
+extern void compaction_unregister_node(struct node *node);
+
+#else
+
+static inline int compaction_register_node(struct node *node)
+{
+ return 0;
+}
+
+static inline void compaction_unregister_node(struct node *node)
+{
+}
+#endif /* CONFIG_COMPACTION && CONFIG_SYSFS && CONFIG_NUMA */
+
+#endif /* _LINUX_COMPACTION_H */
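The deferral helpers in this header implement an exponential backoff: each failure doubles the window, capped by COMPACT_MAX_DEFER_SHIFT. A small userspace simulation of that logic follows, as a sketch only; `struct demo_zone` and the `demo_` functions are hypothetical copies that carry just the two counters, and `demo_skipped_after_failures()` is an illustrative helper that does not exist in the kernel.

```c
#include <assert.h>
#include <stdbool.h>

#define DEMO_MAX_DEFER_SHIFT	6

/* Hypothetical stand-in for the two fields added to struct zone. */
struct demo_zone {
	unsigned int compact_considered;
	unsigned int compact_defer_shift;
};

/* Mirrors defer_compaction(): widen the window after a failure. */
static inline void demo_defer_compaction(struct demo_zone *zone)
{
	zone->compact_considered = 0;
	zone->compact_defer_shift++;

	if (zone->compact_defer_shift > DEMO_MAX_DEFER_SHIFT)
		zone->compact_defer_shift = DEMO_MAX_DEFER_SHIFT;
}

/* Mirrors compaction_deferred(): true means skip this attempt. */
static inline bool demo_compaction_deferred(struct demo_zone *zone)
{
	unsigned long defer_limit = 1UL << zone->compact_defer_shift;

	/* Saturate the counter so it cannot overflow. */
	if (++zone->compact_considered > defer_limit)
		zone->compact_considered = defer_limit;

	return zone->compact_considered < defer_limit;
}

/* Count the attempts skipped after a given number of failures. */
static inline unsigned int demo_skipped_after_failures(unsigned int failures)
{
	struct demo_zone zone = { 0, 0 };
	unsigned int skipped = 0;

	while (failures--)
		demo_defer_compaction(&zone);
	while (demo_compaction_deferred(&zone))
		skipped++;
	return skipped;
}
```

In this simulation the number of skipped attempts works out to (1 << compact_defer_shift) - 1, because the attempt that brings compact_considered up to the limit is the one allowed through.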
extern void cpuset_print_task_mems_allowed(struct task_struct *p);
+/*
+ * reading current mems_allowed and mempolicy in the fastpath must be protected
+ * by get_mems_allowed()
+ */
+static inline void get_mems_allowed(void)
+{
+ current->mems_allowed_change_disable++;
+
+ /*
+ * ensure that reading mems_allowed and mempolicy happens after the
+ * update of ->mems_allowed_change_disable.
+ *
+ * the write-side task finds ->mems_allowed_change_disable is not 0,
+ * and knows the read-side task is reading mems_allowed or mempolicy,
+ * so it will clear old bits lazily.
+ */
+ smp_mb();
+}
+
+static inline void put_mems_allowed(void)
+{
+ /*
+ * ensure that reading mems_allowed and mempolicy happens before
+ * reducing mems_allowed_change_disable.
+ *
+ * the write-side task will know that the read-side task is still
+ * reading mems_allowed or mempolicy, and will not clear old bits in
+ * the nodemask.
+ */
+ smp_mb();
+ --ACCESS_ONCE(current->mems_allowed_change_disable);
+}
+
static inline void set_mems_allowed(nodemask_t nodemask)
{
+ task_lock(current);
current->mems_allowed = nodemask;
+ task_unlock(current);
}
#else /* !CONFIG_CPUSETS */
{
}
+static inline void get_mems_allowed(void)
+{
+}
+
+static inline void put_mems_allowed(void)
+{
+}
+
#endif /* !CONFIG_CPUSETS */
#endif /* _LINUX_CPUSET_H */
/*
* The flags field controls the behaviour at the callsite.
* The bits here are changed dynamically when the user
- * writes commands to <debugfs>/dynamic_debug/ddebug
+ * writes commands to <debugfs>/dynamic_debug/control
*/
#define _DPRINTK_FLAGS_PRINT (1<<0) /* printk() a message using the format */
#define _DPRINTK_FLAGS_DEFAULT 0
#define IS_ERR_VALUE(x) unlikely((x) >= (unsigned long)-MAX_ERRNO)
-static inline void *ERR_PTR(long error)
+static inline void * __must_check ERR_PTR(long error)
{
return (void *) error;
}
-static inline long PTR_ERR(const void *ptr)
+static inline long __must_check PTR_ERR(const void *ptr)
{
return (long) ptr;
}
-static inline long IS_ERR(const void *ptr)
+static inline long __must_check IS_ERR(const void *ptr)
{
return IS_ERR_VALUE((unsigned long)ptr);
}
-static inline long IS_ERR_OR_NULL(const void *ptr)
+static inline long __must_check IS_ERR_OR_NULL(const void *ptr)
{
return !ptr || IS_ERR_VALUE((unsigned long)ptr);
}
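The helpers annotated with __must_check here encode a negative errno in the top of the pointer range, so ignoring a result usually means ignoring an error. A minimal userspace sketch of the encoding follows. It assumes MAX_ERRNO is 4095 and that __must_check maps to the GCC/Clang `warn_unused_result` attribute; every `demo_` name, including `demo_find()`, is hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Assumed values and spellings; the kernel's own live in its headers. */
#define DEMO_MAX_ERRNO	4095
#define demo_must_check	__attribute__((warn_unused_result))

static inline void * demo_must_check demo_err_ptr(long error)
{
	return (void *)error;
}

static inline long demo_must_check demo_ptr_err(const void *ptr)
{
	return (long)ptr;
}

/* Error pointers occupy the last DEMO_MAX_ERRNO addresses. */
static inline int demo_must_check demo_is_err(const void *ptr)
{
	return (unsigned long)ptr >= (unsigned long)-DEMO_MAX_ERRNO;
}

static int demo_table[4] = { 10, 20, 30, 40 };

/* Hypothetical lookup: valid pointer on success, encoded errno on failure. */
static inline int * demo_must_check demo_find(int index)
{
	if (index < 0 || index >= 4)
		return demo_err_ptr(-EINVAL);
	return &demo_table[index];
}
```

With the attribute in place, a bare `demo_find(9);` statement would draw a compiler warning under GCC or Clang, which is the kind of misuse the annotation is meant to catch.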
* Explicitly cast an error-valued pointer to another pointer type in such a
* way as to make it clear that's what's going on.
*/
-static inline void *ERR_CAST(const void *ptr)
+static inline void * __must_check ERR_CAST(const void *ptr)
{
/* cast away the const */
return (void *) ptr;
#define FBIOGET_HWCINFO 0x4616
#define FBIOPUT_MODEINFO 0x4617
#define FBIOGET_DISPINFO 0x4618
-
+#define FBIO_WAITFORVSYNC _IOW('F', 0x20, __u32)
#define FB_TYPE_PACKED_PIXELS 0 /* Packed Pixels */
#define FB_TYPE_PLANES 1 /* Non interleaved planes */
* Zone modifiers (see linux/mmzone.h - low three bits)
*
* Do not put any conditional on these. If necessary modify the definitions
- * without the underscores and use the consistently. The definitions here may
+ * without the underscores and use them consistently. The definitions here may
* be used in bit comparisons.
*/
#define __GFP_DMA ((__force gfp_t)0x01u)
__GFP_NORETRY|__GFP_NOMEMALLOC)
/* Control slab gfp mask during early boot */
-#define GFP_BOOT_MASK __GFP_BITS_MASK & ~(__GFP_WAIT|__GFP_IO|__GFP_FS)
+#define GFP_BOOT_MASK (__GFP_BITS_MASK & ~(__GFP_WAIT|__GFP_IO|__GFP_FS))
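Wrapping the GFP_BOOT_MASK expansion in parentheses matters as soon as the macro appears next to an operator that binds tighter than `&`, such as `~`. The sketch below illustrates the difference with hypothetical `DEMO_` macros and made-up bit values; it does not use the real GFP flag values.

```c
#include <assert.h>

/* Made-up bit values for illustration only. */
#define DEMO_BITS_MASK	0xFFu
#define DEMO_CLEAR_BITS	0x0Fu

/* Old style: the expansion is not protected by parentheses. */
#define DEMO_MASK_BARE	DEMO_BITS_MASK & ~(DEMO_CLEAR_BITS)
/* New style: the whole expression is one parenthesized unit. */
#define DEMO_MASK_PAREN	(DEMO_BITS_MASK & ~(DEMO_CLEAR_BITS))

static inline unsigned int demo_invert_bare(void)
{
	/* Expands to ~0xFFu & ~(0x0Fu): only the first operand is inverted. */
	return ~DEMO_MASK_BARE;
}

static inline unsigned int demo_invert_paren(void)
{
	/* Expands to ~(0xFFu & ~(0x0Fu)): the whole mask is inverted. */
	return ~DEMO_MASK_PAREN;
}
```

A plain `flags & DEMO_MASK_BARE` happens to give the same result as the parenthesized form, which is why the unparenthesized form can go unnoticed until someone negates or compares the macro.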
/* Control allocation constraints */
#define GFP_CONSTRAINT_MASK (__GFP_HARDWALL|__GFP_THISNODE)
* GFP_ZONE_TABLE is a word size bitstring that is used for looking up the
* zone to use given the lowest 4 bits of gfp_t. Entries are ZONE_SHIFT long
* and there are 16 of them to cover all possible combinations of
- * __GFP_DMA, __GFP_DMA32, __GFP_MOVABLE and __GFP_HIGHMEM
+ * __GFP_DMA, __GFP_DMA32, __GFP_MOVABLE and __GFP_HIGHMEM.
*
* The zone fallback order is MOVABLE=>HIGHMEM=>NORMAL=>DMA32=>DMA.
* But GFP_MOVABLE is not only a zone specifier but also an allocation
* policy. Therefore __GFP_MOVABLE plus another zone selector is valid.
- * Only 1bit of the lowest 3 bit (DMA,DMA32,HIGHMEM) can be set to "1".
+ * Only 1 bit of the lowest 3 bits (DMA,DMA32,HIGHMEM) can be set to "1".
*
* bit result
* =================
#define GFP_ZONE_TABLE ( \
(ZONE_NORMAL << 0 * ZONES_SHIFT) \
- | (OPT_ZONE_DMA << __GFP_DMA * ZONES_SHIFT) \
+ | (OPT_ZONE_DMA << __GFP_DMA * ZONES_SHIFT) \
| (OPT_ZONE_HIGHMEM << __GFP_HIGHMEM * ZONES_SHIFT) \
| (OPT_ZONE_DMA32 << __GFP_DMA32 * ZONES_SHIFT) \
| (ZONE_NORMAL << __GFP_MOVABLE * ZONES_SHIFT) \
)
/*
- * GFP_ZONE_BAD is a bitmap for all combination of __GFP_DMA, __GFP_DMA32
+ * GFP_ZONE_BAD is a bitmap for all combinations of __GFP_DMA, __GFP_DMA32
* __GFP_HIGHMEM and __GFP_MOVABLE that are not permitted. One flag per
* entry starting with bit 0. Bit is set if the combination is not
* allowed.
void free_pages_exact(void *virt, size_t size);
#define __get_free_page(gfp_mask) \
- __get_free_pages((gfp_mask),0)
+ __get_free_pages((gfp_mask), 0)
#define __get_dma_pages(gfp_mask, order) \
- __get_free_pages((gfp_mask) | GFP_DMA,(order))
+ __get_free_pages((gfp_mask) | GFP_DMA, (order))
extern void __free_pages(struct page *page, unsigned int order);
extern void free_pages(unsigned long addr, unsigned int order);
extern void free_hot_cold_page(struct page *page, int cold);
#define __free_page(page) __free_pages((page), 0)
-#define free_page(addr) free_pages((addr),0)
+#define free_page(addr) free_pages((addr), 0)
void page_alloc_init(void);
void drain_zone_pages(struct zone *zone, struct per_cpu_pages *pcp);
#include <asm/kmap_types.h>
-#if defined(CONFIG_DEBUG_HIGHMEM) && defined(CONFIG_TRACE_IRQFLAGS_SUPPORT)
+#ifdef CONFIG_DEBUG_HIGHMEM
void debug_kmap_atomic(enum km_type type);
};
#define IVTVFB_IOC_DMA_FRAME _IOW('V', BASE_VIDIOC_PRIVATE+0, struct ivtvfb_dma_frame)
-#define FBIO_WAITFORVSYNC _IOW('F', 0x20, __u32)
#endif
extern const char linux_banner[];
extern const char linux_proc_banner[];
-#define USHORT_MAX ((u16)(~0U))
-#define SHORT_MAX ((s16)(USHORT_MAX>>1))
-#define SHORT_MIN (-SHORT_MAX - 1)
+#define USHRT_MAX ((u16)(~0U))
+#define SHRT_MAX ((s16)(USHRT_MAX>>1))
+#define SHRT_MIN ((s16)(-SHRT_MAX - 1))
#define INT_MAX ((int)(~0U>>1))
#define INT_MIN (-INT_MAX - 1)
#define UINT_MAX (~0U)
return buf;
}
+extern int hex_to_bin(char ch);
+
#ifndef pr_fmt
#define pr_fmt(fmt) fmt
#endif
printk(KERN_ERR pr_fmt(fmt), ##__VA_ARGS__)
#define pr_warning(fmt, ...) \
printk(KERN_WARNING pr_fmt(fmt), ##__VA_ARGS__)
+#define pr_warn pr_warning
#define pr_notice(fmt, ...) \
printk(KERN_NOTICE pr_fmt(fmt), ##__VA_ARGS__)
#define pr_info(fmt, ...) \
* no local ratelimit_state used in the !PRINTK case
*/
#ifdef CONFIG_PRINTK
-#define printk_ratelimited(fmt, ...) ({ \
- static struct ratelimit_state _rs = { \
- .interval = DEFAULT_RATELIMIT_INTERVAL, \
- .burst = DEFAULT_RATELIMIT_BURST, \
- }; \
- \
- if (__ratelimit(&_rs)) \
- printk(fmt, ##__VA_ARGS__); \
+#define printk_ratelimited(fmt, ...) ({ \
+ static DEFINE_RATELIMIT_STATE(_rs, \
+ DEFAULT_RATELIMIT_INTERVAL, \
+ DEFAULT_RATELIMIT_BURST); \
+ \
+ if (__ratelimit(&_rs)) \
+ printk(fmt, ##__VA_ARGS__); \
})
#else
/* No effect, but we still get type checking even in the !PRINTK case: */
printk_ratelimited(KERN_ERR pr_fmt(fmt), ##__VA_ARGS__)
#define pr_warning_ratelimited(fmt, ...) \
printk_ratelimited(KERN_WARNING pr_fmt(fmt), ##__VA_ARGS__)
+#define pr_warn_ratelimited pr_warning_ratelimited
#define pr_notice_ratelimited(fmt, ...) \
printk_ratelimited(KERN_NOTICE pr_fmt(fmt), ##__VA_ARGS__)
#define pr_info_ratelimited(fmt, ...) \
#define LIS3_IRQ1_FF_WU_12 (3 << 0)
#define LIS3_IRQ1_DATA_READY (4 << 0)
#define LIS3_IRQ1_CLICK (7 << 0)
+#define LIS3_IRQ1_MASK (7 << 0)
#define LIS3_IRQ2_DISABLE (0 << 3)
#define LIS3_IRQ2_FF_WU_1 (1 << 3)
#define LIS3_IRQ2_FF_WU_2 (2 << 3)
#define LIS3_IRQ2_FF_WU_12 (3 << 3)
#define LIS3_IRQ2_DATA_READY (4 << 3)
#define LIS3_IRQ2_CLICK (7 << 3)
+#define LIS3_IRQ2_MASK (7 << 3)
#define LIS3_IRQ_OPEN_DRAIN (1 << 6)
#define LIS3_IRQ_ACTIVE_LOW (1 << 7)
unsigned char irq_cfg;
#define LIS3_WAKEUP_Z_HI (1 << 5)
unsigned char wakeup_flags;
unsigned char wakeup_thresh;
+ unsigned char wakeup_flags2;
+ unsigned char wakeup_thresh2;
+#define LIS3_HIPASS_CUTFF_8HZ 0
+#define LIS3_HIPASS_CUTFF_4HZ 1
+#define LIS3_HIPASS_CUTFF_2HZ 2
+#define LIS3_HIPASS_CUTFF_1HZ 3
+#define LIS3_HIPASS1_DISABLE (1 << 2)
+#define LIS3_HIPASS2_DISABLE (1 << 3)
+ unsigned char hipass_ctrl;
#define LIS3_NO_MAP 0
#define LIS3_DEV_X 1
#define LIS3_DEV_Y 2
/* Limits for selftest are specified in chip data sheet */
s16 st_min_limits[3]; /* min pass limit x, y, z */
s16 st_max_limits[3]; /* max pass limit x, y, z */
+ int irq2;
};
#endif /* __LIS3LV02D_H_ */
#include <asm/ioctl.h>
#include <linux/types.h>
#include <linux/videodev2.h>
+#include <linux/fb.h>
struct matroxioc_output_mode {
__u32 output; /* which output */
MATROXFB_CID_LAST
};
-#define FBIO_WAITFORVSYNC _IOW('F', 0x20, __u32)
-
#endif
struct page;
struct mm_struct;
+extern unsigned long mem_cgroup_isolate_pages(unsigned long nr_to_scan,
+ struct list_head *dst,
+ unsigned long *scanned, int order,
+ int mode, struct zone *z,
+ struct mem_cgroup *mem_cont,
+ int active, int file);
+
#ifdef CONFIG_CGROUP_MEM_RES_CTLR
/*
* All "charge" functions with gfp_mask should use GFP_KERNEL or
extern int mem_cgroup_shmem_charge_fallback(struct page *page,
struct mm_struct *mm, gfp_t gfp_mask);
-extern unsigned long mem_cgroup_isolate_pages(unsigned long nr_to_scan,
- struct list_head *dst,
- unsigned long *scanned, int order,
- int mode, struct zone *z,
- struct mem_cgroup *mem_cont,
- int active, int file);
extern void mem_cgroup_out_of_memory(struct mem_cgroup *mem, gfp_t gfp_mask);
int task_in_mem_cgroup(struct task_struct *task, const struct mem_cgroup *mem);
}
#endif /* CONFIG_MEMORY_HOTREMOVE */
+extern int mem_online_node(int nid);
extern int add_memory(int nid, u64 start, u64 size);
extern int arch_add_memory(int nid, u64 start, u64 size);
extern int remove_memory(u64 start, u64 size);
MPOL_MAX, /* always last member of enum */
};
+enum mpol_rebind_step {
+ MPOL_REBIND_ONCE, /* do rebind work at once (not in two steps) */
+ MPOL_REBIND_STEP1, /* first step (set all the new nodes) */
+ MPOL_REBIND_STEP2, /* second step (clear all the disallowed nodes) */
+ MPOL_REBIND_NSTEP,
+};
+
/* Flags for set_mempolicy */
#define MPOL_F_STATIC_NODES (1 << 15)
#define MPOL_F_RELATIVE_NODES (1 << 14)
*/
#define MPOL_F_SHARED (1 << 0) /* identify shared policies */
#define MPOL_F_LOCAL (1 << 1) /* preferred local allocation */
+#define MPOL_F_REBINDING (1 << 2) /* identify policies in rebinding */
#ifdef __KERNEL__
extern void numa_default_policy(void);
extern void numa_policy_init(void);
-extern void mpol_rebind_task(struct task_struct *tsk,
- const nodemask_t *new);
+extern void mpol_rebind_task(struct task_struct *tsk, const nodemask_t *new,
+ enum mpol_rebind_step step);
extern void mpol_rebind_mm(struct mm_struct *mm, nodemask_t *new);
extern void mpol_fix_fork_child_flag(struct task_struct *p);
}
static inline void mpol_rebind_task(struct task_struct *tsk,
- const nodemask_t *new)
+ const nodemask_t *new,
+ enum mpol_rebind_step step)
{
}
#ifdef CONFIG_MIGRATION
#define PAGE_MIGRATION 1
-extern int putback_lru_pages(struct list_head *l);
+extern void putback_lru_pages(struct list_head *l);
extern int migrate_page(struct address_space *,
struct page *, struct page *);
extern int migrate_pages(struct list_head *l, new_page_t x,
struct page *, struct page *);
extern int migrate_prep(void);
+extern int migrate_prep_local(void);
extern int migrate_vmas(struct mm_struct *mm,
const nodemask_t *from, const nodemask_t *to,
unsigned long flags);
#else
#define PAGE_MIGRATION 0
-static inline int putback_lru_pages(struct list_head *l) { return 0; }
+static inline void putback_lru_pages(struct list_head *l) {}
static inline int migrate_pages(struct list_head *l, new_page_t x,
unsigned long private, int offlining) { return -ENOSYS; }
static inline int migrate_prep(void) { return -ENOSYS; }
+static inline int migrate_prep_local(void) { return -ENOSYS; }
static inline int migrate_vmas(struct mm_struct *mm,
const nodemask_t *from, const nodemask_t *to,
#include <linux/debug_locks.h>
#include <linux/mm_types.h>
#include <linux/range.h>
+#include <linux/pfn.h>
struct mempolicy;
struct anon_vma;
#define VM_PFN_AT_MMAP 0x40000000 /* PFNMAP vma that is fully mapped at mmap time */
#define VM_MERGEABLE 0x80000000 /* KSM may merge identical pages */
+/* Bits set in the VMA until the stack is in its final location */
+#define VM_STACK_INCOMPLETE_SETUP (VM_RAND_READ | VM_SEQ_READ)
+
#ifndef VM_STACK_DEFAULT_FLAGS /* arch can override this */
#define VM_STACK_DEFAULT_FLAGS VM_DATA_DEFAULT_FLAGS
#endif
void put_pages_list(struct list_head *pages);
void split_page(struct page *page, unsigned int order);
+int split_free_page(struct page *page);
/*
* Compound pages have a destructor function. Provide a
static __always_inline void *lowmem_page_address(struct page *page)
{
- return __va(page_to_pfn(page) << PAGE_SHIFT);
+ return __va(PFN_PHYS(page_to_pfn(page)));
}
#if defined(CONFIG_HIGHMEM) && !defined(WANT_PAGE_VIRTUAL)
unsigned long *pageblock_flags;
#endif /* CONFIG_SPARSEMEM */
+#ifdef CONFIG_COMPACTION
+ /*
+ * On compaction failure, 1<<compact_defer_shift compactions
+ * are skipped before trying again. The number attempted since
+ * last failure is tracked with compact_considered.
+ */
+ unsigned int compact_considered;
+ unsigned int compact_defer_shift;
+#endif
ZONE_PADDING(_pad1_)
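
The back-off described in the comment above can be modelled in userspace. This is a minimal sketch, not the kernel's implementation: `struct zone_model`, both `model_*` helpers and `MODEL_MAX_DEFER_SHIFT` are hypothetical names, and treating a shift of zero as "no failure recorded yet" is an assumption made for the illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical cap on the back-off; name and value are assumptions. */
#define MODEL_MAX_DEFER_SHIFT 6

/* Only the two counters that the hunk above adds to struct zone. */
struct zone_model {
	unsigned int compact_considered;
	unsigned int compact_defer_shift;
};

/* Record a failed compaction: restart the count, double the skip window. */
static void model_defer_compaction(struct zone_model *z)
{
	assert(z);
	z->compact_considered = 0;
	if (z->compact_defer_shift < MODEL_MAX_DEFER_SHIFT)
		z->compact_defer_shift++;
}

/* Return true while the current attempt should still be skipped. */
static bool model_compaction_deferred(struct zone_model *z)
{
	assert(z);

	/* Assumption: shift 0 means no failure has been seen yet. */
	if (z->compact_defer_shift == 0)
		return false;

	/* Skip 1 << shift attempts after the last failure, then retry. */
	if (z->compact_considered >= (1U << z->compact_defer_shift))
		return false;

	z->compact_considered++;
	return true;
}
```

In this model each failure doubles the number of skipped attempts, so a zone that keeps failing is retried less and less often.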
#include <linux/memory_hotplug.h>
+extern struct mutex zonelists_mutex;
void get_zone_counts(unsigned long *active, unsigned long *inactive,
unsigned long *free);
-void build_all_zonelists(void);
+void build_all_zonelists(void *data);
void wakeup_kswapd(struct zone *zone, int order);
int zone_watermark_ok(struct zone *z, int order, unsigned long mark,
int classzone_idx, int alloc_flags);
#endif
#define SECTION_NR_TO_ROOT(sec) ((sec) / SECTIONS_PER_ROOT)
-#define NR_SECTION_ROOTS (NR_MEM_SECTIONS / SECTIONS_PER_ROOT)
+#define NR_SECTION_ROOTS DIV_ROUND_UP(NR_MEM_SECTIONS, SECTIONS_PER_ROOT)
#define SECTION_ROOT_MASK (SECTIONS_PER_ROOT - 1)
#ifdef CONFIG_SPARSEMEM_EXTREME
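
The NR_SECTION_ROOTS change above matters whenever the section count is not a multiple of SECTIONS_PER_ROOT, because plain division drops the partially filled last root. A small userspace sketch of the difference follows; the two helper functions are hypothetical and exist only for the comparison, and the macro is written in the usual round-up form.

```c
#include <assert.h>

/* Round-up division in the usual form: add (d - 1) before dividing. */
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Root count with the old, truncating definition. */
static unsigned long roots_truncating(unsigned long sections,
				      unsigned long per_root)
{
	assert(per_root != 0);
	return sections / per_root;
}

/* Root count with the new, rounding-up definition. */
static unsigned long roots_rounded_up(unsigned long sections,
				      unsigned long per_root)
{
	assert(per_root != 0);
	return DIV_ROUND_UP(sections, per_root);
}
```

With 10 sections and 4 sections per root, the truncating form yields 2 roots and leaves the last 2 sections without one, while the rounded-up form yields 3.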
#define _LINUX_RATELIMIT_H
#include <linux/param.h>
-#include <linux/spinlock_types.h>
+#include <linux/spinlock.h>
#define DEFAULT_RATELIMIT_INTERVAL (5 * HZ)
#define DEFAULT_RATELIMIT_BURST 10
.burst = burst_init, \
}
+static inline void ratelimit_state_init(struct ratelimit_state *rs,
+ int interval, int burst)
+{
+ spin_lock_init(&rs->lock);
+ rs->interval = interval;
+ rs->burst = burst;
+ rs->printed = 0;
+ rs->missed = 0;
+ rs->begin = 0;
+}
+
extern int ___ratelimit(struct ratelimit_state *rs, const char *func);
#define __ratelimit(state) ___ratelimit(state, __func__)
*/
struct anon_vma {
spinlock_t lock; /* Serialize access to vma list */
-#ifdef CONFIG_KSM
- atomic_t ksm_refcount;
+#if defined(CONFIG_KSM) || defined(CONFIG_MIGRATION)
+
+ /*
+ * The external_refcount is taken by either KSM or page migration
+ * to take a reference to an anon_vma when there is no
+ * guarantee that the vma of page tables will exist for
+ * the duration of the operation. A caller that takes
+ * the reference is responsible for clearing up the
+ * anon_vma if they are the last user on release
+ */
+ atomic_t external_refcount;
#endif
/*
* NOTE: the LSB of the head.next is set by
};
#ifdef CONFIG_MMU
-#ifdef CONFIG_KSM
-static inline void ksm_refcount_init(struct anon_vma *anon_vma)
+#if defined(CONFIG_KSM) || defined(CONFIG_MIGRATION)
+static inline void anonvma_external_refcount_init(struct anon_vma *anon_vma)
{
- atomic_set(&anon_vma->ksm_refcount, 0);
+ atomic_set(&anon_vma->external_refcount, 0);
}
-static inline int ksm_refcount(struct anon_vma *anon_vma)
+static inline int anonvma_external_refcount(struct anon_vma *anon_vma)
{
- return atomic_read(&anon_vma->ksm_refcount);
+ return atomic_read(&anon_vma->external_refcount);
}
#else
-static inline void ksm_refcount_init(struct anon_vma *anon_vma)
+static inline void anonvma_external_refcount_init(struct anon_vma *anon_vma)
{
}
-static inline int ksm_refcount(struct anon_vma *anon_vma)
+static inline int anonvma_external_refcount(struct anon_vma *anon_vma)
{
return 0;
}
* 1-3 now and depends on arch. We use "5" as safe margin, here.
*/
#define MAPCOUNT_ELF_CORE_MARGIN (5)
-#define DEFAULT_MAX_MAP_COUNT (USHORT_MAX - MAPCOUNT_ELF_CORE_MARGIN)
+#define DEFAULT_MAX_MAP_COUNT (USHRT_MAX - MAPCOUNT_ELF_CORE_MARGIN)
extern int sysctl_max_map_count;
#endif
#ifdef CONFIG_CPUSETS
nodemask_t mems_allowed; /* Protected by alloc_lock */
+ int mems_allowed_change_disable;
int cpuset_mem_spread_rotor;
#endif
#ifdef CONFIG_CGROUPS
};
#define SWAP_CLUSTER_MAX 32
+#define COMPACT_CLUSTER_MAX SWAP_CLUSTER_MAX
#define SWAP_MAP_MAX 0x3e /* Max duplication count, in first swap_map */
#define SWAP_MAP_BAD 0x3f /* Note pageblock is bad, in first swap_map */
__lru_cache_add(page, LRU_INACTIVE_ANON);
}
-static inline void lru_cache_add_active_anon(struct page *page)
-{
- __lru_cache_add(page, LRU_ACTIVE_ANON);
-}
-
static inline void lru_cache_add_file(struct page *page)
{
__lru_cache_add(page, LRU_INACTIVE_FILE);
}
-static inline void lru_cache_add_active_file(struct page *page)
-{
- __lru_cache_add(page, LRU_ACTIVE_FILE);
-}
+/* LRU Isolation modes. */
+#define ISOLATE_INACTIVE 0 /* Isolate inactive pages. */
+#define ISOLATE_ACTIVE 1 /* Isolate active pages. */
+#define ISOLATE_BOTH 2 /* Isolate both active and inactive pages. */
/* linux/mm/vmscan.c */
extern unsigned long try_to_free_pages(struct zonelist *zonelist, int order,
KSWAPD_LOW_WMARK_HIT_QUICKLY, KSWAPD_HIGH_WMARK_HIT_QUICKLY,
KSWAPD_SKIP_CONGESTION_WAIT,
PAGEOUTRUN, ALLOCSTALL, PGROTATED,
+#ifdef CONFIG_COMPACTION
+ COMPACTBLOCKS, COMPACTPAGES, COMPACTPAGEFAILED,
+ COMPACTSTALL, COMPACTFAIL, COMPACTSUCCESS,
+#endif
#ifdef CONFIG_HUGETLB_PAGE
HTLB_BUDDY_PGALLOC, HTLB_BUDDY_PGALLOC_FAIL,
#endif
IP_DEFRAG_LOCAL_DELIVER,
IP_DEFRAG_CALL_RA_CHAIN,
IP_DEFRAG_CONNTRACK_IN,
- __IP_DEFRAG_CONNTRACK_IN_END = IP_DEFRAG_CONNTRACK_IN + USHORT_MAX,
+ __IP_DEFRAG_CONNTRACK_IN_END = IP_DEFRAG_CONNTRACK_IN + USHRT_MAX,
IP_DEFRAG_CONNTRACK_OUT,
- __IP_DEFRAG_CONNTRACK_OUT_END = IP_DEFRAG_CONNTRACK_OUT + USHORT_MAX,
+ __IP_DEFRAG_CONNTRACK_OUT_END = IP_DEFRAG_CONNTRACK_OUT + USHRT_MAX,
IP_DEFRAG_CONNTRACK_BRIDGE_IN,
- __IP_DEFRAG_CONNTRACK_BRIDGE_IN = IP_DEFRAG_CONNTRACK_BRIDGE_IN + USHORT_MAX,
+ __IP_DEFRAG_CONNTRACK_BRIDGE_IN = IP_DEFRAG_CONNTRACK_BRIDGE_IN + USHRT_MAX,
IP_DEFRAG_VS_IN,
IP_DEFRAG_VS_OUT,
IP_DEFRAG_VS_FWD
enum ip6_defrag_users {
IP6_DEFRAG_LOCAL_DELIVER,
IP6_DEFRAG_CONNTRACK_IN,
- __IP6_DEFRAG_CONNTRACK_IN = IP6_DEFRAG_CONNTRACK_IN + USHORT_MAX,
+ __IP6_DEFRAG_CONNTRACK_IN = IP6_DEFRAG_CONNTRACK_IN + USHRT_MAX,
IP6_DEFRAG_CONNTRACK_OUT,
- __IP6_DEFRAG_CONNTRACK_OUT = IP6_DEFRAG_CONNTRACK_OUT + USHORT_MAX,
+ __IP6_DEFRAG_CONNTRACK_OUT = IP6_DEFRAG_CONNTRACK_OUT + USHRT_MAX,
IP6_DEFRAG_CONNTRACK_BRIDGE_IN,
- __IP6_DEFRAG_CONNTRACK_BRIDGE_IN = IP6_DEFRAG_CONNTRACK_BRIDGE_IN + USHORT_MAX,
+ __IP6_DEFRAG_CONNTRACK_BRIDGE_IN = IP6_DEFRAG_CONNTRACK_BRIDGE_IN + USHRT_MAX,
};
struct ip6_create_arg {
#define FBIPUT_COLOR _IOW('F', 6, int)
#define FBIPUT_HSYNC _IOW('F', 9, int)
#define FBIPUT_VSYNC _IOW('F', 10, int)
+#define FBIO_WAITFORVSYNC _IOW('F', 0x20, u_int32_t)
#endif /* ifndef DA8XX_FB_H */
#define LCDC_FLAGS_HSCNT (1 << 3) /* Disable HSYNC during VBLANK */
#define LCDC_FLAGS_DWCNT (1 << 4) /* Disable dotclock during blanking */
-#define FBIO_WAITFORVSYNC _IOW('F', 0x20, __u32)
-
struct sh_mobile_lcdc_sys_bus_cfg {
unsigned long ldmt2r;
unsigned long ldmt3r;
setup_per_cpu_areas();
smp_prepare_boot_cpu(); /* arch-specific boot-cpu hooks */
- build_all_zonelists();
+ build_all_zonelists(NULL);
page_alloc_init();
printk(KERN_NOTICE "Kernel command line: %s\n", boot_command_line);
out.msg_rtime = in->msg_rtime;
out.msg_ctime = in->msg_ctime;
- if (in->msg_cbytes > USHORT_MAX)
- out.msg_cbytes = USHORT_MAX;
+ if (in->msg_cbytes > USHRT_MAX)
+ out.msg_cbytes = USHRT_MAX;
else
out.msg_cbytes = in->msg_cbytes;
out.msg_lcbytes = in->msg_cbytes;
- if (in->msg_qnum > USHORT_MAX)
- out.msg_qnum = USHORT_MAX;
+ if (in->msg_qnum > USHRT_MAX)
+ out.msg_qnum = USHRT_MAX;
else
out.msg_qnum = in->msg_qnum;
- if (in->msg_qbytes > USHORT_MAX)
- out.msg_qbytes = USHORT_MAX;
+ if (in->msg_qbytes > USHRT_MAX)
+ out.msg_qbytes = USHRT_MAX;
else
out.msg_qbytes = in->msg_qbytes;
out.msg_lqbytes = in->msg_qbytes;
ids->seq = 0;
{
int seq_limit = INT_MAX/SEQ_MULTIPLIER;
- if (seq_limit > USHORT_MAX)
- ids->seq_max = USHORT_MAX;
+ if (seq_limit > USHRT_MAX)
+ ids->seq_max = USHRT_MAX;
else
ids->seq_max = seq_limit;
}
int __cpuinit cpu_up(unsigned int cpu)
{
int err = 0;
+
+#ifdef CONFIG_MEMORY_HOTPLUG
+ int nid;
+ pg_data_t *pgdat;
+#endif
+
if (!cpu_possible(cpu)) {
printk(KERN_ERR "can't online cpu %d because it is not "
"configured as may-hotadd at boot time\n", cpu);
return -EINVAL;
}
+#ifdef CONFIG_MEMORY_HOTPLUG
+ nid = cpu_to_node(cpu);
+ if (!node_online(nid)) {
+ err = mem_online_node(nid);
+ if (err)
+ return err;
+ }
+
+ pgdat = NODE_DATA(nid);
+ if (!pgdat) {
+ printk(KERN_ERR
+ "Can't online cpu %d due to NULL pgdat\n", cpu);
+ return -ENOMEM;
+ }
+
+ if (pgdat->node_zonelists->_zonerefs->zone == NULL) {
+ mutex_lock(&zonelists_mutex);
+ build_all_zonelists(NULL);
+ mutex_unlock(&zonelists_mutex);
+ }
+#endif
+
cpu_maps_update_begin();
if (cpu_hotplug_disabled) {
* In order to avoid seeing no nodes if the old and new nodes are disjoint,
* we structure updates as setting all new allowed nodes, then clearing newly
* disallowed ones.
- *
- * Called with task's alloc_lock held
*/
static void cpuset_change_task_nodemask(struct task_struct *tsk,
nodemask_t *newmems)
{
+repeat:
+ /*
+ * Allow tasks that have access to memory reserves because they have
+ * been OOM killed to get memory anywhere.
+ */
+ if (unlikely(test_thread_flag(TIF_MEMDIE)))
+ return;
+ if (current->flags & PF_EXITING) /* Let dying task have memory */
+ return;
+
+ task_lock(tsk);
nodes_or(tsk->mems_allowed, tsk->mems_allowed, *newmems);
- mpol_rebind_task(tsk, &tsk->mems_allowed);
- mpol_rebind_task(tsk, newmems);
+ mpol_rebind_task(tsk, newmems, MPOL_REBIND_STEP1);
+
+
+ /*
+ * Ensure that ->mems_allowed_change_disable is checked only after all
+ * newly allowed nodes have been set.
+ *
+ * The read-side task can then see a nodemask containing both the new
+ * and the old allowed nodes. If it allocates a page while cpuset is
+ * clearing the newly disallowed nodes, it can still see the newly
+ * allowed bits.
+ *
+ * If the newly allowed nodes were set after the check, setting them
+ * and clearing the newly disallowed ones would happen back to back,
+ * and the read-side task might find no node to allocate a page from.
+ */
+ smp_mb();
+
+ /*
+ * Memory allocation is very fast, so there is no need to sleep while
+ * waiting for the read side.
+ */
+ while (ACCESS_ONCE(tsk->mems_allowed_change_disable)) {
+ task_unlock(tsk);
+ if (!task_curr(tsk))
+ yield();
+ goto repeat;
+ }
+
+ /*
+ * Ensure that ->mems_allowed_change_disable is checked before any
+ * newly disallowed nodes are cleared.
+ *
+ * If the newly disallowed bits were cleared before the check, the
+ * read-side task might find no node to allocate a page from.
+ */
+ smp_mb();
+
+ mpol_rebind_task(tsk, newmems, MPOL_REBIND_STEP2);
tsk->mems_allowed = *newmems;
+ task_unlock(tsk);
}
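
The two-step update in cpuset_change_task_nodemask() can be illustrated with a plain bitmask. This is a minimal userspace sketch under stated assumptions: `nodemask_model_t` is a 64-bit stand-in for nodemask_t, both helper names are hypothetical, and the locking, memory barriers and mempolicy rebinding of the real function are left out.

```c
#include <assert.h>
#include <stdint.h>

/* One bit per node; a stand-in for the kernel's nodemask_t. */
typedef uint64_t nodemask_model_t;

/*
 * Step 1: widen the mask to old | new. A reader sampling the mask at
 * this point sees at least one allowed node even if old and new are
 * disjoint.
 */
static nodemask_model_t rebind_step1(nodemask_model_t old,
				     nodemask_model_t newmems)
{
	return old | newmems;
}

/*
 * Step 2: narrow the widened mask to the final set. The assert checks
 * that step 1 really ran first, so no new node is missing.
 */
static nodemask_model_t rebind_step2(nodemask_model_t widened,
				     nodemask_model_t newmems)
{
	assert((widened & newmems) == newmems);
	return newmems;
}
```

For disjoint masks such as 0x3 and 0xc, the intermediate value is 0xf, so the mask never passes through the empty set on the way to 0xc.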
/*
cs = cgroup_cs(scan->cg);
guarantee_online_mems(cs, newmems);
- task_lock(p);
cpuset_change_task_nodemask(p, newmems);
- task_unlock(p);
NODEMASK_FREE(newmems);
err = set_cpus_allowed_ptr(tsk, cpus_attach);
WARN_ON_ONCE(err);
- task_lock(tsk);
cpuset_change_task_nodemask(tsk, to);
- task_unlock(tsk);
cpuset_update_task_spread_flag(cs, tsk);
}
exit_notify(tsk, group_dead);
#ifdef CONFIG_NUMA
+ task_lock(tsk);
mpol_put(tsk->mempolicy);
tsk->mempolicy = NULL;
+ task_unlock(tsk);
#endif
#ifdef CONFIG_FUTEX
if (unlikely(current->pi_state_cache))
extern const struct kernel_symbol __stop___ksymtab_gpl[];
extern const struct kernel_symbol __start___ksymtab_gpl_future[];
extern const struct kernel_symbol __stop___ksymtab_gpl_future[];
-extern const struct kernel_symbol __start___ksymtab_gpl_future[];
-extern const struct kernel_symbol __stop___ksymtab_gpl_future[];
extern const unsigned long __start___kcrctab[];
extern const unsigned long __start___kcrctab_gpl[];
extern const unsigned long __start___kcrctab_gpl_future[];
#include <linux/highuid.h>
#include <linux/writeback.h>
#include <linux/ratelimit.h>
+#include <linux/compaction.h>
#include <linux/hugetlb.h>
#include <linux/initrd.h>
#include <linux/key.h>
static int max_sched_shares_ratelimit = NSEC_PER_SEC; /* 1 second */
#endif
+#ifdef CONFIG_COMPACTION
+static int min_extfrag_threshold;
+static int max_extfrag_threshold = 1000;
+#endif
+
static struct ctl_table kern_table[] = {
{
.procname = "sched_child_runs_first",
.mode = 0644,
.proc_handler = drop_caches_sysctl_handler,
},
+#ifdef CONFIG_COMPACTION
+ {
+ .procname = "compact_memory",
+ .data = &sysctl_compact_memory,
+ .maxlen = sizeof(int),
+ .mode = 0200,
+ .proc_handler = sysctl_compaction_handler,
+ },
+ {
+ .procname = "extfrag_threshold",
+ .data = &sysctl_extfrag_threshold,
+ .maxlen = sizeof(int),
+ .mode = 0644,
+ .proc_handler = sysctl_extfrag_handler,
+ .extra1 = &min_extfrag_threshold,
+ .extra2 = &max_extfrag_threshold,
+ },
+
+#endif /* CONFIG_COMPACTION */
{
.procname = "min_free_kbytes",
.data = &min_free_kbytes,
#include <linux/file.h>
#include <linux/ctype.h>
#include <linux/netdevice.h>
+#include <linux/kernel.h>
#include <linux/slab.h>
#ifdef CONFIG_SYSCTL_SYSCALL
return result;
}
-static unsigned hex_value(int ch)
-{
- return isdigit(ch) ? ch - '0' : ((ch | 0x20) - 'a') + 10;
-}
-
static ssize_t bin_uuid(struct file *file,
void __user *oldval, size_t oldlen, void __user *newval, size_t newlen)
{
if (!isxdigit(str[0]) || !isxdigit(str[1]))
goto out;
- uuid[i] = (hex_value(str[0]) << 4) | hex_value(str[1]);
+ uuid[i] = (hex_to_bin(str[0]) << 4) |
+ hex_to_bin(str[1]);
str += 2;
if (*str == '-')
str++;
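
The hunk above swaps the local hex_value() for hex_to_bin(), which reports bad input as -1 instead of returning an arbitrary value. The following userspace sketch mirrors that logic under a different name; `hex_to_bin_model()` and `hex_pair_to_byte()` are hypothetical helpers written for this illustration, not kernel symbols.

```c
#include <assert.h>
#include <ctype.h>

/* Userspace model: one hex digit to its value, or -1 on bad input. */
static int hex_to_bin_model(char ch)
{
	if (ch >= '0' && ch <= '9')
		return ch - '0';
	ch = (char)tolower((unsigned char)ch);
	if (ch >= 'a' && ch <= 'f')
		return ch - 'a' + 10;
	return -1;
}

/* Combine two hex digits into one byte, or return -1 if either is bad. */
static int hex_pair_to_byte(const char *s)
{
	int hi, lo;

	assert(s);
	hi = hex_to_bin_model(s[0]);
	if (hi < 0)
		return -1;	/* also stops before reading past a NUL */
	lo = hex_to_bin_model(s[1]);
	if (lo < 0)
		return -1;
	return (hi << 4) | lo;
}
```

A caller that checks the return value no longer needs a separate isxdigit() pass, although the hunk above keeps its existing check.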
Usage:
- Dynamic debugging is controlled via the 'dynamic_debug/ddebug' file,
+ Dynamic debugging is controlled via the 'dynamic_debug/control' file,
which is contained in the 'debugfs' filesystem. Thus, the debugfs
filesystem must first be mounted before making use of this feature.
- We refer the control file as: <debugfs>/dynamic_debug/ddebug. This
+ We refer to the control file as: <debugfs>/dynamic_debug/control. This
file contains a list of the debug statements that can be enabled. The
format for each line of the file is:
From a live system:
- nullarbor:~ # cat <debugfs>/dynamic_debug/ddebug
+ nullarbor:~ # cat <debugfs>/dynamic_debug/control
# filename:lineno [module]function flags format
fs/aio.c:222 [aio]__put_ioctx - "__put_ioctx:\040freeing\040%p\012"
fs/aio.c:248 [aio]ioctx_alloc - "ENOMEM:\040nr_events\040too\040high\012"
// enable the message at line 1603 of file svcsock.c
nullarbor:~ # echo -n 'file svcsock.c line 1603 +p' >
- <debugfs>/dynamic_debug/ddebug
+ <debugfs>/dynamic_debug/control
// enable all the messages in file svcsock.c
nullarbor:~ # echo -n 'file svcsock.c +p' >
- <debugfs>/dynamic_debug/ddebug
+ <debugfs>/dynamic_debug/control
// enable all the messages in the NFS server module
nullarbor:~ # echo -n 'module nfsd +p' >
- <debugfs>/dynamic_debug/ddebug
+ <debugfs>/dynamic_debug/control
// enable all 12 messages in the function svc_process()
nullarbor:~ # echo -n 'func svc_process +p' >
- <debugfs>/dynamic_debug/ddebug
+ <debugfs>/dynamic_debug/control
// disable all 12 messages in the function svc_process()
nullarbor:~ # echo -n 'func svc_process -p' >
- <debugfs>/dynamic_debug/ddebug
+ <debugfs>/dynamic_debug/control
See Documentation/dynamic-debug-howto.txt for additional information.
#if CRC_LE_BITS == 8 || CRC_BE_BITS == 8
static inline u32
-crc32_body(u32 crc, unsigned char const *buf, size_t len, const u32 *tab)
+crc32_body(u32 crc, unsigned char const *buf, size_t len, const u32 (*tab)[256])
{
-# ifdef __LITTLE_ENDIAN
-# define DO_CRC(x) crc = tab[(crc ^ (x)) & 255 ] ^ (crc >> 8)
+# if __BYTE_ORDER == __LITTLE_ENDIAN
+# define DO_CRC(x) crc = tab[0][(crc ^ (x)) & 255] ^ (crc >> 8)
+# define DO_CRC4 crc = tab[3][(crc) & 255] ^ \
+ tab[2][(crc >> 8) & 255] ^ \
+ tab[1][(crc >> 16) & 255] ^ \
+ tab[0][(crc >> 24) & 255]
# else
-# define DO_CRC(x) crc = tab[((crc >> 24) ^ (x)) & 255] ^ (crc << 8)
+# define DO_CRC(x) crc = tab[0][((crc >> 24) ^ (x)) & 255] ^ (crc << 8)
+# define DO_CRC4 crc = tab[0][(crc) & 255] ^ \
+ tab[1][(crc >> 8) & 255] ^ \
+ tab[2][(crc >> 16) & 255] ^ \
+ tab[3][(crc >> 24) & 255]
# endif
const u32 *b;
size_t rem_len;
b = (const u32 *)buf;
for (--b; len; --len) {
crc ^= *++b; /* use pre increment for speed */
- DO_CRC(0);
- DO_CRC(0);
- DO_CRC(0);
- DO_CRC(0);
+ DO_CRC4;
}
len = rem_len;
/* And the last few bytes */
}
return crc;
#undef DO_CRC
+#undef DO_CRC4
}
#endif
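
The DO_CRC4 macro above consumes four input bytes per step by looking each byte up in its own table. A self-contained userspace sketch of the little-endian variant follows. It is an illustration under stated assumptions, not the kernel code: the `crc32_model_*` names and `crc_tab` are hypothetical, table 0 is built with the plain bit-by-bit method rather than the generator's loop, and each input word is assembled from bytes so the sketch does not depend on host byte order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CRCPOLY_LE 0xedb88320u

static uint32_t crc_tab[4][256];

/* Build table 0 bit by bit, then derive tables 1..3 from it. */
static void crc32_model_init(void)
{
	unsigned int i, j;

	for (i = 0; i < 256; i++) {
		uint32_t crc = i;

		for (j = 0; j < 8; j++)
			crc = (crc >> 1) ^ ((crc & 1) ? CRCPOLY_LE : 0);
		crc_tab[0][i] = crc;
	}
	for (i = 0; i < 256; i++) {
		uint32_t crc = crc_tab[0][i];

		/* Table j holds table 0's entry advanced by j zero bytes. */
		for (j = 1; j < 4; j++) {
			crc = crc_tab[0][crc & 0xff] ^ (crc >> 8);
			crc_tab[j][i] = crc;
		}
	}
}

/* Reference: one byte per step, table 0 only. */
static uint32_t crc32_model_bytewise(uint32_t crc, const unsigned char *p,
				     size_t len)
{
	while (len--)
		crc = crc_tab[0][(crc ^ *p++) & 0xff] ^ (crc >> 8);
	return crc;
}

/* Four bytes per step; leftover bytes fall back to the bytewise loop. */
static uint32_t crc32_model_slice4(uint32_t crc, const unsigned char *p,
				   size_t len)
{
	assert(p != NULL || len == 0);
	while (len >= 4) {
		crc ^= (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
		       ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
		crc = crc_tab[3][crc & 0xff] ^
		      crc_tab[2][(crc >> 8) & 0xff] ^
		      crc_tab[1][(crc >> 16) & 0xff] ^
		      crc_tab[0][(crc >> 24) & 0xff];
		p += 4;
		len -= 4;
	}
	return crc32_model_bytewise(crc, p, len);
}
```

Both functions are expected to agree on every input; the sliced version trades three extra 1 KiB tables for fewer dependent lookups per input word.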
/**
u32 __pure crc32_le(u32 crc, unsigned char const *p, size_t len)
{
# if CRC_LE_BITS == 8
- const u32 *tab = crc32table_le;
+ const u32 (*tab)[] = crc32table_le;
crc = __cpu_to_le32(crc);
crc = crc32_body(crc, p, len, tab);
u32 __pure crc32_be(u32 crc, unsigned char const *p, size_t len)
{
# if CRC_BE_BITS == 8
- const u32 *tab = crc32table_be;
+ const u32 (*tab)[] = crc32table_be;
crc = __cpu_to_be32(crc);
crc = crc32_body(crc, p, len, tab);
__func__, (int)len);
nwords = ddebug_tokenize(tmpbuf, words, MAXWORDS);
- if (nwords < 0)
+ if (nwords <= 0)
return -EINVAL;
if (ddebug_parse_query(words, nwords-1, &query))
return -EINVAL;
#define LE_TABLE_SIZE (1 << CRC_LE_BITS)
#define BE_TABLE_SIZE (1 << CRC_BE_BITS)
-static uint32_t crc32table_le[LE_TABLE_SIZE];
-static uint32_t crc32table_be[BE_TABLE_SIZE];
+static uint32_t crc32table_le[4][LE_TABLE_SIZE];
+static uint32_t crc32table_be[4][BE_TABLE_SIZE];
/**
* crc32init_le() - allocate and initialize LE table data
unsigned i, j;
uint32_t crc = 1;
- crc32table_le[0] = 0;
+ crc32table_le[0][0] = 0;
for (i = 1 << (CRC_LE_BITS - 1); i; i >>= 1) {
crc = (crc >> 1) ^ ((crc & 1) ? CRCPOLY_LE : 0);
for (j = 0; j < LE_TABLE_SIZE; j += 2 * i)
- crc32table_le[i + j] = crc ^ crc32table_le[j];
+ crc32table_le[0][i + j] = crc ^ crc32table_le[0][j];
+ }
+ for (i = 0; i < LE_TABLE_SIZE; i++) {
+ crc = crc32table_le[0][i];
+ for (j = 1; j < 4; j++) {
+ crc = crc32table_le[0][crc & 0xff] ^ (crc >> 8);
+ crc32table_le[j][i] = crc;
+ }
}
}
unsigned i, j;
uint32_t crc = 0x80000000;
- crc32table_be[0] = 0;
+ crc32table_be[0][0] = 0;
for (i = 1; i < BE_TABLE_SIZE; i <<= 1) {
crc = (crc << 1) ^ ((crc & 0x80000000) ? CRCPOLY_BE : 0);
for (j = 0; j < i; j++)
- crc32table_be[i + j] = crc ^ crc32table_be[j];
+ crc32table_be[0][i + j] = crc ^ crc32table_be[0][j];
+ }
+ for (i = 0; i < BE_TABLE_SIZE; i++) {
+ crc = crc32table_be[0][i];
+ for (j = 1; j < 4; j++) {
+ crc = crc32table_be[0][(crc >> 24) & 0xff] ^ (crc << 8);
+ crc32table_be[j][i] = crc;
+ }
}
}
-static void output_table(uint32_t table[], int len, char *trans)
+static void output_table(uint32_t table[4][256], int len, char *trans)
{
- int i;
+ int i, j;
- for (i = 0; i < len - 1; i++) {
- if (i % ENTRIES_PER_LINE == 0)
- printf("\n");
- printf("%s(0x%8.8xL), ", trans, table[i]);
+ for (j = 0 ; j < 4; j++) {
+ printf("{");
+ for (i = 0; i < len - 1; i++) {
+ if (i % ENTRIES_PER_LINE == 0)
+ printf("\n");
+ printf("%s(0x%8.8xL), ", trans, table[j][i]);
+ }
+ printf("%s(0x%8.8xL)},\n", trans, table[j][len - 1]);
}
- printf("%s(0x%8.8xL)\n", trans, table[len - 1]);
}
int main(int argc, char** argv)
if (CRC_LE_BITS > 1) {
crc32init_le();
- printf("static const u32 crc32table_le[] = {");
+ printf("static const u32 crc32table_le[4][256] = {");
output_table(crc32table_le, LE_TABLE_SIZE, "tole");
printf("};\n");
}
if (CRC_BE_BITS > 1) {
crc32init_be();
- printf("static const u32 crc32table_be[] = {");
+ printf("static const u32 crc32table_be[4][256] = {");
output_table(crc32table_be, BE_TABLE_SIZE, "tobe");
printf("};\n");
}
const char hex_asc[] = "0123456789abcdef";
EXPORT_SYMBOL(hex_asc);
+/**
+ * hex_to_bin - convert a hex digit to its real value
+ * @ch: ASCII character representing a hex digit
+ *
+ * hex_to_bin() converts one hex digit to its actual value, or returns -1 in
+ * case of bad input.
+ */
+int hex_to_bin(char ch)
+{
+ if ((ch >= '0') && (ch <= '9'))
+ return ch - '0';
+ ch = tolower(ch);
+ if ((ch >= 'a') && (ch <= 'f'))
+ return ch - 'a' + 10;
+ return -1;
+}
+EXPORT_SYMBOL(hex_to_bin);
+
/**
* hex_dump_to_buffer - convert a blob of data to "hex ASCII" in memory
* @buf: data blob to dump
*
* E.g.:
* hex_dump_to_buffer(frame->data, frame->len, 16, 1,
- * linebuf, sizeof(linebuf), 1);
+ * linebuf, sizeof(linebuf), true);
*
* example output buffer:
* 40 41 42 43 44 45 46 47 48 49 4a 4b 4c 4d 4e 4f @ABCDEFGHIJKLMNO
for (j = 0; j < ngroups; j++)
lx += scnprintf(linebuf + lx, linebuflen - lx,
- "%s%16.16llx", j ? " " : "",
- (unsigned long long)*(ptr8 + j));
+ "%s%16.16llx", j ? " " : "",
+ (unsigned long long)*(ptr8 + j));
ascii_column = 17 * ngroups + 2;
break;
}
for (j = 0; j < ngroups; j++)
lx += scnprintf(linebuf + lx, linebuflen - lx,
- "%s%8.8x", j ? " " : "", *(ptr4 + j));
+ "%s%8.8x", j ? " " : "", *(ptr4 + j));
ascii_column = 9 * ngroups + 2;
break;
}
for (j = 0; j < ngroups; j++)
lx += scnprintf(linebuf + lx, linebuflen - lx,
- "%s%4.4x", j ? " " : "", *(ptr2 + j));
+ "%s%4.4x", j ? " " : "", *(ptr2 + j));
ascii_column = 5 * ngroups + 2;
break;
}
while (lx < (linebuflen - 1) && lx < (ascii_column - 1))
linebuf[lx++] = ' ';
- for (j = 0; (j < len) && (lx + 2) < linebuflen; j++)
- linebuf[lx++] = (isascii(ptr[j]) && isprint(ptr[j])) ? ptr[j]
- : '.';
+ for (j = 0; (j < len) && (lx + 2) < linebuflen; j++) {
+ ch = ptr[j];
+ linebuf[lx++] = (isascii(ch) && isprint(ch)) ? ch : '.';
+ }
nil:
linebuf[lx++] = '\0';
}
*
* E.g.:
* print_hex_dump(KERN_DEBUG, "raw data: ", DUMP_PREFIX_ADDRESS,
- * 16, 1, frame->data, frame->len, 1);
+ * 16, 1, frame->data, frame->len, true);
*
* Example output using %DUMP_PREFIX_OFFSET and 1-byte mode:
* 0009ab42: 40 41 42 43 44 45 46 47 48 49 4a 4b 4c 4d 4e 4f @ABCDEFGHIJKLMNO
* ffffffff88089af0: 73727170 77767574 7b7a7978 7f7e7d7c pqrstuvwxyz{|}~.
*/
void print_hex_dump(const char *level, const char *prefix_str, int prefix_type,
- int rowsize, int groupsize,
- const void *buf, size_t len, bool ascii)
+ int rowsize, int groupsize,
+ const void *buf, size_t len, bool ascii)
{
const u8 *ptr = buf;
int i, linelen, remaining = len;
- unsigned char linebuf[200];
+ unsigned char linebuf[32 * 3 + 2 + 32 + 1];
if (rowsize != 16 && rowsize != 32)
rowsize = 16;
for (i = 0; i < len; i += rowsize) {
linelen = min(remaining, rowsize);
remaining -= rowsize;
+
hex_dump_to_buffer(ptr + i, linelen, rowsize, groupsize,
- linebuf, sizeof(linebuf), ascii);
+ linebuf, sizeof(linebuf), ascii);
switch (prefix_type) {
case DUMP_PREFIX_ADDRESS:
- printk("%s%s%*p: %s\n", level, prefix_str,
- (int)(2 * sizeof(void *)), ptr + i, linebuf);
+ printk("%s%s%p: %s\n",
+ level, prefix_str, ptr + i, linebuf);
break;
case DUMP_PREFIX_OFFSET:
printk("%s%s%.8x: %s\n", level, prefix_str, i, linebuf);
* rowsize of 16, groupsize of 1, and ASCII output included.
*/
void print_hex_dump_bytes(const char *prefix_str, int prefix_type,
- const void *buf, size_t len)
+ const void *buf, size_t len)
{
print_hex_dump(KERN_DEBUG, prefix_str, prefix_type, 16, 1,
- buf, len, 1);
+ buf, len, true);
}
EXPORT_SYMBOL(print_hex_dump_bytes);
}
EXPORT_SYMBOL(strict_strtoll);
-static int skip_atoi(const char **s)
+static noinline_for_stack
+int skip_atoi(const char **s)
{
int i = 0;
/* Formats correctly any integer in [0,99999].
* Outputs from one to five digits depending on input.
* On i386 gcc 4.1.2 -O2: ~250 bytes of code. */
-static char *put_dec_trunc(char *buf, unsigned q)
+static noinline_for_stack
+char *put_dec_trunc(char *buf, unsigned q)
{
unsigned d3, d2, d1, d0;
d1 = (q>>4) & 0xf;
return buf;
}
/* Same with if's removed. Always emits five digits */
-static char *put_dec_full(char *buf, unsigned q)
+static noinline_for_stack
+char *put_dec_full(char *buf, unsigned q)
{
/* BTW, if q is in [0,9999], 8-bit ints will be enough, */
/* but anyway, gcc produces better code with full-sized ints */
return buf;
}
/* No inlining helps gcc to use registers better */
-static noinline char *put_dec(char *buf, unsigned long long num)
+static noinline_for_stack
+char *put_dec(char *buf, unsigned long long num)
{
while (1) {
unsigned rem;
s16 precision; /* # of digits/chars */
};
-static char *number(char *buf, char *end, unsigned long long num,
- struct printf_spec spec)
+static noinline_for_stack
+char *number(char *buf, char *end, unsigned long long num,
+ struct printf_spec spec)
{
/* we are called with base 8, 10 or 16, only, thus don't need "G..." */
static const char digits[16] = "0123456789ABCDEF"; /* "GHIJKLMNOPQRSTUVWXYZ"; */
return buf;
}
-static char *string(char *buf, char *end, const char *s, struct printf_spec spec)
+static noinline_for_stack
+char *string(char *buf, char *end, const char *s, struct printf_spec spec)
{
int len, i;
return buf;
}
-static char *symbol_string(char *buf, char *end, void *ptr,
- struct printf_spec spec, char ext)
+static noinline_for_stack
+char *symbol_string(char *buf, char *end, void *ptr,
+ struct printf_spec spec, char ext)
{
unsigned long value = (unsigned long) ptr;
#ifdef CONFIG_KALLSYMS
#endif
}
-static char *resource_string(char *buf, char *end, struct resource *res,
- struct printf_spec spec, const char *fmt)
+static noinline_for_stack
+char *resource_string(char *buf, char *end, struct resource *res,
+ struct printf_spec spec, const char *fmt)
{
#ifndef IO_RSRC_PRINTK_SIZE
#define IO_RSRC_PRINTK_SIZE 6
return string(buf, end, sym, spec);
}
-static char *mac_address_string(char *buf, char *end, u8 *addr,
- struct printf_spec spec, const char *fmt)
+static noinline_for_stack
+char *mac_address_string(char *buf, char *end, u8 *addr,
+ struct printf_spec spec, const char *fmt)
{
char mac_addr[sizeof("xx:xx:xx:xx:xx:xx")];
char *p = mac_addr;
return string(buf, end, mac_addr, spec);
}
-static char *ip4_string(char *p, const u8 *addr, const char *fmt)
+static noinline_for_stack
+char *ip4_string(char *p, const u8 *addr, const char *fmt)
{
int i;
bool leading_zeros = (fmt[0] == 'i');
return p;
}
-static char *ip6_compressed_string(char *p, const char *addr)
+static noinline_for_stack
+char *ip6_compressed_string(char *p, const char *addr)
{
int i, j, range;
unsigned char zerolength[8];
return p;
}
-static char *ip6_string(char *p, const char *addr, const char *fmt)
+static noinline_for_stack
+char *ip6_string(char *p, const char *addr, const char *fmt)
{
int i;
return p;
}
-static char *ip6_addr_string(char *buf, char *end, const u8 *addr,
- struct printf_spec spec, const char *fmt)
+static noinline_for_stack
+char *ip6_addr_string(char *buf, char *end, const u8 *addr,
+ struct printf_spec spec, const char *fmt)
{
char ip6_addr[sizeof("xxxx:xxxx:xxxx:xxxx:xxxx:xxxx:255.255.255.255")];
return string(buf, end, ip6_addr, spec);
}
-static char *ip4_addr_string(char *buf, char *end, const u8 *addr,
- struct printf_spec spec, const char *fmt)
+static noinline_for_stack
+char *ip4_addr_string(char *buf, char *end, const u8 *addr,
+ struct printf_spec spec, const char *fmt)
{
char ip4_addr[sizeof("255.255.255.255")];
return string(buf, end, ip4_addr, spec);
}
-static char *uuid_string(char *buf, char *end, const u8 *addr,
- struct printf_spec spec, const char *fmt)
+static noinline_for_stack
+char *uuid_string(char *buf, char *end, const u8 *addr,
+ struct printf_spec spec, const char *fmt)
{
char uuid[sizeof("xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx")];
char *p = uuid;
* function pointers are really function descriptors, which contain a
* pointer to the real address.
*/
-static char *pointer(const char *fmt, char *buf, char *end, void *ptr,
- struct printf_spec spec)
+static noinline_for_stack
+char *pointer(const char *fmt, char *buf, char *end, void *ptr,
+ struct printf_spec spec)
{
if (!ptr)
return string(buf, end, "(null)", spec);
* @precision: precision of a number
* @qualifier: qualifier of a number (long, size_t, ...)
*/
-static int format_decode(const char *fmt, struct printf_spec *spec)
+static noinline_for_stack
+int format_decode(const char *fmt, struct printf_spec *spec)
{
const char *start = fmt;
{
char *s = (char *)va_arg(args, char *);
if (field_width == -1)
- field_width = SHORT_MAX;
+ field_width = SHRT_MAX;
/* first, skip leading white space in buffer */
str = skip_spaces(str);
default "999999" if DEBUG_SPINLOCK || DEBUG_LOCK_ALLOC
default "4"
+#
+# support for memory compaction
+config COMPACTION
+ bool "Allow for memory compaction"
+ select MIGRATION
+ depends on EXPERIMENTAL && HUGETLB_PAGE && MMU
+ help
+ Allows the compaction of memory for the allocation of huge pages.
+
#
# support for page migration
#
depends on NUMA || ARCH_ENABLE_MEMORY_HOTREMOVE
help
Allows the migration of the physical location of pages of processes
- while the virtual addresses are not changed. This is useful for
- example on NUMA systems to put pages nearer to the processors accessing
- the page.
+ while the virtual addresses are not changed. This is useful in
+ two situations. The first is on NUMA systems to put pages nearer
+ to the processors accessing them. The second is when allocating huge
+ pages, as migration can relocate pages to satisfy a huge page
+ allocation instead of reclaiming.
config PHYS_ADDR_T_64BIT
def_bool 64BIT || ARCH_PHYS_ADDR_T_64BIT
obj-$(CONFIG_SPARSEMEM) += sparse.o
obj-$(CONFIG_SPARSEMEM_VMEMMAP) += sparse-vmemmap.o
obj-$(CONFIG_SLOB) += slob.o
+obj-$(CONFIG_COMPACTION) += compaction.o
obj-$(CONFIG_MMU_NOTIFIER) += mmu_notifier.o
obj-$(CONFIG_KSM) += ksm.o
obj-$(CONFIG_PAGE_POISONING) += debug-pagealloc.o
--- /dev/null
+/*
+ * linux/mm/compaction.c
+ *
+ * Memory compaction for the reduction of external fragmentation. Note that
+ * this depends heavily on page migration to do all the real heavy
+ * lifting.
+ *
+ * Copyright IBM Corp. 2007-2010 Mel Gorman <mel@csn.ul.ie>
+ */
+#include <linux/swap.h>
+#include <linux/migrate.h>
+#include <linux/compaction.h>
+#include <linux/mm_inline.h>
+#include <linux/backing-dev.h>
+#include <linux/sysctl.h>
+#include <linux/sysfs.h>
+#include "internal.h"
+
+/*
+ * compact_control is used to track pages being migrated and the free pages
+ * they are being migrated to during memory compaction. The free_pfn starts
+ * at the end of a zone and migrate_pfn begins at the start. Movable pages
+ * are moved to the end of a zone during a compaction run and the run
+ * completes when free_pfn <= migrate_pfn
+ */
+struct compact_control {
+ struct list_head freepages; /* List of free pages to migrate to */
+ struct list_head migratepages; /* List of pages being migrated */
+ unsigned long nr_freepages; /* Number of isolated free pages */
+ unsigned long nr_migratepages; /* Number of pages to migrate */
+ unsigned long free_pfn; /* isolate_freepages search base */
+ unsigned long migrate_pfn; /* isolate_migratepages search base */
+
+ /* Account for isolated anon and file pages */
+ unsigned long nr_anon;
+ unsigned long nr_file;
+
+ unsigned int order; /* order a direct compactor needs */
+ int migratetype; /* MOVABLE, RECLAIMABLE etc */
+ struct zone *zone;
+};
+
+static unsigned long release_freepages(struct list_head *freelist)
+{
+ struct page *page, *next;
+ unsigned long count = 0;
+
+ list_for_each_entry_safe(page, next, freelist, lru) {
+ list_del(&page->lru);
+ __free_page(page);
+ count++;
+ }
+
+ return count;
+}
+
+/* Isolate free pages onto a private freelist. Must hold zone->lock */
+static unsigned long isolate_freepages_block(struct zone *zone,
+ unsigned long blockpfn,
+ struct list_head *freelist)
+{
+ unsigned long zone_end_pfn, end_pfn;
+ int total_isolated = 0;
+ struct page *cursor;
+
+ /* Get the last PFN we should scan for free pages at */
+ zone_end_pfn = zone->zone_start_pfn + zone->spanned_pages;
+ end_pfn = min(blockpfn + pageblock_nr_pages, zone_end_pfn);
+
+ /* Find the first usable PFN in the block to initialise the page cursor */
+ for (; blockpfn < end_pfn; blockpfn++) {
+ if (pfn_valid_within(blockpfn))
+ break;
+ }
+ cursor = pfn_to_page(blockpfn);
+
+ /* Isolate free pages. This assumes the block is valid */
+ for (; blockpfn < end_pfn; blockpfn++, cursor++) {
+ int isolated, i;
+ struct page *page = cursor;
+
+ if (!pfn_valid_within(blockpfn))
+ continue;
+
+ if (!PageBuddy(page))
+ continue;
+
+ /* Found a free page, break it into order-0 pages */
+ isolated = split_free_page(page);
+ total_isolated += isolated;
+ for (i = 0; i < isolated; i++) {
+ list_add(&page->lru, freelist);
+ page++;
+ }
+
+ /* If a page was split, advance to the end of it */
+ if (isolated) {
+ blockpfn += isolated - 1;
+ cursor += isolated - 1;
+ }
+ }
+
+ return total_isolated;
+}
+
+/* Returns true if the page is within a block suitable for migration to */
+static bool suitable_migration_target(struct page *page)
+{
+
+ int migratetype = get_pageblock_migratetype(page);
+
+ /* Don't interfere with memory hot-remove or the min_free_kbytes blocks */
+ if (migratetype == MIGRATE_ISOLATE || migratetype == MIGRATE_RESERVE)
+ return false;
+
+ /* If the page is a large free page, then allow migration */
+ if (PageBuddy(page) && page_order(page) >= pageblock_order)
+ return true;
+
+ /* If the block is MIGRATE_MOVABLE, allow migration */
+ if (migratetype == MIGRATE_MOVABLE)
+ return true;
+
+ /* Otherwise skip the block */
+ return false;
+}
+
+/*
+ * Based on information in the current compact_control, find blocks
+ * suitable for isolating free pages from and then isolate them.
+ */
+static void isolate_freepages(struct zone *zone,
+ struct compact_control *cc)
+{
+ struct page *page;
+ unsigned long high_pfn, low_pfn, pfn;
+ unsigned long flags;
+ int nr_freepages = cc->nr_freepages;
+ struct list_head *freelist = &cc->freepages;
+
+ pfn = cc->free_pfn;
+ low_pfn = cc->migrate_pfn + pageblock_nr_pages;
+ high_pfn = low_pfn;
+
+ /*
+ * Isolate free pages until enough are available to migrate the
+ * pages on cc->migratepages. We stop searching if the migrate
+ * and free page scanners meet or enough free pages are isolated.
+ */
+ spin_lock_irqsave(&zone->lock, flags);
+ for (; pfn > low_pfn && cc->nr_migratepages > nr_freepages;
+ pfn -= pageblock_nr_pages) {
+ unsigned long isolated;
+
+ if (!pfn_valid(pfn))
+ continue;
+
+ /*
+ * Check for overlapping nodes/zones. It's possible on some
+ * configurations to have a setup like
+ * node0 node1 node0
+ * i.e. it's possible that all pages within a zone's range of
+ * pages do not belong to a single zone.
+ */
+ page = pfn_to_page(pfn);
+ if (page_zone(page) != zone)
+ continue;
+
+ /* Check the block is suitable for migration */
+ if (!suitable_migration_target(page))
+ continue;
+
+ /* Found a block suitable for isolating free pages from */
+ isolated = isolate_freepages_block(zone, pfn, freelist);
+ nr_freepages += isolated;
+
+ /*
+ * Record the highest PFN we isolated pages from. When next
+ * looking for free pages, the search will restart here as
+ * page migration may have returned some pages to the allocator
+ */
+ if (isolated)
+ high_pfn = max(high_pfn, pfn);
+ }
+ spin_unlock_irqrestore(&zone->lock, flags);
+
+ /* split_free_page does not map the pages */
+ list_for_each_entry(page, freelist, lru) {
+ arch_alloc_page(page, 0);
+ kernel_map_pages(page, 1, 1);
+ }
+
+ cc->free_pfn = high_pfn;
+ cc->nr_freepages = nr_freepages;
+}
+
+/* Update the number of anon and file isolated pages in the zone */
+static void acct_isolated(struct zone *zone, struct compact_control *cc)
+{
+ struct page *page;
+ unsigned int count[NR_LRU_LISTS] = { 0, };
+
+ list_for_each_entry(page, &cc->migratepages, lru) {
+ int lru = page_lru_base_type(page);
+ count[lru]++;
+ }
+
+ cc->nr_anon = count[LRU_ACTIVE_ANON] + count[LRU_INACTIVE_ANON];
+ cc->nr_file = count[LRU_ACTIVE_FILE] + count[LRU_INACTIVE_FILE];
+ __mod_zone_page_state(zone, NR_ISOLATED_ANON, cc->nr_anon);
+ __mod_zone_page_state(zone, NR_ISOLATED_FILE, cc->nr_file);
+}
+
+/* Similar to reclaim, but different enough that they don't share logic */
+static bool too_many_isolated(struct zone *zone)
+{
+
+ unsigned long inactive, isolated;
+
+ inactive = zone_page_state(zone, NR_INACTIVE_FILE) +
+ zone_page_state(zone, NR_INACTIVE_ANON);
+ isolated = zone_page_state(zone, NR_ISOLATED_FILE) +
+ zone_page_state(zone, NR_ISOLATED_ANON);
+
+ return isolated > inactive;
+}
+
+/*
+ * Isolate all pages that can be migrated from the block pointed to by
+ * the migrate scanner within compact_control.
+ */
+static unsigned long isolate_migratepages(struct zone *zone,
+ struct compact_control *cc)
+{
+ unsigned long low_pfn, end_pfn;
+ struct list_head *migratelist = &cc->migratepages;
+
+ /* Do not scan outside zone boundaries */
+ low_pfn = max(cc->migrate_pfn, zone->zone_start_pfn);
+
+ /* Only scan within a pageblock boundary */
+ end_pfn = ALIGN(low_pfn + pageblock_nr_pages, pageblock_nr_pages);
+
+ /* Do not cross the free scanner or scan within a memory hole */
+ if (end_pfn > cc->free_pfn || !pfn_valid(low_pfn)) {
+ cc->migrate_pfn = end_pfn;
+ return 0;
+ }
+
+ /*
+ * Ensure that there are not too many pages isolated from the LRU
+ * list by either parallel reclaimers or compaction. If there are,
+ * delay for some time until fewer pages are isolated
+ */
+ while (unlikely(too_many_isolated(zone))) {
+ congestion_wait(BLK_RW_ASYNC, HZ/10);
+
+ if (fatal_signal_pending(current))
+ return 0;
+ }
+
+ /* Time to isolate some pages for migration */
+ spin_lock_irq(&zone->lru_lock);
+ for (; low_pfn < end_pfn; low_pfn++) {
+ struct page *page;
+ if (!pfn_valid_within(low_pfn))
+ continue;
+
+ /* Get the page and skip if free */
+ page = pfn_to_page(low_pfn);
+ if (PageBuddy(page))
+ continue;
+
+ /* Try to isolate the page */
+ if (__isolate_lru_page(page, ISOLATE_BOTH, 0) != 0)
+ continue;
+
+ /* Successfully isolated */
+ del_page_from_lru_list(zone, page, page_lru(page));
+ list_add(&page->lru, migratelist);
+ mem_cgroup_del_lru(page);
+ cc->nr_migratepages++;
+
+ /* Avoid isolating too much */
+ if (cc->nr_migratepages == COMPACT_CLUSTER_MAX)
+ break;
+ }
+
+ acct_isolated(zone, cc);
+
+ spin_unlock_irq(&zone->lru_lock);
+ cc->migrate_pfn = low_pfn;
+
+ return cc->nr_migratepages;
+}
+
+/*
+ * This is a migrate-callback that "allocates" freepages by taking pages
+ * from the isolated freelists in the block we are migrating to.
+ */
+static struct page *compaction_alloc(struct page *migratepage,
+ unsigned long data,
+ int **result)
+{
+ struct compact_control *cc = (struct compact_control *)data;
+ struct page *freepage;
+
+ /* Isolate free pages if necessary */
+ if (list_empty(&cc->freepages)) {
+ isolate_freepages(cc->zone, cc);
+
+ if (list_empty(&cc->freepages))
+ return NULL;
+ }
+
+ freepage = list_entry(cc->freepages.next, struct page, lru);
+ list_del(&freepage->lru);
+ cc->nr_freepages--;
+
+ return freepage;
+}
+
+/*
+ * We cannot control nr_migratepages and nr_freepages fully when migration is
+ * running as migrate_pages() has no knowledge of compact_control. When
+ * migration is complete, we count the number of pages on the lists by hand.
+ */
+static void update_nr_listpages(struct compact_control *cc)
+{
+ int nr_migratepages = 0;
+ int nr_freepages = 0;
+ struct page *page;
+
+ list_for_each_entry(page, &cc->migratepages, lru)
+ nr_migratepages++;
+ list_for_each_entry(page, &cc->freepages, lru)
+ nr_freepages++;
+
+ cc->nr_migratepages = nr_migratepages;
+ cc->nr_freepages = nr_freepages;
+}
+
+static int compact_finished(struct zone *zone,
+ struct compact_control *cc)
+{
+ unsigned int order;
+ unsigned long watermark = low_wmark_pages(zone) + (1 << cc->order);
+
+ if (fatal_signal_pending(current))
+ return COMPACT_PARTIAL;
+
+ /* Compaction run completes if the migrate and free scanner meet */
+ if (cc->free_pfn <= cc->migrate_pfn)
+ return COMPACT_COMPLETE;
+
+ /* Compaction run is not finished if the watermark is not met */
+ if (!zone_watermark_ok(zone, cc->order, watermark, 0, 0))
+ return COMPACT_CONTINUE;
+
+ if (cc->order == -1)
+ return COMPACT_CONTINUE;
+
+ /* Direct compactor: Is a suitable page free? */
+ for (order = cc->order; order < MAX_ORDER; order++) {
+ /* Job done if page is free of the right migratetype */
+ if (!list_empty(&zone->free_area[order].free_list[cc->migratetype]))
+ return COMPACT_PARTIAL;
+
+ /* Job done if allocation would set block type */
+ if (order >= pageblock_order && zone->free_area[order].nr_free)
+ return COMPACT_PARTIAL;
+ }
+
+ return COMPACT_CONTINUE;
+}
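Review note: the direct-compactor check at the end of compact_finished() can be read as "is any free block of the requested order or larger available?". The following is a minimal userspace sketch of that loop only. It assumes a flat per-order counter array and ignores migratetypes and the pageblock_order special case; `TOY_MAX_ORDER`, `nr_free` and `toy_suitable_page_free` are hypothetical names, not kernel symbols.

```c
#include <stdbool.h>

#define TOY_MAX_ORDER 11	/* assumed value, for illustration only */

/*
 * nr_free[o] holds the number of free blocks of order o (hypothetical
 * layout). Returns true if a block of at least the requested order exists.
 */
static bool toy_suitable_page_free(const unsigned long nr_free[TOY_MAX_ORDER],
				   unsigned int order)
{
	unsigned int o;

	for (o = order; o < TOY_MAX_ORDER; o++)
		if (nr_free[o])
			return true;

	return false;
}
```

A larger free block satisfies a smaller request because the allocator can split it, which is why the loop walks upward from the requested order.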
+
+static int compact_zone(struct zone *zone, struct compact_control *cc)
+{
+ int ret;
+
+ /* Setup to move all movable pages to the end of the zone */
+ cc->migrate_pfn = zone->zone_start_pfn;
+ cc->free_pfn = cc->migrate_pfn + zone->spanned_pages;
+ cc->free_pfn &= ~(pageblock_nr_pages-1);
+
+ migrate_prep_local();
+
+ while ((ret = compact_finished(zone, cc)) == COMPACT_CONTINUE) {
+ unsigned long nr_migrate, nr_remaining;
+
+ if (!isolate_migratepages(zone, cc))
+ continue;
+
+ nr_migrate = cc->nr_migratepages;
+ migrate_pages(&cc->migratepages, compaction_alloc,
+ (unsigned long)cc, 0);
+ update_nr_listpages(cc);
+ nr_remaining = cc->nr_migratepages;
+
+ count_vm_event(COMPACTBLOCKS);
+ count_vm_events(COMPACTPAGES, nr_migrate - nr_remaining);
+ if (nr_remaining)
+ count_vm_events(COMPACTPAGEFAILED, nr_remaining);
+
+ /* Release LRU pages not migrated */
+ if (!list_empty(&cc->migratepages)) {
+ putback_lru_pages(&cc->migratepages);
+ cc->nr_migratepages = 0;
+ }
+
+ }
+
+ /* Release free pages and check accounting */
+ cc->nr_freepages -= release_freepages(&cc->freepages);
+ VM_BUG_ON(cc->nr_freepages != 0);
+
+ return ret;
+}
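Review note: the two-scanner scheme used by compact_zone() may be easier to follow in a toy model. The sketch below is a simplified userspace illustration, not the kernel algorithm: it treats the zone as an array of booleans, moves one page at a time, and has no pageblocks, locking or migration failures. `toy_compact`, `used`, `migrate_pos` and `free_pos` are hypothetical names.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * used[i] is true when slot i holds a movable page and false when it is
 * free. Returns the number of pages moved towards the end of the array.
 */
static size_t toy_compact(bool *used, size_t nr_slots)
{
	size_t migrate_pos = 0;		/* scans upward, like cc->migrate_pfn */
	size_t free_pos = nr_slots;	/* scans downward, like cc->free_pfn */
	size_t moved = 0;

	while (migrate_pos < free_pos) {
		/* Migrate scanner: skip slots that hold no page. */
		if (!used[migrate_pos]) {
			migrate_pos++;
			continue;
		}

		/* Free scanner: walk down to the highest free slot. */
		while (free_pos > migrate_pos + 1 && used[free_pos - 1])
			free_pos--;
		if (free_pos <= migrate_pos + 1)
			break;	/* the scanners met, so the run is complete */

		/* "Migrate" the page from the low slot to the high slot. */
		used[free_pos - 1] = true;
		used[migrate_pos] = false;
		free_pos--;
		migrate_pos++;
		moved++;
	}

	return moved;
}
```

After a run, the used slots sit together at the high end and the low end is one contiguous free range, which is the outcome the compact_control comment describes.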
+
+static unsigned long compact_zone_order(struct zone *zone,
+ int order, gfp_t gfp_mask)
+{
+ struct compact_control cc = {
+ .nr_freepages = 0,
+ .nr_migratepages = 0,
+ .order = order,
+ .migratetype = allocflags_to_migratetype(gfp_mask),
+ .zone = zone,
+ };
+ INIT_LIST_HEAD(&cc.freepages);
+ INIT_LIST_HEAD(&cc.migratepages);
+
+ return compact_zone(zone, &cc);
+}
+
+int sysctl_extfrag_threshold = 500;
+
+/**
+ * try_to_compact_pages - Direct compact to satisfy a high-order allocation
+ * @zonelist: The zonelist used for the current allocation
+ * @order: The order of the current allocation
+ * @gfp_mask: The GFP mask of the current allocation
+ * @nodemask: The allowed nodes to allocate from
+ *
+ * This is the main entry point for direct page compaction.
+ */
+unsigned long try_to_compact_pages(struct zonelist *zonelist,
+ int order, gfp_t gfp_mask, nodemask_t *nodemask)
+{
+ enum zone_type high_zoneidx = gfp_zone(gfp_mask);
+ int may_enter_fs = gfp_mask & __GFP_FS;
+ int may_perform_io = gfp_mask & __GFP_IO;
+ unsigned long watermark;
+ struct zoneref *z;
+ struct zone *zone;
+ int rc = COMPACT_SKIPPED;
+
+ /*
+ * Check whether it is worth even starting compaction. The order check is
+ * made because an assumption is made that the page allocator can satisfy
+ * the "cheaper" orders without taking special steps
+ */
+ if (order <= PAGE_ALLOC_COSTLY_ORDER || !may_enter_fs || !may_perform_io)
+ return rc;
+
+ count_vm_event(COMPACTSTALL);
+
+ /* Compact each zone in the list */
+ for_each_zone_zonelist_nodemask(zone, z, zonelist, high_zoneidx,
+ nodemask) {
+ int fragindex;
+ int status;
+
+ /*
+ * Watermarks for order-0 must be met for compaction. Note
+ * the 2UL. This is because during migration, copies of
+ * pages need to be allocated and for a short time, the
+ * footprint is higher
+ */
+ watermark = low_wmark_pages(zone) + (2UL << order);
+ if (!zone_watermark_ok(zone, 0, watermark, 0, 0))
+ continue;
+
+ /*
+ * fragmentation index determines if allocation failures are
+ * due to low memory or external fragmentation
+ *
+ * index of -1 implies allocations might succeed depending
+ * on watermarks
+ * index towards 0 implies failure is due to lack of memory
+ * index towards 1000 implies failure is due to fragmentation
+ *
+ * Only compact if a failure would be due to fragmentation.
+ */
+ fragindex = fragmentation_index(zone, order);
+ if (fragindex >= 0 && fragindex <= sysctl_extfrag_threshold)
+ continue;
+
+ if (fragindex == -1 && zone_watermark_ok(zone, order, watermark, 0, 0)) {
+ rc = COMPACT_PARTIAL;
+ break;
+ }
+
+ status = compact_zone_order(zone, order, gfp_mask);
+ rc = max(status, rc);
+
+ if (zone_watermark_ok(zone, order, watermark, 0, 0))
+ break;
+ }
+
+ return rc;
+}
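Review note: the two per-zone gates in try_to_compact_pages() can be sketched in isolation. This is a minimal illustration under stated assumptions: the threshold is hard-coded to the patch's default of 500, and the low watermark is passed in as a plain number. `TOY_EXTFRAG_THRESHOLD`, `toy_compaction_watermark` and `toy_skipped_by_threshold` are hypothetical names.

```c
#include <stdbool.h>

/* Stand-in for sysctl_extfrag_threshold, which defaults to 500 in the patch. */
#define TOY_EXTFRAG_THRESHOLD 500

/*
 * Order-0 watermark a zone must meet before compaction starts. The factor
 * of two leaves room for the page copies allocated during migration.
 */
static unsigned long toy_compaction_watermark(unsigned long low_wmark,
					      unsigned int order)
{
	return low_wmark + (2UL << order);
}

/*
 * Returns true when the zone is skipped because the index points at a
 * lack of memory rather than fragmentation. An index of -1 is not
 * skipped here; the caller handles that case separately.
 */
static bool toy_skipped_by_threshold(int fragindex)
{
	return fragindex >= 0 && fragindex <= TOY_EXTFRAG_THRESHOLD;
}
```

For an order-4 request and a low watermark of 100 pages, the zone needs 132 free pages before compaction is attempted.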
+
+
+/* Compact all zones within a node */
+static int compact_node(int nid)
+{
+ int zoneid;
+ pg_data_t *pgdat;
+ struct zone *zone;
+
+ if (nid < 0 || nid >= nr_node_ids || !node_online(nid))
+ return -EINVAL;
+ pgdat = NODE_DATA(nid);
+
+ /* Flush pending updates to the LRU lists */
+ lru_add_drain_all();
+
+ for (zoneid = 0; zoneid < MAX_NR_ZONES; zoneid++) {
+ struct compact_control cc = {
+ .nr_freepages = 0,
+ .nr_migratepages = 0,
+ .order = -1,
+ };
+
+ zone = &pgdat->node_zones[zoneid];
+ if (!populated_zone(zone))
+ continue;
+
+ cc.zone = zone;
+ INIT_LIST_HEAD(&cc.freepages);
+ INIT_LIST_HEAD(&cc.migratepages);
+
+ compact_zone(zone, &cc);
+
+ VM_BUG_ON(!list_empty(&cc.freepages));
+ VM_BUG_ON(!list_empty(&cc.migratepages));
+ }
+
+ return 0;
+}
+
+/* Compact all nodes in the system */
+static int compact_nodes(void)
+{
+ int nid;
+
+ for_each_online_node(nid)
+ compact_node(nid);
+
+ return COMPACT_COMPLETE;
+}
+
+/* The written value is actually unused, all memory is compacted */
+int sysctl_compact_memory;
+
+/* This is the entry point for compacting all nodes via /proc/sys/vm */
+int sysctl_compaction_handler(struct ctl_table *table, int write,
+ void __user *buffer, size_t *length, loff_t *ppos)
+{
+ if (write)
+ return compact_nodes();
+
+ return 0;
+}
+
+int sysctl_extfrag_handler(struct ctl_table *table, int write,
+ void __user *buffer, size_t *length, loff_t *ppos)
+{
+ proc_dointvec_minmax(table, write, buffer, length, ppos);
+
+ return 0;
+}
+
+#if defined(CONFIG_SYSFS) && defined(CONFIG_NUMA)
+ssize_t sysfs_compact_node(struct sys_device *dev,
+ struct sysdev_attribute *attr,
+ const char *buf, size_t count)
+{
+ compact_node(dev->id);
+
+ return count;
+}
+static SYSDEV_ATTR(compact, S_IWUSR, NULL, sysfs_compact_node);
+
+int compaction_register_node(struct node *node)
+{
+ return sysdev_create_file(&node->sysdev, &attr_compact);
+}
+
+void compaction_unregister_node(struct node *node)
+{
+ sysdev_remove_file(&node->sysdev, &attr_compact);
+}
+#endif /* CONFIG_SYSFS && CONFIG_NUMA */
/*
* Splice_read and readahead add shmem/tmpfs pages into the page cache
* before shmem_readpage has a chance to mark them as SwapBacked: they
- * need to go on the active_anon lru below, and mem_cgroup_cache_charge
+ * need to go on the anon lru below, and mem_cgroup_cache_charge
* (called in add_to_page_cache) needs to know where they're going too.
*/
if (mapping_cap_swap_backed(mapping))
if (page_is_file_cache(page))
lru_cache_add_file(page);
else
- lru_cache_add_active_anon(page);
+ lru_cache_add_anon(page);
}
return ret;
}
#ifdef CONFIG_NUMA
struct page *__page_cache_alloc(gfp_t gfp)
{
+ int n;
+ struct page *page;
+
if (cpuset_do_page_mem_spread()) {
- int n = cpuset_mem_spread_node();
- return alloc_pages_exact_node(n, gfp, 0);
+ get_mems_allowed();
+ n = cpuset_mem_spread_node();
+ page = alloc_pages_exact_node(n, gfp, 0);
+ put_mems_allowed();
+ return page;
}
return alloc_pages(gfp, 0);
}
#endif /* defined(CONFIG_HIGHMEM) && !defined(WANT_PAGE_VIRTUAL) */
-#if defined(CONFIG_DEBUG_HIGHMEM) && defined(CONFIG_TRACE_IRQFLAGS_SUPPORT)
+#ifdef CONFIG_DEBUG_HIGHMEM
void debug_kmap_atomic(enum km_type type)
{
struct page *page = NULL;
struct mempolicy *mpol;
nodemask_t *nodemask;
- struct zonelist *zonelist = huge_zonelist(vma, address,
- htlb_alloc_mask, &mpol, &nodemask);
+ struct zonelist *zonelist;
struct zone *zone;
struct zoneref *z;
+ get_mems_allowed();
+ zonelist = huge_zonelist(vma, address,
+ htlb_alloc_mask, &mpol, &nodemask);
/*
* A child process with MAP_PRIVATE mappings created by their parent
* have no page reserves. This check ensures that reservations are
*/
if (!vma_has_reserves(vma) &&
h->free_huge_pages - h->resv_huge_pages == 0)
- return NULL;
+ goto err;
/* If reserves cannot be used, ensure enough pages are in the pool */
if (avoid_reserve && h->free_huge_pages - h->resv_huge_pages == 0)
- return NULL;
+ goto err;
for_each_zone_zonelist_nodemask(zone, z, zonelist,
MAX_NR_ZONES - 1, nodemask) {
break;
}
}
+err:
mpol_cond_put(mpol);
+ put_mems_allowed();
return page;
}
struct anon_vma *anon_vma)
{
rmap_item->anon_vma = anon_vma;
- atomic_inc(&anon_vma->ksm_refcount);
+ atomic_inc(&anon_vma->external_refcount);
}
static void drop_anon_vma(struct rmap_item *rmap_item)
{
struct anon_vma *anon_vma = rmap_item->anon_vma;
- if (atomic_dec_and_lock(&anon_vma->ksm_refcount, &anon_vma->lock)) {
+ if (atomic_dec_and_lock(&anon_vma->external_refcount, &anon_vma->lock)) {
int empty = list_empty(&anon_vma->head);
spin_unlock(&anon_vma->lock);
if (empty)
}
EXPORT_SYMBOL_GPL(zap_vma_ptes);
-/*
- * Do a quick page-table lookup for a single page.
+/**
+ * follow_page - look up a page descriptor from a user-virtual address
+ * @vma: vm_area_struct mapping @address
+ * @address: virtual address to look up
+ * @flags: flags modifying lookup behaviour
+ *
+ * @flags can have FOLL_ flags set, defined in <linux/mm.h>
+ *
+ * Returns the mapped (struct page *), %NULL if no mapping exists, or
+ * an error pointer if there is a mapping to something not represented
+ * by a page descriptor (see also vm_normal_page()).
*/
struct page *follow_page(struct vm_area_struct *vma, unsigned long address,
unsigned int flags)
* This means the page allocator ignores this zone.
* So, zonelist must be updated after online.
*/
+ mutex_lock(&zonelists_mutex);
if (!populated_zone(zone))
need_zonelists_rebuild = 1;
ret = walk_system_ram_range(pfn, nr_pages, &onlined_pages,
online_pages_range);
if (ret) {
+ mutex_unlock(&zonelists_mutex);
printk(KERN_DEBUG "online_pages %lx at %lx failed\n",
nr_pages, pfn);
memory_notify(MEM_CANCEL_ONLINE, &arg);
zone->present_pages += onlined_pages;
zone->zone_pgdat->node_present_pages += onlined_pages;
+ if (need_zonelists_rebuild)
+ build_all_zonelists(zone);
+ else
+ zone_pcp_update(zone);
- zone_pcp_update(zone);
+ mutex_unlock(&zonelists_mutex);
setup_per_zone_wmarks();
calculate_zone_inactive_ratio(zone);
if (onlined_pages) {
node_set_state(zone_to_nid(zone), N_HIGH_MEMORY);
}
- if (need_zonelists_rebuild)
- build_all_zonelists();
- else
- vm_total_pages = nr_free_pagecache_pages();
+ vm_total_pages = nr_free_pagecache_pages();
writeback_set_ratelimit();
}
+/*
+ * called by cpu_up() to online a node without onlined memory.
+ */
+int mem_online_node(int nid)
+{
+ pg_data_t *pgdat;
+ int ret;
+
+ lock_system_sleep();
+ pgdat = hotadd_new_pgdat(nid, 0);
+ if (!pgdat) {
+ ret = -ENOMEM;
+ goto out;
+ }
+ node_set_online(nid);
+ ret = register_one_node(nid);
+ BUG_ON(ret);
+
+out:
+ unlock_system_sleep();
+ return ret;
+}
+
/* we are OK calling __meminit stuff here - we have CONFIG_MEMORY_HOTPLUG */
int __ref add_memory(int nid, u64 start, u64 size)
{
static const struct mempolicy_operations {
int (*create)(struct mempolicy *pol, const nodemask_t *nodes);
- void (*rebind)(struct mempolicy *pol, const nodemask_t *nodes);
+ /*
+ * If the read-side task has no lock to protect task->mempolicy, the
+ * write-side task rebinds task->mempolicy in two steps. The first step
+ * sets all the newly allowed nodes, and the second step clears all the
+ * disallowed nodes. This way, an allocation never finds an empty set
+ * of nodes.
+ * If the read side holds a lock that protects task->mempolicy, we do
+ * the rebind directly.
+ *
+ * step:
+ * MPOL_REBIND_ONCE - do the rebind work at once
+ * MPOL_REBIND_STEP1 - set all the newly allowed nodes
+ * MPOL_REBIND_STEP2 - clear all the disallowed nodes
+ */
+ void (*rebind)(struct mempolicy *pol, const nodemask_t *nodes,
+ enum mpol_rebind_step step);
} mpol_ops[MPOL_MAX];
/* Check that the nodemask contains at least one populated zone */
{
int nd, k;
- /* Check that there is something useful in this mask */
- k = policy_zone;
-
for_each_node_mask(nd, *nodemask) {
struct zone *z;
static inline int mpol_store_user_nodemask(const struct mempolicy *pol)
{
- return pol->flags & (MPOL_F_STATIC_NODES | MPOL_F_RELATIVE_NODES);
+ return pol->flags & MPOL_MODE_FLAGS;
}
static void mpol_relative_nodemask(nodemask_t *ret, const nodemask_t *orig,
kmem_cache_free(policy_cache, p);
}
-static void mpol_rebind_default(struct mempolicy *pol, const nodemask_t *nodes)
+static void mpol_rebind_default(struct mempolicy *pol, const nodemask_t *nodes,
+ enum mpol_rebind_step step)
{
}
-static void mpol_rebind_nodemask(struct mempolicy *pol,
- const nodemask_t *nodes)
+/*
+ * step:
+ * MPOL_REBIND_ONCE - do the rebind work at once
+ * MPOL_REBIND_STEP1 - set all the newly allowed nodes
+ * MPOL_REBIND_STEP2 - clear all the disallowed nodes
+ */
+static void mpol_rebind_nodemask(struct mempolicy *pol, const nodemask_t *nodes,
+ enum mpol_rebind_step step)
{
nodemask_t tmp;
else if (pol->flags & MPOL_F_RELATIVE_NODES)
mpol_relative_nodemask(&tmp, &pol->w.user_nodemask, nodes);
else {
- nodes_remap(tmp, pol->v.nodes, pol->w.cpuset_mems_allowed,
- *nodes);
- pol->w.cpuset_mems_allowed = *nodes;
+ /*
+ * If step == MPOL_REBIND_STEP1, use ->w.cpuset_mems_allowed to
+ * cache the remap result
+ */
+ if (step == MPOL_REBIND_ONCE || step == MPOL_REBIND_STEP1) {
+ nodes_remap(tmp, pol->v.nodes,
+ pol->w.cpuset_mems_allowed, *nodes);
+ pol->w.cpuset_mems_allowed = step ? tmp : *nodes;
+ } else if (step == MPOL_REBIND_STEP2) {
+ tmp = pol->w.cpuset_mems_allowed;
+ pol->w.cpuset_mems_allowed = *nodes;
+ } else
+ BUG();
}
- pol->v.nodes = tmp;
+ if (nodes_empty(tmp))
+ tmp = *nodes;
+
+ if (step == MPOL_REBIND_STEP1)
+ nodes_or(pol->v.nodes, pol->v.nodes, tmp);
+ else if (step == MPOL_REBIND_ONCE || step == MPOL_REBIND_STEP2)
+ pol->v.nodes = tmp;
+ else
+ BUG();
+
if (!node_isset(current->il_next, tmp)) {
current->il_next = next_node(current->il_next, tmp);
if (current->il_next >= MAX_NUMNODES)
}
static void mpol_rebind_preferred(struct mempolicy *pol,
- const nodemask_t *nodes)
+ const nodemask_t *nodes,
+ enum mpol_rebind_step step)
{
nodemask_t tmp;
}
}
-/* Migrate a policy to a different set of nodes */
-static void mpol_rebind_policy(struct mempolicy *pol,
- const nodemask_t *newmask)
+/*
+ * mpol_rebind_policy - Migrate a policy to a different set of nodes
+ *
+ * If the read-side task has no lock to protect task->mempolicy, the
+ * write-side task rebinds task->mempolicy in two steps. The first step
+ * sets all the newly allowed nodes, and the second step clears all the
+ * disallowed nodes. This way, an allocation never finds an empty set
+ * of nodes.
+ * If the read side holds a lock that protects task->mempolicy, we do
+ * the rebind directly.
+ *
+ * step:
+ * MPOL_REBIND_ONCE - do the rebind work at once
+ * MPOL_REBIND_STEP1 - set all the newly allowed nodes
+ * MPOL_REBIND_STEP2 - clear all the disallowed nodes
+ */
+static void mpol_rebind_policy(struct mempolicy *pol, const nodemask_t *newmask,
+ enum mpol_rebind_step step)
{
if (!pol)
return;
- if (!mpol_store_user_nodemask(pol) &&
+ if (!mpol_store_user_nodemask(pol) && step == MPOL_REBIND_ONCE &&
nodes_equal(pol->w.cpuset_mems_allowed, *newmask))
return;
- mpol_ops[pol->mode].rebind(pol, newmask);
+
+ if (step == MPOL_REBIND_STEP1 && (pol->flags & MPOL_F_REBINDING))
+ return;
+
+ if (step == MPOL_REBIND_STEP2 && !(pol->flags & MPOL_F_REBINDING))
+ BUG();
+
+ if (step == MPOL_REBIND_STEP1)
+ pol->flags |= MPOL_F_REBINDING;
+ else if (step == MPOL_REBIND_STEP2)
+ pol->flags &= ~MPOL_F_REBINDING;
+ else if (step >= MPOL_REBIND_NSTEP)
+ BUG();
+
+ mpol_ops[pol->mode].rebind(pol, newmask, step);
}
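Review note: the reason for splitting the rebind into two steps can be shown with plain bitmasks. The sketch below is a simplified illustration: it skips nodes_remap(), the MPOL_F_REBINDING flag and the interleave cursor, and it assumes the new mask is already known. `toy_nodemask`, `toy_rebind_step` and `toy_rebind` are hypothetical names.

```c
#include <stdint.h>

enum toy_rebind_step { TOY_REBIND_ONCE, TOY_REBIND_STEP1, TOY_REBIND_STEP2 };

/* Bit n set means node n is allowed (hypothetical representation). */
typedef uint64_t toy_nodemask;

static void toy_rebind(toy_nodemask *cur, toy_nodemask new_nodes,
		       enum toy_rebind_step step)
{
	if (step == TOY_REBIND_STEP1)
		*cur |= new_nodes;	/* add the new nodes, keep the old ones */
	else
		*cur = new_nodes;	/* ONCE or STEP2: drop disallowed nodes */
}
```

Between STEP1 and STEP2 the mask is the union of the old and new sets, so a lockless reader sees at least one usable node at every point of the transition.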
/*
* Called with task's alloc_lock held.
*/
-void mpol_rebind_task(struct task_struct *tsk, const nodemask_t *new)
+void mpol_rebind_task(struct task_struct *tsk, const nodemask_t *new,
+ enum mpol_rebind_step step)
{
- mpol_rebind_policy(tsk->mempolicy, new);
+ mpol_rebind_policy(tsk->mempolicy, new, step);
}
/*
down_write(&mm->mmap_sem);
for (vma = mm->mmap; vma; vma = vma->vm_next)
- mpol_rebind_policy(vma->vm_policy, new);
+ mpol_rebind_policy(vma->vm_policy, new, MPOL_REBIND_ONCE);
up_write(&mm->mmap_sem);
}
nodes_clear(nmask);
node_set(source, nmask);
- check_range(mm, mm->mmap->vm_start, TASK_SIZE, &nmask,
+ check_range(mm, mm->mmap->vm_start, mm->task_size, &nmask,
flags | MPOL_MF_DISCONTIG_OK, &pagelist);
if (!list_empty(&pagelist))
/*
* Normally, MPOL_BIND allocations are node-local within the
* allowed nodemask. However, if __GFP_THISNODE is set and the
- * current node is part of the mask, we use the zonelist for
+ * current node isn't part of the mask, we use the zonelist for
* the first node in the mask instead.
*/
if (unlikely(gfp & __GFP_THISNODE) &&
unlikely(!node_isset(nd, policy->v.nodes)))
nd = first_node(policy->v.nodes);
break;
- case MPOL_INTERLEAVE: /* should not happen */
- break;
default:
BUG();
}
* to the struct mempolicy for conditional unref after allocation.
* If the effective policy is 'BIND, returns a pointer to the mempolicy's
* @nodemask for filtering the zonelist.
+ *
+ * Must be protected by get_mems_allowed()
*/
struct zonelist *huge_zonelist(struct vm_area_struct *vma, unsigned long addr,
gfp_t gfp_flags, struct mempolicy **mpol,
if (!(mask && current->mempolicy))
return false;
+ task_lock(current);
mempolicy = current->mempolicy;
switch (mempolicy->mode) {
case MPOL_PREFERRED:
default:
BUG();
}
+ task_unlock(current);
return true;
}
{
struct mempolicy *pol = get_vma_policy(current, vma, addr);
struct zonelist *zl;
+ struct page *page;
+ get_mems_allowed();
if (unlikely(pol->mode == MPOL_INTERLEAVE)) {
unsigned nid;
nid = interleave_nid(pol, vma, addr, PAGE_SHIFT);
mpol_cond_put(pol);
- return alloc_page_interleave(gfp, 0, nid);
+ page = alloc_page_interleave(gfp, 0, nid);
+ put_mems_allowed();
+ return page;
}
zl = policy_zonelist(gfp, pol);
if (unlikely(mpol_needs_cond_ref(pol))) {
struct page *page = __alloc_pages_nodemask(gfp, 0,
zl, policy_nodemask(gfp, pol));
__mpol_put(pol);
+ put_mems_allowed();
return page;
}
/*
* fast path: default or task policy
*/
- return __alloc_pages_nodemask(gfp, 0, zl, policy_nodemask(gfp, pol));
+ page = __alloc_pages_nodemask(gfp, 0, zl, policy_nodemask(gfp, pol));
+ put_mems_allowed();
+ return page;
}
/**
struct page *alloc_pages_current(gfp_t gfp, unsigned order)
{
struct mempolicy *pol = current->mempolicy;
+ struct page *page;
if (!pol || in_interrupt() || (gfp & __GFP_THISNODE))
pol = &default_policy;
+ get_mems_allowed();
/*
* No reference counting needed for current->mempolicy
* nor system default_policy
*/
if (pol->mode == MPOL_INTERLEAVE)
- return alloc_page_interleave(gfp, order, interleave_nodes(pol));
- return __alloc_pages_nodemask(gfp, order,
+ page = alloc_page_interleave(gfp, order, interleave_nodes(pol));
+ else
+ page = __alloc_pages_nodemask(gfp, order,
policy_zonelist(gfp, pol), policy_nodemask(gfp, pol));
+ put_mems_allowed();
+ return page;
}
EXPORT_SYMBOL(alloc_pages_current);
* with the mems_allowed returned by cpuset_mems_allowed(). This
* keeps mempolicies cpuset relative after its cpuset moves. See
* further kernel/cpuset.c update_nodemask().
+ *
+ * current's mempolicy may be rebound by another task (the task that changes
+ * the cpuset's mems), so we needn't do rebind work for the current task.
*/
/* Slow path of a mempolicy duplicate */
if (!new)
return ERR_PTR(-ENOMEM);
+
+ /* task's mempolicy is protected by alloc_lock */
+ if (old == current->mempolicy) {
+ task_lock(current);
+ *new = *old;
+ task_unlock(current);
+ } else
+ *new = *old;
+
rcu_read_lock();
if (current_cpuset_is_being_rebound()) {
nodemask_t mems = cpuset_mems_allowed(current);
- mpol_rebind_policy(old, &mems);
+ if (new->flags & MPOL_F_REBINDING)
+ mpol_rebind_policy(new, &mems, MPOL_REBIND_STEP2);
+ else
+ mpol_rebind_policy(new, &mems, MPOL_REBIND_ONCE);
}
rcu_read_unlock();
- *new = *old;
atomic_set(&new->refcnt, 1);
return new;
}
return tompol;
}
-static int mpol_match_intent(const struct mempolicy *a,
- const struct mempolicy *b)
-{
- if (a->flags != b->flags)
- return 0;
- if (!mpol_store_user_nodemask(a))
- return 1;
- return nodes_equal(a->w.user_nodemask, b->w.user_nodemask);
-}
-
/* Slow path of a mempolicy comparison */
int __mpol_equal(struct mempolicy *a, struct mempolicy *b)
{
return 0;
if (a->mode != b->mode)
return 0;
- if (a->mode != MPOL_DEFAULT && !mpol_match_intent(a, b))
+ if (a->flags != b->flags)
return 0;
+ if (mpol_store_user_nodemask(a))
+ if (!nodes_equal(a->w.user_nodemask, b->w.user_nodemask))
+ return 0;
+
switch (a->mode) {
case MPOL_BIND:
/* Fall through */
return;
/* contextualize the tmpfs mount point mempolicy */
new = mpol_new(mpol->mode, mpol->flags, &mpol->w.user_nodemask);
- if (IS_ERR(new)) {
- mpol_put(mpol); /* drop our ref on sb mpol */
- NODEMASK_SCRATCH_FREE(scratch);
- return; /* no valid nodemask intersection */
- }
+ if (IS_ERR(new))
+ goto put_free; /* no valid nodemask intersection */
task_lock(current);
ret = mpol_set_nodemask(new, &mpol->w.user_nodemask, scratch);
task_unlock(current);
mpol_put(mpol); /* drop our ref on sb mpol */
- if (ret) {
- NODEMASK_SCRATCH_FREE(scratch);
- mpol_put(new);
- return;
- }
+ if (ret)
+ goto put_free;
/* Create pseudo-vma that contains just the policy */
memset(&pvma, 0, sizeof(struct vm_area_struct));
pvma.vm_end = TASK_SIZE; /* policy covers entire file */
mpol_set_shared_policy(sp, &pvma, new); /* adds ref */
+
+put_free:
mpol_put(new); /* drop initial ref */
NODEMASK_SCRATCH_FREE(scratch);
}
* "local" is pseudo-policy: MPOL_PREFERRED with MPOL_F_LOCAL flag
* Used only for mpol_parse_str() and mpol_to_str()
*/
-#define MPOL_LOCAL (MPOL_INTERLEAVE + 1)
-static const char * const policy_types[] =
- { "default", "prefer", "bind", "interleave", "local" };
+#define MPOL_LOCAL MPOL_MAX
+static const char * const policy_modes[] =
+{
+ [MPOL_DEFAULT] = "default",
+ [MPOL_PREFERRED] = "prefer",
+ [MPOL_BIND] = "bind",
+ [MPOL_INTERLEAVE] = "interleave",
+ [MPOL_LOCAL] = "local"
+};
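Review note: indexing the name table by mode with designated initializers keeps each string tied to its enum value, so the parser can use the loop index as the mode directly. A minimal userspace sketch of that lookup follows. The `toy_` enum, table and function are hypothetical stand-ins for the MPOL_* values, policy_modes[] and the loop in mpol_parse_str().

```c
#include <string.h>

enum toy_mode { TOY_DEFAULT, TOY_PREFERRED, TOY_BIND, TOY_INTERLEAVE, TOY_MAX };

/* Pseudo-mode placed one past the last real mode, as in the patch. */
#define TOY_LOCAL TOY_MAX

static const char * const toy_modes[] = {
	[TOY_DEFAULT]    = "default",
	[TOY_PREFERRED]  = "prefer",
	[TOY_BIND]       = "bind",
	[TOY_INTERLEAVE] = "interleave",
	[TOY_LOCAL]      = "local",
};

/* Returns the mode whose name matches str, or -1 if none does. */
static int toy_parse_mode(const char *str)
{
	int mode;

	for (mode = 0; mode <= TOY_LOCAL; mode++)
		if (!strcmp(str, toy_modes[mode]))
			return mode;

	return -1;
}
```

With this layout, reordering the enum cannot silently mismatch names and modes, which the old positional initializer did not guarantee.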
#ifdef CONFIG_TMPFS
int mpol_parse_str(char *str, struct mempolicy **mpol, int no_context)
{
struct mempolicy *new = NULL;
- unsigned short uninitialized_var(mode);
+ unsigned short mode;
unsigned short uninitialized_var(mode_flags);
nodemask_t nodes;
char *nodelist = strchr(str, ':');
char *flags = strchr(str, '=');
- int i;
int err = 1;
if (nodelist) {
if (flags)
*flags++ = '\0'; /* terminate mode string */
- for (i = 0; i <= MPOL_LOCAL; i++) {
- if (!strcmp(str, policy_types[i])) {
- mode = i;
+ for (mode = 0; mode <= MPOL_LOCAL; mode++) {
+ if (!strcmp(str, policy_modes[mode])) {
break;
}
}
- if (i > MPOL_LOCAL)
+ if (mode > MPOL_LOCAL)
goto out;
switch (mode) {
if (IS_ERR(new))
goto out;
- {
+ if (no_context) {
+ /* save for contextualization */
+ new->w.user_nodemask = nodes;
+ } else {
int ret;
NODEMASK_SCRATCH(scratch);
if (scratch) {
}
}
err = 0;
- if (no_context) {
- /* save for contextualization */
- new->w.user_nodemask = nodes;
- }
out:
/* Restore string for error message */
BUG();
}
- l = strlen(policy_types[mode]);
+ l = strlen(policy_modes[mode]);
if (buffer + maxlen < p + l + 1)
return -ENOSPC;
- strcpy(p, policy_types[mode]);
+ strcpy(p, policy_modes[mode]);
p += l;
if (flags & MPOL_MODE_FLAGS) {
/*
* migrate_prep() needs to be called before we start compiling a list of pages
- * to be migrated using isolate_lru_page().
+ * to be migrated using isolate_lru_page(). If scheduling work on other CPUs is
+ * undesirable, use migrate_prep_local()
*/
int migrate_prep(void)
{
return 0;
}
+/* Do the necessary work of migrate_prep but not if it involves other CPUs */
+int migrate_prep_local(void)
+{
+ lru_add_drain();
+
+ return 0;
+}
+
/*
* Add isolated pages on the list back to the LRU under page lock
* to avoid leaking evictable pages back onto unevictable list.
- *
- * returns the number of pages put back.
*/
-int putback_lru_pages(struct list_head *l)
+void putback_lru_pages(struct list_head *l)
{
struct page *page;
struct page *page2;
- int count = 0;
list_for_each_entry_safe(page, page2, l, lru) {
list_del(&page->lru);
dec_zone_page_state(page, NR_ISOLATED_ANON +
page_is_file_cache(page));
putback_lru_page(page);
- count++;
}
- return count;
}
/*
* < 0 - error code
* == 0 - success
*/
-static int move_to_new_page(struct page *newpage, struct page *page)
+static int move_to_new_page(struct page *newpage, struct page *page,
+ int remap_swapcache)
{
struct address_space *mapping;
int rc;
else
rc = fallback_migrate_page(mapping, newpage, page);
- if (!rc)
- remove_migration_ptes(page, newpage);
- else
+ if (rc) {
newpage->mapping = NULL;
+ } else {
+ if (remap_swapcache)
+ remove_migration_ptes(page, newpage);
+ }
unlock_page(newpage);
int rc = 0;
int *result = NULL;
struct page *newpage = get_new_page(page, private, &result);
+ int remap_swapcache = 1;
int rcu_locked = 0;
int charge = 0;
struct mem_cgroup *mem = NULL;
+ struct anon_vma *anon_vma = NULL;
if (!newpage)
return -ENOMEM;
if (PageAnon(page)) {
rcu_read_lock();
rcu_locked = 1;
+
+ /* Determine how to safely use anon_vma */
+ if (!page_mapped(page)) {
+ if (!PageSwapCache(page))
+ goto rcu_unlock;
+
+ /*
+ * We cannot be sure that the anon_vma of an unmapped
+ * swapcache page is safe to use because we don't
+ * know in advance if the VMA that this page belonged
+ * to still exists. If the VMA and others sharing the
+ * data have been freed, then the anon_vma could
+ * already be invalid.
+ *
+ * To avoid this possibility, swapcache pages get
+ * migrated but are not remapped when migration
+ * completes
+ */
+ remap_swapcache = 0;
+ } else {
+ /*
+ * Take a reference count on the anon_vma if the
+ * page is mapped so that it is guaranteed to
+ * exist when the page is remapped later
+ */
+ anon_vma = page_anon_vma(page);
+ atomic_inc(&anon_vma->external_refcount);
+ }
}
/*
skip_unmap:
if (!page_mapped(page))
- rc = move_to_new_page(newpage, page);
+ rc = move_to_new_page(newpage, page, remap_swapcache);
- if (rc)
+ if (rc && remap_swapcache)
remove_migration_ptes(page, page);
rcu_unlock:
+
+ /* Drop an anon_vma reference if we took one */
+ if (anon_vma && atomic_dec_and_lock(&anon_vma->external_refcount, &anon_vma->lock)) {
+ int empty = list_empty(&anon_vma->head);
+ spin_unlock(&anon_vma->lock);
+ if (empty)
+ anon_vma_free(anon_vma);
+ }
+
if (rcu_locked)
rcu_read_unlock();
uncharge:
#include <asm/uaccess.h>
#include <asm/pgtable.h>
+static void mincore_hugetlb_page_range(struct vm_area_struct *vma,
+ unsigned long addr, unsigned long end,
+ unsigned char *vec)
+{
+#ifdef CONFIG_HUGETLB_PAGE
+ struct hstate *h;
+
+ h = hstate_vma(vma);
+ while (1) {
+ unsigned char present;
+ pte_t *ptep;
+ /*
+ * Huge pages are always in RAM for now, but
+ * theoretically it needs to be checked.
+ */
+ ptep = huge_pte_offset(current->mm,
+ addr & huge_page_mask(h));
+ present = ptep && !huge_pte_none(huge_ptep_get(ptep));
+ while (1) {
+ *vec = present;
+ vec++;
+ addr += PAGE_SIZE;
+ if (addr == end)
+ return;
+ /* check hugepage border */
+ if (!(addr & ~huge_page_mask(h)))
+ break;
+ }
+ }
+#else
+ BUG();
+#endif
+}
+
/*
* Later we can get more picky about what "in core" means precisely.
* For now, simply check to see if the page is in the page cache,
return present;
}
-/*
- * Do a chunk of "sys_mincore()". We've already checked
- * all the arguments, we hold the mmap semaphore: we should
- * just return the amount of info we're asked for.
- */
-static long do_mincore(unsigned long addr, unsigned char *vec, unsigned long pages)
+static void mincore_unmapped_range(struct vm_area_struct *vma,
+ unsigned long addr, unsigned long end,
+ unsigned char *vec)
{
- pgd_t *pgd;
- pud_t *pud;
- pmd_t *pmd;
- pte_t *ptep;
- spinlock_t *ptl;
- unsigned long nr;
+ unsigned long nr = (end - addr) >> PAGE_SHIFT;
int i;
- pgoff_t pgoff;
- struct vm_area_struct *vma = find_vma(current->mm, addr);
- /*
- * find_vma() didn't find anything above us, or we're
- * in an unmapped hole in the address space: ENOMEM.
- */
- if (!vma || addr < vma->vm_start)
- return -ENOMEM;
-
-#ifdef CONFIG_HUGETLB_PAGE
- if (is_vm_hugetlb_page(vma)) {
- struct hstate *h;
- unsigned long nr_huge;
- unsigned char present;
+ if (vma->vm_file) {
+ pgoff_t pgoff;
- i = 0;
- nr = min(pages, (vma->vm_end - addr) >> PAGE_SHIFT);
- h = hstate_vma(vma);
- nr_huge = ((addr + pages * PAGE_SIZE - 1) >> huge_page_shift(h))
- - (addr >> huge_page_shift(h)) + 1;
- nr_huge = min(nr_huge,
- (vma->vm_end - addr) >> huge_page_shift(h));
- while (1) {
- /* hugepage always in RAM for now,
- * but generally it needs to be check */
- ptep = huge_pte_offset(current->mm,
- addr & huge_page_mask(h));
- present = !!(ptep &&
- !huge_pte_none(huge_ptep_get(ptep)));
- while (1) {
- vec[i++] = present;
- addr += PAGE_SIZE;
- /* reach buffer limit */
- if (i == nr)
- return nr;
- /* check hugepage border */
- if (!((addr & ~huge_page_mask(h))
- >> PAGE_SHIFT))
- break;
- }
- }
- return nr;
+ pgoff = linear_page_index(vma, addr);
+ for (i = 0; i < nr; i++, pgoff++)
+ vec[i] = mincore_page(vma->vm_file->f_mapping, pgoff);
+ } else {
+ for (i = 0; i < nr; i++)
+ vec[i] = 0;
}
-#endif
-
- /*
- * Calculate how many pages there are left in the last level of the
- * PTE array for our address.
- */
- nr = PTRS_PER_PTE - ((addr >> PAGE_SHIFT) & (PTRS_PER_PTE-1));
-
- /*
- * Don't overrun this vma
- */
- nr = min(nr, (vma->vm_end - addr) >> PAGE_SHIFT);
-
- /*
- * Don't return more than the caller asked for
- */
- nr = min(nr, pages);
+}
- pgd = pgd_offset(vma->vm_mm, addr);
- if (pgd_none_or_clear_bad(pgd))
- goto none_mapped;
- pud = pud_offset(pgd, addr);
- if (pud_none_or_clear_bad(pud))
- goto none_mapped;
- pmd = pmd_offset(pud, addr);
- if (pmd_none_or_clear_bad(pmd))
- goto none_mapped;
+static void mincore_pte_range(struct vm_area_struct *vma, pmd_t *pmd,
+ unsigned long addr, unsigned long end,
+ unsigned char *vec)
+{
+ unsigned long next;
+ spinlock_t *ptl;
+ pte_t *ptep;
ptep = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl);
- for (i = 0; i < nr; i++, ptep++, addr += PAGE_SIZE) {
- unsigned char present;
+ do {
pte_t pte = *ptep;
+ pgoff_t pgoff;
- if (pte_present(pte)) {
- present = 1;
-
- } else if (pte_none(pte)) {
- if (vma->vm_file) {
- pgoff = linear_page_index(vma, addr);
- present = mincore_page(vma->vm_file->f_mapping,
- pgoff);
- } else
- present = 0;
-
- } else if (pte_file(pte)) {
+ next = addr + PAGE_SIZE;
+ if (pte_none(pte))
+ mincore_unmapped_range(vma, addr, next, vec);
+ else if (pte_present(pte))
+ *vec = 1;
+ else if (pte_file(pte)) {
pgoff = pte_to_pgoff(pte);
- present = mincore_page(vma->vm_file->f_mapping, pgoff);
-
+ *vec = mincore_page(vma->vm_file->f_mapping, pgoff);
} else { /* pte is a swap entry */
swp_entry_t entry = pte_to_swp_entry(pte);
+
if (is_migration_entry(entry)) {
/* migration entries are always uptodate */
- present = 1;
+ *vec = 1;
} else {
#ifdef CONFIG_SWAP
pgoff = entry.val;
- present = mincore_page(&swapper_space, pgoff);
+ *vec = mincore_page(&swapper_space, pgoff);
#else
WARN_ON(1);
- present = 1;
+ *vec = 1;
#endif
}
}
+ vec++;
+ } while (ptep++, addr = next, addr != end);
+ pte_unmap_unlock(ptep - 1, ptl);
+}
- vec[i] = present;
- }
- pte_unmap_unlock(ptep-1, ptl);
+static void mincore_pmd_range(struct vm_area_struct *vma, pud_t *pud,
+ unsigned long addr, unsigned long end,
+ unsigned char *vec)
+{
+ unsigned long next;
+ pmd_t *pmd;
- return nr;
+ pmd = pmd_offset(pud, addr);
+ do {
+ next = pmd_addr_end(addr, end);
+ if (pmd_none_or_clear_bad(pmd))
+ mincore_unmapped_range(vma, addr, next, vec);
+ else
+ mincore_pte_range(vma, pmd, addr, next, vec);
+ vec += (next - addr) >> PAGE_SHIFT;
+ } while (pmd++, addr = next, addr != end);
+}
-none_mapped:
- if (vma->vm_file) {
- pgoff = linear_page_index(vma, addr);
- for (i = 0; i < nr; i++, pgoff++)
- vec[i] = mincore_page(vma->vm_file->f_mapping, pgoff);
- } else {
- for (i = 0; i < nr; i++)
- vec[i] = 0;
+static void mincore_pud_range(struct vm_area_struct *vma, pgd_t *pgd,
+ unsigned long addr, unsigned long end,
+ unsigned char *vec)
+{
+ unsigned long next;
+ pud_t *pud;
+
+ pud = pud_offset(pgd, addr);
+ do {
+ next = pud_addr_end(addr, end);
+ if (pud_none_or_clear_bad(pud))
+ mincore_unmapped_range(vma, addr, next, vec);
+ else
+ mincore_pmd_range(vma, pud, addr, next, vec);
+ vec += (next - addr) >> PAGE_SHIFT;
+ } while (pud++, addr = next, addr != end);
+}
+
+static void mincore_page_range(struct vm_area_struct *vma,
+ unsigned long addr, unsigned long end,
+ unsigned char *vec)
+{
+ unsigned long next;
+ pgd_t *pgd;
+
+ pgd = pgd_offset(vma->vm_mm, addr);
+ do {
+ next = pgd_addr_end(addr, end);
+ if (pgd_none_or_clear_bad(pgd))
+ mincore_unmapped_range(vma, addr, next, vec);
+ else
+ mincore_pud_range(vma, pgd, addr, next, vec);
+ vec += (next - addr) >> PAGE_SHIFT;
+ } while (pgd++, addr = next, addr != end);
+}
+
+/*
+ * Do a chunk of "sys_mincore()". We've already checked
+ * all the arguments, we hold the mmap semaphore: we should
+ * just return the amount of info we're asked for.
+ */
+static long do_mincore(unsigned long addr, unsigned long pages, unsigned char *vec)
+{
+ struct vm_area_struct *vma;
+ unsigned long end;
+
+ vma = find_vma(current->mm, addr);
+ if (!vma || addr < vma->vm_start)
+ return -ENOMEM;
+
+ end = min(vma->vm_end, addr + (pages << PAGE_SHIFT));
+
+ if (is_vm_hugetlb_page(vma)) {
+ mincore_hugetlb_page_range(vma, addr, end, vec);
+ return (end - addr) >> PAGE_SHIFT;
}
- return nr;
+ end = pmd_addr_end(addr, end);
+
+ if (is_vm_hugetlb_page(vma))
+ mincore_hugetlb_page_range(vma, addr, end, vec);
+ else
+ mincore_page_range(vma, addr, end, vec);
+
+ return (end - addr) >> PAGE_SHIFT;
}
/*
* the temporary buffer size.
*/
down_read(&current->mm->mmap_sem);
- retval = do_mincore(start, tmp, min(pages, PAGE_SIZE));
+ retval = do_mincore(start, min(pages, PAGE_SIZE), tmp);
up_read(&current->mm->mmap_sem);
if (retval <= 0)
#include <linux/debugobjects.h>
#include <linux/kmemleak.h>
#include <linux/memory.h>
+#include <linux/compaction.h>
#include <trace/events/kmem.h>
#include <linux/ftrace_event.h>
int migratetype)
{
unsigned long page_idx;
+ unsigned long combined_idx;
+ struct page *buddy;
if (unlikely(PageCompound(page)))
if (unlikely(destroy_compound_page(page, order)))
VM_BUG_ON(bad_range(zone, page));
while (order < MAX_ORDER-1) {
- unsigned long combined_idx;
- struct page *buddy;
-
buddy = __page_find_buddy(page, page_idx, order);
if (!page_is_buddy(page, buddy, order))
break;
order++;
}
set_page_order(page, order);
- list_add(&page->lru,
- &zone->free_area[order].free_list[migratetype]);
+
+ /*
+ * If this is not the largest possible page, check if the buddy
+ * of the next-highest order is free. If it is, it's possible
+ * that pages are being freed that will coalesce soon. In case
+ * that is happening, add the free page to the tail of the list
+ * so it's less likely to be used soon and more likely to be merged
+ * as a higher order page.
+ */
+ if ((order < MAX_ORDER-1) && pfn_valid_within(page_to_pfn(buddy))) {
+ struct page *higher_page, *higher_buddy;
+ combined_idx = __find_combined_index(page_idx, order);
+ higher_page = page + combined_idx - page_idx;
+ higher_buddy = __page_find_buddy(higher_page, combined_idx, order + 1);
+ if (page_is_buddy(higher_page, higher_buddy, order + 1)) {
+ list_add_tail(&page->lru,
+ &zone->free_area[order].free_list[migratetype]);
+ goto out;
+ }
+ }
+
+ list_add(&page->lru, &zone->free_area[order].free_list[migratetype]);
+out:
zone->free_area[order].nr_free++;
}
spin_unlock(&zone->lock);
}
-static void __free_pages_ok(struct page *page, unsigned int order)
+static bool free_pages_prepare(struct page *page, unsigned int order)
{
- unsigned long flags;
int i;
int bad = 0;
- int wasMlocked = __TestClearPageMlocked(page);
trace_mm_page_free_direct(page, order);
kmemcheck_free_shadow(page, order);
- for (i = 0 ; i < (1 << order) ; ++i)
- bad += free_pages_check(page + i);
+ for (i = 0; i < (1 << order); i++) {
+ struct page *pg = page + i;
+
+ if (PageAnon(pg))
+ pg->mapping = NULL;
+ bad += free_pages_check(pg);
+ }
if (bad)
- return;
+ return false;
if (!PageHighMem(page)) {
debug_check_no_locks_freed(page_address(page),PAGE_SIZE<<order);
arch_free_page(page, order);
kernel_map_pages(page, 1 << order, 0);
+ return true;
+}
+
+static void __free_pages_ok(struct page *page, unsigned int order)
+{
+ unsigned long flags;
+ int wasMlocked = __TestClearPageMlocked(page);
+
+ if (!free_pages_prepare(page, order))
+ return;
+
local_irq_save(flags);
if (unlikely(wasMlocked))
free_page_mlock(page);
int migratetype;
int wasMlocked = __TestClearPageMlocked(page);
- trace_mm_page_free_direct(page, 0);
- kmemcheck_free_shadow(page, 0);
-
- if (PageAnon(page))
- page->mapping = NULL;
- if (free_pages_check(page))
+ if (!free_pages_prepare(page, 0))
return;
- if (!PageHighMem(page)) {
- debug_check_no_locks_freed(page_address(page), PAGE_SIZE);
- debug_check_no_obj_freed(page_address(page), PAGE_SIZE);
- }
- arch_free_page(page, 0);
- kernel_map_pages(page, 1, 0);
-
migratetype = get_pageblock_migratetype(page);
set_page_private(page, migratetype);
local_irq_save(flags);
set_page_refcounted(page + i);
}
+/*
+ * Similar to split_page except the page is already free. As this is only
+ * being used for migration, the migratetype of the block also changes.
+ * As this is called with interrupts disabled, the caller is responsible
+ * for calling arch_alloc_page() and kernel_map_pages() after interrupts
+ * are enabled.
+ *
+ * Note: this is probably too low level an operation for use in drivers.
+ * Please consult with lkml before using this in your driver.
+ */
+int split_free_page(struct page *page)
+{
+ unsigned int order;
+ unsigned long watermark;
+ struct zone *zone;
+
+ BUG_ON(!PageBuddy(page));
+
+ zone = page_zone(page);
+ order = page_order(page);
+
+ /* Obey watermarks as if the page was being allocated */
+ watermark = low_wmark_pages(zone) + (1 << order);
+ if (!zone_watermark_ok(zone, 0, watermark, 0, 0))
+ return 0;
+
+ /* Remove page from free list */
+ list_del(&page->lru);
+ zone->free_area[order].nr_free--;
+ rmv_page_order(page);
+ __mod_zone_page_state(zone, NR_FREE_PAGES, -(1UL << order));
+
+ /* Split into individual pages */
+ set_page_refcounted(page);
+ split_page(page, order);
+
+ if (order >= pageblock_order - 1) {
+ struct page *endpage = page + (1 << order) - 1;
+ for (; page < endpage; page += pageblock_nr_pages)
+ set_pageblock_migratetype(page, MIGRATE_MOVABLE);
+ }
+
+ return 1 << order;
+}
+
/*
* Really, prep_compound_page() should be called from __rmqueue_bulk(). But
* we cheat by calling it from here, in the order > 0 path. Saves a branch
return page;
}
+#ifdef CONFIG_COMPACTION
+/* Try memory compaction for high-order allocations before reclaim */
+static struct page *
+__alloc_pages_direct_compact(gfp_t gfp_mask, unsigned int order,
+ struct zonelist *zonelist, enum zone_type high_zoneidx,
+ nodemask_t *nodemask, int alloc_flags, struct zone *preferred_zone,
+ int migratetype, unsigned long *did_some_progress)
+{
+ struct page *page;
+
+ if (!order || compaction_deferred(preferred_zone))
+ return NULL;
+
+ *did_some_progress = try_to_compact_pages(zonelist, order, gfp_mask,
+ nodemask);
+ if (*did_some_progress != COMPACT_SKIPPED) {
+
+ /* Page migration frees to the PCP lists but we want merging */
+ drain_pages(get_cpu());
+ put_cpu();
+
+ page = get_page_from_freelist(gfp_mask, nodemask,
+ order, zonelist, high_zoneidx,
+ alloc_flags, preferred_zone,
+ migratetype);
+ if (page) {
+ preferred_zone->compact_considered = 0;
+ preferred_zone->compact_defer_shift = 0;
+ count_vm_event(COMPACTSUCCESS);
+ return page;
+ }
+
+ /*
+ * It's bad if a compaction run occurs and fails.
+ * The most likely reason is that pages exist,
+ * but not enough to satisfy watermarks.
+ */
+ count_vm_event(COMPACTFAIL);
+ defer_compaction(preferred_zone);
+
+ cond_resched();
+ }
+
+ return NULL;
+}
+#else
+static inline struct page *
+__alloc_pages_direct_compact(gfp_t gfp_mask, unsigned int order,
+ struct zonelist *zonelist, enum zone_type high_zoneidx,
+ nodemask_t *nodemask, int alloc_flags, struct zone *preferred_zone,
+ int migratetype, unsigned long *did_some_progress)
+{
+ return NULL;
+}
+#endif /* CONFIG_COMPACTION */
+
/* The really slow allocator path where we enter direct reclaim */
static inline struct page *
__alloc_pages_direct_reclaim(gfp_t gfp_mask, unsigned int order,
if (test_thread_flag(TIF_MEMDIE) && !(gfp_mask & __GFP_NOFAIL))
goto nopage;
+ /* Try direct compaction */
+ page = __alloc_pages_direct_compact(gfp_mask, order,
+ zonelist, high_zoneidx,
+ nodemask,
+ alloc_flags, preferred_zone,
+ migratetype, &did_some_progress);
+ if (page)
+ goto got_pg;
+
/* Try direct reclaim and then allocating */
page = __alloc_pages_direct_reclaim(gfp_mask, order,
zonelist, high_zoneidx,
if (unlikely(!zonelist->_zonerefs->zone))
return NULL;
+ get_mems_allowed();
/* The preferred zone is used for statistics later */
first_zones_zonelist(zonelist, high_zoneidx, nodemask, &preferred_zone);
- if (!preferred_zone)
+ if (!preferred_zone) {
+ put_mems_allowed();
return NULL;
+ }
/* First allocation attempt */
page = get_page_from_freelist(gfp_mask|__GFP_HARDWALL, nodemask, order,
page = __alloc_pages_slowpath(gfp_mask, order,
zonelist, high_zoneidx, nodemask,
preferred_zone, migratetype);
+ put_mems_allowed();
trace_mm_page_alloc(page, order, gfp_mask, migratetype);
return page;
strncpy((char*)table->data, saved_string,
NUMA_ZONELIST_ORDER_LEN);
user_zonelist_order = oldval;
- } else if (oldval != user_zonelist_order)
- build_all_zonelists();
+ } else if (oldval != user_zonelist_order) {
+ mutex_lock(&zonelists_mutex);
+ build_all_zonelists(NULL);
+ mutex_unlock(&zonelists_mutex);
+ }
}
out:
mutex_unlock(&zl_order_mutex);
* ZONE_DMA and ZONE_DMA32 can be very small area in the system.
* If they are really small and used heavily, the system can fall
* into OOM very easily.
- * This function detect ZONE_DMA/DMA32 size and confgigures zone order.
+ * This function detects the ZONE_DMA/DMA32 size and configures zone order.
*/
/* Is there ZONE_NORMAL ? (ex. ppc has only DMA zone..) */
low_kmem_size = 0;
if (zone_type < ZONE_NORMAL)
low_kmem_size += z->present_pages;
total_size += z->present_pages;
+ } else if (zone_type == ZONE_NORMAL) {
+ /*
+ * If any node has only lowmem, then node order
+ * is preferred to allow kernel allocations
+ * locally; otherwise, they can easily infringe
+ * on other nodes when there is an abundance of
+ * lowmem available to allocate from.
+ */
+ return ZONELIST_ORDER_NODE;
}
}
}
*/
static void setup_pageset(struct per_cpu_pageset *p, unsigned long batch);
static DEFINE_PER_CPU(struct per_cpu_pageset, boot_pageset);
+static void setup_zone_pageset(struct zone *zone);
+
+/*
+ * Global mutex to protect against size modification of zonelists
+ * as well as to serialize pageset setup for the new populated zone.
+ */
+DEFINE_MUTEX(zonelists_mutex);
/* return values int ....just for stop_machine() */
-static int __build_all_zonelists(void *dummy)
+static __init_refok int __build_all_zonelists(void *data)
{
int nid;
int cpu;
build_zonelist_cache(pgdat);
}
+#ifdef CONFIG_MEMORY_HOTPLUG
+ /* Setup real pagesets for the new zone */
+ if (data) {
+ struct zone *zone = data;
+ setup_zone_pageset(zone);
+ }
+#endif
+
/*
* Initialize the boot_pagesets that are going to be used
* for bootstrapping processors. The real pagesets for
return 0;
}
-void build_all_zonelists(void)
+/*
+ * Called with zonelists_mutex held always
+ * unless system_state == SYSTEM_BOOTING.
+ */
+void build_all_zonelists(void *data)
{
set_zonelist_order();
} else {
/* we have to stop all cpus to guarantee there is no user
of zonelist */
- stop_machine(__build_all_zonelists, NULL, NULL);
+ stop_machine(__build_all_zonelists, data, NULL);
/* cpuset refresh routine should be here */
}
vm_total_pages = nr_free_pagecache_pages();
pcp->batch = PAGE_SHIFT * 8;
}
+static __meminit void setup_zone_pageset(struct zone *zone)
+{
+ int cpu;
+
+ zone->pageset = alloc_percpu(struct per_cpu_pageset);
+
+ for_each_possible_cpu(cpu) {
+ struct per_cpu_pageset *pcp = per_cpu_ptr(zone->pageset, cpu);
+
+ setup_pageset(pcp, zone_batchsize(zone));
+
+ if (percpu_pagelist_fraction)
+ setup_pagelist_highmark(pcp,
+ (zone->present_pages /
+ percpu_pagelist_fraction));
+ }
+}
+
/*
* Allocate per cpu pagesets and initialize them.
* Before this call only boot pagesets were available.
- * Boot pagesets will no longer be used by this processorr
- * after setup_per_cpu_pageset().
*/
void __init setup_per_cpu_pageset(void)
{
struct zone *zone;
- int cpu;
- for_each_populated_zone(zone) {
- zone->pageset = alloc_percpu(struct per_cpu_pageset);
-
- for_each_possible_cpu(cpu) {
- struct per_cpu_pageset *pcp = per_cpu_ptr(zone->pageset, cpu);
-
- setup_pageset(pcp, zone_batchsize(zone));
-
- if (percpu_pagelist_fraction)
- setup_pagelist_highmark(pcp,
- (zone->present_pages /
- percpu_pagelist_fraction));
- }
- }
+ for_each_populated_zone(zone)
+ setup_zone_pageset(zone);
}
static noinline __init_refok
* @req_size: hint: total size of the read which the caller is performing in
* pagecache pages
*
- * page_cache_async_ondemand() should be called when a page is used which
+ * page_cache_async_readahead() should be called when a page is used which
* has the PG_readahead flag; this is a marker to suggest that the application
* has used up enough of the readahead window that we should start pulling in
* more pages.
list_del(&anon_vma_chain->same_anon_vma);
/* We must garbage collect the anon_vma if it's empty */
- empty = list_empty(&anon_vma->head) && !ksm_refcount(anon_vma);
+ empty = list_empty(&anon_vma->head) && !anonvma_external_refcount(anon_vma);
spin_unlock(&anon_vma->lock);
if (empty)
struct anon_vma *anon_vma = data;
spin_lock_init(&anon_vma->lock);
- ksm_refcount_init(anon_vma);
+ anonvma_external_refcount_init(anon_vma);
INIT_LIST_HEAD(&anon_vma->head);
}
return ret;
}
+static bool is_vma_temporary_stack(struct vm_area_struct *vma)
+{
+ int maybe_stack = vma->vm_flags & (VM_GROWSDOWN | VM_GROWSUP);
+
+ if (!maybe_stack)
+ return false;
+
+ if ((vma->vm_flags & VM_STACK_INCOMPLETE_SETUP) ==
+ VM_STACK_INCOMPLETE_SETUP)
+ return true;
+
+ return false;
+}
+
/**
* try_to_unmap_anon - unmap or unlock anonymous page using the object-based
* rmap method
list_for_each_entry(avc, &anon_vma->head, same_anon_vma) {
struct vm_area_struct *vma = avc->vma;
- unsigned long address = vma_address(page, vma);
+ unsigned long address;
+
+ /*
+ * During exec, a temporary VMA is set up and later moved.
+ * The VMA is moved under the anon_vma lock but not the
+ * page tables leading to a race where migration cannot
+ * find the migration ptes. Rather than increasing the
+ * locking requirements of exec(), migration skips
+ * temporary VMAs until after exec() completes.
+ */
+ if (PAGE_MIGRATION && (flags & TTU_MIGRATION) &&
+ is_vma_temporary_stack(vma))
+ continue;
+
+ address = vma_address(page, vma);
if (address == -EFAULT)
continue;
ret = try_to_unmap_one(page, vma, address, flags);
/*
* Note: remove_migration_ptes() cannot use page_lock_anon_vma()
* because that depends on page_mapped(); but not all its usages
- * are holding mmap_sem, which also gave the necessary guarantee
- * (that this anon_vma's slab has not already been destroyed).
- * This needs to be reviewed later: avoiding page_lock_anon_vma()
- * is risky, and currently limits the usefulness of rmap_walk().
+ * are holding mmap_sem. Users without mmap_sem are required to
+ * take a reference count to prevent the anon_vma from disappearing.
*/
anon_vma = page_anon_vma(page);
if (!anon_vma)
spin_unlock(&info->lock);
page = shmem_dir_alloc(mapping_gfp_mask(inode->i_mapping));
- if (page)
- set_page_private(page, 0);
spin_lock(&info->lock);
if (!page) {
if (in_interrupt() || (flags & __GFP_THISNODE))
return NULL;
nid_alloc = nid_here = numa_node_id();
+ get_mems_allowed();
if (cpuset_do_slab_mem_spread() && (cachep->flags & SLAB_MEM_SPREAD))
nid_alloc = cpuset_mem_spread_node();
else if (current->mempolicy)
nid_alloc = slab_node(current->mempolicy);
+ put_mems_allowed();
if (nid_alloc != nid_here)
return ____cache_alloc_node(cachep, flags, nid_alloc);
return NULL;
if (flags & __GFP_THISNODE)
return NULL;
+ get_mems_allowed();
zonelist = node_zonelist(slab_node(current->mempolicy), flags);
local_flags = flags & (GFP_CONSTRAINT_MASK|GFP_RECLAIM_MASK);
}
}
}
+ put_mems_allowed();
return obj;
}
get_cycles() % 1024 > s->remote_node_defrag_ratio)
return NULL;
+ get_mems_allowed();
zonelist = node_zonelist(slab_node(current->mempolicy), flags);
for_each_zone_zonelist(zone, z, zonelist, high_zoneidx) {
struct kmem_cache_node *n;
if (n && cpuset_zone_allowed_hardwall(zone, flags) &&
n->nr_partial > s->min_partial) {
page = get_partial_node(n);
- if (page)
+ if (page) {
+ put_mems_allowed();
return page;
+ }
}
}
+ put_mems_allowed();
#endif
return NULL;
}
struct page __init *sparse_mem_map_populate(unsigned long pnum, int nid)
{
struct page *map;
+ unsigned long size;
map = alloc_remap(nid, sizeof(struct page) * PAGES_PER_SECTION);
if (map)
return map;
- map = alloc_bootmem_pages_node(NODE_DATA(nid),
- PAGE_ALIGN(sizeof(struct page) * PAGES_PER_SECTION));
+ size = PAGE_ALIGN(sizeof(struct page) * PAGES_PER_SECTION);
+ map = __alloc_bootmem_node_high(NODE_DATA(nid), size,
+ PAGE_SIZE, __pa(MAX_DMA_ADDRESS));
return map;
}
void __init sparse_mem_maps_populate_node(struct page **map_map,
}
size = PAGE_ALIGN(size);
- map = alloc_bootmem_pages_node(NODE_DATA(nodeid), size * map_count);
+ map = __alloc_bootmem_node_high(NODE_DATA(nodeid), size * map_count,
+ PAGE_SIZE, __pa(MAX_DMA_ADDRESS));
if (map) {
for (pnum = pnum_begin; pnum < pnum_end; pnum++) {
if (!present_section_nr(pnum))
int swappiness;
- int all_unreclaimable;
-
int order;
+ /*
+ * Intend to reclaim enough contiguous memory rather than just a
+ * sufficient amount of memory, i.e. the mode for high-order allocations.
+ */
+ bool lumpy_reclaim_mode;
+
/* Which cgroup do we reclaim from */
struct mem_cgroup *mem_cgroup;
* are scanned.
*/
nodemask_t *nodemask;
-
- /* Pluggable isolate pages callback */
- unsigned long (*isolate_pages)(unsigned long nr, struct list_head *dst,
- unsigned long *scanned, int order, int mode,
- struct zone *z, struct mem_cgroup *mem_cont,
- int active, int file);
};
#define lru_to_page(_head) (list_entry((_head)->prev, struct page, lru))
referenced_page = TestClearPageReferenced(page);
/* Lumpy reclaim - ignore references */
- if (sc->order > PAGE_ALLOC_COSTLY_ORDER)
+ if (sc->lumpy_reclaim_mode)
return PAGEREF_RECLAIM;
/*
return nr_reclaimed;
}
-/* LRU Isolation modes. */
-#define ISOLATE_INACTIVE 0 /* Isolate inactive pages. */
-#define ISOLATE_ACTIVE 1 /* Isolate active pages. */
-#define ISOLATE_BOTH 2 /* Isolate both active and inactive pages. */
-
/*
* Attempt to remove the specified page from its LRU. Only take this page
* if it is of the appropriate PageActive status. Pages which are being
struct list_head *dst,
unsigned long *scanned, int order,
int mode, struct zone *z,
- struct mem_cgroup *mem_cont,
int active, int file)
{
int lru = LRU_BASE;
unsigned long nr_scanned = 0;
unsigned long nr_reclaimed = 0;
struct zone_reclaim_stat *reclaim_stat = get_reclaim_stat(zone, sc);
- int lumpy_reclaim = 0;
while (unlikely(too_many_isolated(zone, file, sc))) {
congestion_wait(BLK_RW_ASYNC, HZ/10);
return SWAP_CLUSTER_MAX;
}
- /*
- * If we need a large contiguous chunk of memory, or have
- * trouble getting a small set of contiguous pages, we
- * will reclaim both active and inactive pages.
- *
- * We use the same threshold as pageout congestion_wait below.
- */
- if (sc->order > PAGE_ALLOC_COSTLY_ORDER)
- lumpy_reclaim = 1;
- else if (sc->order && priority < DEF_PRIORITY - 2)
- lumpy_reclaim = 1;
pagevec_init(&pvec, 1);
unsigned long nr_freed;
unsigned long nr_active;
unsigned int count[NR_LRU_LISTS] = { 0, };
- int mode = lumpy_reclaim ? ISOLATE_BOTH : ISOLATE_INACTIVE;
+ int mode = sc->lumpy_reclaim_mode ? ISOLATE_BOTH : ISOLATE_INACTIVE;
unsigned long nr_anon;
unsigned long nr_file;
- nr_taken = sc->isolate_pages(SWAP_CLUSTER_MAX,
- &page_list, &nr_scan, sc->order, mode,
- zone, sc->mem_cgroup, 0, file);
-
if (scanning_global_lru(sc)) {
+ nr_taken = isolate_pages_global(SWAP_CLUSTER_MAX,
+ &page_list, &nr_scan,
+ sc->order, mode,
+ zone, 0, file);
zone->pages_scanned += nr_scan;
if (current_is_kswapd())
__count_zone_vm_events(PGSCAN_KSWAPD, zone,
else
__count_zone_vm_events(PGSCAN_DIRECT, zone,
nr_scan);
+ } else {
+ nr_taken = mem_cgroup_isolate_pages(SWAP_CLUSTER_MAX,
+ &page_list, &nr_scan,
+ sc->order, mode,
+ zone, sc->mem_cgroup,
+ 0, file);
+ /*
+ * mem_cgroup_isolate_pages() keeps track of
+ * scanned pages on its own.
+ */
}
if (nr_taken == 0)
* but that should be acceptable to the caller
*/
if (nr_freed < nr_taken && !current_is_kswapd() &&
- lumpy_reclaim) {
+ sc->lumpy_reclaim_mode) {
congestion_wait(BLK_RW_ASYNC, HZ/10);
/*
lru_add_drain();
spin_lock_irq(&zone->lru_lock);
- nr_taken = sc->isolate_pages(nr_pages, &l_hold, &pgscanned, sc->order,
- ISOLATE_ACTIVE, zone,
- sc->mem_cgroup, 1, file);
- /*
- * zone->pages_scanned is used for detect zone's oom
- * mem_cgroup remembers nr_scan by itself.
- */
if (scanning_global_lru(sc)) {
+ nr_taken = isolate_pages_global(nr_pages, &l_hold,
+ &pgscanned, sc->order,
+ ISOLATE_ACTIVE, zone,
+ 1, file);
zone->pages_scanned += pgscanned;
+ } else {
+ nr_taken = mem_cgroup_isolate_pages(nr_pages, &l_hold,
+ &pgscanned, sc->order,
+ ISOLATE_ACTIVE, zone,
+ sc->mem_cgroup, 1, file);
+ /*
+ * mem_cgroup_isolate_pages() keeps track of
+ * scanned pages on its own.
+ */
}
+
reclaim_stat->recent_scanned[file] += nr_taken;
__count_zone_vm_events(PGREFILL, zone, pgscanned);
return shrink_inactive_list(nr_to_scan, zone, sc, priority, file);
}
+/*
+ * Smallish @nr_to_scan's are deposited in @nr_saved_scan,
+ * until we have collected SWAP_CLUSTER_MAX pages to scan.
+ */
+static unsigned long nr_scan_try_batch(unsigned long nr_to_scan,
+ unsigned long *nr_saved_scan)
+{
+ unsigned long nr;
+
+ *nr_saved_scan += nr_to_scan;
+ nr = *nr_saved_scan;
+
+ if (nr >= SWAP_CLUSTER_MAX)
+ *nr_saved_scan = 0;
+ else
+ nr = 0;
+
+ return nr;
+}
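The batching done by nr_scan_try_batch() can be tried outside the kernel. The following is a minimal user-space sketch, not the kernel code: the name sketch_scan_try_batch and the constant SKETCH_SWAP_CLUSTER_MAX (set to 32 here as an assumption) are illustrative only.

```c
#include <assert.h>

/* Assumed value for illustration; the kernel defines SWAP_CLUSTER_MAX itself. */
#define SKETCH_SWAP_CLUSTER_MAX 32UL

/*
 * Accumulate small scan requests in *saved and release them only once
 * at least SKETCH_SWAP_CLUSTER_MAX pages are pending. Returns the number
 * of pages to scan now, or 0 if the request was only deposited.
 */
static unsigned long sketch_scan_try_batch(unsigned long nr_to_scan,
					   unsigned long *saved)
{
	unsigned long nr;

	assert(saved);

	*saved += nr_to_scan;
	nr = *saved;

	if (nr >= SKETCH_SWAP_CLUSTER_MAX)
		*saved = 0;	/* batch is large enough, hand it all out */
	else
		nr = 0;		/* keep accumulating */

	return nr;
}
```

With this sketch, three successive requests of 10, 10 and 12 pages yield nothing for the first two calls and a single batch of 32 on the third, after which the saved counter is back at zero.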
+
/*
* Determine how aggressively the anon and file LRU lists should be
* scanned. The relative value of each set of LRU lists is determined
* by looking at the fraction of the pages scanned we did rotate back
* onto the active list instead of evict.
*
- * percent[0] specifies how much pressure to put on ram/swap backed
- * memory, while percent[1] determines pressure on the file LRUs.
+ * nr[0] = anon pages to scan; nr[1] = file pages to scan
*/
-static void get_scan_ratio(struct zone *zone, struct scan_control *sc,
- unsigned long *percent)
+static void get_scan_count(struct zone *zone, struct scan_control *sc,
+ unsigned long *nr, int priority)
{
unsigned long anon, file, free;
unsigned long anon_prio, file_prio;
unsigned long ap, fp;
struct zone_reclaim_stat *reclaim_stat = get_reclaim_stat(zone, sc);
+ u64 fraction[2], denominator;
+ enum lru_list l;
+ int noswap = 0;
+
+ /* If we have no swap space, do not bother scanning anon pages. */
+ if (!sc->may_swap || (nr_swap_pages <= 0)) {
+ noswap = 1;
+ fraction[0] = 0;
+ fraction[1] = 1;
+ denominator = 1;
+ goto out;
+ }
anon = zone_nr_lru_pages(zone, sc, LRU_ACTIVE_ANON) +
zone_nr_lru_pages(zone, sc, LRU_INACTIVE_ANON);
/* If we have very few page cache pages,
force-scan anon pages. */
if (unlikely(file + free <= high_wmark_pages(zone))) {
- percent[0] = 100;
- percent[1] = 0;
- return;
+ fraction[0] = 1;
+ fraction[1] = 0;
+ denominator = 1;
+ goto out;
}
}
fp = (file_prio + 1) * (reclaim_stat->recent_scanned[1] + 1);
fp /= reclaim_stat->recent_rotated[1] + 1;
- /* Normalize to percentages */
- percent[0] = 100 * ap / (ap + fp + 1);
- percent[1] = 100 - percent[0];
+ fraction[0] = ap;
+ fraction[1] = fp;
+ denominator = ap + fp + 1;
+out:
+ for_each_evictable_lru(l) {
+ int file = is_file_lru(l);
+ unsigned long scan;
+
+ scan = zone_nr_lru_pages(zone, sc, l);
+ if (priority || noswap) {
+ scan >>= priority;
+ scan = div64_u64(scan * fraction[file], denominator);
+ }
+ nr[l] = nr_scan_try_batch(scan,
+ &reclaim_stat->nr_saved_scan[l]);
+ }
}
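One visible effect of replacing the percent[] array with fraction[] and a denominator is that a small share is no longer rounded down to 0% before it is applied. The user-space sketch below compares the two calculations. It is an illustration under assumptions, not the kernel code: the function names are invented, and unlike get_scan_count() it always applies the scaling instead of skipping it when priority is 0 and swap is available.

```c
#include <assert.h>
#include <stdint.h>

/* Old scheme: normalise to whole percentages first, then scale. */
static uint64_t sketch_scan_percent(uint64_t lru_pages, int priority,
				    uint64_t ap, uint64_t fp, int file)
{
	uint64_t percent[2];

	percent[0] = 100 * ap / (ap + fp + 1);
	percent[1] = 100 - percent[0];

	return ((lru_pages >> priority) * percent[file]) / 100;
}

/* New scheme: keep numerator and denominator and divide only once. */
static uint64_t sketch_scan_fraction(uint64_t lru_pages, int priority,
				     uint64_t ap, uint64_t fp, int file)
{
	uint64_t fraction[2] = { ap, fp };
	uint64_t denominator = ap + fp + 1;

	assert(denominator != 0);

	return ((lru_pages >> priority) * fraction[file]) / denominator;
}
```

For a list of 1,000,000 pages with ap = 1 and fp = 999, the percentage version assigns no anon pages at all, while the fraction version still assigns 999.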
-/*
- * Smallish @nr_to_scan's are deposited in @nr_saved_scan,
- * until we collected @swap_cluster_max pages to scan.
- */
-static unsigned long nr_scan_try_batch(unsigned long nr_to_scan,
- unsigned long *nr_saved_scan)
+static void set_lumpy_reclaim_mode(int priority, struct scan_control *sc)
{
- unsigned long nr;
-
- *nr_saved_scan += nr_to_scan;
- nr = *nr_saved_scan;
-
- if (nr >= SWAP_CLUSTER_MAX)
- *nr_saved_scan = 0;
+ /*
+ * If we need a large contiguous chunk of memory, or have
+ * trouble getting a small set of contiguous pages, we
+ * will reclaim both active and inactive pages.
+ */
+ if (sc->order > PAGE_ALLOC_COSTLY_ORDER)
+ sc->lumpy_reclaim_mode = 1;
+ else if (sc->order && priority < DEF_PRIORITY - 2)
+ sc->lumpy_reclaim_mode = 1;
else
- nr = 0;
-
- return nr;
+ sc->lumpy_reclaim_mode = 0;
}
/*
{
unsigned long nr[NR_LRU_LISTS];
unsigned long nr_to_scan;
- unsigned long percent[2]; /* anon @ 0; file @ 1 */
enum lru_list l;
unsigned long nr_reclaimed = sc->nr_reclaimed;
unsigned long nr_to_reclaim = sc->nr_to_reclaim;
- struct zone_reclaim_stat *reclaim_stat = get_reclaim_stat(zone, sc);
- int noswap = 0;
-
- /* If we have no swap space, do not bother scanning anon pages. */
- if (!sc->may_swap || (nr_swap_pages <= 0)) {
- noswap = 1;
- percent[0] = 0;
- percent[1] = 100;
- } else
- get_scan_ratio(zone, sc, percent);
- for_each_evictable_lru(l) {
- int file = is_file_lru(l);
- unsigned long scan;
+ get_scan_count(zone, sc, nr, priority);
- scan = zone_nr_lru_pages(zone, sc, l);
- if (priority || noswap) {
- scan >>= priority;
- scan = (scan * percent[file]) / 100;
- }
- nr[l] = nr_scan_try_batch(scan,
- &reclaim_stat->nr_saved_scan[l]);
- }
+ set_lumpy_reclaim_mode(priority, sc);
while (nr[LRU_INACTIVE_ANON] || nr[LRU_ACTIVE_FILE] ||
nr[LRU_INACTIVE_FILE]) {
* If a zone is deemed to be full of pinned pages then just give it a light
* scan then give up on it.
*/
-static void shrink_zones(int priority, struct zonelist *zonelist,
+static int shrink_zones(int priority, struct zonelist *zonelist,
struct scan_control *sc)
{
enum zone_type high_zoneidx = gfp_zone(sc->gfp_mask);
struct zoneref *z;
struct zone *zone;
+ int progress = 0;
- sc->all_unreclaimable = 1;
for_each_zone_zonelist_nodemask(zone, z, zonelist, high_zoneidx,
sc->nodemask) {
if (!populated_zone(zone))
if (zone->all_unreclaimable && priority != DEF_PRIORITY)
continue; /* Let kswapd poll it */
- sc->all_unreclaimable = 0;
} else {
/*
* Ignore cpuset limitation here. We just want to reduce
* # of used pages by us regardless of memory shortage.
*/
- sc->all_unreclaimable = 0;
mem_cgroup_note_reclaim_priority(sc->mem_cgroup,
priority);
}
shrink_zone(priority, zone, sc);
+ progress = 1;
}
+ return progress;
}
/*
enum zone_type high_zoneidx = gfp_zone(sc->gfp_mask);
unsigned long writeback_threshold;
+ get_mems_allowed();
delayacct_freepages_start();
if (scanning_global_lru(sc))
sc->nr_scanned = 0;
if (!priority)
disable_swap_token();
- shrink_zones(priority, zonelist, sc);
+ ret = shrink_zones(priority, zonelist, sc);
/*
* Don't shrink slabs when reclaiming memory from
* over limit cgroups
congestion_wait(BLK_RW_ASYNC, HZ/10);
}
/* top priority shrink_zones still had more to do? don't OOM, then */
- if (!sc->all_unreclaimable && scanning_global_lru(sc))
+ if (ret && scanning_global_lru(sc))
ret = sc->nr_reclaimed;
out:
/*
mem_cgroup_record_reclaim_priority(sc->mem_cgroup, priority);
delayacct_freepages_end();
+ put_mems_allowed();
return ret;
}
.swappiness = vm_swappiness,
.order = order,
.mem_cgroup = NULL,
- .isolate_pages = isolate_pages_global,
.nodemask = nodemask,
};
.swappiness = swappiness,
.order = 0,
.mem_cgroup = mem,
- .isolate_pages = mem_cgroup_isolate_pages,
};
nodemask_t nm = nodemask_of_node(nid);
.swappiness = swappiness,
.order = 0,
.mem_cgroup = mem_cont,
- .isolate_pages = mem_cgroup_isolate_pages,
.nodemask = NULL, /* we don't care the placement */
};
.swappiness = vm_swappiness,
.order = order,
.mem_cgroup = NULL,
- .isolate_pages = isolate_pages_global,
};
/*
* temp_priority is used to remember the scanning priority at which
.hibernation_mode = 1,
.swappiness = vm_swappiness,
.order = 0,
- .isolate_pages = isolate_pages_global,
};
struct zonelist * zonelist = node_zonelist(numa_node_id(), sc.gfp_mask);
struct task_struct *p = current;
.gfp_mask = gfp_mask,
.swappiness = vm_swappiness,
.order = order,
- .isolate_pages = isolate_pages_global,
};
unsigned long slab_reclaimable;
#include <linux/cpu.h>
#include <linux/vmstat.h>
#include <linux/sched.h>
+#include <linux/math64.h>
#ifdef CONFIG_VM_EVENT_COUNTERS
DEFINE_PER_CPU(struct vm_event_state, vm_event_states) = {{0}};
}
#endif
-#ifdef CONFIG_PROC_FS
+#ifdef CONFIG_COMPACTION
+struct contig_page_info {
+ unsigned long free_pages;
+ unsigned long free_blocks_total;
+ unsigned long free_blocks_suitable;
+};
+
+/*
+ * Calculate the number of free pages in a zone, the number of free
+ * blocks, and how many blocks of the target size those free blocks can
+ * satisfy. Note that this function makes no attempt to estimate how many
+ * suitable free blocks there *might* be if MOVABLE pages were migrated.
+ * Calculating that is possible, but expensive, and can be figured out
+ * from userspace.
+ */
+static void fill_contig_page_info(struct zone *zone,
+ unsigned int suitable_order,
+ struct contig_page_info *info)
+{
+ unsigned int order;
+
+ info->free_pages = 0;
+ info->free_blocks_total = 0;
+ info->free_blocks_suitable = 0;
+
+ for (order = 0; order < MAX_ORDER; order++) {
+ unsigned long blocks;
+
+ /* Count number of free blocks */
+ blocks = zone->free_area[order].nr_free;
+ info->free_blocks_total += blocks;
+
+ /* Count free base pages */
+ info->free_pages += blocks << order;
+
+ /* Count the suitable free blocks */
+ if (order >= suitable_order)
+ info->free_blocks_suitable += blocks <<
+ (order - suitable_order);
+ }
+}
+
+/*
+ * A fragmentation index only makes sense if an allocation of a requested
+ * size would fail. If that is true, the fragmentation index indicates
+ * whether external fragmentation or a lack of memory was the problem.
+ * The value can be used to determine whether page reclaim or compaction
+ * should be used.
+ */
+static int __fragmentation_index(unsigned int order, struct contig_page_info *info)
+{
+ unsigned long requested = 1UL << order;
+
+ if (!info->free_blocks_total)
+ return 0;
+
+ /* Fragmentation index only makes sense when a request would fail */
+ if (info->free_blocks_suitable)
+ return -1000;
+
+ /*
+ * The index lies between 0 and 1; return it scaled by 1000.
+ *
+ * 0 => allocation would fail due to lack of memory
+ * 1 => allocation would fail due to fragmentation
+ */
+ return 1000 - div_u64( (1000+(div_u64(info->free_pages * 1000ULL, requested))), info->free_blocks_total);
+}
+
+/* Same as __fragmentation_index but allocates contig_page_info on stack */
+int fragmentation_index(struct zone *zone, unsigned int order)
+{
+ struct contig_page_info info;
+
+ fill_contig_page_info(zone, order, &info);
+ return __fragmentation_index(order, &info);
+}
+#endif
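To make the fragmentation index arithmetic above concrete, here is a minimal userspace sketch, assuming the per-order free block counts are available as a plain array. SKETCH_MAX_ORDER, struct contig_info_sketch, fill_info_sketch() and frag_index_sketch() are hypothetical names for illustration only; the arithmetic mirrors fill_contig_page_info() and __fragmentation_index(), with plain division in place of div_u64().

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_MAX_ORDER 11	/* assumption: illustrative value only */

struct contig_info_sketch {
	uint64_t free_pages;
	uint64_t free_blocks_total;
	uint64_t free_blocks_suitable;
};

/* nr_free[o] is the number of free blocks of order o. */
static void fill_info_sketch(const uint64_t nr_free[SKETCH_MAX_ORDER],
			     unsigned int suitable_order,
			     struct contig_info_sketch *info)
{
	unsigned int order;

	assert(suitable_order < SKETCH_MAX_ORDER);

	info->free_pages = 0;
	info->free_blocks_total = 0;
	info->free_blocks_suitable = 0;

	for (order = 0; order < SKETCH_MAX_ORDER; order++) {
		uint64_t blocks = nr_free[order];

		info->free_blocks_total += blocks;
		info->free_pages += blocks << order;

		/* A larger block holds several blocks of the target order. */
		if (order >= suitable_order)
			info->free_blocks_suitable +=
				blocks << (order - suitable_order);
	}
}

/* Result is scaled by 1000; -1000 means the request would succeed. */
static int frag_index_sketch(unsigned int order,
			     const struct contig_info_sketch *info)
{
	uint64_t requested = 1ULL << order;

	if (!info->free_blocks_total)
		return 0;
	if (info->free_blocks_suitable)
		return -1000;

	return 1000 - (int)((1000 + info->free_pages * 1000 / requested) /
			    info->free_blocks_total);
}
```

For example, eight free order-0 pages and an order-3 request give 1000 - (1000 + 1000) / 8 = 750, which leans towards fragmentation rather than a lack of memory.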
+
+#if defined(CONFIG_PROC_FS) || defined(CONFIG_COMPACTION)
#include <linux/proc_fs.h>
#include <linux/seq_file.h>
spin_unlock_irqrestore(&zone->lock, flags);
}
}
+#endif
+#ifdef CONFIG_PROC_FS
static void frag_show_print(struct seq_file *m, pg_data_t *pgdat,
struct zone *zone)
{
"allocstall",
"pgrotated",
+
+#ifdef CONFIG_COMPACTION
+ "compact_blocks_moved",
+ "compact_pages_moved",
+ "compact_pagemigrate_failed",
+ "compact_stall",
+ "compact_fail",
+ "compact_success",
+#endif
+
#ifdef CONFIG_HUGETLB_PAGE
"htlb_buddy_alloc_success",
"htlb_buddy_alloc_fail",
return 0;
}
module_init(setup_vmstat)
+
+#if defined(CONFIG_DEBUG_FS) && defined(CONFIG_COMPACTION)
+#include <linux/debugfs.h>
+
+static struct dentry *extfrag_debug_root;
+
+/*
+ * Return an index indicating how much of the available free memory is
+ * unusable for an allocation of the requested size.
+ */
+static int unusable_free_index(unsigned int order,
+ struct contig_page_info *info)
+{
+ /* No free memory is interpreted as all free memory being unusable */
+ if (info->free_pages == 0)
+ return 1000;
+
+ /*
+ * Index should be a value between 0 and 1. Return it scaled by 1000,
+ * i.e. to 3 decimal places.
+ *
+ * 0 => no fragmentation
+ * 1 => high fragmentation
+ */
+ return div_u64((info->free_pages - (info->free_blocks_suitable << order)) * 1000ULL, info->free_pages);
+
+}
+
+static void unusable_show_print(struct seq_file *m,
+ pg_data_t *pgdat, struct zone *zone)
+{
+ unsigned int order;
+ int index;
+ struct contig_page_info info;
+
+ seq_printf(m, "Node %d, zone %8s ",
+ pgdat->node_id,
+ zone->name);
+ for (order = 0; order < MAX_ORDER; ++order) {
+ fill_contig_page_info(zone, order, &info);
+ index = unusable_free_index(order, &info);
+ seq_printf(m, "%d.%03d ", index / 1000, index % 1000);
+ }
+
+ seq_putc(m, '\n');
+}
+
+/*
+ * Display unusable free space index
+ *
+ * The unusable free space index measures how much of the available free
+ * memory cannot be used to satisfy an allocation of a given size and is a
+ * value between 0 and 1. The higher the value, the more free memory is
+ * unusable and, by implication, the worse the external fragmentation is. This
+ * can be expressed as a percentage by multiplying by 100.
+ */
+static int unusable_show(struct seq_file *m, void *arg)
+{
+ pg_data_t *pgdat = (pg_data_t *)arg;
+
+ /* check memoryless node */
+ if (!node_state(pgdat->node_id, N_HIGH_MEMORY))
+ return 0;
+
+ walk_zones_in_node(m, pgdat, unusable_show_print);
+
+ return 0;
+}
+
+static const struct seq_operations unusable_op = {
+ .start = frag_start,
+ .next = frag_next,
+ .stop = frag_stop,
+ .show = unusable_show,
+};
+
+static int unusable_open(struct inode *inode, struct file *file)
+{
+ return seq_open(file, &unusable_op);
+}
+
+static const struct file_operations unusable_file_ops = {
+ .open = unusable_open,
+ .read = seq_read,
+ .llseek = seq_lseek,
+ .release = seq_release,
+};
+
+static void extfrag_show_print(struct seq_file *m,
+ pg_data_t *pgdat, struct zone *zone)
+{
+ unsigned int order;
+ int index;
+
+ /* Alloc on stack as interrupts are disabled for zone walk */
+ struct contig_page_info info;
+
+ seq_printf(m, "Node %d, zone %8s ",
+ pgdat->node_id,
+ zone->name);
+ for (order = 0; order < MAX_ORDER; ++order) {
+ fill_contig_page_info(zone, order, &info);
+ index = __fragmentation_index(order, &info);
+ seq_printf(m, "%d.%03d ", index / 1000, index % 1000);
+ }
+
+ seq_putc(m, '\n');
+}
+
+/*
+ * Display the fragmentation index for orders at which allocations would fail
+ */
+static int extfrag_show(struct seq_file *m, void *arg)
+{
+ pg_data_t *pgdat = (pg_data_t *)arg;
+
+ walk_zones_in_node(m, pgdat, extfrag_show_print);
+
+ return 0;
+}
+
+static const struct seq_operations extfrag_op = {
+ .start = frag_start,
+ .next = frag_next,
+ .stop = frag_stop,
+ .show = extfrag_show,
+};
+
+static int extfrag_open(struct inode *inode, struct file *file)
+{
+ return seq_open(file, &extfrag_op);
+}
+
+static const struct file_operations extfrag_file_ops = {
+ .open = extfrag_open,
+ .read = seq_read,
+ .llseek = seq_lseek,
+ .release = seq_release,
+};
+
+static int __init extfrag_debug_init(void)
+{
+ extfrag_debug_root = debugfs_create_dir("extfrag", NULL);
+ if (!extfrag_debug_root)
+ return -ENOMEM;
+
+ if (!debugfs_create_file("unusable_index", 0444,
+ extfrag_debug_root, NULL, &unusable_file_ops))
+ return -ENOMEM;
+
+ if (!debugfs_create_file("extfrag_index", 0444,
+ extfrag_debug_root, NULL, &extfrag_file_ops))
+ return -ENOMEM;
+
+ return 0;
+}
+
+module_init(extfrag_debug_init);
+#endif
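The unusable free space index and its "%d.%03d" rendering can be checked by hand. Below is a minimal userspace sketch, assuming the free page count and the number of suitable blocks are already known. unusable_index_sketch() and format_index_sketch() are hypothetical helpers, not part of the patch.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Share of free memory that cannot serve a request of the given order,
 * scaled by 1000. Mirrors the arithmetic of unusable_free_index().
 */
static int unusable_index_sketch(uint64_t free_pages,
				 uint64_t suitable_blocks,
				 unsigned int order)
{
	uint64_t usable;

	/* No free memory is treated as all free memory being unusable. */
	if (free_pages == 0)
		return 1000;

	usable = suitable_blocks << order;
	assert(usable <= free_pages);

	return (int)((free_pages - usable) * 1000 / free_pages);
}

/* Render a non-negative scaled index the way the debugfs file prints it. */
static void format_index_sketch(char *buf, size_t len, int index)
{
	snprintf(buf, len, "%d.%03d", index / 1000, index % 1000);
}
```

With 12 free pages of which one order-2 block is suitable, 8 of the 12 pages are unusable for an order-2 request, so the index is 666 and is printed as 0.666.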
const char *sptr = va_arg(ap, const char *);
int16_t len = 0;
if (sptr)
- len = MIN(strlen(sptr), USHORT_MAX);
+ len = MIN(strlen(sptr), USHRT_MAX);
errcode = p9pdu_writef(pdu, proto_version,
"w", len);
{
if (likely(ndp <= 0xFF))
return 1;
- return likely(ndp <= USHORT_MAX) ? 2 : (ndp <= UINT_MAX ? 4 : 6);
+ return likely(ndp <= USHRT_MAX) ? 2 : (ndp <= UINT_MAX ? 4 : 6);
}
int dccp_insert_option(struct sock *sk, struct sk_buff *skb,
return -ENOPROTOOPT;
if (val != 0 && val < 8) /* Illegal coverage: use default (8) */
val = 8;
- else if (val > USHORT_MAX)
- val = USHORT_MAX;
+ else if (val > USHRT_MAX)
+ val = USHRT_MAX;
up->pcslen = val;
up->pcflag |= UDPLITE_SEND_CC;
break;
return -ENOPROTOOPT;
if (val != 0 && val < 8) /* Avoid silly minimal values. */
val = 8;
- else if (val > USHORT_MAX)
- val = USHORT_MAX;
+ else if (val > USHRT_MAX)
+ val = USHRT_MAX;
up->pcrlen = val;
up->pcflag |= UDPLITE_RECV_CC;
break;
skb_queue_head_init(&sta->tx_filtered);
for (i = 0; i < NUM_RX_DATA_QUEUES; i++)
- sta->last_seq_ctrl[i] = cpu_to_le16(USHORT_MAX);
+ sta->last_seq_ctrl[i] = cpu_to_le16(USHRT_MAX);
#ifdef CONFIG_MAC80211_VERBOSE_DEBUG
printk(KERN_DEBUG "%s: Allocated STA %pM\n",
port = ntohl(*p);
dprintk("RPC: %5u PMAP_%s result: %lu\n", task->tk_pid,
task->tk_msg.rpc_proc->p_name, port);
- if (unlikely(port > USHORT_MAX))
+ if (unlikely(port > USHRT_MAX))
return -EIO;
rpcb->r_port = port;
int xprt_load_transport(const char *transport_name)
{
struct xprt_class *t;
- char module_name[sizeof t->name + 5];
int result;
result = 0;
}
}
spin_unlock(&xprt_list_lock);
- strcpy(module_name, "xprt");
- strncat(module_name, transport_name, sizeof t->name);
- result = request_module(module_name);
+ result = request_module("xprt%s", transport_name);
out:
return result;
}
ERROR("trailing whitespace\n" . $herevet);
}
+# check for Kconfig help text having a real description
+ if ($realfile =~ /Kconfig/ &&
+ $line =~ /\+?\s*(---)?help(---)?$/) {
+ my $length = 0;
+ for (my $l = $linenr; defined($lines[$l]); $l++) {
+ my $f = $lines[$l];
+ $f =~ s/#.*//;
+ $f =~ s/^\s+//;
+ next if ($f =~ /^$/);
+ last if ($f =~ /^\s*config\s/);
+ $length++;
+ }
+ WARN("please write a paragraph that describes the config symbol fully\n" . $herecurr) if ($length < 4);
+ }
+
# check we are in a valid source file if not then ignore this hunk
next if ($realfile !~ /\.(h|c|s|S|pl|sh)$/);
CHK("architecture specific defines should be avoided\n" . $herecurr);
}
+# Check that the storage class is at the beginning of a declaration
+ if ($line =~ /\b$Storage\b/ && $line !~ /^.\s*$Storage\b/) {
+ WARN("storage class should be at the beginning of the declaration\n" . $herecurr)
+ }
+
# check the location of the inline attribute, that it is between
# storage class and type.
if ($line =~ /\b$Type\s+$Inline\b/ ||
use strict;
my $P = $0;
-my $V = '0.23';
+my $V = '0.24';
use Getopt::Long qw(:config no_auto_abbrev);
my $email_subscriber_list = 0;
my $email_git_penguin_chiefs = 0;
my $email_git = 1;
+my $email_git_all_signature_types = 0;
my $email_git_blame = 0;
my $email_git_min_signatures = 1;
my $email_git_max_maintainers = 5;
my $exit = 0;
my @penguin_chief = ();
-push(@penguin_chief,"Linus Torvalds:torvalds\@linux-foundation.org");
+push(@penguin_chief, "Linus Torvalds:torvalds\@linux-foundation.org");
#Andrew wants in on most everything - 2009/01/14
-#push(@penguin_chief,"Andrew Morton:akpm\@linux-foundation.org");
+#push(@penguin_chief, "Andrew Morton:akpm\@linux-foundation.org");
my @penguin_chief_names = ();
foreach my $chief (@penguin_chief) {
push(@penguin_chief_names, $chief_name);
}
}
-my $penguin_chiefs = "\(" . join("|",@penguin_chief_names) . "\)";
+my $penguin_chiefs = "\(" . join("|", @penguin_chief_names) . "\)";
+
+# Signature types of people who are either
+# a) responsible for the code in question, or
+# b) familiar enough with it to give relevant feedback
+my @signature_tags = ();
+push(@signature_tags, "Signed-off-by:");
+push(@signature_tags, "Reviewed-by:");
+push(@signature_tags, "Acked-by:");
+my $signaturePattern = "\(" . join("|", @signature_tags) . "\)";
# rfc822 email address - preloaded methods go here.
my $rfc822_lwsp = "(?:(?:\\r\\n)?[ \\t])";
"blame_commit_pattern" => "^([0-9a-f]+):"
);
+if (-f "${lk_path}.get_maintainer.conf") {
+ my @conf_args;
+ open(my $conffile, '<', "${lk_path}.get_maintainer.conf")
+ or warn "$P: Can't open .get_maintainer.conf: $!\n";
+ while (<$conffile>) {
+ my $line = $_;
+
+ $line =~ s/\s*\n?$//g;
+ $line =~ s/^\s*//g;
+ $line =~ s/\s+/ /g;
+
+ next if ($line =~ m/^\s*#/);
+ next if ($line =~ m/^\s*$/);
+
+ my @words = split(" ", $line);
+ foreach my $word (@words) {
+ last if ($word =~ m/^#/);
+ push (@conf_args, $word);
+ }
+ }
+ close($conffile);
+ unshift(@ARGV, @conf_args) if @conf_args;
+}
+
if (!GetOptions(
'email!' => \$email,
'git!' => \$email_git,
+ 'git-all-signature-types!' => \$email_git_all_signature_types,
'git-blame!' => \$email_git_blame,
'git-chief-penguins!' => \$email_git_penguin_chiefs,
'git-min-signatures=i' => \$email_git_min_signatures,
. "a linux kernel source tree.\n";
}
+if ($email_git_all_signature_types) {
+ $signaturePattern = "(.+?)[Bb][Yy]:";
+}
+
## Read MAINTAINERS for type/value pairs
my @typevalue = ();
MAINTAINER field selection options:
--email => print email address(es) if any
--git => include recent git \*-by: signers
+ --git-all-signature-types => include signers regardless of signature type
+ or use only ${signaturePattern} signers (default: $email_git_all_signature_types)
--git-chief-penguins => include ${penguin_chiefs}
- --git-min-signatures => number of signatures required (default: 1)
- --git-max-maintainers => maximum maintainers to add (default: 5)
- --git-min-percent => minimum percentage of commits required (default: 5)
+ --git-min-signatures => number of signatures required (default: $email_git_min_signatures)
+ --git-max-maintainers => maximum maintainers to add (default: $email_git_max_maintainers)
+ --git-min-percent => minimum percentage of commits required (default: $email_git_min_percent)
--git-blame => use git blame to find modified commits for patch or file
- --git-since => git history to use (default: 1-year-ago)
- --hg-since => hg history to use (default: -365)
+ --git-since => git history to use (default: $email_git_since)
+ --hg-since => hg history to use (default: $email_hg_since)
--m => include maintainer(s) if any
--n => include name 'Full Name <addr\@domain.tld>'
--l => include list(s) if any
--git-min-signatures, --git-max-maintainers, --git-min-percent, and
--git-blame
Use --hg-since not --git-since to control date selection
+ File ".get_maintainer.conf", if it exists in the linux kernel source root
+ directory, can change whatever get_maintainer defaults are desired.
+ Entries in this file can be any command line argument.
+ This file is prepended to any additional command line arguments.
+ Multiple lines and # comments are allowed.
EOT
}
$commits = grep(/$pattern/, @lines); # of commits
- @lines = grep(/^[-_ a-z]+by:.*\@.*$/i, @lines);
+ @lines = grep(/^[ \t]*${signaturePattern}.*\@.*$/, @lines);
if (!$email_git_penguin_chiefs) {
@lines = grep(!/${penguin_chiefs}/i, @lines);
}
struct keyring_list *klist =
container_of(rcu, struct keyring_list, rcu);
- if (klist->delkey != USHORT_MAX)
+ if (klist->delkey != USHRT_MAX)
key_put(klist->keys[klist->delkey]);
kfree(klist);
}
max += klist->maxkeys;
ret = -ENFILE;
- if (max > USHORT_MAX - 1)
+ if (max > USHRT_MAX - 1)
goto error_quota;
size = sizeof(*klist) + sizeof(struct key *) * max;
if (size > PAGE_SIZE)
sizeof(struct key *) * klist->nkeys);
nklist->delkey = klist->nkeys;
nklist->nkeys = klist->nkeys + 1;
- klist->delkey = USHORT_MAX;
+ klist->delkey = USHRT_MAX;
} else {
nklist->nkeys = 1;
nklist->delkey = 0;