pandora-kernel.git
16 years ago  CRIS v10: Change boot/rescue/Makefile to use ccflags-y, asflags-y and ldflags-y.
Jesper Nilsson [Thu, 17 Jan 2008 10:22:41 +0000 (11:22 +0100)]
CRIS v10: Change boot/rescue/Makefile to use ccflags-y, asflags-y and ldflags-y.

Replace EXTRA_CFLAGS with ccflags-y.
Change ASFLAGS and LDFLAGS into asflags-y and ldflags-y; we only need
these flags in this makefile.

16 years ago  CRIS v10: Update boot/compressed/Makefile to use ccflags-y and ldflags-y
Jesper Nilsson [Thu, 17 Jan 2008 10:13:21 +0000 (11:13 +0100)]
CRIS v10: Update boot/compressed/Makefile to use ccflags-y and ldflags-y

Replace use of EXTRA_CFLAGS with ccflags-y and LDFLAGS with ldflags-y
(we only need to change linker flags for this makefile).

16 years ago  CRIS: Add architecture dependent bug.h for CRIS v10 and CRIS v32
Jesper Nilsson [Thu, 17 Jan 2008 09:42:58 +0000 (10:42 +0100)]
CRIS: Add architecture dependent bug.h for CRIS v10 and CRIS v32

16 years ago  CRIS v32: Update and improve kernel/time.c
Jesper Nilsson [Tue, 4 Dec 2007 16:25:45 +0000 (17:25 +0100)]
CRIS v32: Update and improve kernel/time.c

- Shorten include paths to machine dependent header files.
- Register name for first timer is now regi_timer0.
- Remove raw_printk hack, use oops_in_progress instead.
- Add handling of CPU frequency scaling for CRIS.
- Remove regs parameter to timer_interrupt, get them from get_irq_regs instead.
- Whitespace and formatting changes.

16 years ago  CRIS v10: New default config.
Jesper Nilsson [Mon, 3 Dec 2007 10:37:14 +0000 (11:37 +0100)]
CRIS v10: New default config.

16 years ago  CRIS v32: Minor fixes for io.h
Jesper Nilsson [Mon, 3 Dec 2007 10:16:25 +0000 (11:16 +0100)]
CRIS v32: Minor fixes for io.h

- Shorten include paths for machine dependent header files.
- Add volatile to hardware register pointers.
- Add spinlocks around critical region.
- Expand macros for handling of leds.

16 years ago  CRIS v32: Update and improve kernel/traps.c
Jesper Nilsson [Mon, 3 Dec 2007 10:12:10 +0000 (11:12 +0100)]
CRIS v32: Update and improve kernel/traps.c

- Remove watchdog handling, handled elsewhere.
- Shorten include paths to machine dependent header files.
- Remove raw_printk hack, we now use oops_in_progress instead.
- Add handling of BUG for exception handlers (break 14).
- Formatting and whitespace changes.

16 years ago  CRIS v32: Minor updates to kernel/process.c
Jesper Nilsson [Mon, 3 Dec 2007 09:54:15 +0000 (10:54 +0100)]
CRIS v32: Minor updates to kernel/process.c

- Shorten include paths for machine dependent header files.
- Remove unused extern declaration of etrax_gpio_wake_up_check.
- Register name for first timer is now regi_timer0.

16 years ago  CRIS v32: Update and simplify kernel/irq.c.
Jesper Nilsson [Fri, 30 Nov 2007 17:09:54 +0000 (18:09 +0100)]
CRIS v32: Update and simplify kernel/irq.c.

- First timer register has changed name to timer0.
- Build IRQs with only IRQ number, mask bit will be calculated instead.
- Add more IRQs, up to 64 supported.
- Use arrays to hold which IRQs triggered instead of trying to do magic
  with two 32 bit values now that more than 32 IRQs are supported.
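
A minimal sketch of the array approach described in the list above, not the
code from this patch. The names NBR_IRQS, IRQ_WORDS, irq_set_pending and
irq_first_pending are hypothetical; the only facts taken from the log are that
up to 64 IRQs are supported and that the mask bit is calculated from the IRQ
number.

```c
#include <stdint.h>

#define NBR_IRQS  64                      /* "up to 64 supported" */
#define IRQ_WORDS ((NBR_IRQS + 31) / 32)  /* 32-bit words needed to cover them */

/* Mark an IRQ as pending; word index and mask bit are derived from the number. */
static void irq_set_pending(uint32_t pending[IRQ_WORDS], unsigned int irq)
{
	pending[irq / 32] |= UINT32_C(1) << (irq % 32);
}

/* Return the lowest pending IRQ number, or -1 if nothing is pending. */
static int irq_first_pending(const uint32_t pending[IRQ_WORDS])
{
	for (unsigned int w = 0; w < IRQ_WORDS; w++) {
		for (unsigned int b = 0; b < 32; b++) {
			if (pending[w] & (UINT32_C(1) << b))
				return (int)(w * 32 + b);
		}
	}
	return -1;
}
```

Because the word index is computed rather than hard-coded, raising NBR_IRQS
only grows the array; no code has to special-case a "low" and a "high" word.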

16 years ago  CRIS v32: Update kernel/head.S
Jesper Nilsson [Fri, 30 Nov 2007 16:54:12 +0000 (17:54 +0100)]
CRIS v32: Update kernel/head.S

- Shorten include paths for machine specific header files.
- Add magic for booting NAND flash.
- Change CONFIG_ETRAXFS_SIM to CONFIG_ETRAX_VCS_SIM.
- Use assembler macros for initializing hardware (clocks)
- Add stubs for SMP slave CPUs.
- Search for cramfs or jffs2 if no romfs found.
- Initialize l2cache.

16 years ago  CRIS v32: Update and improve fasttimer.c
Jesper Nilsson [Fri, 30 Nov 2007 16:46:11 +0000 (17:46 +0100)]
CRIS v32: Update and improve fasttimer.c

- Change include path to machine dependent header files.
- Remove __INLINE__, it expands to inline anyway.
- Don't initialize static variables.
- Change timers to use fasttimer_t instead of timevals.
- Change name of timeval_cmp to fasttime_cmp to highlight this.
- Register name for first timer is regi_timer0, not regi_timer.
- Whitespace and formatting changes.
- Don't return if we're blocking interrupts, goto done and restore interrupts.
- Disable interrupts while walking the fasttimer list, only restore
  while doing the callback.
- Remove #ifdef DECLARE_WAITQUEUE, this code won't be used in another OS.
- Remove CVS log.

16 years ago  CRIS v32: Include path fix for timex.h
Jesper Nilsson [Fri, 30 Nov 2007 16:28:05 +0000 (17:28 +0100)]
CRIS v32: Include path fix for timex.h

- Shorten include path for machine dependent header files.
- Correct some formatting issues.

16 years ago  CRIS v32: Update debugport.
Jesper Nilsson [Fri, 30 Nov 2007 16:26:23 +0000 (17:26 +0100)]
CRIS v32: Update debugport.

- Shorten include paths to machine dependent headers.
- Add support for fifth serial port.
- Remove CONFIG_ETRAXFS_SIM and CONFIG_ETRAX_DEBUG_PORT_NULL, no longer used.
- Remove raw_printk and stupid_debug hack, no longer needed.
- Remove dummy console stuff, no longer needed.
- Correct some register type names.
- Correct some whitespace errors and formatting.

16 years ago  CRIS v32: Update boot/rescue/head.S code.
Jesper Nilsson [Fri, 30 Nov 2007 16:20:00 +0000 (17:20 +0100)]
CRIS v32: Update boot/rescue/head.S code.

- Add ifdef for ETRAX_AXISFLASHMAP to avoid compiling file unless it is set.
- Use assembler macros for setting up clocks.
- Don't copy image, just jump to it (only works for NOR flash)

16 years ago  CRIS v32: Update boot/compressed/misc.c
Jesper Nilsson [Fri, 30 Nov 2007 16:16:09 +0000 (17:16 +0100)]
CRIS v32: Update boot/compressed/misc.c

- Shorten include paths to machine specific headers.
- Remove fill_inbuf, not defined here.
- Return __dest as value from memcpy.
- Enable serial port hardware transmitter and receiver in serial_setup.
- Correct baudrate divisor calculation, changed from 4800 to 115200.
- Add support for Artpec-3 specific serial port setup.
- Initialize pinmux for the correct serial port.

16 years ago  CRIS v32: Update compressed head.S
Jesper Nilsson [Fri, 30 Nov 2007 15:40:26 +0000 (16:40 +0100)]
CRIS v32: Update compressed head.S

- Fixes for NAND and NOR flash booting.
- Use assembler macros for common tasks (clocks, general io etc)
- Use (EtraxFS or Artpec-3) machine specific include for dram and hardware init.

16 years ago  CRIS v32: Remove common gpio and nandflash, add mach-fs and mach-a3 as subdirs.
Jesper Nilsson [Fri, 30 Nov 2007 15:30:58 +0000 (16:30 +0100)]
CRIS v32: Remove common gpio and nandflash, add mach-fs and mach-a3 as subdirs.

Also add board_mmcspi to build if ETRAX_SPI_MMC_BOARD is set.
(Generic MMC SPI implementation)

16 years ago  CRIS v32: Update boot rescue Kbuild makefile.
Jesper Nilsson [Fri, 30 Nov 2007 15:28:26 +0000 (16:28 +0100)]
CRIS v32: Update boot rescue Kbuild makefile.

- Remove old specific targets, use more generic ones instead.
- Use if_changed to avoid creating new images when no change.
- Use EXTRA_CFLAGS instead of CFLAGS.

16 years ago  CRIS v32: Update boot compressed Kbuild makefile.
Jesper Nilsson [Wed, 30 Jan 2008 11:52:51 +0000 (12:52 +0100)]
CRIS v32: Update boot compressed Kbuild makefile.

- Remove old specific targets, use more generic ones instead.
- Use if_changed to avoid creating new images when no change.
- Use KBUILD_CFLAGS instead of CFLAGS.

16 years ago  CRIS v32: Update boot Kbuild makefile.
Jesper Nilsson [Fri, 30 Nov 2007 15:24:07 +0000 (16:24 +0100)]
CRIS v32: Update boot Kbuild makefile.

- Remove old specific targets, use more generic ones instead.
- Use if_changed to avoid creating new images when no change.

16 years ago  CRIS v32: Update traps.c
Jesper Nilsson [Fri, 30 Nov 2007 15:22:50 +0000 (16:22 +0100)]
CRIS v32: Update traps.c

- Remove raw_printk hack, use oops_in_progress instead.
- When ETRAX_WATCHDOG_NICE_DOGGY is set, loop in trap after oops dump
  instead of rebooting.
- Break long lines to less than 80 chars.
- Fix whitespace errors.
- Remove unnecessary comments.

16 years ago  CRIS v10: Update and improve axisflashmap.c
Jesper Nilsson [Fri, 30 Nov 2007 15:17:21 +0000 (16:17 +0100)]
CRIS v10: Update and improve axisflashmap.c

- Add config to use mtd0 as whole flash device.
- Fix whitespace errors.
- Remove braces around single statement ifs.
- Break long lines.
- Remove unnecessary CVS log.

16 years ago  CRIS v10: Update rescue head.s
Jesper Nilsson [Fri, 30 Nov 2007 15:13:29 +0000 (16:13 +0100)]
CRIS v10: Update rescue head.s

- Correct whitespace problems.
- Add ifdef for ETRAX_AXISFLASHMAP to avoid compile error when not set.

16 years ago  CRIS v10: Update rescue Kbuild makefile.
Jesper Nilsson [Fri, 30 Nov 2007 15:11:38 +0000 (16:11 +0100)]
CRIS v10: Update rescue Kbuild makefile.

- Remove old specific targets, use more generic ones instead.
- Use if_changed to avoid creating new images when no change.
  Removes a lot of cruft.
- Use EXTRA_CFLAGS instead of CFLAGS.

16 years ago  CRIS v10: Update boot/compressed Kbuild makefile.
Jesper Nilsson [Fri, 30 Nov 2007 15:10:30 +0000 (16:10 +0100)]
CRIS v10: Update boot/compressed Kbuild makefile.

- Remove old specific targets, use more generic ones instead.
- Use if_changed to avoid creating new images when no change.
- Use EXTRA_CFLAGS instead of CFLAGS.

16 years ago  CRIS v10: Update boot Kbuild makefile.
Jesper Nilsson [Fri, 30 Nov 2007 15:08:34 +0000 (16:08 +0100)]
CRIS v10: Update boot Kbuild makefile.

- Remove old specific targets, use more generic ones instead.

16 years ago  CRIS: Update main Kbuild makefile.
Jesper Nilsson [Fri, 30 Nov 2007 15:07:06 +0000 (16:07 +0100)]
CRIS: Update main Kbuild makefile.

- Remove old and non-generic targets, use generic ones instead.
- Add sub-arch as mach-fs or mach-a3 for EtraxFS and Artpec-3 respectively.
- Add links to sub-arch directories, and erase before trying to create them.
- Include from sub-arch specific include directory "mach".
- Add files to be cleaned in CLEAN_FILES instead of as archclean target.

16 years ago  CRIS v32: Update and improve axisflashmap
Jesper Nilsson [Fri, 30 Nov 2007 15:01:53 +0000 (16:01 +0100)]
CRIS v32: Update and improve axisflashmap

- Use default partition table when no partition is found (for initial tests)
- Add config ETRAX_AXISFLASHMAP_MTD0WHOLE to allow whole flash as mtd0.
- Add config for VCS simulator connection.

16 years ago  CRIS v32: New version of I2C driver.
Jesper Nilsson [Fri, 30 Nov 2007 14:54:01 +0000 (15:54 +0100)]
CRIS v32: New version of I2C driver.

- Add i2c_write and i2c_read as functions.
- Use spinlocks for critical regions.
- Add config item to set I2C data and clock port.
- Put unneeded testcode inside #if 0.
- Remove CVS id tag.

16 years ago  CRIS v32: Fixup kernel Makefile.
Jesper Nilsson [Fri, 30 Nov 2007 14:47:34 +0000 (15:47 +0100)]
CRIS v32: Fixup kernel Makefile.

- Remove CRISv32 common arbiter, dma, io and pinmux files,
  they are now defined in machine dependent directories.
- Add cache and cacheflush files for working around cache problems
  in CRISv32 chips.

16 years ago  CRIS v32: Update entry.S to working order.
Jesper Nilsson [Fri, 30 Nov 2007 14:44:07 +0000 (15:44 +0100)]
CRIS v32: Update entry.S to working order.

- Remove oldset parameter.
- Utilise delay-slot for parameter moving.
- Add kernel_execve as break 13.
- Add new kernel syscalls.

16 years ago  CRIS: Remove define ARCH_HAS_DMA_DECLARE_COHERENT_MEMORY
Jesper Nilsson [Fri, 30 Nov 2007 14:40:21 +0000 (15:40 +0100)]
CRIS: Remove define ARCH_HAS_DMA_DECLARE_COHERENT_MEMORY

16 years ago  CRIS v32: Whitespace and formatting changes for kernel/ptrace.c
Jesper Nilsson [Fri, 30 Nov 2007 13:14:54 +0000 (14:14 +0100)]
CRIS v32: Whitespace and formatting changes for kernel/ptrace.c

16 years ago  CRIS: Minor generic kernel/traps.c changes.
Jesper Nilsson [Fri, 30 Nov 2007 13:11:29 +0000 (14:11 +0100)]
CRIS: Minor generic kernel/traps.c changes.

- Collect extern declarations at top of file.
- Change raw_printk to printk, use oops_in_progress instead.
- Fix formatting and whitespace.
- Allow the watchdog to be disabled during oops.

16 years ago  CRIS: Minor fixes to mm/fault.c
Jesper Nilsson [Fri, 30 Nov 2007 12:59:57 +0000 (13:59 +0100)]
CRIS: Minor fixes to mm/fault.c

- Only disallow oops if we're in_interrupt context (was in_atomic before)
- Use the generic oops_in_progress instead of the raw_printk hack.
- Fix whitespace/formatting.
- Remove CVS log entries.

16 years ago  CRIS v32: Add headers for EtraxFS and Artpec-3 chips.
Jesper Nilsson [Fri, 30 Nov 2007 09:12:31 +0000 (10:12 +0100)]
CRIS v32: Add headers for EtraxFS and Artpec-3 chips.

16 years ago  CRIS v32: Add prototypes for cache flushing
Jesper Nilsson [Fri, 30 Nov 2007 09:11:43 +0000 (10:11 +0100)]
CRIS v32: Add prototypes for cache flushing

We need these to work around some cache bugs in CRISv32 chips.

16 years ago  CRIS: Remove unnecessary CVS log from cris/mm/init.c
Jesper Nilsson [Thu, 29 Nov 2007 17:19:42 +0000 (18:19 +0100)]
CRIS: Remove unnecessary CVS log from cris/mm/init.c

16 years ago  CRIS v32: Update asm-cris/arch-v32/irq.h for ETRAX FS and ARTPEC-3
Jesper Nilsson [Tue, 15 Jan 2008 10:59:12 +0000 (11:59 +0100)]
CRIS v32: Update asm-cris/arch-v32/irq.h for ETRAX FS and ARTPEC-3

- Correct include to use <>
- Rework calculation of number of IRQs and exceptions we have.
- Remove useless "mask" argument to BUILD_IRQ macro

16 years ago  CRIS: Merge axisflashmap.h with Axis internal changes.
Jesper Nilsson [Thu, 29 Nov 2007 16:58:06 +0000 (17:58 +0100)]
CRIS: Merge axisflashmap.h with Axis internal changes.

- Add partition table struct to be used to parse partition table in flash.
- Add JFFS2 as a type, and add readonly flag.
- Improve some comments.
- Lindent has been run, fixing whitespace and formatting issues.

16 years ago  CRIS v32: Update synchronous serial driver.
Jesper Nilsson [Thu, 29 Nov 2007 16:30:24 +0000 (17:30 +0100)]
CRIS v32: Update synchronous serial driver.

Now uses a DMA descriptor ring, which should avoid any unnecessary
pauses in the streams.

16 years ago  CRIS v32: Add SECOND_WORD_SYNC, used in sync_serial.
Jesper Nilsson [Thu, 29 Nov 2007 16:26:24 +0000 (17:26 +0100)]
CRIS v32: Add SECOND_WORD_SYNC, used in sync_serial.

16 years ago  CRIS v32: Add L2 cache initialization code.
Jesper Nilsson [Thu, 29 Nov 2007 16:24:10 +0000 (17:24 +0100)]
CRIS v32: Add L2 cache initialization code.

16 years ago  CRIS v32: Add hardware dependent include files and defconfigs for ETRAX FS and ARTPEC...
Jesper Nilsson [Thu, 29 Nov 2007 16:21:59 +0000 (17:21 +0100)]
CRIS v32: Add hardware dependent include files and defconfigs for ETRAX FS and ARTPEC-3 chips.

The header files describe the hardware registers available in both
these chips, note that most of this documentation is automatically
generated from the hardware implementation.

16 years ago  CRIS v32: Add new machine dependent files for Etrax-FS and Artpec-3.
Jesper Nilsson [Thu, 29 Nov 2007 16:11:23 +0000 (17:11 +0100)]
CRIS v32: Add new machine dependent files for Etrax-FS and Artpec-3.

The two chips are somewhat different and need different handling.
Adds handling of the dma, dram initialization, hardware settings, io,
memory arbiter and pinmux.

Also moves the dma, dram initialization and io from CRIS v32 common files.

16 years ago  CRIS v32: Add new driver files for Etrax-FS
Jesper Nilsson [Thu, 29 Nov 2007 16:05:58 +0000 (17:05 +0100)]
CRIS v32: Add new driver files for Etrax-FS

Adds gpio and nandflash handling for Etrax-FS

16 years ago  CRIS v32: Add new driver files for Artpec-3.
Jesper Nilsson [Thu, 29 Nov 2007 16:03:41 +0000 (17:03 +0100)]
CRIS v32: Add new driver files for Artpec-3.

Adds gpio and nandflash handling for Artpec-3.

16 years ago  CRIS: Rearrange Kconfigs for v10 and v32 to allow compilation without warnings.
Jesper Nilsson [Wed, 5 Dec 2007 17:10:36 +0000 (18:10 +0100)]
CRIS: Rearrange Kconfigs for v10 and v32 to allow compilation without warnings.

- Remove some unneeded configs and add some new ones.
- Merge common config items to common file instead of duplicating them.
- Pull in standard Kconfig.preempt.
- Remove some unneeded Kconfigs for subsystems not (yet) available on CRIS
  (md, scsi, ieee1394, i2o, isdn, telephony, media, pcmcia, pci)
- Rename CRISv32 config items which had different types from CRISv10.
  (ETRAX_LED2G, ETRAX_LED2R, ETRAX_LED3G, ETRAX_LED3R, ETRAX_I2C_DATA_PORT,
  ETRAX_I2C_CLK_PORT)

16 years ago  Merge git://git.kernel.org/pub/scm/linux/kernel/git/agk/linux-2.6-dm
Linus Torvalds [Fri, 8 Feb 2008 03:30:50 +0000 (19:30 -0800)]
Merge git://git./linux/kernel/git/agk/linux-2.6-dm

* git://git.kernel.org/pub/scm/linux/kernel/git/agk/linux-2.6-dm: (44 commits)
  dm raid1: report fault status
  dm raid1: handle read failures
  dm raid1: fix EIO after log failure
  dm raid1: handle recovery failures
  dm raid1: handle write failures
  dm snapshot: combine consecutive exceptions in memory
  dm: stripe enhanced status return
  dm: stripe trigger event on failure
  dm log: auto load modules
  dm: move deferred bio flushing to workqueue
  dm crypt: use async crypto
  dm crypt: prepare async callback fn
  dm crypt: add completion for async
  dm crypt: add async request mempool
  dm crypt: extract scatterlist processing
  dm crypt: tidy io ref counting
  dm crypt: introduce crypt_write_io_loop
  dm crypt: abstract crypt_write_done
  dm crypt: store sector mapping in dm_crypt_io
  dm crypt: move queue functions
  ...

16 years ago  Merge branch 'release' of git://lm-sensors.org/kernel/mhoffman/hwmon-2.6
Linus Torvalds [Fri, 8 Feb 2008 03:15:38 +0000 (19:15 -0800)]
Merge branch 'release' of git://lm-sensors.org/kernel/mhoffman/hwmon-2.6

* 'release' of git://lm-sensors.org/kernel/mhoffman/hwmon-2.6: (59 commits)
  hwmon: (lm80) Add individual alarm files
  hwmon: (lm80) De-macro the sysfs callbacks
  hwmon: (lm80) Various cleanups
  hwmon: (w83627hf) Refactor beep enable handling
  hwmon: (w83627hf) Add individual alarm and beep files
  hwmon: (w83627hf) Enable VBAT monitoring
  hwmon: (w83627ehf) The W83627DHG has 8 VID pins
  hwmon: (asb100) Add individual alarm files
  hwmon: (asb100) De-macro the sysfs callbacks
  hwmon: (asb100) Various cleanups
  hwmon: VRM is not written to registers
  hwmon: (dme1737) fix Super-IO device ID override
  hwmon: (dme1737) fix divide-by-0
  hwmon: (abituguru3) Add AUX4 fan input for Abit IP35 Pro
  hwmon: Add support for Texas Instruments/Burr-Brown ADS7828
  hwmon: (adm9240) Add individual alarm files
  hwmon: (lm77) Add individual alarm files
  hwmon: Discard useless I2C driver IDs
  hwmon: (lm85) Make the pwmN_enable files writable
  hwmon: (lm85) Return standard values in pwmN_enable
  ...

16 years ago  Merge branch 'for-linus' of git://oss.sgi.com:8090/xfs/xfs-2.6
Linus Torvalds [Fri, 8 Feb 2008 03:12:12 +0000 (19:12 -0800)]
Merge branch 'for-linus' of git://oss.sgi.com:8090/xfs/xfs-2.6

* 'for-linus' of git://oss.sgi.com:8090/xfs/xfs-2.6: (62 commits)
  [XFS] add __init/__exit mark to specific init/cleanup functions
  [XFS] Fix oops in xfs_file_readdir()
  [XFS] kill xfs_root
  [XFS] keep i_nlink updated and use proper accessors
  [XFS] stop updating inode->i_blocks
  [XFS] Make xfs_ail_check check less by default
  [XFS] Move AIL pushing into it's own thread
  [XFS] use generic_permission
  [XFS] stop re-checking permissions in xfs_swapext
  [XFS] clean up xfs_swapext
  [XFS] remove permission check from xfs_change_file_space
  [XFS] prevent panic during log recovery due to bogus op_hdr length
  [XFS] Cleanup various fid related bits:
  [XFS] Fix xfs_lowbit64
  [XFS] Remove CFORK macros and use code directly in IFORK and DFORK macros.
  [XFS] kill superflous buffer locking (2nd attempt)
  [XFS] Use kernel-supplied "roundup_pow_of_two" for simplicity
  [XFS] Remove the BPCSHIFT and NB* based macros from XFS.
  [XFS] Remove bogus assert
  [XFS] optimize XFS_IS_REALTIME_INODE w/o realtime config
  ...

16 years ago  Convert SG from nopage to fault.
Nick Piggin [Fri, 8 Feb 2008 02:46:06 +0000 (18:46 -0800)]
Convert SG from nopage to fault.

Signed-off-by: Nick Piggin <npiggin@suse.de>
Cc: Douglas Gilbert <dougg@torque.net>
Cc: James Bottomley <James.Bottomley@steeleye.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

16 years ago  Merge branch 'slub-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/christoph/vm
Linus Torvalds [Fri, 8 Feb 2008 02:22:29 +0000 (18:22 -0800)]
Merge branch 'slub-linus' of git://git./linux/kernel/git/christoph/vm

* 'slub-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/christoph/vm:
  SLUB: fix checkpatch warnings
  Use non atomic unlock
  SLUB: Support for performance statistics
  SLUB: Alternate fast paths using cmpxchg_local
  SLUB: Use unique end pointer for each slab page.
  SLUB: Deal with annoying gcc warning on kfree()

16 years ago  dm raid1: report fault status
Jonathan Brassow [Fri, 8 Feb 2008 02:11:39 +0000 (02:11 +0000)]
dm raid1: report fault status

This patch adds extra information to the mirror status output, so that
it can be determined which device(s) have failed.  For each mirror device,
a character is printed indicating the most severe error encountered.  The
characters are:
 *    A => Alive - No failures
 *    D => Dead - A write failure occurred leaving mirror out-of-sync
 *    S => Sync - A synchronization failure occurred, mirror out-of-sync
 *    R => Read - A read failure occurred, mirror data unaffected
This allows userspace to properly reconfigure the mirror set.
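
As an illustration only (not the code from this patch), picking the character
for one mirror device could look like the sketch below. The enum names and the
function mirror_status_char are hypothetical, and the severity order
D > S > R is an assumption read off the order of the list above.

```c
/* Hypothetical error bits recorded per mirror device. */
enum mirror_error {
	ERR_WRITE = 1 << 0,  /* write failure, mirror out-of-sync */
	ERR_SYNC  = 1 << 1,  /* synchronization failure, mirror out-of-sync */
	ERR_READ  = 1 << 2   /* read failure, mirror data unaffected */
};

/* Return the status character for the most severe error seen on a device. */
static char mirror_status_char(unsigned int errors)
{
	if (errors & ERR_WRITE)
		return 'D';
	if (errors & ERR_SYNC)
		return 'S';
	if (errors & ERR_READ)
		return 'R';
	return 'A';  /* no failures recorded */
}
```

A device that has seen both a read and a write failure reports 'D', since only
the most severe error is shown.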

Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

16 years ago  dm raid1: handle read failures
Jonathan Brassow [Fri, 8 Feb 2008 02:11:37 +0000 (02:11 +0000)]
dm raid1: handle read failures

This patch gives the ability to respond-to/record device failures
that happen during read operations.  It also adds the ability to
read from mirror devices that are not the primary if they are
in-sync.

There are essentially two read paths in mirroring; the direct path
and the queued path.  When a read request is mapped, if the region
is 'in-sync' the direct path is taken; otherwise the queued path
is taken.

If the direct path is taken, we must record bio information so that
if the read fails we can retry it.  We then discover the status of
a direct read through mirror_end_io.  If the read has failed, we will
mark the device from which the read was attempted as failed (so we
don't try to read from it again), restore the bio and try again.

If the queued path is taken, we discover the results of the read
from 'read_callback'.  If the device failed, we will mark the device
as failed and attempt the read again if there is another device
where this region is known to be 'in-sync'.

Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

16 years ago  dm raid1: fix EIO after log failure
Jonathan Brassow [Fri, 8 Feb 2008 02:11:35 +0000 (02:11 +0000)]
dm raid1: fix EIO after log failure

This patch adds the ability to requeue write I/O to
core device-mapper when there is a log device failure.

If a write to the log produces an error, the pending writes are
put on the "failures" list.  Since the log is marked as failed,
they will stay on the failures list until a suspend happens.

Suspends come in two phases, presuspend and postsuspend.  We must
make sure that all the writes on the failures list are requeued
in the presuspend phase (a requirement of dm core).  This means
that recovery must be complete (because writes may be delayed
behind it) and the failures list must be requeued before we
return from presuspend.

The mechanisms to ensure recovery is complete (or stopped) were
already in place, but needed to be moved from postsuspend to
presuspend.  We rely on 'flush_workqueue' to ensure that the
mirror thread is complete and has therefore requeued all writes
in the failures list.

Because we are using flush_workqueue, we must ensure that no
additional 'queue_work' calls will produce additional I/O
that we need to requeue (because once we return from
presuspend, we are unable to do anything about it).  'queue_work'
is called in response to the following functions:
- complete_resync_work = NA, recovery is stopped
- rh_dec (mirror_end_io) = NA, only calls 'queue_work' if it
                           is ready to recover the region
                           (recovery is stopped) or it needs
                           to clear the region in the log*
                           **this doesn't get called while
                           suspending**
- rh_recovery_end = NA, recovery is stopped
- rh_recovery_start = NA, recovery is stopped
- write_callback = 1) Writes w/o failures simply call
                   bio_endio -> mirror_end_io -> rh_dec
                   (see rh_dec above)
                   2) Writes with failures are put on
                   the failures list and queue_work is
                   called**
                   ** write_callbacks don't happen
                   during suspend **
- do_failures = NA, 'queue_work' not called if suspending
- add_mirror (initialization) = NA, only done on mirror creation
- queue_bio = NA, 1) delayed I/O scheduled before flush_workqueue
              is called.  2) No more I/Os are being issued.
              3) Re-attempted READs can still be handled.
              (Write completions are handled through rh_dec/
              write_callback - mentioned above - and do not
              use queue_bio.)

Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

16 years ago  dm raid1: handle recovery failures
Jonathan Brassow [Fri, 8 Feb 2008 02:11:32 +0000 (02:11 +0000)]
dm raid1: handle recovery failures

This patch adds the calls to 'fail_mirror' if an error occurs during
mirror recovery (aka resynchronization).  'fail_mirror' is responsible
for recording the type of error by mirror device and ensuring an event
gets raised for the purpose of notifying userspace.

Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

16 years ago  dm raid1: handle write failures
Jonathan Brassow [Fri, 8 Feb 2008 02:11:29 +0000 (02:11 +0000)]
dm raid1: handle write failures

This patch gives mirror the ability to handle device failures
during normal write operations.

The 'write_callback' function is called when a write completes.
If all the writes failed or succeeded, we report failure or
success respectively.  If some of the writes failed, we call
fail_mirror, which increments the error count for the device, notes
the type of error encountered (DM_RAID1_WRITE_ERROR), and
selects a new primary (if necessary).  Note that the primary
device can never change while the mirror is not in-sync (IOW,
while recovery is happening.)  This means that the scenario
where a failed write changes the primary and gives
recovery_complete a chance to misread the primary never happens.
The fact that the primary can change has necessitated the change
to the default_mirror field.  We need to protect against reading
garbage while the primary changes.  We then add the bio to a new
list in the mirror set, 'failures'.  For every bio in the 'failures'
list, we call a new function, '__bio_mark_nosync', where we mark
the region 'not-in-sync' in the log and properly set the region
state as RH_NOSYNC.  Userspace must also be notified of the
failure.  This is done by 'raising an event' (dm_table_event()).
If fail_mirror is called in process context the event can be raised
right away.  If in interrupt context, the event is deferred to the
kmirrord thread - which raises the event if 'event_waiting' is set.
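
The three-way decision made when a write completes can be sketched as follows.
This is an illustration under assumptions, not the patch itself: the names
classify_write and enum write_outcome are hypothetical, and the per-mirror
results are assumed to arrive as a bitmask with bit i set when the write to
mirror i failed.

```c
#include <limits.h>

enum write_outcome { WRITE_ALL_OK, WRITE_ALL_FAILED, WRITE_PARTIAL };

/* Classify a completed write from the per-mirror failure bits. */
static enum write_outcome classify_write(unsigned long error_bits,
					 unsigned int nr_mirrors)
{
	unsigned long all;

	/* Build a mask with one bit per mirror, avoiding an oversized shift. */
	if (nr_mirrors >= sizeof(unsigned long) * CHAR_BIT)
		all = ~0UL;
	else
		all = (1UL << nr_mirrors) - 1;

	error_bits &= all;
	if (error_bits == 0)
		return WRITE_ALL_OK;      /* report success */
	if (error_bits == all)
		return WRITE_ALL_FAILED;  /* report failure */
	return WRITE_PARTIAL;             /* some mirrors failed: fail them, mark nosync */
}
```

Only the WRITE_PARTIAL case corresponds to the path described above, where the
failed devices are recorded and the bio is moved to the 'failures' list.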

Backwards compatibility is maintained by ignoring errors if
the DM_FEATURES_HANDLE_ERRORS flag is not present.

Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

16 years ago  dm snapshot: combine consecutive exceptions in memory
Milan Broz [Fri, 8 Feb 2008 02:11:27 +0000 (02:11 +0000)]
dm snapshot: combine consecutive exceptions in memory

Provided sector_t is 64 bits, reduce the in-memory footprint of the
snapshot exception table by the simple method of using unused bits of
the chunk number to combine consecutive entries.
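
A sketch of the packing idea, for illustration only. The 8/56 bit split, the
struct layout and the names chunk_number, consecutive_count and try_extend are
assumptions made for this example; the log only states that unused bits of a
64-bit chunk number are used to combine consecutive entries.

```c
#include <stdint.h>

#define CONSECUTIVE_BITS 8                           /* assumed split */
#define NUMBER_BITS      (64 - CONSECUTIVE_BITS)
#define NUMBER_MASK      ((UINT64_C(1) << NUMBER_BITS) - 1)
#define CONSECUTIVE_MAX  ((1u << CONSECUTIVE_BITS) - 1)

struct exception {
	uint64_t old_chunk;  /* first chunk on the origin */
	uint64_t new_chunk;  /* low bits: first chunk on the snapshot store,
	                        high bits: number of consecutive chunks that follow */
};

static uint64_t chunk_number(uint64_t packed)
{
	return packed & NUMBER_MASK;
}

static unsigned int consecutive_count(uint64_t packed)
{
	return (unsigned int)(packed >> NUMBER_BITS);
}

/* Fold (old_chunk, new_chunk) into e if it directly follows e's range.
 * Returns 1 if merged, 0 if a separate entry is needed. */
static int try_extend(struct exception *e, uint64_t old_chunk, uint64_t new_chunk)
{
	unsigned int n = consecutive_count(e->new_chunk);

	if (n >= CONSECUTIVE_MAX)
		return 0;
	if (old_chunk != e->old_chunk + n + 1 ||
	    new_chunk != chunk_number(e->new_chunk) + n + 1)
		return 0;
	e->new_chunk += UINT64_C(1) << NUMBER_BITS;  /* bump the count field */
	return 1;
}
```

A run of consecutive chunks then costs one table entry instead of one entry
per chunk, which is where the memory saving comes from.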

Signed-off-by: Milan Broz <mbroz@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

16 years ago  dm: stripe enhanced status return
Brian Wood [Fri, 8 Feb 2008 02:11:24 +0000 (02:11 +0000)]
dm: stripe enhanced status return

This patch adds additional information to the status line. It is added at the
end of the returned text so it will not interfere with existing
implementations using this data. The addition of this information will allow
for a common return interface to match that returned with the dm-raid1.c
status line (with Jonathan Brassow's patches).

Here is a sample of what is returned with a mirror "status" call:
isw_eeaaabgfg_mirror: 0 488390920 mirror 2 8:16 8:32 3727/3727 1 AA 1 core

Here's what's returned with this patch for a stripe "status" call:
isw_dheeijjdej_stripe: 0 976783872 striped 2 8:16 8:32 1 AA

Signed-off-by: Brian Wood <brian.j.wood@intel.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

16 years ago  dm: stripe trigger event on failure
Brian Wood [Fri, 8 Feb 2008 02:11:22 +0000 (02:11 +0000)]
dm: stripe trigger event on failure

This patch adds the stripe_end_io function to process errors that might
occur after an IO operation. As part of this there are a number of
enhancements made to record and trigger events:

- New atomic variable in struct stripe to record the number of
errors each stripe volume device has experienced (could be used
later with uevents to report back directly to userspace)

- New workqueue/work struct setup to process the trigger_event function

- New end_io function. It is here that testing for BIO error conditions
takes place. It determines the exact stripe that caused the error,
records this in the new atomic variable, and calls the queue_work() function

- New trigger_event function to process failure events. This
calls dm_table_event()
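
The per-device error counter from the list above can be sketched in userspace
C11 as follows. This is only an analogy: the kernel uses its own atomic_t API,
and struct stripe_dev and stripe_record_error are hypothetical names. The
workqueue step that raises the event is left out.

```c
#include <stdatomic.h>

/* One entry per stripe volume device (hypothetical layout). */
struct stripe_dev {
	atomic_int error_count;
};

/* Record an I/O error against the device at index failed.
 * Returns the new error count, or -1 if the index is out of range. */
static int stripe_record_error(struct stripe_dev *devs, unsigned int nr_devs,
			       unsigned int failed)
{
	if (failed >= nr_devs)
		return -1;
	/* atomic_fetch_add returns the old value, so add one for the new count. */
	return atomic_fetch_add(&devs[failed].error_count, 1) + 1;
}
```

An atomic counter matters here because completion handlers for different bios
may run concurrently and touch the same device entry.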

Signed-off-by: Brian Wood <brian.j.wood@intel.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

16 years ago  dm log: auto load modules
Jonathan Brassow [Fri, 8 Feb 2008 02:11:19 +0000 (02:11 +0000)]
dm log: auto load modules

If the log type is not recognised, attempt to load the module
'dm-log-<type>.ko'.
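
For illustration, the file name pattern above can be built like this in plain
C. The helper name dm_log_module_file and the log type "example" used below
are hypothetical; the actual patch asks the kernel module loader for the
module rather than formatting a file name by hand.

```c
#include <stdio.h>

/* Format "dm-log-<type>.ko" into buf.
 * Returns 0 on success, -1 if the name does not fit or formatting fails. */
static int dm_log_module_file(char *buf, size_t len, const char *type)
{
	int n = snprintf(buf, len, "dm-log-%s.ko", type);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

Calling dm_log_module_file(buf, sizeof(buf), "example") with a large enough
buffer leaves "dm-log-example.ko" in buf.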

Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

16 years ago  dm: move deferred bio flushing to workqueue
Milan Broz [Fri, 8 Feb 2008 02:11:17 +0000 (02:11 +0000)]
dm: move deferred bio flushing to workqueue

Add a single-thread workqueue for each mapped device
and move flushing of the lists of pushback and deferred bios
to this new workqueue.

Signed-off-by: Milan Broz <mbroz@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

16 years ago  dm crypt: use async crypto
Milan Broz [Fri, 8 Feb 2008 02:11:14 +0000 (02:11 +0000)]
dm crypt: use async crypto

dm-crypt: Use crypto ablkcipher interface

Move encrypt/decrypt core to async crypto call.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Milan Broz <mbroz@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

16 years ago  dm crypt: prepare async callback fn
Milan Broz [Fri, 8 Feb 2008 02:11:12 +0000 (02:11 +0000)]
dm crypt: prepare async callback fn

dm-crypt: Use crypto ablkcipher interface

Prepare callback function for async crypto operation.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Milan Broz <mbroz@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

16 years ago  dm crypt: add completion for async
Milan Broz [Fri, 8 Feb 2008 02:11:09 +0000 (02:11 +0000)]
dm crypt: add completion for async

dm-crypt: Use crypto ablkcipher interface
Prepare completion for async crypto request.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Milan Broz <mbroz@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
16 years agodm crypt: add async request mempool
Milan Broz [Fri, 8 Feb 2008 02:11:07 +0000 (02:11 +0000)]
dm crypt: add async request mempool

dm-crypt: Use crypto ablkcipher interface

Introduce mempool for async crypto requests.

cc->req is used mainly during synchronous operations
(to prevent allocation and deallocation of the same object).

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Milan Broz <mbroz@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
16 years agodm crypt: extract scatterlist processing
Milan Broz [Fri, 8 Feb 2008 02:11:04 +0000 (02:11 +0000)]
dm crypt: extract scatterlist processing

dm-crypt: Use crypto ablkcipher interface

Move scatterlists to separate dm_crypt_struct and
pick out block processing from crypt_convert.

Signed-off-by: Milan Broz <mbroz@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
16 years agodm crypt: tidy io ref counting
Milan Broz [Fri, 8 Feb 2008 02:11:02 +0000 (02:11 +0000)]
dm crypt: tidy io ref counting

Make io reference counting more obvious.

Signed-off-by: Milan Broz <mbroz@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
16 years agodm crypt: introduce crypt_write_io_loop
Milan Broz [Fri, 8 Feb 2008 02:10:59 +0000 (02:10 +0000)]
dm crypt: introduce crypt_write_io_loop

Introduce crypt_write_io_loop().

Signed-off-by: Milan Broz <mbroz@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
16 years agodm crypt: abstract crypt_write_done
Milan Broz [Fri, 8 Feb 2008 02:10:57 +0000 (02:10 +0000)]
dm crypt: abstract crypt_write_done

Process write request in separate function and queue
final bio through io workqueue.

Signed-off-by: Milan Broz <mbroz@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
16 years agodm crypt: store sector mapping in dm_crypt_io
Milan Broz [Fri, 8 Feb 2008 02:10:54 +0000 (02:10 +0000)]
dm crypt: store sector mapping in dm_crypt_io

Add sector into dm_crypt_io instead of using local variable.

Signed-off-by: Milan Broz <mbroz@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
16 years agodm crypt: move queue functions
Alasdair G Kergon [Fri, 8 Feb 2008 02:10:52 +0000 (02:10 +0000)]
dm crypt: move queue functions

Reorder kcryptd functions for clarity.

Signed-off-by: Alasdair G Kergon <agk@redhat.com>
16 years agodm crypt: adjust io processing functions
Milan Broz [Fri, 8 Feb 2008 02:10:49 +0000 (02:10 +0000)]
dm crypt: adjust io processing functions

Rename functions to follow calling convention.
Prepare write io error processing function skeleton.

Signed-off-by: Milan Broz <mbroz@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
16 years agodm crypt: tidy crypt_endio
Milan Broz [Fri, 8 Feb 2008 02:10:46 +0000 (02:10 +0000)]
dm crypt: tidy crypt_endio

Simplify crypt_endio function.

Signed-off-by: Milan Broz <mbroz@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
16 years agodm crypt: move error setting outside crypt_dec_pending
Milan Broz [Fri, 8 Feb 2008 02:10:43 +0000 (02:10 +0000)]
dm crypt: move error setting outside crypt_dec_pending

Move error code setting outside of crypt_dec_pending function.
Use -EIO if crypt_convert_scatterlist() fails.

Signed-off-by: Milan Broz <mbroz@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
16 years agodm crypt: remove unnecessary crypt_context write parm
Milan Broz [Fri, 8 Feb 2008 02:10:41 +0000 (02:10 +0000)]
dm crypt: remove unnecessary crypt_context write parm

Remove write attribute from convert_context and use bio flag instead.

Signed-off-by: Milan Broz <mbroz@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
16 years agodm crypt: move convert_context inside dm_crypt_io
Milan Broz [Fri, 8 Feb 2008 02:10:38 +0000 (02:10 +0000)]
dm crypt: move convert_context inside dm_crypt_io

Move convert_context inside dm_crypt_io.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Milan Broz <mbroz@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
16 years agodm mpath: add missing static
Alasdair G Kergon [Fri, 8 Feb 2008 02:10:35 +0000 (02:10 +0000)]
dm mpath: add missing static

A static declaration was missing.

Signed-off-by: Alasdair G Kergon <agk@redhat.com>
16 years agodm: targets no longer experimental
Alasdair G Kergon [Fri, 8 Feb 2008 02:10:32 +0000 (02:10 +0000)]
dm: targets no longer experimental

Drop the EXPERIMENTAL tag from well-established device-mapper targets, so
the newer ones stand out better.

Signed-off-by: Alasdair G Kergon <agk@redhat.com>
16 years agodm: refactor dm_suspend completion wait
Milan Broz [Fri, 8 Feb 2008 02:10:30 +0000 (02:10 +0000)]
dm: refactor dm_suspend completion wait

Move completion wait to separate function

Signed-off-by: Milan Broz <mbroz@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
16 years agodm: split dm_suspend io_lock hold into two
Milan Broz [Fri, 8 Feb 2008 02:10:27 +0000 (02:10 +0000)]
dm: split dm_suspend io_lock hold into two

Change io_locking to allow processing flush in separate thread.

Because we have DMF_BLOCK_IO already set, any possible
new ios are queued in dm_requests now.

In the case of interrupting previous wait there can be more
ios queued (we unlocked io_lock for a while) but this is safe.

Signed-off-by: Milan Broz <mbroz@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
16 years agodm: tidy dm_suspend
Milan Broz [Fri, 8 Feb 2008 02:10:25 +0000 (02:10 +0000)]
dm: tidy dm_suspend

Tidy dm_suspend function

 - change return value logic in dm_suspend
 - use atomic_read only once.
 - move DMF_BLOCK_IO clearing into one place

Signed-off-by: Milan Broz <mbroz@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
16 years agodm: refactor deferred bio_list processing
Milan Broz [Fri, 8 Feb 2008 02:10:22 +0000 (02:10 +0000)]
dm: refactor deferred bio_list processing

Refactor deferred bio_list processing.

 - use separate _merge_pushback_list function
 - move deferred bio list pick up to flush function
 - use bio_list_pop instead of bio_list_get
 - simplify noflush flag use

No real functional change in this patch.

Signed-off-by: Milan Broz <mbroz@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
16 years agodm: tidy alloc_dev labels
Milan Broz [Fri, 8 Feb 2008 02:10:19 +0000 (02:10 +0000)]
dm: tidy alloc_dev labels

Tidy labels in alloc_dev to make later patches more clear.

No functional change in this patch.

Signed-off-by: Milan Broz <mbroz@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
16 years agodm ioctl: use uninitialized_var
Andrew Morton [Fri, 8 Feb 2008 02:10:16 +0000 (02:10 +0000)]
dm ioctl: use uninitialized_var

drivers/md/dm-ioctl.c:1405: warning: 'param' may be used uninitialized in this function

Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
16 years agodm: table use uninitialized_var
Andrew Morton [Fri, 8 Feb 2008 02:10:14 +0000 (02:10 +0000)]
dm: table use uninitialized_var

drivers/md/dm-table.c: In function 'dm_get_device':
drivers/md/dm-table.c:478: warning: 'dev' may be used uninitialized in this function

Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
16 years agodm snapshot: use uninitialized_var
Andrew Morton [Fri, 8 Feb 2008 02:10:11 +0000 (02:10 +0000)]
dm snapshot: use uninitialized_var

drivers/md/dm-exception-store.c: In function 'persistent_read_metadata':
drivers/md/dm-exception-store.c:452: warning: 'new_snapshot' may be used uninitialized in this function

Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
16 years agodm: convert suspend_lock semaphore to mutex
Daniel Walker [Fri, 8 Feb 2008 02:10:08 +0000 (02:10 +0000)]
dm: convert suspend_lock semaphore to mutex

Replace semaphore with mutex.

Signed-off-by: Daniel Walker <dwalker@mvista.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
16 years agodm snapshot: use rounddown_pow_of_two
Robert P. J. Day [Fri, 8 Feb 2008 02:10:06 +0000 (02:10 +0000)]
dm snapshot: use rounddown_pow_of_two

Since the source file already includes the log2.h header file, it
seems pointless to re-invent the necessary routine.

Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
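
For readers unfamiliar with the helper named in the commit above, here is a minimal userspace sketch of what "round down to a power of two" means. The name rounddown_pow2_sketch is hypothetical; it is not the kernel's implementation from log2.h, only an illustration of the result that routine is expected to produce for a non-zero input.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical userspace stand-in (not the kernel helper itself):
 * return the largest power of two that is <= n. n must be non-zero.
 */
static uint64_t rounddown_pow2_sketch(uint64_t n)
{
    uint64_t p = 1;

    assert(n != 0);
    /* Compare against n / 2 so that doubling p can never overflow. */
    while (p <= n / 2)
        p *= 2;
    return p;
}
```

For example, an input of 1000 yields 512, and an input that is already a power of two is returned unchanged.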
16 years agodm: table remove unused total
Jun'ichi Nomura [Fri, 8 Feb 2008 02:10:04 +0000 (02:10 +0000)]
dm: table remove unused total

"total = 0" does nothing.

Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
16 years agodm: table remove unused variable
Vasily Averin [Fri, 8 Feb 2008 02:10:01 +0000 (02:10 +0000)]
dm: table remove unused variable

Save some bytes.

Signed-off-by: Vasily Averin <vvs@sw.ru>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
16 years agodm: table use list_for_each
Paul Jimenez [Fri, 8 Feb 2008 02:09:59 +0000 (02:09 +0000)]
dm: table use list_for_each

This patch is some minor janitorish cleanup, using some macros
from linux/list.h (already #included via dm.h) to improve
readability.

Signed-off-by: Paul Jimenez <pj@place.org>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
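
The readability gain from a list-walking macro, as mentioned in the commit above, can be illustrated with a small userspace sketch. All names ending in _sketch are hypothetical; this mimics the style of a circular doubly linked list with a for-each macro and is not copied from linux/list.h.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical minimal circular doubly linked list node. */
struct list_node_sketch {
    struct list_node_sketch *next, *prev;
};

/* Walk every node after head until the walk returns to head. */
#define LIST_FOR_EACH_SKETCH(pos, head) \
    for ((pos) = (head)->next; (pos) != (head); (pos) = (pos)->next)

/* An empty list is a head that points at itself. */
static void list_init_sketch(struct list_node_sketch *head)
{
    head->next = head;
    head->prev = head;
}

/* Link node in just before head, i.e. at the tail of the list. */
static void list_add_tail_sketch(struct list_node_sketch *node,
                                 struct list_node_sketch *head)
{
    node->prev = head->prev;
    node->next = head;
    head->prev->next = node;
    head->prev = node;
}

/* Count entries using the macro instead of an open-coded loop. */
static size_t list_count_sketch(const struct list_node_sketch *head)
{
    const struct list_node_sketch *pos;
    size_t n = 0;

    LIST_FOR_EACH_SKETCH(pos, head)
        n++;
    return n;
}
```

The loop in list_count_sketch() states its intent in one line, which is the kind of cleanup the commit describes.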
16 years agodm ioctl: move compat code
Milan Broz [Fri, 8 Feb 2008 02:09:56 +0000 (02:09 +0000)]
dm ioctl: move compat code

Move compat_ioctl handling into dm-ioctl.c.

Signed-off-by: Milan Broz <mbroz@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
16 years agodm ioctl: remove lock_kernel
Alasdair G Kergon [Fri, 8 Feb 2008 02:09:53 +0000 (02:09 +0000)]
dm ioctl: remove lock_kernel

Remove lock_kernel() from the device-mapper ioctls - there should
be sufficient internal locking already where required.

Also remove some superfluous casts.

Signed-off-by: Alasdair G Kergon <agk@redhat.com>
16 years agodm: mark function lists static
Alasdair G Kergon [Fri, 8 Feb 2008 02:09:51 +0000 (02:09 +0000)]
dm: mark function lists static

Add a couple of statics.

Signed-off-by: Alasdair G Kergon <agk@redhat.com>
16 years agodm: add missing memory barrier to dm_suspend
Milan Broz [Fri, 8 Feb 2008 02:09:49 +0000 (02:09 +0000)]
dm: add missing memory barrier to dm_suspend

Add memory barrier to fix atomic_read of pending value.

Signed-off-by: Milan Broz <mbroz@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
16 years agoSLUB: fix checkpatch warnings
Ingo Molnar [Wed, 6 Feb 2008 01:57:39 +0000 (17:57 -0800)]
SLUB: fix checkpatch warnings

fix checkpatch --file mm/slub.c errors and warnings.

 $ q-code-quality-compare
                                      errors   lines of code   errors/KLOC
 mm/slub.c      [before]                  22            4204           5.2
 mm/slub.c      [after]                    0            4210             0

no code changed:

    text    data     bss     dec     hex filename
   22195    8634     136   30965    78f5 slub.o.before
   22195    8634     136   30965    78f5 slub.o.after

   md5:
     93cdfbec2d6450622163c590e1064358  slub.o.before.asm
     93cdfbec2d6450622163c590e1064358  slub.o.after.asm

[clameter: rediffed against Pekka's cleanup patch, omitted
moves of the name of a function to the start of line]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Christoph Lameter <clameter@sgi.com>
16 years agoUse non atomic unlock
Nick Piggin [Tue, 8 Jan 2008 07:20:27 +0000 (23:20 -0800)]
Use non atomic unlock

Slub can use the non-atomic version to unlock because other flags will not
get modified with the lock held.

Signed-off-by: Nick Piggin <npiggin@suse.de>
Acked-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
16 years agoSLUB: Support for performance statistics
Christoph Lameter [Fri, 8 Feb 2008 01:47:41 +0000 (17:47 -0800)]
SLUB: Support for performance statistics

The statistics provided here allow the monitoring of allocator behavior but
at the cost of some (minimal) loss of performance. Counters are placed in
SLUB's per cpu data structure. The per cpu structure may be extended by the
statistics to grow larger than one cacheline which will increase the cache
footprint of SLUB.

There is a compile option to enable/disable the inclusion of the runtime
statistics, and it is off by default.
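
A minimal userspace sketch of how compile-time optional counters in a per cpu structure might look. Every identifier here (SLUB_STATS_SKETCH, cpu_slab_sketch, stat_inc_sketch, the enum values) is hypothetical and chosen for illustration; this is not the patch's actual code. The macro is defined here so the counters are active; without it the array and the increment compile away.

```c
#include <assert.h>

/* Hypothetical switch; remove this line to compile the counters away. */
#define SLUB_STATS_SKETCH 1

enum stat_item_sketch {
    ALLOC_FASTPATH_SKETCH,
    ALLOC_SLOWPATH_SKETCH,
    FREE_FASTPATH_SKETCH,
    FREE_SLOWPATH_SKETCH,
    NR_STAT_ITEMS_SKETCH
};

/* Stand-in for a per cpu structure; the array only exists with stats on. */
struct cpu_slab_sketch {
    void *freelist;
#ifdef SLUB_STATS_SKETCH
    unsigned long stat[NR_STAT_ITEMS_SKETCH];
#endif
};

/* Bump one counter; a no-op when statistics are compiled out. */
static inline void stat_inc_sketch(struct cpu_slab_sketch *c,
                                   enum stat_item_sketch si)
{
#ifdef SLUB_STATS_SKETCH
    c->stat[si]++;
#else
    (void)c;
    (void)si;
#endif
}
```

The extra array is what can push the structure past one cacheline, which is the cache footprint cost described above.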

The slabinfo tool is enhanced to support these statistics via three options:

-D  Switches the line of information displayed for a slab from size
mode to activity mode.

-A Sorts the slabs displayed by activity. This allows the display of
the slabs most important to the performance of a certain load.

-r Report option will report detailed statistics on

Example (tbench load):

slabinfo -AD ->Shows the most active slabs

Name                   Objects    Alloc     Free   %Fast
skbuff_fclone_cache         33 111953835 111953835  99  99
:0000192                  2666  5283688  5281047  99  99
:0001024                   849  5247230  5246389  83  83
vm_area_struct            1349   119642   118355  91  22
:0004096                    15    66753    66751  98  98
:0000064                  2067    25297    23383  98  78
dentry                   10259    28635    18464  91  45
:0000080                 11004    18950     8089  98  98
:0000096                  1703    12358    10784  99  98
:0000128                   762    10582     9875  94  18
:0000512                   184     9807     9647  95  81
:0002048                   479     9669     9195  83  65
anon_vma                   777     9461     9002  99  71
kmalloc-8                 6492     9981     5624  99  97
:0000768                   258     7174     6931  58  15

So the skbuff_fclone_cache is of highest importance for the tbench load.
Pretty high load on the 192 sized slab. Look for the aliases

slabinfo -a | grep 000192
:0000192     <- xfs_btree_cur filp kmalloc-192 uid_cache tw_sock_TCP
request_sock_TCPv6 tw_sock_TCPv6 skbuff_head_cache xfs_ili

Likely skbuff_head_cache.

Looking into the statistics of the skbuff_fclone_cache is possible through

slabinfo skbuff_fclone_cache ->-r option implied if cache name is mentioned

.... Usual output ...

Slab Perf Counter       Alloc     Free %Al %Fr
--------------------------------------------------
Fastpath             111953360 111946981  99  99
Slowpath                 1044     7423   0   0
Page Alloc                272      264   0   0
Add partial                25      325   0   0
Remove partial             86      264   0   0
RemoteObj/SlabFrozen      350     4832   0   0
Total                111954404 111954404

Flushes       49 Refill        0
Deactivate Full=325(92%) Empty=0(0%) ToHead=24(6%) ToTail=1(0%)

Looks good because the fastpath is overwhelmingly taken.

skbuff_head_cache:

Slab Perf Counter       Alloc     Free %Al %Fr
--------------------------------------------------
Fastpath              5297262  5259882  99  99
Slowpath                 4477    39586   0   0
Page Alloc                937      824   0   0
Add partial                 0     2515   0   0
Remove partial           1691      824   0   0
RemoteObj/SlabFrozen     2621     9684   0   0
Total                 5301739  5299468

Deactivate Full=2620(100%) Empty=0(0%) ToHead=0(0%) ToTail=0(0%)

Descriptions of the output:

Total: The total number of allocations and frees that occurred for a
slab

Fastpath: The number of allocations/frees that used the fastpath.

Slowpath: All other allocations/frees

Page Alloc: Number of calls to the page allocator as a result of slowpath
processing

Add Partial: Number of slabs added to the partial list through free or
alloc (occurs during cpuslab flushes)

Remove Partial: Number of slabs removed from the partial list as a result of
allocations retrieving a partial slab or by a free freeing
the last object of a slab.

RemoteObj/Froz: How many times were remotely freed objects encountered when a
slab was about to be deactivated. Frozen: How many times was
free able to skip list processing because the slab was in use
as the cpuslab of another processor.

Flushes: Number of times the cpuslab was flushed on request
(kmem_cache_shrink, may result from races in __slab_alloc)

Refill: Number of times we were able to refill the cpuslab from
remotely freed objects for the same slab.

Deactivate: Statistics on how slabs were deactivated. Shows how they were
put onto the partial list.

In general fastpath is very good. Slowpath without partial list processing is
also desirable. Any touching of the partial list uses node-specific locks,
which may cause list lock contention.
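
The percentage columns in the example output can be checked with a short sketch. The helper name pct_of is hypothetical, and truncating integer division is an assumption about how slabinfo rounds; it is consistent with the figures shown above, for example Deactivate Full=325 out of 350 events printed as 92%.

```c
#include <assert.h>

/*
 * Hypothetical helper: integer percentage of part in total, rounded
 * down. Returns 0 for an empty total to avoid dividing by zero.
 */
static unsigned long long pct_of(unsigned long long part,
                                 unsigned long long total)
{
    if (total == 0)
        return 0;
    return part * 100 / total;
}
```

Applied to the skbuff_fclone_cache numbers, 111953360 fastpath allocations out of 111954404 gives 99, and the 1044 slowpath allocations give 0, matching the %Al column.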

Signed-off-by: Christoph Lameter <clameter@sgi.com>