Linux/390 - Notes and Observations
Abstract
This document is a collection of extracts, observations and notes pertaining to the S/390 port of Linux.
On December 18, 1999, IBM published its modifications and additions to the Linux 2.2.13 code base for the support of the S/390 architecture. This port is designed to run under VM/ESA and natively. The code has subsequently been rolled into the 2.2.15 level.
This document contains information specific to the S/390 port of Linux. In it I have reproduced documentation found within the distribution that describes the I/O facilities and DASD handling. In addition, I have included information that I came across while looking at the port, such as new source files, system calls, and register conventions.
The following section was copied from the Documentation/390 directory of the Linux distribution. It was written by Ingo Adlung and is copyright IBM 1999, under the GNU General Public License.
This chapter describes the common device support routines for Linux/390. Unlike other hardware architectures, ESA/390 defines a unified I/O access method. This relieves device drivers of having to deal with different bus types, polling versus interrupt processing, shared versus non-shared interrupt processing, DMA versus port I/O (PIO), and other hardware features. However, this implies that either every single device driver needs to implement the hardware I/O attachment functionality itself, or the operating system provides a unified method to access the hardware, providing all the functionality that every single device driver would otherwise have to provide itself.
The document does not intend to explain the ESA/390 hardware architecture in every detail. This information can be obtained from the ESA/390 Principles of Operation manual (IBM Form. No. SA22-7201).
In order to build common device support for ESA/390 I/O interfaces, a functional layer was introduced that provides generic I/O access methods to the hardware. The following figure shows the usage of the common device support of Linux/390 using a TCP/IP driven device access as an example. Similar figures could be drawn for other access methods, e.g. file system access to disk devices.
The common device support layer shown above comprises the I/O support routines defined below. Some of them implement common Linux device driver interfaces, while some of them are ESA/390 platform specific.
get_dev_info_by_IRQ() / get_dev_info_by_devno(): allow a device driver to determine the devices attached (visible) to the system and their current status.
get_IRQ_by_devno() / get_devno_by_IRQ(): get IRQ (subchannel) from device number and vice versa.
read_dev_chars(): read device characteristics.
request_IRQ(): obtain ownership for a specific device.
free_IRQ(): release ownership for a specific device.
disable_IRQ(): disable a device from presenting interrupts.
enable_IRQ(): enable a device, allowing for I/O interrupts.
do_IO(): initiate an I/O request.
halt_IO(): terminate the current I/O request processed on the device.
do_IRQ(): generic interrupt routine. This function is called by the interrupt entry routine whenever an I/O interrupt is presented to the system. The do_IRQ() routine determines the interrupt status and calls the device specific interrupt handler according to the rules (flags) defined during I/O request initiation with do_IO().
The next sections describe the functions other than do_IRQ() in more detail. The do_IRQ() interface is not described, as it is called from the Linux/390 first level interrupt handler only and does not comprise a device driver callable interface. Instead, the functional description of do_IO() also describes the input to the device specific interrupt handler.
The following chapters describe the I/O related interface routines the Linux/390 common device support (CDS) provides to allow for device specific driver implementations on the IBM ESA/390 hardware platform. Those interfaces intend to provide the functionality required by every device driver implementation to allow driving a specific hardware device on the ESA/390 platform. Some of the interface routines are specific to Linux/390 and some of them can be found on other Linux platforms' implementations too.
Miscellaneous function prototypes, data declarations, and macro definitions can be found in the architecture specific C header file linux/arch/s390/kernel/IRQ.h.
Unlike other hardware platforms, the ESA/390 architecture does not define interrupt lines managed by a specific interrupt controller and bus systems that may or may not allow for shared interrupts, DMA processing, etc. Instead, the ESA/390 architecture has implemented a so-called channel subsystem, which provides a unified view of the devices physically attached to the system. Though the ESA/390 hardware platform knows about a huge variety of different peripheral attachments like disk devices (also known as DASD), tapes, and communication controllers, they can all be accessed by a well defined access method and they present I/O completion in a unified way: I/O interruptions. Every single device is uniquely identified to the system by a so-called subchannel, where the ESA/390 architecture allows for 64k devices to be attached.
Linux, however, was first built on the Intel PC architecture, with its two cascaded 8259 programmable interrupt controllers (PICs), which allow for a maximum of 15 different interrupt lines. All devices attached to such a system share those 15 interrupt levels. Devices attached to the ISA bus system must not share interrupt levels (also known as IRQs), as the ISA bus is based on edge triggered interrupts. MCA, EISA, PCI and other bus systems are based on level triggered interrupts, and thus allow for shared IRQs. However, if multiple devices present their hardware status by the same (shared) IRQ, the operating system has to call every single device driver registered on this IRQ in order to determine the device driver owning the device that raised the interrupt.
In order not to introduce a new I/O concept to the common Linux code, Linux/390 preserves the IRQ concept and semantically maps the ESA/390 subchannels to Linux as IRQs. This allows Linux/390 to support up to 64k different IRQs, each uniquely representing a single device.
During its startup the Linux/390 system checks for peripheral devices. Each of those devices is uniquely defined by a so-called "subchannel" by the ESA/390 channel subsystem. While the subchannel numbers are system generated, each subchannel also takes a user-defined attribute, the so-called "device number". Neither the subchannel number nor the device number can exceed 65535. The init_IRQ() routine gathers the information about control unit type and device types that imply specific I/O commands (channel command words or CCWs) in order to operate the device. Device drivers can retrieve this set of hardware information during their initialization step to recognize the devices they support using get_dev_info_by_IRQ() or get_dev_info_by_devno() respectively.
This method implies that Linux/390 does not need to probe for free (not armed) interrupt request lines (IRQs) to drive its devices with. Where applicable, the device drivers can use read_dev_chars() to retrieve device characteristics. This can be done without having to request device ownership previously.
When a device driver has recognized a device it wants to claim ownership for, it calls request_IRQ() with the device's subchannel id serving as pseudo IRQ line. One of the required parameters it has to specify is dev_id, defining a device status block that the CDS layer will use to notify the device driver's interrupt handler about interrupt information observed. It is up to the device driver to properly handle those interrupts.
In order to allow for easy I/O initiation the CDS layer provides a do_IO() interface that takes a device specific channel program (one or more CCWs) as input, sets up the required architecture specific control blocks and initiates an I/O request on behalf of the device driver. The do_IO() routine allows for different I/O methods, synchronous and asynchronous, and allows the caller to specify whether it expects the CDS layer to notify the device driver for every interrupt it observes, or with final status only. It also provides a scheme to allow for overlapped I/O processing. See "2.9 do_IO() - Initiate I/O Request" for more details. A device driver must never issue ESA/390 I/O commands itself, but must use the Linux/390 CDS interfaces instead.
To cancel long running I/O requests, the CDS layer provides the halt_IO() function. Some devices require a HALT SUBCHANNEL (HSCH) command to be issued initially without having pending I/O requests. This function is also covered by halt_IO().
When done with a device, the device driver calls free_IRQ() to release its ownership for the device. During free_IRQ() processing the CDS layer also disables the device from presenting further interrupts: the device driver does not need to assure it. The device will be re-enabled for interrupts with the next call to request_IRQ().
During system startup - init_IRQ() processing - the generic I/O device support checks for the devices available. For all devices found it collects the Sense-ID information. For those devices supporting the command it also obtains extended Sense-ID information.
int get_dev_info_by_IRQ( int IRQ, dev_info_t *devinfo);
int get_dev_info_by_devno( unsigned int devno, dev_info_t *devinfo);
| IRQ | Defines the subchannel that status information is to be returned for. |
| devno | Device number. |
| devinfo | Pointer to a user buffer of type dev_info_t that should be filled with device specific information. |
typedef struct {
unsigned int devno; /* device number */
unsigned int status; /* device status */
senseid_t sid_data; /* senseID data */
} dev_info_t;
| devno | Device number as configured in the IOCDS |
| status | Device status |
| sid_data | Data obtained by a SenseID call |
Possible status values are:
DEVSTAT_NOT_OPER - device was found not operational. In this case the caller should disregard the sid_data buffer content.
//
// SenseID response buffer layout
//
typedef struct {
/* common part */
unsigned char reserved; /* always 0x'FF' */
unsigned short cu_type; /* control unit type */
unsigned char cu_model; /* control unit model */
unsigned short dev_type; /* device type */
unsigned char dev_model; /* device model */
unsigned char unused; /* padding byte */
/* extended part */
ciw_t ciw[62]; /* variable # of CIWs */
} senseid_t;
The ESA/390 I/O architecture defines certain device specific I/O functions. The device returns the device specific command code together with the Sense-ID data in so called Command Information Words (CIW):
typedef struct _ciw {
unsigned int et : 2; // entry type
unsigned int reserved : 2; // reserved
unsigned int ct : 4; // command type
unsigned int cmd : 8; // command
unsigned int count : 16; // count
} ciw_t;
Possible CIW entry types are:
#define CIW_TYPE_RDC 0x0 // read configuration data
#define CIW_TYPE_SII 0x1 // set interface identifier
#define CIW_TYPE_RNI 0x2 // read node identifier
The device driver may use these commands as appropriate.
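How a C compiler lays out the bit-fields of ciw_t depends on the target, so code that inspects a CIW outside the kernel may prefer explicit shifts. The following is a minimal sketch under that assumption; struct ciw_fields, ciw_load() and ciw_decode() are hypothetical names that do not appear in the Linux/390 sources. The field positions follow the ciw_t declaration above, with the entry type in the two most significant bits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical plain-integer view of one Command Information Word. */
struct ciw_fields {
    unsigned et;     /* entry type:   bits 0-1 (most significant) */
    unsigned ct;     /* command type: bits 4-7 */
    unsigned cmd;    /* command code: bits 8-15 */
    unsigned count;  /* count:        bits 16-31 */
};

/* Assemble the four big-endian bytes of a CIW into one 32-bit value. */
static uint32_t ciw_load(const unsigned char bytes[4])
{
    assert(bytes != NULL);
    return ((uint32_t)bytes[0] << 24) | ((uint32_t)bytes[1] << 16) |
           ((uint32_t)bytes[2] << 8)  |  (uint32_t)bytes[3];
}

/* Split the 32-bit value into its fields without relying on bit-field layout. */
static struct ciw_fields ciw_decode(uint32_t raw)
{
    struct ciw_fields f;
    f.et    = (raw >> 30) & 0x3u;
    f.ct    = (raw >> 24) & 0xFu;
    f.cmd   = (raw >> 16) & 0xFFu;
    f.count = raw & 0xFFFFu;
    return f;
}
```

A caller would pass the four bytes of one ciw[] entry to ciw_load() and then read the command code and count from the decoded structure.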
The get_dev_info_by_IRQ() / get_dev_info_by_devno() functions return:
| 0 | Successful completion |
| -ENODEV | IRQ or devno don't specify a known subchannel or device number. |
| -EINVAL | Invalid devinfo value. |
In order to scan for known devices a device driver should scan all IRQs by calling get_dev_info() until it returns -ENODEV as there are not any more available devices.
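The scan described above can be sketched in user space as follows. Everything here is a mock: the device table, the reduced dev_info_t and senseid_t, the DEVSTAT_NOT_OPER value and the control unit types are assumptions for illustration, and get_dev_info_by_IRQ() is a stand-in that only reproduces the documented return codes. scan_for_cu_type() is a hypothetical helper name.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Reduced stand-ins for the types shown above; only the fields used here. */
typedef struct { unsigned short cu_type; } senseid_t;
typedef struct {
    unsigned int devno;
    unsigned int status;
    senseid_t    sid_data;
} dev_info_t;

#define DEVSTAT_NOT_OPER 0x1u  /* assumed value, illustration only */

/* Mock device table standing in for the channel subsystem. */
static const dev_info_t mock_devices[] = {
    { 0x0190, 0,                { 0x3990 } },
    { 0x0191, DEVSTAT_NOT_OPER, { 0x3990 } },
    { 0x0C00, 0,                { 0x3088 } },
    { 0x0192, 0,                { 0x3990 } },
};
#define MOCK_COUNT ((int)(sizeof mock_devices / sizeof mock_devices[0]))

/* Mock with the documented return codes: 0, -ENODEV, -EINVAL. */
static int get_dev_info_by_IRQ(int irq, dev_info_t *devinfo)
{
    if (devinfo == NULL)
        return -EINVAL;
    if (irq < 0 || irq >= MOCK_COUNT)
        return -ENODEV;
    *devinfo = mock_devices[irq];
    return 0;
}

/* Walk IRQs from 0 until -ENODEV; record operational devices of one CU type. */
static int scan_for_cu_type(unsigned short cu_type, int *irqs, int max)
{
    dev_info_t info;
    int found = 0;

    assert(irqs != NULL && max >= 0);
    for (int irq = 0; get_dev_info_by_IRQ(irq, &info) != -ENODEV; irq++) {
        if (info.status & DEVSTAT_NOT_OPER)
            continue;  /* sid_data must be disregarded for such devices */
        if (info.sid_data.cu_type == cu_type && found < max)
            irqs[found++] = irq;
    }
    return found;
}
```

A driver would then call request_IRQ() for each IRQ collected in the array before issuing any I/O to it.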
If a device driver wants to request ownership for a specific device it must call request_IRQ() before it can issue any I/O request for it, including the device dependent commands mentioned above.
Please see the "ESA/390 Common I/O-Commands and Self Description" manual, with IBM form number SA22-7204 for more details on how to read the Sense-ID output, CIWs and device independent commands.
While some device drivers act on the IRQ (subchannel) only, others take user defined device configurations on device number base, according to the device numbers configured in the IOCDS. The following routines serve the purpose to convert IRQ values into device numbers and vice versa.
int get_IRQ_by_devno( unsigned int devno );
unsigned int get_devno_by_IRQ( int IRQ );
The functions return :
This routine returns the characteristics for the device specified.
The function is meant to be called without an IRQ handler being in place. However, the IRQ for the requested device must not be locked or this will cause a deadlock situation. Further, the driver must assure that nobody else has claimed ownership for the requested IRQ yet or the owning device driver's internal accounting may be affected.
In case of a registered interrupt handler, the interrupt handler must be able to properly react to interrupts related to the read_dev_chars() I/O commands. While the request is processed synchronously, the device interrupt handler is called for final ending status. In case of error situations the interrupt handler may recover appropriately. The device IRQ handler can recognize the corresponding interrupts by the interruption parameter being 0x00524443. If using the function with an existing device interrupt handler in place, the IRQ must be locked before calling read_dev_chars().
The function may be called enabled or disabled.
int read_dev_chars( int IRQ, void **buffer, int length );
| IRQ | Specifies the subchannel the device characteristic retrieval is requested for |
| buffer | Pointer to a buffer pointer. The buffer pointer itself may be NULL to have the function allocate a buffer or must contain a valid buffer area. |
| length | Length of the buffer provided or to be allocated. |
The read_dev_chars() function returns:
| 0 | Successful completion |
| -ENODEV | IRQ does not specify a valid subchannel number |
| -EINVAL | An invalid parameter was detected |
| -EBUSY | An irrecoverable I/O error occurred or the device is not operational |
The function can be used in two ways:
If the caller does not provide a data buffer (*buffer == NULL), read_dev_chars() allocates one and fills it with the device characteristics. It is the caller's responsibility to release the kernel memory when it is no longer needed.
If the caller provides its own buffer (*buffer != NULL), read_dev_chars() fills that buffer and allocates nothing.
In either case the caller must provide the data area length: for the buffer specified or the buffer to be allocated.
As previously discussed, a device driver will scan for the devices it supports by calling get_dev_info(). Once it has found a device it will call request_IRQ() to request ownership for it. This call causes the subchannel to be enabled for interrupts if it was found operational.
int request_IRQ( unsigned int IRQ, int (*handler)( int, void *, struct pt_regs *), unsigned long irqflags, const char *devname, void *dev_id);
| IRQ | Specifies the subchannel the ownership is requested for |
| handler | Specifies the device driver's interrupt handler to be called for interrupt processing |
| irqflags | IRQ flags, must be 0 (zero) or SA_SAMPLE_RANDOM |
| devname | Device name |
| dev_id | Required pointer to a device specific buffer of type devstat_t |
typedef struct {
unsigned int devno; /* device number from irb */
unsigned int intparm; /* interrupt parameter */
unsigned char cstat; /* channel status - accumulated */
unsigned char dstat; /* device status - accumulated */
unsigned char lpum; /* last path used mask from irb */
unsigned char unused; /* not used - reserved */
unsigned int flag; /* flag : see below */
unsigned long cpa; /* CCW addr from irb at prim. status */
unsigned int rescnt; /* count from irb at primary status */
unsigned int scnt; /* sense count, if available */
union {
irb_t irb; /* interruption response block */
sense_t sense; /* sense information */
} ii; /* interrupt information */
} devstat_t;
During request_IRQ() processing, the devstat_t layout does not matter as it is not used. See "2.9 do_IO() - Initiate I/O Request" for a functional description of its usage.
The request_IRQ() function returns:
| 0 | Successful completion |
| -EINVAL | An invalid parameter was detected |
| -EBUSY | Device (subchannel) already owned |
| -ENODEV | The device is not operational |
| -ENOMEM | Not enough kernel memory to process request |
While Linux for Intel defines dev_id as a unique identifier for shared interrupt lines, it has a totally different purpose on Linux/390. Here it serves as a shared interrupt status area between the generic device support layer and the device specific driver. The value passed to request_IRQ() must therefore point to a valid devstat_t type buffer area that the device driver must preserve for later usage. That is, it must not be released prior to a call to free_IRQ().
The only value the irqflags parameter supports is SA_SAMPLE_RANDOM, if appropriate. The Linux/390 kernel does not know about "fast" interrupt handlers, nor does it allow for interrupt sharing. Remember, the terms interrupt level (IRQ), device, and subchannel are used interchangeably in Linux/390.
If request_IRQ() was called in enabled state, or if multiple CPUs are present, the device may present an interrupt to the specified handler before request_IRQ() has returned to the caller. This includes the possibility of unsolicited interrupts or a pending interrupt status from an earlier solicited I/O request. The device driver must be able to handle this situation properly or the device may become non-operational.
Although the interrupt handler is defined to be called with a pointer to a struct pt_regs buffer area, this is not implemented by the Linux/390 generic I/O device driver support layer. The device driver's interrupt handler must therefore not rely on this parameter on function entry.
A device driver may call free_IRQ() to release ownership of a previously acquired device.
void free_IRQ( unsigned int IRQ, void *dev_id);
| IRQ | Specifies the subchannel the ownership is released for |
| dev_id | Required pointer to a device specific buffer of type devstat_t. This must be the same as the one specified during a previous call to request_IRQ(). |
Unfortunately free_IRQ() is defined not to return error codes. That is, if called with wrong parameters a device may still be operational although there is no device driver available to handle its interrupts. Further, during free_IRQ() processing we may possibly find pending interrupt conditions. As those need to be processed, we have to delay free_IRQ() returning until a clean device status is found by synchronously handling them.
The call to free_IRQ() will also cause the device (subchannel) to be disabled for interrupts. The device driver must not release any data areas required for interrupt processing before free_IRQ() returns to the caller, as interrupts can occur until then. This is also true when called in disabled state if either multiple CPUs are present or a pending interrupt status was found during free_IRQ() processing.
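The ownership rules of request_IRQ() and free_IRQ() can be illustrated with a small user-space mock. This is a sketch under stated assumptions, not the kernel implementation: the owner table, the reduced devstat_t, the irq_handler_t typedef and mock_handler() are hypothetical, and no interrupts are delivered. It only reproduces the documented return codes and the fact that free_IRQ() cannot report a mismatched dev_id.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define MOCK_SUBCHANNELS 4

struct pt_regs;  /* opaque here; the handler must not rely on it anyway */
typedef int (*irq_handler_t)(int, void *, struct pt_regs *);

/* Reduced stand-in for the device status block. */
typedef struct { unsigned int devno; unsigned int intparm; unsigned int flag; } devstat_t;

/* Mock ownership table; subchannel 2 is marked not operational. */
static struct { void *dev_id; int operational; } owner[MOCK_SUBCHANNELS] = {
    { NULL, 1 }, { NULL, 1 }, { NULL, 0 }, { NULL, 1 },
};

static int mock_handler(int irq, void *dev_id, struct pt_regs *regs)
{
    (void)irq;
    (void)regs;               /* not provided by the CDS layer */
    assert(dev_id != NULL);   /* the status block must still exist */
    return 0;
}

static int request_IRQ(unsigned int irq, irq_handler_t handler,
                       unsigned long irqflags, const char *devname, void *dev_id)
{
    (void)irqflags;
    (void)devname;
    if (irq >= MOCK_SUBCHANNELS || handler == NULL || dev_id == NULL)
        return -EINVAL;
    if (!owner[irq].operational)
        return -ENODEV;
    if (owner[irq].dev_id != NULL)
        return -EBUSY;
    owner[irq].dev_id = dev_id;
    return 0;
}

/* No return value: a call with the wrong dev_id silently leaves the owner in place. */
static void free_IRQ(unsigned int irq, void *dev_id)
{
    if (irq < MOCK_SUBCHANNELS && owner[irq].dev_id == dev_id)
        owner[irq].dev_id = NULL;
}
```

In a real driver the devstat_t passed as dev_id would typically live in the driver's per-device structure, so that it stays valid from request_IRQ() until free_IRQ() has returned.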
This function may be called at any time to disable interrupt processing for the specified IRQ. However, as Linux/390 maps IRQs to the device (subchannel) one-to-one, this may require more extensive I/O processing than anticipated, especially if an interrupt status is found pending on the subchannel that requires synchronous error processing.
int disable_IRQ( unsigned int IRQ );
| IRQ | Specifies the subchannel to be disabled |
The disable_IRQ() routine may return:
| 0 | Successful completion |
| -EBUSY | Device (subchannel) already owned |
| -ENODEV | The device is not operational or the IRQ does not specify a valid subchannel |
Unlike the Intel based hardware architecture the ESA/390 architecture does not have a programmable interrupt controller (PIC) where a specific interrupt line can be disabled. Instead the subchannel logically representing the device in the channel subsystem must be disabled for interrupts. However, if there are still interrupt conditions pending they must be processed first in order to allow for proper processing after re-enabling the device at a later time. This may lead to delayed disable processing.
As described previously the disable processing may require extensive processing. Therefore disabling and re-enabling the device using disable_IRQ() or enable_IRQ() should be avoided and is not suitable for high frequency operations.
Linux for Intel defines this function as:
void disable_IRQ( int IRQ);
This is suitable for the Intel PC architecture, as it only masks the requested IRQ line in the PIC, which is not applicable to the ESA/390 architecture. Therefore we allow for returning error codes.
This function is used to enable a previously disabled device (subchannel). See "2.7 disable_IRQ() - Disable Interrupts for a given Device" for more details.
int enable_IRQ( unsigned int IRQ );
| IRQ | Specifies the subchannel to be enabled |
The enable_IRQ() routine may return:
| 0 | Successful completion |
| -EBUSY | Device (subchannel) busy, which implies the device is already enabled |
| -ENODEV | The device is not operational or the IRQ does not specify a valid subchannel |
The do_IO() routine is the I/O request front-end processor. All device driver I/O requests must be issued using this routine. A device driver must not issue ESA/390 I/O commands itself. Instead the do_IO() routine provides all interfaces required to drive arbitrary devices.
This description also covers the status information passed to the device driver's interrupt handler as this is related to the rules (flags) defined with the associated I/O request when calling do_IO().
int do_IO( int IRQ, ccw1_t *cpa, unsigned long intparm, unsigned int lpm, unsigned long flag);
| IRQ | IRQ (subchannel) the I/O request is destined for |
| cpa | Logical start address of channel program |
| intparm | User-specific interrupt information; will be presented back to the device driver's interrupt handler. Allows a device driver to associate the interrupt with a particular I/O request. |
| lpm | Defines the channel path to be used for a specific I/O request. Valid with flag value of DOIO_VALID_LPM only. |
| flag | Defines the action to be performed for I/O processing |
Possible flag values are:
| DOIO_EARLY_NOTIFICATION | Allow for early interrupt notification |
| DOIO_VALID_LPM | LPM input parameter is valid (see usage notes for details) |
| DOIO_WAIT_FOR_INTERRUPT | Wait synchronously for final status |
| DOIO_REPORT_ALL | Report all interrupt conditions |
The cpa parameter points to the first format 1 CCW of a channel program:
typedef struct {
char cmd_code; /* command code */
char flags; /* flags, like IDA addressing, etc. */
unsigned short count; /* byte count */
void *cda; /* data address */
} ccw1_t __attribute__ ((aligned(8)));
with the following CCW flags values defined:
| Data chaining |
| Command chaining |
| Suppress incorrect length |
| Skip |
| PCI |
| Indirect addressing |
| Suspend |
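To show how the command chaining flag ties several CCWs into one channel program, here is a minimal user-space sketch. The type mock_ccw1_t mirrors ccw1_t but uses a fixed 32-bit data address so that it stays eight bytes on any host. The macro names CCW_FLAG_CC and CCW_FLAG_SLI are assumptions; the bit values 0x40 and 0x20 are the command chaining and suppress-incorrect-length bits of a format 1 CCW. ccw_chain() is a hypothetical helper and the command codes in the usage below are placeholders.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Format 1 CCW flag bits; macro names assumed for this sketch. */
#define CCW_FLAG_CC  0x40  /* command chaining */
#define CCW_FLAG_SLI 0x20  /* suppress incorrect length */

/* User-space stand-in for ccw1_t with a fixed-width data address. */
typedef struct {
    uint8_t  cmd_code;  /* command code */
    uint8_t  flags;     /* CCW flags */
    uint16_t count;     /* byte count */
    uint32_t cda;       /* data address */
} mock_ccw1_t __attribute__ ((aligned(8)));

/* Link n CCWs into one program: every CCW but the last gets the CC flag. */
static void ccw_chain(mock_ccw1_t *prog, size_t n)
{
    assert(prog != NULL && n > 0);
    for (size_t i = 0; i < n; i++) {
        if (i + 1 < n)
            prog[i].flags |= CCW_FLAG_CC;
        else
            prog[i].flags &= (uint8_t)~CCW_FLAG_CC;  /* chain ends here */
    }
}
```

Clearing the flag on the last CCW matters because a set command chaining bit tells the channel to fetch another CCW from the following doubleword.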
The do_IO() function returns:
| 0 | Successful completion or request successfully initiated |
| -EBUSY | The do_IO() function was called out of sequence. The device is currently processing a previous I/O request |
| -ENODEV | IRQ does not specify a valid subchannel, the device is not operational (check dev_id->flag) or the IRQ is not owned. |
| -EINVAL | Both DOIO_EARLY_NOTIFICATION and DOIO_REPORT_ALL flags have been specified. The usage of those flags is mutually exclusive. |
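The flag rules above can be checked on the driver side before a request is ever submitted. The following is a sketch only: the flag names come from the text, but the numeric values are assumed for illustration, and doio_check_flags(), doio_lpm_is_used() and doio_flags_checked() are hypothetical helpers rather than CDS interfaces.

```c
#include <assert.h>
#include <errno.h>

/* Flag names from the text; the numeric values are assumed for this sketch. */
#define DOIO_EARLY_NOTIFICATION 0x01UL
#define DOIO_VALID_LPM          0x02UL
#define DOIO_WAIT_FOR_INTERRUPT 0x04UL
#define DOIO_REPORT_ALL         0x08UL

/* Mirror the documented rule: early notification and report-all exclude each other. */
static int doio_check_flags(unsigned long flag)
{
    if ((flag & DOIO_EARLY_NOTIFICATION) && (flag & DOIO_REPORT_ALL))
        return -EINVAL;
    return 0;
}

/* The lpm argument is only honored when DOIO_VALID_LPM is set. */
static int doio_lpm_is_used(unsigned long flag)
{
    return (flag & DOIO_VALID_LPM) != 0;
}

/* Driver-side helper: a bad combination is a programming error, so trap it early. */
static unsigned long doio_flags_checked(unsigned long flag)
{
    assert(doio_check_flags(flag) == 0);
    return flag;
}
```

A driver could wrap its flag constants in doio_flags_checked() during development so that an invalid combination fails at the call site instead of surfacing as -EINVAL later.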
When the I/O request completes, the CDS first level interrupt handler will set up the dev_id buffer of type devstat_t defined during request_IRQ() processing. See "2.5 request_IRQ() - Request Device Ownership" for the devstat_t data layout. The dev_id->intparm field in the device status area will contain the value the device driver has associated with a particular I/O request. If a pending device status was recognized, dev_id->intparm will be set to 0 (zero). This may happen during I/O initiation or delayed by an alert status notification.
In any case this status is not related to the current (last) I/O request. In case of a delayed status notification no special interrupt will be presented to indicate I/O completion as the I/O request was never started, even though do_IO() returned with successful completion.
Possible dev_id->flag values are:
| Sense data is available |
| Device is not operational |
| Interrupt is presented as a result of a call to do_IO() |
| Interrupt is presented as a result of a call to halt_IO() |
| A pending status was found. The I/O request (if any) was not initiated. This status might have been presented delayed, after do_IO() or halt_IO() had been started successfully. |
| This is a final interrupt status for the I/O request identified by intparm. |
If device status DEVSTAT_FLAG_SENSE_AVAIL is indicated in field dev_id->flag, field dev_id->scnt describes the number of device specific sense bytes available in the sense area dev_id->ii.sense. No device sensing by the device driver itself is required.
typedef struct {
unsigned char res[32]; /* reserved */
unsigned char data[32]; /* sense data */
} sense_t;
The device interrupt handler can use the following definitions to investigate the primary unit check source coded in sense byte 0:
| 0x80 | Command reject |
| 0x40 | Intervention required |
| 0x20 | Bus-out check |
| 0x10 | Equipment check |
| 0x08 | Data check |
| 0x04 | Overrun |
Depending on the device status, multiple of those values may be set together. Please refer to the device specific documentation for details.
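Because several sense byte 0 bits may be set at once, a handler usually tests each mask in turn. The sketch below shows one way to do that. The bit meanings used are the conventional ESA/390 sense byte 0 conditions and are an assumption of this example, as are the names struct sns0_bit, sns0_bits and sns0_decode(); individual devices may define the bits differently.

```c
#include <assert.h>
#include <stddef.h>

/* One sense byte 0 bit and a readable label (labels assumed, see above). */
struct sns0_bit {
    unsigned char mask;
    const char   *meaning;
};

static const struct sns0_bit sns0_bits[] = {
    { 0x80, "command reject" },
    { 0x40, "intervention required" },
    { 0x20, "bus-out check" },
    { 0x10, "equipment check" },
    { 0x08, "data check" },
    { 0x04, "overrun" },
};

/*
 * Store the labels of all set bits in out[] (at most max entries) and
 * return the total number of conditions found in sense byte 0.
 */
static size_t sns0_decode(unsigned char sense0, const char **out, size_t max)
{
    size_t n = 0;

    assert(out != NULL || max == 0);
    for (size_t i = 0; i < sizeof sns0_bits / sizeof sns0_bits[0]; i++) {
        if (sense0 & sns0_bits[i].mask) {
            if (n < max)
                out[n] = sns0_bits[i].meaning;
            n++;
        }
    }
    return n;
}
```

A handler would pass dev_id->ii.sense.data[0] once DEVSTAT_FLAG_SENSE_AVAIL is indicated and dev_id->scnt is at least one.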
The dev_id->cstat field provides the (accumulated) subchannel status:
| Program controlled interrupt |
| Incorrect length |
| Program check |
| Protection check |
| Channel data check |
| Channel control check |
| Interface control check |
| Chaining check |
The dev_id->dstat field provides the (accumulated) device status:
| Attention |
| Status modifier |
| Control unit end |
| Busy |
| Channel end |
| Device end |
| Unit check |
| Unit exception |
Please see the ESA/390 Principles of Operation manual for details on the individual flag meanings.
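As an illustration of how a handler might act on the accumulated status bytes, the following sketch classifies a request by its cstat and dstat values. It is a simplified policy under stated assumptions: the macro names are hypothetical, the bit values are the architected positions matching the order of the lists above (0x80 down to 0x01), and accumulation is modeled as a bitwise OR of the status of each interrupt. Real drivers need device specific handling, for example for unit exception.

```c
#include <assert.h>
#include <stddef.h>

/* Device status bits (macro names assumed). */
#define DEV_STAT_CHN_END    0x08
#define DEV_STAT_DEV_END    0x04
#define DEV_STAT_UNIT_CHECK 0x02

/* Subchannel status: every bit except PCI (0x80) reports a check condition. */
#define SCHN_STAT_CHECK_MASK 0x7F

enum io_outcome { IO_IN_PROGRESS, IO_PRIMARY_DONE, IO_COMPLETE, IO_NEEDS_RECOVERY };

struct acc_status { unsigned char cstat, dstat; };

/* Fold the status of one more interrupt into the accumulated view. */
static void accumulate(struct acc_status *acc, unsigned char cstat, unsigned char dstat)
{
    assert(acc != NULL);
    acc->cstat |= cstat;
    acc->dstat |= dstat;
}

/* Decide what the accumulated status means for the request. */
static enum io_outcome classify_status(unsigned char cstat, unsigned char dstat)
{
    if ((cstat & SCHN_STAT_CHECK_MASK) || (dstat & DEV_STAT_UNIT_CHECK))
        return IO_NEEDS_RECOVERY;
    if ((dstat & (DEV_STAT_CHN_END | DEV_STAT_DEV_END)) ==
        (DEV_STAT_CHN_END | DEV_STAT_DEV_END))
        return IO_COMPLETE;       /* primary and secondary status seen */
    if (dstat & DEV_STAT_CHN_END)
        return IO_PRIMARY_DONE;   /* channel is free, device still busy */
    return IO_IN_PROGRESS;
}
```

The distinction between IO_PRIMARY_DONE and IO_COMPLETE corresponds to devices that present channel end and device end as separate interrupts.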
In rare error situations the device driver may require access to the original hardware interrupt data beyond the scope of previously mentioned information. For those situations the Linux/390 common device support provides the interrupt response block (IRB) as part of the device status block in dev_id->ii.irb.
Prior to calling do_IO() the device driver must assure disabled state, that is, the I/O mask value in the PSW must be disabled. This can be accomplished by calling __save_flags(flags). The current PSW flags are preserved and can be restored by __restore_flags(flags) at a later time.
If the device driver violates this rule while running in a uni-processor environment an interrupt might be presented prior to the do_IO() routine returning to the device driver main path. In this case we will end in a deadlock situation, as the interrupt handler will try to obtain the IRQ lock the device driver still owns.
The driver must assure that it holds the device specific lock. This can be accomplished by (i) s390irq_spin_lock() or (ii) s390irq_spin_lock_irqsave().
Option (i) should be used if the calling routine is running disabled for I/O interrupts already. Option (ii) obtains the device gate and puts the CPU into I/O disabled state by preserving the current PSW flags.
See the descriptions of s390irq_spin_lock() or s390irq_spin_lock_irqsave() for more details.
The device driver is allowed to issue the next do_IO() call from within its interrupt handler already. It is not required to schedule a bottom-half, unless a non-deterministically long running error recovery procedure or similar needs to be scheduled. During I/O processing the Linux/390 generic I/O device driver support has already obtained the IRQ lock, that is, the handler must not try to obtain it again when calling do_IO() or we end in a deadlock situation. In any case, the device driver's interrupt handler must only call do_IO() if the handler itself can be entered recursively, for example when do_IO() finds a status pending and needs to call the interrupt handler itself.
Device drivers should not rely on DOIO_WAIT_FOR_INTERRUPT synchronous I/O request processing too heavily. All I/O devices except the console device are driven using a single shared interrupt subclass (ISC). For synchronous processing the device is temporarily mapped to a special ISC while the calling CPU waits for I/O completion. As this special ISC is gated, all synchronous requests in an SMP environment are serialized, which may cause other CPUs to spin. This service is primarily meant to be used during device driver initialization for ease of device setup.
The lpm input parameter might be used for multi-path devices shared among multiple systems as the Linux/390 CDS is not grouping channel paths. Therefore, its use might be required if multiple access paths to a device are available and the device was reserved by means of a reserve device command (for devices supporting this technique). When issuing this command the device driver needs to extract the dev_id->lpum value and restrict all subsequent channel programs to this channel path until the device is released by a device release command. Otherwise a deadlock may occur.
If a device driver relies on an I/O request being completed before it starts the next one, it can reduce I/O processing overhead by chaining a no-op I/O command CCW_CMD_NOOP to the end of the submitted CCW chain. This will force Channel-End and Device-End status to be presented together, with a single interrupt.
However, this should be used with care as it implies the channel will remain busy, not being able to process I/O requests for other devices on the same channel. Therefore, for example, read commands should never use this technique, as the result will be presented by a single interrupt anyway.
In order to minimize I/O overhead, a device driver should use the DOIO_REPORT_ALL flag only if the device can report intermediate interrupt information prior to device-end that the device driver urgently relies on. In this case all I/O interruptions are presented to the device driver until final status is recognized.
If a device is able to recover from asynchronously presented I/O errors, it can perform overlapping I/O using the DOIO_EARLY_NOTIFICATION flag. While some devices always report channel-end and device-end together, with a single interrupt, others present primary status (channel-end) when the channel is ready for the next I/O request and secondary status (device-end) when the data transmission has been completed at the device.
The previously mentioned flag allows exploitation of this feature, for example, for communication devices that can handle lost data on the network to allow for enhanced I/O processing.
Unless the channel subsystem at any time presents a secondary status interrupt, exploiting this feature will cause only primary status interrupts to be presented to the device driver while overlapping I/O is performed. When a secondary status without error (alert status) is presented, this indicates successful completion for all overlapping do_IO() requests that have been issued since the last secondary (final) status.
During interrupt processing the device specific interrupt handler should avoid basing its processing decisions on the interruption response block (IRB) that is part of the dev_id buffer area. The IRB area represents the interruption parameters from the last interrupt received. Unless the device driver has specified DOIO_REPORT_ALL or is called with a pending status (DEVSTAT_STATUS_PENDING), the IRB information may or may not show the complete interruption status, but the last interrupt only. Therefore the device driver should usually base its processing decisions on the values of dev_id->cstat and dev_id->dstat that represent the accumulated subchannel and device status information gathered since do_IO() request initiation.
Sometimes a device driver might need to stop the processing of a long-running channel program, or the device might require a halt subchannel (HSCH) I/O command to be issued initially. For those purposes the halt_IO() command is provided.
int halt_IO( int IRQ,            /* subchannel number */
             int intparm,        /* dummy intparm */
             unsigned int flag); /* operation mode */
| IRQ | IRQ (subchannel) the halt operation is requested for |
| intparm | Interruption parameter; value is only used if no I/O is outstanding, otherwise the intparm associated with the I/O request is returned |
| flag | 0 (zero) or DOIO_WAIT_FOR_INTERRUPT |
The
halt_IO() function returns:
|
|
Successful completion or request successfully initiated |
|
The device is currently performing a synchronous I/O operation (do_IO() with flag DOIO_WAIT_FOR_INTERRUPT), or an error was encountered and the device is currently being sensed |
|
The IRQ specified does not specify a valid subchannel, the device is not operational (check dev_id.flags) or the IRQ is not owned. |
A device driver may write a never-ending channel program, that is, a channel program that at its end loops back to its beginning by means of a transfer in channel (TIC) command (CCW_CMD_TIC). Network device drivers usually do this and set the PCI CCW flag (CCW_FLAG_PCI). Once a CCW with this flag is executed, a program controlled interrupt (PCI) is generated and the device driver can then perform an appropriate action. Before an outstanding read to a network device (with or without the PCI flag) can be interrupted, a
halt_IO() is required to end the pending operation.
We do not allow the stopping of synchronous I/O requests by means of a
halt_IO() call. The function will return -EBUSY instead.
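A driver therefore has to be prepared for halt_IO() to refuse the request. The following user-space sketch shows one way to react to the result. `demo_halt_IO()` is a stub that only mimics the documented signature and the -EBUSY behavior; the state table, the helper names, the use of 0 for success and of -ENODEV for an invalid or non-operational subchannel are assumptions made for illustration.

```c
#include <errno.h>

#define DEMO_MAX_IRQ 4

/* Hypothetical per-subchannel state, used only by this stub. */
enum demo_state { DEMO_IDLE, DEMO_ASYNC_IO, DEMO_SYNC_IO, DEMO_NOT_OPER };
static enum demo_state demo_subchannel[DEMO_MAX_IRQ];

/* Stub with the documented halt_IO() signature; NOT the kernel routine. */
static int demo_halt_IO(int irq, int intparm, unsigned int flag)
{
    (void)intparm;
    (void)flag;

    if (irq < 0 || irq >= DEMO_MAX_IRQ ||
        demo_subchannel[irq] == DEMO_NOT_OPER)
        return -ENODEV;              /* assumed error value          */
    if (demo_subchannel[irq] == DEMO_SYNC_IO)
        return -EBUSY;               /* synchronous I/O: not halted  */

    demo_subchannel[irq] = DEMO_IDLE;
    return 0;                        /* assumed success value        */
}

/*
 * Driver-side helper: returns 1 if the channel program was halted,
 * 0 if the caller should retry later, -1 if the device is unusable.
 */
static int demo_stop_channel_program(int irq)
{
    int rc = demo_halt_IO(irq, 0, 0);

    if (rc == 0)
        return 1;
    if (rc == -EBUSY)
        return 0;
    return -1;
}
```

Treating -EBUSY as "retry later" rather than as a hard failure reflects the text above: the synchronous request will complete on its own, after which a halt is either unnecessary or can be issued again.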
This section describes various routines to be used in a Linux/390 device driver programming environment.
s390irq_spin_lock() / s390irq_spin_unlock()
These two macro definitions are required to obtain the device specific IRQ lock. The lock needs to be obtained if the device driver intends to call
do_IO() or halt_IO() from anywhere but the device interrupt handler (where the lock is already owned). Those routines must only be used if running disabled for interrupts already. Otherwise use s390irq_spin_lock_irqsave() and the corresponding unlock routine instead.
s390irq_spin_lock( int IRQ);
s390irq_spin_unlock( int IRQ);
s390irq_spin_lock_irqsave() / s390irq_spin_unlock_irqrestore()
These two macro definitions are required to obtain the device specific IRQ lock. The lock needs to be obtained if the device driver intends to call do_IO() or halt_IO() from anywhere but the device interrupt handler (where the lock is already owned). Those routines should only be used if running enabled for interrupts. If running disabled already, the driver should use s390irq_spin_lock() and the corresponding unlock routine instead.
s390irq_spin_lock_irqsave( int IRQ, unsigned long flags);
s390irq_spin_unlock_irqrestore( int IRQ, unsigned long flags);
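The choice between the two pairs depends only on whether interrupts are already disabled. The sketch below models that rule in user space with mock functions. Everything prefixed `demo_` is hypothetical, the "lock" is a plain flag, and the mock takes a pointer to `flags` because it is a function, whereas the real routines are macros that take `flags` by name.

```c
#define DEMO_NR_IRQS 4

static int demo_irqs_enabled = 1;          /* mock CPU interrupt mask */
static int demo_lock_held[DEMO_NR_IRQS];   /* mock per-IRQ lock flags */

/* Mock of the plain pair: caller must already run disabled. */
static void demo_lock(int irq)   { demo_lock_held[irq] = 1; }
static void demo_unlock(int irq) { demo_lock_held[irq] = 0; }

/* Mock of the irqsave pair: remember the mask, disable, then lock. */
static void demo_lock_irqsave(int irq, unsigned long *flags)
{
    *flags = (unsigned long)demo_irqs_enabled;
    demo_irqs_enabled = 0;
    demo_lock(irq);
}

/* Unlock, then restore the interrupt mask saved by demo_lock_irqsave(). */
static void demo_unlock_irqrestore(int irq, unsigned long flags)
{
    demo_unlock(irq);
    demo_irqs_enabled = (int)flags;
}

/* Run work() under the IRQ lock, picking the variant for the current state. */
static int demo_with_irq_lock(int irq, int (*work)(int irq))
{
    int rc;

    if (demo_irqs_enabled) {
        unsigned long flags;

        demo_lock_irqsave(irq, &flags);
        rc = work(irq);
        demo_unlock_irqrestore(irq, flags);
    } else {
        demo_lock(irq);
        rc = work(irq);
        demo_unlock(irq);
    }
    return rc;
}

/* Example work item: reports whether it ran locked and disabled. */
static int demo_work(int irq)
{
    return demo_lock_held[irq] && !demo_irqs_enabled;
}
```

A real driver normally knows statically which context it runs in and calls the matching pair directly; the runtime test in `demo_with_irq_lock()` exists only to show both cases side by side.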
This section describes the special interface routines required for system console processing. Though they are an extension to the Linux/390 device driver interface concept, they are based on the same principles. These extensions were necessary to ensure deterministic behavior in critical situations, for example,
printk() messages issued by other device drivers running disabled for interrupts during I/O interrupt handling, or a panic() message being raised.
This routine allows specification of the system console device. This is necessary as the console is not driven by the same ESA/390 interrupt subclass as are other devices, but it is assigned its own interrupt subclass. Only one device can act as system console. See
wait_cons_dev() for details.
int set_cons_dev( int IRQ);
|
IRQ |
Subchannel identifying the system console device |
The
set_cons_dev() function returns
|
|
Successful completion |
|
An unhandled interrupt condition is pending for the specified subchannel (IRQ) - status pending |
|
IRQ does not specify a valid subchannel or the device is not operational |
|
The console device is already defined |
This routine allows for resetting the console device specification. See "2.12.1 set_cons_dev() - Set Console Device" for details.
int reset_cons_dev( int IRQ);
|
IRQ |
Subchannel identifying the system console device |
The
reset_cons_dev() function returns
|
|
Successful completion |
|
An unhandled interrupt condition is pending for the specified subchannel (IRQ) - status pending |
|
IRQ does not specify a valid subchannel or the device is not operational |
The
wait_cons_dev() routine is used by the console device driver when its buffer pool for intermediate request queuing is exhausted and a new output request is received. In this case the console driver uses the wait_cons_dev() routine to synchronously wait until enough buffer space is gained to enqueue the current request. Any pending interrupt condition for the console device found during wait_cons_dev() processing causes its interrupt handler to be called.
int wait_cons_dev( int IRQ);
|
IRQ |
Subchannel identifying the system console device |
The wait_cons_dev() function returns :
|
|
Successful completion |
|
The IRQ specified does not match the IRQ configured for the console device by set_cons_dev() |
The function should be used carefully. Especially in an SMP environment the
wait_cons_dev() processing requires that all ISCs except the special console ISC are disabled. In an SMP system this requires the other CPUs to be signaled to disable/enable those ISCs.
Linux/390 uses the following major and minor device numbers.
0 = /dev/dasd0 First DASD device, major
1 = /dev/dasd0a First DASD device, block 1
2 = /dev/dasd0b First DASD device, block 2
3 = /dev/dasd0c First DASD device, block 3
4 = /dev/dasd1 Second DASD device, major
5 = /dev/dasd1a Second DASD device, block 1
6 = /dev/dasd1b Second DASD device, block 2
7 = /dev/dasd1c Second DASD device, block 3

0 = /dev/mnd0 First VM/ESA minidisk
1 = /dev/mnd1 Second VM/ESA minidisk
The following section was copied from the
Documentation/390 directory of the Linux distribution. It was written by Indo Adlung and is copyright IBM 1999, under the GNU Public License.
Linux manages S/390's disk devices (DASD) via the DASD device driver. It is valid for all types of DASDs and represents them to Linux as block devices, namely "DASD". Currently the DASD driver uses a single major number (94) and 4 minor numbers per volume (1 for the physical volume and 3 for partitions). With respect to partitions see the following discussion. Because a major number provides 256 minor numbers, you may have up to 64 DASD devices in your system.
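The arithmetic behind the limit of 64 volumes can be written down as a short sketch. The names below are hypothetical and not taken from the driver source. The sketch only assumes 256 minor numbers per major and the layout described above, with the first minor of each group of four standing for the whole volume.

```c
#define DEMO_DASD_MAJOR       94   /* major number stated in the text */
#define DEMO_MINORS_PER_MAJOR 256  /* 8-bit minor numbers             */
#define DEMO_MINORS_PER_VOL   4    /* 1 volume + 3 partitions         */

/* Decoded meaning of a DASD minor number. */
struct demo_dasd_minor {
    int volume;      /* 0 = first DASD, 1 = second DASD, ...        */
    int partition;   /* 0 = whole volume, 1..3 = partition on it    */
};

/* Number of volumes one major number can address. */
static int demo_dasd_max_volumes(void)
{
    return DEMO_MINORS_PER_MAJOR / DEMO_MINORS_PER_VOL;
}

/* Split a minor number into volume and partition; -1 if out of range. */
static int demo_dasd_decode_minor(int minor, struct demo_dasd_minor *out)
{
    if (minor < 0 || minor >= DEMO_MINORS_PER_MAJOR)
        return -1;
    out->volume = minor / DEMO_MINORS_PER_VOL;
    out->partition = minor % DEMO_MINORS_PER_VOL;
    return 0;
}
```

For example, minor 5 decodes to volume 1, partition 1, which corresponds to the second DASD device and its first partition.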
The kernel parameter 'dasd=from-to,...' may be specified any number of times in the kernel's parameter line, or not at all. The 'from' and 'to' values are to be given in hexadecimal notation without a leading 0x.
If you supply kernel parameters the different instances are processed in order of appearance and a minor number is reserved for any device covered by the supplied range up to 64 volumes. Additional DASDs are ignored. If you do not supply the 'dasd=' kernel parameter at all, the DASD driver registers all supported DASDs of your system to a minor number in ascending order of the subchannel number.
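As an illustration of the range syntax, the following sketch expands the value of one 'dasd=' parameter into a list of device numbers in order of appearance and stops after 64 volumes. It is not the driver's parser: the function name and error convention are hypothetical, and it accepts only the 'from-to' form shown above. Note that strtoul() with base 16 would also tolerate a leading 0x, which the text above does not permit.

```c
#include <stdlib.h>

#define DEMO_MAX_VOLUMES 64

/*
 * Expand "from-to[,from-to...]" (hexadecimal) into devno[].
 * Returns the number of entries stored, or -1 on a syntax error.
 * Devices beyond max are silently ignored.
 */
static int demo_parse_dasd(const char *arg, unsigned int *devno, int max)
{
    const char *p = arg;
    int count = 0;

    while (*p != '\0') {
        char *end;
        unsigned long from, to, d;

        from = strtoul(p, &end, 16);
        if (end == p || *end != '-')
            return -1;               /* missing number or '-'        */
        p = end + 1;

        to = strtoul(p, &end, 16);
        if (end == p || to < from)
            return -1;               /* missing or reversed bound    */

        for (d = from; d <= to && count < max; d++)
            devno[count++] = (unsigned int)d;

        if (*end == ',')
            p = end + 1;             /* another range follows        */
        else if (*end == '\0')
            p = end;                 /* end of parameter value       */
        else
            return -1;               /* unexpected character         */
    }
    return count;
}
```

With the value "192-195,5a10-5a11" the sketch yields six device numbers, 0x192 through 0x195 followed by 0x5a10 and 0x5a11; their position in the array would correspond to the order in which minor numbers are reserved.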
The driver currently supports ECKD-devices and there are stubs for support of the FBA and CKD architectures. For the FBA architecture only some smart data structures are missing to make the support complete.
We performed our testing on 3380 and 3390 type disks of different sizes, under VM and on the bare hardware (LPAR), using internal disks of the Multiprise as well as a RAMAC virtual array. Disks exported by an Enterprise Storage Server (Seascape) should work fine as well.
We currently implement one partition per volume, which is the whole volume, skipping the first blocks up to the volume label. These are reserved for IPL records and IBM's volume label to assure accessibility of the DASD from other operating systems. In a later stage we will provide support of partitions, maybe VTOC oriented or using a kind of partition table in the label record.
To use an ECKD-DASD as a Linux hard disk you have to low-level format the tracks by issuing the
BLKDASDFORMAT ioctl on that device. This will erase any data on that volume, including IBM volume labels, VTOCs, etc. The ioctl takes either a pointer to the following structure or NULL as its argument.
typedef struct {
int start_unit;
int stop_unit;
int blksize;
} format_data_t;
When a NULL argument is passed to the
BLKDASDFORMAT ioctl, the whole disk is formatted with a block size of 1024 bytes. Otherwise start_unit and stop_unit are the first and last track to be formatted. A stop_unit of -1 means that the DASD is formatted from start_unit up to the last track. blksize can be any power of two between 512 and 4096. We recommend no blksize lower than 1024 because ext2fs uses 1kB blocks anyway and you gain approximately 50% of capacity by increasing your blksize from 512 bytes to 1kB.
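The rules for the ioctl argument can be checked before the request is issued. The sketch below repeats the structure from above and adds two hypothetical helpers. The blksize and NULL rules come from the text; rejecting a negative start_unit or a stop_unit smaller than start_unit is an additional assumption, and the ioctl call itself is deliberately left out.

```c
#include <stddef.h>

typedef struct {
    int start_unit;   /* first track to format                 */
    int stop_unit;    /* last track, or -1 for "up to the end" */
    int blksize;      /* block size in bytes                   */
} format_data_t;

/* Returns 1 if fd is an acceptable BLKDASDFORMAT argument, else 0. */
static int demo_format_args_ok(const format_data_t *fd)
{
    if (fd == NULL)
        return 1;                          /* whole disk, default size */
    if (fd->blksize < 512 || fd->blksize > 4096)
        return 0;
    if (fd->blksize & (fd->blksize - 1))
        return 0;                          /* not a power of two       */
    if (fd->start_unit < 0)
        return 0;                          /* assumption, see lead-in  */
    if (fd->stop_unit != -1 && fd->stop_unit < fd->start_unit)
        return 0;                          /* assumption, see lead-in  */
    return 1;
}

/* Block size that results from the request: 1024 bytes for NULL. */
static int demo_effective_blksize(const format_data_t *fd)
{
    return fd != NULL ? fd->blksize : 1024;
}
```

A request such as `{ 0, -1, 4096 }` passes the check and describes formatting the entire volume with 4096-byte blocks, while a blksize of 3000 is rejected because it is not a power of two.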
Then you can mk??fs the filesystem of your choice on that volume or partition. For reasons of sanity you should build your filesystem on the partition
/dev/dd?1 instead of the whole volume. You only lose 3kB but may be sure that you can reuse your data after introduction of a real partition table.
The following is a list of files, and their functions, which were added to the Linux distribution by the Linux/390 developers.
|
File |
Description |
|
Perform low level format of DASD |
|
Code to support IPL from ECKD device |
|
Code to support IPL from FBA device |
|
S/390 support of SILO |
|
Various bitmaps used by test/set functions |
|
Issue CP command from Linux (DIAG 8) |
|
Header file for CP command support |
|
EBCDIC/ASCII translation tables and conversion routines |
|
S/390 Low-level entry points |
|
LIBGCC for software floating point |
|
Enable debugger support within kernel |
|
Routine to handle boot and kernel setup |
|
Header file for IEEE floating point support |
|
Initial task structure |
|
S/390 IRQ instantiation |
|
Header file for IRQ support |
|
Channel support code |
|
Mapping of S/390 low-core areas |
|
Handle IEEE floating point on S/390 |
|
Handle the S/390-dependent parts of process handling |
|
Kernel tracing support |
|
Floating point support code |
|
I/O support routines (such as read device chars/DIAG 210) |
|
Header file for S/390 I/O support routines |
|
Kernel symbols |
|
Handles the architecture-dependent parts of initialization |
|
Signal handling (not SIGP but software signals) |
|
SMP support (the SIGP stuff) |
|
Handle system calls that use non-standard call sequences |
|
Time support routines (for example, gettimeofday()) |
|
Handles hardware traps and faults after initial save |
|
Network checksum routines (uses CKSM instruction) |
|
Delay routines |
|
Fast memset routine (uses MVCLE) |
|
Fast strcmp routine (uses CLST) |
|
Fast strncpy routine |
|
Page fault exception table processing |
|
Page fault handling |
|
Memory initialization routines |
|
Re-map IO memory to kernel address space |
|
Header that maps the a.out object format |
|
Atomic operations that C cannot guarantee |
|
Various bit-operation macros and definitions |
|
Included by main.c to check for S/390-dependent bugs |
|
Various byte ordering/reordering routines |
|
Level 1 cache definitions |
|
Fast network checksum routines |
|
S/390 definition of the "current" variable |
|
Delay routine header file |
|
DMA header file (dummy I guess) |
|
EBCDIC/ASCII translate table & routine header file |
|
ELF-390 definitions |
|
Error number definitions |
|
File control routine, structure, and variable definitions |
|
Debugger stub support definitions |
|
I/O interrupt definitions, structures and variables |
|
init.c support definitions |
|
Low-level I/O support definitions |
|
IOCTL command support definitions |
|
IOCTL related definitions |
|
Inter-Process Communication definitions |
|
Interrupt routine definitions |
|
Channel related definitions |
|
Map of low core |
|
IEEE floating point emulation support definitions |
|
Machine-specific definitions |
|
Miscellaneous alignment definitions |
|
Memory Map ( mmap()) related definitions |
|
Memory management context definitions |
|
Support definitions for namei() |
|
Page and paging related definitions |
|
System parameters |
|
Page table definitions (3 tier + 2 tier model mapping) |
|
|
|
POSIX type definitions |
|
CPU type and hardware definitions |
|
Processor trace related definitions |
|
Queuing related definitions |
|
|
|
S/390-dependent debugging definitions |
|
Designed to keep compatibility between gdb's and the kernel's representation of registers |
|
Code/Data segment definitions (dummy for S/390) |
|
Additional semaphore support definitions |
|
Semaphore routine support definitions |
|
Initial system setup support definitions |
|
Shared memory parameter definitions |
|
Signal context definitions |
|
Signal information definitions |
|
Signal routine support definitions |
|
Signal processor (SIGP) support definitions |
|
SMP routine support definitions |
|
SMP locking routine support definitions |
|
Socket routine support definitions |
|
Socket IOCTL related definitions |
|
|
|
Interrupt routine support definitions |
|
Spin/read/write lock routine support definitions |
|
|
|
String routine support definitions (e.g. memchr()) |
|
System routine support definitions (e.g. cli(), sti()) |
|
Additional termios related definitions |
|
Terminal I/O routine support definitions |
|
Clock cycle related definitions |
|
C types used by Linux/390 |
|
User space memory access support definitions |
|
User context definitions |
|
Unaligned memory access definitions |
|
Standard UNIX definitions |
|
Core file layout definitions |
|
DASD I/O routines |
|
DASD I/O routine support definitions |
|
DASD I/O CCW related processing ([en|de]queuing) |
|
DASD I/O CCW support definitions |
|
ECKD I/O routines |
|
|
|
DASD profiling |
|
DASD type definitions (ECKD, CKD, FBA) |
|
VM minidisk I/O routines |
|
VM minidisk I/O routine support definitions |
|
3215 line-mode console I/O routines |
|
Hardware console I/O routine support definitions |
|
Hardware line-mode console I/O routines |
|
Reading/writing from/to system console via HWC |
|
HWC read/write support definitions |
|
HWC line-mode console driver |
|
EBCDIC/ASCII tables and conversion routines |
|
CTCA network driver |
|
IUCV network driver |
|
IUCV network driver support definitions |
The Linux/390 system calls are implemented via SVC. Each call corresponds to a different SVC.
|
# |
Function |
# |
Function |
|
1 |
|
97 |
|
|
2 |
|
99 |
|
|
3 |
|
100 |
|
|
4 |
|
101 |
|
|
5 |
|
102 |
|
|
6 |
|
103 |
|
|
7 |
|
104 |
|
|
8 |
|
105 |
|
|
9 |
|
106 |
|
|
10 |
|
107 |
|
|
11 |
|
108 |
|
|
12 |
|
109 |
|
|
13 |
|
111 |
|
|
14 |
|
112 |
|
|
15 |
|
114 |
|
|
16 |
|
115 |
|
|
18 |
|
116 |
|
|
19 |
|
117 |
|
|
20 |
|
118 |
|
|
21 |
|
119 |
|
|
22 |
|
120 |
|
|
23 |
|
121 |
|
|
24 |
|
122 |
|
|
25 |
|
124 |
|
|
26 |
|
125 |
|
|
27 |
|
126 |
|
|
28 |
|
127 |
|
|
29 |
|
128 |
|
|
30 |
|
129 |
|
|
33 |
|
130 |
|
|
34 |
|
131 |
|
|
36 |
|
132 |
|
|
37 |
|
133 |
|
|
38 |
|
134 |
|
|
39 |
|
135 |
|
|
40 |
|
136 |
|
|
41 |
|
138 |
|
|
42 |
|
139 |
|
|
43 |
|
140 |
|
|
45 |
|
141 |
|
|
46 |
|
142 |
|
|
47 |
|
143 |
|
|
48 |
|
144 |
|
|
49 |
|
145 |
|
|
50 |
|
146 |
|
|
51 |
|
147 |
|
|
54 |
|
148 |
|
|
55 |
|
149 |
|
|
57 |
|
150 |
|
|
59 |
|
151 |
|
|
60 |
|
152 |
|
|
61 |
|
153 |
|
|
62 |
|
154 |
|
|
63 |
|
155 |
|
|
64 |
|
156 |
|
|
65 |
|
157 |
|
|
66 |
|
158 |
|
|
67 |
|
159 |
|
|
68 |
|
160 |
|
|
69 |
|
161 |
|
|
70 |
|
162 |
|
|
71 |
|
163 |
|
|
72 |
|
164 |
|
|
73 |
|
165 |
|
|
74 |
|
167 |
|
|
75 |
|
168 |
|
|
76 |
|
169 |
|
|
77 |
|
170 |
|
|
78 |
|
171 |
|
|
79 |
|
172 |
|
|
80 |
|
173 |
|
|
81 |
|
174 |
|
|
82 |
|
175 |
|
|
83 |
|
176 |
|
|
84 |
|
177 |
|
|
85 |
|
178 |
|
|
86 |
|
179 |
|
|
87 |
|
180 |
|
|
88 |
|
181 |
|
|
89 |
|
182 |
|
|
90 |
|
183 |
|
|
91 |
|
184 |
|
|
92 |
|
185 |
|
|
93 |
|
186 |
|
|
94 |
|
187 |
|
|
95 |
|
190 |
|
|
96 |
|
255 |
|
Linux/390 uses the following control register settings.

Notes:

These registers are used for linkage-stack and address space operations. The registers are saved and restored for each task but never set.


This register is used for Monitor Calls (MC). The register is saved and restored for each task but never set.
These registers are used for Program Event Recording (PER). The registers are saved and restored for each task but never set.
This register is used for tracing. The register is saved and restored for each task but never set.


This register is used for linkage-stack operations. The register is saved and restored for each task but never set.
Control Register 7 (secondary space control) and Control Register 13 (home space control) are set to the user pgdir. The kernel runs in its own, disjoint address space, which is the primary address space. A "copy to/from user" is done via access register mode with access registers (AR2 and AR4) set to 0 or 1. For that purpose CR 7 needs to be set up with the user pgd.
The following section illustrates the IPL process from the VM reader.
When you first download the kernel image you will need to load it, the boot parameters and the RAMDISK from the VM reader. The RAMDISK contains just enough of a normal filesystem to complete the boot process. It will allow you to mount and configure "real" filesystems which can then take over as the root filesystem.

The initial boot parameters are as follows:

These parameters have the following meaning:

Once Linux has a "real" root filesystem and boots from it, only the kernel code and the parameters need to reside in the VM reader.

The parameters for this boot are more complex. They describe VM minidisks, CTC devices, the location of the root file system and the DASD to be included.

For a "full-functioned" Linux system IBM recommends a 128MB virtual machine:


As part of the boot process the network is brought online. The following netstat display shows the routing table for the system:

The network definitions responsible for the network activation are found in /etc/sysconfig/network:

And in /etc/sysconfig/network-scripts/ifcfg-ctc0:
