Java Specs -
Data Types
Like the Java programming language, the Java Virtual Machine operates on two kinds of types: primitive types and reference types. There are, correspondingly, two kinds of values that can be stored in variables, passed as arguments, returned by methods, and operated upon: primitive values and reference values.
An object is either a dynamically allocated class instance or an array. A reference to an object is considered to have Java Virtual Machine type reference. Values of type reference can be thought of as pointers to objects.
Reference Types and Values
There are three kinds of reference types: class types, array types, and interface types. Their values are references to dynamically created class instances, arrays, or class instances or arrays that implement interfaces, respectively.
A reference value may also be the special null reference, a reference to no object, which will be denoted here by null. The null reference initially has no run-time type, but may be cast to any type. The default value of a reference type is null.
The Java Virtual Machine specification does not mandate a concrete value encoding null.
Java Virtual Machine Stacks
Each Java Virtual Machine thread has a private Java Virtual Machine stack, created at the same time as the thread. A Java Virtual Machine stack stores frames (§2.6). A Java Virtual Machine stack is analogous to the stack of a conventional language such as C: it holds local variables and partial results, and plays a part in method invocation and return. Because the Java Virtual Machine stack is never manipulated directly except to push and pop frames, frames may be heap allocated. The memory for a Java Virtual Machine stack does not need to be contiguous.
Heap
The Java Virtual Machine has a heap that is shared among all Java Virtual Machine threads. The heap is the run-time data area from which memory for all class instances and arrays is allocated.
The heap is created on virtual machine start-up. Heap storage for objects is reclaimed by an automatic storage management system (known as a garbage collector); objects are never explicitly deallocated. The Java Virtual Machine assumes no particular type of automatic storage management system, and the storage management technique may be chosen according to the implementor's system requirements. The heap may be of a fixed size or may be expanded as required by the computation and may be contracted if a larger heap becomes unnecessary. The memory for the heap does not need to be contiguous.
Method Area
The Java Virtual Machine has a method area that is shared among all Java Virtual Machine threads. The method area is analogous to the storage area for compiled code of a conventional language or analogous to the "text" segment in an operating system process. It stores per-class structures such as the run-time constant pool, field and method data, and the code for methods and constructors, including the special methods (§2.9) used in class and instance initialization and interface initialization.
The method area is created on virtual machine start-up. Although the method area is logically part of the heap, simple implementations may choose not to either garbage collect or compact it. This version of the Java Virtual Machine specification does not mandate the location of the method area or the policies used to manage compiled code. The method area may be of a fixed size or may be expanded as required by the computation and may be contracted if a larger method area becomes unnecessary. The memory for the method area does not need to be contiguous.
Program Counter (PC) Register
In general computer architecture terms, program counter (PC) register keeps track of the current instruction executing at any moment. That is like a pointer to the current instruction in sequence of instructions in a program. This is same in Java JVM terms also. We have multithreaded architecture as Java supports multithreading and so, a program counter (PC) Register is created every time a new thread is created. PC keeps a pointer to the current statement that is being executed in its thread. If the current executing method is ‘native’, then the value of program counter register will be undefined.
Run-Time Constant Pool
A run-time constant pool is a per-class or per-interface run-time representation of the constant_pool table in a class file (§4.4). It contains several kinds of constants, ranging from numeric literals known at compile-time to method and field references that must be resolved at run-time. The run-time constant pool serves a function similar to that of a symbol table for a conventional programming language, although it contains a wider range of data than a typical symbol table.
Each run-time constant pool is allocated from the Java Virtual Machine's method area (§2.5.4). The run-time constant pool for a class or interface is constructed when the class or interface is created (§5.3) by the Java Virtual Machine.
Native Method Stacks
An implementation of the Java Virtual Machine may use conventional stacks, colloquially called "C stacks," to support native methods (methods written in a language other than the Java programming language). Native method stacks may also be used by the implementation of an interpreter for the Java Virtual Machine's instruction set in a language such as C. Java Virtual Machine implementations that cannot load native methods and that do not themselves rely on conventional stacks need not supply native method stacks. If supplied, native method stacks are typically allocated per thread when each thread is created.
Frames
A frame is used to store data and partial results, as well as to perform dynamic linking, return values for methods, and dispatch exceptions.
A new frame is created each time a method is invoked. A frame is destroyed when its method invocation completes, whether that completion is normal or abrupt (it throws an uncaught exception). Frames are allocated from the Java Virtual Machine stack (§2.5.2) of the thread creating the frame. Each frame has its own array of local variables (§2.6.1), its own operand stack (§2.6.2), and a reference to the run-time constant pool (§2.5.5) of the class of the current method.
Note that a frame created by a thread is local to that thread and cannot be referenced by any other thread.
Local Variables
Each frame (§2.6) contains an array of variables known as its local variables. The length of the local variable array of a frame is determined at compile-time and supplied in the binary representation of a class or interface along with the code for the method associated with the frame (§4.7.3).
A single local variable can hold a value of type boolean, byte, char, short, int, float, reference, or returnAddress. A pair of local variables can hold a value of type long or double.
Dynamic Linking
Each frame (§2.6) contains a reference to the run-time constant pool (§2.5.5) for the type of the current method to support dynamic linking of the method code. The class file code for a method refers to methods to be invoked and variables to be accessed via symbolic references. Dynamic linking translates these symbolic method references into concrete method references, loading classes as necessary to resolve as-yet-undefined symbols, and translates variable accesses into appropriate offsets in storage structures associated with the run-time location of these variables.
This late binding of the methods and variables makes changes in other classes that a method uses less likely to break this code.
Java JVM Run-time Data Areas
Understanding Java Virtual Machine (JVM) run-time data areas are critical to better Java programming. One of the most dreaded errors in Java is OutOfMemoryError and it is related to Java Virtual Machine (JVM) memory areas. We should have better understanding of JVM internals, how its data area works so that we will have better grip over these kind of JVM errors. In this article, we shall learn about types of run-time memory areas in JVM and how they work.
Types of Java JVM Run-time Memory Areas,
- Program Counter (PC) Register
- Java Virtual Machine Stacks
- Heap Memory
- Method Area
- Run-time Constant Pool
- Native Method Stacks
Symbol Tables and Static Checks
Introduction
The parser ensures that the input program is syntactically correct, but there are other kinds of correctness that it cannot (or usually does not) enforce. For example:
- A variable should not be declared more than once in the same scope.
- A variable should not be used before being declared.
- The type of the left-hand side of an assignment should match the type of the right-hand side.
- Methods should be called with the right number and types of arguments.
The next phase of the compiler after the parser, sometimes called the static semantic analyzer is in charge of checking for these kinds of errors. The checks can be done in two phases, each of which involves traversing the abstract-syntax tree created by the parser:
- For each scope in the program: Process the declarations, adding new entries to the symbol table and reporting any variables that are multiply declared; process the statements, finding uses of undeclared variables, and updating the "ID" nodes of the abstract-syntax tree to point to the appropriate symbol-table entry.
- Process all of the statements in the program again, using the symbol-table information to determine the type of each expression, and finding type errors.
Below, we will consider how to build symbol tables and how to use them to find multiply-declared and undeclared variables. We will then consider type checking.
Symbol Tables
The purpose of the symbol table is to keep track of names declared in the program. This includes names of classes, fields, methods, and variables. Each symbol table entry associates a set of attributes with one name; for example:
- which kind of name it is
- what is its type
- what is its nesting level
- where will it be found at runtime.
One factor that will influence the design of the symbol table is what scoping rules are defined for the language being compiled. Let's consider some different kinds of scoping rules before continuing our discussion of symbol tables.
Scoping
A language's scoping rules tell you when you're allowed to reuse a name, and how to match uses of a name to the corresponding declaration. In most languages, the same name can be declared multiple times under certain circumstances. In Java you can use the same name in more than one declaration if the declarations involve different kinds of names. For example, you can use the same name for a class, a field of the class, a method of the class, and a local variable of the method (this is not recommended, but it is legal):
- class Test {
- int Test;
- void Test( ) {
- double Test; // could also be declared int
- }
- }
In Java (and C++), you can also use the same name for more than one method as long as the number and/or types of parameters are unique (this is called overloading).
In C and C++, but not in Java, you can declare variables with the same name in different blocks. A block is a piece of code inside curly braces; for example, in an if or a loop. The following is a legal C or C++ function, but not a legal Java method:
- void f(int k) {
- int x = 0; /* x is declared here */
- while (...) {
- int x = 1; /* another x is declared here */
- ...
- if (...) {
- float x = 5.5; /* and yet another x is declared here! */
- ...
- }
- }
- }
As mentioned above, the scopinge rules of a language determine which declaration of a named object corresponds to each use. C, C++, and Java use what is called static scoping; that means that the correspondence between uses and declarations is made at compile time. C and C++ use the "most closely nested" rule to match nested declarations to their uses: a use of variable x matches the declaration in the most closely enclosing scope such that the declaration precedes the use. In C and C++, there is one, outermost scope that includes the names of the global variables (the variables that are declared outside the functions) and the names of the functions that are not part of any class. Each function has one or more scopes. Both C and C++ have one scope for the parameters and the "top-level" declarations, plus one for each block in the function (delimited by curly braces). In addition, C++ has a scope for each for loop: in C++ (but not in C) you can declare variables in the for-loop header.
In the example given above, the outermost scope includes just the name "f", and function f itself has three scopes:
- The outer scope for f includes parameter k, and the variable x that is initialized to 0.
- The next scope is for the body of the while loop, and includes the variable x that is initialized to 1.
- The innermost scope is for the body of the if, and includes the variable x that is initialized to 5.5.
So a use of variable x inside the loop but not inside the if matches the declaration in the loop (has the value 1), while a use of x outside the loop (either before or after the loop) matches the declaration at the beginning of the function (has the value 0).
Comments
Post a Comment