Java to Python (j2p)
This tool can translate java programs into python programs. This is a complex mechanism and still experimental/in development. This documentation is very incomplete. This code is not yet published on the artifactory.
Some features:
- "core translator" for translating built-in java functionality and libraries.
- Handles java method overloading using the @dispatch annotation from plum.
- Minimalistic code generation: if your java code does not use external libraries, no additional libraries are needed to run generated python code, except maybe the plum library if you use java method overloading.
- supports both single-file and multi-file projects
- translates calls to external libraries
- "translator plugins": a translator plugin is a plugin on the java side that handles translating calls to external java libraries such as jackson, junit etc. You select the required translator plugins as needed.
- There may be different translator plugins available for the same java library, translating to a different python library, depending on your needs.
- can generate pip-installable zip file that automatically (pip) installs all required dependencies (external python libraries) depending on your chosen library translators.
- PyRunner supports running python zip files from java.
The repo contains the translator in the module named "core". The other modules, ending with "-t", are translator plug-ins described below.
Internal mechanism
For normal use, you do not need to know exactly how the translation is done. But the way the translator works becomes highly relevant if you want to inject your own python translations into the code (see the following section).
Translation is done on a per-java-file basis.
There are two main translation components in the core:
1. The translator that parses a java file with JavaParser and creates equivalent python code.
2. Per-java-class translators that know how to translate all calls to any java class method or field into equivalent python code. These use java-side code introspection to determine argument types, and therefore require that all referenced java classes are actually available in compiled form, either through libraries or from the java compiler.
(1) is generic and used for all java programs. It is currently pretty complete but details will be filled in over time as the need arises to support more java syntax.
(2) is currently very partial. The reason is that there are a huge number of java classes and even more 3rd party libraries, and almost every field and function in it will need a specialized translator. This will grow slowly over time as needed.
The mechanism is flexible in its mapping from java classes to python classes. The general approach is to assume a fixed mapping from java classes to python classes, as in the table below. Also it is assumed that the equivalent classes have roughly the same functionality, even if the actual function names and arguments may differ. This fixed mapping simplifies translation.
But the translator can intercept special classes and make dedicated translations. For instance, if a class C implements Iterator and the variable it is of class C, then translating it.next() gives next(it), and the definition void iterator() translates to __iter__(self). These are all handled in the translator classes, in this example in tudelft.utilities.j2p.t.java.util.Iterator.
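As a rough illustration of what this interception amounts to on the python side, the hand-written sketch below shows a class that follows the python iterator protocol. The class name Countdown and the mapping of the Java next() definition to __next__ are assumptions made for illustration; this is not actual j2p output.

```python
from typing import Iterator


class Countdown:
    """Hypothetical python counterpart of a Java class implementing Iterator<Integer>."""

    def __init__(self, start: int):
        self.__current: int = start

    def __iter__(self) -> Iterator[int]:
        # python's iteration protocol expects an iterator to return itself here
        return self

    def __next__(self) -> int:
        # assumed counterpart of the Java next() definition
        if self.__current <= 0:
            raise StopIteration
        self.__current -= 1
        return self.__current + 1


it = Countdown(3)
first = next(it)       # Java: it.next()
remaining = list(it)   # consumes the rest through __iter__ and __next__
```

The call next(it) only works because the class defines __next__, which is why a dedicated translator is needed instead of the default name-for-name mapping.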
constructs
The private keyword is reflected by the python convention of prefixing the names of private fields, functions and methods with __.
The static keyword results in fields being set in the class definition object.
Fields in Java have to be initialized in the __init__ function in the python translation.
Overloaded methods can not be handled by plain python. They are handled using the external library plum-dispatch==2.2.2 and result in additional @dispatch annotations in the translated code. If you use overloading, you need to have plum installed to run your code. Check also the @NonNull section.
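The conventions for private, static and field initialization can be pictured with the following hand-written sketch. The class Counter and its members are hypothetical, and the exact code that j2p generates may differ; overloading with @dispatch is left out so that the sketch runs without plum.

```python
class Counter:
    # Java: private static int instances = 0;
    # static fields are set on the class object itself
    __instances: int = 0

    def __init__(self, start: int):
        # Java: private int value;
        # instance fields are initialized in __init__
        self.__value: int = start
        Counter.__instances += 1

    # Java: private void increment()
    def __increment(self) -> None:
        self.__value += 1

    # Java: public int next()
    def next(self) -> int:
        self.__increment()
        return self.__value

    # Java: public static int getInstances()
    @staticmethod
    def getInstances() -> int:
        return Counter.__instances


c = Counter(5)
value = c.next()
```

Python mangles names starting with __ inside a class body, so the field is stored as _Counter__value and is not reachable as c.__value from outside the class.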
built-in "primitives"
java | python | remarks |
---|---|---|
String | str | |
int | int | |
float,Float,double,Double | float | in Java, float and double overflow at different places. Translation may fail if code depends on overflow behaviour |
BigDecimal | Decimal | |
Map, HashMap | dict | |
Set | set | |
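To make the table concrete, here is a small hand-written sketch of the python types involved, with the Java statements it stands in for as comments. The variable names are made up for illustration. The first lines show the overflow remark: a Java float is 32 bits wide and overflows to Infinity where a python float (64 bits) does not.

```python
import math
from decimal import Decimal
from typing import Dict, Set

# Java: float f = 3.0e38f; f = f * 10;   // Infinity in Java (32-bit overflow)
f: float = 3.0e38
f = f * 10
overflowed = math.isinf(f)   # python floats are 64-bit, so this stays finite

# Java: new BigDecimal("0.1").add(new BigDecimal("0.2"))
total: Decimal = Decimal("0.1") + Decimal("0.2")

# Java: Map<String, Integer> ages = ...; ages.put("bob", 42);
ages: Dict[str, int] = {}
ages["bob"] = 42

# Java: Set<String> names = ...; names.add("bob");
names: Set[str] = set()
names.add("bob")
```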
Classes
Classes are by default translated using the "Stub" translator, which assumes that the class names and modules are identical in Python and Java, and that all functions have the same names, arguments and types.
References to classes in the code (SomeClass.class in java) translate to SomeClass in Python. It is assumed the proper imports are done on the python side to enable this.
This default translation can be overridden using custom translators.
Translated classes
The number of translated classes is too big to even list here. Please check the source code.
If function calls in python deliver a wrong object (not matching the above mapping), the translator has to inject additional code to convert it to the proper object. For example, Map.keySet() in python would be set(map.keys()), where the extra set converts the dict_keys into a proper set.
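The difference between dict_keys and a real set can be checked directly in python. The sketch below uses a made-up dict named ages. It also shows one consequence of the conversion: the resulting set is a snapshot, while the dict_keys view keeps tracking the dict.

```python
from typing import Dict, Set

ages: Dict[str, int] = {"alice": 30, "bob": 42}

view = ages.keys()                   # a dict_keys view, not a set
keys: Set[str] = set(ages.keys())    # converted to a proper set

ages["carol"] = 25
view_sees_carol = "carol" in view    # the view follows later changes to the dict
set_sees_carol = "carol" in keys     # the converted set does not
```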
Stream
The Stream class is currently not translated. It might be possible to translate it automatically but it looks pretty complex because the python way to deal with streams is list comprehension and the translation is quite far from straightforward.
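If you translate a stream expression by hand, for instance in a manual override, the result typically ends up as a comprehension. A minimal sketch with a made-up list named values:

```python
from typing import List

values: List[int] = [1, 2, 3, 4]

# Java: values.stream().filter(x -> x > 1).map(x -> x * 2).collect(Collectors.toList())
doubled: List[int] = [x * 2 for x in values if x > 1]
```

Note how the filter and map steps swap places and merge into one expression, which hints at why an automatic translation is not straightforward.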
Inner classes
Inner classes (classes inside classes) are not supported. This includes anonymous inner classes. You can not import, call or use inner classes. Inner classes can not be translated. There are many reasons for this; in brief:
- The useful variant of inner class is the non-static inner class. This class can access fields in the enclosing class.
- Python does not support accessing fields of the enclosing class unless some tricks are applied. But these tricks would break uniformity of constructors.
- Many deep technical reasons (see ticket #175) that greatly complicate dealing with these.
We recommend now:
- manually override translation when you really need to call an inner class
- use package-private classes to replace your own inner classes.
- For the non-static inner class: pass the enclosing class as argument to the child class constructor. You may have to use more tricks to get around cyclic dependencies.
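The last recommendation can be pictured as follows. This is a hand-written python sketch of the shape the code takes once the former inner class receives the enclosing object explicitly; the classes Account and Receipt are hypothetical.

```python
class Account:
    """Hypothetical enclosing class."""

    def __init__(self, owner: str):
        self.owner: str = owner
        self.balance: int = 0

    def deposit(self, amount: int) -> "Receipt":
        self.balance += amount
        # pass the enclosing object explicitly instead of relying on an inner class
        return Receipt(self, amount)


class Receipt:
    """Former non-static inner class, now a separate class."""

    def __init__(self, account: Account, amount: int):
        self.account: Account = account
        self.amount: int = amount

    def describe(self) -> str:
        # fields of the enclosing object are reached through the stored reference
        return self.account.owner + " deposited " + str(self.amount)


receipt = Account("bob").deposit(10)
summary = receipt.describe()
```

Here both classes live in one module. When they end up in separate modules that refer to each other, this is where the cyclic dependencies mentioned above show up.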
Exceptions
There are a few exceptions. If some particular inner classes, particularly constants, are used in very specific translators, these translators may opt to attempt to recognise these and process them separately. One example is the JsonDeserializer (jackson-t package) that recognises use of "com.fasterxml.jackson.databind.JsonDeserializer.None". Check the documentation of the translators for details.
Auto Boxing
Autoboxing is the automatic conversion that the Java compiler makes between the primitive types and their corresponding object wrapper classes. For example, converting an int to an Integer, a double to a Double, and so on. If the conversion goes the other way, this is called unboxing.
The translator currently does not support auto boxing. That means you will get an error on the python side if, for instance, you do this on the java side:
int n=1; return "number "+ n;
You will have to call an explicit converter, typically toString(), on the arguments that need to be converted into the proper argument type.
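The python error in question is easy to reproduce. The sketch below assumes that an explicit Java conversion such as Integer.toString(n) ends up as str(n) on the python side; that mapping is an assumption for illustration.

```python
n: int = 1

# Direct counterpart of the Java expression "number " + n
try:
    broken = "number " + n  # python refuses to concatenate str and int
except TypeError as err:
    broken = None
    message = str(err)

# Counterpart of "number " + Integer.toString(n)
fixed: str = "number " + str(n)
```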
Another example is when varargs are involved:
public int sum(int... values) {
    return sumList(values);
}
public int sumList(int[] values) {
    return ....;
}
The call sumList(values) uses auto boxing, but here the translation is from int... to int[]. A similar situation is when you do a for loop over varargs, like
public int sum(int... values) {
    for (int v : values) { .... }
}
All these complications around varargs, and others such as dispatching vararg-typed methods, led us to not support varargs for now.
@NonNull
Java variables like String val can contain a null value, and functions like String f() { ... } can return null. Therefore they are translated to val:Optional[str] and def f(self)->Optional[str].
You can annotate the java code with @NonNull (from org.eclipse.jdt.annotation.NonNull) to indicate the value/return value will not be null, like this @NonNull String val or @NonNull String f() { ... }. Java primitive types like boolean and int can never be null and do not need @NonNull.
The dependency needed for this is as follows (unfortunately this annotation is no longer built into the JRE):
<dependency>
  <groupId>org.eclipse.jdt</groupId>
  <artifactId>org.eclipse.jdt.annotation</artifactId>
  <version>2.3.0</version>
</dependency>
Java code X instanceof C is translated as in Java, so null/None is not an instance of C - C is not "Optional" and you do not need to write X instanceof @NonNull C.
Also be aware of the subtleties of this notation. For instance
@NonNull Set<String> is a set that can not be null but that can contain null values.
@NonNull Set<@NonNull String> is a set that can not be null and also can not contain null values.
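The effect of the annotation on the python type hints can be summarised in a hand-written sketch. The class Example and its members are hypothetical, and the hints for the two Set variants are an assumption derived from the rules above, not verified j2p output.

```python
from typing import Optional, Set, get_type_hints


class Example:
    def __init__(self) -> None:
        # Java: String val;
        self.val: Optional[str] = None
        # Java: @NonNull String name = "";
        self.name: str = ""

    # Java: String f() { ... }
    def f(self) -> Optional[str]:
        return self.val

    # Java: @NonNull String g() { ... }
    def g(self) -> str:
        return self.name

    # Java: @NonNull Set<String> h() { ... }
    def h(self) -> Set[Optional[str]]:
        return {self.val, self.name}

    # Java: @NonNull Set<@NonNull String> k() { ... }
    def k(self) -> Set[str]:
        return {self.name}


hints_f = get_type_hints(Example.f)
hints_g = get_type_hints(Example.g)
```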
More notes:
- Some classes like Arrays or Override do not have a direct python equivalent. Calls to static functions in these classes can still be translated.
- Different java classes, eg Map, AbstractMap and HashMap, may translate to the same python class (dict). This can be done because translation is only one way.
Usage
For usage you have the choice between translating a single file or an entire package. We generally recommend translating the entire package.
Single File
If you want to translate a single java file, or even just a string containing your program, that does not need any specialized libraries, you can follow the approach used by many tests in core. Check the examples. The heart of the code will look like this, where most boilerplate is about setting up the javaparser:
ParserConfiguration conf = new ParserConfiguration();
CombinedTypeSolver typeSolver = new CombinedTypeSolver();
typeSolver.add(new ReflectionTypeSolver(false));
JavaSymbolSolver symbolSolver = new JavaSymbolSolver(typeSolver);
conf.setSymbolResolver(symbolSolver);
JavaParser parser = new JavaParser(conf);
ParseResult<CompilationUnit> res = parser.parse(new File(javaFile));
Block translation = Translator.translate(res.getResult().get());
After you get the translation, you can print it or put it in a .py file.
Entire Package
If you want to translate an entire package, first create a separate source directory containing all the java code that you want to translate. Also make sure that your code is compiled (eg, using the build-helper-maven-plugin; check the pom files). Then call
PyProgram program = PyProgram.fromDirectory(Paths.get("src/test/myprogram"));
This gives you a fully translated program that you can print or get a zip file from. To get a zip file, call
program.getZip()
The zip file is ready for a pip install (eg in your own virtual environment) or running from java through our PythonVenv.
Overriding the translation
Comments can contain python code to override the automatic code, if the block starts with #PY. This code replaces the entire object (if/case/while block; statement) that follows the comment.
Place the #PY block directly before the object you want to override. For example if you place a #PY block before an annotation, you override only the annotation, not the actual method/class after the annotation. This unfortunately will look a bit non-java but it seems the best way to accurately target the override.
Python has strict requirements regarding indentation. To make this possible, we need to be strict about indentation as well. In a single line python comment, the code must look exactly like
//#PY single_python_line
Note the single whitespace after the #PY. Your code starts after this single whitespace.
In a multi line comment the code must look exactly like
/*#PY
 * codeline1
 * codeline2
 * ...
 */
Your code lines each start with "* ", note the whitespace after the star. You are free to indent before the "*".
Your code is automatically indented to the level needed at the insertion place in the code.
Code must be placed in either a standard block comment or a single line comment. Starting a javadoc with #PY is not allowed. This is to encourage proper use of javadoc.
A comment block also overrides annotations.
If the code block contains no code at all, it is translated as pass, to ensure that the code is a proper statement.
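The comment format rules above can be modelled in a few lines of python. This is only a sketch of the rules as described in this section, not the j2p implementation; the function name extract_override and its indent parameter are made up for illustration.

```python
from typing import List


def extract_override(comment: str, indent: str = "") -> str:
    """Return the python code held in a #PY comment, indented for insertion."""
    text = comment.strip()
    lines: List[str] = []
    if text.startswith("//#PY"):
        rest = text[len("//#PY"):]
        # exactly one whitespace separates the marker from the code
        if rest.startswith(" "):
            lines.append(rest[1:])
    elif text.startswith("/*#PY") and text.endswith("*/"):
        for raw in text[len("/*#PY"):-2].splitlines():
            stripped = raw.lstrip()      # indentation before the star is free
            if stripped.startswith("* "):
                lines.append(stripped[2:])
    else:
        # covers javadoc (/**#PY) and ordinary comments
        raise ValueError("not a #PY comment")
    code = [line.rstrip() for line in lines if line.strip()]
    if not code:
        code = ["pass"]                  # an empty override is still a statement
    return "\n".join(indent + line for line in code)


single = extract_override("//#PY return len(self.items)")
block = extract_override("/*#PY\n * for x in xs:\n *     print(x)\n */", "    ")
empty = extract_override("/*#PY\n */")
```

Everything after the "* " prefix is kept verbatim, so the relative indentation of the python lines survives, and the indent argument stands in for the automatic indentation at the insertion place.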
Translator plugins
J2P has a modular translation mechanism. A translator plugin can be plugged in as needed, to add support for translating external libraries that are used in the java code. This also makes it possible to customize the translation process, for instance to translate to a different python library.
A number of translation modules are already available
module | what it translates | details | limitations |
---|---|---|---|
jackson-t | jackson serialization annotations | translates jackson to pyson annotations that look very similar to jackson | Covers what we need for translation of GeniusWeb |
junit-t | junit calls assertTrue, assertEquals | | Very limited, no support yet for junit annotations |
tudutils-t | calls to utilities package | | limited, currently mainly to support parts of immutablelist |
The translators are all in standardized directories. If there is a translator for class a.b.C then this must be in the class tudelft.utilities.j2p.t.a.b.C. This makes it easy to find the translator class, makes everything pluggable through the maven dependency plugins, and requires minimum extra naming conventions.
The translators generally follow the same inheritance hierarchy as the original classes. So if a.b.P is the parent class of a.b.C, then there usually is a class tudelft.utilities.j2p.t.a.b.P that is the parent translator of tudelft.utilities.j2p.t.a.b.C. The latter forwards shared translation matters to the parent class as much as possible. A simple example can be seen in tudelft.utilities.j2p.t.java.lang.ArithmeticException, which forwards almost everything to the RuntimeException translator, which in turn forwards everything to Exception etc.
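The naming convention and the forwarding idea can be modelled in a short python sketch. The real translators are java classes; the function translator_name, the three toy classes and the method translate_call below are hypothetical and only mimic the structure described above.

```python
TRANSLATOR_PREFIX = "tudelft.utilities.j2p.t."


def translator_name(java_class: str) -> str:
    """Fully qualified translator class name for a fully qualified java class name."""
    return TRANSLATOR_PREFIX + java_class


# Toy model of forwarding; the method name and its result are placeholders.
class ExceptionTranslator:
    def translate_call(self, method: str) -> str:
        return "translated:" + method


class RuntimeExceptionTranslator(ExceptionTranslator):
    pass  # forwards everything to the parent translator


class ArithmeticExceptionTranslator(RuntimeExceptionTranslator):
    pass  # forwards everything to the RuntimeException translator


name = translator_name("java.lang.ArithmeticException")
call = ArithmeticExceptionTranslator().translate_call("getMessage")
```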
Translators should all be public classes, even if the real class is abstract. This is because java will handle calls to abstract classes and interfaces and these will be translated with the translator for that abstract class. For example, if we have
List<String> l=...some expression or function call...
l.get(0)
then l is of type List (the interface) and l.get is a function in the interface which needs to be translated. Note that in general any type of List can end up in l at runtime, so the translator has to produce code that works regardless (thus using only functions available in python's equivalent of the List class).
Resulting file
FAQs
Question | Explanation |
---|---|
I'm getting "No translator found for X". But X is a class that I'm trying to translate | The java files you are trying to translate are probably not compiled by the java compiler. When the translator finds a method call, it needs the compiled java to determine the proper signature (function name and arguments). You can use the build-helper-maven-plugin to add your additional sources to the standard maven build path. |