Java to Python (j2p)
This tool can translate java programs into python programs. This is a complex mechanism and still experimental/in development. This documentation is not complete yet. This code is not yet published on the artifactory.
Some features:
- "core translator" for translating java and the built-in libraries.
- Handles java method overloading using the @dispatch annotation from plum.
- Minimalistic code generation: only those external libraries are needed that are actually needed for the specific translated code. Also we attempt to make minimal use of external libraries.
- supports both single-file and multi-file projects
- translates calls to external java libraries
- translator plugins: plugins on the java side that handle translating calls to external java libraries such as jackson, junit etc. You select the required translator plugins as needed.
- There may be different translator plugins available for the same java library, translating to a different python library, depending on your needs.
- can generate pip-installable zip file that automatically (pip) installs all required dependencies (external python libraries) depending on your chosen library translators.
- PyRunner supports running python zip files from java.
The repo contains the translator in the module named "core". The other modules, ending with "-t", are translator plug-ins described below.
- Tested compatibility with python 3.8 3.9 3.10 and 3.11.
Core Translation
For normal use, you do not need to know exactly how the translation is done. But the way the translator works becomes highly relevant if you want to inject your own python translations into the code (see the following section).
Translation is done on a per-java-file basis.
There are two main translation components in the core:
- (1) The translator that parses a java file with Javaparser and creates equivalent python code.
- (2) Per-java-class translators that know how to translate all calls to any java class method or field into equivalent python code. These use java-side code introspection to determine argument types, and therefore require that all referenced java classes are actually available in compiled form, either through libraries or from the java compiler.
(1) is generic and used for all java programs. It is currently pretty complete but details will be filled in over time as the need arises to support more java syntax.
(2) is currently very partial. The reason is that there are a huge number of java classes and even more 3rd party libraries, and almost every field and function in it will need a specialized translator. This will grow slowly over time as needed.
The mechanism is flexible in its mapping from java classes to python classes. The general approach is to assume a fixed mapping from java classes to python classes, as in the table below. Also it is assumed that the equivalent classes have roughly the same functionality, even if the actual function names and arguments may differ. This fixed mapping simplifies translation.
But the translator can intercept special classes and make dedicated translations. For instance, if a class C implements Iterator and a variable it is of class C, then translating it.next() gives next(it), and the definition void iterator() translates to __iter__(self). These are all handled in the translator classes, in this example in tudelft.utilities.j2p.t.java.util.Iterator.
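To make the Iterator mapping above concrete, here is a small hand-written python sketch. The class name Countdown is hypothetical and this is not actual j2p output; it only shows the python iterator protocol that it.next() and iterator() are mapped onto.

```python
from typing import List


class Countdown:
    """Hypothetical python counterpart of a Java class implementing Iterator<Integer>."""

    def __init__(self, start: int):
        self._current = start

    def __iter__(self) -> "Countdown":
        # python asks the object for its iterator through __iter__
        return self

    def __next__(self) -> int:
        # Java's hasNext()/next() pair becomes "raise StopIteration when exhausted"
        if self._current <= 0:
            raise StopIteration
        self._current -= 1
        return self._current + 1


it = Countdown(3)
first = next(it)            # Java: it.next()
rest: List[int] = list(it)  # consumes the remaining items
```

Calling next(it) on the python side plays the role of it.next() in java.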
Comments
The exact location of a comment is very important, because it determines which java element the comment is attached to. That in turn determines how the comment is translated, and which element it modifies if you use #PY annotations to override the translation (see below).
Generally, a comment attaches to the element that follows it.
element after comment | placement of comment in translation |
---|---|
@Annotation | before the annotation |
method declaration, method modifier | first item in the method |
class declaration | before the class |
Therefore, we recommend placing method @Annotations before the javadoc, like this:

    @NonNull
    @Override
    /** @return a string representation of the object ....*/
    public String toString() { ... }

If you place the comment before the @Annotation, the comment will end up before the method, instead of inside it as is customary in python.
This allows you to override just the annotation of an element, like this:

    //#PY #annotation not needed in python
    @JavaSpecificAnnotation
    private final String values...
constructs
The private keyword is reflected by the python convention of prefixing private fields, function names and method names with __.
The static keyword results in fields being set in the class definition object.
Fields in Java have to be initialized in the __init__ function in the python translation.
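The three rules above can be illustrated with a small hand-written python class. This is a sketch under assumptions: the class Counter and its members are hypothetical, and the code j2p actually generates may differ in detail.

```python
class Counter:
    # Java: private static int created = 0;
    # static fields are set on the class object itself
    __created: int = 0

    def __init__(self, label: str):
        # Java: private final String label;
        # instance fields are initialized in __init__
        self.__label: str = label
        Counter.__created += 1

    def __describe(self) -> str:
        # Java: private String describe0() {...}; the __ prefix marks it private
        return self.__label + "#" + str(Counter.__created)

    def describe(self) -> str:
        return self.__describe()


first = Counter("a")
summary = first.describe()
```

Python mangles names starting with __ inside a class, so from outside the class the field is not reachable as first.__label.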
Overloaded methods
Overloaded methods can not be handled by default python. To support them, the external library plum-dispatch==2.2.2 is used. If you use overloading, you need to have plum installed to run your code. Check also the @NonNull section.
When a method is overloaded, the translator adds @dispatch annotations in the translated code, as required by plum.
If you have two methods overloading each other, and both methods end up with the same signature in python, then plum will NOT warn you and just use the last definition only. Therefore your translated code will not work as expected. An example is if your code has both |
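The following sketch shows why plain python needs help with overloading. It deliberately does not use plum, to stay free of third-party dependencies: functools.singledispatchmethod from the standard library is used as a stand-in. It only dispatches on the first argument and is not what j2p generates. The Greeter classes are hypothetical.

```python
from functools import singledispatchmethod


class PlainGreeter:
    def greet(self, name: str) -> str:
        return "hello " + name

    def greet(self, times: int) -> str:  # silently replaces the definition above
        return "hello " * times


class DispatchingGreeter:
    @singledispatchmethod
    def greet(self, arg) -> str:
        raise NotImplementedError("no overload for " + type(arg).__name__)

    @greet.register
    def _(self, name: str) -> str:
        return "hello " + name

    @greet.register
    def _(self, times: int) -> str:
        return "hello " * times


# Without dispatching only the last definition exists, so the str call fails.
try:
    PlainGreeter().greet("bob")
    plain_result = "ok"
except TypeError:
    plain_result = "TypeError"

dispatching = DispatchingGreeter()
by_name = dispatching.greet("bob")
by_count = dispatching.greet(2)
```

In the plain class the second definition wins without any warning, which is the same kind of silent replacement described above for overloads that end up with identical python signatures.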
built-in "primitives"
java | python | remarks |
---|---|---|
String | str | |
int | int | |
float,Float,double,Double | float | in Java, float and double overflow at different places. Translation may fail if code depends on overflow behaviour |
BigDecimal | Decimal | |
Map, HashMap | dict | |
Set | set | |
Classes
Classes are by default translated using the "Stub" translator, which assumes that the class names and modules are identical in Python and Java, and that all functions have the same names, arguments and types.
References to classes in the code (SomeClass.class in java) translate to SomeClass in Python. It is assumed that the proper imports are done on the python side to enable this.
This default translation can be overridden using custom translators.
Custom translators
The number of custom translated classes is too big to even list here. Many built-in java classes are already supported. Please check the source code.
If function calls in python deliver a wrong object (not matching the above mapping), the translator has to inject additional code to convert it to the proper object. For example, Map.keySet() in python would be set(map.keys()), where the extra set converts the dict_keys into a proper set.
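The difference between the raw dict_keys object and the converted set can be checked directly in python. The variable names below are only for illustration.

```python
prices = {"apple": 3, "pear": 5}   # python dict standing in for a Java Map<String, Integer>

raw_keys = prices.keys()           # a dict_keys view, not a real set
keys = set(prices.keys())          # the extra set(...) gives a proper, independent set

keys.add("plum")                   # works on the set; dict_keys has no add method
```

The converted set can be modified without touching the dict, which is what java code that receives a Set would expect from a set object.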
Stream
The Stream class is currently not translated. It might be possible to translate it automatically, but this looks complex: the python way to deal with streams is list comprehension, and the translation is far from straightforward.
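As an illustration of what a manual translation of a stream can look like, here is a hand-written sketch. The java line in the comment and the variable names are made up for this example.

```python
from typing import List

names: List[str] = ["ann", "bernard", "cleo", "dave"]

# Java (not translated automatically):
#   names.stream().filter(n -> n.length() > 3)
#        .map(String::toUpperCase).collect(Collectors.toList());
# A manual python equivalent using a list comprehension:
long_names: List[str] = [n.upper() for n in names if len(n) > 3]
```

The filter becomes the if part of the comprehension and the map becomes the expression in front of the for.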
Inner classes
Inner classes (classes inside classes) have very limited support.
When an inner class translator is needed, eg p.A$B where B is an inner class of A, the translator searches for a translator for class p.a/B where p/a is used as a *directory* and is the lower case version of A. If it is found, then that translator B is used.
Note that the translated class will be in package p.a, rather than inside class p.A. Therefore this mechanism has limited use in practice.
Otherwise, no support is available. So if you try to translate a class that has an inner class, the translator will fail. This includes anonymous inner classes.
There are many reasons for this; in brief:
- The useful variant of inner class is the non-static inner class. This class can access fields in the enclosing class.
- Python does not support accessing fields of the enclosing class unless some tricks are applied. But these tricks would break uniformity of constructors.
- Many deep technical reasons #175 that largely complicate dealing with these.
We recommend now:
- manually override translation when you really need to call an inner class
- use package-private classes to replace your own inner classes.
- For the non-static inner class: pass the enclosing class as argument to the child class constructor. You may have to use more tricks to get around cyclic dependencies.
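The last recommendation, passing the enclosing object to the child class constructor, looks roughly like this on the python side. This is a sketch with hypothetical class names Parent and Child, not generated code.

```python
class Child:
    """Hypothetical replacement for a non-static inner class of Parent."""

    def __init__(self, parent: "Parent"):
        # The enclosing instance is passed in explicitly instead of being implicit.
        self._parent = parent

    def describe(self) -> str:
        # Access to the "enclosing" object goes through the stored reference.
        return "child of " + self._parent.name()


class Parent:
    def __init__(self, name: str):
        self._name = name

    def name(self) -> str:
        return self._name

    def make_child(self) -> Child:
        return Child(self)


child = Parent("root").make_child()
description = child.describe()
```

The quoted type "Parent" in the Child constructor avoids needing Parent to be defined before Child, which is one way around the cyclic dependency between the two classes.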
Example workaround
If the private class is a functional interface (eg Runnable, Predicate or Function), then a workaround is to define an extraMethod in your class with the same signature (in/output) as the functional interface. Eg if you need a Function<String, Integer>, then define private Integer extraMethod(String) {...}. If you have code like this:

    class Parent {
        ...
        Function<String, Integer> f = new Function<String, Integer>() {
            @Override
            public Integer apply(String t) {
                return 1;
            }
        };
        ...
        use(f)
    }
you can now replace this with

    class Parent {
        ...
        private Integer extraMethod(String t) {
            return 1;
        }
        ...
        use(this::extraMethod)
    }

If the class you want to override is not a nice functional interface, then it becomes more tricky. We recommend creating a separate class in that case. Java has the arrow notation (x,y) -> methodcall(x,y) to write lambda expressions. This notation can also be translated. This feature is especially useful in the parent class to make references to methods that the inner class needs. For example, suppose you have a class like this:

    class Parent {
        public void f() {...}
        public String g(int x) {...}
        public X h() {
            ....
            return new X() { ... f() ... g(1) ... };
        }
    }
The "new X" call has to be converted into an explicit class, but somehow it needs to be able to call f and g in the Parent class. We suggest doing it like this:

    class MyX extends X {
        public MyX(Runnable f, Function<Integer,String> g) {
            f.run();
            g.apply(1);
        }
    }

    class Parent {
        public void f() {...}
        public String g(int x) {...}
        public X h() {
            ....
            return new MyX(() -> f(), (x) -> g(x));
        }
    }
Note that the notation |
If the lambda notation gives a UnsupportedOperationException from the javaparser, you can try adding explicit types to the input arguments of the lambda, eg |
Special cases
If some particular inner classes, particularly constants, are used in very specific translators, these translators may opt to attempt to recognise these and process them separately (thus avoiding calling the translator for them). One example is the JsonDeserializer (jackson-t package) that recognises use of "com.fasterxml.jackson.databind.JsonDeserializer.None".
Enum
Enum classes are translated to normal Python classes. For instance, if you have a class MyEnum and it has items P and Q, we assume on the python side that a class MyEnum is there with static public fields P and Q. As usual the translator initializes P and Q in the _static_init method. This approach matches the traditional way python supported enumerations, for example math.pi directly delivers the value of pi, not some placeholder.
This approach ignores the more recent Python Enum class. This new approach is incompatible with the traditional approach, it gives an extra layer of indirection, it does not allow the custom constructors that we need, it is very complex, and all this complicates our translation job.
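A hand-written sketch of such an enum-like python class is shown below. The constructor arguments and the name() and ordinal() methods are assumptions for illustration; the code j2p actually generates may look different.

```python
class MyEnum:
    """Hypothetical python-side class for a Java enum MyEnum { P, Q }."""

    # static public fields, filled in by _static_init below
    P: "MyEnum"
    Q: "MyEnum"

    def __init__(self, name: str, ordinal: int):
        self._name = name
        self._ordinal = ordinal

    def name(self) -> str:
        return self._name

    def ordinal(self) -> int:
        return self._ordinal

    @staticmethod
    def _static_init() -> None:
        # The items are ordinary instances stored on the class object.
        MyEnum.P = MyEnum("P", 0)
        MyEnum.Q = MyEnum("Q", 1)


MyEnum._static_init()
```

After the static initialization, MyEnum.P directly delivers the item object, in the same way that math.pi directly delivers a value.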
Auto Boxing
Autoboxing is the automatic conversion that the Java compiler makes between the primitive types and their corresponding object wrapper classes. For example, converting an int to an Integer, a double to a Double, and so on. If the conversion goes the other way, this is called unboxing.

    boolean b=false;
    List<Boolean> l=new ArrayList<>();
    l.add(b);

Here in line 3 the b is auto-boxed from boolean to Boolean. In many cases, this is no problem, because python has no equivalent objects for primitive types, so there are no 2 variants of boolean, float, str. So both int and Integer are translated to python int anyway.
But keep reading, the automatic casting is closely related and is more problematic.
Casting
Translation of explicit casting has reasonable support. Eg if you write
long x=(long)3;
then the translator will convert the int to long.
Java does automatic casting to silently convert between bytes, int, long etc. These can involve narrowing, widening, and potentially lead to conversions or loss of precision etc. There is partial support for these. The translator recognises when such conversion is done and tries to insert extra code to implement the expected narrowing as needed. But the specification has lots of special cases and the implementation is partial.
The translator does recognise differences between left and right side in assignments and tries to cast properly. Eg,
float x=3;
it will recognise the right side of the assignment is an int, and convert it to float before assigning into the x. Note that python itself will not give any warning on type errors, this would be fine in python:
x:float=3
but it would be wrong as x now will contain an int (regardless of the type hint).
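The effect of the missing conversion is easy to verify in python itself. The variable names are only for illustration.

```python
x: float = 3                  # type hint is ignored at runtime: x holds an int
converted: float = float(3)   # explicit conversion, as the translator inserts it

x_text = str(x)
converted_text = str(converted)
```

The two values compare equal, but they behave differently as soon as the type matters, for example when they are converted to a string.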
A more tricky case is this
Float f = Float.valueOf(3);
The call argument, an integer, is in java automatically cast to a float, because that's the actual type required by valueOf. The translator currently does not recognise this case and this call would pass an integer into the call.
Another example:
int n=1; return "number "+ n;
Java will automatically (1) convert n to Integer (2) call toString. Currently, in this special case (+ operator) the translator does not support either, so you will have to do both yourself. eg write in java
return "number "+ ((Integer)n).toString();
Another example is when varargs are involved:

    public int sum(int... values) {
        return sumList(values);
    }

    public int sumList(int[] values) {
        return ....;
    }

The call sumList(values) uses auto boxing, but here the translation is from int... to int[]. A similar situation is when you do a for loop over varargs, like

    public int sum(int... values) {
        for (int v : values) {
            ....
        }
    }

All these complications around varargs, and others such as dispatching vararg-typed methods, led us to currently not support varargs.
@NonNull
Java variables like String val can contain a null value, and functions like String f() { ... } can return null. Therefore they are translated to val:Optional[str] and def f(self)->Optional[str].
You can annotate the java code with @NonNull (from org.eclipse.jdt.annotation.NonNull) to indicate that the value/return value will not be null, like this: @NonNull String val or @NonNull String f() { ... }. Java primitive types like boolean and int can never be null and do not need @NonNull.
The dependency needed for this is

    <dependency>
        <groupId>org.eclipse.jdt</groupId>
        <artifactId>org.eclipse.jdt.annotation</artifactId>
        <version>2.3.0</version>
    </dependency>
Unfortunately this annotation is not built in anymore in the JRE. But this dependency is very lightweight (the jar is 11kb) and is completely separate from the Eclipse IDE platform.
Java code X instanceof C is translated as in Java, so null/None is not an instance of C. C is not "Optional" and you do not need to write X instanceof @NonNull C.
Also be aware of the subtleties of this notation. For instance
@NonNull Set<String>
is a set that can not be null but that can contain null values.
@NonNull Set<@NonNull String>
is a set that can not be null and also can not contain null values.
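On the python side, the difference shows up in the type hints. The sketch below is hand-written: the class Example is hypothetical, and the mapping of @NonNull Set<String> to Set[Optional[str]] is our assumption of how the rules above combine, not verified translator output.

```python
from typing import Optional, Set, get_type_hints


class Example:
    def __init__(self) -> None:
        # Java: String val;            may be null
        self.val: Optional[str] = None
        # Java: @NonNull String name;  never null
        self.name: str = ""

    # Java: String f() { ... }
    def f(self) -> Optional[str]:
        return self.val

    # Java: @NonNull String g() { ... }
    def g(self) -> str:
        return self.name


# Java: @NonNull Set<String>; the set itself is never None, its items may be
def count_missing(values: Set[Optional[str]]) -> int:
    return sum(1 for v in values if v is None)


f_hints = get_type_hints(Example.f)
missing = count_missing({"a", None})
```

Adding @NonNull in java therefore removes the Optional wrapper in python, which also makes the resulting code easier to read.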
@Defer
@Defer is a qualifier for type specifications. A type that has this annotation is suggested to be imported later. The translator will double quote such annotated types and suppress the import of that class.
For example, consider the cyclicref testcode in the core:

    public class P {
        private @Defer Q q = null;

        public void join(@Defer Q q) {
            this.q = q;
        }
    }

The class Q that is referred to from P refers back to P. Without the @Defer annotations, class Q would have to be imported, resulting in a cyclic import. With the annotation however, the P class is translated as

    class P:
        def join(self,q:"Optional[Q]") -> None:
            self.__q=q
        def __init__(self):
            self.__q:"Optional[Q]" = None
This avoids the cyclic import, at the expense of some loss in python typing. Of course the type checking is still done at the java side.
Reserved keywords
Python has a number of reserved keywords:
and except lambda with as finally nonlocal while assert False None yield break for not class from or continue global pass def if raise del import return elif in True else is try
If you have these in your java program, as method name or variable name, then j2p will change the name by (de)capitalizing the first letter of the name. Do not use both the lower and upper case variant at the same time, because the translated names will then collide.
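The renaming rule can be sketched in a few lines of python. The function rename_if_reserved is hypothetical and only mimics the described behaviour; the real j2p logic lives on the java side and may differ in detail.

```python
import keyword


def rename_if_reserved(name: str) -> str:
    """Flip the case of the first letter if name is a python keyword (sketch)."""
    if not keyword.iskeyword(name):
        return name
    first = name[0]
    swapped = first.lower() if first.isupper() else first.upper()
    return swapped + name[1:]


renamed_lambda = rename_if_reserved("lambda")   # keyword, gets capitalized
renamed_none = rename_if_reserved("None")       # keyword, gets decapitalized
renamed_count = rename_if_reserved("count")     # not a keyword, unchanged
```

The sketch also shows the collision mentioned above: a java variable lambda and a java variable Lambda would both end up with the name Lambda.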
More notes:
- Some classes like Arrays or Override do not have a direct python equivalent. Calls to static functions in these classes can still be translated.
- Different java classes, eg Map, AbstractMap and HashMap, may translate to the same python class (dict). This can be done because translation is only 1 way.
- Other classes, like PriorityQueue, have no equivalent python class, and use a special class to help the translation.
- Some methods return a 'second class citizen', which leads to compatibility issues later in the code. We may replace some of these calls with alternatives. For example, in python dict.keys() returns an object that somewhat resembles a set but can't be used as such. In this case we use Keys(dict) from the utilitiespy package as a replacement.
- The imported libraries in python have a different name than those in java. If the python library name equals the name of some variable in your code, this will cause a runtime error, typically something like local variable 'json' referenced before assignment. We can not (yet?) detect these issues at compile time. To solve it, change the variable name in java.
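The name collision from the last note can be reproduced in plain python. The function names are made up for this example; the point is only that assigning to a local variable called json hides the imported json module inside that function.

```python
import json


def encode_broken(data: dict) -> str:
    # "json" is assigned below, so python treats it as a local variable
    # in the whole function and this call fails at runtime.
    text = json.dumps(data)
    json = text
    return json


def encode_fixed(data: dict) -> str:
    # Renaming the variable (in java, before translation) avoids the clash.
    json_text = json.dumps(data)
    return json_text


try:
    encode_broken({"a": 1})
    outcome = "no error"
except UnboundLocalError:
    outcome = "UnboundLocalError"

fixed = encode_fixed({"a": 1})
```

The error only appears when the function is called, which is why it can not be seen at translation time.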
Usage
For usage you have the choice between translating a single file or an entire package. We generally recommend translating the entire package.
Single File
If you want to translate a single java file, or even just a string containing your program, that does not need any specialized libraries, you can follow the approach used by many tests in core. Check the examples. The heart of the code will look like this, where most of the boilerplate is about setting up the javaparser:

    ParserConfiguration conf = new ParserConfiguration();
    CombinedTypeSolver typeSolver = new CombinedTypeSolver();
    typeSolver.add(new ReflectionTypeSolver(false));
    JavaSymbolSolver symbolSolver = new JavaSymbolSolver(typeSolver);
    conf.setSymbolResolver(symbolSolver);
    JavaParser parser = new JavaParser(conf);
    ParseResult<CompilationUnit> res = parser.parse(new File(javaFile));
    Block translation = Translator.translate(res.getResult().get());

After you get the translation, you can print it or put it in a .py file.
Entire Package
If you want to translate an entire package, first create a separate source directory containing all the java code that you want to translate. Also make sure that your code is compiled (eg using the build-helper-maven-plugin; check the pom files). Then you just do
PyProgram program = PyProgram.fromDirectories(Arrays.asList(Paths.get("src/test/myprogram")));
This gives you a fully translated program that you can print or get a zip file from. To get a zip file, call
program.getZip()
The zip file is ready for a pip install (eg in your own virtual environment) or running from java through our PythonVenv.
Resources
If you use the PyProgram translator, all non-java files are copied verbatim to the python program, to the same directory. Just make sure that if you have a separate resources directory, you add that as well to the list of root directories.
The resource can be accessed like usual in java:

    final InputStream stream = getClass()
            .getResourceAsStream("yourresource.txt");
    ...stream.read()...
Overriding the translation
Comments can contain python code to override the automatic code, if the block starts with #PY. This code replaces the entire object (if/case/while block; statement) that follows the comment.
Place the #PY block directly before the object you want to override. So you can also override single @Annotations with this.
If you place a #PY block before an annotation, you override only the annotation. So put the override after the annotations if you want to override the method/field/class.
If you want to insert a statement in python, not overriding an existing java statement, then insert an empty statement ; in java, and annotate it with #PY as usual.
Do not add a line-comment AFTER the empty statement - it will override the earlier comment.
Python has strict requirements regarding indentation. To make this possible, we need to be strict about indentation as well. In a single line python comment, the code must look exactly like
//#PY single_python_line
Note the single whitespace after the #PY. Your code starts after this single whitespace.
In a multi line comment the code must look exactly like

    /*#PY
     * codeline1
     * codeline2
     * ...
     */
Your code lines each start with "* ", note the whitespace after the star. You are free to indent before the "*".
Your code is automatically indented to the level needed at the insertion place in the code.
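To show why the format has to be this strict, here is a small python sketch of how such a block comment can be turned into indented python code. The function extract_py_block is hypothetical and is not the j2p implementation (which runs on the java side); it only follows the format rules described here, including the pass fallback for an empty block.

```python
import textwrap
from typing import List


def extract_py_block(comment: str, indent_level: int = 0) -> str:
    """Return the python code held in a /*#PY ... */ comment, re-indented (sketch)."""
    lines = comment.strip().splitlines()
    if not lines or lines[0].strip() != "/*#PY" or lines[-1].strip() != "*/":
        raise ValueError("not a #PY block comment")
    code: List[str] = []
    for line in lines[1:-1]:
        stripped = line.lstrip()        # indentation before the star is free
        if stripped == "*":
            code.append("")             # empty code line
        elif stripped.startswith("* "):
            code.append(stripped[2:])   # the code starts after "* "
        else:
            raise ValueError("each code line must start with '* '")
    # an empty block becomes a pass statement
    body = "\n".join(code) if any(c.strip() for c in code) else "pass"
    return textwrap.indent(body, "    " * indent_level)


example = """/*#PY
     * for k in keys:
     *     print(k)
     */"""
translated = extract_py_block(example, indent_level=1)
empty = extract_py_block("/*#PY\n */")
```

Because everything after the "* " prefix is taken literally, the indentation inside the comment is preserved relative to the insertion point.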
Code must be placed in either a standard block comment or a single line comment. Starting a javadoc with #PY is not allowed. This is to encourage proper use of javadoc.
A comment block also overrides annotations.
If the code block contains no code at all, it is translated as pass, to ensure that the code is a proper statement.
Any line containing an import (from ... import ... or import ...) without leading whitespace is interpreted as a global import. If you place an INDENT before the import, it's not a global import anymore and is kept in-line with the code.
You can also annotate import statements with any python code. Note that in the translated python code, these snips will be placed after the imports.
Avoid using manual translation where possible. Manual translations are more prone to errors. They are also not refactored if you refactor the java code.
The translator filters out duplicate imports only if there is one import done per line. So avoid multi imports like |
#PY always goes before a java object. If you really have to add #PY after your last line of java code, you will have to add a dummy class at the end, and annotate that.
Auto formatting (eg when using "save actions" in Eclipse) may corrupt the format of #PY blocks. We recommend turning off auto comment formatting.
Translator plugins
J2P has a modular translation mechanism. A translator plugin can be plugged in as needed, to add support for translating external libraries that are used in the java code. Also this allows to customize the translation process, for instance to use another python library for the translation.
A number of translation modules are already available
module | what it translates | details | limitations |
---|---|---|---|
jackson-t | jackson serialization annotations | translates jackson to pyson annotations | Covers what we need for translation of GeniusWeb |
junit-t | junit calls assertTrue,assertEquals | Very limited | |
tudutils-t | translates calls to utilities package | limited, currently mainly to support parts of immutablelist | |
mockito-t | mockito translations | very limited | |
The translators are all in standardized directories. If there is a translator for class a.b.C then this must be in the class tudelft.utilities.j2p.t.a.b.C. This makes it easy to find the translator class, makes everything pluggable through the maven dependency plugins, and requires minimum extra naming conventions.
The translators generally follow the same inheritance hierarchy as the original classes. So if a.b.P is the parent class of a.b.C, then there usually is a class tudelft.utilities.j2p.t.a.b.P that is the parent translator of tudelft.utilities.j2p.t.a.b.C. The latter forwards shared translation matters to the parent class as much as possible. A simple example can be seen in tudelft.utilities.j2p.t.java.lang.ArithmeticException, which forwards almost everything to the RuntimeException translator, which in turn forwards everything to Exception etc.
Translators should all be public classes, even if the real class is abstract. This is because java will handle calls to abstract classes and interfaces and these will be translated with the translator for that abstract class. For example, if we have

    List<String> l=...some expression or function call...
    l.get(0)
then l is of type List (the Interface) and l.get is a function in the interface which needs to be translated. Note that in general any type of List can end up in l at runtime, the compiler has to compile it such that the compiled code will work regardless (thus using only functions available in python's equivalent of the List class).
Unit Tests, @Test
The junit-t plugin helps translation of org.junit components.
All tests must be annotated with @RunWith if this plugin is used. Use the following, depending on what kind of test you want to run:
test type in java | use java annotation |
---|---|
normal @Test methods | @RunWith(JUnit4ClassRunner.class) |
parametric test | @RunWith(Parameterized.class) |
The junit-t translator depends on, and uses unitpy. It also adds some extra code to each test class, so that the standard python test discovery works as usual.
GeniusWebTranslator
The GeniusWebTranslator module is at this moment a demo project. It shows that the translator can automatically translate a big part of GeniusWeb. The GeniusWeb code has been modified in a number of ways
- at a few places, a manual translation has been added. Most often this involves a single line of code, particularly because the translator does not support java stream.
- A lot of @NonNull annotations have been added to make the state of Java variables more clear, also cleaning up the resulting python code.
- various small fixes, to work around a few restrictions of the translator.
FAQs
Question | Explanation |
---|---|
I'm getting No translator found for X . But X is a class that I'm trying to translate and provided source code for | The java files you are trying to translate are probably not compiled by the java compiler. When the translator finds a method call, it needs the compiled java to determine the proper signature (function name and arguments). You can use the build-helper-maven-plugin to add your additional sources to the standard maven build path. |
I get an error UnboundLocalError: local variable ... referenced before assignment (where ... is a variable name) | Usually this happens if the variable has a name that collides with a method name or library in python, eg "list". Rename the variable so that it is not colliding |
I get "Python can not handle overloaded methods that use type vars" | This happens if you overloaded a method (you have two or more methods with the same method name) and one of them uses a type var (you have <T> and use T in the method declaration). Python does not support overloading, and none of the existing libraries that add this functionality can handle type variables. |
It seems the python code uses the wrong values in my lambda expression | There is a subtle but essential scoping difference between java and python. In python, a local variable continues living even if it goes outside of scope and can be changed. The lambda will use the latest value assigned to the variable. To copy the current value, consider creating the lambda in a dedicated method |
I get UnsupportedOperationException ... ResolvedWildcard | This happens sometimes in complex method calls with complex parameters. Usually this happens in a stream() context. We suggest using manual translation in such case |
I get No translator found for java.util.stream.Stream | Stream translation is quite complex and currently not supported. Either use a loop in java, or manually translate using #PY |
I get Inner classes can not be translated | Check the section on inner classes |
I get something like from com.X.Y.Z import Z: ModuleNotFoundError: No module named 'com' . com.X.Y.Z is a java package that does not exist in python. | You did not provide a library or code to translate com.X.Y.Z and the translator now uses a "stub" translator that assumes the package on the python side is identical. To fix, you have to add a translator for the com.X.Y.Z package. Or avoid using the package in java. |
I get POSTFIX_INCREMENT: not supported and the same for POSTFIX_DECREMENT. Why is this not supported? | The general prefix- and postfix-increments are tricky, because they postpone the actual change of the variable. In python multiple statements would have to be used. However this can also occur in situations where python does not allow such multiple statements. We recommend to manually unroll and use extra statements +=1 and -=1 before or after the statement. The general case may require more advanced refactoring. |
Everything compiles fine but I get ImportError: cannot import name ... from partially initialized module when running in python | Typically this happens if your java code has a cyclic reference: class A refers to B and class B refers to A. Python can not handle such cyclic references. We suggest using the @Defer annotation to help the translator, or refactoring the code. |
My java for loop seems to be translated to a weird looking while loop | That's correct. The java for loop is quite special, and you can use continue and break statements inside it that require us to do some tricks |
I get UnsolvedSymbolException ...... Method '...' cannot be resolved in context ... with a long stacktrace pointing to the javaparser | The javaparser has problems resolving the method when the provided argument does not exactly match the actual argument, eg when your method call uses an int while the actual method takes a long. It may help to change your call so that the argument type exactly matches the method type. |
I get UnsolvedSymbolException{context='null', name='We are unable to find the method declaration corresponding to ... | See the previous issue. |
I get an error in python on an overloaded method (@dispatch ), incorrect number of arguments. | @dispatch has a problem with Optional arguments. Try making the arguments @NonNull in java. |
I get an UnsupportedOperationException inside javaparser LambdaExprContext | Javaparser seems to sometimes have trouble with resolving argument types inside lambda expressions. Try explicitly typing the left side of your lambda expression |
I get mockito.invocation.InvocationError: You tried to stub a method 'len' the object (<class 'mockito.mocking.mock.<locals>.Dummy'>) doesn't have. | Python typing around the built-in types (list, set) is messy. If you stub a java List/Set, it translates to mock(typing.List/typing.Set). And those types do not have __len__, causing the error. |
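The lambda scoping difference mentioned in the FAQ can be reproduced in plain python. The function names below are made up for this sketch; make_callback plays the role of the "dedicated method" that copies the current value.

```python
from typing import Callable, List


def late_binding() -> List[int]:
    callbacks: List[Callable[[], int]] = []
    for i in range(3):
        # The lambda refers to the variable i, not to its current value.
        callbacks.append(lambda: i)
    # All callbacks see the last value that was assigned to i.
    return [cb() for cb in callbacks]


def make_callback(value: int) -> Callable[[], int]:
    # A dedicated function gives each lambda its own variable.
    return lambda: value


def bound_per_call() -> List[int]:
    callbacks = [make_callback(i) for i in range(3)]
    return [cb() for cb in callbacks]


late = late_binding()
bound = bound_per_call()
```

In java the first variant would not even compile unless the captured variable is effectively final, which is why translated code can behave differently from what the java code suggests.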