
Statically setting Flink JobManager port inside a YARN cluster

written on 12 April 2018
Recently, I tried to start a long-running Flink YARN session (via the `yarn-session.sh` script) and automatically deploy a job to it. However, by default, the YARN ApplicationMaster (which is the same process as the JobManager) listens on a random port. This is not useful for automatic job deployment. It is also inconvenient when your YARN cluster runs behind a firewall and you don't want to open up a large range of ports, and the same holds for Docker environments that require you to specify published ports.

To fix this, you can make the ApplicationMaster listen on a specific port or port range by setting the `yarn.application-master.port` property when starting your YARN session:

```
bin/yarn-session.sh -Dyarn.application-master.port=9123 -n 2 -jm 1024 -tm 4096
```

Now `jobmanager.rpc.port` will be reported as 9123 in the Flink dashboard, and you can reach the JobManager at this port.

### Automatically submitting a Flink job to the running cluster

The problem with the solution above is that in a distributed environment we still don't know on which node the JobManager is running. However, we don't need this information, because Flink stores the necessary YARN connection information in a temporary file on the host you ran `yarn-session.sh` on. By default this file is located in the `/tmp` directory, but you can change this path with the `yarn.properties-file.location` Flink property.

In my case, I had the session running in a separate Docker container, so I can submit the application through that container (the container name below is a placeholder):

```
docker exec -ti <session-container> /flink/bin/flink run /flink/examples/batch/WordCount.jar
```

Alternatively, if you want to run the job client in a separate container as well, you can create a shared volume between those containers and set the `yarn.properties-file.location` property described above.
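
As a rough sketch of that shared-volume setup, something like the following could work. The image name, container names, volume name, and mount path are hypothetical; I'm assuming `yarn.properties-file.location` is set to the shared mount in the `flink-conf.yaml` used by both containers:

```
# Shared volume that will hold the YARN properties file
docker volume create flink-yarn-props

# Both containers use a flink-conf.yaml containing:
#   yarn.properties-file.location: /yarn-props

# Session container: starts the long-running YARN session and writes the
# YARN connection details to the shared volume
docker run -d --name flink-session \
  -v flink-yarn-props:/yarn-props \
  my-flink-image \
  /flink/bin/yarn-session.sh -Dyarn.application-master.port=9123 -n 2 -jm 1024 -tm 4096

# Client container: mounts the same volume, finds the properties file there
# and submits the job to the running session
docker run --rm \
  -v flink-yarn-props:/yarn-props \
  my-flink-image \
  /flink/bin/flink run /flink/examples/batch/WordCount.jar
```

The point is simply that the client container reads the same properties file the session wrote, so it can attach to the running YARN application without knowing which node hosts the JobManager.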