Crash happens sometimes when launching several simulator in parallel

Mildred34
Posts: 19
Joined: 12 May 2023, 16:03

Post by Mildred34 »

Working Environment:
  • Ubuntu 22.04
  • CoppeliaSim v4.5.1 (rev4)
  • ZMQ remote API
Context
I launch 5 programs simultaneously in headless mode, like this:

Code:

coppeliaSim.sh -GzmqRemoteApi.rpcPort=port -h -gnamespace model_path
The ports are 23000 + 2*i, where i is the id of the program.
I'm using ROS2 in each of my programs, so I pass my node namespace to the simulation with the -g argument.
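
For illustration, the port scheme and command line described above could be generated like this. This is only a sketch: the helper names, the `sim_{i}` namespaces and the `model.ttt` path are hypothetical placeholders, and only the flags and the 23000 + 2*i rule come from the post.

```python
def rpc_port(instance_id, base_port=23000, stride=2):
    """Port for a given program id, following the 23000 + 2*i rule from the post."""
    return base_port + stride * instance_id


def build_launch_command(instance_id, namespace, model_path, launcher="coppeliaSim.sh"):
    """Build the argument list for one headless instance.

    The namespace and model_path values are supplied by the caller;
    nothing here is specific to a real scene or ROS2 node.
    """
    return [
        launcher,
        f"-GzmqRemoteApi.rpcPort={rpc_port(instance_id)}",
        "-h",                 # headless mode
        f"-g{namespace}",     # value forwarded to the simulation via -g
        model_path,
    ]


if __name__ == "__main__":
    # Hypothetical namespaces and scene path, for illustration only.
    for i in range(5):
        print(" ".join(build_launch_command(i, f"sim_{i}", "model.ttt")))
```

Each list can be passed directly to `subprocess.Popen`, which avoids shell quoting issues.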

All of the programs are launched with code like the one below: one thread for the ROS node, one thread to communicate with the simulator API.

Code:

    rclpy.init(args=args)

    simulator_node = simx()
    executor = MultiThreadedExecutor()
    executor.add_node(simulator_node)

    # Start the ROS2 node on a separate thread
    # thread = Thread(target=spin_node,args=(executor,)) # communication does not work when done this way
    thread = Thread(target=executor.spin)

    # Keep the app running on the main thread
    try:
        thread.start()
        simulator_node.get_logger().info("Spinned ROS2 Node . . .")
        simulator_node.exec()

    except SystemExit as e:
        rclpy.logging.get_logger("Quitting").error(
            "Error happened of type: {}\n msg: {} \nEnd of testing grasp!".format(
                type(e), e
            )
        )
    except KeyboardInterrupt:
        rclpy.logging.get_logger("Quitting").info("End of testing grasp!")
    finally:
        simulator_node.get_logger().info("Shutting down ROS2 Node . . .")
        simulator_node.destroy_node()
        executor.shutdown()

    try:
        thread.join()
    except KeyboardInterrupt:
        pass
They follow their routine for ~10 hours. Generally 2 to 3 programs finish; the remaining ones end early because they can no longer communicate with the simulator.

I created a service that is called every 30 s to check whether the server is up:

Code:

        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        # connect_ex returns 0 if the connection succeeds, an error code otherwise
        result = sock.connect_ex((SimCheckerService.ip, request.port))
        sock.close()
That is how I know that the simulator crashed.
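
A self-contained version of that check could look like the sketch below. The function name and the timeout value are assumptions, not part of the original service. Note that an open port only shows that something is listening, not that the simulator is otherwise healthy.

```python
import socket


def is_port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to (host, port) can be established.

    connect_ex returns 0 on success and an error code otherwise.
    Name resolution problems can still raise an exception.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)  # avoid blocking forever on an unresponsive host
        return sock.connect_ex((host, port)) == 0
```

The `with` block closes the socket on every call, so a check repeated every 30 s does not leak file descriptors over a 10-hour run.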

Have you seen something like this before? Or am I using the remote API incorrectly?

coppelia
Site Admin
Posts: 10375
Joined: 14 Dec 2012, 00:25

Re: Crash happens sometimes when launching several simulator in parallel

Post by coppelia »

Hello,

as long as you are calling ZeroMQ remote API functions from within the same thread (within the same process), you should be fine. But a crash can always happen. It would help to know why or where CoppeliaSim crashed... do you think it is specifically related to the ZeroMQ remote API functionality? If yes, can you provide a simple Python script that triggers the crash when run in parallel, so that we can test this?

edit: but if 2-3 programs are ending, this means that the ZeroMQ remote API clients are crashing, not CoppeliaSim... or did I misunderstand something?

Cheers

Mildred34
Posts: 19
Joined: 12 May 2023, 16:03

Re: Crash happens sometimes when launching several simulator in parallel

Post by Mildred34 »

coppelia wrote: 04 Oct 2023, 12:34 edit: but if 2-3 programs are ending, this means that the ZeroMQ remote API clients are crashing, not CoppeliaSim... or did I misunderstand something?

Hello,

No, I force my ZeroMQ remote API clients to exit when they can't communicate with the ZMQ server.
I tried launching only 2 simulators in parallel (headless mode disabled): in 2 days of running there was no crash.

Is the ZMQ API thread-safe? If not, that might be the cause. When I have time, I'll check whether I'm calling it from another thread.

Best regards,

Alex

fferri
Posts: 1230
Joined: 09 Sep 2013, 19:28

Re: Crash happens sometimes when launching several simulator in parallel

Post by fferri »

Mildred34 wrote: 09 Oct 2023, 09:16 Is the ZMQ API multi-thread safe ?
No.

You need to always call it from the same thread (or use multiple clients, one for each thread).
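
One way to follow the "one client per thread" option is to keep the client in thread-local storage, as in the sketch below. `PerThreadClient` and the `factory` argument are hypothetical names; the factory stands in for whatever call creates a remote API client in your setup, and no specific client constructor is assumed here.

```python
import threading


class PerThreadClient:
    """Hands each thread its own client object, created lazily on first use."""

    def __init__(self, factory):
        self._factory = factory          # callable that builds a new client
        self._local = threading.local()  # storage that is separate per thread

    def get(self):
        client = getattr(self._local, "client", None)
        if client is None:
            # First call from this thread: create a client just for it.
            client = self._factory()
            self._local.client = client
        return client
```

Every thread that calls `get()` receives its own instance, so no client object is ever shared between threads.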

Mildred34
Posts: 19
Joined: 12 May 2023, 16:03

Re: Crash happens sometimes when launching several simulator in parallel

Post by Mildred34 »

So, following our discussion, I checked that the ZMQ remote API is called from within the same thread and the same process for each running simulation.
With the new CoppeliaSim version that was released recently, I wanted to see whether the crash had disappeared, but unfortunately it has not.


But at least I got some logs with the new version. Here is what I got:

Code:

[robotiq-assembly-v7_c.ttt-6] terminate called after throwing an instance of 'std::filesystem::filesystem_error'
[robotiq-assembly-v7_c.ttt-6] what(): filesystem error: cannot remove: Directory not empty [../CoppeliaSim_Edu_V4_6_0_rev6_Ubuntu22_04/mujoco]
[robotiq-assembly-v7_c.ttt-6] /../CoppeliaSim_Edu_V4_6_0_rev6_Ubuntu22_04/coppeliaSim.sh: line 33: 37668 Aborted "$dirname/$appname" "${PARAMETERS[@]}"
Do you know what can cause this kind of error?

Update: I got this error again, in a slightly different form:

Code:

what(): filesystem error: cannot remove all: No such file or directory
[../CoppeliaSim_Edu_V4_6_0_rev6_Ubuntu22_04/mujoco/object__80__15.stl]
And a different one:

Code:

 Error: signal 11:
[robotiq-assembly-V7_c.ttt-10]
[robotiq-assembly-V7_c.ttt-10] /home/alex/CoppeliaSim_Edu_V4_6_0_rev6_Ubuntu22_04/libcoppeliaSim.so(_Z11_segHandleri+0x30)[0x7f3111f84e60]
[robotiq-assembly-V7_c.ttt-10] /lib/x86_64-linux-gnu/libc.so.6(+0x42520)[0x7f3114bad520]
[robotiq-assembly-V7_c.ttt-10] /home/alex/CoppeliaSim_Edu_V4_6_0_rev6_Ubuntu22_04/libsimMujoco.so(+0x4cca3)[0x7f30ff27cca3]
[robotiq-assembly-V7_c.ttt-10] /home/alex/CoppeliaSim_Edu_V4_6_0_rev6_Ubuntu22_04/libsimMujoco.so(+0x4de75)[0x7f30ff27de75]
[robotiq-assembly-V7_c.ttt-10] /home/alex/CoppeliaSim_Edu_V4_6_0_rev6_Ubuntu22_04/libcoppeliaSim.so(_ZN16CPluginContainer8dyn_stepEdd+0x37)[0x7f3111dc2507]
[robotiq-assembly-V7_c.ttt-10] /home/alex/CoppeliaSim_Edu_V4_6_0_rev6_Ubuntu22_04/libcoppeliaSim.so(_ZN18CDynamicsContainer14handleDynamicsEd+0x10c)[0x7f3111da32ac]
[robotiq-assembly-V7_c.ttt-10] /home/alex/CoppeliaSim_Edu_V4_6_0_rev6_Ubuntu22_04/libcoppeliaSim.so(_Z26simHandleDynamics_internald+0x8e)[0x7f3111e6f1be]
[robotiq-assembly-V7_c.ttt-10] /home/alex/CoppeliaSim_Edu_V4_6_0_rev6_Ubuntu22_04/libcoppeliaSim.so(_Z18_simHandleDynamicsPv+0x26e)[0x7f3111ec191e]
[robotiq-assembly-V7_c.ttt-10] /home/alex/CoppeliaSim_Edu_V4_6_0_rev6_Ubuntu22_04/liblua5.3.so.0(+0x11ad6)[0x7f311580dad6]
[robotiq-assembly-V7_c.ttt-10] /home/alex/CoppeliaSim_Edu_V4_6_0_rev6_Ubuntu22_04/liblua5.3.so.0(+0x1bf84)[0x7f3115817f84]
[robotiq-assembly-V7_c.ttt-10] /home/alex/CoppeliaSim_Edu_V4_6_0_rev6_Ubuntu22_04/coppeliaSim.sh: line 33: 47756 Segmentation fault      "$dirname/$appname" "${PARAMETERS[@]}"
The error happens when I try to reset the scene (stop the simulation then restart it, reset objects in the scene...)
Last edited by Mildred34 on 17 Nov 2023, 17:18, edited 1 time in total.

coppelia
Site Admin
Posts: 10375
Joined: 14 Dec 2012, 00:25

Re: Crash happens sometimes when launching several simulator in parallel

Post by coppelia »

There is only this location where the MuJoCo plugin (not necessarily the MuJoCo engine itself) uses std::filesystem.

Could it be that you do something in particular with the <coppeliaSimDir>/mujoco folder?

Edit: if you have several instances using the same CoppeliaSim installation, then yes, you might indeed get that error. Right now your best option is to duplicate the installation directory once for each parallel instance you run...
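
The workaround of one installation directory per instance could be scripted roughly as follows. This is a sketch only: the function name and the `_instance{i}` naming scheme are hypothetical, and the actual paths depend on your setup.

```python
import shutil
from pathlib import Path


def duplicate_installation(install_dir, count, dest_root):
    """Create `count` copies of install_dir under dest_root and return their paths.

    Copies that already exist are left untouched, so the function can be re-run.
    """
    src = Path(install_dir)
    dest_root = Path(dest_root)
    dest_root.mkdir(parents=True, exist_ok=True)
    copies = []
    for i in range(count):
        dst = dest_root / f"{src.name}_instance{i}"  # hypothetical naming scheme
        if not dst.exists():
            # symlinks=True keeps symbolic links as links instead of copying targets
            shutil.copytree(src, dst, symlinks=True)
        copies.append(dst)
    return copies
```

Each instance is then launched from its own copy, so no two processes share the same mujoco folder.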

Cheers

Mildred34
Posts: 19
Joined: 12 May 2023, 16:03

Re: Crash happens sometimes when launching several simulator in parallel

Post by Mildred34 »

OK, I will try that, if there is no better solution right now.

Thanks for your help !

Alexis

Mildred34
Posts: 19
Joined: 12 May 2023, 16:03

Re: Crash happens sometimes when launching several simulator in parallel

Post by Mildred34 »

Just to update the thread:
using N install folders solves the problem.

I hope this bug will be fixed in the next update.

Cheers,

Alexis

coppelia
Site Admin
Posts: 10375
Joined: 14 Dec 2012, 00:25

Re: Crash happens sometimes when launching several simulator in parallel

Post by coppelia »

Hello,

yes, normally with rev. 10, which should be out in a few hours...

Cheers
