Does this work on a smartphone/tablet via a FritzBox app?
No, the app on the mobile device communicates directly with the door intercom system. It works over VoIP, so to speak, plus a stream for the video.
Do these apps run via an external server or only within the internal LAN?
Within your own LAN, if you like. But you can also use your own SIP server or an external one.
If this is supposed to work with "standard equipment" only, do I have to buy a FritzBox phone?
You don't need a FritzBox.
If you want to use the door intercom with "normal" phones (analog, DECT), you need a telephone system that VoIP clients can register with and that your normal phones can be connected to. This can be a FritzBox, for example, but other devices work too.
Camera images can only be transmitted to the handset if you use a FritzBox together with a Fritz cordless handset.
If you only want to use the door intercom on one or more mobile devices, you don't need separate hardware, only a suitable app.
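To make "a telephone system that VoIP clients can register with" a bit more concrete: registering means the client sends a SIP REGISTER request to the telephone system. The following is only a rough sketch of what such a request looks like according to RFC 3261. The function name, the user name "door", the registrar "fritz.box" and the IP address are made-up examples and not settings of any particular intercom or app.

```python
import uuid


def build_register(user: str, registrar: str, local_ip: str,
                   local_port: int = 5060, expires: int = 300) -> str:
    """Build a minimal, unauthenticated SIP REGISTER request (RFC 3261).

    All values passed in are illustrative; a real client takes them
    from its own configuration.
    """
    # Branch IDs must start with the magic cookie "z9hG4bK" (RFC 3261).
    branch = "z9hG4bK" + uuid.uuid4().hex[:16]
    tag = uuid.uuid4().hex[:8]
    call_id = uuid.uuid4().hex

    lines = [
        f"REGISTER sip:{registrar} SIP/2.0",
        f"Via: SIP/2.0/UDP {local_ip}:{local_port};branch={branch}",
        "Max-Forwards: 70",
        f"From: <sip:{user}@{registrar}>;tag={tag}",
        f"To: <sip:{user}@{registrar}>",
        f"Call-ID: {call_id}@{local_ip}",
        "CSeq: 1 REGISTER",
        # Contact tells the registrar where this client can be reached.
        f"Contact: <sip:{user}@{local_ip}:{local_port}>",
        f"Expires: {expires}",
        "Content-Length: 0",
    ]
    # SIP uses CRLF line endings and an empty line after the headers.
    return "\r\n".join(lines) + "\r\n\r\n"


if __name__ == "__main__":
    print(build_register("door", "fritz.box", "192.168.178.50"))
```

In practice the telephone system will usually answer the first REGISTER with a 401 challenge and expect the request to be repeated with digest authentication, which this sketch leaves out. The app or the intercom does all of this for you once the account data is entered.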