Frequently Asked Questions

Find answers to the most common questions about our products.

General Questions

What is the Creative Reality™ Studio?

D-ID’s Creative Reality™ Studio is a self-service platform of generative AI tools for creating videos with moving, talking avatars. It combines D-ID’s deep-learning face animation technology with LLM text generation and text-to-image capabilities in a single platform for creating AI-powered videos.
The Creative Reality™ Studio is available on desktop and mobile.
Who is the Creative Reality™ Studio for?

The Creative Reality™ Studio was developed for businesses and individual content creators who want to use avatars to create AI videos featuring digital humans for a wide range of commercial and creative purposes.

Video

What video format and resolution do you support?

- All videos are generated in MP4 format.
- Output video resolution depends on the AI Presenter you are using:
  - Standard AI Presenter: output resolution is up to 1280×1280 pixels on all plans.
  - Premium AI Presenter (marked with an HQ badge):
    - Lite plan: Premium presenters are not supported.
    - Trial, Pro, Advanced and Enterprise plans: 1080p.

What is the output video length?

When using the D-ID Creative Reality™ Studio or the D-ID API, video length is limited to 5 minutes.
What are the image upload size & format requirements?
- When using D-ID Creative Reality Studio or D-ID API, the image size is limited to 10 MB.
- Supported formats – JPEG, JPG, PNG
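
The limits above can be checked locally before uploading. The following is a minimal sketch, not part of any D-ID SDK: the function name is hypothetical, it assumes "10 MB" means 10 × 1024 × 1024 bytes, and it only inspects the file extension rather than the file contents.

```python
from pathlib import Path

# Assumption: "10 MB" is interpreted as 10 MiB; the FAQ does not say which.
MAX_IMAGE_BYTES = 10 * 1024 * 1024
SUPPORTED_IMAGE_EXTENSIONS = {".jpeg", ".jpg", ".png"}


def image_upload_problems(filename: str, size_bytes: int) -> list[str]:
    """Return a list of reasons the image would not meet the stated limits."""
    problems = []
    # Compare the extension case-insensitively (e.g. "face.PNG" is accepted).
    if Path(filename).suffix.lower() not in SUPPORTED_IMAGE_EXTENSIONS:
        problems.append("unsupported format")
    if size_bytes > MAX_IMAGE_BYTES:
        problems.append("file larger than 10 MB")
    return problems
```

An empty list means the file satisfies both requirements; the service may still reject an image for other reasons, such as moderation or face detection.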

Selecting the face to animate

How do I select the face to be animated?
There are three ways to animate faces on the Creative Reality™ Studio:
1. Select one of the existing pre-made avatars
2. Upload a facial image
3. Use our Stable Diffusion-powered text-to-image portrait generator

How do I make sure I get the right result when I generate a face?

Image prompting is a mix of art and science. Our image-generating software is optimized to produce faces that can be animated in the studio, but there is a lot of room for creativity. To get started, we suggest you select one of the pre-created prompts and try out variations of those. Alternatively, try searching for prompts and inspiration on Lexica or one of the numerous prompt-building platforms available online.

Getting the avatars to talk

How do I determine what the avatar will say?
There are three ways to add voice to your video:
1. Type the script you want the avatar to speak in the designated text box. Alternatively, you can use the text generator to write an automated script.
2. Upload a voice recording.
3. Clone your voice. This service is only offered to Enterprise-level customers.
What are Layers and how do they work?

Our studio lets you add various visual elements, such as backgrounds, videos, and text, to your designs. Each element is set as its own layer, and you can determine the order by using the “position” button on the top right side of the screen.

The Canvas Layout function lets you choose the layout that best fits your design, so you can create videos for mobile, social media, presentations, and more. The canvas can be set to wide, square, or vertical. To switch from one canvas to another, click on the canvas and select one of the available options in the top left corner of the design window.

Positioning determines where on the canvas you want the presenter to be.

Transparency lets you choose how opaque you want each element to be.
* Layers are not available for mobile app users.
What are Expressions?

You can determine whether your avatar will look happy, serious, surprised, or maintain a neutral expression. Click on the video field in the design window and select which emotion you want your presenter to convey. The chosen facial expression will be implemented for the duration of the video.
* Expressions are not available for mobile app users.
How do I switch back to the legacy version of the studio?

Go to the video library and click on your user profile at the bottom left of the screen. Then click on “Switch back to classic editor.”
Do you offer a voice cloning service?

Yes. Pro and API Launch plans receive 1 cloned voice; Advanced and API Scale plans receive 3 voices. The number of voices in Enterprise plans is customizable.

Users can now upload an audio file or record directly from the Studio to create an Instant Cloned Voice. They can also delete a created cloned voice. Users must record a voice consent as part of the audio file they submit (both in the Studio and API).
What audio formats & lengths are supported?

When using the Creative Reality™ Studio or the D-ID API, audio files are limited to 10 MB in size and 5 minutes in length.

Supported audio formats – MP3, FLAC, M4A, MP4, WAV
Which languages does the Creative Reality™ Studio support?

The studio currently supports 119 languages, along with a variety of accents and speaking styles.
Can I add pauses to the text?

You can add breaks in your script by clicking the stopwatch icon at the bottom of the text box. Each break is 0.5 seconds long.

Watermark

Why do all my videos have a watermark?

As a company that enables users to create AI-based content, it is important to us that there is transparency about the synthetic nature of the videos they generate. This is also reflected in our ethical manifesto, available at https://www.d-id.com/ethics, and in the applicable terms of use.
What does the watermark look like?
It depends on your plan:
- Trial and Lite plan users get a D-ID logo watermark. Note that a full-screen watermark appears for trial users.
- Pro and Advanced plan users get a generic AI watermark.
- Enterprise users can customize the AI watermark but not remove it.

Image, text and audio

How do I choose the best image for an optimized video output?

Please follow our image guidelines:
- Facing camera, medium shot
- Neutral expression, closed mouth
- Minimum head size 200×200 pixels
- Good and consistent lighting
- Up to 10 MB
- No face occlusions (hats, sunglasses, masks, visors, large earrings)
Why was my image rejected?

There are two possible reasons:

A. The image you are trying to use did not pass our built-in moderation process. Moderation is carried out by a third-party tool, and bypassing it is only allowed for Advanced and Enterprise customers, provided they use their own moderation solution.

Advanced plan users have the option to request a manual review.

B. Our system did not detect a face in the provided image. This may happen when trying to animate animals, cartoons, or anime figures.
Why was my audio/text rejected?

This probably happened because our built-in moderation detected a violation and has therefore blocked the video from being generated. To overcome this, please remove the problematic content and try again.

Payment and Credits

What are credits?

Each credit is worth up to 15 seconds of video. When generating longer videos, credits add up according to the length of the generated video. For example, a 40-second video consumes 3 credits.

For streaming customers using our API, the price of credits is halved.

Visit https://www.d-id.com/pricing/api/ for more details.
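
The credit rule for generated videos (one credit per started 15 seconds) amounts to rounding up. The sketch below is illustrative only: the function name is hypothetical, and the halved price for streaming API customers is not modeled.

```python
import math

SECONDS_PER_CREDIT = 15  # each credit is worth up to 15 seconds of video


def credits_for_video(duration_seconds: float) -> int:
    """Credits consumed by a generated video of the given length."""
    if duration_seconds <= 0:
        return 0
    # Any started 15-second block consumes a full credit.
    return math.ceil(duration_seconds / SECONDS_PER_CREDIT)
```

For the 40-second example, `credits_for_video(40)` gives 3, since 40 / 15 rounds up to 3.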
I have a subscription but I used up all my credits, how can I get more?

Each plan (Lite, Pro, Advanced) offers 3 packages with different amounts of credits. If you use up your credits before the end of the month, you can switch to a bigger package that includes more credits.
* Bigger packages are not available for mobile app users.
Do unused credits carry over to the next month?

Credits do not accumulate. They are renewed every month, and unused credits become void.

Billing

How do I change my credit card details?
1. Click the menu on the bottom left and click “Billing”
2. Click the pen icon and change the credit card details.
How do I switch between the different plans?

If you wish to upgrade your plan, you can do that via the Pricing page.
I think I need the Enterprise plan, where can I get more information?

For details regarding the Enterprise plan please contact our sales team.
How do I cancel my subscription?

You can cancel your monthly subscription at any time on the “Account & API” page. To access the page, click the menu on the bottom left and go to “Account & API” > “Plan and billing” > “Cancel Plan”.

Mobile users can unsubscribe through their store settings.
What happens to my data and credits if I unsubscribe?

If you unsubscribe, your videos will still be accessible when you log in to the studio. Your remaining credits will remain valid until the end of the current billing period.
How do I delete my account?

To delete your account, please contact our support team at support@d-id.com.

Mobile studio users can delete their accounts on the “account settings” page.

Using D-ID’s API

How can I get an API key?

Please go to the Account page in the studio, and generate your API key. Note that it is mandatory to have valid credits in your account to use the API.
How do you calculate my credits when I use the API?

Credits used for the API are taken from the same balance as the studio.
Where can I find the API documentation?

API documentation is available at the Developer Hub.
Can I stream the generated video in real-time, similar to Chat D-ID?

We have an API tailored for this purpose, available here.

For reference, we also have a code sample that can serve as a baseline for implementing such a solution; it can be viewed here.

D-ID Agents

What are Agents?

Agents are autonomous AI assistants that can answer questions based on the knowledge uploaded by their owner, and perform a specific role or task that’s helpful for business or individual use cases.
Who can create Agents?

Anyone can create an agent, without any knowledge of coding. Creating an agent is as easy as selecting a role, giving the agent instructions and uploading knowledge documents. Users need to sign up or be logged into their D-ID Studio account to create an agent.
What are some common roles for AI agents?

Agents are well suited to roles in marketing, customer engagement, education, and training. Agents can simulate real people and fictional characters, or they can be virtual influencers that represent famous brands or individuals.
What are some examples of agents?

Agents can help companies boost sales, answer their customers’ questions or chat with their followers. Each agent is an expert in a different area, with access to a specific knowledge base. You can talk with an agent to find out exactly who they are and what their role is.
How can I talk with Agents?

You can talk with Agents by typing in your question in the text input box, or by clicking the microphone icon and talking with the Agent just like you would talk with another person (available on Chrome/Safari browsers or most mobile devices).
Can I speak with Agents in any language?

Yes, agents support many major languages, such as Hindi, Spanish, French, German, and Portuguese. Just start talking with an agent in another language, and it will reply in that language if it has a multilingual voice enabled.
What voices can I use when I create an Agent?

You can use standard voices, as well as high-quality (Pro) voices from ElevenLabs, which are identified by the Pro icon in the Voices selection menu. You can also select from a number of native voices for other languages, as well as multilingual voices that can speak several languages. You can also clone your own voice by uploading an audio recording.
Can I share my agent with other people?

Certainly, you can have many other people talk with your agent. You can either share a link to your agent, hosted by D-ID, or you can embed an agent on your own website. Keep in mind that when you share an agent with other users, their conversations with your agent will be charged against your account.
How do Agents work?

Agents use natural language processing and generative AI to understand your text or voice input and then provide relevant responses. They use RAG technology to retrieve accurate answers to queries from a knowledge base of uploaded documents.
Why do I need to provide knowledge documents for my Agent?

The documents that you upload will provide a knowledge base for your Agent to draw from that is not available to the LLM used by the agent. For example, your documents may have proprietary or non-public information.
What types of knowledge documents can I upload to my Agent?

Your documents can be PDF, TXT, or PPTX (PowerPoint) files that add to the expertise of your Agent. Website URLs are also supported, so you can upload the text content from a web page. For optimal results, upload documents that contain paragraphs of text, in the style of an article or FAQ document.
What are the limits for uploading knowledge documents?

You can upload up to 5 documents, and each document can have a maximum of 500,000 text characters.
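
A quick pre-upload check against these limits might look like the sketch below. It is an illustration only: the function name is hypothetical, documents are assumed to be already extracted to plain text, and `len()` counts Unicode code points, which may differ from how the service counts characters.

```python
MAX_DOCUMENTS = 5
MAX_CHARS_PER_DOCUMENT = 500_000


def knowledge_base_problems(documents: dict[str, str]) -> list[str]:
    """Map of document name -> extracted text; returns any limit violations."""
    problems = []
    if len(documents) > MAX_DOCUMENTS:
        problems.append(
            f"{len(documents)} documents exceeds the limit of {MAX_DOCUMENTS}"
        )
    for name, text in documents.items():
        if len(text) > MAX_CHARS_PER_DOCUMENT:
            problems.append(
                f"{name}: {len(text)} characters exceeds the limit of "
                f"{MAX_CHARS_PER_DOCUMENT}"
            )
    return problems
```

An empty result means the set of documents fits within both stated limits.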
Are my source documents private when I upload them?

Your documents can only be accessed by you and your agents. If you share your agent with other users, then they can also learn about the content of your documents by talking with the agent. For more detailed information, please read our privacy policy.
Can I update my agent with new knowledge?

Yes, you can edit the agent details and text settings and update the knowledge base of your agent.
Are there any limits on chatting with agents?

D-ID is offering everyone 200 free conversation sessions that you can use to get started. After that, the number of conversation sessions depends on the price plan you have selected.
How is a conversation session with an agent measured?

Users can receive up to 5 messages from an agent in each conversation session. The sixth message onwards counts as a new session in your price plan.
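
As a rough illustration of this rule, the number of sessions consumed can be estimated from the number of agent messages. The sketch assumes that every further block of 5 messages counts as another session (the text above only states this for the sixth message), and the function name is hypothetical.

```python
import math

MESSAGES_PER_SESSION = 5  # up to 5 agent messages per conversation session


def sessions_used(agent_messages: int) -> int:
    """Estimated sessions consumed by a conversation with this many agent messages."""
    if agent_messages <= 0:
        return 0
    # Assumption: each started block of 5 messages is billed as one session.
    return math.ceil(agent_messages / MESSAGES_PER_SESSION)
```

Under this assumption, 5 messages use 1 session and 6 messages use 2.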
How much does it cost to use Agents?

You can start on a free trial plan to try out Agents, and then select a price plan that suits you from the D-ID pricing page.
Is there an API that I can use to create Agents?

Yes, an API is available to everyone who has a D-ID Studio account, and the corresponding price plans are on the D-ID pricing page.

D-ID Video Translate

What is D-ID Video Translate?

D-ID Video Translate lets you produce multilingual content without the hassle and expense of traditional video production. Users upload a video in one language and receive translated versions in multiple other languages, with voice cloning and accurate lip-syncing to match.
How does D-ID Video Translate work?

After uploading a video, our AI-powered system processes the content, translates the spoken words into the selected language, and then re-synchronizes the lip movements to match the new audio generated with the original voice, ensuring the video looks as natural as possible in the new language.
Read more about Video Translate here
What are the best practices to get a great translation?

For optimal results, the video should feature only one person in the frame, with the speaker facing forward and their face clearly visible at all times. To maintain clear audio, it’s important to minimize background noise and music.
How long does it take to translate a video?

Translation times vary depending on the length of the video. Typically, the process takes anywhere from a few minutes to an hour for longer videos. An email will be sent once your translation is ready.
Can I edit the translated video after it’s processed?

Once the video is translated, you can download it and make further edits using your preferred video editing software.
Is there a limit to the length or size of the video I can upload?

Yes. The video should be between 10 seconds and 5 minutes in length, and the file size should not exceed 2GB.
How do you calculate credit usage for Video Translation?

For trial users, credits used for Video Translation are taken from the same balance as the Studio. Each credit is worth up to 15 seconds of translated video, the same as Studio credits. For example, translating a 15-second video into two languages costs 2 credits.
Read more about the pricing here.
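
The translation example above can be reproduced with a short calculation. This is a sketch under stated assumptions: the function name is hypothetical, each target language is assumed to be charged separately with its length rounded up to whole credits, and the 10-second to 5-minute upload limit from this FAQ is enforced as an input check.

```python
import math

SECONDS_PER_CREDIT = 15
MIN_SECONDS, MAX_SECONDS = 10, 5 * 60  # upload limits stated in this FAQ


def translation_credits(duration_seconds: float, target_languages: int) -> int:
    """Estimated credits to translate one video into several languages."""
    if not MIN_SECONDS <= duration_seconds <= MAX_SECONDS:
        raise ValueError("video must be between 10 seconds and 5 minutes long")
    if target_languages < 1:
        raise ValueError("at least one target language is required")
    # Assumption: every language costs ceil(duration / 15) credits.
    return math.ceil(duration_seconds / SECONDS_PER_CREDIT) * target_languages
```

With these assumptions, `translation_credits(15, 2)` gives 2, matching the example in the answer.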

Security Measures

What security and data protection methods do you implement?

All data communications to and from our services are SSL/TLS encrypted (TLS 1.3 protocol). Data at rest is encrypted in S3 storage with Transparent Data Encryption. Servers and storage are protected by a firewall and WAF. Transient and temporary information is erased automatically after 24 hours, or by the customer using the API delete endpoint. All our workstations are encrypted and further protected by antivirus software with Endpoint Detection & Response (EDR).
ISO certificates: 27018:2019, 27017:2015, 27001:2013
How do you handle data and security breaches?

In case of a data or security breach, we may, at our discretion and as required by the applicable law and regulations, update a user about the relevant details of such an event.
Do you maintain backups?

Yes. The servers and relevant data are mirrored and/or backed up in real time as part of the AWS platform service.
How do you manage permissions?

Access Permissions are handled by D-ID. There is a complete separation between our development and testing environments and the production environment. Only certified personnel may access the production environment. Further, based on their credentials, users may only access their own data. Credentials are revoked on a regular basis when/if needed.
