The torch.Tensor.to() method moves a tensor to a different device (for example, from CPU to GPU), converts its data type (float32, int64, etc.), or does both in one call.
It never modifies the original tensor in place. If the tensor already has the requested device and data type, to() returns the original tensor itself; otherwise, it returns a new tensor.
Syntax
Tensor.to(device=None, dtype=None, non_blocking=False, copy=False, memory_format=torch.preserve_format)
Parameters
Name | Description |
device | The target device. It can be a string such as "cpu" or "cuda", or a torch.device object. |
dtype | The target data type, such as torch.float16, torch.float32, torch.float64, or torch.int64. |
non_blocking | False by default. If True, PyTorch tries to perform the transfer asynchronously with respect to the host when possible, for example when copying a CPU tensor in pinned memory to a CUDA device. |
copy | False by default. If True, a new tensor is created even when the tensor already has the requested device and data type. |
memory_format | The desired memory layout of the returned tensor. The default, torch.preserve_format, keeps the layout of the input. |
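
The memory_format parameter is the only one not demonstrated later, so here is a minimal sketch of it. It assumes a 4D tensor with an arbitrary shape and asks for the channels-last layout; only the strides change, while the shape, device, dtype, and values stay the same.

```python
import torch

# A 4D tensor in the default contiguous (NCHW) layout; the shape is arbitrary.
x = torch.arange(120, dtype=torch.float32).reshape(2, 3, 4, 5)

# Request the channels-last layout; device and dtype are left unchanged.
y = x.to(memory_format=torch.channels_last)

print(y.shape == x.shape)                                  # True: same logical shape
print(x.stride(), y.stride())                              # the strides differ
print(y.is_contiguous(memory_format=torch.channels_last))  # True
print(torch.equal(x, y))                                   # True: values are unchanged
```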
Moving a tensor to the GPU

If a CUDA-capable GPU is available, you can move a tensor from the CPU to the GPU with the Tensor.to() method. Falling back to "cpu" when CUDA is unavailable keeps the code runnable on any machine.

import torch

# Create a tensor on CPU
tensor = torch.tensor([11.0, 21.0, 19.0])

# Move to GPU (if available)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
tensor_gpu = tensor.to(device)

print(tensor_gpu.device)
# Output:
# For CPU: cpu
# For GPU: cuda:0

Changing the Data Type

Let’s initialize a float tensor and then convert it into an integer tensor using the .to(dtype) method.

import torch

# Create a float tensor
float_tensor = torch.tensor([2.1, 1.9, 21.19])
print(float_tensor.dtype)
# Output: torch.float32

# Convert to int64
int_tensor = float_tensor.to(dtype=torch.int64)
print(int_tensor.dtype)
# Output: torch.int64

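
One detail worth seeing in action: converting a float tensor to an integer type drops the fractional part (truncation toward zero) rather than rounding. The sketch below uses made-up values, including a negative one, and shows rounding first as one possible alternative when nearest-integer behavior is wanted.

```python
import torch

values = torch.tensor([2.1, 1.9, 21.19, -1.7])

# Plain conversion: the fractional part is discarded (truncation toward zero).
truncated = values.to(torch.int64)

# Round to the nearest integer first, then convert.
rounded = values.round().to(torch.int64)

print(truncated.tolist())  # [2, 1, 21, -1]
print(rounded.tolist())    # [2, 2, 21, -2]
```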
Combining device and dtype
A single call can move a tensor to another device and change its data type at the same time.

import torch

# Create a tensor
tensor = torch.tensor([1, 2, 3], dtype=torch.int32)

# Move to GPU and convert to float32 (requires a CUDA device)
tensor_float_gpu = tensor.to(device="cuda", dtype=torch.float32)

print(tensor_float_gpu)
# Output: tensor([1., 2., 3.], device='cuda:0')

print(tensor_float_gpu.dtype)
# Output: torch.float32

This is useful when preparing data for GPU computation, for example converting integer inputs to float32 while moving them to the GPU.
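
A related form, not listed in the syntax above, is to pass another tensor: to(other) returns a tensor with the same dtype and device as other. The sketch below stays on the CPU so it runs anywhere; the tensor names are only illustrative.

```python
import torch

# The reference tensor defines the target dtype and device.
reference = torch.zeros(2, dtype=torch.float64)
source = torch.tensor([1, 2, 3], dtype=torch.int32)

# Match the dtype and device of the reference tensor in one call.
matched = source.to(reference)

print(matched.dtype)                       # torch.float64
print(matched.device == reference.device)  # True
print(matched.tolist())                    # [1.0, 2.0, 3.0]
```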
Asynchronous Transfer with non_blocking
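Setting non_blocking=True asks PyTorch to perform the transfer asynchronously with respect to the host when possible, so the CPU can continue working while the copy is in flight. The typical case is copying a CPU tensor in pinned (page-locked) memory to a CUDA device.

import torch

tensor = torch.tensor([1.0, 2.0], device="cpu")

# Move to GPU asynchronously (requires a CUDA device)
tensor_async = tensor.to(device="cuda", non_blocking=True)

print(tensor_async.device)
# Output: cuda:0

Asynchronous transfers can improve throughput in data-loading pipelines, because the copy can overlap with other work.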
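
For the transfer to be able to run asynchronously, the source CPU tensor generally needs to live in pinned memory. Here is a minimal sketch of that pattern, assuming a hypothetical helper named transfer that simply returns the input unchanged on machines without CUDA.

```python
import torch

def transfer(batch: torch.Tensor) -> torch.Tensor:
    """Hypothetical helper: copy a CPU tensor to the GPU when one is available."""
    if not torch.cuda.is_available():
        return batch  # nothing to transfer on a CPU-only machine
    pinned = batch.pin_memory()                  # page-locked host memory
    return pinned.to("cuda", non_blocking=True)  # the copy may overlap with host work

batch = torch.tensor([1.0, 2.0])
result = transfer(batch)
print(result.device)
```

CUDA operations queued afterwards on the same stream run after the copy has finished, so the result can be passed straight to GPU code.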
Forcing a Copy with copy=True
By default, to() returns the original tensor when it already has the requested device and data type. Pass copy=True to force a new tensor in that case.

import torch

tensor = torch.tensor([1.0, 2.0], device="cpu")

# Force a copy on the same device
tensor_copy = tensor.to(device="cpu", copy=True)

print(tensor_copy is tensor)
# Output: False

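
To make the effect of copy=True concrete, the sketch below compares it with the default behavior on a tensor that already has the requested device and dtype. The variable names are illustrative only.

```python
import torch

tensor = torch.tensor([1.0, 2.0])  # float32 on the CPU

# Nothing needs to change, so the original tensor is returned.
same = tensor.to("cpu", dtype=torch.float32)

# copy=True forces a new tensor with its own storage.
copied = tensor.to("cpu", dtype=torch.float32, copy=True)

print(same is tensor)                          # True
print(copied is tensor)                        # False
print(copied.data_ptr() == tensor.data_ptr())  # False

# Writing to the copy leaves the original untouched.
copied[0] = 99.0
print(tensor.tolist())  # [1.0, 2.0]
```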
Moving a model
A typical pattern in training loops is to move both the model and its input tensors to the same device. Note that nn.Module.to() moves the model's parameters in place, whereas Tensor.to() returns a tensor that must be assigned back.

import torch
from torch import nn

# Define a simple model
model = nn.Linear(3, 2)
tensor = torch.tensor([[1.0, 2.0, 3.0]])

# Move model and tensor to GPU (requires a CUDA device)
device = torch.device("cuda")
model.to(device)
tensor = tensor.to(device)

# Perform forward pass
output = model(tensor)
print(output.device)
# Output: cuda:0

Keep the model's parameters and its input tensors on the same device. If they are on different devices, the forward pass raises a RuntimeError.
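
A device-agnostic variant of this pattern can be sketched as follows. It falls back to the CPU when CUDA is unavailable and checks that the parameters and inputs ended up on the same device before the forward pass; the variable names are illustrative.

```python
import torch
from torch import nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = nn.Linear(3, 2)
inputs = torch.tensor([[1.0, 2.0, 3.0]])

# nn.Module.to() moves the parameters in place and returns the module itself.
returned = model.to(device)
# Tensor.to() returns a tensor, so the result is assigned back.
inputs = inputs.to(device)

param_device = next(model.parameters()).device
print(returned is model)              # True
print(param_device == inputs.device)  # True

output = model(inputs)
print(output.shape)  # torch.Size([1, 2])
```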